Design Video View Count System
When designing a system that counts video views, it is crucial to capture the requirements accurately so that the system is reliable, scalable, and performant.
Functional Requirements (FR)
1. The system should accurately count the number of views for each video.
2. A view should be counted only if the video is played for at least a predefined threshold (e.g., 30 seconds).
3. If a user watches the same video multiple times, repeat views should be counted separately only if they occur at least a specified interval apart (e.g., 24 hours).
4. The system should provide near real-time updates of the view count to users.
5. The system should prevent miscounting caused by race conditions (lost or duplicated increments) when many users view the same video simultaneously.
6. The system should generate daily/weekly reports of video view counts for analytics and insights.
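Requirements 2 and 3 together define when a playback becomes a counted view. Below is a minimal, single-threaded, in-memory sketch of that rule; the class name `ViewQualifier` and the constants are hypothetical, and the thresholds use the example values from the list (30 seconds, 24 hours).

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, taken from the example values in requirements 2 and 3.
MIN_WATCH_SECONDS = 30
REPEAT_VIEW_WINDOW = timedelta(hours=24)


class ViewQualifier:
    """Decides whether a playback event counts as a view (in-memory sketch)."""

    def __init__(self) -> None:
        # (user_id, video_id) -> time of the last view that was counted
        self._last_counted: dict[tuple[str, str], datetime] = {}

    def should_count(self, user_id: str, video_id: str,
                     watched_seconds: float, at: datetime) -> bool:
        # Requirement 2: ignore plays shorter than the watch-time threshold.
        if watched_seconds < MIN_WATCH_SECONDS:
            return False
        key = (user_id, video_id)
        last = self._last_counted.get(key)
        # Requirement 3: a repeat view counts only once the window has elapsed.
        if last is not None and at - last < REPEAT_VIEW_WINDOW:
            return False
        self._last_counted[key] = at
        return True
```

In a distributed deployment this per-user state would have to live in a shared store. One option (an assumption, not part of this design) is a Redis key per (user, video) written with `SET key 1 NX EX 86400`, which succeeds only when no counted view exists inside the window.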
Non-Functional Requirements (NFR)
1. When a video is viewed, the updated count should be reflected on the video page with minimal latency.
2. The system should handle a high volume of concurrent views, especially during peak traffic or for viral videos.
3. The system should prioritize eventual consistency; users may see a slightly outdated count, but the data should eventually synchronize across all services.
4. The system should be highly available (99.99% uptime, i.e., at most about 53 minutes of downtime per year) so that it can serve requests continuously.
Back-of-the-Envelope Calculation
Views QPS
Number of videos on the platform: 100 million videos
Daily views per video (average): 100 views per video
Daily active users: 500 million users
Average video length: 10 minutes
Minimum duration to count a view: 30 seconds (watch time threshold)
Average size of each view record: 200 bytes (includes video ID, user ID, timestamp, IP address, and device info)
Total daily views = 100M * 100 = 10B
Views QPS = 10B / 10^5 ≈ 100k QPS (86,400 seconds per day rounded to 10^5; the exact average is ≈116k QPS)
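The QPS arithmetic can be checked in a few lines of Python. The 2x peak factor is an assumption added for illustration; the estimate above gives only the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400; often rounded to 10^5 for estimates

videos = 100_000_000
daily_views_per_video = 100
daily_views = videos * daily_views_per_video  # 10 billion views/day

avg_qps_rounded = daily_views / 10**5          # back-of-envelope figure: 100k QPS
avg_qps_exact = daily_views / SECONDS_PER_DAY  # about 115.7k QPS

# Assumption (not from the estimate above): peak traffic runs at ~2x the average.
PEAK_FACTOR = 2
peak_qps = avg_qps_exact * PEAK_FACTOR

print(f"daily views: {daily_views:,}")
print(f"average QPS: {avg_qps_rounded:,.0f} (rounded), {avg_qps_exact:,.0f} (exact)")
print(f"peak QPS (assumed {PEAK_FACTOR}x): {peak_qps:,.0f}")
```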
Storage Requirements
Each view record takes approximately 200 bytes of storage.
Total daily views = 100M * 100 = 10B
Total storage = 10B * 200 bytes = 2 TB/day
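The same check for storage, extended to a year. The replication factor of 3 is an assumption (a common Cassandra setting) and is not part of the 2 TB/day figure, which counts a single copy of the raw records.

```python
daily_views = 100_000_000 * 100  # 10 billion view records per day
record_bytes = 200               # per-record size from the assumptions above

daily_bytes = daily_views * record_bytes
TB = 10**12                      # decimal terabytes, matching the 2 TB/day figure

# Assumption: replication factor 3; not part of the original estimate.
REPLICATION_FACTOR = 3

print(f"raw storage per day:  {daily_bytes / TB:.1f} TB")
print(f"raw storage per year: {daily_bytes * 365 / TB:.0f} TB")
print(f"per year with RF={REPLICATION_FACTOR}: "
      f"{daily_bytes * 365 * REPLICATION_FACTOR / TB:.0f} TB")
```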
APIs
1. POST /api/v1/videos/{videoId}/views
Records a view event for a video.
Request:
{
"userId": "user789",
"sessionId": "sess456",
"timestamp": "2024-11-14T10:15:30Z"
}
Response:
{
"status": "success",
"message": "View recorded successfully",
"viewRecorded": true
}
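The POST endpoint can be made safe against client retries (requirement 5 under FR) by keying each recorded view on fields the request already carries. The sketch below is a hypothetical handler, `record_view`, with in-memory stores in place of the databases. It assumes the client calls the endpoint only after the 30-second threshold has been reached, since the request body shown carries no watch-duration field, and it treats (videoId, userId, sessionId, timestamp) as an idempotency key; both are assumptions, not part of the API above.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins for the raw-view store and the counters.
recorded_keys: set[tuple[str, str, str, str]] = set()
view_counts: dict[str, int] = {}

REQUIRED_FIELDS = ("userId", "sessionId", "timestamp")


def record_view(video_id: str, body: dict) -> dict:
    """Sketch of POST /api/v1/videos/{videoId}/views with retry-safe recording."""
    missing = [f for f in REQUIRED_FIELDS if f not in body]
    if missing:
        return {"status": "error", "message": f"Missing fields: {', '.join(missing)}",
                "viewRecorded": False}
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalise it.
    try:
        datetime.fromisoformat(body["timestamp"].replace("Z", "+00:00"))
    except (AttributeError, ValueError):
        return {"status": "error", "message": "Invalid timestamp", "viewRecorded": False}

    # A client retry resends the same body, so this key makes the call idempotent.
    key = (video_id, body["userId"], body["sessionId"], body["timestamp"])
    if key in recorded_keys:
        return {"status": "success", "message": "View already recorded",
                "viewRecorded": False}
    recorded_keys.add(key)
    view_counts[video_id] = view_counts.get(video_id, 0) + 1
    return {"status": "success", "message": "View recorded successfully",
            "viewRecorded": True}
```

A retried request then returns `"viewRecorded": false` instead of incrementing the count a second time.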
2. GET /api/v1/videos/{videoId}/views/count
Retrieves the total view count for a specific video.
Response:
{
"videoId": "abc123",
"viewCount": 12345678,
"lastUpdated": "2024-11-14T10:00:00Z"
}
3. GET /api/v1/videos/views/counts?videoIds=abc123,xyz456,lmn789
Retrieves view counts for multiple videos in a single request.
Response:
{
"viewCounts": {
"abc123": 12345678,
"xyz456": 98765432,
"lmn789": 56789012
}
}
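A sketch of how the batch endpoint might parse its comma-separated `videoIds` parameter. The function `batch_view_counts` and the 100-ID cap `MAX_BATCH` are hypothetical; `counts` stands in for the cache or counter table, and returning 0 for unknown IDs is a design choice, not something the API above specifies.

```python
from urllib.parse import parse_qs

MAX_BATCH = 100  # hypothetical cap on IDs per request


def batch_view_counts(query_string: str, counts: dict[str, int]) -> dict:
    """Sketch of GET /api/v1/videos/views/counts?videoIds=a,b,c."""
    params = parse_qs(query_string)
    raw = params.get("videoIds", [""])[0]
    # Split, trim, drop empty entries, and de-duplicate while keeping request order.
    ids = list(dict.fromkeys(v.strip() for v in raw.split(",") if v.strip()))
    if len(ids) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} videoIds per request")
    # Unknown IDs report 0 here; a real service might omit them instead.
    return {"viewCounts": {vid: counts.get(vid, 0) for vid in ids}}
```

Against Redis, the lookup for all IDs could be a single `MGET` over the `video:views:{video_id}` keys, keeping the batch endpoint to one round trip.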
4. GET /api/v1/videos/abc123/views/stats?startDate=2024-11-01&endDate=2024-11-07
Retrieves daily view statistics for a video over a specified date range.
Response:
{
"videoId": "abc123",
"stats": [
{ "date": "2024-11-01", "views": 1200 },
{ "date": "2024-11-02", "views": 1500 },
{ "date": "2024-11-03", "views": 1800 }
]
}
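The stats endpoint can read pre-aggregated daily rows rather than raw events. In the sketch below, the hypothetical function `view_stats` filters a dict standing in for the `video_view_stats` table by an inclusive date range. Days without a row are simply omitted, which is one way to read the sample response (it lists three days for a seven-day range).

```python
from datetime import date


def view_stats(video_id: str, start: str, end: str,
               daily: dict[tuple[str, date], int]) -> dict:
    """Sketch of GET /api/v1/videos/{videoId}/views/stats over an inclusive range."""
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError("startDate must not be after endDate")
    # Keep only this video's rows inside [start_d, end_d], ordered by date.
    rows = sorted(
        (d, n) for (vid, d), n in daily.items()
        if vid == video_id and start_d <= d <= end_d
    )
    return {"videoId": video_id,
            "stats": [{"date": d.isoformat(), "views": n} for d, n in rows]}
```

Against TimescaleDB, the equivalent lookup would be a range query such as `SELECT view_date, total_views FROM video_view_stats WHERE video_id = $1 AND view_date BETWEEN $2 AND $3 ORDER BY view_date`.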
5. GET /api/v1/admin/videos/abc123/views/logs?startDate=2024-11-01&endDate=2024-11-02
Retrieves raw view logs for a video over a specified date range (admin only).
Response:
{
"videoId": "abc123",
"logs": [
{
"userId": "user001",
"sessionId": "sess456",
"timestamp": "2024-11-01T12:34:56Z",
"ipAddress": "192.168.1.1",
"device": "iPhone 14"
},
{
"userId": "user002",
"sessionId": "sess789",
"timestamp": "2024-11-01T12:35:10Z",
"ipAddress": "192.168.1.2",
"device": "Chrome Desktop"
}
]
}
Database
The storage layer must support four access patterns:
Write-heavy operations: Efficiently store raw view records and update view counts (~100k writes/s on average).
Read-heavy operations: Fetch aggregated view counts quickly.
Analytics: Generate reports and stats over historical data.
Caching: Minimize database load for frequently accessed view counts.
Relational Database (SQL)
Use Case:
Store video metadata, users, and sessions.
-- Table to store video metadata
CREATE TABLE videos (
video_id VARCHAR(50) PRIMARY KEY,
title VARCHAR(255),
description TEXT,
uploader_id VARCHAR(50),
upload_date TIMESTAMP,
duration INT,
category VARCHAR(100),
tags VARCHAR[],
status VARCHAR(20) -- e.g., "active", "deleted"
);
-- Table to store user data
CREATE TABLE users (
user_id VARCHAR(50) PRIMARY KEY,
username VARCHAR(100),
email VARCHAR(255),
created_at TIMESTAMP DEFAULT NOW()
);
-- Table to store user sessions
CREATE TABLE sessions (
session_id VARCHAR(50) PRIMARY KEY,
user_id VARCHAR(50) REFERENCES users(user_id),
ip_address VARCHAR(45),
user_agent VARCHAR(255),
created_at TIMESTAMP DEFAULT NOW()
);
NoSQL Database (Cassandra/DynamoDB)
Use Case:
Store raw view events and aggregated view counts. These databases are optimized for write-heavy workloads and can scale horizontally.
Schema for NoSQL Database (Cassandra):
Table for Storing Raw Views
CREATE TABLE video_views (
video_id TEXT,
view_id UUID,
user_id TEXT,
session_id TEXT,
timestamp TIMESTAMP,
ip_address TEXT,
user_agent TEXT,
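PRIMARY KEY (video_id, timestamp, view_id)
) WITH CLUSTERING ORDER BY (timestamp DESC, view_id ASC);
Partition Key: video_id determines which node stores a video's rows, spreading videos across the cluster (a single viral video still maps to one partition, which can grow very large).
Clustering Keys: timestamp allows efficient time-based queries; view_id keeps two views recorded at the same timestamp from overwriting each other.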
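Tables for Storing Aggregated View Counts
CREATE TABLE video_view_counts (
video_id TEXT PRIMARY KEY,
total_views COUNTER
);
-- Cassandra does not allow counters inside collections, so daily counts get their own counter table
CREATE TABLE video_daily_view_counts (
video_id TEXT,
view_date DATE,
views COUNTER,
PRIMARY KEY (video_id, view_date)
) WITH CLUSTERING ORDER BY (view_date DESC);
Counter columns support atomic increments (e.g., UPDATE video_view_counts SET total_views = total_views + 1 WHERE video_id = 'abc123') without a read-modify-write cycle.
The video_daily_view_counts table tracks views per day for analytics (e.g., 1500 views for abc123 on 2024-11-14).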
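At roughly 100k views per second, issuing one counter update per view is costly. A common mitigation, sketched here as an assumption rather than part of the design above, is to buffer increments in memory on each ingestion node and periodically flush per-video deltas. `CounterWriteBuffer` and `flush_statements` are hypothetical names; `flush_statements` only builds CQL statements and bind values and does not execute them.

```python
import threading
from collections import Counter


class CounterWriteBuffer:
    """Hypothetical sketch: batch view increments in memory, flush periodically."""

    def __init__(self) -> None:
        self._pending: Counter[str] = Counter()
        self._lock = threading.Lock()  # increments arrive from many request threads

    def increment(self, video_id: str, n: int = 1) -> None:
        with self._lock:
            self._pending[video_id] += n

    def drain(self) -> dict[str, int]:
        """Swap out pending deltas under the lock so no increment is lost or flushed twice."""
        with self._lock:
            pending, self._pending = self._pending, Counter()
        return dict(pending)


def flush_statements(deltas: dict[str, int]) -> list[tuple[str, tuple[int, str]]]:
    # One CQL counter update per video per flush, instead of one per view.
    cql = "UPDATE video_view_counts SET total_views = total_views + ? WHERE video_id = ?"
    return [(cql, (delta, vid)) for vid, delta in sorted(deltas.items())]
```

Because Cassandra counter updates are not idempotent, a flush that fails and is retried can double-count. The buffer trades a few seconds of staleness, which the eventual-consistency requirement permits, for far fewer writes.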
Time-Series Database (TimescaleDB/ClickHouse)
Use Case:
Store historical view counts for analytics and reporting over time. These databases are optimized for time-series data with efficient aggregations.
Schema for Time-Series Database (TimescaleDB)
-- Table to store daily view stats
CREATE TABLE video_view_stats (
video_id VARCHAR(50),
view_date DATE,
total_views INT,
unique_users INT,
avg_watch_time INT, -- in seconds
PRIMARY KEY (video_id, view_date)
);
-- Create a hypertable to optimize for time-series data
SELECT create_hypertable('video_view_stats', 'view_date');
A hypertable partitions the table into time-based chunks, so time-range queries (e.g., fetching views for the past 30 days) scan only the relevant chunks.
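A daily rollup job could populate `video_view_stats` from raw view events. The sketch assumes each raw event carries a `watch_seconds` field, which the Cassandra `video_views` schema above does not include, and groups by the event timestamp's calendar date (UTC assumed). `RawView` and `daily_rollup` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class RawView:
    video_id: str
    user_id: str
    timestamp: datetime
    watch_seconds: int  # assumption: not part of the raw-view schema above


def daily_rollup(views: list[RawView]) -> dict[tuple[str, date], dict]:
    """Aggregate raw views into rows shaped like video_view_stats."""
    groups: dict[tuple[str, date], list[RawView]] = defaultdict(list)
    for v in views:
        groups[(v.video_id, v.timestamp.date())].append(v)
    rows = {}
    for key, vs in groups.items():
        rows[key] = {
            "total_views": len(vs),
            "unique_users": len({v.user_id for v in vs}),
            # Integer seconds, matching the INT avg_watch_time column.
            "avg_watch_time": sum(v.watch_seconds for v in vs) // len(vs),
        }
    return rows
```

Each row could then be written with `INSERT ... ON CONFLICT (video_id, view_date) DO UPDATE`, so rerunning the job for a day overwrites its stats rather than duplicating them.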
In-Memory Database (Redis)
Use Case:
Cache frequently accessed view counts to reduce load on primary databases. Redis can handle high-throughput read operations with low latency.
Schema for In-Memory Database (Redis)
Redis does not have a traditional schema but uses data structures like hash maps and sorted sets.
Caching Video View Counts
Key: "video:views:{video_id}"
Value: Total view count (integer)
TTL: 5 minutes (300 seconds)
Key: "video:daily_views:{video_id}"
Value: { "2024-11-14": 1500, "2024-11-13": 1200 } (hash map of date to views)
TTL: 1 hour (3600 seconds)
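Reads of `video:views:{video_id}` can follow a cache-aside pattern: check Redis, fall back to the counter table on a miss, then repopulate with the 5-minute TTL. The sketch replaces Redis with a hypothetical in-memory `TTLCache` that has an injectable clock so expiry can be tested; with the redis-py client the two cache calls would be `r.get(key)` and `r.set(key, count, ex=300)`.

```python
import time
from typing import Callable, Optional

VIEWS_TTL_SECONDS = 300  # matches the 5-minute TTL for video:views:{video_id}


class TTLCache:
    """Hypothetical in-memory stand-in for Redis GET/SET with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[int, float]] = {}  # key -> (value, expires_at)
        self._clock = clock

    def get(self, key: str) -> Optional[int]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return entry[0]

    def set(self, key: str, value: int, ttl: int) -> None:
        self._data[key] = (value, self._clock() + ttl)


def get_view_count(video_id: str, cache: TTLCache,
                   load_from_db: Callable[[str], int]) -> int:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"video:views:{video_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    count = load_from_db(video_id)
    cache.set(key, count, VIEWS_TTL_SECONDS)
    return count
```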
High-Level Design (HLD)