Design YouTube
A YouTube-like system is a video-sharing platform that supports various user interactions.
Functional Requirements (FR)
1. Users can upload videos in various formats.
2. The system generates video thumbnails.
3. Users can search videos by title, tags, or keywords.
4. Users can comment on videos.
5. Users can like or dislike videos.
6. Content creators get detailed analytics such as view count, watch time, audience retention, and viewer demographics.
7. Users are notified about new uploads from subscribed channels.
8. Users can live-stream videos.
Non-Functional Requirements (NFR)
1. The system should be highly scalable, handling millions of concurrent users and billions of video streams globally.
2. Search should return results in under 500 ms.
3. Video streaming and other essential services should have 99.99% uptime, making the system highly available.
4. The system should support thousands of video uploads and millions of video views per minute.
Estimates
User Base
Active users per day: 1 billion (approx. 1/7th of the world population)
Concurrent users: 5% of daily active users ≈ 50 million
Videos viewed per user per day: 20
Total daily views: 20 billion
View QPS = 20B / 10^5 s ≈ 200K QPS (approximating a day as 10^5 seconds)
Uploads: 1 million videos per day
Upload QPS = 1M / 10^5 s ≈ 10 QPS
Upload rate: roughly 500 hours of video uploaded per minute
Video length: Average 10 minutes
Video size: 5 MB/min at 720p resolution (base format)
Ingest rate: 500 hours/min × 60 min/hour × 5 MB/min = 150,000 MB ≈ 150 GB of video per minute (base format)
Daily ingest: 150 GB/min × 1,440 min/day ≈ 216,000 GB/day ≈ 216 TB/day
Assuming each video is also stored in several transcoded resolutions (≈5× the base size): ≈ 1,080,000 GB/day
Storage over 5 years: 1,080,000 GB/day × 365 × 5 ≈ 1,971,000,000 GB ≈ 2 EB (exabytes)
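These figures are easy to sanity-check with a quick back-of-envelope script (the 5× multiplier for extra transcoded formats is an assumption, as noted above):

# Back-of-envelope check of the storage estimate above.
UPLOAD_HOURS_PER_MIN = 500   # hours of video uploaded per real-time minute
MB_PER_VIDEO_MIN = 5         # 720p base format, MB per minute of video
FORMAT_MULTIPLIER = 5        # assumed: extra transcoded resolutions

mb_per_min = UPLOAD_HOURS_PER_MIN * 60 * MB_PER_VIDEO_MIN     # 150,000 MB/min
gb_per_day = mb_per_min / 1000 * 60 * 24                      # ~216,000 GB/day (base)
gb_per_day_total = gb_per_day * FORMAT_MULTIPLIER             # ~1,080,000 GB/day
eb_over_5_years = gb_per_day_total * 365 * 5 / 1e9            # ~1.97 EB

print(f"{gb_per_day:,.0f} GB/day base; {eb_over_5_years:.2f} EB over 5 years")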
APIs
1. Upload Video
Endpoint: POST /videos/upload
Description: Allows users to upload videos.
Request:
{
  "userId": "12345",
  "title": "My Video",
  "description": "This is a test video",
  "tags": ["test", "demo"],
  "visibility": "public/private/unlisted"
}
Response:
{
  "videoId": "abcd1234",
  "status": "uploading"
}
2. Check Upload Status
Endpoint: GET /videos/{videoId}/status
Description: Checks the status of a video upload.
Response:
{
  "videoId": "abcd1234",
  "status": "processing/completed/failed"
}
3. Update Video Details
Endpoint: PUT /videos/{videoId}
Description: Updates the metadata of a video.
Request:
{
  "title": "Updated Title",
  "description": "Updated description",
  "tags": ["updated", "tags"]
}
4. Fetch Video
Endpoint: GET /videos/{videoId}
Description: Retrieves video playback details.
Response:
{
  "videoId": "abcd1234",
  "title": "My Video",
  "description": "Test description",
  "url": "https://cdn.example.com/abcd1234.mp4",
  "views": 1000,
  "likes": 500
}
5. Stream Video
Endpoint: GET /videos/{videoId}/stream
Description: Fetches a streaming URL for adaptive bitrate playback.
Response:
{
  "streamUrl": "https://cdn.example.com/hls/abcd1234/master.m3u8"
}
6. Search Videos
Endpoint: GET /search
Description: Searches for videos by keywords, tags, or other criteria.
Query Params:
query (string): Search term.
tags (array): Tags for filtering.
sort (string): relevance/recent.
Response:
[
  {
    "videoId": "abcd1234",
    "title": "My Video",
    "thumbnail": "https://cdn.example.com/thumbs/abcd1234.jpg"
  }
]
7. Fetch Recommendations
Endpoint: GET /recommendations/{userId}
Description: Fetches personalized video recommendations.
Response:
[
  {
    "videoId": "xyz987",
    "title": "Recommended Video",
    "thumbnail": "https://cdn.example.com/thumbs/xyz987.jpg"
  }
]
8. Like/Dislike Video
Endpoint: POST /videos/{videoId}/like
Description: Likes or dislikes a video.
Request:
{
  "userId": "12345",
  "action": "like/dislike"
}
9. Comment on Video
Endpoint: POST /videos/{videoId}/comments
Description: Adds a comment to a video.
Request:
{
  "userId": "12345",
  "comment": "Great video!"
}
10. Fetch Comments
Endpoint: GET /videos/{videoId}/comments
Description: Fetches comments for a video.
Response:
[
  {
    "commentId": "cmt123",
    "userId": "56789",
    "content": "Great video!",
    "timestamp": "2024-01-01T10:00:00Z"
  }
]
11. Subscribe to Notifications
Endpoint: POST /notifications/subscribe
Description: Subscribes to notifications for a channel or user.
Request:
{
  "userId": "12345",
  "channelId": "channel789"
}
12. Get Notifications
Endpoint: GET /notifications
Description: Fetches notifications for a user.
Response:
[
  {
    "notificationId": "notif123",
    "message": "New video uploaded by Channel XYZ",
    "timestamp": "2024-01-01T10:00:00Z"
  }
]
DATABASES
1. Metadata Storage
Database: Relational Database (e.g., PostgreSQL, MySQL)
Stores structured information about videos, users, and channels.
Optimized for transactional queries like search, sorting, and filtering.
Video Table
CREATE TABLE videos (
  video_id VARCHAR(50) PRIMARY KEY,
  title TEXT NOT NULL,
  description TEXT,
  user_id VARCHAR(50),
  upload_time TIMESTAMP,
  visibility ENUM('public', 'private', 'unlisted'),
  tags JSONB,
  status ENUM('processing', 'completed', 'failed'),
  views BIGINT DEFAULT 0,
  likes BIGINT DEFAULT 0,
  duration INT, -- in seconds
  thumbnail_url TEXT
);
Users Table
CREATE TABLE users (
  user_id VARCHAR(50) PRIMARY KEY,
  username TEXT NOT NULL,
  email TEXT UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT NOW(),
  subscription_level ENUM('free', 'premium')
);
Channels Table
CREATE TABLE channels (
  channel_id VARCHAR(50) PRIMARY KEY,
  user_id VARCHAR(50),
  name TEXT NOT NULL,
  description TEXT,
  created_at TIMESTAMP DEFAULT NOW(),
  FOREIGN KEY (user_id) REFERENCES users(user_id)
);
2. Video Content Storage
Database: Distributed Object Storage (e.g., Amazon S3, Google Cloud Storage)
Stores raw video files, thumbnails, and encoded video streams.
Schema:
Metadata links to the storage location of videos.
/videos/{video_id}/original.mp4
/videos/{video_id}/1080p.mp4
/videos/{video_id}/720p.mp4
/thumbnails/{video_id}/thumbnail.jpg
3. Search and Discovery
Database: Search Engine (e.g., Elasticsearch, Apache Solr)
Provides full-text search and relevance ranking for videos.
{ "video_id": "abcd1234", "title": "My Test Video", "description": "This is a demo video for testing.", "tags": ["demo", "test"], "views": 1000, "likes": 50, "upload_time": "2024-01-01T12:00:00Z" }5. User Activity Logs
4. User Activity Logs
Database: Time-Series Database (e.g., InfluxDB, Prometheus)
Tracks events like views, likes, and watch times.
Schema:
Events Table
CREATE TABLE user_events (
  event_id BIGSERIAL PRIMARY KEY,
  user_id VARCHAR(50),
  video_id VARCHAR(50),
  event_type ENUM('view', 'like', 'dislike', 'comment'),
  timestamp TIMESTAMP DEFAULT NOW(),
  duration INT -- in seconds (for views)
);
5. Analytics and Metrics
Database: Columnar Database (e.g., Apache Pinot, Google BigQuery)
Handles aggregated metrics and analytics queries.
Video Analytics Table
CREATE TABLE video_analytics (
  video_id VARCHAR(50),
  date DATE,
  views BIGINT,
  likes BIGINT,
  watch_time BIGINT, -- total watch time in seconds
  PRIMARY KEY (video_id, date)
);
User Analytics Table
CREATE TABLE user_analytics (
  user_id VARCHAR(50),
  date DATE,
  total_watch_time BIGINT,
  total_videos_watched BIGINT,
  PRIMARY KEY (user_id, date)
);
6. Notifications
Database: Key-Value Store (e.g., Redis, Memcached)
Stores real-time notifications for users.
Schema:
Key: notification:{user_id}
Value: List of notification messages.
{ "notifications": [ { "message": "New video uploaded by Channel XYZ", "timestamp": "2024-01-01T12:00:00Z" }, { "message": "Your video has reached 1,000 views!", "timestamp": "2024-01-02T14:00:00Z" } ] }
7. Caching
Database: In-Memory Cache (e.g., Redis, Memcached)
Caches frequently accessed data like video metadata and recommendations.
Schema:
Key: video:{video_id}
Value: Serialized video metadata.
TTL: Set so entries are invalidated after a specific duration.
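A hedged cache-aside sketch of this pattern (the fetch_video_from_db helper and the 10-minute TTL are assumptions):

import json
import redis

r = redis.Redis()
CACHE_TTL_SECONDS = 600  # assumed 10-minute TTL

def fetch_video_from_db(video_id: str) -> dict:
    # Hypothetical stand-in for a query against the videos table above.
    return {"videoId": video_id, "title": "My Video"}

def get_video_metadata(video_id: str) -> dict:
    """Cache-aside read: try Redis first, fall back to the database."""
    key = f"video:{video_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    metadata = fetch_video_from_db(video_id)
    r.setex(key, CACHE_TTL_SECONDS, json.dumps(metadata))  # write-through with TTL
    return metadata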
This multi-database architecture ensures scalability, optimized performance for each use case, and smooth integration across the platform.
Open Ended Questions
Will the client talk to the CDN directly to fetch the video, or will some service proxy it?
1. Client-Server Interaction in Streaming
A. Initial Interaction with Backend Services
The client communicates with the backend (streaming or video service) for:
Fetching the video metadata (title, description, duration, etc.).
Obtaining the manifest file URL (e.g., master.m3u8 or manifest.mpd).
The backend provides a URL pointing to the manifest file on the CDN or object storage.
B. Direct Interaction with CDN
After receiving the manifest URL, the client directly fetches:
The manifest file from the CDN.
The video chunks specified in the manifest file, also served by the CDN.
The streaming service itself is not involved in the actual delivery of video chunks.
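A hedged sketch of this two-step flow from the client's side, reusing the Stream Video endpoint defined in the API section (the API host is a placeholder):

import requests

API_BASE = "https://api.example.com"  # placeholder backend host

def start_playback(video_id: str) -> str:
    # Step 1: ask the backend (streaming service) for the manifest URL.
    resp = requests.get(f"{API_BASE}/videos/{video_id}/stream")
    resp.raise_for_status()
    manifest_url = resp.json()["streamUrl"]  # e.g. a CDN-hosted master.m3u8

    # Step 2: fetch the manifest directly from the CDN; the player then
    # requests the chunks listed inside it, also straight from the CDN.
    manifest = requests.get(manifest_url)
    manifest.raise_for_status()
    return manifest.text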
2. Why Clients Talk Directly to the CDN
A. Scalability
CDNs are designed for massive scale, with distributed edge servers that can handle billions of requests efficiently.
Offloading video delivery to a CDN reduces the load on backend servers, allowing them to focus on metadata and control logic.
B. Reduced Latency
CDNs cache content closer to end users, minimizing latency compared to fetching chunks from a central server or origin storage.
C. Cost Efficiency
Serving data from a CDN is typically more cost-effective than using backend services or raw object storage for direct content delivery.
D. Optimized for Streaming
CDNs support adaptive bitrate streaming, ensuring seamless playback by dynamically adjusting the video quality based on the user's bandwidth.
Does the Streaming Service Ever Proxy the CDN?
In rare cases, the streaming service might act as an intermediary between the client and CDN:
A. When Proxying Makes Sense
Security or Authentication:
If stricter access control is required, the streaming service can authenticate the client and generate short-lived signed URLs for CDN access.
Customized Responses:
Dynamic generation of manifests based on user-specific preferences (e.g., language, subtitles).
Logging or Analytics:
To gather detailed insights about playback behavior or errors.
B. Downsides of Proxying
Adds latency and increases backend load.
Reduces the benefits of CDN scalability.
What database does YouTube use for video metadata?
What is Vitess?
Vitess is an open-source, distributed database solution that provides horizontal scaling for MySQL databases. It was originally developed by YouTube to manage its massive scale of data while keeping MySQL as the underlying relational database.
Vitess essentially acts as a MySQL cluster that is highly scalable, sharded, and capable of handling large amounts of traffic and data. It abstracts the complexity of managing a distributed MySQL setup and provides a single entry point to interact with the database.
Why Vitess for YouTube?
YouTube, being a massive platform, generates a huge amount of user and video-related metadata (e.g., user profiles, video views, likes, comments). Storing and querying this data efficiently, especially at scale, requires robust database technology. Here's why Vitess was used:
Horizontal Scalability: Vitess allows YouTube to split and distribute data across multiple MySQL instances, ensuring that it can scale horizontally to handle increasing traffic and data without compromising performance.
Sharding: Vitess uses sharding to distribute data across multiple databases based on predefined key values. This makes it possible to store and manage data in smaller chunks across different machines, making it more manageable and efficient.
High Availability and Reliability: Vitess provides built-in mechanisms for replication, failover, and backup, ensuring that YouTube's user metadata is always available, even during system failures.
SQL Compatibility: Vitess works with MySQL syntax and tooling, meaning YouTube can continue to use its existing MySQL tools and queries without having to migrate to a completely new database system.
Distributed Querying: It optimizes complex queries and allows users to run distributed queries across multiple shards seamlessly.
Vitess Use at YouTube
Vitess has been integral in scaling YouTube's infrastructure to handle millions of user interactions and metadata, such as:
User profiles, subscriptions, and activity history.
Video metadata, such as views, likes, comments, etc.
Channel and playlist management.
Real-time operations like querying and updating video metadata or user stats.
Vitess is used by YouTube to manage user metadata efficiently at massive scale. Its ability to scale MySQL horizontally through sharding and replication makes it an ideal choice for handling YouTube's vast, growing data, ensuring high availability and consistent performance.
How Pre-signed URLs Work for Video Uploads:
Backend Generates Pre-signed URL:
The server (backend) generates a pre-signed URL using cloud storage APIs (e.g., AWS S3 SDK).
This URL is signed with specific permissions (such as PUT access for uploading) and an expiration time, which restricts access.
The URL can also include other parameters, such as content-type restrictions or metadata that will be associated with the uploaded file (a code sketch follows this list).
Client Uploads Video:
The client (browser or mobile app) uses this pre-signed URL to upload the video directly to the cloud storage (e.g., S3) without going through your backend.
The upload is done via an HTTP method (typically PUT for direct file uploads) to the pre-signed URL.
Security:
Pre-signed URLs help ensure secure, temporary access to the storage service.
The server generates the URL with limited permissions, ensuring that the client can only upload to a specific bucket or path, and only within a defined time window.
Post-upload Processing:
Once the video is uploaded, the cloud storage service will trigger an event (like a notification or webhook), allowing the backend to perform any necessary post-processing, such as:
Transcoding the video.
Storing metadata (like duration, resolution, etc.) in a database.
Generating thumbnails or other assets.
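A minimal sketch of the generation step using the AWS S3 SDK (boto3); the bucket name, key layout, and 15-minute expiry are assumptions:

import boto3

s3 = boto3.client("s3")

def create_upload_url(video_id: str) -> str:
    """Generate a pre-signed PUT URL the client can upload to directly."""
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={
            "Bucket": "videos-raw",                    # assumed bucket name
            "Key": f"videos/{video_id}/original.mp4",  # matches the layout above
            "ContentType": "video/mp4",                # restricts the upload type
        },
        ExpiresIn=900,  # URL is valid for 15 minutes
    )

The client then performs a PUT to the returned URL with the file bytes as the body and Content-Type: video/mp4.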
Advantages of Using Pre-signed URLs for Video Uploads:
Reduced Server Load: The backend doesn’t need to handle large file uploads, reducing server load and network usage.
Scalability: Offloading file uploads to a cloud provider (like S3 or GCS) helps scale the upload process as your user base grows.
Security: Pre-signed URLs are time-limited, meaning they’re valid only for a short duration, reducing the risk of unauthorized access.
Direct Uploads: The client can upload directly to cloud storage, making the process faster and more efficient.
What is a manifest file, and how is it useful in the YouTube design?
A manifest file is a text-based file (usually in JSON or XML format) that provides metadata about the video content and its available formats. It acts as an index for the video player, listing:
Available video qualities (bitrate, resolution).
Audio tracks (languages, bitrate).
Subtitle tracks (languages).
Segment locations for streaming (URLs of video chunks).
For example, in the DASH protocol, the manifest file is called a Media Presentation Description (MPD), and for HLS, it's a playlist (.m3u8).
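For illustration, a minimal HLS master playlist might look like this (the paths and bitrates are made up):

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8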
Purpose of Manifest Files in YouTube
1. Enabling Adaptive Bitrate Streaming (ABR)
What is ABR? Adaptive Bitrate Streaming allows YouTube to dynamically adjust the video quality based on the user's network conditions, device capabilities, and screen size.
The manifest file contains the URLs for video chunks at different bitrates and resolutions (e.g., 144p, 360p, 720p, 1080p, 4K). The player reads this file and switches between these versions during playback.
2. Video Chunking and Streaming
Instead of downloading the entire video file, YouTube divides videos into smaller chunks (e.g., 2-10 seconds per chunk).
The manifest file provides the location of these chunks, enabling the player to request only the segments it needs, reducing latency and buffering.
3. Efficient Bandwidth Usage
Users with slower networks can watch videos at lower resolutions. The manifest file allows the video player to choose the appropriate quality, ensuring smooth playback without buffering.
It prevents wasting bandwidth by fetching only the needed video and audio segments.
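The player-side selection logic is simple; a hedged sketch of picking a rendition from the manifest (bitrates mirror the example playlist above; the 0.8 safety factor is an assumption):

# Renditions as (bandwidth in bits/sec, playlist URL), parsed from a master playlist.
RENDITIONS = [
    (800_000, "360p/playlist.m3u8"),
    (2_800_000, "720p/playlist.m3u8"),
    (5_000_000, "1080p/playlist.m3u8"),
]

def pick_rendition(measured_bps: float) -> str:
    """Choose the highest-quality rendition the measured throughput can sustain."""
    budget = measured_bps * 0.8  # assumed safety margin against bandwidth dips
    eligible = [(bw, url) for bw, url in RENDITIONS if bw <= budget]
    if not eligible:
        return min(RENDITIONS)[1]  # fall back to the lowest quality
    return max(eligible)[1]

print(pick_rendition(3_500_000))  # -> "720p/playlist.m3u8"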
4. Support for Multiple Audio and Subtitle Tracks
If a video has multiple audio tracks (e.g., in different languages) or subtitles, the manifest file lists these options so the player can display them to the user.
5. Resilience to Failures
The manifest file contains multiple URLs for the same video chunk hosted on different CDNs or servers. This allows the player to switch servers if one becomes unavailable, ensuring uninterrupted playback.
6. Synchronization of Audio and Video Tracks
For videos with separate audio and video streams (common in adaptive streaming), the manifest file ensures that these tracks are synchronized during playback.
7. Enabling Advanced Features
Features like live streaming, DVR capabilities, and seeking within a video are supported by manifest files that contain information about the live stream or segment durations.
How Manifest Files Work in YouTube's Workflow
Video Transcoding and Chunking:
After a video is uploaded, YouTube transcodes it into multiple resolutions and formats (e.g., 144p to 4K).
The video is divided into small chunks for adaptive streaming.
Manifest File Creation:
A manifest file is generated for the video, listing the available bitrates, resolutions, audio tracks, and chunk URLs.
Storage in CDN:
The manifest file, along with the video and audio chunks, is stored on CDNs for efficient delivery to users worldwide.
Player Fetches the Manifest:
When a user plays a video, the YouTube player fetches the manifest file from the CDN and begins requesting video/audio chunks based on the user’s network and device capabilities.
Dynamic Adaptation:
As the network conditions change (e.g., a drop in speed), the player refers to the manifest file to switch to a lower-quality video stream.
Benefits of Using Manifest Files in YouTube
Scalability: Efficiently handles billions of daily views with varying network conditions.
Seamless Playback: Ensures smooth video delivery with adaptive streaming.
Bandwidth Optimization: Saves bandwidth by delivering only necessary video/audio segments.
Flexibility: Supports a wide variety of devices, resolutions, and network conditions.
How do FCM and APN send push notifications to client devices?
FCM is Google's cloud service for sending notifications to Android devices (and other platforms). Here's how it works:
1. Device Registers with FCM
The device (e.g., Android phone) registers with FCM and receives a registration token.
This registration token is a unique identifier for the device, similar to APN's device token, and is used to route messages to the correct device.
2. Backend Server Stores Registration Token
Your backend receives the registration token from the device and stores it. This token is essential for targeting the correct device for push notifications.
3. Backend Sends Notification Request to FCM
When you want to send a notification, your backend server constructs a push notification payload. The payload can include:
Title (e.g., "New Message").
Body (e.g., "You have a new message from John!").
Data (optional, to include custom key-value pairs, such as links, actions, etc.).
Notification options (e.g., priority, sound, icon).
Your backend sends the notification to FCM's HTTP endpoint. The server must authenticate using an API key or OAuth token.
POST https://fcm.googleapis.com/fcm/send
Content-Type: application/json
Authorization: key=YOUR_SERVER_KEY

{
  "to": "device_registration_token",
  "notification": {
    "title": "New Message",
    "body": "You have a new message from John!"
  },
  "data": {
    "custom_key": "custom_value"
  }
}

(This is FCM's legacy HTTP API; newer integrations use the HTTP v1 API, but the flow is the same.)
4. FCM Delivers the Notification
FCM receives the notification request from your backend and attempts to deliver the notification to the target device using the registration token.
If the device is unreachable or the app is not in the foreground, FCM will store the message and retry delivery.
5. Device Receives and Displays Notification
The device receives the push notification. If the app is in the foreground, the notification may be handled programmatically (e.g., showing an in-app banner). If it’s in the background, the system handles the display based on the notification's content.
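Putting steps 2–4 together, a minimal backend-side sketch using the legacy endpoint shown above (the server key and registration token are placeholders):

import requests

FCM_URL = "https://fcm.googleapis.com/fcm/send"
SERVER_KEY = "YOUR_SERVER_KEY"  # placeholder; loaded from config in practice

def send_push(registration_token: str, title: str, body: str) -> bool:
    """Send a push notification to one device via FCM's legacy HTTP API."""
    payload = {
        "to": registration_token,
        "notification": {"title": title, "body": body},
        "data": {"custom_key": "custom_value"},
    }
    resp = requests.post(
        FCM_URL,
        json=payload,
        headers={"Authorization": f"key={SERVER_KEY}"},
    )
    return resp.status_code == 200  # per-message results come back in the body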
Notification Flow Summary:
Device Registration: Device registers with the respective service (APN or FCM) and gets a unique token.
Backend: The backend stores the device token and constructs a notification payload.
Push Notification: The backend sends the payload to APN (for iOS) or FCM (for Android) along with the device token.
Service Delivery: APN or FCM delivers the notification to the device.
Device: The device processes and displays the notification, either in the background or foreground.
What database should be used for the likes/dislikes store?
Choosing the right database for a likes/dislikes database depends on factors like the scale of the application, the expected read/write patterns, and the need for consistency and latency.
For a platform like YouTube, which has billions of users and high engagement, the database must handle:
High Write Throughput: As millions of users continuously like/dislike videos.
Low-Latency Reads: For quickly displaying likes/dislikes count.
Efficient Aggregation: To calculate and display metrics like the total number of likes/dislikes.
Scalability: To accommodate billions of records for videos and user interactions.
1. NoSQL Databases
NoSQL databases are generally well-suited for high-throughput, low-latency operations.
a. Cassandra
Why?
Distributed, highly scalable, and fault-tolerant.
Optimized for write-heavy workloads.
Allows for efficient querying of likes/dislikes counts and individual user interactions.
Horizontal scaling is easy.
Data Model:
Partition Key: Video ID.
Clustering Key: User ID (to track individual user actions).
Columns: Action (like/dislike), timestamp.

CREATE TABLE likes_dislikes (
  video_id UUID,
  user_id UUID,
  action TEXT, -- 'like' or 'dislike'
  timestamp TIMESTAMP,
  PRIMARY KEY (video_id, user_id)
);
b. DynamoDB
Why?
Managed NoSQL database with built-in scalability.
Strong performance for high-volume read/write operations.
Supports TTL (Time to Live) for data retention policies if needed.
Data Model:
Partition Key: Video ID.
Sort Key: User ID.
Additional Attributes: Action, timestamp.
c. Redis (or Redis Streams)
Why?
In-memory database for ultra-low-latency reads/writes.
Suitable for maintaining real-time counts of likes/dislikes.
Can be used as a cache for frequently accessed data.
Use Case:
Store real-time counters of likes/dislikes per video.
Use a backend store (like Cassandra or DynamoDB) for persistent storage.
HSET video_likes:video_id likes_count 123 dislikes_count 45
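A hedged sketch of the counter updates with redis-py (key names follow the HSET example above; the per-user vote-tracking hash is an assumption):

import redis

r = redis.Redis()

def record_vote(video_id: str, user_id: str, action: str) -> None:
    """Update real-time counters; 'action' is 'like' or 'dislike'."""
    field = "likes_count" if action == "like" else "dislikes_count"
    r.hincrby(f"video_likes:{video_id}", field, 1)
    # Track the user's vote so repeat clicks can be deduplicated (assumed key).
    r.hset(f"video_votes:{video_id}", user_id, action)

def get_counts(video_id: str) -> dict:
    counts = r.hgetall(f"video_likes:{video_id}")
    return {k.decode(): int(v) for k, v in counts.items()}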
One can use Kafka and Flink to create a robust, real-time pipeline for processing likes/dislikes data in a scalable and fault-tolerant way.
Here’s how and why this setup works:
Why Use Kafka and Flink?
Decoupling Components:
Kafka acts as a durable message broker that decouples the system components.
Even if Flink or another consumer is temporarily unavailable, Kafka ensures no data is lost.
Real-Time Aggregation:
Flink can process streams of likes/dislikes data in real time and maintain aggregated counters (e.g., total likes/dislikes per video).
It supports stateful processing with exactly-once guarantees.
Scalability:
Kafka handles millions of events per second.
Flink scales horizontally to process high-throughput streams.
Durability and Replay:
Kafka retains the event logs, allowing you to replay data streams if needed (e.g., to recompute aggregates or debug issues).
Workflow: Putting Likes in Redis, Kafka, and Aggregating with Flink
1. User Action
A user clicks "like" or "dislike" for a video.
This action is recorded in Redis for immediate real-time read/write operations.
2. Push to Kafka
After writing to Redis, the like/dislike action (along with metadata such as user ID, video ID, timestamp, and action) is pushed to a Kafka topic (see the producer sketch after these workflow steps).
Kafka schema example:
{
  "user_id": "12345",
  "video_id": "abcd1234",
  "action": "like",
  "timestamp": 1699999999999
}
3. Flink for Real-Time Aggregation
Stream Processing:
Flink reads the data from Kafka.
Aggregates likes/dislikes per video in real time using tumbling or sliding windows.
Stateful Aggregation:
Flink maintains state (e.g., running totals of likes/dislikes) in its internal state store or an external state backend (e.g., RocksDB).
Output Aggregation:
Results are stored in a persistent database (e.g., Cassandra, DynamoDB) for long-term storage or used to update the Redis counters for faster reads.
4. Write Aggregates Back to Redis
The aggregated counts from Flink are periodically updated in Redis for real-time display on the frontend.
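A hedged sketch of step 2 with kafka-python (the broker address is a placeholder; the topic and event shape follow the examples above). In practice this would be called right after the Redis update from the record_vote sketch earlier:

import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_vote_event(user_id: str, video_id: str, action: str) -> None:
    """Step 2: emit the raw event to Kafka after the Redis write (step 1)."""
    event = {
        "user_id": user_id,
        "video_id": video_id,
        "action": action,
        "timestamp": int(time.time() * 1000),  # matches the schema example above
    }
    producer.send("likes-dislikes-raw", event)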
Detailed Data Flow
Frontend to Redis:
Write: User actions (like/dislike) are immediately written to Redis for caching.
Read: The frontend fetches counts from Redis to display likes/dislikes.
Redis to Kafka:
The user action is asynchronously pushed to Kafka after being stored in Redis.
Kafka topics may include:
likes-dislikes-raw: Raw user actions.
likes-dislikes-aggregates: Aggregated data from Flink.
Kafka to Flink:
Flink consumes messages from likes-dislikes-raw:
Groups by video ID.
Maintains a running total of likes/dislikes.
Outputs aggregated data to likes-dislikes-aggregates.
Flink to Persistent Store:
Flink updates:
Redis: For real-time counters.
Long-Term DB: For persistent storage (e.g., Cassandra, DynamoDB).
Advantages of This Pipeline
Real-Time Feedback: Redis ensures the system responds instantly to user actions.
Scalability: Kafka and Flink can handle high throughput, ensuring scalability for billions of users.
Fault Tolerance: Kafka ensures durability, and Flink provides exactly-once guarantees for data consistency.
Flexibility: Aggregation logic in Flink can be modified to add new metrics without affecting the rest of the system.
When Not to Use Flink or Kafka
If the system is small (e.g., fewer than 10 million users) or the real-time aggregation requirements are less demanding:
Skip Kafka and Flink.
Use Redis alone for counters and perform periodic batch updates to a persistent database.
HLD
What shard key should be used for video metadata?
Using region_id + video_id as a sharding strategy for video metadata in a system like YouTube has both pros and cons. Let’s evaluate whether it is a good strategy based on the use case and query patterns.
Advantages
Data Locality for Regional Queries:
Region-specific queries (e.g., fetching videos popular in a specific region) would benefit as the metadata for all videos from a particular region will reside on the same shard.
This reduces cross-shard lookups for region-specific queries.
Improved Distribution:
Adding video_id ensures better distribution of data within each region, as video_id acts as a secondary unique identifier. This prevents hotspots in shards dominated by a particular region.
Scalability:
The combination provides a scalable partitioning key by balancing data across shards geographically (via region_id) and by the volume of videos (via video_id).
Efficient Updates:
Video updates are localized to a single shard, improving the performance of update operations as no cross-shard communication is needed.
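A hedged sketch of how a router might map this composite key to a shard (the shard count and hashing scheme are assumptions):

import hashlib

NUM_SHARDS_PER_REGION = 16  # assumed shard count per region

def shard_for(region_id: str, video_id: str) -> str:
    """Route by region first, then hash video_id to spread load within it."""
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    shard_index = int(digest, 16) % NUM_SHARDS_PER_REGION
    return f"{region_id}-shard-{shard_index}"

print(shard_for("us-east", "abcd1234"))  # e.g. "us-east-shard-7"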
Disadvantages
Category-Based or Global Queries:
Queries that span multiple regions, such as global searches or trending videos worldwide, will require querying multiple shards. This could increase latency for such use cases.
Uneven Load Distribution:
Certain regions might have significantly higher traffic than others (e.g., North America vs. a smaller region), leading to uneven load distribution across shards.
If region_id dominates the sharding key, high-traffic regions could overload certain shards, even with the video_id component.
Complex Query Logic:
Combining region_id and video_id requires application logic to be aware of this composite key when querying or inserting data. This increases complexity.
Multi-Region Users:
Videos that are popular in multiple regions may lead to redundant queries or the need to replicate metadata across shards, complicating synchronization.
When is region_id + video_id a Good Strategy?
This strategy works well in these scenarios:
Region-Specific Features:
The majority of queries involve fetching videos based on regional preferences or regulations.
Example: Serving region-specific trending videos, or complying with region-based content restrictions.
Balanced Regional Traffic:
Regions have relatively balanced traffic or sharding logic includes additional load-balancing mechanisms.
Write-Heavy Systems:
If the system is write-heavy and updates are more frequent than global reads, this strategy avoids cross-shard writes and ensures good write throughput.
When to Avoid region_id + video_id?
Global or Category-Centric Queries Dominate:
Systems where queries often fetch videos across regions or categories (e.g., trending videos across categories worldwide).
In such cases, consider sharding by category + video_id or just video_id for more even distribution.
Imbalanced Regional Load:
When certain regions generate significantly higher traffic, consider other load-distributing strategies (e.g., hash(video_id)) to avoid regional hotspots.
Alternatives
Hash(video_id):
Ensures even data distribution and avoids regional hotspots. However, it sacrifices data locality for region-specific queries.
Category + video_id:
Useful if most queries are category-based (e.g., "all sports videos"). However, it may not handle regional queries efficiently.
Hybrid Strategy (Region-Specific and Global Clusters):
Maintain separate clusters for region-specific queries and global queries.
Use region_id + video_id in regional clusters and hash(video_id) for global clusters.
Conclusion
Using region_id + video_id as a sharding key is good for systems with a strong focus on regional queries and balanced regional traffic. However, for systems with heavy global query demands or imbalanced regional loads, you may need to consider hybrid or alternative strategies to balance locality and scalability. Always base the decision on query patterns and traffic distribution.