Design Facebook Messenger
Designing Facebook Messenger (or a similar messaging system) starts with identifying its functional and non-functional requirements. Here's a breakdown of both for such a system:
Functional Requirements (FR)
1. Users can send and receive real-time text messages.
2. Users can send images, GIFs, and documents.
3. Users can see other users' online/offline status.
4. Users can see message delivery indicators (e.g., a single tick when sent, a double tick when delivered).
5. Users can search within conversations (text, multimedia, or links).
6. Users receive push notifications for new messages.
7. Users can see when someone is typing (as in Slack).
8. Group messaging is supported.
Advanced Requirements
1. Support for voice and video calls.
Non-Functional Requirements (NFR)
1. Ensure real-time message delivery with latency below 100 ms for text messages under ideal network conditions.
2. The system should be highly scalable, supporting billions of users with geographically distributed infrastructure.
3. The system should be highly available, with minimal downtime.
4. The system should be reliable: messages are delivered exactly once and in the correct order within a conversation.
Estimates
User Base: 1 billion active users.
Messages per User per Day: 50 messages.
Peak Traffic: 10% of daily traffic occurs during peak hours.
Average Message Size: 500 bytes (including metadata).
Storage Retention Period: 1 year.
Read-Write Ratio: Each message is read twice on average.
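Write QPS = 1B users * 50 messages/day = 50B messages/day; 50B / ~10^5 seconds per day = 5 * 10^5 ≈ 500k QPS (average)
Read QPS = 2 reads per message * 500k ≈ 1M QPS (average)
Storage requirements = 1B * 50 * 500 bytes = 25 TB/day
Storage for the 1-year retention period = 25 TB/day * 365 ≈ 9.1 PB (about 45.6 PB if messages are kept for 5 years)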
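The arithmetic above can be checked with a short script. It is a sketch: it uses 86,400 seconds per day instead of the rounded 10^5, so average write QPS comes out near 579k rather than 500k, and it assumes (this is not stated in the estimates) that the 10% peak share arrives within a single hour.

```python
# Inputs copied from the estimates above.
USERS = 1_000_000_000
MSGS_PER_USER_PER_DAY = 50
AVG_MSG_BYTES = 500            # including metadata
READS_PER_MSG = 2
PEAK_FRACTION = 0.10           # share of daily traffic in the peak window
PEAK_WINDOW_SECONDS = 3_600    # ASSUMPTION: the peak share arrives within one hour
SECONDS_PER_DAY = 86_400


def estimate(retention_days: int = 365) -> dict:
    """Return average/peak QPS and storage figures for the given retention."""
    msgs_per_day = USERS * MSGS_PER_USER_PER_DAY
    write_qps = msgs_per_day / SECONDS_PER_DAY
    tb_per_day = msgs_per_day * AVG_MSG_BYTES / 10**12   # decimal terabytes
    return {
        "avg_write_qps": write_qps,
        "avg_read_qps": write_qps * READS_PER_MSG,
        "peak_write_qps": msgs_per_day * PEAK_FRACTION / PEAK_WINDOW_SECONDS,
        "tb_per_day": tb_per_day,
        "total_pb": tb_per_day * retention_days / 1_000,
    }


for name, value in estimate().items():
    print(f"{name:>15}: {value:,.1f}")
```

Calling `estimate(5 * 365)` gives the 5-year storage figure.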
APIs
1. POST /api/v1/users/register
request
{
"username": "johndoe",
"email": "johndoe@example.com",
"password": "securepassword"
}
response
{
"userId": "12345",
"message": "User registered successfully"
}
2. POST /api/v1/users/login
request
{
"email": "johndoe@example.com",
"password": "securepassword"
}
response
{
"token": "jwt-token",
"userId": "12345"
}
3. GET /api/v1/users/{userId}
Get a user's profile.
Headers:
Authorization: Bearer <token>
response
{
"userId": "12345",
"username": "johndoe",
"status": "Online",
"profilePicture": "https://cdn.example.com/images/johndoe.jpg"
}
4. POST /api/v1/messages
request
{
"senderId": "12345",
"receiverId": "67890",
"message": "Hello, how are you?",
"type": "text" // Options: text, image, video, file
}
response
{
"messageId": "54321",
"timestamp": "2024-11-25T12:34:56Z"
}
5. GET /api/v1/conversations/{conversationId}
Fetch the messages in a conversation.
Query Parameters:
limit: Number of messages to fetch (default: 50).
offset: Pagination offset.
response
{
"conversationId": "98765",
"messages": [
{
"messageId": "54321",
"senderId": "12345",
"message": "Hello, how are you?",
"timestamp": "2024-11-25T12:34:56Z"
},
...
]
}
6. DELETE /api/v1/messages/{messageId}
response
{
"message": "Message deleted successfully"
}
7. POST /api/v1/groups/create
request
{
"groupName": "Family Chat",
"adminId": "12345",
"members": ["67890", "11223"]
}
8. GET /api/v1/search/conversations
Query Parameters:
query: Search term.
userId: User performing the search.
response
{
"results": [
{
"conversationId": "98765",
"lastMessage": "Hello, how are you?",
"timestamp": "2024-11-25T12:34:56Z"
},
...
]
}
Databases
1. User Management
Database: Relational Database (e.g., PostgreSQL, MySQL)
Reason: User data involves structured relationships and requires ACID transactions for consistency.
Schema: User Table
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
username VARCHAR(255) NOT NULL UNIQUE,
email VARCHAR(255) NOT NULL UNIQUE,
password_hash VARCHAR(255) NOT NULL,
status VARCHAR(50) DEFAULT 'offline',
profile_picture_url TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
2. Messaging
Database: NoSQL Document Store (e.g., MongoDB, DynamoDB)
Reason: Messages involve high write throughput and hierarchical data (e.g., conversations, media links).
Schema: Messages Collection
{
"message_id": "uuid",
"conversation_id": "uuid",
"sender_id": "user_id",
"receiver_id": "user_id",
"content": "Hello, how are you?",
"type": "text", // Options: text, image, video
"media_url": null,
"timestamp": "2024-11-25T12:34:56Z",
"status": "delivered" // Options: sent, delivered, read
}
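The `status` field is what drives the single/double tick indicator from FR 4. A minimal sketch, assuming acknowledgements can arrive late or duplicated and that a message's status should therefore only ever move forward; the `Status`, `advance`, and `tick_label` names are hypothetical:

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that a larger value means further along the delivery path.
    SENT = 1        # accepted and stored by the server
    DELIVERED = 2   # acknowledged by the recipient's device
    READ = 3        # recipient opened the conversation


def advance(current: Status, update: Status) -> Status:
    """Apply a status update; stale or duplicate acks never move it backwards."""
    return max(current, update)


def tick_label(status: Status) -> str:
    # Hypothetical UI mapping for the delivery indicator in FR 4.
    if status is Status.SENT:
        return "single tick"
    if status is Status.DELIVERED:
        return "double tick"
    return "double tick (read)"


status = Status.SENT
status = advance(status, Status.READ)
status = advance(status, Status.DELIVERED)   # late ack: ignored
print(status.name, "->", tick_label(status))  # READ -> double tick (read)
```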
3. Group Chats
Database: Relational Database (e.g., PostgreSQL, MySQL)
Reason: Group membership is a many-to-many relationship between users and groups.
Schema: Groups Table
CREATE TABLE groups (
group_id SERIAL PRIMARY KEY,
group_name VARCHAR(255) NOT NULL,
admin_id INT REFERENCES users(user_id),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Schema: Group Members Table
CREATE TABLE group_members (
group_id INT REFERENCES groups(group_id),
user_id INT REFERENCES users(user_id),
joined_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (group_id, user_id)
);
Schema: Group Messages Collection
{
"message_id": "uuid",
"group_id": "uuid",
"sender_id": "user_id",
"content": "Meeting at 3 PM!",
"type": "text",
"timestamp": "2024-11-25T12:34:56Z"
}
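To see how the many-to-many `group_members` table supports fan-out, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. `SERIAL` becomes `INTEGER PRIMARY KEY`, the table name `groups` is quoted defensively because GROUPS is a keyword in newer SQL dialects, and the sample users and the `members_of` helper are hypothetical:

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL (SERIAL -> INTEGER PRIMARY KEY).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE "groups" (
    group_id INTEGER PRIMARY KEY,
    group_name TEXT NOT NULL,
    admin_id INTEGER REFERENCES users(user_id),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE group_members (
    group_id INTEGER REFERENCES "groups"(group_id),
    user_id INTEGER REFERENCES users(user_id),
    joined_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (group_id, user_id)
);
""")

# Hypothetical sample data matching the "Family Chat" request above.
conn.executemany("INSERT INTO users (user_id, username) VALUES (?, ?)",
                 [(12345, "johndoe"), (67890, "alice"), (11223, "bob")])
conn.execute('INSERT INTO "groups" (group_id, group_name, admin_id) VALUES (?, ?, ?)',
             (1, "Family Chat", 12345))
conn.executemany("INSERT INTO group_members (group_id, user_id) VALUES (?, ?)",
                 [(1, 12345), (1, 67890), (1, 11223)])


def members_of(group_id: int) -> list:
    """Fan-out lookup: every user who should receive a message sent to the group."""
    rows = conn.execute(
        "SELECT user_id FROM group_members WHERE group_id = ? ORDER BY user_id",
        (group_id,),
    )
    return [user_id for (user_id,) in rows]


print(members_of(1))  # [11223, 12345, 67890]
```

The composite primary key also stops a user from being added to the same group twice.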
4. Real-Time Communication
Database: In-Memory Store (e.g., Redis)
Reason: Typing indicators, online presence, and active conversations need low-latency data access.
Schema: Typing Indicators
Key: typing:{conversation_id}:{user_id}
Value: { "typing": true, "timestamp": "2024-11-25T12:34:56Z" }
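The typing indicator only needs to live for a few seconds. With Redis this is usually a key with an expiry (for example `SET key value EX 5`, or `r.set(key, value, ex=5)` in redis-py). The sketch below mimics that with an in-memory dict so it runs without a Redis server; the 5-second TTL and the `TypingStore` class are assumptions, and the client is assumed to re-send the key while the user keeps typing:

```python
import json
import time
from typing import Callable


class TypingStore:
    """In-memory stand-in for the typing:{conversation_id}:{user_id} keys.

    Each key carries an expiry time, checked lazily on read, which mimics a
    Redis key written with SET ... EX <seconds>.
    """

    def __init__(self, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds          # ASSUMPTION: 5 s without a refresh = stopped typing
        self.clock = clock
        self._data = {}                 # key -> (json value, expires_at)

    @staticmethod
    def key(conversation_id: str, user_id: str) -> str:
        return f"typing:{conversation_id}:{user_id}"

    def set_typing(self, conversation_id: str, user_id: str, timestamp: str) -> None:
        value = json.dumps({"typing": True, "timestamp": timestamp})
        self._data[self.key(conversation_id, user_id)] = (value, self.clock() + self.ttl)

    def is_typing(self, conversation_id: str, user_id: str) -> bool:
        k = self.key(conversation_id, user_id)
        entry = self._data.get(k)
        if entry is None:
            return False
        if self.clock() >= entry[1]:
            del self._data[k]           # expired: the client stopped refreshing the key
            return False
        return True


# Usage with a fake clock so expiry is deterministic.
now = [0.0]
store = TypingStore(ttl_seconds=5, clock=lambda: now[0])
store.set_typing("98765", "12345", "2024-11-25T12:34:56Z")
print(store.is_typing("98765", "12345"))   # True
now[0] = 6.0
print(store.is_typing("98765", "12345"))   # False
```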
HLD
SOME IMPORTANT QUESTIONS
1. How do we ensure proper ordering of messages in a group chat? In a highly distributed system like WhatsApp's, this is a complex problem because the design must balance consistency, scalability, and low latency.
1. Message Ordering Challenges
Concurrent Sends: Multiple users in a group may send messages simultaneously.
Distributed Systems: Users are connected to different servers across a distributed network.
Network Delays: Messages might arrive at servers or clients out of order due to variable network latency.
2. WhatsApp’s Approach to Message Ordering
WhatsApp-style systems combine several strategies so that all clients converge on the same message order:
3. Message Sequencing
Server-Side Timestamps:
Each message is assigned a timestamp by the server when it is received from the sender.
This timestamp ensures that messages from different users in a group have a reference for ordering.
Sequence Numbers:
Messages may also be assigned a monotonically increasing sequence number specific to the group chat.
The server ensures that each new message has a higher sequence number than the last.
Conflict Resolution
If two messages have the same timestamp (unlikely, but possible), the tie is broken using:
Sequence Number: Priority is given to the message with the lower sequence number.
Sender ID: As a fallback, messages are ordered lexicographically by sender ID.
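A minimal sketch of the sequencing and tie-breaking rules above, assuming all messages for a group are routed to one sequencer (for example by hashing `group_id`), so a single per-group counter is authoritative. The `GroupSequencer` and `order_key` names are hypothetical, and clamping the timestamp so it never moves backwards within a group is an addition of this sketch:

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupMessage:
    group_id: str
    sender_id: str
    content: str
    server_ts_ms: int   # assigned by the server when the message is received
    seq: int            # per-group, strictly increasing


class GroupSequencer:
    """Hypothetical server-side component that owns ordering for its groups."""

    def __init__(self, clock=lambda: time.time_ns() // 1_000_000):
        self.clock = clock
        self._last_seq = {}   # group_id -> last sequence number issued
        self._last_ts = {}    # group_id -> last timestamp issued (ms)

    def assign(self, group_id: str, sender_id: str, content: str) -> GroupMessage:
        seq = self._last_seq.get(group_id, 0) + 1
        # Clamp so a clock step backwards cannot reorder messages within a group.
        ts = max(self.clock(), self._last_ts.get(group_id, 0))
        self._last_seq[group_id] = seq
        self._last_ts[group_id] = ts
        return GroupMessage(group_id, sender_id, content, ts, seq)


def order_key(m: GroupMessage):
    # Timestamp first, then sequence number, then sender ID as the last resort.
    return (m.server_ts_ms, m.seq, m.sender_id)


# Two messages stamped in the same millisecond: the sequence number breaks the tie.
sequencer = GroupSequencer(clock=lambda: 1_732_538_096_000)
first = sequencer.assign("G", "userB", "hi")
second = sequencer.assign("G", "userA", "hello")
print([m.content for m in sorted([second, first], key=order_key)])  # ['hi', 'hello']
```

With a single sequencer per group, the sequence number alone already gives a total order; the timestamp is kept because the text orders by it first.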
Client-Side Ordering
The WhatsApp client maintains a local queue for incoming messages.
Messages are inserted into the queue based on their sequence number or timestamp.
The client displays messages in the correct order, even if they arrive out of sequence.
Workflow
Message Sent:
User A sends a message to Group G.
Timestamp and Sequence Assignment:
Server S1 assigns a timestamp and sequence number to the message.
Fan-Out to Group Members:
Server S1 sends the message to all servers handling Group G’s members.
Message Delivery:
Each server delivers the message to its local clients in sequence.
Clients reorder messages locally if necessary.
Missing Message Recovery:
If a gap in sequence numbers is detected, the server retransmits missing messages.
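The client-side ordering and missing-message recovery steps can be sketched as a reorder buffer. This assumes per-group sequence numbers start at 1 and that the client asks the server for the numbers returned by `missing()`; the class and method names are hypothetical, and the resend request itself is not shown:

```python
class ClientReorderBuffer:
    """Hands messages to the UI strictly in per-group sequence order."""

    def __init__(self, last_delivered_seq: int = 0):
        self.next_seq = last_delivered_seq + 1   # next sequence number the UI expects
        self.pending = {}                        # seq -> content, held until the gap closes

    def receive(self, seq: int, content: str) -> list:
        """Buffer one message and return everything that is now deliverable, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                            # duplicate or already delivered
        self.pending[seq] = content
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self) -> list:
        """Sequence numbers in the current gap, to request from the server."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending)) if s not in self.pending]


buf = ClientReorderBuffer()
print(buf.receive(1, "m1"))   # ['m1']
print(buf.receive(3, "m3"))   # [] (held back: 2 has not arrived)
print(buf.missing())          # [2] -> ask the server to retransmit
print(buf.receive(2, "m2"))   # ['m2', 'm3']
```

Dropping duplicates in `receive` also helps the UI show each message exactly once even if the server retransmits.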
2. How are timestamps assigned when many servers are involved? In a distributed architecture like WhatsApp's, assigning timestamps requires careful coordination to keep message ordering consistent.
Giving every server an atomic clock is not practical because of complexity and cost. Instead, messaging systems like WhatsApp typically combine clock synchronization (NTP) with logical clocks and server coordination mechanisms. Here's how this is managed:
Challenges in Timestamping Across Servers
Clock Skew:
Servers in a distributed system may have slightly different local times due to drift.
Network Latency:
Messages may arrive at servers at different times, further complicating timestamp accuracy.
Consistency Requirements:
Timestamps must maintain a logical order, especially in scenarios like group messaging.
Techniques for Managing Timestamps
1. Network Time Protocol (NTP)
How It Works:
Servers synchronize their clocks periodically with NTP servers, which maintain a consistent time source.
NTP provides reasonably accurate synchronization (typically within a few milliseconds on a local network, and tens of milliseconds over the public internet).
Limitations:
NTP alone cannot guarantee perfectly synchronized timestamps in a high-frequency messaging system.
It may still lead to small inconsistencies, especially during transient network delays.
2. Lamport Clocks
Logical clocks assign timestamps based on the order of events rather than real time.
Each server maintains a counter:
Increment the counter before each local event, such as sending a message.
Include the counter value with the message.
On receiving a message, set the local counter to max(local_counter, received_counter) + 1.
This guarantees that if one event causally precedes another, it gets a smaller timestamp, without relying on synchronized physical clocks. Breaking ties between equal counters by server ID turns this into a consistent total order.
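The Lamport clock rules listed above translate almost directly into code. This is a textbook sketch, not WhatsApp's implementation; the `(counter, node_id)` tuple is the usual way to get a total order from a Lamport clock, and the class name is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    node_id: str        # used only to break ties between equal counters
    counter: int = 0

    def tick(self) -> int:
        """Local event: increment the counter and return the new value."""
        self.counter += 1
        return self.counter

    def send(self) -> tuple:
        """Sending is a local event; the (counter, node_id) pair travels with the message."""
        return (self.tick(), self.node_id)

    def receive(self, received_counter: int) -> int:
        """Merge rule from the text: max(local_counter, received_counter) + 1."""
        self.counter = max(self.counter, received_counter) + 1
        return self.counter


s1, s2 = LamportClock("S1"), LamportClock("S2")
stamp = s1.send()             # (1, 'S1')
s2.tick()
s2.tick()                     # S2 has handled two local events: counter = 2
print(s2.receive(stamp[0]))   # 3, which is greater than the sender's 1
# Comparing (counter, node_id) tuples gives a total order across servers.
print(sorted([(2, "S2"), (1, "S1"), (2, "S1")]))  # [(1, 'S1'), (2, 'S1'), (2, 'S2')]
```

Note the guarantee runs one way only: a smaller Lamport timestamp does not prove that one event caused the other.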
Why atomic clocks are not used
Cost: Atomic clocks are expensive and impractical for each server in a large-scale distributed system.
Precision Overhead: The level of precision provided by atomic clocks (nanoseconds) exceeds the requirements for messaging systems.
Alternative Approaches: Logical clocks and hybrid approaches are sufficient for ordering guarantees and are more scalable.
How Push Notifications Are Delivered When Users Are Offline
1. Components of the Push Notification System
Application Server:
The backend server of the app (e.g., WhatsApp server) sends push notification requests.
Push Notification Service:
Platform services operated by the OS vendors handle the actual delivery:
Apple Push Notification Service (APNs) for iOS.
Firebase Cloud Messaging (FCM) for Android.
Client Device:
The user's phone or tablet where the app resides.
2. Offline Push Notification Workflow
Step 1: Sending the Notification
The app's backend server generates a push notification with the necessary data (e.g., message text, user ID).
The notification is sent to the push notification service (e.g., APNs or FCM).
Step 2: Storing the Notification
If the user's device is offline:
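The push notification service stores the notification in its delivery queue (subject to provider limits; for example, APNs keeps only the most recent notification per app for an offline device).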
Notifications are queued with metadata such as:
Device Token: Unique identifier for the device.
Expiration Time: Specifies how long the notification should be retained if undelivered.
Step 3: Delivering the Notification
When the user's device comes online:
The device re-establishes its persistent connection to the push notification service, which tells the service that the device is reachable again.
The service delivers the stored notification to the device.
The notification is routed to the app via the operating system’s push handling mechanisms.
Step 4: User Interaction
The notification appears on the user's screen.
If the user interacts with the notification:
The app retrieves additional content (if needed) from the application server.
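Steps 1 to 3 amount to store-and-forward keyed by device token. The toy model below illustrates only that behaviour and the expiration time; real APNs/FCM apply their own storage limits and collapsing rules, and `PushServiceSim` is a hypothetical name:

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Notification:
    device_token: str
    payload: dict
    expires_at: float     # drop the notification if still undelivered after this time


class PushServiceSim:
    """Toy store-and-forward model of a push service, keyed by device token."""

    def __init__(self):
        self.online = set()                 # tokens with a live connection
        self.queues = defaultdict(deque)    # token -> pending notifications
        self.delivered = []                 # (token, payload) handed to devices

    def send(self, n: Notification) -> None:
        if n.device_token in self.online:
            self.delivered.append((n.device_token, n.payload))
        else:
            self.queues[n.device_token].append(n)       # Step 2: store

    def device_connected(self, token: str, now: float) -> None:
        # Step 3: flush the queue in order, skipping expired notifications.
        self.online.add(token)
        for n in self.queues.pop(token, deque()):
            if now < n.expires_at:
                self.delivered.append((token, n.payload))

    def device_disconnected(self, token: str) -> None:
        self.online.discard(token)


push = PushServiceSim()
push.send(Notification("tok-1", {"alert": "You have a new message"}, expires_at=100.0))
push.send(Notification("tok-1", {"alert": "Stale"}, expires_at=10.0))
push.device_connected("tok-1", now=50.0)
print(push.delivered)   # [('tok-1', {'alert': 'You have a new message'})]
```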
Handling Offline Users in Messaging Apps
Messaging apps like WhatsApp integrate push notifications into their system for offline users:
Message Notification:
When a user receives a message while offline, the app server sends a push notification to FCM/APNs.
The notification carries lightweight metadata (e.g., "You have a new message").
Message Retrieval:
Once the user opens the app, the app retrieves the full message data from the server.
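On the application server side, the decision described above can be sketched as follows. It assumes the message has already been persisted before routing, that live connections are tracked per user, and that the push payload carries only IDs; `MessageRouter` and its fields are hypothetical, and `push_outbox` stands in for real FCM/APNs calls:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MessageRouter:
    """Hypothetical app-server routing: live connection if online, push if not."""
    connections: Dict[str, Callable[[dict], None]] = field(default_factory=dict)  # user_id -> send fn
    device_tokens: Dict[str, str] = field(default_factory=dict)                  # user_id -> push token
    push_outbox: List[Tuple[str, dict]] = field(default_factory=list)            # stand-in for FCM/APNs

    def deliver(self, recipient_id: str, message: dict) -> str:
        send = self.connections.get(recipient_id)
        if send is not None:
            send(message)                  # online: full message over the live connection
            return "socket"
        token = self.device_tokens.get(recipient_id)
        if token is None:
            return "stored"                # no registered device; fetched on next sync
        # Offline: lightweight metadata only; the app fetches the full message on open.
        self.push_outbox.append((token, {
            "title": "You have a new message",
            "conversationId": message["conversationId"],
            "messageId": message["messageId"],
        }))
        return "push"


inbox = []
router = MessageRouter(connections={"12345": inbox.append},
                       device_tokens={"67890": "tok-abc"})
msg = {"conversationId": "98765", "messageId": "54321", "content": "Hello, how are you?"}
print(router.deliver("12345", msg))   # socket
print(router.deliver("67890", msg))   # push
print(router.push_outbox)
```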
Example Scenarios
User Turns Off Phone:
Notifications are queued on the push service.
Once the phone is turned on, the queued notifications are delivered.
User Temporarily Loses Internet:
The push service detects the device is offline and queues the notifications.
As soon as the connection is restored, notifications are sent.