Design a global shopping site that delivers over 1B notifications per day (~12K/sec) via email, iOS, and Android, with multi-team design flexibility.
Let's dissect it further
Problem Statement
This is a global shopping site that sends notifications. Over 1B notifications per day is roughly 12K notifications per second (1B / 86,400 s ≈ 11.6K/sec).
Notifications can be email, iOS push, or Android push, and different teams inside the company can design them.
Functional Requirements (FR)
1. The system must process and dispatch notifications at a sustained rate of 12K notifications per second.
2. Support multiple delivery channels (email, iOS push, and Android push), with channel-specific formatting and protocols.
3. Provide self-service tools or APIs for different teams to design, manage, and update notification templates.
4. Support immediate as well as scheduled notifications.
5. Enable dynamic content insertion to personalize notifications based on user data (e.g., insert order details into an email or push notification).
6. Manage opt-in/opt-out preferences per channel.
7. Implement reliable delivery, including automatic retries for transient failures.
8. Capture metrics such as delivery status, open rates, click-through rates, and failure rates.
9. Each notification should reach the user exactly once. In practice this means at-least-once processing plus deduplication on notification_id, since an external send (SMTP, APNs, FCM) cannot be made atomic with the queue.
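A minimal sketch of the deduplication behind requirement 9, assuming a hypothetical `IdempotentDispatcher` class and a caller-supplied `send` function. The in-memory set stands in for a shared store (for example, a conditional insert keyed by notification_id); it is not durable and is only there to keep the example self-contained.

```python
import threading
from typing import Callable


class IdempotentDispatcher:
    """Hypothetical dispatcher that sends each notification_id at most once per process."""

    def __init__(self, send: Callable[[str, dict], None]) -> None:
        self._send = send               # channel-specific sender (assumed)
        self._seen: set[str] = set()    # stand-in for a shared dedup store
        self._lock = threading.Lock()

    def dispatch(self, notification_id: str, payload: dict) -> bool:
        """Return True if sent now, False if this id was already handled."""
        with self._lock:
            if notification_id in self._seen:
                return False            # duplicate redelivery from the queue: skip it
            self._seen.add(notification_id)
        try:
            self._send(notification_id, payload)
        except Exception:
            with self._lock:
                self._seen.discard(notification_id)  # release the claim so a retry can resend
            raise
        return True


sent: list[str] = []
dispatcher = IdempotentDispatcher(lambda nid, payload: sent.append(nid))
dispatcher.dispatch("abc123", {"template_id": "order_shipped"})
dispatcher.dispatch("abc123", {"template_id": "order_shipped"})  # redelivered copy is ignored
print(sent)  # ['abc123']
```

Claiming the id before sending favors "no duplicates"; a durable version would also need a recovery path for claims whose send never completed.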
Non-Functional Requirements (NFR)
1. The system should be highly scalable: handle at least 12K notifications per second and scale horizontally to absorb peak loads.
2. The system should have low latency: notification generation (template processing, content insertion, and rendering) should complete within 50ms per request.
3. The system should be highly available. If one notification channel fails (e.g., push notifications), the system should retry via another channel (e.g., email).
4. The system should ensure eventual consistency across multi-region deployments when user data updates affect notification content.
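A sketch of the channel fallback in NFR 3, assuming hypothetical `send_with_fallback` and `DeliveryError` names and a per-user preference map shaped like the preferences API. One design choice worth making explicit: fallback should only move to channels the user has opted into, otherwise failover would bypass the opt-out requirement (FR 6).

```python
from typing import Callable, Mapping, Sequence


class DeliveryError(Exception):
    """Raised by a channel sender when a delivery attempt fails."""


def send_with_fallback(
    prefs: Mapping[str, bool],
    channel_order: Sequence[str],
    senders: Mapping[str, Callable[[dict], None]],
    payload: dict,
) -> str:
    """Try channels in priority order, skipping opted-out ones; return the channel used."""
    errors: dict[str, str] = {}
    for channel in channel_order:
        if not prefs.get(channel, False):
            continue  # never fall back to a channel the user opted out of
        try:
            senders[channel](payload)
            return channel
        except DeliveryError as exc:
            errors[channel] = str(exc)  # remember why, then try the next channel
    raise DeliveryError(f"all opted-in channels failed: {errors}")


def push_down(payload: dict) -> None:
    raise DeliveryError("APNs unavailable")


outbox: list[dict] = []
used = send_with_fallback(
    prefs={"push": True, "email": True, "sms": False},
    channel_order=["push", "email", "sms"],
    senders={"push": push_down, "email": outbox.append, "sms": outbox.append},
    payload={"template_id": "order_shipped"},
)
print(used)  # email
```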
Estimates
User ID: 16 bytes (UUID)
Notification Type: 1 byte (Email, iOS, Android, etc.)
Template ID: 4 bytes
Personalization Data: 256 bytes (Name, Order Details, etc.)
Timestamp: 8 bytes
Delivery Status: 1 byte (Success/Failure/Pending)
Total per notification: ~286 bytes
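To sanity-check the 286-byte figure, the fields can be packed into a fixed-width binary layout with Python's `struct` module. The layout and the enum encodings are assumptions for illustration; a real store adds its own per-row overhead (keys, column metadata, replication), so 286 bytes is best read as a lower bound.

```python
import struct
import time
import uuid

# Hypothetical fixed-width layout matching the field sizes above.
# "<" selects little-endian with no alignment padding between fields.
NOTIFICATION_RECORD = struct.Struct(
    "<"
    "16s"   # user_id: UUID as 16 raw bytes
    "B"     # notification type: 1-byte enum (email, iOS, Android, ...)
    "I"     # template_id: 4-byte unsigned int
    "256s"  # personalization data: zero-padded (or truncated) to 256 bytes
    "q"     # timestamp: 8-byte epoch seconds
    "B"     # delivery status: 1-byte enum (success/failure/pending)
)

record = NOTIFICATION_RECORD.pack(
    uuid.uuid4().bytes,        # user_id
    1,                         # type: e.g. 1 = email (assumed encoding)
    42,                        # template_id
    b'{"name": "Alex"}',       # personalization payload
    int(time.time()),          # timestamp
    2,                         # status: e.g. 2 = pending (assumed encoding)
)
print(NOTIFICATION_RECORD.size, len(record))  # 286 286
```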
Daily storage requirement: 1B × 286 bytes = 286 GB/day
Yearly storage: 286 GB × 365 ≈ 104 TB/year (286 GB × 400 ≈ 114 TB if budgeting extra days as headroom)
Compute Capacity
To process 12K notifications per second:
Each notification needs ~10ms of CPU time for rendering and personalization.
A single core therefore handles ~100 notifications/sec (1000ms / 10ms).
Total CPU cores required: 12,000 / 100 = 120 CPU cores (before headroom for peaks)
Network bandwidth
Assuming an average notification payload (with metadata) of 500 bytes:
12K × 500 bytes = 6 MB/sec (≈ 48 Mbps); over a day, 1B × 500 bytes ≈ 500 GB/day
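The estimates above can be reproduced from their inputs with a small back-of-the-envelope calculator. This is a sketch with a hypothetical `capacity_estimates` function; the defaults are the assumptions stated in the text (1B/day, 286-byte records, 10ms CPU each, 500-byte payloads), and decimal units are used throughout.

```python
import math

SECONDS_PER_DAY = 86_400


def capacity_estimates(
    notifications_per_day: int = 1_000_000_000,
    record_bytes: int = 286,
    cpu_ms_per_notification: float = 10.0,
    payload_bytes: int = 500,
    retention_days: int = 365,
) -> dict[str, float]:
    """Back-of-the-envelope numbers from the assumptions listed above (decimal units)."""
    rate = notifications_per_day / SECONDS_PER_DAY      # ~11,574/sec, rounded to 12K in the text
    per_core = 1000 / cpu_ms_per_notification           # 10 ms each -> 100 per core per second
    return {
        "rate_per_sec": rate,
        "storage_gb_per_day": notifications_per_day * record_bytes / 1e9,
        "storage_tb_per_year": notifications_per_day * record_bytes * retention_days / 1e12,
        "cpu_cores": math.ceil(rate / per_core),
        "bandwidth_mb_per_sec": rate * payload_bytes / 1e6,
        "bandwidth_gb_per_day": notifications_per_day * payload_bytes / 1e9,
    }


for name, value in capacity_estimates().items():
    print(f"{name}: {value:,.1f}")
```

With the defaults the figures come out slightly below the rounded ones (about 11,574/sec, 116 cores, 5.8 MB/sec), because the text rounds the rate up to 12K/sec, which doubles as a small safety margin.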
APIs
1.POST /notifications/send
request {
"user_id": "123e4567-e89b-12d3-a456-426614174000",
"channel": "push",
"template_id": "order_shipped",
"personalization": {
"username": "Alex",
"order_id": "ORD12345",
"delivery_date": "2025-02-16"
},
"scheduled_at": "2025-02-16T10:00:00Z"
}
response {
"notification_id": "abc123",
"status": "queued"
}
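FR 4 and the optional `scheduled_at` field imply that accepted requests are held until they are due. A minimal in-memory sketch, assuming a hypothetical `ScheduledQueue` class; at 12K/sec a real deployment would persist scheduled items (for example, in a store indexed by due time that workers poll) rather than keep them in a single process's heap.

```python
import heapq
import itertools
from datetime import datetime, timezone


class ScheduledQueue:
    """Hypothetical delay queue: releases requests whose scheduled_at has passed."""

    def __init__(self) -> None:
        self._heap: list[tuple[datetime, int, dict]] = []
        self._seq = itertools.count()  # tie-breaker so request dicts are never compared

    def add(self, request: dict) -> None:
        raw = request.get("scheduled_at")
        # No scheduled_at means "send now"; the "Z" suffix is normalised for fromisoformat.
        due = (datetime.fromisoformat(raw.replace("Z", "+00:00"))
               if raw else datetime.now(timezone.utc))
        heapq.heappush(self._heap, (due, next(self._seq), request))

    def pop_due(self, now: datetime) -> list[dict]:
        """Pop every request due at or before `now`, earliest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


queue = ScheduledQueue()
queue.add({"notification_id": "abc123", "scheduled_at": "2025-02-16T10:00:00Z"})
queue.add({"notification_id": "def456", "scheduled_at": "2025-02-16T09:00:00Z"})
due = queue.pop_due(datetime(2025, 2, 16, 9, 30, tzinfo=timezone.utc))
print([n["notification_id"] for n in due])  # ['def456']
```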
2.POST /notifications/bulk-send
request {
"campaign_id": "promo_2025",
"channel": "email",
"template_id": "discount_offer",
"users": [
{"user_id": "1", "personalization": {"name": "John"}},
{"user_id": "2", "personalization": {"name": "Emma"}}
]
}
response {
"campaign_id": "promo_2025",
"status": "processing",
"estimated_delivery_time": "2025-02-16T10:15:00Z"
}
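The bulk endpoint returns `processing` immediately, which suggests the request is fanned out asynchronously into per-user messages. A sketch of that fan-out step, assuming a hypothetical `fan_out` helper and batch size; one reasonable choice is to publish each yielded batch to the same queue that single sends use, so preference checks, rendering, and retries stay in one pipeline.

```python
from typing import Iterator


def fan_out(bulk_request: dict, batch_size: int = 500) -> Iterator[list[dict]]:
    """Split a bulk-send request into per-user messages, yielded in batches for the queue."""
    shared = {key: bulk_request[key] for key in ("campaign_id", "channel", "template_id")}
    batch: list[dict] = []
    for user in bulk_request["users"]:
        batch.append({**shared,
                      "user_id": user["user_id"],
                      "personalization": user.get("personalization", {})})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


request = {
    "campaign_id": "promo_2025",
    "channel": "email",
    "template_id": "discount_offer",
    "users": [
        {"user_id": "1", "personalization": {"name": "John"}},
        {"user_id": "2", "personalization": {"name": "Emma"}},
    ],
}
batches = list(fan_out(request, batch_size=1))
print(len(batches), batches[0][0]["user_id"])  # 2 1
```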
3.GET /notifications/status/{notification_id}
response {
"notification_id": "abc123",
"status": "delivered",
"delivered_at": "2025-02-16T10:01:23Z"
}
4.GET /users/{user_id}/preferences
Returns the user's opt-in/opt-out settings per channel
response {
"user_id": "123",
"preferences": {
"email": true,
"push": false,
"sms": true
}
}
5.POST /templates
Allows teams to create/update templates
request {
"template_id": "order_shipped",
"channel": "email",
"subject": "Your Order is on the Way!",
"body": "Hi {{username}}, your order #{{order_id}} will be delivered by {{delivery_date}}."
}
response {
"template_id": "order_shipped",
"status": "created"
}
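The `{{placeholder}}` syntax in the template body maps directly onto FR 5. A minimal renderer sketch, assuming placeholders are plain word names (no loops or conditionals) and a hypothetical `render` function; teams needing richer logic would typically adopt an existing engine such as Mustache or Jinja2, and HTML email bodies would also need user-supplied values escaped.

```python
import re

# Matches {{name}} placeholders, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


class MissingPlaceholderError(KeyError):
    """Raised when a template references a key absent from personalization."""


def render(template: str, personalization: dict) -> str:
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in personalization:
            raise MissingPlaceholderError(key)  # fail loudly instead of sending "{{order_id}}"
        return str(personalization[key])

    return PLACEHOLDER.sub(substitute, template)


body = "Hi {{username}}, your order #{{order_id}} will be delivered by {{delivery_date}}."
print(render(body, {"username": "Alex", "order_id": "ORD12345", "delivery_date": "2025-02-16"}))
# Hi Alex, your order #ORD12345 will be delivered by 2025-02-16.
```

Precompiling the pattern once keeps per-request rendering cheap, which matters for the 50ms generation budget.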
6.GET /notifications/analytics?from=2025-02-10&to=2025-02-16
response {
"total_sent": 5000000,
"total_delivered": 4800000,
"open_rate": 0.45,
"click_rate": 0.12
}
7.POST /notifications/retry
request {
"notification_id": "abc123"
}
response {
"notification_id": "abc123",
"status": "retrying"
}
8.GET /notifications/bulk/status/{bulk_id}
response [
{
"notification_id": "abc123",
"status": "delivered",
"delivered_at": "2025-02-16T10:01:23Z"
},
...
]
Databases
1. Notification Metadata Storage (Cassandra/DynamoDB)
🔹 Stores notification records efficiently with fast writes & reads.
🔹 Partitioned by user_id for fast per-user lookups.
Table: notifications
(Wide Column Store – Cassandra/DynamoDB)
CREATE TABLE notifications (
  user_id UUID,               -- Partition key
  created_at TIMESTAMP,       -- Clustering column: when the notification was created
  notification_id UUID,       -- Unique notification ID; clustering column that breaks timestamp ties
  channel TEXT,               -- email, push, sms
  template_id TEXT,           -- Reference to template
  personalization TEXT,       -- Dynamic content data serialized as JSON (CQL has no JSON column type)
  status TEXT,                -- queued, sent, delivered, failed
  updated_at TIMESTAMP,       -- Status update time
  PRIMARY KEY ((user_id), created_at, notification_id)  -- Query per user's history
) WITH CLUSTERING ORDER BY (created_at DESC, notification_id ASC);
Why NoSQL?
✅ Fast writes & reads
✅ Horizontally scalable
✅ Handles high throughput (~12K/sec)
2. User Preferences Storage (PostgreSQL/MySQL)
🔹 Stores user opt-in/opt-out settings for compliance (GDPR, CAN-SPAM, etc.).
Table: user_preferences
(Relational DB – PostgreSQL/MySQL)
CREATE TABLE user_preferences (
user_id UUID PRIMARY KEY,
email_enabled BOOLEAN DEFAULT TRUE,
push_enabled BOOLEAN DEFAULT TRUE,
sms_enabled BOOLEAN DEFAULT FALSE,
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Why SQL?
✅ Strong consistency required for preference updates
✅ Rare writes, frequent reads
3. Notification Templates Storage (PostgreSQL/MySQL)
🔹 Stores notification message templates with placeholders for personalization.
Table: notification_templates
(Relational DB – PostgreSQL/MySQL)
CREATE TABLE notification_templates (
template_id TEXT PRIMARY KEY,
channel TEXT CHECK (channel IN ('email', 'push', 'sms')),
subject TEXT, -- For email
body TEXT, -- Dynamic content with placeholders
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Example row:
{
"template_id": "order_shipped",
"subject": "Your Order is on the Way!",
"body": "Hi {{username}}, your order #{{order_id}} will be delivered by {{delivery_date}}."
}
Why SQL?
✅ Strong consistency required
✅ Rare writes, frequent reads
4. Notification Status Updates (Real-Time Tracking) (Cassandra/DynamoDB)
🔹 Stores status updates of sent notifications.
Table: notification_status
(NoSQL – Cassandra/DynamoDB)
CREATE TABLE notification_status (
notification_id UUID PRIMARY KEY,
user_id UUID,
channel TEXT,
status TEXT, -- queued, sent, failed, delivered
provider_response TEXT, -- Raw provider response (e.g., Firebase/APNs) serialized as JSON
last_updated TIMESTAMP
);
✅ Optimized for fast lookups & real-time updates
5. Notification Retry & Dead Letter Queue (Kafka/RabbitMQ)
🔹 Stores failed notifications for retry processing.
Dead Letter Queue (DLQ) Schema (Kafka/RabbitMQ)
{
"notification_id": "abc123",
"user_id": "123e4567-e89b-12d3-a456-426614174000",
"channel": "email",
"retry_count": 2,
"last_attempt": "2025-02-16T10:05:00Z",
"error_message": "SMTP Timeout"
}
✅ Retried with exponential backoff
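A sketch of the retry decision for a DLQ record, assuming a hypothetical `next_action` function and illustrative limits (5 retries, 1s base, 60s cap). It uses exponential backoff with "full jitter": the delay is drawn uniformly from [0, min(cap, base × 2^retry_count)], which spreads retries out so failed messages do not all hit the provider again at the same moment.

```python
import random
from typing import Optional


def next_action(
    retry_count: int,
    max_retries: int = 5,
    base_seconds: float = 1.0,
    cap_seconds: float = 60.0,
    rng: Optional[random.Random] = None,
) -> tuple[str, float]:
    """Decide what to do with a failed notification, given retry_count from the DLQ record."""
    if retry_count >= max_retries:
        # Give up automatic retries; park for inspection or POST /notifications/retry.
        return ("dead_letter", 0.0)
    rng = rng or random.Random()
    ceiling = min(cap_seconds, base_seconds * 2 ** retry_count)  # 1s, 2s, 4s, ... capped
    return ("retry", rng.uniform(0, ceiling))


action, delay = next_action(retry_count=2, rng=random.Random(7))
print(action, round(delay, 2))
```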
6. Analytics & Logging (Elasticsearch/OpenSearch)
🔹 For searching, tracking, and analytics
Index: notification_logs
(Elasticsearch/OpenSearch)
{
"notification_id": "abc123",
"user_id": "123e4567-e89b-12d3-a456-426614174000",
"channel": "push",
"status": "delivered",
"sent_at": "2025-02-16T10:01:23Z",
"clicked_at": "2025-02-16T10:03:00Z"
}
✅ Efficient analytics on delivery rates, open rates, etc.
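The analytics endpoint is essentially an aggregation over this index. The arithmetic is sketched below in Python over an in-memory list of documents shaped like the example above, using a hypothetical `summarize` function; in practice the counting would be pushed down into an Elasticsearch/OpenSearch aggregation over the requested date range. Open rate is left out because the sample document has no open event field, and the click-rate denominator (delivered notifications) is an assumption.

```python
from typing import Iterable


def summarize(logs: Iterable[dict]) -> dict[str, float]:
    """Compute delivery and click-through rates from notification_logs documents."""
    sent = delivered = clicked = 0
    for doc in logs:
        sent += 1
        if doc.get("status") == "delivered":
            delivered += 1
        if doc.get("clicked_at"):
            clicked += 1
    return {
        "total_sent": sent,
        "total_delivered": delivered,
        "delivery_rate": delivered / sent if sent else 0.0,
        # Assumption: click rate is measured against delivered notifications.
        "click_rate": clicked / delivered if delivered else 0.0,
    }


logs = [
    {"notification_id": "abc123", "status": "delivered", "clicked_at": "2025-02-16T10:03:00Z"},
    {"notification_id": "def456", "status": "delivered"},
    {"notification_id": "ghi789", "status": "failed"},
    {"notification_id": "jkl012", "status": "delivered"},
]
print(summarize(logs))
```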
HLD