HLD: Design Notification System
Let's start with the functional requirements (FR).
Functional Requirements
The system should send notifications to users through multiple channels such as Email, SMS, Push, and In-App.
The system should support real-time as well as scheduled (delayed) notifications.
The system should allow applications to trigger notifications via APIs or events.
The system should personalize notifications using templates and user-specific data.
The system should support user preferences for notification channels, frequency, and opt-in/opt-out.
The system should ensure reliable delivery with retries and fallback channels on failure.
The system should handle high throughput and bulk notifications efficiently.
The system should provide delivery status tracking (sent, delivered, failed, read).
The system should support notification prioritization and ordering where required.
The system should expose APIs and dashboards for managing templates, campaigns, and notification history.
Non-Functional Requirements (NFR)
The system should be highly available and fault-tolerant with no single point of failure.
The system should scale horizontally to handle millions of notifications per day.
The system should ensure low latency for real-time notifications.
The system should guarantee message durability and prevent notification loss.
The system should ensure security, authentication, and authorization for all APIs.
The system should be observable with monitoring, logging, and alerting for failures and delays.
API Design
1. Send Notification
POST /api/v1/notifications/send
Request
{
  "userId": "u123",
  "channel": ["EMAIL", "PUSH"],
  "templateId": "order_success",
  "data": { "orderId": "o789", "amount": 499 },
  "priority": "HIGH"
}
Response
{
  "notificationId": "n456",
  "status": "QUEUED"
}
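Before a notification is queued, the API service has to reject malformed requests. The following is a minimal validation sketch for the request body above. The dataclass, the function name, the `IN_APP` channel value (taken from the functional requirements), and the `MEDIUM`/`LOW` priority values are assumptions, since the API example only shows `HIGH`.

```python
from dataclasses import dataclass, field

# Assumed enumerations: the API example only shows EMAIL/SMS/PUSH and "HIGH".
ALLOWED_CHANNELS = {"EMAIL", "SMS", "PUSH", "IN_APP"}
ALLOWED_PRIORITIES = {"HIGH", "MEDIUM", "LOW"}


@dataclass
class SendRequest:
    user_id: str
    channels: list
    template_id: str
    data: dict = field(default_factory=dict)
    priority: str = "MEDIUM"  # assumed default when the client omits it


def validate_send_request(body: dict) -> SendRequest:
    """Parse a /notifications/send body; raise ValueError on bad input."""
    for key in ("userId", "channel", "templateId"):
        if key not in body:
            raise ValueError(f"missing field: {key}")
    channels = body["channel"]
    if not isinstance(channels, list) or not channels:
        raise ValueError("channel must be a non-empty list")
    unknown = set(channels) - ALLOWED_CHANNELS
    if unknown:
        raise ValueError(f"unsupported channel(s): {sorted(unknown)}")
    priority = body.get("priority", "MEDIUM")
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"unsupported priority: {priority}")
    return SendRequest(
        user_id=body["userId"],
        channels=list(dict.fromkeys(channels)),  # de-duplicate, keep order
        template_id=body["templateId"],
        data=body.get("data", {}),
        priority=priority,
    )
```

Running it on the example request yields a `SendRequest` with channels `["EMAIL", "PUSH"]` and priority `"HIGH"`; a body with `"channel": ["FAX"]` raises `ValueError`.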
2. Send Bulk Notifications
POST /api/v1/notifications/bulk
Request
{
  "userIds": ["u1", "u2", "u3"],
  "channel": ["SMS"],
  "templateId": "promo_offer",
  "data": { "discount": "20%" }
}
Response
{
  "batchId": "b101",
  "status": "ACCEPTED"
}
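Because the bulk endpoint returns `ACCEPTED` immediately, the user list is presumably processed asynchronously. One common approach, sketched here under assumptions, is to split `userIds` into fixed-size chunks so that several workers can process a batch in parallel. `CHUNK_SIZE`, `chunked`, `fan_out`, and the job shape are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical chunk size; a real value depends on provider limits and queue throughput.
CHUNK_SIZE = 500


def chunked(user_ids: Iterable[str], size: int = CHUNK_SIZE) -> Iterator[List[str]]:
    """Yield successive chunks of user IDs; the last chunk may be smaller."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(user_ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fan_out(batch_id: str, user_ids: Iterable[str], size: int = CHUNK_SIZE) -> List[dict]:
    """Turn one bulk request into per-chunk jobs that workers can process independently."""
    return [
        {"batchId": batch_id, "chunkIndex": i, "userIds": chunk}
        for i, chunk in enumerate(chunked(user_ids, size))
    ]
```

Keeping `batchId` on every job lets the status service aggregate per-chunk progress back into one batch.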
3. Schedule Notification
POST /api/v1/notifications/schedule
Request
{
  "userId": "u123",
  "channel": ["EMAIL"],
  "templateId": "reminder",
  "scheduleAt": "2026-01-15T10:00:00Z",
  "data": { "event": "Meeting" }
}
Response
{
  "notificationId": "n789",
  "status": "SCHEDULED"
}
4. Get Notification Status
GET /api/v1/notifications/{notificationId}
Response
{
  "notificationId": "n456",
  "status": "DELIVERED",
  "deliveredAt": "2026-01-13T11:30:00Z"
}
5. Manage Notification Templates
POST /api/v1/templates
Request
{
  "templateId": "order_success",
  "channel": "EMAIL",
  "content": "Your order {{orderId}} of amount {{amount}} is successful"
}
Response
{
  "status": "CREATED"
}
6. Update User Notification Preferences
PUT /api/v1/users/{userId}/preferences
Request
{
  "channels": {
    "EMAIL": true,
    "SMS": false,
    "PUSH": true
  }
}
Response
{
  "status": "UPDATED"
}
7. Get User Notification Preferences
GET /api/v1/users/{userId}/preferences
Response
{
  "channels": {
    "EMAIL": true,
    "SMS": false,
    "PUSH": true
  }
}
8. Notification History
GET /api/v1/users/{userId}/notifications
Response
{
  "notifications": [
    {
      "notificationId": "n456",
      "channel": "EMAIL",
      "status": "DELIVERED"
    }
  ]
}
Databases to be Used
Relational DB (PostgreSQL / MySQL)
→ For users, templates, preferences, notification metadata, scheduling.
NoSQL / Wide-column DB (Cassandra / DynamoDB)
→ For high-volume notification events, delivery status, history.
In-Memory Store (Redis)
→ For retries, rate limiting, idempotency keys, delayed queues.
Database Schema
1. users (RDBMS)
CREATE TABLE users (
  user_id     VARCHAR(64) PRIMARY KEY,
  email       VARCHAR(255),
  phone       VARCHAR(20),
  push_token  VARCHAR(255),
  created_at  TIMESTAMP
);
2. user_notification_preferences (RDBMS)
CREATE TABLE user_notification_preferences (
  user_id     VARCHAR(64),
  channel     VARCHAR(20),  -- EMAIL, SMS, PUSH
  is_enabled  BOOLEAN,
  PRIMARY KEY (user_id, channel)
);
3. notification_templates (RDBMS)
CREATE TABLE notification_templates (
  template_id  VARCHAR(64) PRIMARY KEY,
  channel      VARCHAR(20),
  title        VARCHAR(255),
  body         TEXT,
  created_at   TIMESTAMP
);
4. notifications (RDBMS – Metadata)
CREATE TABLE notifications (
  notification_id  VARCHAR(64) PRIMARY KEY,
  user_id          VARCHAR(64),
  template_id      VARCHAR(64),
  priority         VARCHAR(20),
  scheduled_at     TIMESTAMP,
  created_at       TIMESTAMP,
  status           VARCHAR(20)  -- SCHEDULED, QUEUED, SENT, FAILED
);
5. notification_events (NoSQL – High Write)
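CREATE TABLE notification_events (
  notification_id  TEXT,
  channel          TEXT,
  status           TEXT,  -- SENT, DELIVERED, FAILED, READ
  failure_reason   TEXT,
  event_time       TIMESTAMP,
  -- channel is part of the clustering key so that events for different channels
  -- of the same notification with the same timestamp do not overwrite each other
  PRIMARY KEY ((notification_id), event_time, channel)
);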
Why NoSQL?
• Millions of writes/day
• Append-only events
• Fast reads for history & analytics
6. scheduled_notifications (RDBMS / Redis)
CREATE TABLE scheduled_notifications (
  notification_id  VARCHAR(64) PRIMARY KEY,
  execute_at       TIMESTAMP,
  status           VARCHAR(20)
);
7. retry_queue (Redis)
KEY: retry:{notification_id}
VALUE: retry_count, next_retry_time
TTL: based on retry policy
Indexing Strategy
CREATE INDEX idx_notifications_user ON notifications(user_id);
CREATE INDEX idx_notifications_status ON notifications(status);
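-- notification_events needs no secondary index on event_time: it is already a clustering
-- column, so each notification's events are stored sorted by time within its partition.
-- (High-cardinality secondary indexes are generally discouraged in Cassandra.)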
Why This Design Works (Interview Angle)
• RDBMS ensures consistency for templates & preferences
• NoSQL handles high-throughput event tracking
• Redis provides low-latency storage for retry state, rate limits, idempotency keys, and delayed queues
• Clean separation between metadata and delivery events
Next, let's break down the microservices of the notification system and how they interact.
Microservices
1. Notification API Service
Responsibility
Exposes REST APIs to clients
Validates requests and enforces idempotency
Fetches user preferences
Persists notification metadata
Tech
Spring Boot / Go
PostgreSQL
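The API service's idempotency check can be sketched with the Redis idempotency keys listed under the databases. In production this would typically be one atomic Redis command, `SET key value NX EX ttl` (in redis-py, `r.set(key, value, nx=True, ex=ttl)`); to stay runnable, the sketch uses an in-memory stand-in. The client-supplied idempotency key, the `idem:` prefix, and the 24-hour TTL are assumptions, not part of the API shown above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryIdempotencyStore:
    """Stand-in for Redis SET key value NX EX ttl (illustration only, single process)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self._clock = clock

    def set_if_absent(self, key: str, value: str, ttl_seconds: float) -> bool:
        """Store value only if key is missing or expired; return True if stored."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False  # a live key means this request was already accepted
        self._data[key] = (value, now + ttl_seconds)
        return True

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]


def accept_once(store: InMemoryIdempotencyStore, idempotency_key: str,
                new_notification_id: str,
                ttl_seconds: float = 24 * 3600) -> Tuple[Optional[str], bool]:
    """Return (notification_id, created).

    The first request with a key claims it; a retried request with the same key
    gets the original notification_id back instead of creating a duplicate.
    """
    redis_key = f"idem:{idempotency_key}"  # hypothetical key prefix
    if store.set_if_absent(redis_key, new_notification_id, ttl_seconds):
        return new_notification_id, True
    return store.get(redis_key), False
```

Because the claim and the write are one atomic step, two concurrent retries of the same request cannot both create a notification.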
2. Template Service
Responsibility
Manage notification templates
Template rendering with dynamic data
Channel-specific formatting
Tech
Spring Boot
PostgreSQL
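Template rendering can be illustrated with the `{{orderId}}`-style placeholders from the template API. This is a hypothetical minimal renderer, not a specific template engine; the `{{...}}` syntax resembles Mustache, which a real service might use instead. It deliberately fails on a missing variable rather than sending a half-filled message.

```python
import re
from typing import Mapping

# Matches {{name}} with optional inner spaces, e.g. {{orderId}} or {{ amount }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, data: Mapping[str, object]) -> str:
    """Substitute {{key}} placeholders with values from data; raise KeyError if one is missing."""

    def substitute(match):
        key = match.group(1)
        if key not in data:
            raise KeyError(f"missing template variable: {key}")
        return str(data[key])

    return _PLACEHOLDER.sub(substitute, template)


message = render(
    "Your order {{orderId}} of amount {{amount}} is successful",
    {"orderId": "o789", "amount": 499},
)
```

For the EMAIL channel, a production renderer would also HTML-escape substituted values; this sketch does not.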
3. Preference Service
Responsibility
Manage user notification preferences
Opt-in / opt-out handling
Channel enablement checks
Tech
Spring Boot
PostgreSQL
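The channel-enablement check amounts to filtering the requested channels against the user's rows in `user_notification_preferences`. A minimal sketch follows; `filter_channels` is a hypothetical name, and what to do for a channel with no preference row (`default_enabled`) is an assumption the design leaves open.

```python
from typing import Dict, Iterable, List, Tuple


def filter_channels(requested: Iterable[str],
                    prefs: Dict[str, bool],
                    default_enabled: bool = True) -> Tuple[List[str], List[str]]:
    """Split requested channels into (allowed, skipped) using per-channel opt-in flags.

    prefs mirrors user_notification_preferences rows (channel -> is_enabled).
    default_enabled decides channels with no row; True is an assumed policy.
    """
    allowed: List[str] = []
    skipped: List[str] = []
    for channel in requested:
        if prefs.get(channel, default_enabled):
            allowed.append(channel)
        else:
            skipped.append(channel)
    return allowed, skipped
```

Returning the skipped channels, rather than silently dropping them, lets the status service record why a channel was not used.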
4. Notification Orchestrator
Responsibility
Decides delivery path per channel
Applies priority & scheduling rules
Publishes messages to queues/topics
Tech
Java / Go
Kafka / SQS
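The orchestrator's routing step can be sketched as turning one accepted notification into one message per channel, addressed to the per-channel topics from the Kafka topic design. The message shape (`channels`, `renderedBody`, `scheduleAt` as a timezone-aware datetime) and the `route` function are hypothetical, and no real Kafka client is used; the sketch only builds the messages a producer would send.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Topic names follow the Kafka topic design in this document.
TOPIC_BY_CHANNEL = {
    "EMAIL": "notification.email",
    "SMS": "notification.sms",
    "PUSH": "notification.push",
}


@dataclass
class OutboundMessage:
    topic: str
    key: str       # partition key; user_id here keeps a user's messages in order
    payload: dict


def route(notification: dict, now: Optional[datetime] = None) -> List[OutboundMessage]:
    """Return per-channel messages, or [] if the notification is scheduled for later."""
    now = now or datetime.now(timezone.utc)
    schedule_at = notification.get("scheduleAt")
    if schedule_at is not None and schedule_at > now:
        return []  # not due yet: the Scheduler Service re-submits it at execute_at
    messages = []
    for channel in notification["channels"]:
        topic = TOPIC_BY_CHANNEL.get(channel)
        if topic is None:
            raise ValueError(f"no topic configured for channel {channel}")
        messages.append(OutboundMessage(
            topic=topic,
            key=notification["userId"],
            payload={
                "notificationId": notification["notificationId"],
                "channel": channel,
                "priority": notification.get("priority", "MEDIUM"),
                "body": notification["renderedBody"],
            },
        ))
    return messages
```

Adding a channel (for example IN_APP, which has no topic in this design) then only requires a new topic entry and a new consumer service.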
5. Scheduler Service
Responsibility
Handles delayed notifications
Triggers notifications at scheduled time
Pushes events to Orchestrator
Tech
Quartz / Redis Sorted Sets
PostgreSQL
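With Redis sorted sets, the scheduler typically stores each notification ID with `execute_at` (epoch seconds) as its score, then periodically fetches members whose score is at or below the current time. The sketch below mimics that with `heapq` so it runs without Redis; the class and method names are hypothetical, and the comments map each method to the Redis commands it imitates.

```python
import heapq
from typing import List, Tuple


class DelayQueue:
    """In-memory stand-in for a Redis sorted set scored by execute_at (epoch seconds).

    add()     ~ ZADD scheduled <execute_at> <notification_id>
    pop_due() ~ ZRANGEBYSCORE scheduled -inf <now>, then ZREM each member to claim it
    Unlike a sorted set, this heap does not de-duplicate a re-added notification_id.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def add(self, notification_id: str, execute_at: float) -> None:
        heapq.heappush(self._heap, (execute_at, notification_id))

    def pop_due(self, now: float, limit: int = 100) -> List[str]:
        """Remove and return up to `limit` IDs whose execute_at <= now, earliest first."""
        due: List[str] = []
        while self._heap and self._heap[0][0] <= now and len(due) < limit:
            _, notification_id = heapq.heappop(self._heap)
            due.append(notification_id)
        return due
```

With several scheduler instances, `ZREM` returns 1 only to the instance that actually removed a member, which can serve as the claim. The `scheduled_notifications` table remains the durable record if Redis loses data.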
6. Channel Delivery Services
(Separate service per channel)
a. Email Service
b. SMS Service
c. Push Service
Responsibility
Consume messages from queue
Integrate with external providers
Handle retries & failures
Tech
Kafka Consumers
External APIs (SES, Twilio, FCM)
7. Retry & Dead Letter Service
Responsibility
Retry failed notifications
Move permanent failures to DLQ
Apply exponential backoff
Tech
Kafka DLQ
Redis
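Exponential backoff means the wait before each retry doubles, up to a cap. A minimal sketch follows; the base delay and cap are hypothetical values, and the optional "full jitter" variant (a uniform random delay between zero and the capped value) is a common refinement that the design does not mention explicitly.

```python
import random
from typing import Optional

# Hypothetical policy values; real values would be tuned per channel and provider.
BASE_DELAY_SECONDS = 2.0
MAX_DELAY_SECONDS = 300.0


def backoff_delay(retry_count: int, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `retry_count` (0-based).

    Without rng: base * 2**retry_count, capped at MAX_DELAY_SECONDS.
    With rng: full jitter, uniform in [0, capped], so that many notifications that
    failed at the same moment do not all retry at the same moment.
    """
    if retry_count < 0:
        raise ValueError("retry_count must be >= 0")
    # Clamp the exponent so very large retry counts cannot overflow a float.
    exponential = BASE_DELAY_SECONDS * (2 ** min(retry_count, 32))
    capped = min(MAX_DELAY_SECONDS, exponential)
    return capped if rng is None else rng.uniform(0.0, capped)
```

Without jitter, `[backoff_delay(n) for n in range(6)]` is `[2.0, 4.0, 8.0, 16.0, 32.0, 64.0]`; the computed time would be stored as `next_retry_time` in the `retry:{notification_id}` Redis entry.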
8. Status & Tracking Service
Responsibility
Track delivery status
Store events (sent, delivered, read)
Expose notification history APIs
Tech
Cassandra / DynamoDB
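Since `notification_events` is append-only, the "current" status returned by the status API has to be derived from the event history. A minimal sketch, assuming the latest event per channel wins and using a hypothetical lifecycle rank only to break ties between events with identical timestamps:

```python
from datetime import datetime
from typing import Dict, Iterable

# Assumed ordering, used only to break ties between events with the same event_time.
STATUS_RANK = {"QUEUED": 0, "SENT": 1, "FAILED": 1, "DELIVERED": 2, "READ": 3}


def latest_status_per_channel(events: Iterable[dict]) -> Dict[str, str]:
    """Fold notification_events rows into the current status of each channel."""
    ordered = sorted(
        events,
        key=lambda e: (e["event_time"], STATUS_RANK.get(e["status"], 0)),
    )
    latest: Dict[str, str] = {}
    for event in ordered:
        latest[event["channel"]] = event["status"]  # later events overwrite earlier ones
    return latest


events = [
    {"channel": "EMAIL", "status": "DELIVERED", "event_time": datetime(2026, 1, 13, 11, 30)},
    {"channel": "EMAIL", "status": "SENT", "event_time": datetime(2026, 1, 13, 11, 29)},
    {"channel": "PUSH", "status": "FAILED", "event_time": datetime(2026, 1, 13, 11, 29)},
]
current = latest_status_per_channel(events)
```

Sorting by `event_time` makes the result independent of arrival order, which matters because provider callbacks (for example a delivery receipt) can arrive after later events.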
Microservice Interaction Flow
Real-Time Notification Flow
1. Client → Notification API Service (send request)
2. API Service → Preference Service (check opt-in)
3. API Service → Template Service (render message)
4. API Service → Notification Orchestrator
5. Orchestrator → Kafka Topic (per channel)
6. Channel Service → External Provider
7. Channel Service → Status Service (delivery event)
8. Status Service → DB (store event)
Scheduled Notification Flow
1. Client → Notification API Service
2. API Service → Scheduler Service
3. Scheduler waits until execute_at
4. Scheduler → Notification Orchestrator
5. Continue with the real-time flow from step 5 (Orchestrator → Kafka Topic)
Kafka Topic Design
notification.email
notification.sms
notification.push
notification.retry
notification.dlq
Partition Key
user_id (keeps each user's notifications in order) OR notification_id (spreads load more evenly across partitions)
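The ordering trade-off follows from how a keyed message is assigned to a partition: a stable hash of the key modulo the partition count, so equal keys always land on the same partition. Kafka's default Java partitioner uses murmur2; the sketch below uses CRC32 only because it is in Python's standard library, and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index with a stable hash (CRC32 here, not murmur2)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every message keyed by the same user_id goes to one partition, so its order is preserved.
p1 = partition_for("u123", 12)
p2 = partition_for("u123", 12)
```

Keying by `notification_id` gives nearly unique keys and therefore an even spread, while keying by `user_id` can create hot partitions for users who receive unusually many notifications.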
Failure Handling Interaction
Channel Service fails delivery
Event pushed to retry topic
Retry Service schedules retry via Redis
Exceeded retries → DLQ
Alert triggered for manual intervention
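The failure-handling steps above amount to a small routing decision each time a delivery fails. A minimal sketch follows: the retry budget is expressed as a hypothetical precomputed delay table, and the `permanent` flag (errors that retrying cannot fix, such as an invalid phone number) is an assumed classification. The topic names are the ones from the Kafka topic design.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical backoff schedule in seconds; its length is the retry budget.
RETRY_DELAYS = (2, 4, 8, 16, 32)


@dataclass
class FailureDecision:
    topic: str                        # where the failed message goes next
    retry_count: int                  # retry count after this decision
    next_retry_time: Optional[float]  # epoch seconds, None when sent to the DLQ
    alert: bool                       # True when someone should intervene manually


def handle_delivery_failure(retry_count: int, now: float,
                            permanent: bool = False) -> FailureDecision:
    """Route a failed delivery to notification.retry or notification.dlq."""
    if permanent or retry_count >= len(RETRY_DELAYS):
        return FailureDecision("notification.dlq", retry_count, None, alert=True)
    return FailureDecision(
        "notification.retry",
        retry_count + 1,
        now + RETRY_DELAYS[retry_count],
        alert=False,
    )
```

Sending permanent errors straight to the DLQ avoids spending the retry budget, and provider quota, on messages that can never succeed.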
Why This Microservice Design Scales Well
• Loose coupling via message queues
• Independent scaling per channel
• Failure isolation per delivery mechanism
• Supports millions of notifications/day
• Easy to add new channels (WhatsApp, Slack)
High-Level Architecture
               +-------------------+
               |   Client / App    |
               +-------------------+
                         |
                         v
         +-------------------------------+
         |   Notification API Service    |
         +-------------------------------+
                         |
       +-----------------+-----------------+
       |                 |                 |
       v                 v                 v
+-------------+   +-------------+   +-------------+
| Preference  |   |  Template   |   |  Scheduler  |
|   Service   |   |   Service   |   |   Service   |
+-------------+   +-------------+   +-------------+
       |                 |                 |
       +-----------------+-----------------+
                         |
                         v
         +-------------------------------+
         |   Notification Orchestrator   |
         +-------------------------------+
                         |
                         v
         +-------------------------------+
         |     Message Queue (Kafka)     |
         +-------------------------------+
                         |
       +-----------------+-----------------+
       |                 |                 |
       v                 v                 v
+-------------+   +-------------+   +-------------+
|    Email    |   |     SMS     |   |    Push     |
|   Service   |   |   Service   |   |   Service   |
+-------------+   +-------------+   +-------------+
       |                 |                 |
       +-----------------+-----------------+
                         |
                         v
         +-------------------------------+
         |      External Providers       |
         |  (SES / Twilio / FCM / APNS)  |
         +-------------------------------+
                         |
                         v
         +-------------------------------+
         |   Status & Tracking Service   |
         +-------------------------------+
                         |
                         v
         +-------------------------------+
         | Events DB (Cassandra/Dynamo)  |
         +-------------------------------+
Failure & Retry Flow (Logical)
      Channel Service Failure
                 |
                 v
       +-------------------+
       |    Retry Queue    |
       |  (Kafka / Redis)  |
       +-------------------+
                 |
                 v
          Retry Attempts → Success → Status Service
                 |
                 v
         Exceeded Retries
                 |
                 v
       +-------------------+
       | Dead Letter Queue |
       +-------------------+