HLD: Design a DB migration platform for a company like TicketMaster transitioning to a NoSQL-based datastore (asked in a MongoDB interview)
If you’re looking for paid 1:1 mentorship with a strong focus on LLD (core emphasis on Multithreading), HLD, DSA, and system design research papers, feel free to reach out.
📩 Contact: programmingappliedai@gmail.com
Functional Requirements (FR)
Support bulk migration of historical data from RDBMS to NoSQL
Capture and replicate real-time data changes (CDC) from source to target
Transform relational schema into NoSQL-compatible data models
Enable dual writes to both databases during migration phase
Ensure seamless incremental sync between bulk load and live updates
Provide data validation mechanisms (counts, checksums, record-level diff)
Support controlled traffic cutover from old DB to new datastore
Allow rollback to source database in case of failures
Handle write conflicts and ensure consistency during dual-write phase
Provide monitoring, alerting, and operational controls (pause/resume/retry)
Non-Functional Requirements (NFR)
High Availability – Migration should not cause downtime; ticket booking and browsing must remain continuously available
Strong Consistency (Critical Flows) – Ensure strict consistency for operations like seat booking to prevent double bookings
Scalability – System should handle large-scale data (TB–PB) and high throughput during peak traffic events
Fault Tolerance – System should recover from failures using retries, idempotent operations, and checkpointing
Low Latency Impact – Migration (especially dual writes and CDC) should not significantly increase application latency
Observability – Provide detailed metrics, logging, and alerting for migration progress, lag, errors, and system health
APIs
1. Start Bulk Migration
POST /migration/start
Request:
{
  "migrationId": "mig_123",
  "sourceDB": "mysql_cluster_A",
  "targetDB": "cassandra_cluster_B",
  "tables": ["events", "tickets", "bookings"],
  "batchSize": 10000
}
Response:
{
  "status": "STARTED",
  "migrationId": "mig_123"
}
2. Get Migration Status
GET /migration/{migrationId}/status
Response:
{
  "migrationId": "mig_123",
  "status": "IN_PROGRESS",
  "progress": "65%",
  "lagInMs": 1200,
  "errors": 10
}
3. Pause Migration
POST /migration/{migrationId}/pause
Response:
{
  "status": "PAUSED"
}
4. Resume Migration
POST /migration/{migrationId}/resume
Response:
{
  "status": "RESUMED"
}
5. Start CDC (Real-Time Sync)
POST /migration/{migrationId}/cdc/start
Request:
{
  "source": "binlog",
  "offset": "log_456"
}
Response:
{
  "status": "CDC_STARTED"
}
6. Get CDC Lag
GET /migration/{migrationId}/cdc/lag
Response:
{
  "lagInMs": 800,
  "lastSyncedTimestamp": "2026-04-10T10:00:00Z"
}
7. Validate Data
POST /migration/{migrationId}/validate
Request:
{
  "type": "checksum",
  "tables": ["bookings"]
}
Response:
{
  "status": "VALIDATION_STARTED"
}
8. Get Validation Report
GET /migration/{migrationId}/validation/report
Response:
{
  "status": "COMPLETED",
  "mismatchedRecords": 25,
  "details": "checksum mismatch in bookings"
}
9. Trigger Cutover
POST /migration/{migrationId}/cutover
Request:
{
  "mode": "phased",
  "trafficPercentage": 50
}
Response:
{
  "status": "CUTOVER_IN_PROGRESS"
}
10. Rollback Migration
POST /migration/{migrationId}/rollback
Response:
{
  "status": "ROLLED_BACK"
}
11. Dual Write Toggle
POST /migration/{migrationId}/dual-write
Request:
{
  "enabled": true
}
Response:
{
  "status": "DUAL_WRITE_ENABLED"
}
Databases and Schema
Databases Used in Migration Platform
1. Source Database (RDBMS)
Example: MySQL / PostgreSQL
Purpose: Existing transactional system (Ticket booking, inventory)
2. Target Database (NoSQL)
Example: Cassandra / DynamoDB
Purpose: Scalable, high-throughput datastore post-migration
3. Migration Metadata DB
Example: PostgreSQL / MySQL
Purpose: Track migration jobs, states, checkpoints
4. Validation / Audit Store
Example: S3 / Blob store / Cassandra
Purpose: Store validation reports, mismatches
5. Cache (Optional)
Example: Redis
Purpose: Speed up reads during dual-read phase
Schema Design
1. Source DB Schema (Relational - Simplified)
Events Table
Events(
event_id BIGINT PRIMARY KEY,
name VARCHAR,
venue VARCHAR,
date TIMESTAMP
)
Tickets Table
Tickets(
ticket_id BIGINT PRIMARY KEY,
event_id BIGINT,
seat_number VARCHAR,
price DECIMAL,
status VARCHAR, -- AVAILABLE / BOOKED
FOREIGN KEY (event_id) REFERENCES Events(event_id)
)
Bookings Table
Bookings(
booking_id BIGINT PRIMARY KEY,
user_id BIGINT,
ticket_id BIGINT,
status VARCHAR, -- CONFIRMED / CANCELLED
created_at TIMESTAMP
)
👉 Highly normalized → joins required
2. Target DB Schema (NoSQL - Cassandra Style)
Design Principle
Query-driven
Denormalized
Avoid joins
Table: Event_Tickets (Primary Read Path)
Event_Tickets(
event_id,
seat_number,
ticket_id,
price,
status,
user_id,
booking_status,
PRIMARY KEY (event_id, seat_number)
)
👉 Fetch all seats for an event quickly
👉 Handles seat availability + booking in one place
Table: User_Bookings
User_Bookings(
user_id,
booking_id,
event_id,
ticket_id,
status,
created_at,
PRIMARY KEY (user_id, booking_id)
)
👉 Query bookings per user
Table: Event_Metadata
Event_Metadata(
event_id PRIMARY KEY,
name,
venue,
date
)
3. Migration Metadata DB Schema
Migration Jobs
Migration_Jobs(
migration_id VARCHAR PRIMARY KEY,
status VARCHAR, -- STARTED / IN_PROGRESS / COMPLETED
start_time TIMESTAMP,
end_time TIMESTAMP
)
Table Progress Tracking
Migration_Progress(
migration_id,
table_name,
last_processed_pk BIGINT,
status VARCHAR,
PRIMARY KEY (migration_id, table_name)
)
👉 Used for checkpointing + resume
CDC Offsets
CDC_Offsets(
migration_id,
source,
offset VARCHAR,
updated_at TIMESTAMP,
PRIMARY KEY (migration_id, source)
)
👉 Tracks binlog/Kafka offset
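A minimal sketch of how the CDC_Offsets checkpoint could behave, using Python's built-in sqlite3 as a stand-in for the metadata DB (the real platform would use PostgreSQL/MySQL; the function names here are illustrative, not a real driver API):

```python
import sqlite3

# In-memory stand-in for the migration metadata DB.
db = sqlite3.connect(":memory:")
db.execute('''
    CREATE TABLE CDC_Offsets (
        migration_id TEXT,
        source       TEXT,
        "offset"     TEXT,   -- quoted: OFFSET is an SQL keyword
        updated_at   TEXT,
        PRIMARY KEY (migration_id, source)
    )
''')

def save_offset(migration_id, source, offset, ts):
    # Upsert keyed by (migration_id, source): replaying the same
    # checkpoint write is harmless (idempotent).
    db.execute(
        'INSERT INTO CDC_Offsets VALUES (?, ?, ?, ?) '
        'ON CONFLICT(migration_id, source) DO UPDATE '
        'SET "offset" = excluded."offset", updated_at = excluded.updated_at',
        (migration_id, source, offset, ts),
    )

def load_offset(migration_id, source):
    row = db.execute(
        'SELECT "offset" FROM CDC_Offsets '
        'WHERE migration_id = ? AND source = ?',
        (migration_id, source),
    ).fetchone()
    return row[0] if row else None

save_offset("mig_123", "binlog", "log_456", "2026-04-10T10:00:00Z")
save_offset("mig_123", "binlog", "log_789", "2026-04-10T10:05:00Z")  # newer checkpoint wins
print(load_offset("mig_123", "binlog"))  # → log_789
```

On restart, the Stream Processor loads this offset and resumes reading the binlog/Kafka from there, which is what makes the pipeline recoverable.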
4. Validation / Audit Schema
Validation Results
Validation_Results(
migration_id,
table_name,
total_records_source BIGINT,
total_records_target BIGINT,
mismatch_count BIGINT,
status VARCHAR,
PRIMARY KEY (migration_id, table_name)
)
Mismatch Records (Optional)
Mismatch_Records(
migration_id,
table_name,
record_id,
source_value TEXT,
target_value TEXT
)
Key Design Insights (Interview Gold)
Denormalization in NoSQL → avoids joins, improves latency
Multiple tables per query pattern → common in Cassandra
Checkpointing via Migration_Progress → ensures fault tolerance
CDC offsets → critical for exactly-once / no data loss
Event_Tickets table → solves seat consistency + fast lookup
Microservices in DB Migration Platform
1. Migration Orchestrator Service
Responsibility:
Central controller of migration lifecycle
Starts/stops jobs, manages phases (bulk → CDC → cutover)
Key tasks:
Trigger bulk migration
Start/stop CDC
Initiate cutover / rollback
Maintain state machine
2. Bulk Migration Service
Responsibility:
Migrate historical data from RDBMS → NoSQL
Key tasks:
Read data in batches
Transform schema (normalize → denormalize)
Write to NoSQL
Update checkpoints
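The bulk path above (read in batches → transform → write → checkpoint) can be sketched as a keyset-paginated loop. All the callables here are hypothetical stand-ins for real DB drivers, wired up with in-memory stubs to show the flow:

```python
def run_bulk_migration(read_batch, transform, write_batch, save_checkpoint,
                       start_pk=0, batch_size=10000):
    """Keyset-paginated copy loop: read the batch after the last seen
    primary key, transform each row, write it out, then checkpoint."""
    last_pk = start_pk
    while True:
        rows = read_batch(after_pk=last_pk, limit=batch_size)
        if not rows:
            break                       # source exhausted
        write_batch([transform(r) for r in rows])
        last_pk = rows[-1]["pk"]        # rows must be ordered by pk
        save_checkpoint(last_pk)        # resume point after crash/restart
    return last_pk

# In-memory stubs standing in for RDBMS, NoSQL, and Migration_Progress.
source = [{"pk": i, "name": f"event_{i}"} for i in range(1, 6)]
target, checkpoints = [], []
final = run_bulk_migration(
    read_batch=lambda after_pk, limit: [r for r in source if r["pk"] > after_pk][:limit],
    transform=lambda r: {**r, "migrated": True},
    write_batch=target.extend,
    save_checkpoint=checkpoints.append,
    batch_size=2,
)
print(final, len(target))  # → 5 5
```

Restarting with `start_pk` set to the last saved checkpoint re-enters the loop exactly where it stopped, which is the fault-tolerance property the Migration_Progress table exists for.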
3. CDC (Change Data Capture) Service
Responsibility:
Capture real-time changes from source DB
Key tasks:
Read binlogs (Debezium/Kafka)
Convert events → NoSQL format
Push to stream (Kafka)
4. Stream Processing / Replication Service
Responsibility:
Apply CDC events to NoSQL
Key tasks:
Consume Kafka topics
Ensure ordering (per key)
Idempotent writes
Conflict resolution
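The idempotent-write and conflict-resolution tasks can be illustrated with a small apply function: dedupe by event id (handles at-least-once delivery from Kafka) and last-write-wins by source timestamp (handles out-of-order or replayed changes). The event shape is an assumption for the sketch, not a Debezium format:

```python
# Target store keyed by record key; each value remembers the source
# timestamp of the write that produced it.
target = {}
applied_event_ids = set()   # dedupe set for at-least-once delivery

def apply_event(event):
    """Idempotent, last-write-wins apply of one CDC event.

    event = {"event_id": ..., "key": ..., "value": ..., "source_ts": ...}
    """
    if event["event_id"] in applied_event_ids:
        return False                    # duplicate delivery: skip
    current = target.get(event["key"])
    if current is None or event["source_ts"] >= current["source_ts"]:
        target[event["key"]] = {"value": event["value"],
                                "source_ts": event["source_ts"]}
    applied_event_ids.add(event["event_id"])
    return True

apply_event({"event_id": 1, "key": "t1", "value": "BOOKED", "source_ts": 10})
apply_event({"event_id": 2, "key": "t1", "value": "AVAILABLE", "source_ts": 5})  # stale: ignored
apply_event({"event_id": 1, "key": "t1", "value": "BOOKED", "source_ts": 10})    # replay: skipped
print(target["t1"]["value"])  # → BOOKED
```

Per-key ordering still matters: Kafka guarantees order only within a partition, which is why events are partitioned by `event_id`/`ticket_id` (see the key design insights later).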
5. Schema Transformation Service
Responsibility:
Convert relational schema → NoSQL model
Key tasks:
Flatten joins
Build denormalized records
Maintain mapping rules
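A sketch of one mapping rule: flattening a Tickets row (plus its Bookings row, if any) into the denormalized Event_Tickets shape from the schema section. The function itself is illustrative; the column names follow the schemas above:

```python
def to_event_ticket(ticket, booking=None):
    """Flatten one Tickets row and its optional Bookings row into the
    denormalized Event_Tickets record keyed by (event_id, seat_number)."""
    return {
        "event_id":       ticket["event_id"],
        "seat_number":    ticket["seat_number"],
        "ticket_id":      ticket["ticket_id"],
        "price":          ticket["price"],
        "status":         ticket["status"],
        "user_id":        booking["user_id"] if booking else None,
        "booking_status": booking["status"] if booking else None,
    }

ticket  = {"ticket_id": 7, "event_id": 1, "seat_number": "A12",
           "price": 99.0, "status": "BOOKED"}
booking = {"booking_id": 42, "user_id": 5, "ticket_id": 7, "status": "CONFIRMED"}
row = to_event_ticket(ticket, booking)
print(row["user_id"], row["booking_status"])  # → 5 CONFIRMED
```

The join between Tickets and Bookings happens once here, at migration time, so the read path in Cassandra never has to do it.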
6. Dual Write Proxy Service
Responsibility:
Enable application to write to both DBs
Key tasks:
Intercept write requests
Write to RDBMS + NoSQL
Handle partial failures (retry / compensate)
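One reasonable partial-failure policy (an assumption of this sketch, not the only option): treat the RDBMS as the source of truth during migration, so its write must succeed, while a failed NoSQL write is queued for async retry rather than failing the request. CDC will also eventually repair the gap:

```python
def dual_write(record, write_rdbms, write_nosql, retry_queue):
    """Source-of-truth-first dual write: the RDBMS write must succeed;
    a failed NoSQL write is queued for compensation instead of
    failing the user's request."""
    write_rdbms(record)              # authoritative write: errors propagate
    try:
        write_nosql(record)
    except Exception:
        retry_queue.append(record)   # retry/compensate asynchronously

rdbms, nosql, retries = [], [], []

def flaky_nosql(rec):
    raise TimeoutError("nosql unavailable")

dual_write({"ticket_id": 7}, rdbms.append, flaky_nosql, retries)   # NoSQL down
dual_write({"ticket_id": 8}, rdbms.append, nosql.append, retries)  # both healthy
print(len(rdbms), len(nosql), len(retries))  # → 2 1 1
```

The retried writes must be idempotent (same record key, upsert semantics), otherwise replaying the queue would duplicate data.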
7. Validation Service
Responsibility:
Ensure data correctness
Key tasks:
Row count validation
Checksums
Record-level diff
Generate reports
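The checksum task can be sketched as an order-independent table digest: hash each row's canonical JSON in primary-key order, then compare source vs. target digests. Row shapes follow the Bookings schema above:

```python
import hashlib
import json

def table_checksum(rows, key):
    """Order-independent checksum: canonicalize each row as sorted-key
    JSON, iterate in primary-key order, and hash the concatenation."""
    digest = hashlib.sha256()
    for row in sorted(rows, key=lambda r: r[key]):
        digest.update(json.dumps(row, sort_keys=True).encode())
    return digest.hexdigest()

source = [{"booking_id": 1, "status": "CONFIRMED"},
          {"booking_id": 2, "status": "CANCELLED"}]
# Same rows, different physical order — checksums must still match.
target = [{"booking_id": 2, "status": "CANCELLED"},
          {"booking_id": 1, "status": "CONFIRMED"}]
print(table_checksum(source, "booking_id") ==
      table_checksum(target, "booking_id"))  # → True
```

In practice this runs per chunk (e.g. per primary-key range) so a mismatch narrows down to a small range, which then feeds the record-level diff.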
8. Cutover Service
Responsibility:
Switch traffic from RDBMS → NoSQL
Key tasks:
Gradual traffic shift (0% → 100%)
Enable shadow reads
Monitor errors
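The gradual traffic shift is commonly done with deterministic hash-based bucketing (an assumption here, the doc doesn't prescribe a mechanism): hash a stable key into [0, 100) and compare against the rollout percentage, so a given user sticks to one backend as the percentage ramps up:

```python
import hashlib

def route_to_nosql(user_id, traffic_percentage):
    """Deterministic phased routing: the same user always lands in the
    same bucket, so raising traffic_percentage only moves new buckets
    over — no user flaps between backends."""
    bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
    return bucket < traffic_percentage

# 0% sends everyone to the RDBMS, 100% sends everyone to NoSQL.
assert not any(route_to_nosql(u, 0) for u in range(1000))
assert all(route_to_nosql(u, 100) for u in range(1000))

# At 50%, roughly half the users hit NoSQL.
share = sum(route_to_nosql(u, 50) for u in range(1000)) / 1000
print(share)
```

Stickiness is what makes shadow reads meaningful: a user's reads during the ramp consistently exercise one backend, so error-rate comparisons are apples to apples.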
9. Rollback Service
Responsibility:
Revert to old system if needed
Key tasks:
Redirect traffic back
Disable dual writes
Ensure consistency
10. Monitoring & Observability Service
Responsibility:
System visibility
Key tasks:
Track lag (CDC)
Migration progress
Error rates
Alerts
Interaction Flow (End-to-End)
Phase 1: Bulk Migration
Orchestrator → triggers Bulk Migration Service
Bulk Service:
Reads from RDBMS
Calls Schema Transformer
Writes to NoSQL
Updates checkpoints in Metadata DB
Phase 2: Real-Time Sync (CDC)
CDC Service:
Reads binlogs → pushes to Kafka
Stream Processor:
Consumes events
Transforms + writes to NoSQL
Offsets stored for recovery
Phase 3: Dual Writes
Application → Dual Write Proxy
Proxy:
Writes to RDBMS
Writes to NoSQL
Handles failures with retries / idempotency
Phase 4: Validation
Validation Service:
Compares source vs target
Generates reports
Orchestrator decides readiness
Phase 5: Cutover
Cutover Service:
Gradually shifts reads to NoSQL
Then writes fully to NoSQL
Shadow reads verify correctness
Phase 6: Rollback (if needed)
Rollback Service:
Redirects traffic back to RDBMS
Stops CDC / dual writes
High-Level Interaction Diagram (Textual)
Client
↓
Dual Write Proxy
↓ ↘
RDBMS NoSQL
RDBMS → CDC Service → Kafka → Stream Processor → NoSQL
Bulk Migration Service → NoSQL
Validation Service ↔ RDBMS & NoSQL
Orchestrator → Controls all services
Key Design Insights (What interviewer looks for)
Decoupling via Kafka → avoids tight coupling between CDC and writes
Idempotent consumers → ensures no duplicate writes
Partitioning by event_id / ticket_id → preserves ordering
Dual write + CDC overlap → ensures zero data loss
Shadow reads before cutover → reduces risk
To go a level deeper (staff-level discussion), be ready to cover:
Exactly-once semantics in the CDC pipeline
Seat-booking consistency in Cassandra (LWT vs. Redis locks)
Failure scenarios (Kafka lag, partial writes, replay strategies)
Here’s a clean, interview-ready flow diagram you can draw on a whiteboard 👇
DB Migration Platform – Flow Diagram
+----------------------+
| Migration |
| Orchestrator |
+----------+-----------+
|
-----------------------------------------------
| | |
↓ ↓ ↓
+----------------+ +----------------+ +----------------+
| Bulk Migration | | CDC Service | | Validation |
| Service | | (Binlog Reader)| | Service |
+--------+-------+ +--------+-------+ +--------+-------+
| | |
↓ ↓ |
+------------+ +---------------+ |
| RDBMS |---->| Kafka |------------+
+------------+ +-------+-------+
|
↓
+----------------------+
| Stream Processor |
| (Replication Layer) |
+----------+-----------+
|
↓
+--------------+
| NoSQL DB |
+--------------+
------------------------------------------------------------
(Application Traffic Flow)
+------------------+
Client -----------> | Dual Write Proxy |
+--------+---------+
|
-------------------------
| |
↓ ↓
+---------+ +------------+
| RDBMS | | NoSQL DB |
+---------+ +------------+
------------------------------------------------------------
(Cutover Phase)
Client -------------------------------------> NoSQL DB
(RDBMS removed gradually)
How to Explain This in Interview (Quick Script)
Top layer → Orchestrator controls everything
Left path → Bulk migration moves historical data
Middle path → CDC + Kafka handles real-time updates
Right path → Validation ensures correctness
Bottom → Dual writes keep both DBs in sync
Final step → Cutover shifts traffic fully to NoSQL

