Design Remote File Sync Service
Remote File Sync Service with focus on delta sync, conflict detection, and versioning:
Functional Requirements (FR)
1. Users can upload, download, update, and delete files and folders.
2. Support multiple devices per user.
3. Sync files across a user's devices automatically.
4. Efficient delta sync: transfer only the modified parts of a file instead of the full file.
5. Change detection and versioning: detect changes (modifications, deletions, renames, moves) to files and folders, and maintain a version history for each file.
6. Detect conflicts when two users or devices modify the same file concurrently.
7. Delta sync handling: break large files into blocks/chunks.
Non-Functional Requirements (NFR)
Low Latency Sync: File changes should propagate across devices within a few seconds.
Efficient Delta Transfer: Minimize bandwidth by syncing only changed chunks/blocks.
Resumable Transfers: Handle network interruptions gracefully.
Scalable Versioning: Efficiently store and retrieve multiple file versions without slowing down access.
High Scalability: Handle millions of users and billions of files.
High Availability: ≥ 99.9% uptime.
Eventual consistency across devices (updates should converge).
Encryption:
At rest (AES-256 or similar).
In transit (TLS).
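The Resumable Transfers requirement above can be illustrated with a small sketch. It assumes the server keeps a per-upload record of which chunk indexes have arrived, so that after a network interruption the client asks what is missing and sends only that. The names `UploadSession`, `split_chunks`, and `resume_upload` are hypothetical, and the 4-byte chunk size is only there to keep the example readable.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CHUNK_SIZE = 4  # bytes; tiny on purpose, real chunks would be KB to MB


@dataclass
class UploadSession:
    """Server-side record of which chunks of one upload have arrived."""
    total_chunks: int
    received: dict[int, bytes] = field(default_factory=dict)

    def put_chunk(self, index: int, data: bytes) -> None:
        # Re-sending a chunk simply overwrites it, so client retries are safe.
        self.received[index] = data

    def missing(self) -> list[int]:
        return [i for i in range(self.total_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        if self.missing():
            raise ValueError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.total_chunks))


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def resume_upload(session: UploadSession, chunks: list[bytes]) -> int:
    """Send only the chunks the server does not have yet; return how many were sent."""
    todo = session.missing()
    for index in todo:
        session.put_chunk(index, chunks[index])
    return len(todo)
```

If the connection drops after the first chunk, calling `resume_upload` sends the remaining chunks only, and `assemble()` succeeds once nothing is missing.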
Back-of-the-Envelope Calculation
📝 Assumptions
Users: 100M active users
Average files per user: 1,000 files (docs, images, small binaries)
Average file size: 1 MB
Total data per user: ~1 GB
Delta change rate: 5% of files change per day
Delta size per file change: 100 KB (only part of the file changes, not the full file)
🔹 Storage Estimation
Raw data
100M users × 1 GB/user = 100 PB
Versioning overhead
Assume 10 versions per file, stored with delta compression, adding about 30% of the original size in total
Overhead ≈ 100 PB × 30% = 30 PB
Metadata overhead
Store metadata per file: 1 KB/file × 1000 files × 100M users = 100 TB
👉 Total storage ≈ 130 PB + 100 TB (metadata)
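The storage arithmetic above can be reproduced in a few lines, which makes it easy to vary the assumptions. This is only a sketch: the function name `estimate_storage` and its parameter names are made up for illustration, and decimal units (1 PB = 10^15 bytes) are assumed.

```python
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15  # decimal (SI) units


def estimate_storage(users=100_000_000, files_per_user=1_000,
                     data_per_user=1 * GB, version_overhead_pct=30,
                     metadata_per_file=1 * KB):
    """Return the storage estimate in bytes, using integer arithmetic."""
    raw = users * data_per_user
    versions = raw * version_overhead_pct // 100
    metadata = users * files_per_user * metadata_per_file
    return {"raw": raw, "versions": versions, "metadata": metadata,
            "total": raw + versions + metadata}


estimate = estimate_storage()
for name, size in estimate.items():
    print(f"{name:>8}: {size / PB:.1f} PB")
```

Changing `version_overhead_pct` or `files_per_user` shows quickly which assumption dominates the total; with the numbers above, raw data does.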
🔹 Daily Sync Traffic
Files changed per day
100M users × 1,000 files × 5% = 5B file changes/day
Delta per change
100 KB average delta → 5B × 100 KB = 500 TB/day sync traffic
Peak traffic (assume 10% of users are active at the same time, so about 10% of the daily traffic lands in the peak hour)
500 TB × 10% = ~50 TB/hour ≈ 14 GB/s sustained throughput globally
APIs
For a Remote File Sync Service with delta sync, conflict detection, and versioning, the APIs should cover file operations, sync metadata, versioning, and conflict handling.
Here’s a clean API design (REST-style; gRPC would work too):
📌 APIs for Remote File Sync Service
1. Authentication & User Management
POST /auth/login
→ Authenticate user and issue token (OAuth2/JWT).
POST /auth/register
→ Create new user account.
2. File Operations
POST /files
→ Upload a new file.
{
"path": "/docs/report.docx",
"content": "<binary or chunked upload>",
"metadata": { "checksum": "abc123", "size": 12345 }
}
GET /files/{fileId}
→ Download latest version of a file.
PATCH /files/{fileId}
→ Update file using delta upload.
{
"delta": "<binary patch>",
"baseVersion": "v12"
}
DELETE /files/{fileId}
→ Delete a file (soft delete for recovery).
3. Folder/Listing
GET /folders/{folderId}/files
→ List files & metadata in a folder.
POST /folders
→ Create a new folder.
4. Delta Sync APIs
POST /files/{fileId}/delta/checksum
→ Client sends block checksums so the server can detect which parts changed.
{ "checksums": ["block1hash", "block2hash", ...] }
Response → list of blocks that need to be uploaded.
POST /files/{fileId}/delta/upload
→ Upload only changed blocks.
GET /files/{fileId}/delta/download
→ Get changed blocks since last sync.
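A minimal sketch of what the checksum exchange above might compute, assuming fixed-size blocks and SHA-256 block hashes (the original does not specify either). The function names `block_checksums` and `blocks_to_upload` are hypothetical.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4  # bytes; an assumption for readability, real blocks would be KB to MB


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of the file."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def blocks_to_upload(client_sums: list[str], server_sums: list[str]) -> list[int]:
    """Indexes of client blocks that differ from, or extend past, the server copy."""
    return [i for i, checksum in enumerate(client_sums)
            if i >= len(server_sums) or server_sums[i] != checksum]


old = b"aaaabbbbcccc"
new = b"aaaaXXXXccccdd"
changed = blocks_to_upload(block_checksums(new), block_checksums(old))
print(changed)  # [1, 3]: block 1 was modified, block 3 is new
```

Comparing blocks by position is simple but has a known weakness: inserting a single byte near the start shifts every later block, so all of them look changed. Rolling checksums (as used by rsync) or content-defined chunking are the usual ways around that.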
5. Conflict Detection & Resolution
GET /files/{fileId}/conflicts
→ Get conflicts for a file.
POST /files/{fileId}/conflicts/resolve
{
"resolution": "keep_both", // or "last_writer_wins", "manual"
"preferredVersion": "v14"
}
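How the three resolution values might be applied on the server is sketched below. Several things here are assumptions, not part of the API above: that `preferredVersion` selects the winner for "manual", that "keep_both" keeps the newer version at the original path and renames the older one to a "conflicted copy", and that timestamps are server-assigned. `Version`, `conflicted_copy_path`, and `resolve` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Version:
    version_id: str
    path: str
    device: str
    modified_at: float  # seconds since epoch, assumed to be assigned by the server


def conflicted_copy_path(path: str, device: str) -> str:
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem} (conflicted copy from {device}){p.suffix}"))


def resolve(resolution: str, a: Version, b: Version,
            preferred_version: str | None = None) -> dict[str, str]:
    """Return {path: version_id} describing which versions survive the conflict."""
    newer, older = (a, b) if a.modified_at >= b.modified_at else (b, a)
    if resolution == "last_writer_wins":
        return {newer.path: newer.version_id}
    if resolution == "keep_both":
        return {newer.path: newer.version_id,
                conflicted_copy_path(older.path, older.device): older.version_id}
    if resolution == "manual":
        chosen = {a.version_id: a, b.version_id: b}.get(preferred_version)
        if chosen is None:
            raise ValueError("preferredVersion must name one of the conflicting versions")
        return {chosen.path: chosen.version_id}
    raise ValueError(f"unknown resolution: {resolution}")
```

With "keep_both" nothing is lost, which is why it is a common default; "last_writer_wins" silently discards the older edit and depends on trustworthy timestamps.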
6. Versioning
GET /files/{fileId}/versions
→ List all versions of a file.
GET /files/{fileId}/versions/{versionId}
→ Download specific version.
POST /files/{fileId}/versions/{versionId}/restore
→ Restore file to a previous version.
7. Sync Status & Metadata
GET /sync/status
→ Current sync state for user/device.
POST /sync/heartbeat
→ Device sends heartbeat (active, last synced time).
GET /sync/changes?since={timestamp}
→ Get list of changed files since last sync.
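The pull side of `/sync/changes` can be sketched as a cursor over an append-only change log. Note the swap: this sketch uses a server-assigned sequence number as the cursor instead of the timestamp in the endpoint above, on the assumption that sequence numbers avoid clock-skew and equal-timestamp edge cases. `Change` and `ChangeLog` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int          # server-assigned, strictly increasing
    file_id: int
    change_type: str  # CREATE / UPDATE / DELETE / RENAME
    device_id: int    # device that made the change


class ChangeLog:
    def __init__(self) -> None:
        self._changes: list[Change] = []

    def append(self, file_id: int, change_type: str, device_id: int) -> Change:
        change = Change(len(self._changes) + 1, file_id, change_type, device_id)
        self._changes.append(change)
        return change

    def changes_since(self, cursor: int, device_id: int,
                      limit: int = 100) -> tuple[list[Change], int]:
        """Return changes after `cursor` made by other devices, plus the new cursor."""
        page = self._changes[cursor:cursor + limit]  # seq n is stored at index n - 1
        new_cursor = page[-1].seq if page else cursor
        return [c for c in page if c.device_id != device_id], new_cursor
```

A device stores the returned cursor and passes it on the next call, so a device that was offline for a while catches up page by page without rescanning anything.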
8. Sharing (Optional)
POST /files/{fileId}/share
{
"userId": "user456",
"permission": "read" // or "write"
}
GET /files/shared-with-me
→ Files/folders shared with user.
✅ API Design Covers:
Delta sync → checksum + block-level APIs.
Conflict detection → list + resolve endpoints.
Versioning → list, restore, and retrieve old versions.
Sync orchestration → heartbeat + changes since timestamp.
Databases
File content itself would go to an object store (like S3/GCS/HDFS), and DB stores only metadata & pointers.
📌 Database Choice
Relational DB (Postgres/MySQL/Spanner) → For metadata, versions, conflicts, sharing.
Object Store (S3/GCS/MinIO/HDFS) → For file blocks, deltas, versions.
Optional NoSQL (Cassandra/DynamoDB) → If ultra-scale metadata ops needed.
Schema (SQL, MySQL dialect)
-- Users
CREATE TABLE users (
user_id BIGINT PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Devices
CREATE TABLE devices (
device_id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id BIGINT NOT NULL,
device_name VARCHAR(255),
last_seen TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
-- Files (logical representation, not actual storage)
CREATE TABLE files (
file_id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id BIGINT NOT NULL,
folder_id BIGINT,
file_name VARCHAR(255) NOT NULL,
current_version_id BIGINT,
size BIGINT,
checksum VARCHAR(64),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
deleted BOOLEAN DEFAULT FALSE,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
-- Folders
CREATE TABLE folders (
folder_id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id BIGINT NOT NULL,
parent_folder_id BIGINT,
folder_name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id),
FOREIGN KEY (parent_folder_id) REFERENCES folders(folder_id)
);
-- File Versions
CREATE TABLE file_versions (
version_id BIGINT PRIMARY KEY AUTO_INCREMENT,
file_id BIGINT NOT NULL,
version_number INT NOT NULL,
storage_path VARCHAR(1024) NOT NULL, -- pointer to object store
delta_path VARCHAR(1024), -- if delta stored separately
size BIGINT,
checksum VARCHAR(64),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by_device BIGINT,
FOREIGN KEY (file_id) REFERENCES files(file_id),
FOREIGN KEY (created_by_device) REFERENCES devices(device_id)
);
-- Conflicts
CREATE TABLE file_conflicts (
conflict_id BIGINT PRIMARY KEY AUTO_INCREMENT,
file_id BIGINT NOT NULL,
version_a BIGINT NOT NULL,
version_b BIGINT NOT NULL,
resolution_status ENUM('PENDING','RESOLVED') DEFAULT 'PENDING',
resolved_version BIGINT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (file_id) REFERENCES files(file_id),
FOREIGN KEY (version_a) REFERENCES file_versions(version_id),
FOREIGN KEY (version_b) REFERENCES file_versions(version_id),
FOREIGN KEY (resolved_version) REFERENCES file_versions(version_id)
);
-- Sharing
CREATE TABLE file_shares (
share_id BIGINT PRIMARY KEY AUTO_INCREMENT,
file_id BIGINT NOT NULL,
shared_with BIGINT NOT NULL, -- user_id
permission ENUM('READ','WRITE') NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (file_id) REFERENCES files(file_id),
FOREIGN KEY (shared_with) REFERENCES users(user_id)
);
-- Sync Changes (to track deltas & propagate)
CREATE TABLE sync_changes (
change_id BIGINT PRIMARY KEY AUTO_INCREMENT,
file_id BIGINT NOT NULL,
version_id BIGINT NOT NULL,
device_id BIGINT NOT NULL,
change_type ENUM('CREATE','UPDATE','DELETE','RENAME') NOT NULL,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (file_id) REFERENCES files(file_id),
FOREIGN KEY (version_id) REFERENCES file_versions(version_id),
FOREIGN KEY (device_id) REFERENCES devices(device_id)
);
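One way the schema can support conflict detection is a compare-and-set on `files.current_version_id`: a device's update is accepted only if the file is still at the version the device started from. The sketch below is an assumption about how that could look, not something the schema prescribes. To stay runnable it uses Python's built-in SQLite with a cut-down `files` table (two columns only) instead of the MySQL schema above; `commit_version` is a hypothetical helper.

```python
import sqlite3


def commit_version(conn: sqlite3.Connection, file_id: int,
                   base_version_id: int, new_version_id: int) -> bool:
    """Advance current_version_id only if it still equals base_version_id."""
    cur = conn.execute(
        "UPDATE files SET current_version_id = ? "
        "WHERE file_id = ? AND current_version_id = ?",
        (new_version_id, file_id, base_version_id),
    )
    conn.commit()
    # Zero rows updated means another device moved the pointer first: a conflict.
    return cur.rowcount == 1


# Cut-down stand-in for the files table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file_id INTEGER PRIMARY KEY, current_version_id INTEGER)")
conn.execute("INSERT INTO files VALUES (1, 12)")

print(commit_version(conn, 1, base_version_id=12, new_version_id=13))  # first writer
print(commit_version(conn, 1, base_version_id=12, new_version_id=14))  # stale base
```

When the helper returns False, the server would keep the rejected version and insert a `file_conflicts` row pairing it with the current one. The same `UPDATE ... WHERE` pattern carries over to MySQL or Postgres.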
How this ties to FR/NFR
Delta sync → file_versions.delta_path points to changed blocks.
Conflict detection → file_conflicts table holds unresolved conflicts.
Versioning → file_versions maintains all versions, linked to files.current_version_id.
Sync propagation → sync_changes tracks changes per device.
HLD
📌 Microservices
1. Auth Service
Handles user authentication (JWT/OAuth2).
Manages user accounts & device registration.
APIs: /login, /register, /devices
2. Metadata Service
Stores file/folder metadata, versions, and sync state.
Tracks file ownership, checksums, timestamps, and current version.
Responsible for conflict detection (detects concurrent modifications).
DB: Relational DB (Postgres/Spanner) or NoSQL (Cassandra/DynamoDB).
APIs: /files, /folders, /versions, /conflicts, /sync/changes
3. Storage Service
Handles actual file/delta block storage.
Writes to object store (S3, GCS, MinIO, HDFS).
Provides APIs for upload/download, chunked transfer, resumable upload.
APIs: /upload, /download, /delta/upload, /delta/download
4. Delta Sync Service
Calculates block-level checksums and identifies changed blocks.
Works with Storage Service to transfer only deltas.
Ensures efficient network utilization.
APIs: /delta/checksum, /delta/apply
5. Conflict Resolution Service
Detects conflicts (using Metadata Service logs).
Provides strategies: last-writer-wins, keep both, manual merge.
Integrates with Versioning Service to store alternate versions.
APIs: /conflicts, /conflicts/resolve
6. Versioning Service
Manages file versions.
Supports rollback to older versions.
Works with Storage Service to fetch old versions.
APIs: /versions, /versions/restore
7. Sync Orchestrator Service
Manages device sync states (heartbeats, last synced timestamp).
Pushes change notifications (via WebSockets, Kafka, or gRPC streaming).
Pull model fallback → devices ask for changes since last sync.
APIs: /sync/status, /sync/heartbeat, /sync/changes
8. Notification Service
Sends alerts about sync completion, conflicts, or version restores.
Channels: email, push notifications, WebSocket.
APIs: /notify
9. Sharing & Access Control Service (optional)
Manages file/folder sharing with other users.
Enforces ACLs/permissions.
APIs: /share, /permissions
📌 Interaction Flow (Example)
🔹 File Update (Delta Sync + Versioning + Conflict Detection)
Client → Delta Sync Service: send file block checksums.
Delta Sync Service → Client: return changed block list.
Client → Storage Service: upload only changed blocks.
Storage Service → Metadata Service: update new version record.
Metadata Service → Conflict Resolution Service: check if another device updated same version concurrently.
If no conflict → update current version pointer.
If conflict → log entry in file_conflicts, notify user.
Sync Orchestrator Service: pushes "file updated" event to all user devices.
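The upload and version-record steps of this flow can be sketched with a content-addressed block store, where each version is a manifest (an ordered list of block hashes) and a block is stored once no matter how many versions reference it. This layout is an assumption; the schema's `storage_path`/`delta_path` columns could equally point at full objects plus patches. `BlockStore`, `upload_version`, and `download_version` are hypothetical names.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4  # bytes; tiny on purpose


class BlockStore:
    """Stand-in for the object store: blocks are keyed by their SHA-256 hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> list[str]:
        return [h for h in hashes if h not in self.blocks]

    def put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        self.blocks[digest] = block
        return digest


def upload_version(store: BlockStore, data: bytes) -> tuple[list[str], int]:
    """Store `data`, sending only blocks the store lacks. Returns (manifest, blocks_sent)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    manifest = [hashlib.sha256(b).hexdigest() for b in blocks]
    needed = set(store.missing(manifest))
    sent = 0
    for block, digest in zip(blocks, manifest):
        if digest in needed:
            store.put(block)
            needed.discard(digest)  # identical blocks within one file are sent once
            sent += 1
    return manifest, sent


def download_version(store: BlockStore, manifest: list[str]) -> bytes:
    return b"".join(store.blocks[h] for h in manifest)
```

Uploading `b"aaaabbbbcccc"` and then `b"aaaaXXXXcccc"` sends three blocks the first time and one the second, yet both versions remain fully downloadable from their manifests. That is the property the Scalable Versioning requirement is after.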
🔹 Conflict Resolution
Conflict Resolution Service detects conflicting versions.
User chooses resolution (keep both / last writer wins / manual merge).
Service updates the file_conflicts table and file_versions.
Notifies devices via Sync Orchestrator.
🔹 Version Restore
User requests rollback → Versioning Service.
Versioning Service fetches old version from Storage Service.
Metadata Service updates current version pointer.
Sync Orchestrator notifies all devices.
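A small sketch of the restore step. It takes one particular design choice as an assumption: instead of moving the current-version pointer backwards, restoring appends a new version that reuses the old version's stored content, so history stays append-only and the restore itself can be undone. `FileRecord` is a hypothetical in-memory stand-in for the `files` and `file_versions` rows.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileRecord:
    # Storage paths of every version; the version number is index + 1.
    versions: list[str] = field(default_factory=list)
    current: int = 0  # current version number, 0 means no versions yet

    def add_version(self, storage_path: str) -> int:
        self.versions.append(storage_path)
        self.current = len(self.versions)
        return self.current

    def restore(self, version_number: int) -> int:
        """Make an old version current by appending a new version with the same content."""
        if not 1 <= version_number <= len(self.versions):
            raise ValueError("no such version")
        return self.add_version(self.versions[version_number - 1])
```

Because the restored version only copies a pointer to existing content, a restore costs one metadata row and no extra object storage.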
📌 Communication Pattern
Sync-critical services (Metadata, Delta Sync, Conflict, Versioning) → synchronous REST/gRPC.
Event notifications (sync changes, conflict alerts) → asynchronous via Kafka / PubSub.
Storage → object store APIs (chunked uploads).
✅ This setup ensures:
Delta sync handled efficiently.
Conflicts resolved cleanly.
Versioning reliable & user-friendly.
Scalable (storage + metadata + async events).