Design Image Hosting Service
Let's look into the functional requirements (FR).
Image upload
Support single + batch uploads.
Accept common formats (JPEG, PNG, GIF, WebP, HEIC).
Enforce per-file and total size limits.
Support resumable, chunked uploads for large files (e.g., multipart uploads).
Acceptance: a user can upload one or more images, see progress, and resume if the connection drops.
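The resumable-upload requirement comes down to splitting a file into numbered byte ranges and, after a dropped connection, re-sending only the parts the server has not acknowledged. A minimal sketch, assuming a hypothetical 8 MiB part size and a server that can report which part numbers it already holds (S3 multipart uploads work along these lines, but the function names here are illustrative):

```python
from dataclasses import dataclass

PART_SIZE = 8 * 1024 * 1024  # hypothetical part size: 8 MiB


@dataclass(frozen=True)
class Part:
    number: int  # 1-based part number
    start: int   # inclusive byte offset
    end: int     # exclusive byte offset


def plan_parts(file_size: int, part_size: int = PART_SIZE) -> list[Part]:
    """Split a file of file_size bytes into consecutive byte ranges."""
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    return [
        Part(i, start, min(start + part_size, file_size))
        for i, start in enumerate(range(0, file_size, part_size), start=1)
    ]


def remaining_parts(file_size: int, completed: set[int],
                    part_size: int = PART_SIZE) -> list[Part]:
    """Parts still to send after a resume, given part numbers the server has acknowledged."""
    return [p for p in plan_parts(file_size, part_size) if p.number not in completed]


def progress(file_size: int, completed: set[int], part_size: int = PART_SIZE) -> float:
    """Fraction of bytes acknowledged, suitable for driving a progress bar."""
    done = sum(p.end - p.start for p in plan_parts(file_size, part_size)
               if p.number in completed)
    return done / file_size
```

For a 20 MiB file this plans three parts (8, 8, and 4 MiB); if the connection drops after parts 1 and 2 are acknowledged, only part 3 is re-sent and the progress bar resumes at 80%.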
Image storage & retrieval
Store original image and metadata.
Retrieve images by ID or by public URL.
Return the correct Content-Type and caching headers (e.g., Cache-Control, ETag).
Acceptance: GET /images/{id} returns image bytes and correct Content-Type.
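A sketch of the retrieval headers: Content-Type from stored metadata, a strong ETag derived from the bytes, and Cache-Control chosen by visibility, with a 304 when the client's cached copy is still valid. The cache policy values and function names are assumptions:

```python
import hashlib
from typing import Optional

# Hypothetical policy: bytes under a given URL never change (edits create a new
# version key), so public/unlisted images may be cached for a year.
PUBLIC_CACHE = "public, max-age=31536000, immutable"
PRIVATE_CACHE = "private, no-store"


def response_headers(body: bytes, mime_type: str, visibility: str) -> dict[str, str]:
    """Headers for GET /images/{id}: stored Content-Type plus a strong ETag from the bytes."""
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    cache = PUBLIC_CACHE if visibility in ("public", "unlisted") else PRIVATE_CACHE
    return {
        "Content-Type": mime_type,
        "Content-Length": str(len(body)),
        "ETag": etag,
        "Cache-Control": cache,
    }


def serve(body: bytes, mime_type: str, visibility: str,
          if_none_match: Optional[str] = None) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body); 304 with an empty body when the client's ETag matches."""
    headers = response_headers(body, mime_type, visibility)
    # Simplification: compares one ETag; a real If-None-Match may carry a list or "*".
    if if_none_match == headers["ETag"]:
        return 304, headers, b""
    return 200, headers, body
```

With long-lived public caching, switching an image from public to private later also needs a CDN invalidation, otherwise edge caches keep serving it until the entry expires.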
Thumbnail / derivative generation
Automatically generate thumbnails & common sizes (small/medium/large).
Support on-demand resizing/format conversion.
Acceptance: Uploading an image produces thumbnails accessible at predictable URLs.
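Predictable derivative URLs follow from a fixed size table, and each derivative's dimensions from scaling the longer edge while keeping the aspect ratio. A sketch; the size names match the API examples (thumb.jpg), but the pixel limits, the JPEG output format, and the function names are assumptions:

```python
CDN_BASE = "https://cdn.example.com"
# Hypothetical size table: derivative name -> maximum length of the longer edge, in px.
SIZES = {"thumb": 150, "small": 320, "medium": 800, "large": 1600}


def fit_within(width: int, height: int, max_edge: int) -> tuple[int, int]:
    """Scale so the longer edge is at most max_edge, preserving aspect ratio; never upscale."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_edge / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def derivative_plan(image_id: str, width: int, height: int, fmt: str = "jpg") -> dict:
    """Map each derivative name to its CDN URL and target dimensions."""
    return {
        name: {
            "url": f"{CDN_BASE}/{image_id}/{name}.{fmt}",
            "size": fit_within(width, height, edge),
        }
        for name, edge in SIZES.items()
    }
```

Because the URL depends only on the image ID and the derivative name, clients can build thumbnail links before processing finishes; a miss until then can fall back to the original.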
Public/private access controls
Images can be public, unlisted (unguessable), or private.
Per-image ACLs (owner, collaborators, share links).
Acceptance: Private image returns 403 to unauthenticated users.
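One way to read the access-control rules is as a single decision function. A sketch under stated assumptions: the Image record and the `valid_share_link` flag are hypothetical, and 403 is returned for every denied request to match the acceptance criterion above (some APIs return 401 to unauthenticated callers instead):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Image:
    image_id: str
    owner_id: str
    visibility: str  # "public" | "unlisted" | "private"
    collaborators: set[str] = field(default_factory=set)


def check_access(image: Image, user_id: Optional[str],
                 valid_share_link: bool = False) -> int:
    """Return 200 if the requester may view the image, else 403."""
    if image.visibility in ("public", "unlisted"):
        return 200  # unlisted relies on the unguessable ID/URL, not on identity
    if valid_share_link:
        return 200  # share-link token already verified upstream (e.g., signature + expiry)
    if user_id is not None and (user_id == image.owner_id or user_id in image.collaborators):
        return 200
    return 403
```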
Signed/temporary URLs
Generate time-limited signed URLs for secure access/download.
Acceptance: Signed URL works until expiration, then returns 403.
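Time-limited signed URLs are typically an HMAC over the path and an expiry timestamp, which the edge can verify without a database lookup. A minimal sketch; the `exp`/`sig` query parameter names, the signing-key constant, and the message layout are assumptions (real CDNs such as CloudFront define their own signed-URL formats):

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"hypothetical-signing-key"  # in practice, loaded from a secrets manager


def _signature(path: str, exp: int) -> str:
    msg = f"{path}\n{exp}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign_url(base: str, path: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Append exp and sig to base + path; assumes base has no path component of its own."""
    now = int(time.time()) if now is None else now
    exp = now + ttl_seconds
    return f"{base}{path}?{urlencode({'exp': exp, 'sig': _signature(path, exp)})}"


def verify_url(url: str, now: Optional[int] = None) -> int:
    """Return 200 for a valid, unexpired URL, else 403."""
    now = int(time.time()) if now is None else now
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        exp = int(query["exp"][0])
        sig = query["sig"][0]
    except (KeyError, IndexError, ValueError):
        return 403
    if now >= exp:
        return 403  # expired
    if not hmac.compare_digest(sig, _signature(parts.path, exp)):
        return 403  # path or expiry was tampered with
    return 200
```

Because the expiry is inside the signed message, a client cannot extend a link by editing `exp`; rotating SECRET invalidates every outstanding link at once.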
Metadata extraction & storage
Extract EXIF, dimensions, color profile, upload timestamp, file size, MIME type, GPS (opt-in).
Store custom user-provided metadata (title, description, tags).
Acceptance: GET /images/{id}/metadata returns EXIF + user fields.
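Dimensions and format can be read from the file header without decoding the image. A stdlib-only sketch for PNG dimensions (the IHDR chunk always comes first) plus magic-byte format detection for JPEG, PNG, GIF, and WebP; HEIC detection and EXIF parsing are left to an image library such as Pillow, and the function names are hypothetical:

```python
import struct
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAGIC = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (PNG_SIGNATURE, "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_mime(data: bytes) -> Optional[str]:
    """Detect the real format from leading bytes instead of trusting the client's Content-Type."""
    for magic, mime in MAGIC:
        if data.startswith(magic):
            return mime
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read width and height from the IHDR chunk that follows the 8-byte PNG signature."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    # Bytes 8-11: chunk length, 12-15: chunk type, 16-23: width, height (big-endian uint32).
    if data[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```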
Search & listing
List images by user, tag, album, date, and metadata.
Full-text search on title/description/tags.
Pagination + sort (newest, most viewed).
Acceptance: /images?user={id}&tag=summer returns correct paginated set.
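Listing combines a filter, a deterministic sort, and page/limit slicing that also reports the total, as in the GET /v1/users/{userId}/images response. An in-memory sketch with a hypothetical ImageRow type; in the real service the same logic would be a SQL query with ORDER BY ... LIMIT/OFFSET (or keyset pagination for deep pages):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ImageRow:
    image_id: str
    user_id: str
    created_at: datetime
    views: int = 0
    tags: set[str] = field(default_factory=set)


# Each key ends with image_id so ties sort the same way on every page.
SORT_KEYS = {
    "newest": lambda r: (r.created_at, r.image_id),
    "most_viewed": lambda r: (r.views, r.image_id),
}


def list_images(rows: list[ImageRow], user_id: str, tag: Optional[str] = None,
                sort: str = "newest", page: int = 1, limit: int = 10) -> dict:
    if page < 1 or not 1 <= limit <= 100:
        raise ValueError("page must be >= 1 and limit between 1 and 100")
    if sort not in SORT_KEYS:
        raise ValueError(f"unknown sort: {sort}")
    matches = [r for r in rows if r.user_id == user_id and (tag is None or tag in r.tags)]
    matches.sort(key=SORT_KEYS[sort], reverse=True)  # newest / most viewed first
    start = (page - 1) * limit
    return {
        "images": [r.image_id for r in matches[start:start + limit]],
        "page": page,
        "limit": limit,
        "total": len(matches),
    }
```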
Delete / restore / permanent purge
Soft delete with a retention window, plus permanent delete.
Option to restore within the retention window.
Acceptance: After delete, image returns 404 but can be restored within retention.
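Soft delete can be modelled as a `deleted_at` timestamp: reads treat it as 404, restore clears it while inside the retention window, and a periodic job purges anything older. A sketch assuming a hypothetical 30-day window and an in-memory record type:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(days=30)  # hypothetical retention window


@dataclass
class StoredImage:
    image_id: str
    deleted_at: Optional[datetime] = None
    purged: bool = False


def soft_delete(img: StoredImage, now: datetime) -> None:
    if img.deleted_at is None:
        img.deleted_at = now  # repeated deletes keep the first timestamp


def get_status(img: StoredImage) -> int:
    """What GET /v1/images/{imageId} would return."""
    return 404 if img.purged or img.deleted_at is not None else 200


def restore(img: StoredImage, now: datetime) -> bool:
    if img.purged or img.deleted_at is None:
        return False
    if now - img.deleted_at > RETENTION:
        return False  # window has passed; the purge job owns it now
    img.deleted_at = None
    return True


def purge_expired(images: list[StoredImage], now: datetime) -> list[str]:
    """Periodic job: permanently remove soft-deleted images older than the window."""
    purged = []
    for img in images:
        if not img.purged and img.deleted_at is not None and now - img.deleted_at > RETENTION:
            img.purged = True  # real system: delete all version objects, then the DB rows
            purged.append(img.image_id)
    return purged
```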
Rate limiting & quotas
Per-user upload/download rate limits and storage quotas.
Quota warning notifications when usage nears the limit.
Acceptance: Over-quota uploads are rejected with clear error.
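Per-user rate limits are commonly enforced with a token bucket, and storage quotas with a check before accepting an upload. A sketch of both; the refill rate, capacity, the 90% warning threshold, and the error wording are hypothetical values, not figures from this design:

```python
import time
from typing import Optional

WARN_RATIO = 0.9  # hypothetical: warn once usage reaches 90% of quota


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller responds 429 Too Many Requests


def check_quota(used_bytes: int, quota_bytes: int, upload_bytes: int) -> dict:
    """Reject uploads that would exceed the quota; flag uploads that cross the warning line."""
    after = used_bytes + upload_bytes
    if after > quota_bytes:
        return {"allowed": False,
                "error": f"Storage quota exceeded: upload needs {after} of {quota_bytes} bytes"}
    return {"allowed": True, "warn": after >= WARN_RATIO * quota_bytes}
```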
Multi-tenant / user accounts
Users have isolated namespaces and account settings.
Support organizations/teams with scoped resources.
Acceptance: Team member with permission can upload to team album.
Non-Functional Requirements (NFR)
1. Performance
Upload latency:
Single image: < 2s for a file under 10 MB (excluding network transfer time).
Batch uploads should support parallelism and resumable uploads.
Retrieval latency:
CDN-served images should load in <200ms for cached requests and <1s on a cache miss served from origin storage.
Thumbnail/derivative generation:
Must complete within 3s for small (<5MB) files; large files within 10s.
2. Scalability
Support millions of concurrent users and billions of stored images.
Storage should scale horizontally (e.g., S3/object store with lifecycle policies).
API endpoints must scale linearly with traffic (autoscaling + load balancing).
3. Availability & Reliability
Service uptime target: 99.9%+ SLA.
No single point of failure: redundant storage, CDN, and API servers.
Images must remain retrievable even if processing (thumbnails, metadata) fails.
Data durability: 11 nines (99.999999999%) durability for stored images.
4. Security
All uploads/downloads over HTTPS (TLS 1.2+).
Access control via OAuth2/JWT for user requests, presigned URLs for secure access.
Support encryption at rest (AES-256) and in transit.
Virus/malware scanning before making image publicly accessible.
Prevent direct object store exposure (all links signed or CDN-backed).
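Estimates
Assumptions used in the formulas: A = 100,000 users, I = 10 images per user, U = 1 new upload per user per month, 2 MB average original size, derivatives (thumbnails and resized copies) adding 50% on top of each original, 20 views per image per month, 0.3 MB served per view, and a 90% CDN hit ratio. Sizes use decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB).
Storage and bandwidth:
Total images = A × I = 100,000 × 10 = 1,000,000 images
Total storage = 1,000,000 × 2 MB × (1 + 0.5) = 3,000,000 MB = 3,000 GB ≈ 3 TB
Monthly new uploads = A × U = 100,000 × 1 = 100,000 images → new storage = 100,000 × 2 MB × 1.5 = 300,000 MB ≈ 300 GB/month
Views/month = 1,000,000 × 20 = 20,000,000 views
Bytes served = 20,000,000 × 0.3 MB = 6,000,000 MB ≈ 6 TB/month
Origin egress (10% CDN misses) = 0.1 × 6 TB = 0.6 TB ≈ 600 GB/month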
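The storage and bandwidth arithmetic above can be re-run whenever an assumption changes. A small sketch; the constant names are hypothetical and their values are the assumptions listed above:

```python
USERS = 100_000              # A
IMAGES_PER_USER = 10         # I
UPLOADS_PER_USER_MONTH = 1   # U
AVG_ORIGINAL_MB = 2.0
DERIVATIVE_OVERHEAD = 0.5    # thumbnails + resized copies, relative to the original
VIEWS_PER_IMAGE_MONTH = 20
MB_PER_VIEW = 0.3
CDN_MISS_RATIO = 0.10
MB_PER_GB = 1_000            # decimal units
MB_PER_TB = 1_000_000


def estimate() -> dict[str, float]:
    total_images = USERS * IMAGES_PER_USER
    storage_mb = total_images * AVG_ORIGINAL_MB * (1 + DERIVATIVE_OVERHEAD)
    new_images = USERS * UPLOADS_PER_USER_MONTH
    new_storage_mb = new_images * AVG_ORIGINAL_MB * (1 + DERIVATIVE_OVERHEAD)
    views = total_images * VIEWS_PER_IMAGE_MONTH
    served_mb = views * MB_PER_VIEW
    return {
        "total_images": total_images,
        "storage_tb": storage_mb / MB_PER_TB,
        "new_storage_gb_per_month": new_storage_mb / MB_PER_GB,
        "views_per_month": views,
        "served_tb_per_month": served_mb / MB_PER_TB,
        "origin_egress_gb_per_month": served_mb * CDN_MISS_RATIO / MB_PER_GB,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```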
Compute and request rate (30 days = 2,592,000 s):
Thumbnail CPU work = 100,000 uploads × 0.5 CPU-s = 50,000 CPU-s/month → average concurrency = 50,000 / 2,592,000 ≈ 0.019 CPU cores (tiny on average; bursts are what matter)
Average view rate = 20,000,000 / 2,592,000 ≈ 7.7 requests/sec (very low). If the peak hour runs at 10× the average, peak ≈ 77 RPS.
APIs
1. Auth APIs
POST /v1/auth/signup
Request
{
"username": "alice",
"email": "alice@example.com",
"password": "securePass123"
}
Response
{
"userId": "u_12345",
"username": "alice",
"email": "alice@example.com",
"createdAt": "2025-09-17T10:20:00Z"
}
POST /v1/auth/login
Request
{
"email": "alice@example.com",
"password": "securePass123"
}
Response
{
"accessToken": "jwt-token-here",
"refreshToken": "refresh-token-here",
"expiresIn": 3600
}
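The schema stores `password_hash`, so signup hashes the password and login verifies it. A stdlib-only sketch using PBKDF2-HMAC-SHA256; the encoded format `pbkdf2_sha256$iterations$salt$hash` and the iteration count are assumptions, and many deployments use Argon2id or bcrypt through a dedicated library instead:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # PBKDF2-HMAC-SHA256 work factor; tune for your hardware


def hash_password(password: str) -> str:
    """Signup: derive a salted hash; the result fits the VARCHAR(255) password_hash column."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Login: recompute with the stored salt and iterations, compare in constant time."""
    try:
        scheme, iterations, salt_hex, digest_hex = stored.split("$")
    except ValueError:
        return False
    if scheme != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                 bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(digest.hex(), digest_hex)
```

Storing the iteration count next to the hash lets the work factor be raised later without invalidating existing accounts; old hashes can be upgraded on the next successful login.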
2. Upload & Image Management
POST /v1/uploads (initiate upload – presigned URL flow)
Request
{
"filename": "cat.png",
"size": 2048000,
"contentType": "image/png"
}
Response
{
"uploadId": "upl_789",
"presignedUrl": "https://s3.example.com/upl_789?...",
"expiresIn": 900
}
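Before issuing a presigned URL, the initiate call can reject bad requests early using the declared type and size. A sketch; the 50 MiB limit, the error messages, and the 415/413 status codes are assumptions, and the presigned URL is a placeholder because the real one comes from the object store SDK:

```python
import secrets

ALLOWED_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp", "image/heic"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # hypothetical per-file limit
PRESIGN_TTL_SECONDS = 900


def initiate_upload(req: dict) -> tuple[int, dict]:
    """Validate POST /v1/uploads and return (status, body)."""
    filename = req.get("filename")
    size = req.get("size")
    content_type = req.get("contentType")
    if not isinstance(filename, str) or not filename or "/" in filename:
        return 400, {"error": "filename is required and must not contain '/'"}
    if content_type not in ALLOWED_TYPES:
        return 415, {"error": f"unsupported contentType: {content_type}"}
    if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
        return 400, {"error": "size must be a positive integer"}
    if size > MAX_FILE_BYTES:
        return 413, {"error": f"file exceeds {MAX_FILE_BYTES} bytes"}
    upload_id = "upl_" + secrets.token_hex(8)
    # Placeholder: a real service asks the object store SDK for a presigned PUT URL here.
    presigned_url = f"https://s3.example.com/{upload_id}?..."
    return 200, {"uploadId": upload_id, "presignedUrl": presigned_url,
                 "expiresIn": PRESIGN_TTL_SECONDS}
```

The declared size and type are client-supplied, so the finalize step (POST /v1/images) should still check the stored object's actual size and format before creating the image record.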
POST /v1/images (finalize after upload)
Request
{
"uploadId": "upl_789",
"title": "My Cat",
"description": "Cute kitty picture",
"tags": ["cat", "pet", "cute"],
"visibility": "private"
}
Response
{
"imageId": "img_001",
"url": "https://cdn.example.com/img_001/original.png",
"thumbnailUrl": "https://cdn.example.com/img_001/thumb.jpg",
"visibility": "private",
"createdAt": "2025-09-17T10:30:00Z"
}
GET /v1/images/{imageId} (get metadata)
Response (200)
{
"imageId": "img_001",
"ownerId": "u_12345",
"title": "My Cat",
"description": "Cute kitty picture",
"tags": ["cat", "pet", "cute"],
"size": 2048000,
"format": "png",
"width": 1024,
"height": 768,
"visibility": "private",
"createdAt": "2025-09-17T10:30:00Z",
"updatedAt": "2025-09-17T10:30:00Z"
}
GET /v1/images/{imageId}/file (actual file download or CDN redirect)
Response (302 Redirect)
Location: https://cdn.example.com/img_001/original.png
PATCH /v1/images/{imageId} (update metadata)
Request
{
"title": "My Cute Cat",
"visibility": "public",
"tags": ["cat", "pet", "adorable"]
}
Response (200)
{
"imageId": "img_001",
"title": "My Cute Cat",
"visibility": "public",
"tags": ["cat", "pet", "adorable"],
"updatedAt": "2025-09-17T11:00:00Z"
}
DELETE /v1/images/{imageId} (soft delete)
Response (200)
{
"imageId": "img_001",
"status": "deleted",
"deletedAt": "2025-09-17T11:10:00Z"
}
3. Listing & Search
GET /v1/users/{userId}/images?limit=10&page=1&tag=cat
Response (200)
{
"images": [
{
"imageId": "img_001",
"title": "My Cute Cat",
"thumbnailUrl": "https://cdn.example.com/img_001/thumb.jpg",
"visibility": "public",
"createdAt": "2025-09-17T10:30:00Z"
},
{
"imageId": "img_002",
"title": "Cat Sleeping",
"thumbnailUrl": "https://cdn.example.com/img_002/thumb.jpg",
"visibility": "private",
"createdAt": "2025-09-16T09:15:00Z"
}
],
"page": 1,
"limit": 10,
"total": 42
}
4. Sharing & Access
POST /v1/images/{imageId}/share (generate signed URL)
Request
{
"expiresIn": 3600,
"permissions": ["view"]
}
Response (200)
{
"shareUrl": "https://cdn.example.com/img_001?token=abc123&exp=1758108600",
"expiresAt": "2025-09-17T11:30:00Z"
}
Databases and their Schema
📂 Databases to be Used
1. Object Storage (for images)
Purpose: Store the actual image files (original + thumbnails).
Tech Choices: AWS S3, GCP Cloud Storage, Azure Blob, MinIO (self-hosted).
Data Model:
Images are stored as blobs with unique keys (e.g., user_id/image_id/version).
Metadata (size, type, checksum, URL) goes into a relational/NoSQL DB.
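A sketch of building the object key in the user_id/image_id/version layout and computing the checksum column while streaming the file. The `.ext` suffix, the validation rule, and hex SHA-256 (which matches the VARCHAR(64) checksum column) are assumptions:

```python
import hashlib
import io
from typing import BinaryIO


def storage_key(user_id: str, image_id: str, version: str, ext: str) -> str:
    """Build an object key like u_12345/img_001/original.png."""
    for part in (user_id, image_id, version, ext):
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return f"{user_id}/{image_id}/{version}.{ext}"


def sha256_checksum(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash in chunks so large uploads are never fully loaded into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()  # 64 hex characters


if __name__ == "__main__":
    print(storage_key("u_12345", "img_001", "original", "png"))
    print(sha256_checksum(io.BytesIO(b"example bytes")))
```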
2. Relational Database (for metadata & users)
Purpose: Store structured metadata like user accounts, image metadata, sharing permissions.
Tech Choices: PostgreSQL, MySQL (with read replicas).
Use Cases: Transactions, strong consistency (user <-> image mapping).
SQL Schema (PostgreSQL dialect; gen_random_uuid() is built in from PostgreSQL 13)
-- Users table
CREATE TABLE users (
user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(100) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
plan VARCHAR(20) CHECK (plan IN ('free', 'premium')) DEFAULT 'free',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Images table
CREATE TABLE images (
image_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
filename VARCHAR(255) NOT NULL,
title VARCHAR(255),
description TEXT,
storage_key VARCHAR(500) NOT NULL, -- path in object store
mime_type VARCHAR(50) NOT NULL,
size_bytes BIGINT NOT NULL,
checksum VARCHAR(64), -- e.g., hex-encoded SHA-256
visibility VARCHAR(20) CHECK (visibility IN ('private', 'unlisted', 'public')) DEFAULT 'private',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
deleted_at TIMESTAMP, -- set on soft delete; NULL while the image is live
FOREIGN KEY (user_id) REFERENCES users(user_id) ON DELETE CASCADE
);
-- Image versions (original, thumbnail, compressed)
CREATE TABLE image_versions (
version_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
image_id UUID NOT NULL,
type VARCHAR(20) CHECK (type IN ('original', 'thumbnail', 'compressed')),
storage_key VARCHAR(500) NOT NULL,
width INT,
height INT,
size_bytes BIGINT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (image_id) REFERENCES images(image_id) ON DELETE CASCADE
);
-- Image shares (permissions & expiry)
CREATE TABLE image_shares (
share_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
image_id UUID NOT NULL,
shared_with VARCHAR(255), -- email or user_id
permissions VARCHAR(20) CHECK (permissions IN ('view', 'edit', 'download')) DEFAULT 'view',
expires_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (image_id) REFERENCES images(image_id) ON DELETE CASCADE
);
-- Optional: Tags for images
CREATE TABLE image_tags (
tag_id SERIAL PRIMARY KEY,
image_id UUID NOT NULL,
tag VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (image_id) REFERENCES images(image_id) ON DELETE CASCADE
);
-- Indexes for performance
-- users.email and users.username are already indexed by their UNIQUE constraints
CREATE INDEX idx_images_user_created ON images(user_id, created_at DESC); -- list by user, newest first
CREATE INDEX idx_images_visibility ON images(visibility);
CREATE INDEX idx_versions_image_id ON image_versions(image_id);
CREATE INDEX idx_shares_image_id ON image_shares(image_id);
CREATE INDEX idx_tags_image_id ON image_tags(image_id);
CREATE INDEX idx_tags_tag ON image_tags(tag);
3. NoSQL Database (optional, for performance)
Purpose: Fast reads for image metadata, feeds, search.
Tech Choices: DynamoDB, MongoDB, Cassandra.
Use Cases: Store pre-computed metadata for quick retrieval.
Example: MongoDB collection images_metadata_cache:
{
"image_id": "uuid",
"user_id": "uuid",
"url": "https://cdn.example.com/uuid",
"tags": ["sunset", "beach"],
"likes_count": 123,
"views_count": 456
}
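The metadata cache is usually read cache-aside: try the cache, fall back to the relational DB on a miss, populate the cache with a TTL, and invalidate on writes (e.g., after PATCH /v1/images/{imageId}). A sketch where a dict stands in for MongoDB/DynamoDB and `load` is a hypothetical function that reads from PostgreSQL:

```python
import time
from typing import Callable, Optional


class MetadataCache:
    """Cache-aside wrapper: serve from cache when fresh, otherwise load from the source of truth."""

    def __init__(self, load: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load            # hypothetical: reads the image row from PostgreSQL
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}  # image_id -> (expires_at, doc)

    def get(self, image_id: str) -> Optional[dict]:
        now = self._clock()
        entry = self._entries.get(image_id)
        if entry is not None and entry[0] > now:
            return entry[1]          # fresh hit
        doc = self._load(image_id)   # miss or expired: go to the DB
        if doc is not None:
            self._entries[image_id] = (now + self._ttl, doc)
        else:
            self._entries.pop(image_id, None)
        return doc

    def invalidate(self, image_id: str) -> None:
        """Call after writes (PATCH, DELETE) so readers do not see stale metadata."""
        self._entries.pop(image_id, None)
```

Counters such as `views_count` change on every view, so they are usually updated in the cache store directly (or batched) rather than by invalidating the whole document.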
4. Search Index (for tags & search)
Purpose: Enable search by tags, filename, metadata.
Tech Choices: Elasticsearch, OpenSearch, Solr.
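Example indexed document (in the index mapping, IDs and tags would typically be keyword fields, filename a text field, and created_at a date):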
{
"image_id": "uuid",
"user_id": "uuid",
"filename": "beach.png",
"tags": ["beach", "sunset", "vacation"],
"created_at": "2025-09-17T10:00:00Z"
}
✅ Summary
Object Store → Images (raw + derived).
RDBMS → Users, metadata, relationships.
NoSQL (optional) → High-volume metadata + quick lookups.
Search Index → Full-text & tag-based queries.
🏗 Microservices for Image Hosting Service
1. API Gateway
Entry point for all clients (mobile, web).
Handles authentication, rate limiting, routing.
Routes requests to appropriate microservices.
2. Auth Service
Manages user registration, login, tokens (OAuth2/JWT).
Validates identity before allowing uploads/downloads.
3. User Service
Manages user profiles, plans (free/premium), quotas.
Keeps track of storage usage & bandwidth consumption.
4. Image Upload Service
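Handles upload requests, either by issuing presigned URLs for direct client uploads to object storage (S3/Blob), as in POST /v1/uploads, or by streaming the upload through the service.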
Stores image metadata in metadata DB.
Publishes an event to a queue (Kafka/SQS) for processing (thumbnails, compression).
5. Image Processing Service
Consumes events from the upload service.
Generates thumbnails, compressed formats, WebP/AVIF variants.
Updates metadata DB with version info.
6. Image Metadata Service
CRUD operations on image metadata (title, tags, visibility, shares).
Handles search queries (delegates to Search Service).
Provides APIs to fetch image details for UI.
7. Search Service
Uses Elasticsearch/OpenSearch for indexing metadata & tags.
Supports search by filename, tags, date.
8. Share/Link Service
Manages shareable links, permissions (view/edit/download), and expiry.
Issues signed URLs for secure access via CDN.
9. CDN/Delivery Service
Delivers images globally with low latency.
Protects origin servers (object store).
Works with signed URLs for restricted content.
10. Notification Service (optional)
Sends emails/notifications when an image is shared, when storage is close to the quota, etc.
11. Analytics/Reporting Service
Tracks views, downloads, storage usage.
Provides usage data for billing or dashboards.
12. Admin Service
For platform administrators:
Manage abusive content (DMCA takedowns).
Monitor quotas, service health.
🔄 Interactions (Workflow Examples)
🖼️ Image Upload Flow
Client → API Gateway → Auth validation.
Gateway → Image Upload Service.
Upload Service stores file in Object Storage.
Upload Service → writes metadata into Metadata DB.
Upload Service → sends event to Kafka/SQS.
Image Processing Service consumes event → generates thumbnails & compressed versions → stores them → updates metadata.
Metadata Service updates search index via Search Service.
Client can now fetch processed image via CDN (signed URL).
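Queues like Kafka and SQS deliver at least once, so the Image Processing Service should tolerate duplicate events, and a processing failure should never block the original image. A sketch of an idempotent worker with bounded retries and a dead-letter list; `UploadEvent`, `process`, and the in-memory `done` set are hypothetical stand-ins (a real worker would record completion in the DB):

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UploadEvent:
    event_id: str
    image_id: str
    storage_key: str


class ProcessingWorker:
    def __init__(self, process: Callable[[UploadEvent], None], max_attempts: int = 3):
        self._process = process          # hypothetical: generates thumbnails/variants
        self._max_attempts = max_attempts
        self.done: set[str] = set()      # processed event IDs (a DB table in production)
        self.dead_letter: list[UploadEvent] = []

    def handle(self, event: UploadEvent) -> None:
        if event.event_id in self.done:
            return                       # duplicate delivery: already processed
        for attempt in range(1, self._max_attempts + 1):
            try:
                self._process(event)     # a real worker would back off between attempts
            except Exception:
                if attempt == self._max_attempts:
                    self.dead_letter.append(event)  # original stays retrievable
                    return
            else:
                self.done.add(event.event_id)
                return

    def drain(self, q: "queue.Queue[UploadEvent]") -> None:
        """Process everything currently in the queue, then return."""
        while True:
            try:
                event = q.get_nowait()
            except queue.Empty:
                return
            self.handle(event)
            q.task_done()
```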
🔍 Image Search Flow
Client → API Gateway → Metadata Service.
Metadata Service → queries Search Service (Elasticsearch).
Search Service → returns matching image IDs.
Metadata Service → fetches metadata from DB.
Client receives results with CDN URLs.
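In the search flow the index returns ranked IDs and the Metadata Service hydrates them from the DB. A sketch of that step; `fetch_many` (one batched query rather than one query per ID) and `can_view` are hypothetical callbacks, and stale hits for deleted or missing images are dropped because the index can lag behind the DB:

```python
from typing import Callable, Iterable


def hydrate(ranked_ids: list[str],
            fetch_many: Callable[[list[str]], Iterable[dict]],
            can_view: Callable[[dict], bool]) -> list[dict]:
    """Turn ranked search hits into result rows, preserving the search ranking."""
    rows = {row["image_id"]: row for row in fetch_many(ranked_ids)}  # one batched DB read
    results = []
    for image_id in ranked_ids:          # iterate in search-rank order
        row = rows.get(image_id)
        if row is None or row.get("deleted_at") is not None:
            continue                     # stale index entry: missing or soft-deleted
        if not can_view(row):
            continue                     # enforce ACLs on the DB copy, not the index copy
        results.append(row)
    return results
```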
📤 Image Sharing Flow
Client → API Gateway → Share Service.
Share Service → generates signed URL with expiry & permissions.
Client shares the link.
Viewer → API Gateway → CDN (signed URL validation) → Image delivered.
[ Client ]
    |
[ API Gateway ]
    |----> [ Auth Service ]
    |----> [ User Service ]
    |----> [ Image Upload Service ] ---> [ Object Storage ]
    |               |
    |               V
    |----> [ Metadata Service ] ----> [ Metadata DB ]
    |               |
    |               +--> [ Search Service ] (Elasticsearch)
    |
    |----> [ Share Service ] ---> [ Signed URLs/CDN ]
    |
    |----> [ Analytics Service ]
[Figure: HLD (high-level design) diagram]