Design a Distributed Lock in a Multi-Pod Environment
The distributed lock system prioritizes strong consistency, fault tolerance, and safety over availability, ensuring that no two pods can hold the same lock even under failures or network partitions.
Functional Requirements (FR)
1. A client (pod/service instance) must be able to request a lock for a given resource identifier.
2. The system must ensure mutual exclusion:
At any point in time, at most one pod holds the lock for a resource.
3. Lock acquisition should support:
Blocking acquisition (wait until the lock is available)
Non-blocking acquisition (fail immediately if the lock is held)
4. A client holding a lock must be able to explicitly release it, and only the current lock owner is allowed to release it.
5. Releasing a lock should immediately make it available to other waiting pods.
6. Locks must be time-bound (lease-based), i.e. every lock carries a TTL.
7. Each lock must be associated with:
lockId / resourceKey
ownerId (pod ID / service instance ID / unique token)
8. A lock holder must be able to renew/extend the lock lease before the TTL expires.
9. The system may support FIFO ordering: lock requests are granted in the order they arrive.
10. The system may support re-entrant locks:
The same pod acquiring the same lock multiple times.
11. Idempotency:
Repeated requests with the same lock token should:
Not create duplicate locks
Return consistent responses
Non-Functional Requirements (NFR)
1. The system must guarantee strong consistency for lock state:
At most one owner per lock at any time
2. Lock acquisition and release must be atomic operations.
3. The lock service should be highly available:
It must survive pod restarts and node failures
4. The system should be fault tolerant: in case of a network partition, locks must not be granted if ownership cannot be safely determined (fail closed).
5. Lock operations should be low latency: acquire / release / renew in single-digit milliseconds under normal load.
6. The system should scale horizontally, both in the number of client pods and in the number of locks.
7. Lock metadata must be durably persisted so that it survives node crashes.
Below are clean, production-grade APIs for a Distributed Lock Service in a multi-pod (K8s) environment, written at HLD level and aligned with the FR + NFRs we defined.
This is the level interviewers expect: clear contracts, idempotency, ownership safety, TTL, and failure handling.
Distributed Lock Service – API Design
Design Principles
RESTful APIs (easy to reason about in interviews)
Idempotent operations
Lease-based locking
Owner validation
Safe retries
Common Concepts
Identifiers
lockKey – Unique resource identifier. Example: inventory:sku:123
ownerId – Unique pod/service instance ID. Example: order-service-pod-7f9c
lockToken – Server-generated unique token returned on acquire
Used for ownership validation
1. Acquire Lock
Endpoint
POST /v1/locks/{lockKey}/acquire
Request Body
{
"ownerId": "order-service-pod-7f9c",
"ttlMillis": 30000,
"wait": false,
"requestId": "req-12345"
}
Request Fields
| Field | Description |
| ----------- | -------------------------------------- |
| `ownerId` | Unique identifier of the caller |
| `ttlMillis` | Lease duration |
| `wait` | If true, block until lock is available |
| `requestId` | Idempotency key |
Success Response (200)
{
"lockKey": "inventory:sku:123",
"lockToken": "f2a9e91c-cc1b-4c1e",
"ownerId": "order-service-pod-7f9c",
"expiresAt": 1700000000000
}
Failure Response (409 – Lock Held)
{
"error": "LOCK_ALREADY_HELD",
"currentOwner": "payment-service-pod-23a",
"retryAfterMillis": 12000
}
2. Release Lock
Endpoint
POST /v1/locks/{lockKey}/release
Request Body
{
"lockToken": "f2a9e91c-cc1b-4c1e",
"ownerId": "order-service-pod-7f9c"
}
Success Response (200)
{
"status": "RELEASED",
"lockKey": "inventory:sku:123"
}
Failure Response (403 – Ownership Violation)
{
"error": "NOT_LOCK_OWNER"
}
3. Renew / Extend Lock (Heartbeat)
Endpoint
POST /v1/locks/{lockKey}/renew
Request Body
{
"lockToken": "f2a9e91c-cc1b-4c1e",
"ownerId": "order-service-pod-7f9c",
"ttlMillis": 30000
}
Success Response (200)
{
"lockKey": "inventory:sku:123",
"expiresAt": 1700000030000
}
Failure Response (409 – Lock Expired)
{
"error": "LOCK_EXPIRED"
}
4. Get Lock Status
Endpoint
GET /v1/locks/{lockKey}
Success Response (200)
{
"lockKey": "inventory:sku:123",
"locked": true,
"ownerId": "order-service-pod-7f9c",
"expiresAt": 1700000030000
}
If Lock Does Not Exist (404)
{
"locked": false
}
5. Force Unlock (Admin / Recovery Only)
⚠️ Restricted API – Used for operational recovery.
Endpoint
POST /v1/locks/{lockKey}/force-release
Request Body
{
"reason": "stuck lock after pod crash"
}
Success Response
{
"status": "FORCE_RELEASED"
}
6. List Locks (Optional – Debugging)
Endpoint
GET /v1/locks?namespace=inventory-service
Response
{
"locks": [
{
"lockKey": "inventory:sku:123",
"ownerId": "order-service-pod-7f9c",
"expiresAt": 1700000030000
}
]
}
HTTP Status Code Usage
| Code | Meaning |
| ---- | ------------------- |
| 200 | Success |
| 409 | Lock conflict |
| 403 | Ownership violation |
| 404 | Lock not found |
| 500 | Internal error |
Idempotency Behavior
requestId ensures:
Duplicate acquire calls return the same lockToken
Safe retries during network failures
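The requestId mechanism can be sketched as a response cache keyed by the idempotency key: the first call executes the handler, and every retry with the same requestId replays the stored response instead of minting a new token. A minimal illustration (make_idempotent is a hypothetical helper, not part of any real service):

```python
import itertools

def make_idempotent(handler):
    """Cache the first response per requestId; retries replay it verbatim."""
    seen = {}
    def wrapped(request_id, *args):
        if request_id in seen:
            return seen[request_id]
        resp = handler(*args)
        seen[request_id] = resp
        return resp
    return wrapped

counter = itertools.count(1)
acquire = make_idempotent(lambda lock_key: {"lockToken": f"tok-{next(counter)}"})

first = acquire("req-12345", "inventory:sku:123")
retry = acquire("req-12345", "inventory:sku:123")  # network retry: same token back
```

A production service would keep this map in the lock store itself (with its own TTL) so retries survive pod restarts of the lock service.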
Example Acquire Flow (Non-Blocking)
Pod A → Acquire lock
Lock granted → token returned
Pod B → Acquire same lock
Receives 409 (lock held)
Pod A releases → Pod B retries
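The non-blocking flow above can be walked through with a minimal in-memory stand-in for the lock store (illustrative only — InMemoryLockStore is a made-up name; a real deployment backs this with Redis/etcd behind the REST API):

```python
import time
import uuid

class InMemoryLockStore:
    """Single-process sketch of the API semantics: acquire succeeds only if
    the key is free or its lease has expired; release requires the token."""

    def __init__(self):
        self._locks = {}  # lockKey -> (ownerId, lockToken, expiresAt ms)

    def acquire(self, lock_key, owner_id, ttl_millis):
        now = time.time() * 1000
        entry = self._locks.get(lock_key)
        if entry and entry[2] > now:
            return {"error": "LOCK_ALREADY_HELD", "currentOwner": entry[0]}  # 409
        token = str(uuid.uuid4())
        self._locks[lock_key] = (owner_id, token, now + ttl_millis)
        return {"lockKey": lock_key, "lockToken": token, "ownerId": owner_id}

    def release(self, lock_key, lock_token):
        entry = self._locks.get(lock_key)
        if entry is None or entry[1] != lock_token:
            return {"error": "NOT_LOCK_OWNER"}  # 403
        del self._locks[lock_key]
        return {"status": "RELEASED", "lockKey": lock_key}

store = InMemoryLockStore()
a = store.acquire("inventory:sku:123", "pod-A", 30000)  # granted, token returned
b = store.acquire("inventory:sku:123", "pod-B", 30000)  # conflict (409)
store.release("inventory:sku:123", a["lockToken"])      # pod A releases
c = store.acquire("inventory:sku:123", "pod-B", 30000)  # pod B retries, granted
```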
Databases and Schema
Primary (Recommended): Key-Value store (Redis / etcd style)
Alternative: Relational DB (Postgres/MySQL)
You can pick one in interviews and briefly mention the other.
Option 1️⃣: Key-Value Store Schema (Redis / etcd)
Key Pattern
lock:{namespace}:{lock_key}
Example Key
lock:inventory-service:sku-123
Value (JSON / Protobuf equivalent)
{
"lockKey": "inventory:sku:123",
"ownerId": "order-service-pod-7f9c",
"lockToken": "f2a9e91c-cc1b-4c1e",
"ttlMillis": 30000,
"expiresAt": 1700000030000,
"createdAt": 1700000000000,
"updatedAt": 1700000010000,
"reentrancyCount": 1
}
Redis Commands (Atomic)
Acquire Lock (NX + TTL)
SET lock:inventory-service:sku-123 "<value>" NX PX 30000
Renew Lock (Lua Script)
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("PEXPIRE", KEYS[1], ARGV[2])
else
return 0
end
Release Lock (Lua Script)
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("DEL", KEYS[1])
else
return 0
end
Why This Works
TTL guarantees auto-expiry
NX ensures mutual exclusion
Lua ensures atomicity
No clock dependency on clients
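Both Lua scripts are the same compare-then-act pattern: act only if the stored token matches the caller's token. A plain-Python equivalent of that logic (illustrative — in real Redis the atomicity comes from the Lua script running as one indivisible step):

```python
def release_if_owner(store, key, token):
    """Mirrors the release Lua script: DEL only if the stored value
    equals the caller's token; returns 1 on success, 0 otherwise."""
    if store.get(key) == token:
        del store[key]
        return 1
    return 0

def renew_if_owner(store, expiries, key, token, ttl_ms, now_ms):
    """Mirrors the renew Lua script: extend expiry (PEXPIRE) only for the owner."""
    if store.get(key) == token:
        expiries[key] = now_ms + ttl_ms
        return 1
    return 0

store = {"lock:inventory-service:sku-123": "f2a9e91c-cc1b-4c1e"}
expiries = {}
renewed = renew_if_owner(store, expiries, "lock:inventory-service:sku-123",
                         "f2a9e91c-cc1b-4c1e", 30000, 0)
stolen = release_if_owner(store, "lock:inventory-service:sku-123", "other-token")
released = release_if_owner(store, "lock:inventory-service:sku-123",
                            "f2a9e91c-cc1b-4c1e")
```

The key design point: a plain GET followed by a separate DEL would leave a window where the lock expires and is re-acquired by another pod between the two calls; fusing check and act into one atomic step closes that window.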
Option 2️⃣: Relational Database Schema (Postgres / MySQL)
Used when a KV store is not allowed (rare, but a safe fallback to mention in interviews).
Table: distributed_locks
CREATE TABLE distributed_locks (
lock_key VARCHAR(255) PRIMARY KEY,
namespace VARCHAR(100) NOT NULL,
owner_id VARCHAR(255) NOT NULL,
lock_token VARCHAR(255) NOT NULL,
expires_at BIGINT NOT NULL,
created_at BIGINT NOT NULL,
updated_at BIGINT NOT NULL,
reentrancy_count INT DEFAULT 1
);
Indexes
CREATE INDEX idx_expires_at ON distributed_locks (expires_at);
CREATE INDEX idx_namespace ON distributed_locks (namespace);
Acquire Lock (Atomic Insert)
INSERT INTO distributed_locks (
lock_key,
namespace,
owner_id,
lock_token,
expires_at,
created_at,
updated_at
)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (lock_key) DO NOTHING;
Release Lock
DELETE FROM distributed_locks
WHERE lock_key = ?
AND lock_token = ?;
Renew Lock
UPDATE distributed_locks
SET expires_at = ?, updated_at = ?
WHERE lock_key = ?
AND lock_token = ?;
Cleanup Expired Locks (Background Job)
DELETE FROM distributed_locks
WHERE expires_at < ?;
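The SQL variant can be exercised end-to-end with the stdlib sqlite3 module (illustrative: the schema is trimmed to the essential columns, and SQLite's INSERT OR IGNORE stands in for Postgres's ON CONFLICT DO NOTHING):

```python
import sqlite3
import time
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE distributed_locks (
        lock_key   TEXT PRIMARY KEY,
        owner_id   TEXT NOT NULL,
        lock_token TEXT NOT NULL,
        expires_at INTEGER NOT NULL
    )""")

def acquire(lock_key, owner_id, ttl_ms):
    now = int(time.time() * 1000)
    token = str(uuid.uuid4())
    # Opportunistically clear an expired row, then attempt the atomic insert.
    conn.execute("DELETE FROM distributed_locks WHERE lock_key=? AND expires_at<?",
                 (lock_key, now))
    cur = conn.execute("INSERT OR IGNORE INTO distributed_locks VALUES (?,?,?,?)",
                       (lock_key, owner_id, token, now + ttl_ms))
    return token if cur.rowcount == 1 else None  # None -> lock already held

def release(lock_key, token):
    cur = conn.execute("DELETE FROM distributed_locks WHERE lock_key=? AND lock_token=?",
                       (lock_key, token))
    return cur.rowcount == 1  # False -> caller is not the owner

t = acquire("inventory:sku:123", "pod-A", 30000)      # granted
held = acquire("inventory:sku:123", "pod-B", 30000)   # None: conflict
```

The primary-key constraint is what makes the insert a mutual-exclusion point: two concurrent inserts for the same lock_key cannot both succeed.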
Downsides (Mention in Interview)
Needs cleanup job
Higher latency
Harder to scale
Risky under partitions
Option 3️⃣: etcd / ZooKeeper (Conceptual Schema)
Path
/locks/{namespace}/{lockKey}
Value
{
"ownerId": "order-service-pod-7f9c",
"lockToken": "uuid",
"leaseId": "etcd-lease-id"
}
Lease expiration == TTL
Strong consistency via Raft
Ideal for leader election
“I’d use Redis or etcd as the primary lock store because TTL + atomic operations guarantee correctness with minimal latency. SQL is only a fallback.”
Microservices Architecture – Distributed Lock System
High-Level View
Client Pods
|
v
Lock Client Library
|
v
API Gateway / Service Mesh
|
v
Lock Management Service
|
+--> Lock Store (Redis / etcd)
|
+--> Metadata Store (optional)
|
+--> Metrics / Logs
1. Lock Client Library (SDK)
Responsibility
Used by application pods.
Abstracts retry logic, idempotency, backoff.
Sends heartbeats (renew calls).
Handles token storage safely.
Why Needed
Prevents every service from re-implementing lock logic.
Standardizes retries and TTL renewal.
Interaction
App Pod → Lock Client → Lock Service APIs
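What the client library hides from applications can be sketched as a context manager that acquires, renews on a background heartbeat thread, and always releases. This is a minimal sketch, not a real SDK: held_lock is a hypothetical name, and the acquire/renew/release callables stand in for the HTTP calls to the Lock Service.

```python
import threading
import time
from contextlib import contextmanager

@contextmanager
def held_lock(acquire, renew, release, renew_interval_s=0.05):
    """acquire() -> token or None; renew(token); release(token)."""
    token = acquire()
    if token is None:
        raise RuntimeError("lock not acquired")
    stop = threading.Event()

    def heartbeat():
        # Extend the lease well before TTL expiry until we are told to stop.
        while not stop.wait(renew_interval_s):
            renew(token)

    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    try:
        yield token
    finally:
        stop.set()
        t.join()
        release(token)  # always release, even if the body raised

calls = []
with held_lock(lambda: "tok",
               lambda tok: calls.append("renew"),
               lambda tok: calls.append("release")) as tok:
    time.sleep(0.2)  # hold the lock long enough for a few heartbeats
```

In practice the renew interval is set to a fraction of the TTL (e.g. TTL/3) so a couple of missed heartbeats don't lose the lease.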
2. API Gateway / Service Mesh (Optional)
Responsibility
Authentication / authorization
Rate limiting
Routing
TLS termination
Examples
Istio / Linkerd
NGINX / Envoy
Interaction
Client → Gateway → Lock Service
3. Lock Management Service (Core Service)
Responsibility
Exposes REST APIs:
Acquire
Release
Renew
Get status
Validates ownership using lockToken
Enforces TTL rules
Ensures idempotency
Performs atomic operations against lock store
Key Properties
Stateless
Horizontally scalable
Multiple replicas behind a load balancer
4. Lock Store (Strongly Consistent Store)
Options
Redis (Primary choice)
etcd (Best for infra / leader election)
ZooKeeper (legacy but valid)
Responsibility
Single source of truth for locks
Atomic operations
TTL enforcement
Interaction
Lock Service → Lock Store
5. Metadata Store (Optional)
Responsibility
Store historical data:
Lock acquisition logs
Forced unlocks
Debug info
Not in critical path
Example
PostgreSQL / Elasticsearch
6. Monitoring & Observability Stack
Responsibility
Metrics:
Lock acquisition latency
Failures
Expired locks
Logs:
Owner changes
Forced releases
Alerts:
High contention
Stuck locks
Tools
Prometheus
Grafana
ELK
Interaction Flows (Sequence-Level)
1️⃣ Acquire Lock Flow
Pod A
|
| Acquire(lockKey)
v
Lock Client
|
v
Lock Service
|
| SET lockKey NX PX ttl
v
Lock Store
|
| Success
v
Lock Service → Lock Client → Pod A
2️⃣ Lock Conflict Flow
Pod B
|
| Acquire(lockKey)
v
Lock Service
|
| SET NX fails
v
Lock Store
|
| Conflict
v
409 LOCK_ALREADY_HELD
3️⃣ Heartbeat / Renew Flow
Pod A
|
| Renew(lockKey)
v
Lock Service
|
| Validate token
| Extend TTL
v
Lock Store
4️⃣ Pod Crash Scenario
Pod A crashes
|
| No renew
v
TTL expires in Lock Store
|
v
Lock becomes available
5️⃣ Release Lock Flow
Pod A
|
| Release(lockKey)
v
Lock Service
|
| Validate token
| Delete key
v
Lock Store
Failure Handling Interactions
| Failure | System Behavior |
| ------------------ | --------------------------- |
| Pod crash | TTL expiry releases lock |
| Lock Service crash | Stateless → retry safe |
| Redis node failure | Failover via replication |
| Network partition | Fail closed (no lock grant) |
Why This Design Works (Interview Gold)
Stateless lock service → easy scaling
Centralized lock store → single source of truth
Lease-based locking → no deadlocks
Client SDK → fewer bugs, safer retries
Consistency first → no split-brain
“Clients interact via a lightweight SDK with a stateless Lock Service, which performs atomic lease-based operations against a strongly consistent lock store like Redis or etcd.”
Want to Go Deeper?
Next logical steps:
Full sequence diagram
Redis Lua scripts deep dive
Leader election vs mutex locks
Hot-key mitigation & sharding
CAP tradeoff justification
Some Real-World Use Cases
1️⃣ Database Migration Lock (Very Common)
Problem
You have multiple pods of the same service.
On startup, each pod tries to run:
ALTER TABLE / schema migration
If two pods do this together → data corruption 💥
Distributed Lock Solution
All pods try to acquire:
lockKey = "db-migration:user-service"
Only one pod gets the lock.
That pod runs migration.
Others wait or skip.
Why Distributed Lock?
Pods are stateless
Leader election via lock
Prevents double migration
Interview One-Liner
“We use a distributed lock to ensure only one pod performs schema migration during deployment.”
2️⃣ Cron Job Deduplication Across Pods
Problem
You deploy a cron job as a Kubernetes CronJob or scheduled task.
But:
Service has N replicas
Each pod fires the same cron logic
Result:
Same job runs N times ❌
Distributed Lock Solution
lockKey = "cron:daily-report"
Flow:
All pods try to acquire lock at 12:00 AM
One pod succeeds
Others exit immediately
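The flow above reduces to "try a non-blocking acquire; run only on success". A sketch, with try_acquire standing in for the lock-service call (the in-memory holder dict mimics Redis SET NX semantics for illustration):

```python
def run_daily_report(pod_id, try_acquire, generate_report):
    """Every replica fires at the scheduled time; only the lock winner works."""
    if not try_acquire("cron:daily-report", pod_id):
        return f"{pod_id}: skipped"
    return f"{pod_id}: ran ({generate_report()})"

holder = {}
def try_acquire(key, owner):
    # First caller wins; re-checking is a no-op for the same owner.
    return holder.setdefault(key, owner) == owner

results = [run_daily_report(p, try_acquire, lambda: "report.csv")
           for p in ("pod-1", "pod-2", "pod-3")]
```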
Real Companies
Payment settlements
Daily analytics aggregation
Cache warmup jobs
3️⃣ Leader Election (Classic Distributed Systems Use Case)
Problem
You want one leader pod to:
Consume Kafka partitions
Push config updates
Manage cluster metadata
Distributed Lock Solution
lockKey = "leader:election:order-service"
Pod holding lock = leader
TTL-based lock
If leader dies → lock expires → new leader elected
Where Used
Kafka consumer group coordinators
Controller services
Scheduler services
Interview Gold Line
“Leader election is implemented using a lease-based distributed lock.”
4️⃣ Inventory Reservation (E-commerce)
Problem
Two users try to buy the last item simultaneously.
Without lock:
Inventory goes negative ❌
Distributed Lock Solution
lockKey = "inventory:sku:iphone15"
Flow:
Pod A acquires lock
Checks stock
Reserves item
Releases lock
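The check-then-reserve sequence must happen entirely inside acquire/release, with the release in a finally block so a failure mid-reservation doesn't leave the lock held until TTL expiry. A sketch (reserve_item, StubLock, and the stock dict are illustrative stand-ins for the real services):

```python
def reserve_item(sku, stock, lock):
    """lock: any object with acquire(key) -> token|None and release(key, token)."""
    key = f"inventory:sku:{sku}"
    token = lock.acquire(key)
    if token is None:
        return False  # contended: caller retries with backoff
    try:
        if stock.get(sku, 0) <= 0:
            return False  # out of stock
        stock[sku] -= 1   # check-then-decrement is safe only under the lock
        return True
    finally:
        lock.release(key, token)  # release even if the body raises

class StubLock:
    def __init__(self):
        self._held = {}
    def acquire(self, key):
        if key in self._held:
            return None
        self._held[key] = "tok"
        return "tok"
    def release(self, key, token):
        self._held.pop(key, None)

stock = {"iphone15": 1}
lock = StubLock()
first = reserve_item("iphone15", stock, lock)   # got the last unit
second = reserve_item("iphone15", stock, lock)  # stock exhausted
```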
Why DB Lock Is Not Enough
Microservices
Multiple DBs
External services involved
5️⃣ Payment Deduplication / Exactly-Once Processing
Problem
Payment gateway retries webhook multiple times.
Multiple pods may process same payment.
Distributed Lock Solution
lockKey = "payment:txn:98765"
First pod processes payment
Others fail fast
Ensures idempotency
6️⃣ Cache Rebuild / Cache Stampede Prevention
Problem
Cache expires.
100 pods try to rebuild it simultaneously → thundering herd 🐘
Distributed Lock Solution
lockKey = "cache:rebuild:product-list"
One pod rebuilds cache
Others serve stale data or wait
Real Usage
Recommendation systems
Search indexes
Pricing engines
7️⃣ File Processing / Batch Jobs
Problem
Multiple workers read same file from S3 / GCS.
Distributed Lock Solution
lockKey = "file:process:2026-01-05.csv"
One pod processes file
Prevents duplicate uploads / processing
Mapping to Your Design (Interview Ready)
| Component | Real World Role |
| --------- | ----------------- |
| Lock Key | Business resource |
| Owner ID | Pod / instance |
| TTL | Crash safety |
| Renew | Long task |
| Release | Normal completion |
One-Line Summary (Perfect Interview Answer)
“Distributed locks are used to coordinate critical operations like leader election, cron jobs, database migrations, inventory updates, and payment deduplication across multiple pods.”
Why do we need leader election in this case?
This is a very important question, and interviewers ask it to check whether you understand why distributed locks exist at all, not just how to build them.
Let’s answer it logically and with a real-world mindset, rather than getting theory-heavy.
Short Answer (What Interviewers Want)
We need leader election so that exactly one pod is responsible for performing a critical, cluster-wide task, while others stay passive and take over automatically if the leader fails.
Now the Real Explanation (Step-by-Step)
Problem in a Multi-Pod Environment
In Kubernetes:
Pods are stateless
Pods can be:
Restarted
Rescheduled
Scaled up/down
You cannot rely on “pod A is special”
Yet some tasks must be done by only one instance.
What Goes Wrong Without Leader Election?
Example: Cron Job / Scheduler
You deploy:
order-service (5 pods)
Each pod has this logic:
@Scheduled(cron = "0 0 0 * * *")  // midnight daily; Spring cron takes six fields
public void generateDailyReport() { ... }
Without Leader Election ❌
All 5 pods run the job
Report generated 5 times
Emails sent 5 times
Data duplicated or corrupted
Why Not Just “Use a Distributed Lock Once”?
Good question.
Distributed Lock Alone
Works per operation
Each task tries to acquire a lock every time
Leader Election
Chooses one long-lived owner
Leader handles all such tasks
Other pods don’t even try
Leader election is basically a long-lived distributed lock with automatic failover.
When Do We Need Leader Election?
1️⃣ Repeated / Continuous Responsibilities
Examples:
Schedulers
Kafka consumers
Queue pollers
Rebalancers
Health monitors
Instead of:
Acquire lock → Do task → Release lock
We do:
Elect leader → Leader keeps running tasks
2️⃣ Avoiding Lock Contention
Without leader election:
Every pod:
Tries to acquire lock
Fails
Retries
This causes:
Extra load
Redis hot keys
Unnecessary network traffic
Leader election:
Only leader performs work
Others stay idle
3️⃣ Faster Failover
If leader crashes:
Lock TTL expires
New leader elected automatically
No manual intervention.
Real World Analogy 🏢
Think of a company:
10 managers
Only 1 payroll admin
Payroll admin:
Processes salaries
Others are backups
If admin leaves:
Another manager takes over
That’s leader election.
Leader Election vs Per-Task Lock
| Aspect | Distributed Lock | Leader Election |
| ---------- | ---------------- | ---------------------- |
| Scope | Single operation | Long-running role |
| Duration | Short-lived | Long-lived |
| Retry load | High | Low |
| Use cases | Inventory update | Scheduler, coordinator |
Concrete Example (Leader Election Using Lock)
lockKey = "leader:order-service"
TTL = 30s
Pod A acquires lock → becomes leader
Pod A renews lock every 10s
Pod A dies → TTL expires
Pod B acquires lock → new leader
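The steps above can be sketched as a lease loop: every pod periodically ticks against the leader key, and the TTL turns a dead leader's silence into automatic failover. Illustrative only — LeaderLease is an in-memory stand-in for SET NX PX / renew against the real lock store, with an explicit clock so expiry is deterministic:

```python
class LeaderLease:
    """Lease-based leader election over a single key."""
    def __init__(self, ttl_ms):
        self.ttl_ms = ttl_ms
        self.holder = None
        self.expires_at = 0

    def tick(self, pod_id, now_ms):
        """Called periodically by every pod; True means 'you are leader now'."""
        if (self.holder == pod_id          # current leader: renew
                or self.holder is None     # no leader yet: acquire
                or now_ms >= self.expires_at):  # lease expired: take over
            self.holder = pod_id
            self.expires_at = now_ms + self.ttl_ms
            return True
        return False

lease = LeaderLease(ttl_ms=30_000)
r1 = lease.tick("pod-A", 0)        # pod A becomes leader
r2 = lease.tick("pod-B", 10_000)   # lease still valid -> pod B stays follower
r3 = lease.tick("pod-A", 10_000)   # pod A renews; lease now expires at 40s
r4 = lease.tick("pod-B", 45_000)   # pod A never renewed -> automatic failover
```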
Interview-Perfect One-Liner
“Leader election is needed so that exactly one pod coordinates cluster-wide responsibilities, preventing duplicate execution while enabling automatic failover.”
Bonus: When NOT to Use Leader Election
Simple request-level mutual exclusion
One-off critical sections
Low contention operations

