Design a Distributed Lock in a Multi-Pod Environment
The distributed lock system prioritizes strong consistency, fault tolerance, and safety over availability, ensuring that no two pods can hold the same lock even under failures or network partitions.
Functional Requirements (FR)
1. A client (pod/service instance) must be able to request a lock for a given resource identifier.
2. The system must ensure mutual exclusion:
At any point in time, at most one pod holds the lock for a resource.
3. Lock acquisition should support:
Blocking acquisition (wait until the lock is available)
Non-blocking acquisition (fail immediately if the lock is held)
4. A client holding a lock must be able to explicitly release it, and only the current lock owner is allowed to release it.
5. Releasing a lock should immediately make it available to other waiting pods.
6. Locks must be time-bound (lease-based), i.e. every lock carries a TTL.
7. Each lock must be associated with:
lockId / resourceKey
ownerId (pod ID / service instance ID / unique token)
8. A lock holder must be able to renew/extend the lock lease before the TTL expires.
9. The system may support FIFO ordering: lock requests are granted in the order they arrive.
10. The system may support re-entrant locks:
The same pod acquiring the same lock multiple times.
11. Idempotency:
Repeated requests with the same lock token should:
Not create duplicate locks
Return consistent responses
Non-Functional Requirements (NFR)
1. The system must guarantee strong consistency for lock state:
At most one owner per lock at any time
2. Lock acquisition and release must be atomic operations.
3. The lock service should be highly available:
It must survive pod restarts and node failures
4. The system should be fault tolerant: in case of a network partition, locks must not be granted if ownership cannot be safely determined (fail closed).
5. Lock operations should be low latency: acquire / release / renew in single-digit milliseconds under normal load.
6. The system should scale horizontally, both in the number of client pods and in the number of locks.
7. Lock metadata must be durably persisted so that it survives node crashes.
Below are clean, production-grade APIs for a Distributed Lock Service in a multi-pod (K8s) environment, written at HLD level and aligned with the FR + NFRs we defined.
This is the level interviewers expect: clear contracts, idempotency, ownership safety, TTL, and failure handling.
Distributed Lock Service – API Design
Design Principles
RESTful APIs (easy to reason about in interviews)
Idempotent operations
Lease-based locking
Owner validation
Safe retries
Common Concepts
Identifiers
lockKey – Unique resource identifier. Example: inventory:sku:123
ownerId – Unique pod/service instance ID. Example: order-service-pod-7f9c
lockToken – Server-generated unique token returned on acquire
Used for ownership validation
1. Acquire Lock
Endpoint
POST /v1/locks/{lockKey}/acquire
Request Body
{
"ownerId": "order-service-pod-7f9c",
"ttlMillis": 30000,
"wait": false,
"requestId": "req-12345"
}
Request Fields
| Field | Description |
| ----------- | -------------------------------------- |
| `ownerId` | Unique identifier of the caller |
| `ttlMillis` | Lease duration |
| `wait` | If true, block until lock is available |
| `requestId` | Idempotency key |
Success Response (200)
{
"lockKey": "inventory:sku:123",
"lockToken": "f2a9e91c-cc1b-4c1e",
"ownerId": "order-service-pod-7f9c",
"expiresAt": 1700000000000
}
Failure Response (409 – Lock Held)
{
"error": "LOCK_ALREADY_HELD",
"currentOwner": "payment-service-pod-23a",
"retryAfterMillis": 12000
}
2. Release Lock
Endpoint
POST /v1/locks/{lockKey}/release
Request Body
{
"lockToken": "f2a9e91c-cc1b-4c1e",
"ownerId": "order-service-pod-7f9c"
}
Success Response (200)
{
"status": "RELEASED",
"lockKey": "inventory:sku:123"
}
Failure Response (403 – Ownership Violation)
{
"error": "NOT_LOCK_OWNER"
}
3. Renew / Extend Lock (Heartbeat)
Endpoint
POST /v1/locks/{lockKey}/renew
Request Body
{
"lockToken": "f2a9e91c-cc1b-4c1e",
"ownerId": "order-service-pod-7f9c",
"ttlMillis": 30000
}
Success Response (200)
{
"lockKey": "inventory:sku:123",
"expiresAt": 1700000030000
}
Failure Response (409 – Lock Expired)
{
"error": "LOCK_EXPIRED"
}
4. Get Lock Status
Endpoint
GET /v1/locks/{lockKey}
Success Response (200)
{
"lockKey": "inventory:sku:123",
"locked": true,
"ownerId": "order-service-pod-7f9c",
"expiresAt": 1700000030000
}
If Lock Does Not Exist (404)
{
"locked": false
}
5. Force Unlock (Admin / Recovery Only)
⚠️ Restricted API – Used for operational recovery.
Endpoint
POST /v1/locks/{lockKey}/force-release
Request Body
{
"reason": "stuck lock after pod crash"
}
Success Response
{
"status": "FORCE_RELEASED"
}
6. List Locks (Optional – Debugging)
Endpoint
GET /v1/locks?namespace=inventory-service
Response
{
"locks": [
{
"lockKey": "inventory:sku:123",
"ownerId": "order-service-pod-7f9c",
"expiresAt": 1700000030000
}
]
}
HTTP Status Code Usage
| Code | Meaning |
| ---- | ------------------- |
| 200 | Success |
| 409 | Lock conflict |
| 403 | Ownership violation |
| 404 | Lock not found |
| 500 | Internal error |
Idempotency Behavior
requestId ensures:
Duplicate acquire calls return the same lockToken
Safe retries during network failures
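The requestId mechanism can be sketched as a response cache keyed by the idempotency key: the first call executes the handler, and every retry with the same requestId replays the stored response instead of minting a new token. A minimal illustration (make_idempotent is a hypothetical helper, not part of any real service):

```python
import itertools

def make_idempotent(handler):
    """Cache the first response per requestId; retries replay it verbatim."""
    seen = {}
    def wrapped(request_id, *args):
        if request_id in seen:
            return seen[request_id]
        resp = handler(*args)
        seen[request_id] = resp
        return resp
    return wrapped

counter = itertools.count(1)
acquire = make_idempotent(lambda lock_key: {"lockToken": f"tok-{next(counter)}"})

first = acquire("req-12345", "inventory:sku:123")
retry = acquire("req-12345", "inventory:sku:123")  # network retry: same token back
```

A production service would keep this map in the lock store itself (with its own TTL) so retries survive pod restarts of the lock service.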
Example Acquire Flow (Non-Blocking)
Pod A → Acquire lock
Lock granted → token returned
Pod B → Acquire same lock
Receives 409 (lock held)
Pod A releases → Pod B retries
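The non-blocking flow above can be walked through with a minimal in-memory stand-in for the lock store (illustrative only — InMemoryLockStore is a made-up name; a real deployment backs this with Redis/etcd behind the REST API):

```python
import time
import uuid

class InMemoryLockStore:
    """Single-process sketch of the API semantics: acquire succeeds only if
    the key is free or its lease has expired; release requires the token."""

    def __init__(self):
        self._locks = {}  # lockKey -> (ownerId, lockToken, expiresAt ms)

    def acquire(self, lock_key, owner_id, ttl_millis):
        now = time.time() * 1000
        entry = self._locks.get(lock_key)
        if entry and entry[2] > now:
            return {"error": "LOCK_ALREADY_HELD", "currentOwner": entry[0]}  # 409
        token = str(uuid.uuid4())
        self._locks[lock_key] = (owner_id, token, now + ttl_millis)
        return {"lockKey": lock_key, "lockToken": token, "ownerId": owner_id}

    def release(self, lock_key, lock_token):
        entry = self._locks.get(lock_key)
        if entry is None or entry[1] != lock_token:
            return {"error": "NOT_LOCK_OWNER"}  # 403
        del self._locks[lock_key]
        return {"status": "RELEASED", "lockKey": lock_key}

store = InMemoryLockStore()
a = store.acquire("inventory:sku:123", "pod-A", 30000)  # granted, token returned
b = store.acquire("inventory:sku:123", "pod-B", 30000)  # conflict (409)
store.release("inventory:sku:123", a["lockToken"])      # pod A releases
c = store.acquire("inventory:sku:123", "pod-B", 30000)  # pod B retries, granted
```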
Databases and Schema
Primary (Recommended): Key-Value store (Redis / etcd style)
Alternative: Relational DB (Postgres/MySQL)
You can pick one in interviews and briefly mention the other.
Option 1️⃣: Key-Value Store Schema (Redis / etcd)
Key Pattern
lock:{namespace}:{lock_key}
Example Key
lock:inventory-service:sku-123
Value (JSON / Protobuf equivalent)
{
"lockKey": "inventory:sku:123",
"ownerId": "order-service-pod-7f9c",
"lockToken": "f2a9e91c-cc1b-4c1e",
"ttlMillis": 30000,
"expiresAt": 1700000030000,
"createdAt": 1700000000000,
"updatedAt": 1700000010000,
"reentrancyCount": 1
}
Redis Commands (Atomic)
Acquire Lock (NX + TTL)
SET lock:inventory-service:sku-123 "<value>" NX PX 30000
Renew Lock (Lua Script)
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("PEXPIRE", KEYS[1], ARGV[2])
else
return 0
end
Release Lock (Lua Script)
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("DEL", KEYS[1])
else
return 0
end
Why This Works
TTL guarantees auto-expiry
NX ensures mutual exclusion
Lua ensures atomicity
No clock dependency on clients
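Both Lua scripts are the same compare-then-act pattern: act only if the stored token matches the caller's token. A plain-Python equivalent of that logic (illustrative — in real Redis the atomicity comes from the Lua script running as one indivisible step):

```python
def release_if_owner(store, key, token):
    """Mirrors the release Lua script: DEL only if the stored value
    equals the caller's token; returns 1 on success, 0 otherwise."""
    if store.get(key) == token:
        del store[key]
        return 1
    return 0

def renew_if_owner(store, expiries, key, token, ttl_ms, now_ms):
    """Mirrors the renew Lua script: extend expiry (PEXPIRE) only for the owner."""
    if store.get(key) == token:
        expiries[key] = now_ms + ttl_ms
        return 1
    return 0

store = {"lock:inventory-service:sku-123": "f2a9e91c-cc1b-4c1e"}
expiries = {}
renewed = renew_if_owner(store, expiries, "lock:inventory-service:sku-123",
                         "f2a9e91c-cc1b-4c1e", 30000, 0)
stolen = release_if_owner(store, "lock:inventory-service:sku-123", "other-token")
released = release_if_owner(store, "lock:inventory-service:sku-123",
                            "f2a9e91c-cc1b-4c1e")
```

The key design point: a plain GET followed by a separate DEL would leave a window where the lock expires and is re-acquired by another pod between the two calls; fusing check and act into one atomic step closes that window.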
Option 2️⃣: Relational Database Schema (Postgres / MySQL)
Used when a KV store is not allowed (rare, but a safe fallback to mention in interviews).
Table: distributed_locks
CREATE TABLE distributed_locks (
lock_key VARCHAR(255) PRIMARY KEY,
namespace VARCHAR(100) NOT NULL,
owner_id VARCHAR(255) NOT NULL,
lock_token VARCHAR(255) NOT NULL,
expires_at BIGINT NOT NULL,
created_at BIGINT NOT NULL,
updated_at BIGINT NOT NULL,
reentrancy_count INT DEFAULT 1
);
Indexes
CREATE INDEX idx_expires_at ON distributed_locks (expires_at);
CREATE INDEX idx_namespace ON distributed_locks (namespace);
Acquire Lock (Atomic Insert)
INSERT INTO distributed_locks (
lock_key,
namespace,
owner_id,
lock_token,
expires_at,
created_at,
updated_at
)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (lock_key) DO NOTHING;
Release Lock
DELETE FROM distributed_locks
WHERE lock_key = ?
AND lock_token = ?;
Renew Lock
UPDATE distributed_locks
SET expires_at = ?, updated_at = ?
WHERE lock_key = ?
AND lock_token = ?;
Cleanup Expired Locks (Background Job)
DELETE FROM distributed_locks
WHERE expires_at < ?;
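The SQL variant can be exercised end-to-end with the stdlib sqlite3 module (illustrative: the schema is trimmed to the essential columns, and SQLite's INSERT OR IGNORE stands in for Postgres's ON CONFLICT DO NOTHING):

```python
import sqlite3
import time
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE distributed_locks (
        lock_key   TEXT PRIMARY KEY,
        owner_id   TEXT NOT NULL,
        lock_token TEXT NOT NULL,
        expires_at INTEGER NOT NULL
    )""")

def acquire(lock_key, owner_id, ttl_ms):
    now = int(time.time() * 1000)
    token = str(uuid.uuid4())
    # Opportunistically clear an expired row, then attempt the atomic insert.
    conn.execute("DELETE FROM distributed_locks WHERE lock_key=? AND expires_at<?",
                 (lock_key, now))
    cur = conn.execute("INSERT OR IGNORE INTO distributed_locks VALUES (?,?,?,?)",
                       (lock_key, owner_id, token, now + ttl_ms))
    return token if cur.rowcount == 1 else None  # None -> lock already held

def release(lock_key, token):
    cur = conn.execute("DELETE FROM distributed_locks WHERE lock_key=? AND lock_token=?",
                       (lock_key, token))
    return cur.rowcount == 1  # False -> caller is not the owner

t = acquire("inventory:sku:123", "pod-A", 30000)      # granted
held = acquire("inventory:sku:123", "pod-B", 30000)   # None: conflict
```

The primary-key constraint is what makes the insert a mutual-exclusion point: two concurrent inserts for the same lock_key cannot both succeed.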
Downsides (Mention in Interview)
Needs cleanup job
Higher latency
Harder to scale
Risky under partitions
Option 3️⃣: etcd / ZooKeeper (Conceptual Schema)
Path
/locks/{namespace}/{lockKey}
Value
{
"ownerId": "order-service-pod-7f9c",
"lockToken": "uuid",
"leaseId": "etcd-lease-id"
}
Lease expiration == TTL
Strong consistency via Raft
Ideal for leader election
“I’d use Redis or etcd as the primary lock store because TTL + atomic operations guarantee correctness with minimal latency. SQL is only a fallback.”
Microservices Architecture – Distributed Lock System
High-Level View
Client Pods
|
v
Lock Client Library
|
v
API Gateway / Service Mesh
|
v
Lock Management Service
|
+--> Lock Store (Redis / etcd)
|
+--> Metadata Store (optional)
|
+--> Metrics / Logs
1. Lock Client Library (SDK)
Responsibility
Used by application pods.
Abstracts retry logic, idempotency, backoff.
Sends heartbeats (renew calls).
Handles token storage safely.
Why Needed
Prevents every service from re-implementing lock logic.
Standardizes retries and TTL renewal.
Interaction
App Pod → Lock Client → Lock Service APIs
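What the client library hides from applications can be sketched as a context manager that acquires, renews on a background heartbeat thread, and always releases. This is a minimal sketch, not a real SDK: held_lock is a hypothetical name, and the acquire/renew/release callables stand in for the HTTP calls to the Lock Service.

```python
import threading
import time
from contextlib import contextmanager

@contextmanager
def held_lock(acquire, renew, release, renew_interval_s=0.05):
    """acquire() -> token or None; renew(token); release(token)."""
    token = acquire()
    if token is None:
        raise RuntimeError("lock not acquired")
    stop = threading.Event()

    def heartbeat():
        # Extend the lease well before TTL expiry until we are told to stop.
        while not stop.wait(renew_interval_s):
            renew(token)

    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    try:
        yield token
    finally:
        stop.set()
        t.join()
        release(token)  # always release, even if the body raised

calls = []
with held_lock(lambda: "tok",
               lambda tok: calls.append("renew"),
               lambda tok: calls.append("release")) as tok:
    time.sleep(0.2)  # hold the lock long enough for a few heartbeats
```

In practice the renew interval is set to a fraction of the TTL (e.g. TTL/3) so a couple of missed heartbeats don't lose the lease.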
2. API Gateway / Service Mesh (Optional)
Responsibility
Authentication / authorization
Rate limiting
Routing
TLS termination
Examples
Istio / Linkerd
NGINX / Envoy
Interaction
Client → Gateway → Lock Service
3. Lock Management Service (Core Service)
Responsibility
Exposes REST APIs:
Acquire
Release
Renew
Get status
Validates ownership using lockToken
Enforces TTL rules
Ensures idempotency
Performs atomic operations against lock store
Key Properties
Stateless
Horizontally scalable
Multiple replicas behind a load balancer
4. Lock Store (Strongly Consistent Store)
Options
Redis (Primary choice)
etcd (Best for infra / leader election)
ZooKeeper (legacy but valid)
Responsibility
Single source of truth for locks
Atomic operations
TTL enforcement
Interaction
Lock Service → Lock Store
5. Metadata Store (Optional)
Responsibility
Store historical data:
Lock acquisition logs
Forced unlocks
Debug info
Not in critical path
Example
PostgreSQL / Elasticsearch
6. Monitoring & Observability Stack
Responsibility
Metrics:
Lock acquisition latency
Failures
Expired locks
Logs:
Owner changes
Forced releases
Alerts:
High contention
Stuck locks
Tools
Prometheus
Grafana
ELK
Interaction Flows (Sequence-Level)
1️⃣ Acquire Lock Flow
Pod A
|
| Acquire(lockKey)
v
Lock Client
|
v
Lock Service
|
| SET lockKey NX PX ttl
v
Lock Store
|
| Success
v
Lock Service → Lock Client → Pod A
2️⃣ Lock Conflict Flow
Pod B
|
| Acquire(lockKey)
v
Lock Service
|
| SET NX fails
v
Lock Store
|
| Conflict
v
409 LOCK_ALREADY_HELD
3️⃣ Heartbeat / Renew Flow
Pod A
|
| Renew(lockKey)
v
Lock Service
|
| Validate token
| Extend TTL
v
Lock Store
4️⃣ Pod Crash Scenario
Pod A crashes
|
| No renew
v
TTL expires in Lock Store
|
v
Lock becomes available
5️⃣ Release Lock Flow
Pod A
|
| Release(lockKey)
v
Lock Service
|
| Validate token
| Delete key
v
Lock Store
Failure Handling Interactions
| Failure | System Behavior |
| ------------------ | --------------------------- |
| Pod crash | TTL expiry releases lock |
| Lock Service crash | Stateless → retry safe |
| Redis node failure | Failover via replication |
| Network partition | Fail closed (no lock grant) |
Why This Design Works (Interview Gold)
Stateless lock service → easy scaling
Centralized lock store → single source of truth
Lease-based locking → no deadlocks
Client SDK → fewer bugs, safer retries
Consistency first → no split-brain
“Clients interact via a lightweight SDK with a stateless Lock Service, which performs atomic lease-based operations against a strongly consistent lock store like Redis or etcd.”
Want to Go Deeper?
Next logical steps:
Full sequence diagram
Redis Lua scripts deep dive
Leader election vs mutex locks
Hot-key mitigation & sharding
CAP tradeoff justification
Some Real-World Use Cases
1️⃣ Database Migration Lock (Very Common)
Problem
You have multiple pods of the same service.
On startup, each pod tries to run:
ALTER TABLE / schema migration
If two pods do this together → data corruption 💥
Distributed Lock Solution
All pods try to acquire:
lockKey = "db-migration:user-service"
Only one pod gets the lock.
That pod runs migration.
Others wait or skip.
Why Distributed Lock?
Pods are stateless
Leader election via lock
Prevents double migration
Interview One-Liner
“We use a distributed lock to ensure only one pod performs schema migration during deployment.”
2️⃣ Cron Job Deduplication Across Pods
Problem
You deploy a cron job as a Kubernetes CronJob or scheduled task.
But:
Service has N replicas
Each pod fires the same cron logic
Result:
Same job runs N times ❌
Distributed Lock Solution
lockKey = "cron:daily-report"
Flow:
All pods try to acquire lock at 12:00 AM
One pod succeeds
Others exit immediately
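The flow above reduces to "try a non-blocking acquire; run only on success". A sketch, with try_acquire standing in for the lock-service call (the in-memory holder dict mimics Redis SET NX semantics for illustration):

```python
def run_daily_report(pod_id, try_acquire, generate_report):
    """Every replica fires at the scheduled time; only the lock winner works."""
    if not try_acquire("cron:daily-report", pod_id):
        return f"{pod_id}: skipped"
    return f"{pod_id}: ran ({generate_report()})"

holder = {}
def try_acquire(key, owner):
    # First caller wins; re-checking is a no-op for the same owner.
    return holder.setdefault(key, owner) == owner

results = [run_daily_report(p, try_acquire, lambda: "report.csv")
           for p in ("pod-1", "pod-2", "pod-3")]
```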
Real Companies
Payment settlements
Daily analytics aggregation
Cache warmup jobs
3️⃣ Leader Election (Classic Distributed Systems Use Case)
Problem
You want one leader pod to:
Consume Kafka partitions
Push config updates
Manage cluster metadata
Distributed Lock Solution
lockKey = "leader:election:order-service"
Pod holding lock = leader
TTL-based lock
If leader dies → lock expires → new leader elected
Where Used
Kafka consumer group coordinators
Controller services
Scheduler services
Interview Gold Line
“Leader election is implemented using a lease-based distributed lock.”
4️⃣ Inventory Reservation (E-commerce)
Problem
Two users try to buy the last item simultaneously.
Without lock:
Inventory goes negative ❌
Distributed Lock Solution
lockKey = "inventory:sku:iphone15"
Flow:
Pod A acquires lock
Checks stock
Reserves item
Releases lock
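The check-then-reserve sequence must happen entirely inside acquire/release, with the release in a finally block so a failure mid-reservation doesn't leave the lock held until TTL expiry. A sketch (reserve_item, StubLock, and the stock dict are illustrative stand-ins for the real services):

```python
def reserve_item(sku, stock, lock):
    """lock: any object with acquire(key) -> token|None and release(key, token)."""
    key = f"inventory:sku:{sku}"
    token = lock.acquire(key)
    if token is None:
        return False  # contended: caller retries with backoff
    try:
        if stock.get(sku, 0) <= 0:
            return False  # out of stock
        stock[sku] -= 1   # check-then-decrement is safe only under the lock
        return True
    finally:
        lock.release(key, token)  # release even if the body raises

class StubLock:
    def __init__(self):
        self._held = {}
    def acquire(self, key):
        if key in self._held:
            return None
        self._held[key] = "tok"
        return "tok"
    def release(self, key, token):
        self._held.pop(key, None)

stock = {"iphone15": 1}
lock = StubLock()
first = reserve_item("iphone15", stock, lock)   # got the last unit
second = reserve_item("iphone15", stock, lock)  # stock exhausted
```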
Why DB Lock Is Not Enough
Microservices
Multiple DBs
External services involved
5️⃣ Payment Deduplication / Exactly-Once Processing
Problem
Payment gateway retries webhook multiple times.
Multiple pods may process same payment.
Distributed Lock Solution
lockKey = "payment:txn:98765"
First pod processes payment
Others fail fast
Ensures idempotency
6️⃣ Cache Rebuild / Cache Stampede Prevention
Problem
Cache expires.
100 pods try to rebuild it simultaneously → thundering herd 🐘
Distributed Lock Solution
lockKey = "cache:rebuild:product-list"
One pod rebuilds cache
Others serve stale data or wait
Real Usage
Recommendation systems
Search indexes
Pricing engines
7️⃣ File Processing / Batch Jobs
Problem
Multiple workers read same file from S3 / GCS.
Distributed Lock Solution
lockKey = "file:process:2026-01-05.csv"
One pod processes file
Prevents duplicate uploads / processing
Mapping to Your Design (Interview Ready)
| Component | Real World Role |
| --------- | ----------------- |
| Lock Key | Business resource |
| Owner ID | Pod / instance |
| TTL | Crash safety |
| Renew | Long task |
| Release | Normal completion |
One-Line Summary (Perfect Interview Answer)
“Distributed locks are used to coordinate critical operations like leader election, cron jobs, database migrations, inventory updates, and payment deduplication across multiple pods.”
Why do we need leader election in this case?
This is a very important question, and interviewers ask it to check whether you understand why distributed locks exist at all, not just how to build them.
Let’s answer it logically and with a real-world mindset, rather than getting theory-heavy.
Short Answer (What Interviewers Want)
We need leader election so that exactly one pod is responsible for performing a critical, cluster-wide task, while others stay passive and take over automatically if the leader fails.
Now the Real Explanation (Step-by-Step)
Problem in a Multi-Pod Environment
In Kubernetes:
Pods are stateless
Pods can be:
Restarted
Rescheduled
Scaled up/down
You cannot rely on “pod A is special”
Yet some tasks must be done by only one instance.
What Goes Wrong Without Leader Election?
Example: Cron Job / Scheduler
You deploy:
order-service (5 pods)
Each pod has this logic:
@Scheduled(cron = "0 0 0 * * *")  // midnight daily; Spring cron takes six fields
public void generateDailyReport() { ... }
Without Leader Election ❌
All 5 pods run the job
Report generated 5 times
Emails sent 5 times
Data duplicated or corrupted
Why Not Just “Use a Distributed Lock Once”?
Good question.
Distributed Lock Alone
Works per operation
Each task tries to acquire a lock every time
Leader Election
Chooses one long-lived owner
Leader handles all such tasks
Other pods don’t even try
Leader election is basically a long-lived distributed lock with automatic failover.
When Do We Need Leader Election?
1️⃣ Repeated / Continuous Responsibilities
Examples:
Schedulers
Kafka consumers
Queue pollers
Rebalancers
Health monitors
Instead of:
Acquire lock → Do task → Release lock
We do:
Elect leader → Leader keeps running tasks
2️⃣ Avoiding Lock Contention
Without leader election:
Every pod:
Tries to acquire lock
Fails
Retries
This causes:
Extra load
Redis hot keys
Unnecessary network traffic
Leader election:
Only leader performs work
Others stay idle
3️⃣ Faster Failover
If leader crashes:
Lock TTL expires
New leader elected automatically
No manual intervention.
Real World Analogy 🏢
Think of a company:
10 managers
Only 1 payroll admin
Payroll admin:
Processes salaries
Others are backups
If admin leaves:
Another manager takes over
That’s leader election.
Leader Election vs Per-Task Lock
| Aspect | Distributed Lock | Leader Election |
| ---------- | ---------------- | ---------------------- |
| Scope | Single operation | Long-running role |
| Duration | Short-lived | Long-lived |
| Retry load | High | Low |
| Use cases | Inventory update | Scheduler, coordinator |
Concrete Example (Leader Election Using Lock)
lockKey = "leader:order-service"
TTL = 30s
Pod A acquires lock → becomes leader
Pod A renews lock every 10s
Pod A dies → TTL expires
Pod B acquires lock → new leader
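The steps above can be sketched as a lease loop: every pod periodically ticks against the leader key, and the TTL turns a dead leader's silence into automatic failover. Illustrative only — LeaderLease is an in-memory stand-in for SET NX PX / renew against the real lock store, with an explicit clock so expiry is deterministic:

```python
class LeaderLease:
    """Lease-based leader election over a single key."""
    def __init__(self, ttl_ms):
        self.ttl_ms = ttl_ms
        self.holder = None
        self.expires_at = 0

    def tick(self, pod_id, now_ms):
        """Called periodically by every pod; True means 'you are leader now'."""
        if (self.holder == pod_id          # current leader: renew
                or self.holder is None     # no leader yet: acquire
                or now_ms >= self.expires_at):  # lease expired: take over
            self.holder = pod_id
            self.expires_at = now_ms + self.ttl_ms
            return True
        return False

lease = LeaderLease(ttl_ms=30_000)
r1 = lease.tick("pod-A", 0)        # pod A becomes leader
r2 = lease.tick("pod-B", 10_000)   # lease still valid -> pod B stays follower
r3 = lease.tick("pod-A", 10_000)   # pod A renews; lease now expires at 40s
r4 = lease.tick("pod-B", 45_000)   # pod A never renewed -> automatic failover
```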
Interview-Perfect One-Liner
“Leader election is needed so that exactly one pod coordinates cluster-wide responsibilities, preventing duplicate execution while enabling automatic failover.”
Bonus: When NOT to Use Leader Election
Simple request-level mutual exclusion
One-off critical sections
Low contention operations

