How can consumer throughput be increased in Kafka?
🚀 1. Increase Parallelism (Biggest Lever)
✅ Increase partitions
Kafka assigns each partition to at most one consumer per group
More partitions ⇒ more parallel consumption
👉 Rule of thumb:
Partitions ≥ total consumer threads
✅ Scale consumer group horizontally
Add more consumer instances in the same group
Kafka auto-rebalances partitions
⚠️ Watch out:
Too many rebalances can hurt throughput
✅ Multi-threaded consumers
Default Kafka consumer is single-threaded
Use:
Thread pool for processing
Separate polling + processing
Pattern:
Poll → Queue → Worker Threads
⚡ 2. Optimize Polling & Fetching
✅ Increase fetch.min.bytes
Broker waits to accumulate more data before responding
Improves batching
✅ Increase fetch.max.wait.ms
Wait longer to get larger batches
✅ Tune max.poll.records
More records per poll = better throughput
But avoid:
Processing a poll batch for longer than max.poll.interval.ms → the consumer is evicted from the group and a rebalance is triggered
✅ Increase receive.buffer.bytes
Helps in high network throughput scenarios
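The fetch settings above can be collected into a consumer configuration. A minimal sketch, with illustrative starting values only (the keys are standard Kafka consumer properties, but the numbers must be tuned against your own lag and latency targets):

```java
import java.util.Properties;

public class ThroughputConsumerConfig {
    // Example values only; tune against your own workload.
    public static Properties build() {
        Properties props = new Properties();
        props.put("fetch.min.bytes", "1048576");      // broker waits for ~1 MB before responding
        props.put("fetch.max.wait.ms", "500");        // ...but never waits longer than 500 ms
        props.put("max.poll.records", "1000");        // larger poll batches
        props.put("receive.buffer.bytes", "1048576"); // bigger TCP receive buffer
        props.put("enable.auto.commit", "false");     // commit manually, in batches
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be passed to the `KafkaConsumer` constructor together with the usual bootstrap servers, group id, and deserializers.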
📦 3. Batch Processing (Critical)
✅ Process records in batches
Instead of:
process(record)
Do:
process(batch_of_records)
Benefits:
Better CPU cache usage
Fewer DB/network calls
✅ Bulk writes (DB optimization)
Use batch inserts / bulk APIs
Example:
Instead of 1000 DB calls → 1 bulk write
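The bulk-write idea can be sketched with a plain chunking helper (a hypothetical `Batcher` class; your DB client's batch API would consume each chunk):

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Split a record list into fixed-size chunks, so 1000 records with
    // batchSize=200 become 5 bulk calls instead of 1000 single calls.
    public static <T> List<List<T>> partition(List<T> records, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            batches.add(records.subList(i, Math.min(i + batchSize, records.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) records.add(i);
        System.out.println(partition(records, 200).size()); // 5 batches
    }
}
```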
🔄 4. Async & Non-blocking Processing
✅ Avoid blocking I/O
Use async calls (futures, reactive frameworks)
✅ Decouple consumption & processing
Use internal queue (like Disruptor / BlockingQueue)
🧠 5. Optimize Business Logic
✅ Reduce processing time per message
Cache frequently accessed data (Redis)
Avoid heavy joins / external calls
✅ Idempotency design
Enables safe retries without slowing pipeline
💾 6. Offset Commit Strategy
✅ Use async commits
Set enable.auto.commit=false
Use commitAsync()
✅ Commit in batches
Commit after processing N records
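A minimal sketch of the commit-after-N-records policy (hypothetical `BatchCommitPolicy` helper; the caller invokes `consumer.commitAsync()` whenever it returns true):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BatchCommitPolicy {
    private final int threshold;
    private final AtomicInteger processedSinceCommit = new AtomicInteger();

    public BatchCommitPolicy(int threshold) {
        this.threshold = threshold;
    }

    // Call once per processed record; returns true when the caller should commit.
    public boolean recordProcessed() {
        if (processedSinceCommit.incrementAndGet() >= threshold) {
            processedSinceCommit.set(0);
            return true; // caller invokes consumer.commitAsync() here
        }
        return false;
    }

    public static void main(String[] args) {
        BatchCommitPolicy policy = new BatchCommitPolicy(3);
        for (int i = 1; i <= 7; i++) {
            System.out.println(i + " -> " + policy.recordProcessed());
        }
    }
}
```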
🧵 7. JVM & System Tuning
✅ Increase heap size
Avoid GC pauses
✅ Tune GC
Use G1GC / ZGC for low latency
✅ CPU scaling
More cores = more processing threads
📡 8. Network & Broker-Level Optimizations
✅ Compression
Use
snappy or lz4
Reduces network overhead
✅ Co-locate consumers with brokers
Reduces network latency
✅ Increase broker I/O capacity
SSDs, better network bandwidth
🧪 9. Avoid Common Bottlenecks
❌ Frequent rebalances
❌ Long processing inside poll() loop
❌ Synchronous DB/API calls
❌ Too few partitions
❌ Large message size
🧩 10. Advanced Patterns (System Design Level)
🔹 Consumer → Queue → Worker Pool
Decouple ingestion from processing
🔹 Kafka → Stream Processor
Use:
Kafka Streams
Apache Flink
🔹 Backpressure handling
Pause/resume consumption (consumer.pause() / consumer.resume())
🧠 Mental Model (Important)
Throughput =
👉 min(fetch speed, processing speed, commit speed)
You must optimize all three, not just one.
💡 Practical Strategy (What I’d do in real system)
Increase partitions (if low)
Add consumers horizontally
Introduce batch processing + bulk DB writes
Decouple poll & processing (thread pool)
Tune fetch configs
Add caching layer
Monitor lag + rebalance frequency
Now let's walk through a real-world Kafka throughput optimization case using a Payments / Ledger system (a common fintech scenario).
💳 Problem Context (Real System)
You’re building something like a UPI / wallet system:
Flow:
Payment Service → Kafka → Ledger Service (consumer)
Each payment event must:
Validate transaction
Update user balance
Write ledger entry
Emit downstream event
🚨 Initial Problem
At scale:
50M+ events/day
Consumers lagging
High latency (seconds → minutes)
DB becoming bottleneck
🔍 Root Causes (Typical)
❌ Single-threaded consumption
One message processed at a time
❌ Per-message DB writes
1 Kafka message → 3 DB queries
❌ Synchronous external calls
Fraud check / user lookup
❌ Small batches
max.poll.records = 10
⚙️ Step-by-Step Throughput Optimization
🧵 Step 1: Increase Parallelism
Before:
8 partitions → 2 consumers
After:
32 partitions → 8 consumers
👉 Immediate 4x throughput gain
📦 Step 2: Batch Processing (Game Changer)
Before:
for (ConsumerRecord<String, String> record : records) {
    process(record);
}
After:
processBatch(records);
Inside the batch:
Group by user_id
Aggregate updates
💥 Optimization:
Instead of:
1000 messages → 3000 DB queries
Do:
1000 messages → 50 bulk DB writes
👉 Massive DB load reduction
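The group-by-user aggregation can be sketched like this (the `PaymentEvent` shape is hypothetical; amounts in cents):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BalanceAggregator {
    // A payment event: user plus signed amount (hypothetical shape).
    public record PaymentEvent(String userId, long amountCents) {}

    // Collapse many events into one net balance delta per user,
    // so 1000 messages can become a handful of bulk UPDATEs.
    public static Map<String, Long> aggregate(List<PaymentEvent> events) {
        Map<String, Long> deltas = new HashMap<>();
        for (PaymentEvent e : events) {
            deltas.merge(e.userId(), e.amountCents(), Long::sum);
        }
        return deltas;
    }

    public static void main(String[] args) {
        var events = List.of(
                new PaymentEvent("u1", 100),
                new PaymentEvent("u1", -30),
                new PaymentEvent("u2", 50));
        System.out.println(aggregate(events));
    }
}
```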
🔄 Step 3: Async Processing Pipeline
Introduce internal pipeline:
Kafka Poll Thread
↓
In-Memory Queue
↓
Worker Thread Pool
↓
Batch Aggregator
↓
DB Writer
Why this works:
Polling stays fast (avoids rebalance)
Processing scales independently
⚡ Step 4: Async I/O + Parallel Calls
Before:
fraudCheck();
dbWrite();
eventPublish();
After:
CompletableFuture.allOf(
    fraudCheckAsync(),
    dbWriteAsync(),
    publishAsync()
).join(); // join() so the caller actually waits for all three
👉 CPU + I/O parallelism unlocked
🧠 Step 5: Smart Caching
Use:
Redis
Cache:
User metadata
Account status
Limits
👉 Avoid repeated DB hits
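As an illustration of the caching idea, here is an in-process LRU cache built on `LinkedHashMap`, standing in for Redis (hypothetical `UserMetadataCache`; a real deployment would use a shared cache with TTLs):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UserMetadataCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public UserMetadataCache(int maxEntries) {
        super(16, 0.75f, true); // access-order: true gives LRU behavior
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once the cap is exceeded
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        UserMetadataCache<String, String> cache = new UserMetadataCache<>(2);
        cache.put("u1", "active");
        cache.put("u2", "active");
        cache.get("u1");           // touch u1 so u2 becomes eldest
        cache.put("u3", "frozen"); // evicts u2
        System.out.println(cache.keySet());
    }
}
```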
🧾 Step 6: Offset Commit Optimization
Before:
Commit after every message ❌
After:
Commit after batch (e.g., 1000 records)
Use async commit:
consumer.commitAsync();
👉 Reduces commit overhead significantly
📡 Step 7: Kafka Config Tuning
fetch.min.bytes=1048576   (1 MB; the value is specified in bytes)
fetch.max.wait.ms=50
max.poll.records=1000
👉 Larger batches → higher throughput
💾 Step 8: Idempotent Ledger Design
Critical for fintech ⚠️
Use:
Transaction ID as unique key
Upsert instead of insert
👉 Enables:
Safe retries
Parallel processing
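The idempotency idea in miniature: in production the same effect comes from a UNIQUE(transaction_id) constraint plus upsert, but this in-memory guard illustrates the first-writer-wins pattern:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotencyGuard {
    private final Map<String, Boolean> seen = new ConcurrentHashMap<>();

    // Returns true only the first time a transaction id is seen.
    // putIfAbsent is atomic, so concurrent workers cannot both "win".
    public boolean firstTime(String transactionId) {
        return seen.putIfAbsent(transactionId, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        IdempotencyGuard guard = new IdempotencyGuard();
        System.out.println(guard.firstTime("tx-123")); // first delivery: process it
        System.out.println(guard.firstTime("tx-123")); // redelivery: skip it
    }
}
```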
🔥 Step 9: Partitioning Strategy
Partition by:
user_id
👉 Ensures:
Ordering per user
Parallelism across users
📊 Final Architecture
┌──────────────┐
│ Kafka Topic │
└──────┬───────┘
↓
┌─────────────────┐
│ Consumer Group │
└──────┬──────────┘
↓
┌──────────────────┐
│ Polling Thread │
└──────┬───────────┘
↓
┌──────────────────────┐
│ In-Memory Queue │
└──────┬───────────────┘
↓
┌────────────────────────────┐
│ Worker Thread Pool │
└──────┬─────────────────────┘
↓
┌────────────────────────────┐
│ Batch Aggregator │
└──────┬─────────────────────┘
↓
┌────────────────────────────┐
│ Bulk DB Writer (Ledger DB) │
└──────┬─────────────────────┘
↓
┌────────────────────────────┐
│ Event Publisher (Kafka) │
└────────────────────────────┘
📈 Real Impact (Typical Gains)
| Optimization | Gain |
| --- | --- |
| Partition scaling | 3–5x |
| Batch processing | 10–50x |
| Async pipeline | 2–3x |
| Caching | 2–10x |
| Config tuning | 20–30% |
👉 Combined: 10x–100x throughput improvement
⚠️ Trade-offs (Important)
🔸 Larger batches
+ throughput
– higher latency
🔸 More partitions
+ parallelism
– harder rebalancing
🔸 Async processing
+ speed
– complexity (ordering, retries)
🧠 Key Insight (What Interviewers Love)
Kafka is rarely the bottleneck.
Your downstream system (DB / processing) is.
So:
👉 Optimize end-to-end pipeline, not just Kafka configs.
🔄 What Problem Are We Solving?
❌ Naive consumer:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record); // blocking
    }
}
Issues:
Poll thread is blocked → triggers rebalance
No parallelism
Low CPU utilization
Throughput tightly coupled to processing speed
🚀 Goal of Async Pipeline
👉 Decouple ingestion from processing
So:
Kafka polling stays fast ✅
Processing scales independently ✅
System absorbs spikes (buffering) ✅
🧩 Core Architecture
Kafka → Poll Thread → Queue → Worker Pool → Batch → DB
Let’s break each component 👇
1️⃣ Poll Thread (Fast + Non-blocking)
Responsibilities:
Only poll Kafka
Push records to internal queue
Never block on processing
while (true) {
    ConsumerRecords<K, V> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<K, V> record : records) {
        queue.put(record); // fast enqueue (blocks only if the queue is full)
    }
}
⚠️ Key Rule:
👉 Keep poll() loop lightweight
If you block here:
poll() is not called within max.poll.interval.ms, so Kafka assumes the consumer is stuck
Rebalance happens → throughput drops
2️⃣ In-Memory Queue (Buffer Layer)
Use:
BlockingQueue
Disruptor (low latency)
Why this matters:
Smooths traffic spikes
Decouples Kafka speed from processing speed
Backpressure (VERY IMPORTANT)
If the queue is full:
queue.put(record); // blocks the poll thread
Better:
if (!queue.offer(record)) {
    consumer.pause(partitions);
}
Resume later:
consumer.resume(partitions);
👉 Prevents system overload instead of crashing
3️⃣ Worker Thread Pool (Parallel Processing)
ExecutorService pool = Executors.newFixedThreadPool(N);
while (true) {
    ConsumerRecord<K, V> record = queue.take();
    pool.submit(() -> process(record));
}
Benefits:
Utilizes multi-core CPUs
Parallel I/O (DB, APIs)
⚠️ Ordering Problem
Kafka guarantees:
👉 Order within a partition
But with threads:
❌ Order can break
✅ Solution: Partition-aware processing
Partition 0 → Thread 1
Partition 1 → Thread 2
OR:
Use key-based routing
Same key → same worker
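Key-based routing can be as simple as hashing the key to a fixed worker index. A sketch (Kafka's own partitioner uses murmur2, but any stable hash preserves the same-key → same-worker property):

```java
public class KeyRouter {
    // Route a record key to a fixed worker index so all events for one key
    // are processed by the same thread, preserving per-key order.
    public static int workerFor(String key, int workers) {
        // floorMod avoids negative indices when hashCode() is negative
        return Math.floorMod(key.hashCode(), workers);
    }

    public static void main(String[] args) {
        // The same key always maps to the same worker
        System.out.println(workerFor("user-42", 8));
        System.out.println(workerFor("user-42", 8));
    }
}
```

Each worker would then own its own queue, and the poll thread enqueues into `queues[workerFor(key, N)]`.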
4️⃣ Batch Aggregator (Throughput Booster)
Instead of processing one-by-one:
List<ConsumerRecord<K, V>> batch = new ArrayList<>();
queue.drainTo(batch, 1000);
processBatch(batch);
Why:
Reduces DB calls
Improves cache locality
Enables bulk writes
5️⃣ DB Writer (Bulk + Async)
INSERT INTO ledger VALUES (...), (...), (...);
Instead of:
INSERT INTO ledger VALUES (...); -- 1000 times ❌
🔁 Offset Commit Strategy (Critical)
You must ensure:
👉 Process → then commit
❌ Wrong:
poll → commit → process
✅ Correct:
poll → enqueue → process → commit
Batch commit example:
if (processedCount >= 1000) {
    consumer.commitAsync();
}
⚠️ Failure Handling (Real-world complexity)
Case 1: Worker crashes
👉 Message not committed → reprocessed
✔️ Requires idempotency
Case 2: Partial batch failure
Strategy:
Retry failed records
Send to DLQ (dead letter queue)
DLQ pattern:
Kafka Topic → Consumer → Failure → DLQ Topic
📊 Threading Model (Clean Design)
┌──────────────┐
│ Poll Thread │
└──────┬───────┘
↓
┌──────────────────────┐
│ Blocking Queue │
└──────┬───────────────┘
↓
┌──────────────────────────┐
│ Worker Thread Pool (N) │
└──────┬───────────────────┘
↓
┌──────────────────────────┐
│ Batch Processor │
└──────┬───────────────────┘
↓
┌──────────────────────────┐
│ DB / External Services │
└──────────────────────────┘
⚡ Performance Gains (Why this works)
| Component | Gain |
| --- | --- |
| Decoupling | No poll blocking |
| Thread pool | CPU parallelism |
| Queue | Burst handling |
| Batch | Fewer DB ops |
👉 Combined: 5x–20x improvement
🧠 Advanced Enhancements (Senior-level insights)
🔹 Adaptive thread pool
Increase threads during high lag
Reduce during idle
🔹 Dynamic batching
Batch size based on:
queue size
lag
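Dynamic batching can be a one-line heuristic. A sketch with hypothetical min/max bounds (deeper queue → larger batches; near-empty queue → small, low-latency batches):

```java
public class AdaptiveBatchSizer {
    // Pick a batch size proportional to queue fill, clamped to [min, max].
    public static int batchSize(int queueDepth, int queueCapacity, int min, int max) {
        double fill = (double) queueDepth / queueCapacity;
        int size = (int) Math.round(min + fill * (max - min));
        return Math.max(min, Math.min(max, size));
    }

    public static void main(String[] args) {
        System.out.println(batchSize(0, 1000, 10, 1000));    // idle: smallest batches
        System.out.println(batchSize(1000, 1000, 10, 1000)); // full: largest batches
    }
}
```

Consumer lag could be folded into the same formula as an additional multiplier.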
🔹 Priority queues
High-priority events processed first
🔹 Zero-copy processing
Avoid unnecessary serialization/deserialization
💡 Key Insight (Interview Gold)
Kafka consumer is NOT just a loop.
It is a mini stream processing system.
You are essentially building:
👉 A lightweight version of
Kafka Streams
Apache Flink
🔚 Final Takeaway
To maximize throughput:
👉 Keep polling fast, push work downstream, and process in parallel batches
🧩 High-Level Components
KafkaConsumerService
├── Poller Thread
├── Blocking Queue
├── Worker Pool
├── Batch Processor
├── Offset Manager
└── DLQ Producer
🏗️ 1. Core Consumer Service
public class KafkaConsumerService {
    private final KafkaConsumer<String, String> consumer;
    private final BlockingQueue<ConsumerRecord<String, String>> queue;
    private final ExecutorService workerPool;
    private final BatchProcessor batchProcessor;
    private final OffsetManager offsetManager;
    private final int workerThreads;

    public KafkaConsumerService(Properties props, int workerThreads, int queueSize) {
        this.consumer = new KafkaConsumer<>(props);
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.workerPool = Executors.newFixedThreadPool(workerThreads);
        this.workerThreads = workerThreads;
        // offsetManager must be constructed before the BatchProcessor that depends on it
        this.offsetManager = new OffsetManager(consumer);
        this.batchProcessor = new BatchProcessor(offsetManager);
    }

    public void start() {
        consumer.subscribe(List.of("payments-topic"));
        // Start worker threads
        for (int i = 0; i < workerThreads; i++) {
            workerPool.submit(new Worker(queue, batchProcessor));
        }
        // Start polling loop
        poll();
    }

    private void poll() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                boolean added = queue.offer(record);
                if (!added) {
                    // Backpressure: pause consumption. In production, the record that
                    // failed to enqueue must be retained or re-fetched via seek()
                    consumer.pause(consumer.assignment());
                    break;
                }
            }
            // Resume if queue has capacity
            if (queue.remainingCapacity() > 100) {
                consumer.resume(consumer.assignment());
            }
        }
    }
}
🧵 2. Worker Thread
public class Worker implements Runnable {
    private final BlockingQueue<ConsumerRecord<String, String>> queue;
    private final BatchProcessor batchProcessor;

    public Worker(BlockingQueue<ConsumerRecord<String, String>> queue,
                  BatchProcessor batchProcessor) {
        this.queue = queue;
        this.batchProcessor = batchProcessor;
    }

    @Override
    public void run() {
        List<ConsumerRecord<String, String>> batch = new ArrayList<>();
        while (true) {
            try {
                ConsumerRecord<String, String> record = queue.poll(100, TimeUnit.MILLISECONDS);
                if (record != null) {
                    batch.add(record);
                }
                // Flush on size; a production version should also flush on a time
                // limit so a partial batch is not held indefinitely
                if (batch.size() >= 500) {
                    batchProcessor.process(batch);
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
📦 3. Batch Processor
public class BatchProcessor {
    private final OffsetManager offsetManager;
    private final LedgerRepository repository = new LedgerRepository();

    public BatchProcessor(OffsetManager offsetManager) {
        this.offsetManager = offsetManager;
    }

    public void process(List<ConsumerRecord<String, String>> records) {
        try {
            // Transform records
            List<LedgerEntry> entries = transform(records);
            // Bulk DB write
            repository.bulkInsert(entries);
            // Mark offsets processed
            offsetManager.markProcessed(records);
        } catch (Exception e) {
            // Send failed batch to DLQ
            sendToDLQ(records);
        }
    }

    private List<LedgerEntry> transform(List<ConsumerRecord<String, String>> records) {
        // Map JSON → Object
        return records.stream()
                .map(r -> new LedgerEntry(r.value()))
                .toList();
    }

    private void sendToDLQ(List<ConsumerRecord<String, String>> records) {
        // Produce to DLQ topic
    }
}
🧾 4. Offset Manager (VERY IMPORTANT)
public class OffsetManager {
    private final KafkaConsumer<String, String> consumer;
    private final Map<TopicPartition, OffsetAndMetadata> offsets = new ConcurrentHashMap<>();

    public OffsetManager(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    public synchronized void markProcessed(List<ConsumerRecord<String, String>> records) {
        for (ConsumerRecord<String, String> record : records) {
            TopicPartition tp = new TopicPartition(record.topic(), record.partition());
            offsets.put(tp, new OffsetAndMetadata(record.offset() + 1));
        }
        // Commit a snapshot asynchronously.
        // ⚠️ KafkaConsumer is not thread-safe: in production, only record the
        // offsets here and perform the actual commit from the polling thread
        consumer.commitAsync(new HashMap<>(offsets), (map, ex) -> {
            if (ex != null) {
                System.err.println("Commit failed: " + ex.getMessage());
            }
        });
    }
}
💾 5. Repository (Bulk Writes)
public class LedgerRepository {
    private DataSource dataSource; // assumed injected (e.g., a connection pool)

    public void bulkInsert(List<LedgerEntry> entries) {
        // Example: JDBC batch insert
        String sql = "INSERT INTO ledger (id, amount, user_id) VALUES (?, ?, ?)";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (LedgerEntry entry : entries) {
                stmt.setString(1, entry.getId());
                stmt.setDouble(2, entry.getAmount());
                stmt.setString(3, entry.getUserId());
                stmt.addBatch();
            }
            stmt.executeBatch();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
⚠️ Production-Grade Improvements
🔹 1. Partition-aware workers
Ensure ordering per partition
Use:
partition → dedicated queue → worker
🔹 2. Idempotency layer
Use unique transaction_id
DB constraint:
UNIQUE(transaction_id)
🔹 3. DLQ integration
Separate Kafka topic:
payments-dlq-topic
🔹 4. Metrics (CRITICAL)
Track:
Consumer lag
Queue size
Processing latency
Batch size
Use:
Prometheus
Grafana
🔹 5. Graceful shutdown
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    // KafkaConsumer is not thread-safe; wakeup() is the one safe cross-thread call
    consumer.wakeup();
    workerPool.shutdown();
    // The polling thread should catch WakeupException and then call consumer.close()
}));
🧠 Final Mental Model
This system is:
👉 Kafka Consumer + Internal Stream Processor
Similar to:
Kafka Streams
Apache Flink
…but custom-built for:
Lower latency
Full control
Simpler infra
🚀 What Makes This “Senior-Level”
Decoupled ingestion vs processing
Backpressure handling (pause/resume)
Batch + bulk DB writes
Async offset commits
Failure isolation (DLQ)
Extensible for retries, idempotency

