How does Kafka ensure exactly-once semantics?
Kafka provides Exactly-Once Semantics (EOS) through a combination of features across the producer, broker, and consumer, coordinated via transactions. The key goal of EOS is to ensure no message is lost or duplicated.
✅ 1. Idempotent Producer
Problem: Retries after transient errors can produce duplicate messages.
Solution: Kafka enables idempotent writes using:
A unique producerId (PID)
Monotonic sequence numbers per partition
The broker uses this info to detect and discard duplicates.
props.put("enable.idempotence", "true");
✅ Guarantees that each message is written to its partition only once, even if the producer retries.
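The duplicate check described above can be illustrated with a toy model of a single partition. This is only a sketch of the idea, not Kafka's broker code: the class name DedupPartitionLog and its in-memory map are hypothetical, and real brokers track sequence numbers per batch rather than per record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of one partition's log on a broker. Not Kafka's implementation.
class DedupPartitionLog {
    // Last sequence number accepted from each producer ID (hypothetical in-memory state).
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    // Appends the record unless its sequence number was already accepted.
    // Returns true if appended, false if discarded as a duplicate.
    boolean append(long pid, int seq, String value) {
        int last = lastSeqByPid.getOrDefault(pid, -1);
        if (seq <= last) {
            return false; // retry of a record that is already in the log
        }
        if (seq != last + 1) {
            // A gap means an earlier record is missing, so the write is rejected.
            throw new IllegalStateException(
                "out-of-order sequence " + seq + ", expected " + (last + 1));
        }
        lastSeqByPid.put(pid, seq);
        log.add(value);
        return true;
    }

    // Returns a copy of the records currently in the log.
    List<String> records() {
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        DedupPartitionLog partition = new DedupPartitionLog();
        partition.append(7L, 0, "a");
        partition.append(7L, 1, "b");
        partition.append(7L, 1, "b"); // producer retried after a lost acknowledgement
        System.out.println(partition.records()); // prints [a, b]
    }
}
```

The retried record carries the same PID and sequence number as the original, which is what lets the log drop it without comparing payloads.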
✅ 2. Transactional Producer
Problem: If a producer writes to multiple partitions or topics, partial failures may cause inconsistent state.
Solution: Kafka allows producers to group multiple writes into a single atomic transaction.
Key APIs:
producer.initTransactions();
producer.beginTransaction();
producer.send(...);
producer.sendOffsetsToTransaction(...);
producer.commitTransaction(); // or abortTransaction()
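Kafka brokers maintain a special internal topic, __transaction_state, which records the state of each transaction.
The Transaction Coordinator uses it to ensure that all writes in a transaction are committed or aborted together.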
✅ Ensures atomic writes across multiple topics/partitions.
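The transactional APIs above only work if the producer is configured with a transactional.id. A minimal configuration sketch follows; the helper class name, the broker address, and the ID value are placeholders chosen for illustration, and the String serializers are an assumption about the record types.

```java
import java.util.Properties;

// Builds producer settings for transactional use. Only java.util is needed here;
// the resulting Properties would be passed to a KafkaProducer constructor.
class TransactionalProducerConfig {
    static Properties build(String bootstrapServers, String transactionalId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Required before initTransactions() can be called.
        props.put("transactional.id", transactionalId);
        // Transactions build on idempotence, which in turn requires acks=all.
        props.put("enable.idempotence", "true");
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the original answer.
        Properties props = build("localhost:9092", "orders-processor-1");
        System.out.println(props.getProperty("transactional.id"));
    }
}
```

The transactional.id should stay the same across restarts of the same logical producer, so that the coordinator can fence off an older instance that is still running.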
✅ 3. Atomic Offset Commit + Write (Kafka-to-Kafka)
Problem: In a typical pipeline (read → process → write), writing the output and committing the offsets are two separate steps. A crash between them causes either reprocessing (output written, offset not committed) or data loss (offset committed, output not written).
Solution: Kafka allows offsets to be committed as part of the same transaction.
producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata()); // older clients also accepted the group ID string
Offsets are written to the internal __consumer_offsets topic as part of the transaction.
✅ Ensures that either:
Both output and offset are committed
OR
Neither is committed
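The both-or-neither guarantee can be pictured with a small in-memory model. This is a sketch under simplifying assumptions, not how Kafka implements it: the class ToyTransaction is hypothetical, and it buffers writes until commit, whereas Kafka appends records to the logs immediately and uses commit or abort markers to decide what read_committed consumers see.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of transactional visibility. Not Kafka's implementation.
class ToyTransaction {
    // State visible to read_committed consumers and to a restarted processor.
    final List<String> committedOutput = new ArrayList<>();
    final Map<Integer, Long> committedOffsets = new HashMap<>(); // input partition -> next offset

    // Writes made inside the currently open transaction.
    private final List<String> pendingOutput = new ArrayList<>();
    private final Map<Integer, Long> pendingOffsets = new HashMap<>();

    void send(String record) {
        pendingOutput.add(record);
    }

    void sendOffsets(int partition, long nextOffset) {
        pendingOffsets.put(partition, nextOffset);
    }

    // Output and offsets become visible in the same step.
    void commit() {
        committedOutput.addAll(pendingOutput);
        committedOffsets.putAll(pendingOffsets);
        clearPending();
    }

    // Output and offsets are discarded in the same step.
    void abort() {
        clearPending();
    }

    private void clearPending() {
        pendingOutput.clear();
        pendingOffsets.clear();
    }

    public static void main(String[] args) {
        ToyTransaction tx = new ToyTransaction();
        tx.send("result-for-offset-41");
        tx.sendOffsets(0, 42L); // next offset to read from input partition 0
        tx.abort();             // simulated failure: nothing becomes visible
        System.out.println(tx.committedOutput + " " + tx.committedOffsets);

        tx.send("result-for-offset-41");
        tx.sendOffsets(0, 42L);
        tx.commit();            // both become visible together
        System.out.println(tx.committedOutput + " " + tx.committedOffsets);
    }
}
```

After the abort, a restarted processor would find no committed offset and no output, so reprocessing the input record produces exactly one visible result.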
🔁 End-to-End EOS Pipeline
1. Consumer reads from the input topic with isolation.level=read_committed
2. Application processes the record
3. Producer writes the result to the output topic inside a transaction
4. Producer also sends the consumed offset (from step 1) to the same transaction
5. Transaction is committed; output and offset now become visible together
✅ Guarantees:
No duplication
No message loss
Consistent processing
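On the consumer side of this pipeline, two settings matter. The sketch below shows them; the helper class name and the argument values are placeholders, and the String deserializers are an assumption about the record types.

```java
import java.util.Properties;

// Builds consumer settings for an exactly-once pipeline. Only java.util is needed
// here; the resulting Properties would be passed to a KafkaConsumer constructor.
class ReadCommittedConsumerConfig {
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // The default is read_uncommitted, which would expose aborted records.
        props.put("isolation.level", "read_committed");
        // Offsets are committed by the producer's transaction, not by the consumer.
        props.put("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the original answer.
        Properties props = build("localhost:9092", "orders-processor");
        System.out.println(props.getProperty("isolation.level"));
    }
}
```

Disabling auto-commit matters because an offset committed outside the transaction could advance past records whose output was later aborted.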
✅ Summary
Kafka achieves exactly-once via:
enable.idempotence = true – Avoid duplicate writes
transactional.id – Enable atomic operations
sendOffsetsToTransaction – Link offset and output
read_committed – Consumers skip uncommitted data
EOS is most effective in Kafka → process → Kafka pipelines, where both the input offsets and the output live in Kafka.