What are special topics in Kafka?
Kafka has several "special topics" that are automatically created and used internally by Kafka itself. These are not regular user-defined topics; they exist for internal coordination and metadata storage.
🛠️ 1. __consumer_offsets
Purpose: Stores committed offsets for Kafka consumers (used in consumer groups).
Type: Compacted topic (only the latest offset per group-topic-partition key is retained).
Key format: <group.id>-<topic>-<partition>
Used by: Kafka clients, for both automatic commits (enable.auto.commit=true) and manual commits.
Partitions: Default 50 (configurable with offsets.topic.num.partitions).
📌 Crucial for consumer rebalancing, offset recovery, and lag monitoring.
🧠 2. __transaction_state
Purpose: Stores the state of transactions for Kafka producers (in EOS mode).
Type: Compacted topic.
Used by: The Kafka transaction coordinator to track transactional IDs, their states, and pending offset commits.
📌 Required for exactly-once semantics (EOS) and transactional producers.
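As a concrete illustration, here is a minimal transactional-producer sketch (the topic name, transactional.id, and broker address are placeholders); every beginTransaction/commitTransaction cycle below is tracked by the transaction coordinator via __transaction_state:
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // The transactional.id keys this producer's state in __transaction_state
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-producer-1"); // hypothetical id

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions(); // registers with the transaction coordinator
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "key", "value")); // hypothetical topic
            producer.commitTransaction(); // commit marker recorded via __transaction_state
        }
    }
}
```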
🔄 3. __cluster_metadata (KRaft mode only)
Purpose: In KRaft mode (Kafka without ZooKeeper), this topic stores cluster metadata (e.g., brokers, topics, configs).
Used by: The KRaft metadata quorum (as its Raft log).
Appears only if: You run Kafka in KRaft (ZooKeeper-less) mode.
🔁 4. *-changelog topics (Kafka Streams apps)
Purpose: Store state store changelogs for Kafka Streams applications.
Format: One per state store, auto-created with a name like:
<application-id>-<store-name>-changelog
Type: Compacted topic.
📌 Used to restore stateful operators during app restarts.
🔄 5. __consumer_timestamps (Confluent Replicator only)
Purpose: Tracks the latest consumed offsets and timestamps of active consumer groups on the source cluster.
Useful for: Consumer offset translation during cross-cluster failover. Note this topic comes from Confluent Replicator, not Apache Kafka itself.
💬 6. Kafka Connect Internal Topics
Kafka Connect (in distributed mode) creates several internal topics:
connect-configs: Stores connector and task configurations
connect-offsets: Tracks source offsets for connectors
connect-status: Tracks the status of connectors and tasks
📌 All are compacted topics.
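The names above are just the conventional defaults; each is set in the distributed worker config. A minimal sketch of the relevant worker properties (values are illustrative):
```properties
# Kafka Connect distributed worker: internal topic settings
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# replication factor for these internal topics (3 is typical in production)
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
```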
🧾 7. Schema Registry Topic
If you're using Confluent Schema Registry, it uses:
_schemas — stores all registered schemas (Avro, Protobuf, JSON Schema) and their versions
Type: Compacted topic.
🧼 8. MirrorMaker 2.0 Internal Topics
If you're replicating Kafka clusters using MirrorMaker 2.0, it creates internal topics such as:
mm2-configs.<cluster-alias>.internal
mm2-offsets.<cluster-alias>.internal
mm2-status.<cluster-alias>.internal
heartbeats and <cluster-alias>.checkpoints.internal
These are internal system topics for config tracking, offset sync/translation, and heartbeat validation.
🔐 Accessing Special Topics
By default:
The Admin API hides internal topics when listing (you must opt in to see them).
kafka-topics.sh --list does include them; pass --exclude-internal to hide them:
kafka-topics.sh --bootstrap-server <host> --list
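Programmatically, a minimal AdminClient sketch (the broker address is a placeholder) that opts in to internal topics when listing:
```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListTopicsOptions;

public class ListInternalTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // listInternal(true) includes topics like __consumer_offsets
            admin.listTopics(new ListTopicsOptions().listInternal(true))
                 .names().get()
                 .stream()
                 .filter(name -> name.startsWith("__"))
                 .forEach(System.out::println);
        }
    }
}
```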
🔒 Should You Write to These Topics?
❌ No – you should never directly produce to or consume from internal topics like __consumer_offsets or __transaction_state.
✅ Use Kafka APIs (producers/consumers/streams) which interact with these topics safely.
Let’s dive into how Kafka uses the __consumer_offsets topic internally, and then we’ll look at Kafka Streams changelog topics.
🔄 Part 1: __consumer_offsets – How Kafka Tracks Consumer Group Progress
✅ Purpose:
Kafka uses the __consumer_offsets topic to store committed offsets for consumer groups. It ensures that when a consumer restarts or a rebalance occurs, it can resume from the last committed offset.
🧱 Topic Structure:
Name: __consumer_offsets
Default partitions: 50 (configurable via offsets.topic.num.partitions)
Cleanup policy: compact (retains only the latest offset per key)
Replication factor: Default 3 (offsets.topic.replication.factor)
Key format: conceptually the tuple (group, topic, partition), e.g.:
OffsetCommitKey {
  group: "my-consumer-group",
  topic: "my-topic",
  partition: 0
}
🔧 How It Works (Internally):
1. Offset Commit
When a consumer commits offsets (either automatically or manually), Kafka:
Writes a message to __consumer_offsets with:
Key: Consumer group + topic + partition
Value: Committed offset and metadata (like timestamp)
Auto-commits and commitAsync() are batched and asynchronous for efficiency; commitSync() blocks until the broker acknowledges the write.
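For illustration, a minimal manual-commit sketch (topic name, group id, and broker address are placeholders); each commitSync() call results in a record appended to __consumer_offsets:
```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");       // placeholder
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit manually
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    // process(r) ...
                }
                // appends (group, topic, partition) -> offset records to __consumer_offsets
                consumer.commitSync();
            }
        }
    }
}
```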
2. Offset Fetch
On consumer start or rebalance:
Kafka fetches the latest committed offsets from __consumer_offsets.
Because it’s a compacted topic, only the latest offset per key is read.
3. Consumer Group Coordinator
A broker is selected as the Group Coordinator.
It handles:
Heartbeats from consumers
Offset commits and fetches
Rebalancing logic
Kafka assigns each group to a coordinator by hashing the group ID onto one of the __consumer_offsets partitions; the leader of that partition acts as the group’s coordinator, as sketched below.
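Conceptually (mirroring the broker-side logic, with 50 standing in for offsets.topic.num.partitions):
```java
public class CoordinatorPartition {
    public static void main(String[] args) {
        int offsetsTopicPartitions = 50; // default offsets.topic.num.partitions
        String groupId = "my-consumer-group";
        // positive hash of the group id, modulo the partition count of __consumer_offsets;
        // the broker leading that partition serves as this group's coordinator
        int partition = (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitions;
        System.out.println("group lives on __consumer_offsets-" + partition);
    }
}
```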
4. Rebalancing
When consumers join/leave, the group coordinator triggers a rebalance.
New assignments are communicated, and the committed offsets are fetched from __consumer_offsets.
👁️ Monitoring Offsets
Consumer Lag = latest offset in the partition − committed offset in __consumer_offsets
Tools:
kafka-consumer-groups.sh --bootstrap-server <host> --describe --group <group>
Prometheus + JMX metrics, e.g. kafka.consumer:type=consumer-fetch-manager-metrics (records-lag-max)
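The same lag can also be computed programmatically; a minimal Admin API sketch (group id and broker address are placeholders):
```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            // 1. committed offsets, read on our behalf from __consumer_offsets
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets("my-consumer-group") // placeholder group
                     .partitionsToOffsetAndMetadata().get();
            // 2. log-end (latest) offsets for the same partitions
            Map<TopicPartition, OffsetSpec> latest = committed.keySet().stream()
                .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            var ends = admin.listOffsets(latest).all().get();
            // 3. lag = log-end offset - committed offset
            committed.forEach((tp, om) ->
                System.out.println(tp + " lag=" + (ends.get(tp).offset() - om.offset())));
        }
    }
}
```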
⚙️ Part 2: Kafka Streams Changelog Topics
✅ Purpose:
Kafka Streams uses state stores for operations like counting, joining, windowing, etc.
These state stores are backed up to Kafka via changelog topics, so that the processing state can be recovered after a crash or migration.
🧱 Changelog Topic Structure:
Name format: <application-id>-<store-name>-changelog
Example: pageviews-app-clicks-store-changelog
Partitions: Same as the corresponding task’s input topic
Cleanup policy: compact
Key-value: Mirrors the state store updates
🔧 How It Works:
1. State Store Writes
During processing (e.g., a count operation), Kafka Streams updates a local RocksDB store.
These updates are also written to a changelog topic in Kafka.
2. Recovery
When a Streams app instance crashes or migrates, Kafka Streams:
Reconstructs the state store by replaying the changelog topic.
Enables fast failover and recovery with full processing consistency.
3. Exactly-Once Support
In EOS mode (processing.guarantee=exactly_once_v2):
Changelog updates and output records are committed as part of the same Kafka transaction.
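Enabling this is a one-line config change; a minimal sketch (application id and broker address are placeholders):
```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class EosConfigSketch {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // exactly_once_v2: changelog writes, offset commits, and outputs share one transaction
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```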
🛠 Example:
Suppose you’re counting clicks per user:
```java
// assumes a KStream<String, String> clicksStream keyed by user id
KTable<String, Long> userClicks = clicksStream
    .groupByKey()
    .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("user-clicks-store"));
```
Kafka Streams:
Stores counts in user-clicks-store (a local RocksDB store)
Persists changes to the my-app-user-clicks-store-changelog Kafka topic (assuming application.id=my-app)
📌 Why Changelog Topics Matter
Make stateful stream processing fault-tolerant
Allow scaling out Kafka Streams apps (new instances can rebuild state)
Support hot standby replicas for quick failover
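Hot standbys are opt-in; a minimal sketch of the relevant setting (the value 1 is illustrative):
```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StandbyConfigSketch {
    static Properties standbyProps() {
        Properties props = new Properties();
        // keep one warm replica of each state store on another instance,
        // maintained by continuously consuming the changelog topics
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}
```
📋 Quick Comparison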
| Feature | `__consumer_offsets` | Kafka Streams Changelog Topics |
| ------------ | ---------------------------------------- | -------------------------------------------------- |
| Stores | Committed offsets of consumer groups | Updates to state stores (like counts, windows) |
| Used by | All Kafka consumers | Kafka Streams apps |
| Topic Type | Compacted | Compacted |
| Partitions | Default 50 (configurable) | Depends on state store partitioning |
| Critical for | Offset tracking, rebalance, consumer lag | Stateful recovery, EOS processing, fault tolerance |