> ## Documentation Index
> Fetch the complete documentation index at: https://docs.pebchip.top/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Kafka: Event Streaming and Message Queue Patterns

> Understand Apache Kafka's architecture, producer-consumer patterns, partitioning, consumer groups, and reliable message delivery guarantees.

Apache Kafka is a distributed event-streaming platform built for high-throughput, fault-tolerant message delivery. Unlike traditional message brokers that discard messages after delivery, Kafka retains messages on disk for a configurable period, letting multiple independent consumers read the same stream at their own pace. You use Kafka to decouple services, absorb traffic spikes, power real-time data pipelines, and implement patterns like event sourcing and change-data capture. This page explains the core architecture, how producers and consumers interact, the delivery guarantees Kafka offers, and when to prefer Kafka over alternatives like RabbitMQ.

## Core concepts

### Topic

A **topic** is a named, ordered, and durable log of messages. Every message a producer sends is appended to a topic. Topics are the primary abstraction for categorizing data — you might have separate topics for `orders`, `payments`, and `user-events`.

### Partition

Each topic is divided into one or more **partitions**. A partition is an ordered, append-only sequence stored as a file on a broker's disk. Partitioning enables two things:

1. **Parallelism**: multiple consumers can read different partitions simultaneously, increasing aggregate throughput.
2. **Scalability**: partitions can be distributed across multiple broker machines, so a topic's capacity is not limited by a single node.

Messages within a single partition are strictly ordered by **offset** — a monotonically increasing integer that identifies a message's position in the partition. Messages across different partitions have no guaranteed ordering relationship.

<Note>
  Within a consumer group, each partition is assigned to exactly one consumer at a time. If you have more consumers than partitions, the extra consumers sit idle. Always set your partition count at or above the maximum number of concurrent consumers you expect to run.
</Note>

### Broker

A **broker** is a Kafka server process. You run multiple brokers to form a **cluster**. Each broker stores a subset of partitions. For each partition, one broker is the **leader** (handles all reads and writes) and the remaining brokers that hold replicas are **followers** (replicate from the leader).

### Consumer group

A **consumer group** is a named set of consumers that collectively consume a topic. Kafka automatically distributes partitions among the consumers in a group:

* Each partition is consumed by exactly one consumer in the group.
* Different groups consuming the same topic are fully independent — they each maintain their own offsets and do not interfere with each other.

This model lets you run multiple downstream applications (billing, analytics, notifications) off the same Kafka topic without any coordination between them.

```
Topic: orders (3 partitions)

Consumer Group A (billing):
  Consumer A1 → Partition 0
  Consumer A2 → Partition 1
  Consumer A3 → Partition 2

Consumer Group B (analytics):
  Consumer B1 → Partition 0, 1
  Consumer B2 → Partition 2
```

***

## Producer patterns and partition selection

A producer publishes messages to a topic. When writing, it must determine which partition to target. Kafka applies three rules in order:

1. **Explicit partition**: the producer specifies a partition number directly.
2. **Key-based**: if the message carries a key, Kafka hashes the key and maps the result to a partition. All messages with the same key land in the same partition, preserving per-key ordering.
3. **Round-robin**: if neither a partition nor a key is specified, Kafka distributes messages across partitions in a round-robin fashion.

### Write flow

```
Producer → asks cluster for leader of target partition
        → sends message to leader broker
        → leader writes to local disk
        → followers pull from leader and write locally
        → followers send ACK to leader
        → leader sends ACK to producer
```

The number of follower ACKs the producer waits for before considering a message "sent" is controlled by the `acks` setting.

### Delivery guarantees

<Tabs>
  <Tab title="At-most-once">
    The producer sends the message and does not retry on failure. Messages may be lost but are never duplicated.

    **Configuration**: `acks=0` (fire-and-forget).

    **When to use**: metrics or log aggregation where a small percentage of loss is acceptable and throughput is paramount.
  </Tab>

  <Tab title="At-least-once">
    The producer retries until it receives an acknowledgement. Messages are never lost but may be delivered more than once if a network issue causes a retry after the broker already wrote the message.

    **Configuration**: `acks=1` or `acks=all` with `retries > 0`.

    **When to use**: most business events. Make your consumer logic idempotent (use a unique message ID to detect and discard duplicates) to handle redelivery safely.
  </Tab>

  <Tab title="Exactly-once">
    Kafka's idempotent producer (`enable.idempotence=true`) assigns a sequence number to each message. The broker deduplicates retries with the same sequence number, ensuring each message is written exactly once even across producer restarts.

    For cross-partition exactly-once (e.g., reading from one topic and writing to another), use Kafka Transactions:

    ```java theme={null}
    producer.initTransactions();
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("output-topic", key, value));
    consumer.commitSync(offsets);
    producer.commitTransaction();
    ```

    **When to use**: financial transactions, inventory updates, or any workflow where duplicate processing causes data corruption.
  </Tab>
</Tabs>

***

## Consumer groups and offset management

### Offsets

Every consumer in a group tracks its read position in each partition using an **offset**. After processing a batch of messages, the consumer **commits** its offset back to Kafka (stored in the internal `__consumer_offsets` topic). On restart or rebalance, the consumer resumes from its last committed offset.

Two commit strategies:

| Strategy      | How it works                                                                    | Risk                                                                                                                           |
| ------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| Auto-commit   | Kafka periodically commits the offset automatically (`enable.auto.commit=true`) | Message loss: auto-commit may advance the offset before processing completes; if the consumer crashes, that message is skipped |
| Manual commit | Your code calls `commitSync()` or `commitAsync()` after processing              | Duplicate delivery: if the consumer crashes after processing but before committing, it reprocesses on restart                  |

For at-least-once semantics, always use **manual commit after processing**.

### Rebalancing

When consumers join or leave a group, or when new partitions are added, Kafka performs a **rebalance** to redistribute partitions among the active consumers. During a rebalance, all consumers in the group pause consumption. Minimize rebalance frequency by:

* Setting `session.timeout.ms` and `heartbeat.interval.ms` appropriately — a consumer that misses its heartbeat is considered dead and triggers a rebalance.
* Using **static group membership** (`group.instance.id`) so that a restarting consumer re-claims its previous partitions without triggering a full rebalance.

### Message ordering guarantees

Kafka guarantees ordering **within a partition** only. If your application requires all events for a given entity (e.g., all events for `order_id=1234`) to be processed in order, use a message key equal to the entity ID. Kafka's key-based routing ensures all messages for that key land in the same partition and are consumed in order.

***

## How to avoid message loss

Address reliability at each of the three stages:

### Production

* Use `acks=all` to require all in-sync replicas to acknowledge before the producer considers a message written.
* Set `retries` to a high value and `enable.idempotence=true` to deduplicate retries.
* Wrap send calls in try-catch; alert on repeated failures.

### Storage

* Set `min.insync.replicas` to at least 2 so that data is on multiple brokers before the leader acknowledges.
* Use `replication.factor=3` for critical topics.
* Deploy brokers across availability zones.

### Consumption

* Commit offsets only **after** your business logic completes successfully.
* Implement idempotent consumers using a unique message ID stored in a database to detect and skip redeliveries.

***

## Kafka vs. RabbitMQ

| Dimension              | Kafka                                                             | RabbitMQ                                        |
| ---------------------- | ----------------------------------------------------------------- | ----------------------------------------------- |
| Model                  | Publish/subscribe log                                             | Queue model (point-to-point)                    |
| Message retention      | Configurable retention period; messages persist after consumption | Messages deleted after acknowledgement          |
| Replay                 | Consumers can seek to any offset and re-read old messages         | Not possible after acknowledgement              |
| Throughput             | Very high (millions of messages/second per cluster)               | High but lower than Kafka at scale              |
| Ordering               | Guaranteed within a partition                                     | Guaranteed within a single queue                |
| Consumer model         | Pull (consumers request messages from brokers)                    | Push (broker delivers to consumers)             |
| Use case fit           | Event streaming, CDC, log aggregation, large-scale pipelines      | Task queues, RPC patterns, routing via exchange |
| Operational complexity | Higher (ZooKeeper or KRaft, partition management)                 | Lower (simpler single-node setup)               |

Choose Kafka when you need durable, replayable event logs consumed by multiple independent systems. Choose RabbitMQ when you need flexible routing, per-message TTL, or a simpler operational footprint for task-queue workloads.

***

## Common patterns

### Event sourcing

Instead of storing only the current state of a record, you store an ordered log of every event that changed that record. Kafka serves as the event log. Any downstream service can rebuild the current state by replaying the partition from offset 0.

**Example**: an `account-events` topic receives `AccountOpened`, `FundsDeposited`, `FundsWithdrawn` events. A consumer replays the log to compute the current balance.

### Change-data capture (CDC)

Tools like **Debezium** or **Canal** connect to a database's replication log (MySQL binlog, PostgreSQL WAL) and publish every row-level change as a Kafka message. Downstream services consume these change events to keep caches, search indexes, or analytics systems in sync without modifying application code.

```
MySQL binlog → Debezium → Kafka topic: db.orders.cdc → Elasticsearch consumer
                                                       → Redis cache consumer
                                                       → Analytics consumer
```

### Stream processing

Frameworks like **Apache Flink**, **Kafka Streams**, and **Apache Spark Structured Streaming** read from Kafka topics, apply transformations (filtering, aggregation, joining), and write results to output topics or external sinks. This lets you compute real-time metrics, detect anomalies, or enrich events with reference data as they flow through the system.

### Handling message backlog

If consumers fall behind producers, the backlog grows. Resolution strategies:

1. **Find the root cause**: check for processing bugs causing retries or unusually slow business logic.
2. **Optimize the consumer**: reduce per-message processing time by batching database writes or parallelizing work within a single consumer.
3. **Scale horizontally**: increase the partition count on the topic and add more consumer instances to the group. Partition count must increase first — a consumer with no partition assigned contributes nothing.

<Warning>
  You can increase a topic's partition count at any time, but this changes the key-to-partition mapping for any new messages. Consumers that rely on key-based ordering guarantees may see ordering violations for in-flight messages during the transition. Plan partition counts carefully at topic creation time.
</Warning>