Apache Kafka is a distributed event-streaming platform built for high-throughput, fault-tolerant message delivery. Unlike traditional message brokers that discard messages after delivery, Kafka retains messages on disk for a configurable period, letting multiple independent consumers read the same stream at their own pace. You use Kafka to decouple services, absorb traffic spikes, power real-time data pipelines, and implement patterns like event sourcing and change-data capture. This page explains the core architecture, how producers and consumers interact, the delivery guarantees Kafka offers, and when to prefer Kafka over alternatives like RabbitMQ.

Core concepts

Topic

A topic is a named, ordered, and durable log of messages. Every message a producer sends is appended to a topic. Topics are the primary abstraction for categorizing data — you might have separate topics for orders, payments, and user-events.

Partition

Each topic is divided into one or more partitions. A partition is an ordered, append-only sequence stored as a file on a broker’s disk. Partitioning enables two things:
  1. Parallelism: multiple consumers can read different partitions simultaneously, increasing aggregate throughput.
  2. Scalability: partitions can be distributed across multiple broker machines, so a topic’s capacity is not limited by a single node.
Messages within a single partition are strictly ordered by offset — a monotonically increasing integer that identifies a message’s position in the partition. Messages across different partitions have no guaranteed ordering relationship.
Within a consumer group, each partition is assigned to exactly one consumer at a time. If you have more consumers than partitions, the extra consumers sit idle. Always set your partition count at or above the maximum number of concurrent consumers you expect to run.
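The offset mechanics above can be sketched with a toy in-memory partition (a hypothetical `Partition` class; real Kafka stores partitions as segmented files on a broker's disk):

```python
class Partition:
    """Toy append-only log; illustration only, not Kafka's storage format."""

    def __init__(self):
        self._log = []

    def append(self, message):
        """Append a message and return its offset in this partition."""
        self._log.append(message)
        return len(self._log) - 1  # offsets are 0-based and only ever grow

    def read(self, offset):
        """Messages are immutable once written; any offset can be re-read."""
        return self._log[offset]

p = Partition()
assert p.append("order-created") == 0   # first message gets offset 0
assert p.append("order-paid") == 1      # offsets strictly increase within the partition
assert p.read(0) == "order-created"     # old messages stay readable for replay
```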

Broker

A broker is a Kafka server process. You run multiple brokers to form a cluster. Each broker stores a subset of partitions. For each partition, one broker is the leader (handles all reads and writes) and the remaining brokers that hold replicas are followers (replicate from the leader).

Consumer group

A consumer group is a named set of consumers that collectively consume a topic. Kafka automatically distributes partitions among the consumers in a group:
  • Each partition is consumed by exactly one consumer in the group.
  • Different groups consuming the same topic are fully independent — they each maintain their own offsets and do not interfere with each other.
This model lets you run multiple downstream applications (billing, analytics, notifications) off the same Kafka topic without any coordination between them.
Topic: orders (3 partitions)

Consumer Group A (billing):
  Consumer A1 → Partition 0
  Consumer A2 → Partition 1
  Consumer A3 → Partition 2

Consumer Group B (analytics):
  Consumer B1 → Partition 0, 1
  Consumer B2 → Partition 2
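The assignment in the diagram above can be reproduced with a range-style sketch (hypothetical `range_assign` helper; Kafka's real assignors, such as range, round-robin, and cooperative sticky, are more involved):

```python
def range_assign(partitions, consumers):
    """Split partitions into contiguous ranges, one range per consumer.
    Earlier consumers absorb the remainder when counts don't divide evenly."""
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

# Group A: three consumers, three partitions -> one each.
assert range_assign([0, 1, 2], ["A1", "A2", "A3"]) == {"A1": [0], "A2": [1], "A3": [2]}
# Group B: two consumers -> the first takes the extra partition.
assert range_assign([0, 1, 2], ["B1", "B2"]) == {"B1": [0, 1], "B2": [2]}
```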

Producer patterns and partition selection

A producer publishes messages to a topic. When writing, it must determine which partition to target. Kafka applies three rules in order:
  1. Explicit partition: the producer specifies a partition number directly.
  2. Key-based: if the message carries a key, Kafka hashes the key and maps the result to a partition. All messages with the same key land in the same partition, preserving per-key ordering.
  3. Round-robin: if neither a partition nor a key is specified, Kafka distributes messages across partitions in a round-robin fashion.
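The three rules can be sketched as a small partitioner function. This is illustrative only: real Kafka hashes keys with murmur2 (CRC32 stands in here), and recent clients use a sticky strategy for keyless messages rather than strict round-robin.

```python
import zlib

def choose_partition(num_partitions, sequence, partition=None, key=None):
    """Apply the three partition-selection rules in order."""
    if partition is not None:            # 1. explicit partition wins
        return partition
    if key is not None:                  # 2. hash the key to a partition
        return zlib.crc32(key.encode()) % num_partitions
    return sequence % num_partitions     # 3. round-robin for keyless messages

# Rule 1: the explicit partition is used verbatim.
assert choose_partition(3, 0, partition=2) == 2
# Rule 2: the same key always maps to the same partition, preserving per-key order.
assert choose_partition(3, 0, key="order-1234") == choose_partition(3, 7, key="order-1234")
```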

Write flow

Producer → asks cluster for leader of target partition
        → sends message to leader broker
        → leader writes to local disk
        → followers pull from leader and write locally
        → followers send ACK to leader
        → leader sends ACK to producer
The number of follower ACKs the producer waits for before considering a message “sent” is controlled by the acks setting.
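A toy predicate for the flow above (simplifying assumption: a fixed replica count stands in for Kafka's dynamic in-sync replica set):

```python
def message_acknowledged(acks, leader_written, follower_ack_count, replication_factor):
    """Decide when the producer may consider a message 'sent'."""
    if acks == 0:
        return True                    # fire-and-forget: never wait
    if acks == 1:
        return leader_written          # only the leader's local write matters
    # acks="all": leader plus every follower replica must have the message
    return leader_written and follower_ack_count >= replication_factor - 1

assert message_acknowledged(0, False, 0, 3) is True       # no waiting at all
assert message_acknowledged("all", True, 1, 3) is False   # one follower still behind
assert message_acknowledged("all", True, 2, 3) is True    # both followers caught up
```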

Delivery guarantees

At-most-once: the producer sends each message once and does not retry on failure. Messages may be lost but are never duplicated. Configuration: acks=0 (fire-and-forget). When to use: metrics or log aggregation where a small percentage of loss is acceptable and throughput is paramount.

At-least-once: the producer retries until the broker acknowledges, and the consumer commits its offset only after processing. Messages are never lost but may be duplicated. Configuration: acks=all with retries enabled and manual offset commits. When to use: most business workloads, paired with idempotent processing on the consumer side.

Exactly-once: the idempotent producer de-duplicates retries, and transactions make read-process-write cycles atomic within Kafka. Configuration: enable.idempotence=true plus the transactional APIs (transactional.id), or Kafka Streams with processing.guarantee=exactly_once_v2. When to use: stream-processing pipelines where duplicates are unacceptable, such as financial aggregation.

Consumer groups and offset management

Offsets

Every consumer in a group tracks its read position in each partition using an offset. After processing a batch of messages, the consumer commits its offset back to Kafka (stored in the internal __consumer_offsets topic). On restart or rebalance, the consumer resumes from its last committed offset. Two commit strategies:
Strategy | How it works | Risk
Auto-commit | Kafka periodically commits the offset automatically (enable.auto.commit=true) | Message loss: auto-commit may advance the offset before processing completes; if the consumer crashes, that message is skipped
Manual commit | Your code calls commitSync() or commitAsync() after processing | Duplicate delivery: if the consumer crashes after processing but before committing, it reprocesses on restart
For at-least-once semantics, always use manual commit after processing.
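The two failure modes can be replayed with a toy consumer loop (hypothetical `run` helper; `crash_at` marks the offset at which the consumer dies):

```python
def run(messages, commit_before_processing, crash_at):
    """Simulate one consumer life until a crash; return what was processed
    and the committed offset a restarted consumer would resume from."""
    processed, committed = [], 0
    for offset, msg in enumerate(messages):
        if commit_before_processing:
            committed = offset + 1   # auto-commit can run before the work is done
        if offset == crash_at:
            break                    # consumer crashes here
        processed.append(msg)
        if not commit_before_processing:
            committed = offset + 1   # manual commit only after processing succeeds
    return processed, committed

# Auto-commit: the committed position (2) skips past "b", which was never
# processed, so "b" is lost on restart.
assert run(["a", "b", "c"], True, crash_at=1) == (["a"], 2)
# Manual commit: the restart resumes at offset 1, so "b" is not lost
# (at-least-once; it may be processed twice in other crash timings).
assert run(["a", "b", "c"], False, crash_at=1) == (["a"], 1)
```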

Rebalancing

When consumers join or leave a group, or when new partitions are added, Kafka performs a rebalance to redistribute partitions among the active consumers. During a rebalance, all consumers in the group pause consumption. Minimize rebalance frequency by:
  • Setting session.timeout.ms and heartbeat.interval.ms appropriately — a consumer that misses its heartbeat is considered dead and triggers a rebalance.
  • Using static group membership (group.instance.id) so that a restarting consumer re-claims its previous partitions without triggering a full rebalance.
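Both points expressed as a consumer configuration fragment (key names follow the Java client; the values are illustrative, not recommendations):

```python
# Illustrative settings for reducing rebalance frequency; tune for your workload.
consumer_config = {
    "group.id": "billing",
    "group.instance.id": "billing-worker-1",  # static membership: survives restarts
    "session.timeout.ms": 45_000,             # how long a silent consumer is tolerated
    "heartbeat.interval.ms": 15_000,          # typically about a third of the session timeout
    "enable.auto.commit": False,              # manual commit, per the offsets section above
}
```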

Message ordering guarantees

Kafka guarantees ordering within a partition only. If your application requires all events for a given entity (e.g., all events for order_id=1234) to be processed in order, use a message key equal to the entity ID. Kafka’s key-based routing ensures all messages for that key land in the same partition and are consumed in order.

How to avoid message loss

Address reliability at each of the three stages:

Production

  • Use acks=all to require all in-sync replicas to acknowledge before the producer considers a message written.
  • Set retries to a high value and enable.idempotence=true to deduplicate retries.
  • Wrap send calls in try-catch; alert on repeated failures.
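The production-side settings as a config fragment (Java-client key names; treat the exact values as a starting point, not a prescription):

```python
# Illustrative no-loss producer configuration.
producer_config = {
    "acks": "all",                   # wait for all in-sync replicas
    "enable.idempotence": True,      # broker de-duplicates retried sends
    "retries": 2_147_483_647,        # retry transient failures effectively forever
    "delivery.timeout.ms": 120_000,  # overall bound on retrying a single send
}
```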

Storage

  • Set min.insync.replicas to at least 2 so that data is on multiple brokers before the leader acknowledges.
  • Use replication.factor=3 for critical topics.
  • Deploy brokers across availability zones.

Consumption

  • Commit offsets only after your business logic completes successfully.
  • Implement idempotent consumers using a unique message ID stored in a database to detect and skip redeliveries.
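A minimal sketch of the idempotent-consumer idea, where a Python set stands in for the durable store (in production, a database table with a unique constraint on the message ID):

```python
class IdempotentConsumer:
    """Skip redeliveries by remembering which message IDs were already applied."""

    def __init__(self):
        self.seen_ids = set()   # stand-in for a durable processed-IDs table
        self.applied = []

    def handle(self, message_id, payload):
        if message_id in self.seen_ids:
            return False        # duplicate delivery: skip silently
        self.applied.append(payload)   # business logic goes here
        self.seen_ids.add(message_id)
        return True

c = IdempotentConsumer()
assert c.handle("m-1", "charge $10") is True
assert c.handle("m-1", "charge $10") is False   # redelivery is ignored
assert c.applied == ["charge $10"]              # the charge ran exactly once
```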

Kafka vs. RabbitMQ

Dimension | Kafka | RabbitMQ
Model | Publish/subscribe log | Queue model (point-to-point)
Message retention | Configurable retention period; messages persist after consumption | Messages deleted after acknowledgement
Replay | Consumers can seek to any offset and re-read old messages | Not possible after acknowledgement
Throughput | Very high (millions of messages/second per cluster) | High, but lower than Kafka at scale
Ordering | Guaranteed within a partition | Guaranteed within a single queue
Consumer model | Pull (consumers request messages from brokers) | Push (broker delivers to consumers)
Use case fit | Event streaming, CDC, log aggregation, large-scale pipelines | Task queues, RPC patterns, routing via exchanges
Operational complexity | Higher (ZooKeeper or KRaft, partition management) | Lower (simpler single-node setup)
Choose Kafka when you need durable, replayable event logs consumed by multiple independent systems. Choose RabbitMQ when you need flexible routing, per-message TTL, or a simpler operational footprint for task-queue workloads.

Common patterns

Event sourcing

Instead of storing only the current state of a record, you store an ordered log of every event that changed that record. Kafka serves as the event log. Any downstream service can rebuild the current state by replaying the partition from offset 0. Example: an account-events topic receives AccountOpened, FundsDeposited, FundsWithdrawn events. A consumer replays the log to compute the current balance.
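The replay step can be sketched directly (event names follow the account-events example above; the tuple format is an assumption for illustration):

```python
def replay_balance(events):
    """Rebuild current state by folding over the event log from offset 0."""
    balance = 0
    for event_type, amount in events:
        if event_type == "FundsDeposited":
            balance += amount
        elif event_type == "FundsWithdrawn":
            balance -= amount
        # AccountOpened changes no balance in this sketch
    return balance

log = [("AccountOpened", 0), ("FundsDeposited", 100), ("FundsWithdrawn", 30)]
assert replay_balance(log) == 70   # current state derived purely from the log
```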

Change-data capture (CDC)

Tools like Debezium or Canal connect to a database’s replication log (MySQL binlog, PostgreSQL WAL) and publish every row-level change as a Kafka message. Downstream services consume these change events to keep caches, search indexes, or analytics systems in sync without modifying application code.
MySQL binlog → Debezium → Kafka topic: db.orders.cdc → Elasticsearch consumer
                                                       → Redis cache consumer
                                                       → Analytics consumer

Stream processing

Frameworks like Apache Flink, Kafka Streams, and Apache Spark Structured Streaming read from Kafka topics, apply transformations (filtering, aggregation, joining), and write results to output topics or external sinks. This lets you compute real-time metrics, detect anomalies, or enrich events with reference data as they flow through the system.

Handling message backlog

If consumers fall behind producers, the backlog grows. Resolution strategies:
  1. Find the root cause: check for processing bugs causing retries or unusually slow business logic.
  2. Optimize the consumer: reduce per-message processing time by batching database writes or parallelizing work within a single consumer.
  3. Scale horizontally: increase the partition count on the topic and add more consumer instances to the group. Partition count must increase first — a consumer with no partition assigned contributes nothing.
You can increase a topic’s partition count at any time, but this changes the key-to-partition mapping for any new messages. Consumers that rely on key-based ordering guarantees may see ordering violations for in-flight messages during the transition. Plan partition counts carefully at topic creation time.
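Backlog size (consumer lag) is, per partition, the log-end offset minus the group's committed offset; the sketch below sums it across partitions (hypothetical `total_lag` helper; in practice tools like kafka-consumer-groups.sh report this per group):

```python
def total_lag(log_end_offsets, committed_offsets):
    """Sum per-partition lag; a partition with no committed offset counts
    from the beginning of the log in this sketch."""
    return sum(
        log_end_offsets[p] - committed_offsets.get(p, 0)
        for p in log_end_offsets
    )

# Partition 0 is 20 messages behind, partition 1 is fully caught up.
assert total_lag({0: 120, 1: 80}, {0: 100, 1: 80}) == 20
```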