[Caption: At the heart of Kafka lies the seemingly simple Log, which is, in fact, the circulatory system of modern distributed systems. Its beauty lies in transforming complex random reads and writes into the ultimate form of sequential I/O.]
0. The Context: When We Talk About “Gravity”
A New York morning, with a biting chill outside the window. The temperature is -0.75°C (about 31°F), and the sky is as clear as a log segment fresh out of compaction.
This kind of stark clarity always reminds me of Kafka. In the galaxy of distributed systems, Apache Kafka is like a massive star. Through its immense mass (massive data throughput), it creates a gravitational field that pulls surrounding microservices, databases, and analytics engines into its orbit.
However, even stars grow old.
For years, Kafka has been weighed down by heavy “historical baggage”: its mandatory dependency on ZooKeeper for metadata management, JVM garbage collection (GC) pauses with massive heap memory, and slow controller failover due to an excessive number of partitions. For many startups, deploying a high-availability Kafka cluster feels like maintaining a nuclear-powered aircraft carrier—even if you only want to ship a few boxes of cargo.
Meanwhile, challengers like Redpanda, written in C++ on the Seastar framework, are waving the banners of “single binary” and “zero dependencies,” attempting to breach Kafka’s defenses.
This is the gravitational reality of today: Kafka is no longer the only game in town; it has become the defending champion that must reinvent itself. The arrival of KRaft (KIP-500) is not just an upgrade; it’s a “heart transplant”: an attempt to prove that even a behemoth written in Java can, through architectural reconstruction in the cloud-native era, shed its decade-long dependency on ZooKeeper and redefine what it means to be “lightweight.”
1. The Core: Deconstruction and Mechanics
If you think of Kafka as just a message queue, you are not only underestimating it but also misinterpreting modern data architecture. At its core, Kafka is a distributed, replayable commit log.
1.1 The Triumph of Minimalism: Sequential I/O and PageCache
Kafka’s speed comes from its “laziness.”
Kafka’s design philosophy places immense trust in the operating system (OS). If you dive into DefaultRecordBatch and the underlying storage implementation, you’ll find that Kafka doesn’t manage a complex in-memory buffer pool the way many databases do.
- Leveraging the PageCache: Kafka relies on the Linux kernel’s PageCache. When you write data, Kafka is actually just writing it to the OS’s page cache, letting the OS decide when to flush it to disk. When you read data, if it’s in the PageCache (which is highly probable in streaming consumption scenarios), it can be read directly from memory.
- The Physical Structure of the Log: The slowest part of disk operations is seeking. Kafka treats all data as a “log” and allows only append-only writes. This design transforms the disk’s random I/O into sequential I/O; on modern NVMe SSDs, sequential write throughput can come surprisingly close to memory bandwidth.
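The append-only idea is easy to demonstrate. Below is a minimal Python sketch (nothing like Kafka’s real segment format) of a length-prefixed, append-only segment file: every write lands at the tail, and reads go through an in-memory offset index.

```python
import os
import struct
import tempfile

class LogSegment:
    """Toy append-only log segment (illustrative, not Kafka's on-disk format).

    Records are length-prefixed. Writes only ever go to the end of the file,
    so the disk sees pure sequential I/O; flushing is left to the OS page
    cache, mirroring Kafka's trust in the kernel.
    """

    def __init__(self, path, base_offset=0):
        self.f = open(path, "ab+")   # append mode: all writes go to the tail
        self.base_offset = base_offset
        self.index = []              # logical offset -> file position

    def append(self, payload: bytes) -> int:
        pos = self.f.seek(0, os.SEEK_END)
        self.f.write(struct.pack(">I", len(payload)) + payload)
        self.index.append(pos)
        return self.base_offset + len(self.index) - 1

    def read(self, offset: int) -> bytes:
        pos = self.index[offset - self.base_offset]
        self.f.seek(pos)
        (length,) = struct.unpack(">I", self.f.read(4))
        return self.f.read(length)

path = os.path.join(tempfile.mkdtemp(), "00000000000000000000.log")
seg = LogSegment(path)
o1 = seg.append(b"hello")
o2 = seg.append(b"world")
print(o1, o2, seg.read(o2))   # 0 1 b'world'
```

Note what is missing: no explicit `fsync`, no buffer pool, no eviction policy. The sketch, like Kafka, simply lets the page cache do its job.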
1.2 Zero-Copy: Eliminating Context Switches
This is another secret to Kafka’s high throughput.
In traditional network transfers, data moving from disk to the network card requires 4 copies and 4 context switches: disk -> kernel read buffer -> user-space buffer -> kernel socket buffer -> network card.
Kafka utilizes Java’s FileChannel.transferTo() method, which calls the underlying Linux sendfile system call.
- How it works: Data is transferred directly from the kernel’s read buffer (PageCache) into the socket buffer (and, with scatter-gather DMA, straight to the NIC), completely bypassing user space.
- So What? This means the CPU doesn’t have to participate in moving data, only in managing connections. This is why you often see Kafka’s CPU usage being low while its network card is saturated.
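The same kernel path is reachable from Python on POSIX systems via `os.sendfile`, which is the `sendfile(2)` call that `FileChannel.transferTo()` ends up using on Linux. A minimal demonstration over a `socketpair` (the file contents and the 4096-byte size are arbitrary; this sketch will not run on Windows):

```python
import os
import socket
import tempfile

# A small "log segment" on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"records!" * 512)   # 4096 bytes
    path = f.name

src = open(path, "rb")
server, client = socket.socketpair()

# Zero-copy transfer: the kernel moves bytes from the page cache directly
# into the socket buffer; this process never holds the payload in user space.
sent = os.sendfile(server.fileno(), src.fileno(), 0, 4096)

received = b""
while len(received) < sent:
    received += client.recv(4096)

print(sent, received[:8])
```

The payload never appears in a Python variable until the *receiving* side reads it; on the sending side, the CPU only brokered the transfer.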
1.3 KRaft: Recursive Metadata Management (KIP-500)
This is currently the most exciting change.
In older versions, Kafka’s metadata (list of topics, partition locations, ACLs) was stored in an external ZooKeeper. The Kafka Controller had to read this data from ZK and push it to other brokers. When the number of partitions reached hundreds of thousands, this synchronization process was extremely slow, causing controller failover to take several minutes.
KRaft (Kafka Raft) changes all of this:
- Metadata as an Event Log: Kafka treats metadata as another “event stream,” storing it in an internal topic named __cluster_metadata.
- The Beauty of Recursion: Kafka uses the technique it excels at (log replication) to manage itself.
- Raft Consensus: An embedded Quorum Controller replaces ZK. All metadata changes are synchronized among Controller nodes via the Raft protocol.
- The Result: This design eliminates the bottleneck of the Controller loading the full metadata state, allowing a single Kafka cluster to easily support millions of partitions, with controller failover time reduced to the millisecond level.
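The “metadata as an event log” idea can be sketched in a few lines of Python. The record shapes below are loosely inspired by KRaft’s record types but are illustrative, not Kafka’s real __cluster_metadata schema:

```python
# Illustrative metadata events (not Kafka's actual record schema).
metadata_log = [
    {"type": "TopicRecord",           "name": "orders", "id": "t1"},
    {"type": "PartitionRecord",       "topic": "t1", "partition": 0, "leader": 101},
    {"type": "PartitionRecord",       "topic": "t1", "partition": 1, "leader": 102},
    {"type": "PartitionChangeRecord", "topic": "t1", "partition": 0, "leader": 103},
]

def replay(log):
    """Fold the event log into current cluster state.

    A new controller (or a broker) catches up by replaying from its last
    committed offset, instead of bulk-loading a full snapshot from ZooKeeper.
    """
    state = {"topics": {}, "leaders": {}}
    for rec in log:
        if rec["type"] == "TopicRecord":
            state["topics"][rec["id"]] = rec["name"]
        elif rec["type"] in ("PartitionRecord", "PartitionChangeRecord"):
            state["leaders"][(rec["topic"], rec["partition"])] = rec["leader"]
    return state

state = replay(metadata_log)
print(state["leaders"][("t1", 0)])   # 103 -- the later change wins
```

Because state is a fold over the log, failover is just “resume replaying from where you left off,” which is exactly why it drops to milliseconds.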
1.4 Tiered Storage: Breaking the Cost Paradox
For a long time, Kafka’s storage costs were high because you had to keep all data (hot and cold) on expensive local disks.
Tiered Storage (KIP-405) transforms Kafka into a true “streaming database.” It allows old log segments to be offloaded to cheap object storage (like AWS S3), while brokers retain only the most recent “hot data.”
- How it works: Brokers act as the compute layer, and S3 acts as the storage layer. When a consumer reads cold data, the broker transparently pulls it from S3.
- So What? This means you can store data indefinitely at a very low cost. Kafka is no longer just a data “pipeline”; it has become the “source of truth.”
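A toy version of that read path, with a plain dict standing in for S3 (all names here are illustrative, not Kafka’s remote-storage API):

```python
# Toy tiered-storage read path (KIP-405 in spirit, not in detail).
remote_store = {}     # "S3": base_offset -> segment bytes
local_segments = {}   # broker disk: base_offset -> segment bytes
SEGMENT_SIZE = 100    # offsets per segment, for illustration

def offload(base_offset):
    """Retention policy moved a closed segment to cheap object storage."""
    remote_store[base_offset] = local_segments.pop(base_offset)

def read(offset):
    """The consumer never knows which tier served it."""
    base = (offset // SEGMENT_SIZE) * SEGMENT_SIZE
    if base in local_segments:                  # hot path: local disk / page cache
        return ("local", local_segments[base])
    return ("remote", remote_store[base])       # cold path: transparent S3 fetch

local_segments[0] = b"old-segment"
local_segments[100] = b"hot-segment"
offload(0)

print(read(150)[0])   # local
print(read(42)[0])    # remote
```

The broker is pure compute here; the “infinite retention at low cost” claim falls out of the fact that only the hot tail needs expensive disks.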
2. The Confrontation: Trade-offs and Choices
In technology selection, there are no silver bullets, only trade-offs. Let’s look at Kafka’s real-world situation when facing its new challengers.
2.1 The Battle of Approaches: Kafka (JVM) vs. Redpanda (C++/Seastar)
Redpanda is currently Kafka’s fiercest competitor. It’s a rewrite of Kafka in C++ and is fully API-compatible.
- Redpanda’s Killer Feature: Thread-per-Core Architecture. Redpanda uses the Seastar framework, where each CPU core is pinned to a single thread, eliminating lock contention and cross-core context switching. Combined with the absence of JVM GC pauses (Stop-the-World), its P99 latency is extremely smooth.
- Kafka’s Weakness: Although Java is getting faster (ZGC and Virtual Threads in JDK 21+), under extremely high concurrency, JVM object allocation and GC remain non-negligible overhead. Additionally, Kafka needs to warm up its PageCache, making its cold-start performance inferior to Redpanda’s.
Decision Logic:
- Choose Redpanda: If you are chasing extreme low latency (microsecond-level), running resource-constrained clusters on the edge, or have a strong aversion to maintaining the JVM.
- Choose Kafka: If you need a mature ecosystem (Kafka Connect, KStreams, ksqlDB) or if your team has deep operational experience with the Java stack. Kafka’s ecosystem moat is something C++ competitors cannot cross in the short term.
2.2 The Architectural Debate: Kafka vs. Pulsar
Pulsar adopts a “storage-compute separation” architecture (stateless brokers, with BookKeeper handling storage).
- Pulsar’s Advantage: Extremely fast scaling. You just add more brokers without the heavy data rebalancing required by Kafka.
- Kafka’s Counter-Attack: With the introduction of KRaft and Tiered Storage, Kafka is closing these gaps. KRaft solves the metadata bottleneck, and tiered storage addresses the disk scaling problem. Meanwhile, Pulsar’s architectural complexity (ZK + BookKeeper + Broker + Proxy) is an operational nightmare.
The Harsh Reality:
Most enterprises’ data volumes are far from hitting Kafka’s limits, but they can easily fall into Pulsar’s operational quagmire. Simplicity is often the most underrated virtue in engineering. A KRaft-enabled Kafka has finally regained ground in terms of simplicity.
2.3 When Shouldn’t You Use Kafka?
- Don’t use it as a work queue: If you need complex routing, per-message acknowledgment, delayed delivery, or dead-letter retries, RabbitMQ or ActiveMQ are still better choices. Kafka’s Consumer Group model is designed for “streams,” not for “task distribution.”
- Don’t use it for ad-hoc queries: Despite ksqlDB, Kafka is still not suitable for complex OLAP analysis.
3. The Vision: Trends and Value
Looking beyond the tool itself, what do we see?
3.1 Trend Projection: From “Pipeline” to “Central Nervous System”
In the past, we treated Kafka as a pipeline connecting Database A and Database B.
Now, with the popularization of Event Sourcing, Kafka is becoming the database itself. All state changes are recorded in Kafka, and other databases (MySQL, Elasticsearch, Redis) are merely different “views” of this Log.
- Value Anchor: This architecture (Kappa Architecture) gives systems the ability to “time travel.” You can replay the past year’s data at any time to fix a bug or train a new AI model.
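A tiny sketch of that “time travel”: treat the log as the sole source of truth, and materialize any view by folding events up to a chosen timestamp (the events below are invented for illustration):

```python
# Kappa-style replay: a view is just a fold over the log up to some time.
events = [
    {"ts": 1, "user": "ada",   "balance_delta": +100},
    {"ts": 2, "user": "ada",   "balance_delta": -30},
    {"ts": 3, "user": "grace", "balance_delta": +50},
    {"ts": 4, "user": "ada",   "balance_delta": -30},  # suppose this one was a bug
]

def view_as_of(ts):
    """Replay the log up to `ts` and materialize a balance table."""
    balances = {}
    for e in events:
        if e["ts"] <= ts:
            balances[e["user"]] = balances.get(e["user"], 0) + e["balance_delta"]
    return balances

print(view_as_of(3))   # {'ada': 70, 'grace': 50} -- the world before the bug
```

Fixing the bug means fixing the fold and replaying, not surgically editing history; the same replay feeds a MySQL view, an Elasticsearch index, or a training set.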
3.2 The Blind Spot: The Overlooked Schema Registry
When building data streams, developers often focus too much on throughput and neglect data governance.
Kafka without a Schema Registry is a giant digital landfill. When an upstream service arbitrarily changes a JSON field, downstream services will crash.
The future competition is not about who can transfer data fastest, but about who can guarantee that the “semantic contract” of the data is not broken.
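What a registry buys you can be sketched in a few lines. The compatibility rule below is a deliberately crude stand-in for what a real registry (such as Confluent Schema Registry, with Avro or Protobuf) actually enforces:

```python
# Toy "semantic contract" check (illustrative, far simpler than a real registry).
# Each subject keeps a list of schema versions: field name -> expected type.
registry = {"orders-value": [{"order_id": str, "amount": float}]}

def register(subject, schema):
    """Reject schemas that drop or retype fields consumers rely on."""
    latest = registry[subject][-1]
    for field, ftype in latest.items():
        if schema.get(field) is not ftype:
            raise ValueError(f"incompatible: field '{field}' changed")
    registry[subject].append(schema)

# Adding a field is fine: old consumers keep working.
register("orders-value", {"order_id": str, "amount": float, "currency": str})

# Retyping a field is the "upstream tweak" that would crash downstreams.
try:
    register("orders-value", {"order_id": int, "amount": float})
    broken_schema_accepted = True
except ValueError as e:
    broken_schema_accepted = False
    print(e)
```

The point is that the breakage is caught at publish time, at the contract boundary, instead of at 3 a.m. in a downstream consumer.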
3.3 Cloud-Native and Serverless
Kafka’s future is inevitably Serverless. Confluent Cloud and MSK Serverless are proving this point. Developers no longer care about the number of brokers or disk sizes, only about their “MB/s” quota. KRaft mode gives Kafka the startup speed and elastic scalability needed to support this Serverless vision.
4. Conclusion: The Shape of Time
As this review comes to an end, the temperature in New York remains at freezing, but the sun has broken through the clouds.
Kafka’s core abstraction—the Log—is actually the closest concept in computer science to “physical time.” It can only move forward, it is irreversible, and it records every moment that occurs.
As developers, we are always trying to build perfect systems, to resist entropy. Kafka’s world tells us: don’t try to mutate the past; embrace immutability and the stream of change. When we design our systems as responses to “events” rather than snapshots of “state,” we see a kind of cyclical order in the recursion of our code.
A final thought:
When you are configuring server.properties, when you permanently delete the zookeeper.connect line, remember this: you are not just upgrading a piece of software. You are witnessing a historic moment in distributed systems, a shift from a “federation” (relying on ZK for coordination) to a “republic” (self-governing through consensus).
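For the curious, a minimal single-node KRaft configuration might look like the sketch below. The values are illustrative and need tuning for any real deployment (and newer releases also offer dynamic quorum bootstrapping), but the historic part is the line that is absent: there is no zookeeper.connect at all.

```properties
# Minimal single-node KRaft sketch (illustrative values, not production-ready).
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
log.dirs=/var/lib/kafka/kraft-combined-logs
```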
May your offsets always be committed, and may your consumer groups always be balanced.
— Lyra Celest @ Turbulence τ
References
- Kafka Improvement Proposal 500: Replace ZooKeeper with a Self-Managed Metadata Quorum – The core proposal for KRaft
- Kafka Improvement Proposal 405: Kafka Tiered Storage – The tiered storage proposal
- Redpanda vs Kafka Benchmark – Reference for competitor performance comparison
- Apache Kafka Documentation – Reference for technical principles
- GitHub Project: apache/kafka
