While developing Kafka, the main focus was to provide the following:
- An API for producers and consumers to support custom implementations
- Low overheads for network and storage with message persistence on disk
- High throughput, supporting millions of messages for both publishing and subscribing (for example, real-time log aggregation or data feeds)
- Distributed and highly scalable architecture to handle low-latency delivery
- Auto-balancing multiple consumers in the case of failure
- Guaranteed fault-tolerance in the case of server failures
Kafka design fundamentals
Replication in Kafka
Kafka supports the following replication modes: synchronous replication and asynchronous replication.
In synchronous replication, a producer first identifies the lead replica from ZooKeeper and publishes the message to it. As soon as the message is published, it is written to the log of the lead replica, and all the followers of the lead start pulling the message; because a single channel is used, the order of messages is preserved. Each follower replica sends an acknowledgement to the lead replica once the message is written to its own log. Once replication is complete and all expected acknowledgements are received, the lead replica sends an acknowledgement to the producer. On the consumer’s side, all messages are pulled from the lead replica.
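The synchronous flow can be sketched as a small in-memory simulation. This is only an illustration of the acknowledgement order described above, not Kafka code: the `Replica` and `Follower` classes and the `publish_sync` function are hypothetical names invented for this sketch, and the network, ZooKeeper lookup, and disk persistence are all left out.

```python
class Replica:
    """Hypothetical in-memory replica with an append-only log."""

    def __init__(self, name):
        self.name = name
        self.log = []


class Follower(Replica):
    def pull_from(self, lead):
        # Copy every entry not yet replicated, preserving the lead's order.
        self.log.extend(lead.log[len(self.log):])
        # The acknowledgement is the number of entries this follower now holds.
        return len(self.log)


def publish_sync(lead, followers, message):
    """Write to the lead, collect every follower ack, then ack the producer."""
    lead.log.append(message)
    acks = [follower.pull_from(lead) for follower in followers]
    # The producer is acknowledged only if every follower has caught up.
    return all(ack == len(lead.log) for ack in acks)


lead = Replica("lead")
followers = [Follower("follower-1"), Follower("follower-2")]
results = [publish_sync(lead, followers, m) for m in ["m1", "m2", "m3"]]

print(results)
print([follower.log for follower in followers])
```

Because each follower copies entries from the lead's log in sequence, every follower log ends up identical to the lead's, which mirrors the ordering guarantee of the single channel.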
In asynchronous replication, the only difference is that, as soon as the lead replica writes the message to its local log, it sends the acknowledgement to the producer and does not wait for acknowledgements from the follower replicas. The downside is that this mode does not guarantee message delivery if a broker fails: a message that the lead has acknowledged but no follower has pulled yet can be lost along with the failed broker.
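The risk of acknowledging before the followers have caught up can be shown with a minimal simulation. As before, this is a sketch under simplifying assumptions rather than Kafka's implementation: `Replica`, `publish_async`, and `replicate` are hypothetical names, and a broker failure is modelled simply by promoting the follower and discarding the old lead.

```python
class Replica:
    """Hypothetical in-memory replica with an append-only log."""

    def __init__(self):
        self.log = []


def publish_async(lead, message):
    """Write to the lead's log and acknowledge the producer immediately."""
    lead.log.append(message)
    return True  # no follower acknowledgement is awaited


def replicate(lead, follower):
    """Background pull: the follower copies entries it does not have yet."""
    follower.log.extend(lead.log[len(follower.log):])


lead, follower = Replica(), Replica()

publish_async(lead, "m1")
replicate(lead, follower)          # the follower catches up on "m1"

acked = publish_async(lead, "m2")  # the producer receives an acknowledgement
# The lead's broker fails here, before the follower pulls "m2".
new_lead = follower
lost = [m for m in lead.log if m not in new_lead.log]

print(acked, lost)  # prints: True ['m2']
```

The producer was told that "m2" had been accepted, yet the promoted replica never received it. That gap is the trade-off for the lower publish latency of this mode.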