Kafka Design


While developing Kafka, the main focus was to provide the following:

  •   An API for producers and consumers that supports custom implementations
  •   Low network and storage overhead, with messages persisted on disk
  •   High throughput, supporting millions of messages for both publishing and subscribing (for example, real-time log aggregation or data feeds)
  •   A distributed and highly scalable architecture that handles low-latency delivery
  •   Auto-balancing of multiple consumers in the case of failure
  •   Guaranteed fault tolerance in the case of server failures

Kafka design fundamentals

[Figure: kafka_arch]

Replication in Kafka

[Figure: kafka_adv]

Kafka supports the following replication modes:

Synchronous replication

In synchronous replication, a producer first identifies the lead replica from ZooKeeper and publishes the message to it. The message is written to the log of the lead replica, and all the followers of the lead then start pulling it; because each follower pulls over a single channel, the order of messages is preserved. Each follower replica sends an acknowledgement to the lead replica once it has written the message to its own log. When replication is complete and all expected acknowledgements have been received, the lead replica sends an acknowledgement to the producer. On the consumer’s side, all messages are pulled from the lead replica.
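The flow described above can be sketched as a small in-memory simulation. This is only a toy model, not Kafka's implementation: the names `Replica`, `SyncPartition`, `publish` and `consume` are hypothetical, there is no network or ZooKeeper lookup, and followers are assumed never to fail.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a partition; its log is an append-only list."""
    name: str
    log: list = field(default_factory=list)


class SyncPartition:
    """Toy model of one partition: a lead replica plus its followers."""

    def __init__(self, lead, followers):
        self.lead = lead
        self.followers = followers

    def publish(self, message):
        """Return True (the producer ack) only after every follower has acked."""
        # 1. The lead replica writes the message to its own log first.
        self.lead.log.append(message)
        end = len(self.lead.log)

        # 2. Each follower pulls what it is missing from the lead, in log
        #    order, appends it to its own log, and acknowledges.
        acks = 0
        for follower in self.followers:
            follower.log.extend(self.lead.log[len(follower.log):end])
            acks += 1

        # 3. The lead acknowledges the producer once all acks are in.
        return acks == len(self.followers)

    def consume(self, offset):
        """Consumers read from the lead replica only."""
        return self.lead.log[offset:]


partition = SyncPartition(Replica("lead"), [Replica("f1"), Replica("f2")])
acked = [partition.publish(m) for m in ("m1", "m2", "m3")]
```

In this sketch every follower log ends up identical to the lead's log before the producer sees an acknowledgement, which is the property that lets a follower take over without losing acknowledged messages.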

Asynchronous replication

The only difference in this mode is that the lead replica sends the acknowledgement to the producer as soon as it has written the message to its local log, without waiting for acknowledgements from the follower replicas. The downside is that this mode does not guarantee message delivery if a broker fails: a message that was acknowledged but not yet copied to any follower can be lost along with the lead replica.
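The downside can be made concrete with a minimal simulation. As before, this is a sketch under stated assumptions rather than Kafka's actual code: `AsyncPartition`, `replicate` and `fail_lead` are hypothetical names, and promoting the first follower is a simplification of leader election.

```python
class AsyncPartition:
    """Toy model: the lead acks the producer before followers have copied."""

    def __init__(self, follower_count):
        self.lead_log = []
        self.follower_logs = [[] for _ in range(follower_count)]

    def publish(self, message):
        # The lead writes to its local log and acknowledges immediately,
        # without waiting for any follower.
        self.lead_log.append(message)
        return True

    def replicate(self):
        # Followers pull whatever they are missing at some later point.
        for log in self.follower_logs:
            log.extend(self.lead_log[len(log):])

    def fail_lead(self):
        # Simplified failover: the first follower becomes the new lead.
        # Returns the messages that were acknowledged but are now gone.
        new_lead = self.follower_logs.pop(0)
        lost = self.lead_log[len(new_lead):]
        self.lead_log = new_lead
        return lost


partition = AsyncPartition(follower_count=2)
partition.publish("m1")
partition.replicate()             # m1 reaches the followers
acked = partition.publish("m2")   # acknowledged, but not yet replicated
lost = partition.fail_lead()      # lead fails before the next pull
```

Here the producer received an acknowledgement for `m2`, yet the message does not exist on the new lead. In current Kafka producer clients this trade-off is typically selected through the `acks` setting, where `acks=all` waits for the in-sync replicas and `acks=1` waits for the lead replica only.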

 



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


