There are a number of ways in which Kafka can be used in an architecture. This section discusses some of the popular use cases for Apache Kafka and the well-known companies that have adopted Kafka. The following are the popular Kafka use cases:
Log aggregation
This is the process of collecting physical log files from servers and putting them in a central place (a file server or HDFS) for processing. Kafka provides a clean abstraction of log or event data as a stream of messages, which removes any dependency on file details. It also gives lower-latency processing and support for multiple data sources and distributed data consumption.
Stream processing
Kafka can be used where collected data undergoes processing at multiple stages. For example, raw data is consumed from topics and enriched or transformed into new Kafka topics for further consumption. Such multistage processing is also called stream processing.
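The staged flow described here can be sketched without a running cluster. The following is a minimal in-memory illustration, not Kafka client code: the `topics` dictionary stands in for Kafka topics, and the topic names `raw-events` and `enriched-events` as well as the `enrich` function are hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for Kafka topics: topic name -> list of messages.
topics = defaultdict(list)


def enrich(raw_event):
    """Hypothetical transformation stage: normalise the user and tag the event."""
    return {
        "user": raw_event["user"].lower(),
        "action": raw_event["action"],
        "stage": "enriched",
    }


def run_stage(source, sink, transform):
    """Read every message from `source`, transform it, and publish it to `sink`."""
    for message in topics[source]:
        topics[sink].append(transform(message))


topics["raw-events"].append({"user": "Alice", "action": "login"})
topics["raw-events"].append({"user": "BOB", "action": "search"})

run_stage("raw-events", "enriched-events", enrich)
print(topics["enriched-events"])
```

Each stage only knows its source and sink topic, so further stages can be chained by calling `run_stage` again with the previous sink as the new source.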
Commit logs
Kafka can serve as an external commit log for any large-scale distributed system. Logs replicated across the Kafka cluster help failed nodes recover their state.
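To illustrate how a log helps a failed node recover, the sketch below replays an ordered list of entries to rebuild a key-value state. It is a simplified model under stated assumptions: the `(key, value)` entry format and the convention that `None` means "delete" are choices made for this example, not something Kafka prescribes.

```python
def apply_entry(state, entry):
    """Apply one log entry to the state; a value of None deletes the key."""
    key, value = entry
    if value is None:
        state.pop(key, None)
    else:
        state[key] = value
    return state


def recover(commit_log):
    """Rebuild state from scratch by replaying the log in order."""
    state = {}
    for entry in commit_log:
        apply_entry(state, entry)
    return state


# The log survives the node failure; the in-memory state does not.
commit_log = [("x", 1), ("y", 2), ("x", 3), ("y", None)]
recovered = recover(commit_log)
print(recovered)  # {'x': 3}
```

Because the log is ordered, replaying it always produces the same final state, which is what makes it usable for recovery.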
Click stream tracking
Another very important use case for Kafka is capturing user click stream data, such as page views and searches, as real-time publish-subscribe feeds. Because the volume of data is very high, it is published to central topics with one topic per activity type. Many consumers can subscribe to these topics for a wide range of applications, including real-time processing and monitoring.
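The "one topic per activity type" layout can be sketched as a small routing function. This is an in-memory illustration only: the `clicks.` prefix, the event fields, and the `publish` helper are hypothetical names chosen for the example.

```python
from collections import defaultdict

# In-memory stand-in for Kafka topics: topic name -> list of events.
click_topics = defaultdict(list)


def topic_for(activity_type):
    """Hypothetical naming scheme: one topic per activity type."""
    return f"clicks.{activity_type}"


def publish(event):
    """Route an event to the topic that matches its activity type."""
    click_topics[topic_for(event["type"])].append(event)


for event in [
    {"type": "page_view", "url": "/home"},
    {"type": "search", "query": "kafka"},
    {"type": "page_view", "url": "/docs"},
]:
    publish(event)

print({name: len(events) for name, events in click_topics.items()})
```

A consumer interested only in searches would then read `clicks.search` and never see page-view traffic.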
Messaging
Message brokers are used to decouple data processing from data producers. Kafka can replace many popular message brokers as it offers better throughput, built-in partitioning, replication, and fault tolerance.
Setting Up a Kafka Cluster
We can create multiple types of Kafka clusters, such as the following:
- A single node, single broker cluster
- A single node, multiple broker cluster
- A multiple node, multiple broker cluster
A Kafka cluster has five main components: topics, brokers, ZooKeeper, producers, and consumers.
A topic is a category or feed name to which message producers publish messages. In Kafka, topics are partitioned, and each partition is an ordered, immutable sequence of messages. A Kafka cluster maintains the partitioned log for each topic. Each message in a partition is assigned a unique sequential ID called the offset.
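The relationship between topics, partitions, and offsets can be sketched as follows. This is a toy model rather than Kafka's storage format: the `Partition` and `Topic` classes are hypothetical, and the list index simply plays the role of the offset.

```python
class Partition:
    """Append-only sequence of messages; the list index serves as the offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Add a message at the end and return the offset assigned to it."""
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._messages[offset]


class Topic:
    """A named feed split into a fixed number of partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]


topic = Topic("kafkatopic", num_partitions=2)
first = topic.partitions[0].append("Welcome to Kafka")
second = topic.partitions[0].append("This is single broker cluster")
other = topic.partitions[1].append("offsets are per partition")
print(first, second, other)  # 0 1 0
```

Offsets are sequential only within a partition, which is why the first message in the second partition also gets offset 0.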
A Kafka cluster consists of one or more servers, each of which may run one or more server processes. Each server process is called a broker. Topics are created within the context of broker processes.
ZooKeeper serves as the coordination interface between the Kafka broker and consumers. The ZooKeeper overview given on the Hadoop Wiki site is as follows (http://wiki.apache.org/hadoop/ZooKeeper/ProjectDescription):
"ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (we call these registers znodes), much like a file system."
The main differences between ZooKeeper and standard filesystems are that every znode can have data associated with it and that znodes are limited in the amount of data they can hold. ZooKeeper was designed to store coordination data: status information, configuration, location information, and so on.
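The two differences from a filesystem can be made concrete with a small tree model. The sketch below is not the ZooKeeper API: `ZNode`, `ZTree`, and the 1024-byte limit are hypothetical and exist only to show that every node in the hierarchy can carry a small amount of data. The path and payload are illustrative.

```python
class ZNode:
    """A node that holds data and may also have children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}


class ZTree:
    MAX_DATA_BYTES = 1024  # illustrative limit, chosen for this example

    def __init__(self):
        self.root = ZNode()

    def _walk(self, path):
        """Follow a slash-separated path from the root and return the node."""
        node = self.root
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def create(self, path, data=b""):
        """Create a znode under an existing parent, enforcing the size limit."""
        if len(data) > self.MAX_DATA_BYTES:
            raise ValueError("znode data exceeds the size limit")
        parent_path, _, name = path.rstrip("/").rpartition("/")
        self._walk(parent_path).children[name] = ZNode(data)

    def get(self, path):
        """Return the data stored at the given path."""
        return self._walk(path).data


tree = ZTree()
tree.create("/brokers", b"cluster metadata")  # an inner node with data
tree.create("/brokers/ids")
tree.create("/brokers/ids/0", b"localhost:9092")
print(tree.get("/brokers/ids/0"))
```

Unlike a directory, `/brokers` here stores data of its own while also acting as a parent, and any oversized payload is rejected.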
Producers publish data to topics by choosing the appropriate partition within the topic. For load balancing, messages can be allocated to topic partitions in a round-robin fashion or by using a custom-defined function.
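The two allocation strategies can be sketched as plain functions. This is a minimal illustration, not Kafka's partitioner interface: both function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash a real custom function would use.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Return a chooser that cycles through partitions 0..num_partitions-1."""
    counter = itertools.cycle(range(num_partitions))

    def choose(key=None):
        return next(counter)

    return choose


def key_hash_partitioner(num_partitions):
    """Return a custom chooser that maps the same key to the same partition."""

    def choose(key):
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


rr = round_robin_partitioner(3)
print([rr() for _ in range(5)])  # [0, 1, 2, 0, 1]

by_key = key_hash_partitioner(3)
print(by_key("user-42") == by_key("user-42"))  # True
```

Round-robin spreads load evenly, while a key-based function keeps all messages for one key in a single partition and therefore in order.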
Consumers are the applications or processes that subscribe to topics and process the feed of published messages.
A single node – a single broker cluster
Starting the Kafka broker
Now start the Kafka broker in the new console window using the following command:
You should now see output as shown in the following screenshot:
The server.properties file defines the following important properties required for the
Creating a Kafka topic
Kafka provides a command line utility to create topics on the Kafka server. Let’s create a topic named kafkatopic with a single partition and only one replica using this utility:
Created topic “kafkatopic”.
You should get output on the Kafka server window as shown in the following screenshot:
The kafka-topics.sh utility will create a topic, override the default number of partitions from two to one, and show a successful creation message. It also takes ZooKeeper server information, as in this case: localhost:2181. To get a list of topics on any Kafka server, use the following command in a new console window:
Starting a producer to send messages
Kafka provides users with a command line producer client that accepts input from the command line and publishes it as messages to the Kafka cluster. By default, each new line entered is treated as a new message. The following command is used to start the console-based producer in a new console window to send the messages:
While starting the producer’s command line client, the following parameters are required:
The broker-list parameter specifies the brokers to connect to as <node_address:port>, that is, localhost:9092. The kafkatopic topic was created in the Creating a Kafka topic section. The topic name is required to send a message to a specific group of consumers who have subscribed to the same topic, kafkatopic.
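To make the <node_address:port> format concrete, the following sketch splits a broker list string into host and port pairs. It is a hypothetical helper written for this explanation, not how the console producer parses its arguments.

```python
def parse_broker_list(broker_list):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    brokers = []
    for entry in broker_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <node_address:port>, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers


print(parse_broker_list("localhost:9092"))
# [('localhost', 9092)]
print(parse_broker_list("localhost:9092,localhost:9093"))
# [('localhost', 9092), ('localhost', 9093)]
```

An entry without a port, such as `localhost`, is rejected with a `ValueError` rather than silently accepted.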
Now type the following messages on the console window:
- Type Welcome to Kafka and press Enter
- Type This is single broker cluster and press Enter
You should see output as shown in the following screenshot:
Starting a consumer to consume messages
Kafka also provides a command line consumer client for message consumption. The following command is used to start a console-based consumer that shows the output at the command line as soon as it subscribes to the topic created in the Kafka broker:
On execution of the previous command, you should get output as shown in the following
A single node – multiple broker clusters
Let us now set up a single node, multiple broker Kafka cluster as shown in the following diagram:
Starting the Kafka broker
For setting up multiple brokers on a single node, different server property files are required for each broker. Each property file will define unique, different values for the
A similar procedure is followed for all new brokers. While defining the properties, we changed the port numbers because all the additional brokers still run on the same machine. In a production environment, brokers will run on multiple machines. Now we start each new broker in a separate console window using the following commands:
Creating a Kafka topic using the command line
Using the command line utility for creating topics on the Kafka server, let’s create a topic named replicated-kafkatopic with two partitions and two replicas:
Starting a producer to send messages
If we use a single producer to connect to all the brokers, we need to pass an initial list of brokers. Information about the remaining brokers is obtained by querying a broker passed within broker-list, as shown in the following command. This metadata information is based on the topic name:
--broker-list localhost:9092,localhost:9093
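The discovery step can be sketched with a toy cluster in which any reachable broker returns metadata for the whole cluster. This is a simplified model, not the Kafka wire protocol: `FakeCluster`, `discover`, the third broker on port 9094, and the partition leaders are all assumptions made for the example.

```python
class FakeCluster:
    """Toy cluster: any member broker can describe every broker and topic."""

    def __init__(self, brokers, topic_leaders):
        self.brokers = set(brokers)
        self.topic_leaders = topic_leaders  # topic -> {partition: leader broker}

    def fetch_metadata(self, contacted_broker, topic):
        """Return cluster metadata for a topic, if the contacted broker exists."""
        if contacted_broker not in self.brokers:
            raise ConnectionError(f"{contacted_broker} is not reachable")
        return {
            "brokers": sorted(self.brokers),
            "leaders": self.topic_leaders[topic],
        }


def discover(cluster, bootstrap, topic):
    """Try each bootstrap broker in turn until one returns metadata."""
    for broker in bootstrap:
        try:
            return cluster.fetch_metadata(broker, topic)
        except ConnectionError:
            continue
    raise ConnectionError("no bootstrap broker reachable")


cluster = FakeCluster(
    brokers=["localhost:9092", "localhost:9093", "localhost:9094"],
    topic_leaders={
        "replicated-kafkatopic": {0: "localhost:9093", 1: "localhost:9094"}
    },
)
metadata = discover(
    cluster, ["localhost:9092", "localhost:9093"], "replicated-kafkatopic"
)
print(metadata["brokers"])
```

The producer was given only two addresses, yet the returned metadata also lists the third broker, which is the point of passing an initial list rather than the full cluster.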
Use the following command to start the producer: