Kafka Installation


There are a number of ways in which Kafka can be used in an architecture. This section discusses some of the popular use cases for Apache Kafka, many of which have been adopted by well-known companies. The following are the popular Kafka use cases:

Log aggregation

This is the process of collecting physical log files from servers and putting them in a central place (a file server or HDFS) for processing. Using Kafka provides a clean abstraction of log or event data as a stream of messages, removing any dependency on file details. This also enables lower-latency processing and support for multiple data sources and distributed data consumption.

Stream processing

Kafka is useful when collected data undergoes processing in multiple stages: for example, raw data is consumed from topics, then enriched or transformed and published to new Kafka topics for further consumption. Such multi-stage processing is called stream processing.

Commit logs

Kafka can serve as an external commit log for any large-scale distributed system. Logs replicated across the Kafka cluster help failed nodes recover their state.

Click stream tracking

Another very important use case for Kafka is capturing user clickstream data, such as page views and searches, as real-time publish-subscribe feeds. Because the volume of this data is very high, it is published to central topics with one topic per activity type. These topics are available for subscription by many consumers for a wide range of applications, including real-time processing and monitoring.

Messaging

Message brokers are used for decoupling data processing from data producers. Kafka can replace many popular message brokers, as it offers better throughput, built-in partitioning, replication, and fault tolerance.

Setting Up a Kafka Cluster

We can create multiple types of clusters, such as the following:

  •      A single node – single broker cluster
  •      A single node – multiple broker cluster
  •      Multiple nodes – multiple broker cluster

A Kafka cluster has five main components:

Topic

A topic is a category or feed name to which messages are published by message producers. In Kafka, topics are partitioned, and each partition is represented by an ordered, immutable sequence of messages. A Kafka cluster maintains the partitioned log for each topic. Each message in a partition is assigned a unique sequential ID called the offset.

Broker

A Kafka cluster consists of one or more servers, each of which may run one or more server processes; each such process is called a broker. Topics are created within the context of broker processes.

ZooKeeper

ZooKeeper serves as the coordination interface between the Kafka broker and consumers. The ZooKeeper overview given on the Hadoop Wiki site is as follows (http://wiki.apache.org/hadoop/ZooKeeper/ProjectDescription):
“ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (we call these registers znodes), much like a file system.”
The main differences between ZooKeeper and a standard filesystem are that every znode can have data associated with it and that znodes are limited in the amount of data they can hold. ZooKeeper was designed to store coordination data: status information, configuration, location information, and so on.

Producers

Producers publish data to topics by choosing the appropriate partition within the topic. For load balancing, the allocation of messages to partitions can be done in a round-robin fashion or using a custom partitioning function.

Consumers

Consumers are the applications or processes that subscribe to topics and process the feed of published messages.

A single node – single broker cluster

[Figure: a single node – single broker Kafka cluster]

Starting the Kafka broker
Now start the Kafka broker in a new console window using the following command:
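A minimal invocation, assuming Kafka is unpacked in the current directory and ZooKeeper is already running on localhost:2181, looks like the following:

  bin/kafka-server-start.sh config/server.properties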

You should now see output as shown in the following screenshot:

[Screenshot: Kafka broker startup output]

The server.properties file defines the following important properties required for the
Kafka broker:
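As a sketch, the key entries, using the property names this walkthrough relies on (exact names and defaults vary between Kafka versions), look like this:

  # unique integer id of this broker
  broker.id=0
  # port on which the broker listens
  port=9092
  # directory where the message logs are stored
  log.dir=/tmp/kafka-logs
  # ZooKeeper connection string
  zookeeper.connect=localhost:2181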

Creating a Kafka topic

Kafka provides a command line utility to create topics on the Kafka server. Let’s create a topic named kafkatopic with a single partition and only one replica using this utility:
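With the ZooKeeper-based tooling of the 0.8-style CLI, the command looks like the following (assuming ZooKeeper at localhost:2181):

  bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafkatopic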

Created topic “kafkatopic”.
You should get output on the Kafka server window as shown in the following screenshot:

The kafka-topics.sh utility creates the topic, applies the requested partition and replica counts in place of the defaults, and prints a success message. It also takes the ZooKeeper server information, in this case localhost:2181. To get a list of topics on any Kafka server, use the following command in a new console window:
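Again assuming the ZooKeeper-based tooling:

  bin/kafka-topics.sh --list --zookeeper localhost:2181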

 

Starting a producer to send messages

Kafka provides a command line producer client that accepts input from the command line and publishes each line as a message to the Kafka cluster. By default, each new line entered is considered a new message. The following command is used to start the console-based producer in a new console window to send messages:
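With the 0.8-style CLI, the invocation is along these lines:

  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafkatopic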

 

While starting the producer’s command line client, the following parameters are required:

  •      broker-list
  •      topic

The broker-list parameter specifies the brokers to connect to as <node_address:port>, that is, localhost:9092. The kafkatopic topic was created in the Creating a Kafka topic section. The topic name is required to send a message to the specific group of consumers that have subscribed to that topic, kafkatopic.

Now type the following messages in the console window:

  •      Type Welcome to Kafka and press Enter
  •      Type This is single broker cluster and press Enter
You should see output as shown in the following screenshot:

[Screenshot: console producer output]

 

Starting a consumer to consume messages

Kafka also provides a command line consumer client for message consumption. The following command is used to start a console-based consumer that shows the output at the command line as soon as it subscribes to the topic created in the Kafka broker:
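With the 0.8-style, ZooKeeper-based consumer, the command looks like this; the --from-beginning flag replays the topic from its first message:

  bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafkatopic --from-beginning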

On execution of the previous command, you should get output as shown in the following
screenshot:

[Screenshot: console consumer output]

 

A single node – multiple broker cluster

Let us now set up a single-node, multiple-broker Kafka cluster, as shown in the following diagram:

 

 

[Diagram: a single node – multiple broker Kafka cluster]

Starting the Kafka broker
For setting up multiple brokers on a single node, a different server properties file is required for each broker. Each properties file must define unique values for the following properties (a sample override file is sketched after the list):

  •      broker.id
  •      port
  •      log.dir
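As a sketch, a copy of server.properties saved as config/server-1.properties (a hypothetical file name) might override these values as follows:

  # second broker on the same machine; id and port must not collide
  broker.id=1
  port=9093
  log.dir=/tmp/kafka-logs-1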

A similar procedure is followed for all new brokers. While defining the properties, we have changed the port numbers because all the additional brokers will still be running on the same machine; in a production environment, brokers would run on multiple machines. Now we start each new broker in a separate console window using the following commands:
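Assuming property files named config/server-1.properties and config/server-2.properties as sketched above, each broker is started in its own console window:

  bin/kafka-server-start.sh config/server-1.properties
  bin/kafka-server-start.sh config/server-2.properties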

Creating a Kafka topic using the command line
Using the command line utility for creating topics on the Kafka server, let’s create a topic
named replicated-kafkatopic with two partitions and two replicas:
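Using the same ZooKeeper-based utility as before:

  bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 2 --topic replicated-kafkatopic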

Starting a producer to send messages

If we use a single producer connected to all the brokers, we pass an initial list of brokers; the information about the remaining brokers is discovered by querying the brokers passed in broker-list, as shown in the following option. This metadata information is based on the topic name:

--broker-list localhost:9092,localhost:9093
Use the following command to start the producer:
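Along the lines of the single-broker example, with the broker list expanded:

  bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic replicated-kafkatopic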

If we need to run multiple producers connecting to different combinations of brokers, we specify the broker list for each producer accordingly.

Starting a consumer to consume messages

The same consumer client, as in the previous example, will be used in this process. Just as before, it shows the output on the command line as soon as it subscribes to the topic
created in the Kafka broker:
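Assuming the same 0.8-style consumer as before, now pointed at the replicated topic:

  bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic replicated-kafkatopic --from-beginning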

Multiple nodes – multiple broker cluster

The following diagram shows a cluster with multiple nodes and multiple brokers.
Note: implementing this setup requires multiple machines, so it cannot be reproduced on a single machine.

[Diagram: multiple nodes – multiple broker Kafka cluster]

 

The Kafka broker property list
Please refer to the following link for the full list of broker properties:
http://kafka.apache.org/documentation.html#brokerconfigs

 

