Apache Storm Integration With Apache Kafka

Installing Apache Storm

1. Install the prerequisites required for Storm to work on the machine.
a. Download and installation commands for ZeroMQ 2.1.7:
Run the following commands in a terminal:
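The commands themselves did not survive in these notes; a typical build-from-source sequence would look like the following. Note that the download URL is the historical zeromq.org location and may no longer resolve:

```shell
# Download, build, and install ZeroMQ 2.1.7 from source.
wget http://download.zeromq.org/zeromq-2.1.7.tar.gz
tar -xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
make
sudo make install
```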

b. Download and installation commands for JZMQ: 
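Again the commands are missing; a typical sequence for building the Java bindings (JZMQ) is sketched below. The repository URL is an assumption based on the jzmq fork commonly used with early Storm releases:

```shell
# Build and install the Java bindings for ZeroMQ.
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh
./configure
make
sudo make install
```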


2. Download the latest Storm release from http://storm.apache.org/downloads.html

3. Start the Storm cluster by starting the master and worker nodes.

Start the master node, i.e. Nimbus.

To start the master, i.e. Nimbus, go to the ‘bin’ directory of the Storm installation and execute the following command in a separate command-line window.
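The command itself was not included; assuming the Storm installation directory is the current directory, it is:

```shell
# Nimbus runs in the foreground, so keep this window open.
bin/storm nimbus
```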

Start the worker node, i.e. the supervisor.

To start the worker, i.e. the supervisor, go to the ‘bin’ directory of the Storm installation and execute the following command in a separate command-line window.
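As with Nimbus, the command is missing; from the Storm installation directory it is:

```shell
# The supervisor also runs in the foreground, in its own window.
bin/storm supervisor
```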

Apache Kafka + Apache Storm

1. Kafka Producer

2. Storm Spout/Topology

3. Bolts

4. SentenceBolt

Running the Kafka and Storm application

1. First create Kafka topic “words_topic”
Start server:

Create topic:

Enter something on Producer console
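The individual commands were not reproduced; a sketch of these three steps, using the ZooKeeper-based topic-creation flags of the 0.8-era Kafka CLI assumed throughout these notes:

```shell
# Start the Kafka server (assumes ZooKeeper is already running).
bin/kafka-server-start.sh config/server.properties

# In another terminal, create the topic used by the Storm topology.
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic words_topic

# Start a console producer and type a few sentences.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic words_topic
```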


2. Run the Storm jar.
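The jar path and topology class below are placeholders rather than names from the original application; substitute the ones from your own build:

```shell
# Submit the topology to the cluster; jar and class names are hypothetical.
bin/storm jar kafka-storm-example.jar com.example.KafkaStormTopology
```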













Kafka Design

While developing Kafka, the main focus was to provide the following:

  •   An API for producers and consumers to support custom implementation
  •   Low overheads for network and storage with message persistence on disk
  •   A high throughput supporting millions of messages for both publishing and subscribing—for example, real-time log aggregation or data feeds
  •   Distributed and highly scalable architecture to handle low-latency delivery
  •   Auto-balancing of multiple consumers in the case of failure
  •   Guaranteed fault tolerance in the case of server failures

Kafka design fundamentals


Replication in Kafka


Kafka supports the following replication modes:

Synchronous replication

In synchronous replication, a producer first identifies the lead replica from ZooKeeper and publishes the message. As soon as the message is published, it is written to the log of the lead replica and all the followers of the lead start pulling the message; by using a single channel, the order of messages is ensured. Each follower replica sends an acknowledgement to the lead replica once the message is written to its respective logs. Once replications are complete and all expected acknowledgements are received, the lead replica sends an acknowledgement to the producer. On the consumer’s side, all the pulling of messages is done from the lead replica.

Asynchronous replication

The only difference in this mode is that, as soon as a lead replica writes the message to its local log, it sends the acknowledgement to the message client and does not wait for acknowledgements from follower replicas. But, as a downside, this mode does not ensure message delivery in case of a broker failure.


Kafka Use Cases

There are a number of ways in which Kafka can be used in an architecture. This section discusses some of the popular use cases for Apache Kafka and the well-known companies that have adopted Kafka. The following are the popular Kafka use cases:

Log aggregation

This is the process of collecting physical log files from servers and putting them in a central place (a file server or HDFS) for processing. Using Kafka provides clean abstraction of log or event data as a stream of messages, thus taking away any dependency over file details. This also gives lower-latency processing and support for multiple data sources and distributed data consumption.

Stream processing

Kafka can be used for the use case where collected data undergoes processing at multiple stages—an example is raw data consumed from topics and enriched or transformed into new Kafka topics for further consumption. Hence, such processing is also called stream processing.

Commit logs

Kafka can be used to represent external commit logs for any large scale distributed system. Replicated logs over Kafka cluster help failed nodes to recover their states.

Click stream tracking

Another very important use case for Kafka is to capture user click stream data such as page views, searches, and so on as real-time publish subscribe feeds. This data is published to central topics with one topic per activity type as the volume of the data is very high. These topics are available for subscription, by many consumers for a wide range of applications including real-time processing and monitoring.


Messaging

Message brokers are used for decoupling data processing from data producers. Kafka can replace many popular message brokers, as it offers better throughput, built-in partitioning, replication, and fault tolerance.

Setting Up a Kafka Cluster

We can create multiple types of clusters, such as the following:

  •      A single node—single broker cluster
  •      A single node—multiple broker clusters
  •      Multiple nodes—multiple broker clusters

A Kafka cluster primarily has five main components: topics, brokers, ZooKeeper, producers, and consumers.


Topic

A topic is a category or feed name to which messages are published by the message producers. In Kafka, topics are partitioned and each partition is represented by the ordered immutable sequence of messages. A Kafka cluster maintains the partitioned log for each topic. Each message in the partition is assigned a unique sequential ID called the offset.


Brokers

A Kafka cluster consists of one or more servers where each one may have one or more server processes running and is called the broker. Topics are created within the context of broker processes.


ZooKeeper

ZooKeeper serves as the coordination interface between the Kafka broker and consumers. The ZooKeeper overview given on the Hadoop Wiki site is as follows (http://wiki.apache.org/hadoop/ZooKeeper/ProjectDescription):
“ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (we call these registers znodes), much like a file system.”
The main differences between ZooKeeper and standard filesystems are that every znode can have data associated with it and znodes are limited to the amount of data that they can have. ZooKeeper was designed to store coordination data: status information, configuration, location information, and so on.


Producers

Producers publish data to the topics by choosing the appropriate partition within the topic. For load balancing, the allocation of messages to the topic partition can be done in a round-robin fashion or using a custom defined function.


Consumers

Consumers are the applications or processes that subscribe to topics and process the feed of published messages.

A single node – a single broker cluster


Starting the Kafka broker
Now start the Kafka broker in the new console window using the following command:
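The command was not reproduced; assuming the scripts are run from the Kafka installation directory, it is:

```shell
# The broker runs in the foreground; keep this window open.
bin/kafka-server-start.sh config/server.properties
```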

You should now see output as shown in the following screenshot:


The server.properties file defines the following important properties required for the
Kafka broker:
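The property listing itself did not survive; a minimal sketch of the key entries, using the 0.8-era property names (later Kafka releases renamed some of these, e.g. port gave way to listeners):

```properties
# Unique, non-negative ID of this broker
broker.id=0
# Port on which the socket server listens
port=9092
# Directory in which the message logs are stored
log.dir=/tmp/kafka-logs
# ZooKeeper connection string
zookeeper.connect=localhost:2181
```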

Creating a Kafka topic

Kafka provides a command line utility to create topics on the Kafka server. Let’s create a topic named kafkatopic with a single partition and only one replica using this utility:
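The command was not included; with the ZooKeeper-based CLI used elsewhere in these notes, it is:

```shell
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic kafkatopic
```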

Created topic “kafkatopic”.
You should get output on the Kafka server window as shown in the following screenshot:

The kafka-topics.sh utility will create a topic, override the default number of partitions from two to one, and show a successful creation message. It also takes ZooKeeper server information, as in this case: localhost:2181. To get a list of topics on any Kafka server,
use the following command in a new console window:


Starting a producer to send messages

Kafka provides users with a command line producer client that accepts inputs from the command line and publishes them as a message to the Kafka cluster. By default, each new line entered is considered as a new message. The following command is used to start the
console-based producer in a new console window to send the messages:
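The producer command referred to above is:

```shell
# Each line typed into this console is published as one message.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafkatopic
```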


While starting the producer’s command line client, the following parameters are required:

  •      broker-list
  •      topic

The broker-list parameter specifies the brokers to be connected to as <node_address:port>, that is, localhost:9092. The kafkatopic topic was created in the Creating a Kafka topic section. The topic name is required to send a message to a specific group of consumers who have subscribed to the same topic, kafkatopic.

Now type the following messages on the console window:
Type Welcome to Kafka and press Enter.
Type This is single broker cluster and press Enter.
You should see output as shown in the following screenshot:



Starting a consumer to consume messages

Kafka also provides a command line consumer client for message consumption. The following command is used to start a console-based consumer that shows the output at the command line as soon as it subscribes to the topic created in the Kafka broker:
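The consumer command was not reproduced; with the 0.8-era client, which reads offsets via ZooKeeper (newer clients use --bootstrap-server instead), it is:

```shell
# --from-beginning replays messages already in the topic;
# drop it to read only new messages.
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
  --topic kafkatopic --from-beginning
```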

On execution of the previous command, you should get output as shown in the following screenshot:



A single node – multiple broker clusters

Let us now set up a single node multiple broker-based Kafka cluster as shown in the following diagram:




Starting the Kafka broker
For setting up multiple brokers on a single node, different server property files are required for each broker. Each property file will define unique, different values for the
following properties:
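The per-broker property values were not shown; an illustrative sketch, assuming the two additional brokers run on ports 9093 and 9094 (the producer example later in this section connects to 9092 and 9093):

```properties
# config/server-1.properties
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1

# config/server-2.properties
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
```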

A similar procedure is followed for all new brokers. While defining the properties, we have changed the port numbers, as all the additional brokers will still be running on the same machine; in a production environment, brokers would run on multiple machines. Now we start each new broker in a separate console window using the following commands:
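The commands themselves are missing; assuming the per-broker property files are named server-1.properties and server-2.properties (hypothetical names), they are:

```shell
# Each broker in its own console window.
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
```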

Creating a Kafka topic using the command line
Using the command line utility for creating topics on the Kafka server, let’s create a topic
named replicated-kafkatopic with two partitions and two replicas:
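The command referred to above, using the flags from the earlier topic-creation step:

```shell
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 2 --partitions 2 --topic replicated-kafkatopic
```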

Starting a producer to send messages

If we use a single producer to connect to all the brokers, we need to pass the initial list of brokers; the information about the remaining brokers is identified by querying the broker passed within broker-list. This metadata information is based on the topic name, as shown in the following command:

--broker-list localhost:9092,localhost:9093
Use the following command to start the producer:
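Putting the broker-list parameter together with the console producer gives:

```shell
bin/kafka-console-producer.sh \
  --broker-list localhost:9092,localhost:9093 --topic replicated-kafkatopic
```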

If we have a requirement to run multiple producers connecting to different combinations of brokers, we need to specify the broker list for each producer as we did in the case of multiple brokers.

Starting a consumer to consume messages

The same consumer client, as in the previous example, will be used in this process. Just as before, it shows the output on the command line as soon as it subscribes to the topic
created in the Kafka broker:
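The command is the same consumer client as before, pointed at the replicated topic:

```shell
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
  --topic replicated-kafkatopic --from-beginning
```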

Multiple nodes – multiple broker clusters

The following diagram shows a cluster with multiple nodes and multiple brokers.
Note: Implementing this setup requires more than a single machine.



The Kafka broker property list
Please refer to the link given below for more broker properties:


Kafka Installation and Test Broker Setup

In this post, we will discuss Kafka installation and test broker setup on an Ubuntu machine in standalone ZooKeeper mode.

Apache Kafka:

Kafka is an open source message broker from the Apache Software Foundation. It was initially developed at LinkedIn and later contributed to the open source community. It is written in Scala.

Kafka Provides a unified, high-throughput, low-latency platform for handling real-time data feeds.

What types of data are transported through Kafka?

  • Metrics: operational telemetry data
  • Tracking: everything a LinkedIn.com user does
  • Queuing: between LinkedIn apps, e.g. for sending emails


Why is Kafka so fast?

Fast writes:

  • While Kafka persists all data to disk, essentially all writes go to the page cache of the OS, i.e. RAM.
  • Cf. hardware specs and OS tuning (we cover this later)

Fast reads:

  • Very efficient to transfer data from page cache to a network socket
  • Linux: sendfile() system call

Example (Operations): On a Kafka cluster where the consumers are mostly caught up you will see no read activity on the disks as they will be serving data entirely from cache.

Kafka Architecture: Core Components


  • Topics, partitions, replicas, offsets
  • Producers, brokers, consumers
  • Producers write data to brokers.
  • Consumers read data from brokers.
  • All this is distributed.
  • Data is stored in topics.
  • Topics are split into partitions, which are replicated.

Step 1: Download

First, download the latest stable version of Kafka from the Apache Download Mirrors. In this post we are installing kafka_2.11- and untarring it into the /usr/lib/kafka folder.

Step 2:

Add the below entries into the .bashrc file:

### kafka Home directory ####
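The entries themselves were not reproduced; a typical pair, assuming Kafka was untarred into /usr/lib/kafka as in Step 1 (the variable names are a common convention, not from the original post):

```shell
export KAFKA_HOME=/usr/lib/kafka
export PATH=$PATH:$KAFKA_HOME/bin
```

After editing, reload the file with `source ~/.bashrc` so the variables take effect in the current shell.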

Step 3 : Start ZooKeeper

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don’t already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.
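The convenience script is invoked from the Kafka installation directory:

```shell
# Starts a single-node ZooKeeper in the foreground.
bin/zookeeper-server-start.sh config/zookeeper.properties
```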

Step 4: Start the Kafka Server

Open another terminal and start the Kafka server:
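From the Kafka installation directory:

```shell
bin/kafka-server-start.sh config/server.properties
```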

Step 5: Create Test Topic

Open another terminal. Create a topic named “test” with a single partition and only one replica:
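The command, using the same ZooKeeper-based CLI as the earlier sections:

```shell
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic test
```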

Use the below list command to view the topic
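The list command referred to above:

```shell
bin/kafka-topics.sh --list --zookeeper localhost:2181
```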

Run the producer and then type a few messages into the console to send to the consumer.
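The producer command, matching the single-broker setup above:

```shell
# Each line typed here is sent as one message to the "test" topic.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
```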

Step 6 : Start Consumer

Open another terminal and watch the producer's messages being consumed:
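The consumer command, using the 0.8-era ZooKeeper-based client assumed throughout these notes:

```shell
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
  --topic test --from-beginning
```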

If you receive messages like the ones above, you have successfully set up a Kafka console producer and consumer.