Log Analysis in Hadoop

In this post we will discuss the various log file types and log analysis in Hadoop. Log Files: Logs are computer-generated files that capture network and server operations data. They are useful during various stages of software development, mainly for debugging and profiling, and also for managing network operations. Need for Log Files: Log files are commonly used at customers' installations for permanent software monitoring and/or fine-tuning. […]

Multi Agent Setup in Flume

In this post we will discuss setting up multiple agents in a Flume flow and passing events from one machine to another via the Avro RPC protocol. Multi Agent Setup in Flume: In a multi-agent or multi-hop setup, events travel through multiple agents before reaching the final destination. In multi-agent flows, the sink of the previous agent (ex: Machine1) […]
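A two-hop flow of this kind can be sketched in flume.conf roughly as follows. This is only an illustration under stated assumptions: the agent names (agent1, agent2), the component names, the host name machine2 and port 4545 are hypothetical and not taken from the post, and the netcat source and logger sink simply stand in for whatever the real flow uses. The sketch relies on the Avro sink of the first agent pointing at the host and port on which the Avro source of the second agent listens.

```
# Agent on Machine1 (all names here are hypothetical)
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = avro-sink

# Stand-in source for illustration only
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = localhost
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory

# Avro sink forwards events to the next hop over Avro RPC
agent1.sinks.avro-sink.type = avro
agent1.sinks.avro-sink.hostname = machine2
agent1.sinks.avro-sink.port = 4545
agent1.sinks.avro-sink.channel = ch1

# Agent on Machine2 (all names here are hypothetical)
agent2.sources = avro-src
agent2.channels = ch2
agent2.sinks = log-sink

# Avro source listens on the port the previous agent's sink sends to
agent2.sources.avro-src.type = avro
agent2.sources.avro-src.bind = 0.0.0.0
agent2.sources.avro-src.port = 4545
agent2.sources.avro-src.channels = ch2

agent2.channels.ch2.type = memory

# Stand-in final destination for illustration only
agent2.sinks.log-sink.type = logger
agent2.sinks.log-sink.channel = ch2
```

In a setup like this, the receiving agent on Machine2 would normally be started first, so that the Avro sink on Machine1 has something to connect to.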

Flume Data Collection into HDFS with Avro Serialization

In this post, we will provide a proof of concept for Flume data collection into HDFS with Avro serialization, using an HDFS sink and the Avro serializer on sequence files with Snappy compression. We will also use the formatting escape sequences to store the events on the HDFS path. We will create a Flume agent with a Spooling Directory source, JDBC channel and HDFS sink. Now let's create our agent Agent7 in flume.conf […]

Data Collection from HTTP Client into HBase

This post provides a proof of concept of data collection from an HTTP client into HBase. We will set up a Flume agent with an HTTP source, JDBC channel and AsyncHBase sink. Initially we concentrate on the POC of HTTP client data collection into HBase, and at the end of this post we will go into the details of each component used to set up this agent. Now let's create our […]

Flume Data Collection into HBase

We will discuss collecting data into HBase directly through a Flume agent. In our previous posts under the Flume category, we covered the setup of Flume agents for the file roll, logger and HDFS sink types. In this post, we explore the details of the HBase sink and its setup with a live example. As we have already covered the File channel, Memory channel and JDBC channel, we will try to make […]

Flume Data Collection into HDFS

In this post, we will discuss setting up an agent for Flume data collection into HDFS. We will set up an agent with a Sequence Generator source, HDFS sink and Memory channel, start that agent and verify its functionality. Flume Data Collection into HDFS: Flume Agent – Sequence Generator Source, HDFS Sink and Memory Channel: Add the below configuration properties in the flume.conf file to create Agent4 with Sequence source, memory […]
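An agent of this shape might look roughly like the sketch below. The agent name Agent4 comes from the excerpt; the component names, the channel capacity and the HDFS path are assumptions made for illustration and are not the post's actual values.

```
# Agent4: Sequence Generator source -> Memory channel -> HDFS sink
# Component names are hypothetical
Agent4.sources = seq-source
Agent4.channels = mem-channel
Agent4.sinks = hdfs-sink

# Sequence Generator source emits events carrying an incrementing counter
Agent4.sources.seq-source.type = seq
Agent4.sources.seq-source.channels = mem-channel

# Memory channel; the capacity value is an assumption
Agent4.channels.mem-channel.type = memory
Agent4.channels.mem-channel.capacity = 1000

# HDFS sink; the path is a hypothetical target directory
Agent4.sinks.hdfs-sink.type = hdfs
Agent4.sinks.hdfs-sink.hdfs.path = /flume/events
# DataStream writes plain event bodies instead of the default SequenceFile
Agent4.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent4.sinks.hdfs-sink.channel = mem-channel
```

Note that a source lists its channels under the plural key `channels`, while a sink is bound to exactly one channel through the singular key `channel`.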

Flume Avro Client – Collecting a Remote File into Local File

In this post, we will discuss setting up a Flume agent using Avro Client, Avro Source, JDBC Channel and File Roll sink. First we will create Agent3 in the flume.conf file under the FLUME_HOME/conf directory. Flume Agent – Avro Source, JDBC Channel and File Roll Sink: Add the below configuration properties in the flume.conf file to create Agent3.

Make sure the /usr/lib/flume/agent/files/ directory is created and the Flume user has write permissions to this location. […]

Flume Agent – Collect Data From Command to a Flat File

In this post, we will discuss Flume agent configuration and setup for collecting data from the output of a command-line tool into a flat file. We will use the Exec source type, File channel and File Roll sink type in the configuration of our agent. Let's name our agent Agent2. We will discuss each component and its additional properties at the bottom of this post but we […]

Flume Agent Configuration

As mentioned in the previous post, here we discuss in detail the properties in a Flume agent configuration. For ease of understanding, we will consider the same flume.conf file created in our previous post. The Flume agent configuration file flume.conf resembles the Java properties file format with hierarchical property settings. The filename flume.conf is not fixed; we can give it any name and need to use the same name […]

Flume Agent Setup – Netcat Source, Console Sink

In this post, we will discuss setting up a simple Flume agent using Netcat as the source and the console as the sink. In this example of a single-node Flume deployment, we create a Netcat source that listens on a port (localhost:44444) for network connections and a logger sink to log network traffic to the console. To send network traffic, we can use either the curl utility or the traditional telnet tool. We prefer using curl in this […]
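A single-node configuration along these lines could be sketched as follows. The bind address and port (localhost:44444) are the ones named in the excerpt; the agent name agent1, the component names and the memory channel with its capacity are assumptions for illustration, since the excerpt does not show them.

```
# Single-node agent: Netcat source -> Memory channel -> Logger sink
# Agent and component names are hypothetical
agent1.sources = netcat-source
agent1.channels = mem-channel
agent1.sinks = logger-sink

# Netcat source listens for newline-terminated text on localhost:44444
agent1.sources.netcat-source.type = netcat
agent1.sources.netcat-source.bind = localhost
agent1.sources.netcat-source.port = 44444
agent1.sources.netcat-source.channels = mem-channel

# Memory channel (assumed; the excerpt does not name the channel type)
agent1.channels.mem-channel.type = memory
agent1.channels.mem-channel.capacity = 1000

# Logger sink writes each event to the Flume log
agent1.sinks.logger-sink.type = logger
agent1.sinks.logger-sink.channel = mem-channel
```

With a file like this saved as conf/flume.conf, the agent would typically be started with `flume-ng agent --conf conf --conf-file conf/flume.conf --name agent1 -Dflume.root.logger=INFO,console`, where the `--name` value must match the agent prefix used in the file. Test input can then be sent with `curl telnet://localhost:44444` or `telnet localhost 44444`.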

Apache Flume Installation

In this post, we briefly discuss Apache Flume installation and configuration on an Ubuntu machine. The current version of Apache Flume is called Flume NG (Next Generation), and its old version has been renamed Flume OG (Old Generation). In this post, we will discuss Flume NG only. Prerequisites: JDK 1.6 or a later version of Java installed on our Ubuntu machine. Memory – Sufficient memory for configurations used by sources, channels or […]

Flume Architecture

This post gives an overview of the basics of Apache Flume and illustrates its architecture. What is Flume?: Flume is a highly reliable, distributed and configurable streaming data collection tool. Flume can transport log files across a large number of hosts into HDFS. Need for Flume: These days, most new data is contained in high-throughput streams. Application logs, social media updates, web server logs, network logs and website click streams create fast-moving streams […]

Review Comments

I am a PL/SQL developer, interested in moving into big data.

Neetika Singh ITA