In this post, we will discuss how to set up a Flume agent using an Avro client, an Avro source, a JDBC channel, and a File Roll sink.
First we will create Agent3 in the flume.conf file under the $FLUME_HOME/conf directory.
Flume Agent – Avro Source, JDBC Channel and File Roll Sink:
- Add the configuration properties below to the flume.conf file to create Agent3.
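As a sketch, the Agent3 configuration might look like the following (the component names avro-source1, jdbc-channel1, and file-sink1 are illustrative; the bind address and port match the avro-client command used later in this post):

```
# Agent3: Avro source -> JDBC channel -> File Roll sink
Agent3.sources = avro-source1
Agent3.channels = jdbc-channel1
Agent3.sinks = file-sink1

# Avro source listening on localhost:11111
Agent3.sources.avro-source1.type = avro
Agent3.sources.avro-source1.bind = localhost
Agent3.sources.avro-source1.port = 11111

# Durable JDBC channel backed by the embedded Derby database
Agent3.channels.jdbc-channel1.type = jdbc

# File Roll sink writing events to the local file system
Agent3.sinks.file-sink1.type = file_roll
Agent3.sinks.file-sink1.sink.directory = /usr/lib/flume/agent/files/

# Wire the components together
Agent3.sources.avro-source1.channels = jdbc-channel1
Agent3.sinks.file-sink1.channel = jdbc-channel1
```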
- Make sure the /usr/lib/flume/agent/files/ directory is created and the Flume user has write permissions to this location.
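For example (the flume user and group names are assumptions; adjust them to your installation):

```
$ sudo mkdir -p /usr/lib/flume/agent/files/
$ sudo chown -R flume:flume /usr/lib/flume/agent/files/
```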
- Create a sample file to serve as input for the Avro client. Let's create AvroClientInput.txt in the home directory itself.
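The contents of the file are arbitrary; a couple of lines of sample text will do:

```shell
# Create a small sample input file in the home directory
echo "Flume Avro client test event 1" >  "$HOME/AvroClientInput.txt"
echo "Flume Avro client test event 2" >> "$HOME/AvroClientInput.txt"
cat "$HOME/AvroClientInput.txt"
```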
- Now start the agent with the command below.
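Assuming flume-ng is on the PATH and the Agent3 properties live in $FLUME_HOME/conf/flume.conf, the agent can be started like this:

```
$ flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/flume.conf \
    --name Agent3 -Dflume.root.logger=INFO,console
```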
Below is a screenshot of starting the agent from the terminal.
- Now, in another terminal, open a connection to the host through the Avro client to send the test file to the specified port with the command below.
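With the -H and -p options naming the host and port the Avro source listens on, and -F naming the file to send:

```
$ flume-ng avro-client --conf $FLUME_HOME/conf -H localhost -p 11111 \
    -F /home/siva/AvroClientInput.txt
```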
The Avro client included in the Flume distribution can send a given file to a Flume Avro source. The above command sends the contents of the /home/siva/AvroClientInput.txt file to the Flume source listening on port 11111 of localhost.
Below is a screenshot of the second terminal: we first created the test input file, then used avro-client to send it to localhost:11111.
- Validate the output in the destination directory /usr/lib/flume/agent/files/.
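The File Roll sink writes rolled files into the configured directory; listing and printing them should show the events sent from the client:

```
$ ls -l /usr/lib/flume/agent/files/
$ cat /usr/lib/flume/agent/files/*
```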
So, we have successfully configured an agent with an Avro source, a JDBC channel, and a File Roll sink.
Below are the in-depth details about the Avro source and the JDBC channel.
Avro is a data serialization framework; it manages the packaging and transport of data from one point to another across the network. An Avro source listens on an Avro port and receives events from external Avro client streams. When paired with the built-in Avro sink on another Flume agent, it can form a chained, multi-hop agent topology.
The Flume distribution ships with both an Avro source and a standalone Avro client. The Avro client reads a file and sends it to an Avro source anywhere on the network; it need not be on the same local machine, as it was in our example. If the source is on another machine, the Avro client requires the explicit hostname and port of the Avro source to which it should send the file.
Finally, the Avro source collects the events from the file sent by the Avro client and passes them through the channel to the File Roll sink.
Below are the properties of the Avro source. Required properties are in bold.
| Property Name | Default | Description |
| --- | --- | --- |
| **type** | – | The component type name; must be avro |
| **bind** | – | Hostname or IP address to listen on |
| **port** | – | Port number to bind to |
| threads | – | Maximum number of worker threads to spawn |
| compression-type | none | This can be "none" or "deflate". The compression-type must match the compression-type of the matching Avro sink or client |
| ssl | false | Set this to true to enable SSL encryption |
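Putting these properties together, an Avro source definition using the optional settings might look like the following (the thread count and channel name are illustrative values):

```
Agent3.sources.avro-source1.type = avro
Agent3.sources.avro-source1.bind = 0.0.0.0
Agent3.sources.avro-source1.port = 11111
Agent3.sources.avro-source1.threads = 8
Agent3.sources.avro-source1.compression-type = none
Agent3.sources.avro-source1.ssl = false
Agent3.sources.avro-source1.channels = jdbc-channel1
```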
Similar to the File channel, the JDBC channel also provides persistent storage of events to prevent event loss in case of agent failure. By default, the JDBC channel uses an embedded Derby database to store events. This is a durable channel that's ideal for flows where recoverability is important.
In the properties table below, required properties are in bold.
| Property Name | Default | Description |
| --- | --- | --- |
| **type** | – | The component type name; must be jdbc |
| db.type | DERBY | Database vendor; needs to be DERBY |
| driver.class | org.apache.derby.jdbc.EmbeddedDriver | Class for the vendor's JDBC driver |
| driver.url | (constructed from other properties) | JDBC connection URL |
| db.username | "sa" | User id for the db connection |
| db.password | – | Password for the db connection |
| connection.properties.file | – | JDBC connection property file path |
| create.schema | true | If true, creates the db schema if not present |
| create.index | true | Create indexes to speed up lookups |
| transaction.isolation | "READ_COMMITTED" | Isolation level for the db session: READ_UNCOMMITTED, READ_COMMITTED, SERIALIZABLE, REPEATABLE_READ |
| maximum.connections |  | Maximum connections allowed to the db |
| maximum.capacity |  | Maximum number of events in the channel |
| sysprop.* |  | DB vendor specific properties |
| sysprop.user.home |  | Home path to store the embedded Derby database |
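As a sketch, a JDBC channel tuned with a few of these properties could be configured as follows (the connection, capacity, and path values are illustrative, not defaults):

```
Agent3.channels.jdbc-channel1.type = jdbc
Agent3.channels.jdbc-channel1.maximum.connections = 10
Agent3.channels.jdbc-channel1.maximum.capacity = 10000
Agent3.channels.jdbc-channel1.sysprop.user.home = /home/flume/derby
```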