In this post we will discuss the configuration required to connect Hive with Hunk, the Hadoop flavor of Splunk, the well-known visualization tool.
Splunk captures, indexes, and correlates real-time data in a searchable repository, from which it can generate graphs, reports, dashboards and visualizations.
Splunk released a product called Hunk: Splunk Analytics for Hadoop, which supports accessing, searching, and reporting on external data sets located in Hadoop from a Splunk interface.
Hunk is a commercial product and we need a license to use it, but for trial purposes we can use it free of cost for 60 days without a license.
Hunk Installation On Ubuntu:
Hunk is not supported on Windows.
- Download the latest Hunk release from the Splunk downloads page and extract the binary tarball into our installation directory, /usr/lib/hunk.
- Add the Hunk installation directory to the environment variables in the .bashrc file.
- Start Splunk for the first time, accepting its license with the --accept-license argument. From the second time onward this argument is not needed.
Below is the screenshot of the terminal performing all the above actions.
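The steps above can be sketched as the following commands. The tarball name here is hypothetical (use the file you actually downloaded), and HUNK_HOME can be adjusted to a different installation directory:

```shell
HUNK_HOME=/usr/lib/hunk
TARBALL=hunk-6.x-linux-2.6-x86_64.tgz   # hypothetical release name

sudo mkdir -p "$HUNK_HOME"
sudo tar -xzf "$TARBALL" -C "$HUNK_HOME" --strip-components=1

# Put Hunk on the PATH for future shells.
echo "export HUNK_HOME=$HUNK_HOME" >> ~/.bashrc
echo 'export PATH=$PATH:$HUNK_HOME/bin' >> ~/.bashrc

# The first start must accept the license; later starts do not need the flag.
"$HUNK_HOME/bin/splunk" start --accept-license
```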
If our Splunk instance started successfully, we can open its web interface at http://<hostname>:8000. By default, the Splunk web interface and its management port run at 8000 and 8089 respectively.
Note: We can change the default ports to custom ports by copying $HUNK_HOME/etc/system/default/web.conf to $HUNK_HOME/etc/system/local/ and changing the port values to available ports in the copied web.conf file.
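For example, the copy-then-edit sequence looks like this. The sketch below simulates the Hunk directory layout under /tmp/hunk so it can be tried safely; in a real installation, substitute $HUNK_HOME and skip the setup step:

```shell
# Setup for illustration only: fake the default web.conf under /tmp/hunk.
HUNK_HOME=/tmp/hunk
mkdir -p "$HUNK_HOME/etc/system/default" "$HUNK_HOME/etc/system/local"
printf '[settings]\nhttpport = 8000\nmgmtHostPort = 127.0.0.1:8089\n' \
  > "$HUNK_HOME/etc/system/default/web.conf"

# Never edit the default file; copy it to local/ and change the port there.
cp "$HUNK_HOME/etc/system/default/web.conf" "$HUNK_HOME/etc/system/local/web.conf"
sed -i 's/^httpport = .*/httpport = 9000/' "$HUNK_HOME/etc/system/local/web.conf"

grep '^httpport' "$HUNK_HOME/etc/system/local/web.conf"
```

Settings in local/ override those in default/, so the default file stays untouched and survives upgrades.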
Login to Hunk Web UI:
Open the above URL to log in to the Splunk (Hunk) web UI. The default username and password are admin and changeme respectively.
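We can also verify the credentials from a terminal through the management port (8089) using Splunk's REST login endpoint; a successful call returns a session key. This is a sketch that assumes the default credentials and a locally running instance:

```shell
# -k skips certificate verification for the self-signed default cert.
curl -k https://localhost:8089/services/auth/login \
     -d username=admin -d password=changeme
```

Change the default password after the first login.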
Connectivity to Hive:
In order to connect to Hive databases or tables, we first need to create appropriate Provider and Index components. This can be done from the Hunk web UI.
- After logging in to the Hunk web UI, open Settings -> Virtual indexes
Create a new Provider by clicking New Provider and enter the details appropriate to your Hadoop installation, as shown below.
Here Name, Java Home, Hadoop Home, Hadoop version, HDFS path, and the YARN Resource Manager and Scheduler addresses are all mandatory properties. The configurations below are suitable for testing Hunk against Hadoop in pseudo-distributed mode on a local machine. To use Hunk against a distributed Hadoop cluster, increase the memory parameters and other properties appropriately.
In the list below we have specified the Hive metastore URI in vix.hive.metastore.uris as thrift://localhost:9083 and specified HiveSplitGenerator in vix.splunk.search.splitter.
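For reference, the provider created in the UI is persisted as a stanza in indexes.conf under $HUNK_HOME/etc (typically in an app's local/ directory). The stanza below is a sketch of such a configuration for a pseudo-distributed setup; the provider name HiveProvider, the Java and Hadoop paths, and the port numbers are assumptions to be adapted to your installation:

```
[provider:HiveProvider]
vix.family                                = hadoop
vix.env.JAVA_HOME                         = /usr/lib/jvm/java-7-openjdk-amd64
vix.env.HADOOP_HOME                       = /usr/lib/hadoop
vix.fs.default.name                       = hdfs://localhost:8020
vix.splunk.home.hdfs                      = /user/hunk/workdir
vix.mapreduce.framework.name              = yarn
vix.yarn.resourcemanager.address          = localhost:8032
vix.yarn.resourcemanager.scheduler.address = localhost:8030
vix.hive.metastore.uris                   = thrift://localhost:9083
vix.splunk.search.splitter                = HiveSplitGenerator
```

Verify the exact vix.* property names against the documentation for your Hunk version.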
Create Virtual Index:
Next we need to create a new virtual index for the HiveProvider created above. This can be done from the Virtual Indexes tab beside the Providers tab. In the configuration below we point to the default Hive database and the user_test table. The table contains user records from four countries: AU, CA, US and UK.
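Like the provider, the virtual index is stored as an indexes.conf stanza. A sketch of such a stanza is shown below; the warehouse path assumes Hive's default location, the trailing /... is Hunk's recursive-path notation, and the property names should be checked against your Hunk version's documentation:

```
[hiveindex]
vix.provider                        = HiveProvider
vix.input.1.path                    = /user/hive/warehouse/user_test/...
vix.input.1.splitter.hive.dbname    = default
vix.input.1.splitter.hive.tablename = user_test
```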
Search Virtual Index:
Before searching the virtual index (hiveindex) we need to start the Hive metastore service, otherwise we will receive an exception.
Below is the schema of the user_test table in Hive:
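The metastore service can be started, and the table schema inspected, from a terminal as follows (a sketch; assumes hive is on the PATH and the metastore listens on its default Thrift port 9083, matching the vix.hive.metastore.uris setting above):

```shell
# Start the metastore as a background service.
hive --service metastore &

# Verify the table Hunk will read.
hive -e 'USE default; DESCRIBE user_test;'
```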
Search the virtual index created above by clicking the Search button next to it, as shown in the screen below.
Now the user_test table data is pulled into Hunk with the same schema as the user_test table in the Hive metastore.
Below is a sample visualization of the user_test table data by country, along with the Splunk search command used to generate it.
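A count-by-country breakdown of this kind is typically produced by an SPL query along these lines (a sketch; the field name country is taken from the table described above):

```
index=hiveindex | stats count by country
```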
We can also save this visualization to a dashboard panel or a report as shown below.
We can also select other visualization formats, such as pie, bar, column and area charts. These dashboard panels and reports can also be exported in CSV, XML, JSON or PDF formats.
Thus we have successfully installed Hunk on an Ubuntu machine, configured it to access the Hive metastore, and pulled Hive table data into Hunk. We have also created visualizations on the Hive data.