Monthly Archives: November 2014

QlikView Integration with Hadoop

In this post we will give a basic introduction to the QlikView BI tool and discuss QlikView integration with Hadoop Hive. We will use Cloudera Hive and its JDBC drivers/connectors to connect QlikView to Hive, and we will walk through a sample table retrieval from the Cloudera Hadoop Hive database. QlikView Overview What is QlikView? QlikView is a well-known business intelligence and visualization tool built by Qlik (previously known as QlikTech) for […]

Run Remote Commands over SSH

In this post, we will discuss communication between two nodes in a network via SSH and executing remote commands over SSH on a remote machine. For easy understanding, these two nodes in the cluster can be treated as server/client machines. To allow secure communication between server and client machines, on the server side, we will need a public key and an authorization file, and on the […]
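From the client side, the mechanism the post describes can be sketched as a thin wrapper around the `ssh` client. This is a minimal illustration, assuming key-based authentication is already configured; the user, host and command shown are placeholders:

```python
import subprocess

def build_ssh_command(user, host, remote_cmd):
    """Assemble the argv list for running remote_cmd on host via ssh.

    BatchMode=yes makes ssh fail fast instead of prompting for a
    password, so this only works once key-based auth is set up.
    """
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_cmd]

def run_remote(user, host, remote_cmd):
    """Run the command on the remote machine and return its stdout."""
    result = subprocess.run(build_ssh_command(user, host, remote_cmd),
                            capture_output=True, text=True, check=True)
    return result.stdout

# Example (hypothetical node name and user):
# print(run_remote("hadoop", "node2", "uptime"))
```

The actual call is left commented out because it requires a reachable remote host with the client's public key in its authorized keys file.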

Brief Notes on Unix Shell Scripting Concepts

This post provides very brief notes on Unix shell scripting. As this topic is very well described in many textbooks, we are not going much deeper into the details of each point. This post is for quick review/revision/reference of common Unix commands and Unix shell scripting. Unix Shell Scripting Kernel The kernel is the heart of the UNIX system. It provides utilities with a means of accessing a machine’s […]

Processing Logs in Pig

In the previous post we discussed the basic introduction to log files and the architecture of log analysis in Hadoop. In this post, we will go into much deeper detail on processing logs in Pig. As discussed in the previous post, there are mainly three types of log files: Web Server Access Logs, Web Server Error Logs, and Application Server Logs. All these log files will be in […]

Built-in Load Store Functions in Pig

In this post, we will discuss the following built-in load/store functions in Pig with examples: PigStorage, TextLoader, BinStorage, JsonLoader/JsonStorage, AvroStorage, HBaseStorage, MongoStorage. PigStorage: PigStorage() is the default load/store function in Pig. PigStorage expects data to be formatted using field delimiters, and the default delimiter is ‘\t’. PigStorage() itself can be used for both Load and Store functions. It loads/stores data as structured text files. All Pig simple and complex […]
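To make the delimiter behavior concrete, here is a toy Python sketch of what PigStorage’s default parsing amounts to: splitting each line of a text file on a field delimiter, which defaults to ‘\t’. The sample record and values are invented for illustration; this is not Pig’s actual implementation:

```python
def pig_storage_parse(line, delimiter="\t"):
    """Split one line of a structured text file into fields,
    mimicking PigStorage's default '\t' delimiter behavior."""
    return line.rstrip("\n").split(delimiter)

# Tab-delimited record, like the default PigStorage format:
print(pig_storage_parse("1\tmiller\t25"))      # ['1', 'miller', '25']
# An explicit delimiter, like PigStorage(','):
print(pig_storage_parse("1,miller,25", ","))   # ['1', 'miller', '25']
```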

Load Functions In Pig

In this post, we will discuss the basic details of load functions in Pig with some sample examples, and we will also discuss custom load functions in Pig by writing UDFs. To work with data in Pig, the first thing we need to do is load data from a source, and Pig has a built-in LOAD function that loads data from the file system. Load Operator: Syntax:

Input […]

Log Analysis in Hadoop

In this post we will discuss various log file types and log analysis in Hadoop. Log Files: Logs are computer-generated files that capture network and server operations data. They are useful during various stages of software development, mainly for debugging and profiling purposes, and also for managing network operations. Need For Log Files: Log files are commonly used at customer installations for the purpose of permanent software monitoring and/or fine-tuning. […]

Apache Solr Installation on Ubuntu

In this post we will give a basic introduction to Apache Solr and describe the procedure for Apache Solr installation on an Ubuntu machine. Apache Solr Overview: What is Apache Solr? Apache Solr is another top-level project from the Apache Software Foundation; it is an open source enterprise search platform built on Apache Lucene. As Apache Solr is based on the open source search engine Apache Lucene, some […]

Multi Agent Setup in Flume

In this post we will discuss setting up multiple agents in a Flume flow and passing events from one machine to another via the Avro RPC protocol. Multi Agent Setup in Flume: In a multi-agent or multi-hop setup, events travel through multiple agents before reaching the final destination. In multi-agent flows, the sink of the previous agent (e.g. Machine1) […]
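The hand-off described above — the Avro sink of the first agent pointing at the Avro source of the next — can be sketched in Flume’s properties-file format. The agent names, component names, hostname and port below are assumptions chosen for illustration:

```properties
# Machine1: agent1's Avro sink sends events to Machine2 over Avro RPC
agent1.sinks.avro-sink.type = avro
agent1.sinks.avro-sink.hostname = machine2.example.com
agent1.sinks.avro-sink.port = 4545

# Machine2: agent2's Avro source listens for the incoming events
agent2.sources.avro-source.type = avro
agent2.sources.avro-source.bind = 0.0.0.0
agent2.sources.avro-source.port = 4545
```

The key detail is that the sink’s hostname/port on the first machine must match the source’s bind address/port on the next hop.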

Tableau Integration with Hadoop

In this post we are going to discuss the basic details of Tableau software and Tableau integration with Hadoop. Tableau Overview What is Tableau? Tableau is a visualization tool that provides drag-and-drop features to analyze large amounts of data easily and quickly. The Tableau dashboard is very interactive and gives dynamic results. Tableau supports strong interactive capabilities and provides a rich set of graphic […]

Azkaban Hadoop – A Workflow Scheduler For Hadoop

In this post, we will discuss the basic details of Azkaban Hadoop and its setup on an Ubuntu machine. What is Azkaban Hadoop? Azkaban Hadoop is an open-source workflow engine for the Hadoop ecosystem. It is a batch job scheduler that allows developers to control job execution inside Java and especially Hadoop projects. Azkaban was developed at LinkedIn and is written in Java, JavaScript and Clojure. Its main purpose is to solve the problem […]

Hive Interview Questions and Answers for experienced Part – 4

Below are some of the important Hive interview questions and answers for experienced Hadoop developers. Hive Interview Questions and Answers for experienced 1. What is the Hive configuration precedence order? There is a precedence hierarchy for setting properties. In the following list, lower numbers take precedence over higher numbers: The Hive SET command; the command line -hiveconf option; hive-site.xml; hive-default.xml; hadoop-site.xml (or, equivalently, core-site.xml, hdfs-site.xml, and mapred-site.xml); hadoop-default.xml (or, equivalently, […]
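The precedence hierarchy above can be modeled as a lookup that walks the configuration layers from highest to lowest priority. This is only an illustrative sketch, not Hive’s actual implementation, and the property value used is invented:

```python
# Configuration layers, highest precedence first (mirroring the answer above).
PRECEDENCE = ["SET command", "-hiveconf", "hive-site.xml",
              "hive-default.xml", "hadoop-site.xml", "hadoop-default.xml"]

def effective_value(prop, layers):
    """Return prop's value from the highest-precedence layer that defines it."""
    for layer in PRECEDENCE:
        if prop in layers.get(layer, {}):
            return layers[layer][prop]
    return None

layers = {
    "hive-site.xml": {"hive.exec.parallel": "false"},
    "SET command":   {"hive.exec.parallel": "true"},
}
# A value set with the SET command shadows the same property in hive-site.xml:
print(effective_value("hive.exec.parallel", layers))  # true
```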

Sqoop Importing MySQL Data into HDFS

In this post, we will create a new database in the MySQL server, create a table, insert data into it, and then import the MySQL data into HDFS via the Sqoop tool. We assume that MySQL is installed and that Sqoop and Hadoop are installed on the local machine to test this example. We need to make sure that the MySQL JDBC driver connector jar file is downloaded and copied into the SQOOP_HOME/lib directory. Prepare MySQL […]
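A typical import invocation of the kind this post builds toward can be assembled as below. The database URL, table name, username and target directory are placeholders; the flags shown (--connect, --table, --username, --target-dir, -m) are standard `sqoop import` options:

```python
def sqoop_import_args(db_url, table, username, target_dir, mappers=1):
    """Build the argv for a basic `sqoop import` from MySQL into HDFS."""
    return ["sqoop", "import",
            "--connect", db_url,
            "--table", table,
            "--username", username,
            "--target-dir", target_dir,
            "-m", str(mappers)]

# Placeholder database/table names; pass the list to subprocess.run on a
# machine where Sqoop and Hadoop are installed:
print(sqoop_import_args("jdbc:mysql://localhost/testdb",
                        "employees", "root", "/user/hadoop/employees"))
```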

SQOOP Installation on Ubuntu

In this post we will give a basic introduction to Sqoop and cover Sqoop installation on an Ubuntu machine; we will discuss an example run of Sqoop from a MySQL database in the next post. SQOOP Introduction: What is Sqoop? Sqoop is an open source tool that enables users to transfer bulk data between the Hadoop ecosystem and relational databases. Here the Hadoop ecosystem includes HDFS, Hive, HBase, HCatalog, etc., and relational databases […]