Monthly Archives: March 2015


NoSQL Databases

In this post we will discuss NoSQL databases, their characteristics, and the differences between NoSQL and SQL databases. What is NoSQL? NoSQL stands for Not Only SQL and provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. Features of NoSQL databases: non-relational; distributed; open-source; horizontally scalable; relaxes the ACID properties of an RDBMS but adheres to BASE […]


Hive Built In Functions 3

Functions in Hive are categorized as below. Mathematical Functions: These functions are mainly used to perform mathematical calculations. Date Functions: These functions are used to perform operations on date data types, such as adding a number of days to a date. String Functions: These functions are used to perform operations on strings, such as finding the length of a string. Conditional Functions: These functions are used to […]


Exhausted available authentication methods

In this post we will discuss a common problem when installing Cloudera Manager 5.3.2 on an Ubuntu 14.04 machine and the solution for its root cause. Problem: When installing Cloudera Manager 5.3.2 on an Ubuntu 14.04.2 machine, the error messages below appear after giving root credentials on the SSH configuration page: "Exhausted available authentication methods" or "Installation failed on all hosts. Installation failed. Failed to authenticate". In this case, ssh to root might be a […]


Hive Authorization Models and Hive Security 6

In this post, we will discuss Hive authorization models and Hive security. Before discussing Hive authorization models, let's note the difference between authentication and authorization. Authentication – verifying the identity of the user, i.e. whether the logged-in user is who they claim to be. Authorization – verifying whether a user has permission to perform a certain action. Hive Authorization Models: In Hive, authorization is not enabled by default. But […]


QuerySurge Configuring Connections

QuerySurge Configuring Connections: SQL Server. When you create a QuerySurge Connection, the Add Connection Wizard will guide you through the process. Different types of QuerySurge connections require different types of information. For an SQL Server Connection, you will need the following information (check with a DBA or other knowledgeable resource in your organization): database login credentials (ID and password); server name or IP address of the SQL Server (e.g. sqlsvr1.myserver.com, […]


QuerySurge single machine installation – Windows

1. Download the QuerySurge Installer to the machine you want to install QuerySurge on. 2. Double click on the QuerySurge Installer to start the installation process. Click “Next” to accept the License Agreement, and “Next” again to set the installation directory. 3. In the ‘Select Components’ section, make sure “Database”, “Server”, and “Agent” are checked. Leave “Tutorial + Sample Data” checked to install the […]


Querysurge Tool for Hadoop Testing

The QuerySurge CASE tool, developed by RTTS, assists DW testers in preparing and scheduling query pairs that compare data transformed from the source to the destination. For example, a tester can prepare a query pair in which one query runs on a DS and the other on the ODS, to verify the completeness, correctness, and consistency of the structure of the data and of the data transformed […]


Hive JDBC Client Example 6

In this post, we will discuss one of the common Hive clients, the JDBC client, for both HiveServer1 (Thrift Server) and HiveServer2. Use of HiveServer2 is recommended, as HiveServer1 has several concurrency issues and lacks some features available in HiveServer2. JDBC Data Types: The following table lists the data types implemented for HiveServer/HiveServer2 JDBC.
Hive Type | Java Type | Specification
TINYINT | byte | signed or unsigned 1-byte integer
SMALLINT | short | signed 2-byte integer
INT | int | […]


HiveServer2 Beeline Introduction 4

In this post we will give an introduction to HiveServer2 and Beeline. As of hive-0.11.0, Apache Hive started decoupling HiveServer2 from Hive in order to overcome the limitations of the existing Hive Thrift Server. Below are the limitations of Hive Thrift Server 1: no sessions/concurrency (essentially need 1 server per client); security; client interface; stability. Sessions/Concurrency: The old Thrift API and server implementation didn’t support concurrency. Authentication/Authorization: Incomplete implementations of authentication (verifying the identity of […]


Mapreduce Use Case for N-Gram Statistics 2

In this post we will provide a solution to the well-known N-gram calculation problem in MapReduce programming. N-Gram: In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams are typically collected from a text or speech corpus. An n-gram […]
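As a small illustration of the n-gram definition above, here is a minimal Python sketch that counts word-level n-grams in a single string. It is not the MapReduce solution from the post; the function name `count_ngrams`, the whitespace tokenization, and the sample sentence are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count contiguous word n-grams in text (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, which is an
    assumption of this sketch rather than something the post specifies.
    """
    words = text.lower().split()
    # Slide a window of n words across the token list.
    grams = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)


sample = "to be or not to be"
bigram_counts = count_ngrams(sample, 2)
```

In a MapReduce setting the same windowing step would typically sit in the mapper, with the counting done by the reducer.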


Mapreduce Use Case to Calculate PageRank 1

PageRank is a way of measuring the importance of website pages. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. In the general case, the PageRank value for any page u can be expressed as $PR(u) = \sum_{v \in B_u} PR(v) / L(v)$, where $B_u$ is the set of pages linking to u and $L(v)$ is the number of outbound links from page v, i.e. the PageRank […]
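The idea described above, that a page's rank is built up from the ranks of the pages linking to it, each divided by that page's number of outbound links, can be sketched as a simple iteration in Python. This is a single-machine sketch, not the MapReduce implementation from the post. The function name `pagerank`, the `damping` parameter (a common addition that the excerpt does not mention; set it to 1.0 for the undamped form), the iteration count, and the three-page sample graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank (hypothetical helper).

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform rank

    for _ in range(iterations):
        # Every page receives a small base share from the damping term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if not targets:
                continue  # dangling page: its rank is not redistributed here
            # Split this page's rank evenly across its outbound links.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph used only for illustration.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this sample graph, page C is linked from both A and B, so it ends up with a higher rank than B, which is linked only from A.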


Sqoop Hive Use Case Example 3

This is another use case covering Sqoop and Hive concepts. Problem Statement: There are about 35,000 crime incidents that happened in the city of San Francisco in the last 3 months. Our task is to store this relational data in an RDBMS and use Sqoop to import it into Hadoop. Can we answer the following queries on this data: relative frequencies of different types of crime incidents […]


Hive Use case example for JSON Data 2

Hive use case example with US government web sites data. The example data to analyze is available for download as UsaGovData. The data in this file is in JSON format, and its JSON schema is as shown below:

Note: If you copy the text file into LFS, make sure that it does not have any empty lines at the end; otherwise you will encounter the exception below.

[…]
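One way to guard against the trailing-empty-line problem noted above is to clean the file before loading it. The following is a minimal Python sketch; the helper name `strip_trailing_blank_lines` is hypothetical, and it assumes the file is UTF-8 text small enough to read into memory.

```python
def strip_trailing_blank_lines(path):
    """Rewrite the file at path without empty lines at the end (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.read().split("\n")

    # Drop empty or whitespace-only lines from the end of the file.
    while lines and lines[-1].strip() == "":
        lines.pop()

    with open(path, "w", encoding="utf-8") as f:
        # Keep a single newline terminating the last record.
        f.write("\n".join(lines) + ("\n" if lines else ""))
```

Run it on the local copy of the data file before copying the file into Hadoop.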


Sharing Windows Folders Across Linux Machines in a Network

Below are the high-level steps to share files from a Windows 7 machine to a Linux Ubuntu 14.04 machine. A common goal of setting up computers on a local network is being able to share files and folders. In order for file sharing to work, the computers must be on the same network and workgroup. Enabling Sharing on Windows Folders: Sharing of […]