Hadoop Performance Tuning There are many ways to improve the performance of Hadoop jobs. In this post, we will provide a few MapReduce properties that can be used at various mapreduce phases to improve the performance tuning. There is no one-size-fits-all technique for tuning Hadoop jobs, because of the architecture of Hadoop, achieving balance among resources is often more effective than addressing a single problem. Depending on the type of job […]
In this post we will provide solution to famous N-Grams calculator in Mapreduce Programming. Mapreduce Use case for N-Gram Statistics. N-Gram: In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. An n-gram […]
PageRank is a way of measuring the importance of website pages. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. In the general case, the PageRank value for any page u can be expressed as: , i.e. the PageRank […]
In this post we will discuss about basic MRUnit example for Wordcount algorithm. Below are the tools used in this example Eclipse 3.8, mrunit-1.0.0-hadoop2.jar Procedure: 1. Download mrunit jar from this link and add this to the java project build path (File –> properties –> java build path –> add external jars) in eclipse. 2. As we are testing wordcount algorithm…Below is the code for the same.