Hadoop Evolution


Electronic data is growing rapidly worldwide, now measured in terabytes (1 TB = 1000 GB) or petabytes (1 PB = 1000 TB), and the rate of growth is itself accelerating. Most of this data is stored in databases distributed across the globe. Some of it is structured and some is unstructured, such as flat data sets. Major sources include social networking sites, blogs, databases and many other kinds of websites. Organizations across industries analyze this current data to forecast near-term business trends.

But extracting and analyzing vast amounts of structured or unstructured data requires a lot of computational power, more than legacy databases and processing techniques can provide. As data volumes exploded over the years, many organizations replaced their data servers with more powerful ones, but this could not solve the problem beyond a certain point of data growth.

That is where Hadoop's evolution started: it takes a scale-out approach, storing big data on large clusters of commodity hardware. Because Hadoop is designed to add more commodity machines (scale-out) rather than move to larger servers (scale-up), storing and maintaining data became much cheaper than with other storage mechanisms.

To process this big data, which is distributed across clusters of commodity hardware, the MapReduce technique was introduced. It parallelizes the extraction and processing of structured and unstructured data across the many nodes and hard drives in a cluster.
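The idea behind MapReduce can be illustrated with a minimal word-count sketch in plain Python. This is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, the text chunks stand in for data blocks stored on different nodes, and a local thread pool stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, between the map and reduce steps."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reduce step: combine all values collected for one key."""
    return word, sum(counts)


def word_count(chunks):
    # Assumption: each chunk plays the role of a block held by a separate node;
    # the thread pool only imitates nodes working on their blocks in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


chunks = [
    "big data needs big clusters",
    "clusters of commodity hardware",
    "big data",
]
result = word_count(chunks)
print(result)
```

Each chunk is mapped independently, which is what lets the work spread across many machines. Only the grouped intermediate pairs have to be brought together for the reduce step.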

Hadoop was created by Doug Cutting, who is also the creator of Apache Lucene, a text search library. Hadoop is written in Java and has its origins in Apache Nutch, an open source web search engine. Because it is developed under the Apache Software Foundation, it is often called Apache Hadoop. It is an open source framework, available as a free download from the Apache Hadoop distributions.


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

