
What is Hadoop?

Hadoop is an open-source framework for storing and processing large-scale data (huge data sets, generally gigabytes, terabytes, or petabytes in size) in either structured or unstructured format. This vast amount of data is called Big Data, and it usually cannot be processed or handled by legacy data storage mechanisms.

Hadoop is written in Java and developed by the Apache Software Foundation. It can easily handle multiple terabytes of data reliably and in a fault-tolerant manner.

Hadoop parallelizes the processing of data across thousands of computers, or nodes, in a cluster. The framework uses ordinary commodity hardware to store the distributed data across the various nodes of the cluster.

This site provides a detailed walkthrough of the Hadoop framework along with all of its sub-components.

Hadoop Ecosystem Core Components:
  • Hadoop Common: Common utilities supporting the other Hadoop components
  • HDFS: Hadoop Distributed File System
  • YARN: Framework for job scheduling and resource management
  • MapReduce: Parallel processing mechanism for distributed data
The sub-components are:
  • HBase: Column-oriented database for processing billions of records
  • Hive: Data warehouse built on top of the distributed file system (HDFS)
  • Pig: High-level programming language for distributed computations
  • Sqoop: Data migration tool between RDBMSs and HDFS, HBase, or Hive
  • Flume: Data collection mechanism for log and event data
  • Oozie: Workflow management service
  • ZooKeeper: Configuration management and coordination service
  • Avro: Serialization framework
  • Tez: Successor to the MapReduce framework
  • HCatalog: Common interface for Hive, Pig, and HBase
  • Azkaban: Workflow management tool and an alternative to Oozie

Refer to the corresponding categories on this blog for further details on each sub-component of the Hadoop ecosystem.
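
As a first taste of these components, below is a minimal sketch that stores a small file in HDFS and reads it back through the HDFS Java API (covered in detail under the HDFS Java API category on this blog). The NameNode address hdfs://localhost:9000 and the path /user/demo/hello.txt are placeholder assumptions; adjust them to your own cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHelloWorld {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: replace with your NameNode address (see core-site.xml)
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical example path inside the distributed file system
        Path file = new Path("/user/demo/hello.txt");

        // Write: the file is split into blocks and replicated across DataNodes
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello HDFS");
        }

        // Read the same file back from the cluster
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        fs.close();
    }
}

The same operations are also available from the command line through the Hadoop file system shell, described under the HDFS Commands category.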

 

Why Hadoop?

Nowadays, electronic data is growing rapidly all over the world, measured in terabytes (1000 GB = 1 TB) and petabytes (1000 TB = 1 PB). This data is mostly stored in databases distributed across the globe, and its rate of growth keeps accelerating. Some of the data is structured, and some is unstructured, such as flat data sets. Typical sources of huge data generation are social networking sites, blogs, databases, and many other kinds of websites. Various organizations and industries analyze this data to foresee near-future business trends based on current data statistics.

But extracting and analyzing such vast amounts of structured or unstructured data requires computational power beyond the scope of legacy databases and processing techniques. This massive explosion of data led many organizations to replace their data servers with higher-processing servers, which could not solve the problem beyond a certain point of data growth.

That is where the evolution of Hadoop started, based on a scale-out approach for storing big data on large clusters of commodity hardware. Since Hadoop is designed to scale out across commodity hardware instead of scaling up onto larger servers, data storage and maintenance became very cheap and cost-effective compared to other storage mechanisms.

To process this Big Data distributed across clusters of commodity hardware, the MapReduce technique was introduced to parallelize the extraction and processing of structured and unstructured data across the many nodes and hard drives in a cluster, as the word-count sketch below illustrates.
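
To make the map and reduce phases concrete, here is a minimal sketch of the classic word-count job written against the MapReduce Java API: mappers run in parallel on each block of the input and emit (word, 1) pairs, and reducers sum the counts for each word. The input and output paths are assumed to be HDFS directories passed on the command line.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs in parallel on every input split, emitting (word, 1)
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: receives all counts for one word and sums them
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Combiner pre-aggregates on the map side to cut shuffle traffic
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A full walkthrough of this program, the overall job flow, and further use cases can be found under the Map Reduce category on this blog.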

Hadoop was created by Doug Cutting, the creator of Apache Lucene, a text search library. It is written in Java and has its origins in Apache Nutch, an open-source web search engine. Because the Apache Software Foundation develops Hadoop, it is often called Apache Hadoop; it is an open-source framework available for free download from the Apache Hadoop distributions.

Now, start reading from the Hadoop blog to learn Hadoop from scratch.

You can watch the video below for a basic introduction to Big Data.

Review Comments

I have attended Siva's Spark and Scala training. He has good presentation skills and explains technical concepts easily to everyone in the group. He has excellent real-time experience and provided enough use cases to understand each concept. The duration of the course and the time management were awesome. Happy that I found the right person on time to learn Spark. Thanks Siva!!!

Dharmeswaran, ETL / Hadoop Developer, Spark Nov 2016, September 21, 2017

I really like your explanations.

Sylvain Nzeyang, Hadoop Developer, December 2016, November 23, 2016

Siva, your teachings are great and indeed very useful for people who are interested in Hadoop. Your sessions are close to real-time and help everyone get clear in interviews. Thanks for your support.

Kalpana Bhemireddy, Hadoop Developer, Spark Jul 2016, September 26, 2016

The course content is well structured. I like Siva's explanation of topics using slide decks and a virtual machine (CDH cluster) at the same time; this helps the audience learn not only the theory behind a topic but also the practical aspects of it. Overall, I would recommend this course.

Kumar, Big Data Developer, Hadoop Aug 2016, September 26, 2016

One of the best trainers is Siva Kumar; his way of communicating and explaining is superb. He teaches excellently, both theoretically and practically. I recommend him as an excellent trainer for Spark and Scala.

Purushotham, Sr. Software Engineer, Spark August 2016, September 15, 2016

Here are my 2 cents:
1. Add more exercises and provide feedback (also a final project).
2. Support (maybe you need a part-time person).

Lexman, Architect, Hadoop/Spark, September 13, 2016

Siva gives excellent training in Hadoop and Spark. He has 4 years of real-time experience, and his teaching stays close to real-time.

Sriniwaas, Hadoop Consultant, June 2016, September 13, 2016

Excellent training; the classes were so interactive that I never got bored. Siva has immense knowledge of all the Hadoop tools. He explained everything so near to real-time. You can never find a Hadoop course so pure in the market.

Akhila, Hadoop Developer, Hadoop/Spark, September 13, 2016

Siva did an excellent job of explaining each topic patiently and gave many real-time examples. He was really patient in answering each of our doubts and responds well in time when needed. He has immense knowledge of all the Hadoop/Spark ecosystem tools. I never felt bored in his classes; he makes them so interactive. He also has an excellent blog; I got addicted to it.

Akhila, Hadoop Developer, Hadoop/Spark, September 13, 2016

The Spark and Hadoop course content is really apt for beginners. The concept articulation gives clarity on the subject, and the recordings are quite handy for reference. My request is to start an advanced-level course that comes very close to a real-time feel.

Ramesh Pallothu, Senior Architect, Spark Course, September 13, 2016
