Big Data Introduction


So far, the categories of this site have covered the technical details of Hadoop and its ecosystem tools. To be successful, a Hadoop developer must also focus on the data itself, not only on the technical details of the Hadoop architecture and its sub-components. In any industry, at the end of the day, it is the business value a tool or product delivers that decides its future. Posts in other categories of this blog have shown how to set up Hadoop clusters and how to administer and maintain Hadoop and its tools (Hive, Pig, HBase, Flume, Sqoop). Now it is time to focus on the Business Intelligence that can be achieved from big data using Hadoop, which is what end users and clients ultimately expect.

We have dedicated this BI Big Data category to Hadoop learners and beginners, to build knowledge of Business Intelligence and Business Analytics tools for big data through a range of problem-based approaches. Posts in this category start from scratch on Business Intelligence, Big Data, and Big Data Analytics concepts, and can be treated as an advanced, next-level course for Hadoop beginners. First and foremost, this Big Data Introduction post discusses what big data is and the problems it brings.

Big Data Introduction

Is the Hype Around Big Data Worth It

Before going into what big data is and what problems it brings, let’s look at the hype around the term. "Big data" is the hottest buzzword in the IT industry in 2013-14, and market forecasts for the 2013-2017 financial years suggest it will stay that way for at least the next 3-4 years.

The Wikibon forecast for 2012-2017 below suggests that the big data market is growing beyond expectations and will cross $50 billion. These forecasts are sourced from http://wikibon.org/wiki.

[Figure: Wikibon big data market forecast, 2012-2017]

As part of its market-sizing efforts, Wikibon tracked and modeled the 2013 big data revenue of more than 70 vendors. The top vendors are listed below (source: wikibon.org).

2013 Worldwide Big Data Revenue by Vendor ($US millions)
Vendor | Big Data Revenue | Total Revenue | Big Data Revenue as % of Total Revenue
IBM | $1,368 | $99,751 | 1%
HP | $869 | $114,100 | 1%
Dell | $652 | $54,550 | 1%
SAP | $545 | $22,900 | 2%
Teradata | $518 | $2,665 | 19%
Oracle | $491 | $37,552 | 1%
SAS Institute | $480 | $3,020 | 16%
Palantir | $418 | $418 | 100%
Accenture | $415 | $30,606 | 1%
PWC | $312 | $32,580 | 1%
Deloitte | $305 | $33,050 | 1%
Pivotal | $300 | $300 | 100%
Cisco Systems | $295 | $50,200 | 1%
Splunk | $283 | $283 | 100%
Microsoft | $280 | $83,200 | 0%
Amazon | $275 | $70,000 | 1%
Hitachi | $260 | $89,999 | 1%
CSC | $188 | $14,200 | 1%
CenturyLink | $175 | $13,757 | 1%
Google | $175 | $59,767 | 1%
Fusion-io | $173 | $401 | 43%
NetApp | $167 | $6,450 | 3%
Intel | $165 | $52,708 | 1%
EMC | $165 | $23,222 | 1%
Mu Sigma | $160 | $160 | 100%
TCS | $157 | $11,570 | 1%
Microstrategy | $144 | $576 | 25%
Actian | $138 | $138 | 100%
Booz Allen Hamilton | $125 | $5,850 | 2%
Opera Solutions | $124 | $124 | 100%
Red Hat | $109 | $1,437 | 8%
Capgemini | $104 | $13,639 | 1%
Informatica | $98 | $948 | 10%
MarkLogic | $96 | $96 | 100%
General Electric | $80 | $146,000 | 1%
VMware | $80 | $5,207 | 1%
Syncsort | $75 | $75 | 100%
Cloudera | $73 | $73 | 100%
SGI | $65 | $667 | 10%
MongoDB | $62 | $62 | 100%
Hortonworks | $55 | $55 | 100%
DDN | $54 | $315 | 17%
Guavus | $54 | $54 | 100%
Alteryx | $48 | $48 | 100%
1010data | $45 | $45 | 100%
Rackspace | $42 | $1,520 | 3%
TIBCO | $36 | $1,069 | 3%
MapR | $35 | $35 | 100%
Tableau Software | $33 | $206 | 16%
Qlik | $30 | $467 | 6%
Attivio | $29 | $29 | 100%
Juniper | $28 | $4,669 | 1%
DataStax | $26 | $26 | 100%
GoodData | $26 | $78 | 33%
Attunity | $23 | $30 | 77%
Fractal Analytics | $19 | $27 | 70%
Pentaho | $18 | $38 | 45%
Datameer | $17 | $17 | 100%
Couchbase | $17 | $17 | 100%
Basho | $16 | $16 | 100%
Kognitio | $15 | $15 | 100%
Sumo Logic | $14 | $14 | 100%
Jaspersoft | $14 | $34 | 41%
SiSense | $14 | $14 | 100%
Talend | $14 | $57 | 25%
Actuate | $13 | $140 | 9%
Revolution Analytics | $12 | $12 | 100%
Aerospike | $12 | $12 | 100%
Neo Technologies | $12 | $12 | 100%
Digital Reasoning | $11 | $11 | 100%
Tresata | $10 | $10 | 100%
Rainstor | $10 | $10 | 100%
Think Big Analytics | $10 | $10 | 100%
ODM | $3,800 | n/a | n/a
Other | $3,030 | n/a | n/a
Total | $18,607 | n/a | n/a
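
The last column of the table is simply each vendor's big data revenue divided by its total revenue. As a quick sanity check, the small Python sketch below recomputes that share for a few rows copied from the table; the helper name big_data_share is our own. Rows with larger shares are used because the rounding of very small shares in the source is not consistent (Amazon's $275M of $70,000M is about 0.39% and is listed as 1%, while Microsoft's 0.34% is listed as 0%).

```python
# A few rows copied from the Wikibon 2013 table above (values in $US millions).
vendors = {
    "Teradata": (518, 2_665),
    "SAS Institute": (480, 3_020),
    "Fusion-io": (173, 401),
    "Microstrategy": (144, 576),
}

def big_data_share(big_data_revenue: float, total_revenue: float) -> float:
    """Return big data revenue as a percentage of total revenue."""
    return 100.0 * big_data_revenue / total_revenue

for name, (big_data, total) in vendors.items():
    # Rounded to whole percent, these match the table's last column.
    print(f"{name:15} {big_data_share(big_data, total):5.1f}%")
```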

Figures like these, with individual vendors earning hundreds of millions of dollars and total 2013 big data revenue of about $18.6 billion, explain the great hype around the term "big data" over the last two years. That’s why big data is the hottest area in IT companies, and why many companies have started investing in the big data market and recruiting big data professionals at very high salaries.

What is Big Data

Definitions

Depending on the context and on who is asking, the answer can take any of the following forms.

  • Big data is data that exceeds the processing capacity of conventional database systems.
  • ‘Big Data’ is similar to ‘small data’, but bigger in size.
  • Some professionals currently say that data is big data if it runs to multiple terabytes or petabytes. But today’s big data can be tomorrow’s small data.

Below are the frequently used big data measurement units:

  • 1000 Gigabytes (GB) = 1 Terabyte (TB)
  • 1000 Terabytes (TB) = 1 Petabyte (PB)
  • 1000 Petabytes (PB) = 1 Exabyte (EB)
  • 1000 Exabytes (EB) = 1 Zettabyte (ZB)
  • 1000 Zettabytes (ZB) = 1 Yottabyte (YB)
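
Each unit in the list above is 1000 times the previous one. These are decimal, SI-style units; binary units such as the tebibyte (TiB) use a factor of 1024 instead. The minimal Python sketch below, built around a helper we have named human_readable for illustration, expresses a raw byte count in the largest decimal unit that keeps the number below 1000.

```python
# Decimal (SI-style) units from the list above: each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps it below 1000 (capped at YB)."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    # Anything of 1000 ZB or more is reported in yottabytes.
    return f"{value:g} {UNITS[-1]}"

print(human_readable(4.4e21))  # 4.4 ZB
print(human_readable(128e9))   # 128 GB
```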

A Few Facts About Data Explosion – Cause for Big Data

Big data is driven by the exponential growth of digital data over the last few years. Below are some facts about the current big data revolution.


  • Over 90% of all the digital data in the world was created in the past two years.
  • According to IDC, the world’s digital data totaled 4.4 zettabytes in 2013 and is expected to double roughly every two years, growing tenfold to 44 zettabytes by 2020. (If that data were stored on a stack of iPad Air tablets, each 0.29″ thick with 128 GB of memory, the stack would stretch 6.6 times the distance from the Earth to the Moon by 2020.)
  • Every minute we send 254 million emails, generate 19 million Facebook likes, send 278,000 tweets, and upload 200,000 photos to Facebook.
  • Walmart handles more than 1 million customer transactions every hour.
  • Last but not least, industry analysts predict that 2 million IT jobs will be created in the US by 2015 to carry out big data projects.
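
The IDC figures above can be checked with a few lines of arithmetic. This Python sketch uses decimal units (as in the measurement list earlier) and only the tablet thickness and capacity quoted in the text; the mean Earth-Moon distance of 384,400 km is our own assumption and does not come from the source. It shows that a tenfold increase over seven years corresponds to doubling about every 2.1 years, consistent with "roughly every two years", and that a 44 ZB stack of tablets comes to about 6.6 Earth-Moon distances.

```python
import math

# IDC figures quoted above, in decimal units (1 ZB = 10**21 bytes, 1 GB = 10**9 bytes).
DATA_2013_ZB = 4.4
DATA_2020_ZB = 44.0
YEARS = 2020 - 2013

# Growth factor and the doubling period it implies.
growth = DATA_2020_ZB / DATA_2013_ZB              # tenfold
doubling_years = YEARS / math.log2(growth)        # years per doubling

# iPad Air stack: 0.29 inch thick, 128 GB per tablet (figures from the text).
INCH_M = 0.0254
tablets = DATA_2020_ZB * 10**21 / (128 * 10**9)
stack_m = tablets * 0.29 * INCH_M

# Assumption (not from the source): mean Earth-Moon distance of 384,400 km.
EARTH_MOON_M = 384_400 * 1000
ratio = stack_m / EARTH_MOON_M

print(f"growth: {growth:.0f}x, doubling every {doubling_years:.1f} years")
print(f"stack: {stack_m / 1000:,.0f} km, {ratio:.1f} x the Earth-Moon distance")
```
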
Big Data Sources

Below are the primary sources of big data.

  • Users
  • Social media like Facebook, LinkedIn, Twitter, etc.
  • Application/Web Servers, Internet Websites
  • Data generated automatically (such as logs and reports) by applications, devices, and systems
  • Mobile devices, microphones, and sensors
  • Readers, scanners, and other software

Why Do We Need to Process Big Data – Benefits of Big Data Management

It does not matter how fast data grows unless we get real benefit out of it. Data is very valuable: real-time big data isn’t just about storing petabytes or exabytes of data in a data warehouse; it’s about the ability to make better decisions and take meaningful actions at the right time.

The value of big data to an organization falls into two categories:

  • Analytic use (gaining operational insights, improving decision-making, and analyzing customer sentiment on social media)
  • Enabling new products (for example, by tracking and analyzing shopping patterns)

Big data analytics can reveal insights previously hidden in data, such as peer influence among customers, which can be uncovered by analyzing shoppers’ transactions together with social and geographical data. Large companies have realized the value of data and have been using it to improve their services.

For example, below are some of the best-known analytic uses of big data:

  • Google displays advertisements relevant to our web browsing.
  • Amazon and Flipkart recommend new products or titles that often match our tastes and interests well.
  • Social networking sites such as Facebook, LinkedIn, and Twitter recommend people to connect with and display products or groups we are most likely to be interested in.
  • If the US healthcare industry integrated big data analytics, it could save up to $300 billion a year, the equivalent of reducing the healthcare costs of every man, woman, and child by $1,000 a year.
  • Retailers could increase their operating margins by more than 60% through full exploitation of big data analytics.
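
Recommendations of the "customers who bought this also bought" kind can be derived, in their simplest form, by counting which items appear together in the same transactions. The Python sketch below illustrates that idea on made-up baskets. It is a toy co-occurrence count, not the actual method used by Amazon, Flipkart, or any other company named above, and the item names and the recommend function are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets: one set of items per transaction.
transactions = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"phone", "phone case"},
    {"laptop", "laptop bag"},
    {"mouse", "keyboard"},
]

# Count how often each ordered pair of items is bought together.
co_counts: Counter = Counter()
for basket in transactions:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item: str, top_n: int = 2) -> list[str]:
    """Return the items most often bought with `item` (ties broken alphabetically)."""
    scores = {other: n for (first, other), n in co_counts.items() if first == item}
    return sorted(scores, key=lambda other: (-scores[other], other))[:top_n]

print(recommend("laptop"))  # ['laptop bag', 'mouse']
```

At big data scale, the same pair counting would typically be distributed (for example, as a MapReduce job over transaction logs) rather than run in memory.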

Because this data is too big, moves too fast, and doesn’t fit the structures of conventional database architectures, only a small fraction of it is currently analyzed. To gain value from this data, enterprises must choose an alternative way to process it, based on its characteristics.

IDC estimates that in 2013 perhaps 5% of the world’s digital data was especially valuable, or “target rich.” That percentage should more than double by 2020 as enterprises take advantage of new big data and analytics technologies.
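
Combining this estimate with the IDC totals quoted earlier (4.4 ZB in 2013, 44 ZB in 2020) gives a sense of scale. The sketch below assumes the target-rich share exactly doubles to 10%, the lower bound implied by "more than double"; under that assumption the amount of target-rich data grows about twentyfold.

```python
TOTAL_2013_ZB = 4.4
TOTAL_2020_ZB = 44.0
SHARE_2013 = 0.05              # IDC: roughly 5% "target rich" in 2013
SHARE_2020 = 2 * SHARE_2013    # assumption: share exactly doubles (lower bound)

rich_2013 = TOTAL_2013_ZB * SHARE_2013   # about 0.22 ZB
rich_2020 = TOTAL_2020_ZB * SHARE_2020   # about 4.4 ZB

print(f"{rich_2013:.2f} ZB -> {rich_2020:.1f} ZB ({rich_2020 / rich_2013:.0f}x)")
# 0.22 ZB -> 4.4 ZB (20x)
```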

In the next post in this category, we will discuss big data characteristics and the challenges of processing big data.



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

