In the previous post we gave a brief introduction to Big Data. In this post we will discuss Big Data challenges. Before going into the challenges, we will briefly go through the characteristics of Big Data.
Big Data Characteristics
Big Data characteristics are often described with the help of the Five Vs (Volume, Velocity, Variety, Veracity, and Value). They are as follows.
Volume – How Big is data
- The volume of Big Data is growing at an exponential rate and is expected to reach around 44 ZB (1 ZB = 10^21 bytes) by 2020.
Velocity – How Fast is data processed
- The speed at which new data is generated and the speed at which data moves around.
- The latency of processing Big Data and making decisions is very important, and this is where it differs most from a conventional RDBMS.
Variety – The various types of data
A conventional RDBMS supports only structured data, but Big Data systems handle three kinds of data.
- Structured – Highly structured and usually stored in an RDBMS. Approximately 20% of the world's data is structured. Examples – numbers, dates, and groups or tables of words and numbers (for example, a customer table with columns such as name, age, and address).
- Semi-Structured – Does not necessarily conform to a fixed schema (structure) but may be self-describing and may have simple key/value pairs. It does not fit naturally into the rows and tables of a typical database. Examples – JSON, XML, logs, tweets.
- Unstructured – Lacks structure entirely, or parts of it lack structure. Approximately 80% of the world's data is unstructured. Example formats – free-form text, emails, images, videos, voice recordings, social media conversations, sensor data, etc.
Veracity – How accurate, meaningful, and trustworthy the results are for the given problem space.
Value – The useful business value extracted from Big Data.
Big Data analytics companies take all five of these Vs into consideration before they decide to build programs for data analysis.
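To make the Variety categories above more concrete, here is a minimal Python sketch of how each kind of data might look to a program. The sample records, the column names, and the helper `field_names` are all made up for illustration; they do not come from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields
# (hypothetical customer table).
structured_csv = "name,age,city\nAsha,31,Pune\nRavi,45,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing JSON records whose keys can differ
# from one record to the next.
semi_structured = [
    '{"user": "asha", "text": "hello", "tags": ["intro"]}',
    '{"user": "ravi", "text": "big data", "geo": {"lat": 28.6, "lon": 77.2}}',
]
records = [json.loads(line) for line in semi_structured]

# Unstructured: free-form text with no schema; any structure has to be
# inferred by the program (here, a naive split into words).
unstructured = "Met the customer today, she liked the demo but asked about pricing."
words = unstructured.lower().replace(",", "").replace(".", "").split()


def field_names(record_list):
    """Return the sorted union of keys seen across a list of dict records."""
    keys = set()
    for record in record_list:
        keys.update(record.keys())
    return sorted(keys)


print(field_names(rows))     # same three columns for every row
print(field_names(records))  # union of keys; no single record has all of them
print(len(words))            # word count recovered from free text
```

The structured rows all share one schema, the JSON records only share some keys, and the free text has no fields at all until the program imposes some.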
Big Data Management
Many large enterprises (Google, LinkedIn, Facebook, IBM, Oracle) have already adopted a Big Data management life cycle, which runs from data collection through to decision making and includes the phases shown below.
- Data Collection
- Data Storage & Organization
- Decision Making
In the high-level architecture of a typical Big Data analysis model, data is collected from various sources such as web servers and social media, stored in a Hadoop cluster, passed through an analytics platform and a Big Data warehouse, and finally made available to business intelligence users.
Big Data analytics companies like IBM, Facebook, LinkedIn, Google, and Twitter are already moving towards technology that analyzes data while it is being generated (sometimes referred to as real-time in-memory analytics), without first putting it into a database. An enterprise that does not recognize the importance of Big Data analytics therefore risks falling behind future market trends.
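As a rough illustration of analyzing data while it is being generated, the sketch below keeps a running mean over a stream of events without storing the events themselves. The function name `running_mean` and the sample latency values are assumptions made for this example; real streaming platforms are far more involved.

```python
from typing import Iterable, Iterator


def running_mean(events: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, one result per incoming event."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        # Only the count and the running total are kept in memory;
        # the individual events are never stored.
        yield total / count


# Hypothetical stream of response times in milliseconds.
latencies = [120.0, 80.0, 100.0, 140.0]
means = list(running_mean(latencies))
print(means)
```

Because the state is just two numbers, the same idea works whether the stream has four events or four billion.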
Big Data Challenges
Below are the current challenges of Big Data management and decision making faced by big data analytic companies.
- High volume of data, which demands scalability.
- High velocity of data generation.
- Complex and varied data types, especially semi-structured and unstructured data.
- Disk storage and transmission capacities. As of 2013, a single disk could store up to 4 TB of data, with a maximum transfer speed of only about 128 MB/s. With these storage and transfer limits, reading an entire disk takes roughly 9 hours (4 TB / 128 MB/s ≈ 31,250 seconds).
- Data management issues of access, utilization, updating, governance, and reference.
- Privacy and security are another major challenge in Big Data. For example, information about people is collected and used to add value to an organization's business, often by deriving insights into their lives that they are unaware of.
- Data sharing between Big Data companies about their clients and operations threatens the culture of secrecy and competitiveness.
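The disk storage and transfer figures in the list above can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and an idealized case where data is spread evenly over several disks that are read in parallel; the function name `read_time_hours` and the disk count of 100 are illustrative assumptions.

```python
def read_time_hours(capacity_tb: float, speed_mb_per_s: float, disks: int = 1) -> float:
    """Hours needed to read capacity_tb of data spread evenly over `disks`
    disks that are all read in parallel at speed_mb_per_s each."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = capacity_mb / (speed_mb_per_s * disks)
    return seconds / 3600


single = read_time_hours(4, 128)             # one 4 TB disk at 128 MB/s
spread = read_time_hours(4, 128, disks=100)  # same data over 100 disks (assumed count)

print(f"one disk:  {single:.2f} hours")
print(f"100 disks: {spread * 60:.2f} minutes")
```

With these inputs a single disk takes about 8.7 hours to read end to end, while the idealized 100-disk case takes a little over 5 minutes, which is why spreading data across many disks matters so much at this scale.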
Big Data Analytic Challenges
- Ability to determine what data to collect and how to analyse it to find patterns and correlations, given how large the data is.
- Ability to understand Big Data business intelligence objectives and information needs, and to come up with appropriate computer algorithms.
- Need for solid knowledge of mathematics and statistics to build the relations between data.
- Ability to present data (both verbally and in writing) to ensure the insights are understood and acted upon.
Big Data Solutions
Below are the solutions to the Big Data challenges discussed above.
- Distributed storage across multiple disks
- Implement Parallel Processing
- Bring the code to the data for processing instead of bringing data to code.
A technology designed to meet all of the above expectations is Hadoop, an open source framework for storing distributed data and processing it in parallel across multiple nodes. Its storage layer is the Hadoop Distributed File System (HDFS). For in-depth details on Hadoop and HDFS, refer to the Hadoop category.
Its parallel processing framework, MapReduce, runs on top of HDFS in two phases, Map and Reduce, and does its work where the data is stored. For in-depth details on the MapReduce framework, refer to the Mapreduce category.
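The Map and Reduce phases can be sketched in plain Python with a word count, the usual introductory example. This is only a single-process illustration of the idea and not Hadoop's actual API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, as well as the sample input, are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as happens between Map and Reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each string stands in for a block of data stored on a different node.
blocks = ["big data big challenges", "data moves fast"]
counts = word_count(blocks)
print(counts)
```

In a real cluster the map function would run on the nodes that already hold each block, and only the much smaller (word, count) pairs would travel over the network to the reducers.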
In the next post, we will discuss the Hadoop distributions.