managing data in big data

Question

I am reading the book Big Data For Dummies.

Welcome to Big Data For Dummies. Big data is becoming one of the most important technology trends that has the potential for dramatically changing the way organizations use information to enhance the customer experience and transform their business models.

Big data enables organizations to store, manage, and manipulate vast amounts of data at the right speed and at the right time to gain the right insights. The key to understanding big data is that data has to be managed so that it can meet the business requirement a given solution is designed to support. Most companies are at an early stage with their big data journey.

I understand that "store" means we have to store the data in a DBMS.

My questions on the above text:

What does the author mean by "manage vast amounts of data" in the above context? An example would be helpful.
What does the author mean by "organizations transform their business models" with big data? Again, an example would be helpful.
What does the author mean by "manipulate vast amounts of data" in the above context?

score 2 · Accepted Answer · answered Jul 10 '17 at 04:07

Following are the answers to your questions:

1. What does the author mean by "manage vast amounts of data" in the above context? An example would be helpful.

Ans. When we talk about big data, we mean data at scale. "Vast amounts of data" in the above context hints at the volume of data that big data platforms can process. It could be anywhere in the range of terabytes to petabytes, or even more. Traditional relational systems struggle to manage data at this volume.

Example: Twitter, Facebook, Google, and similar companies handle petabytes of data on a daily basis.

2. What does the author mean by "organizations transform their business models" with big data? Again, an example would be helpful.

Ans. With big data technologies, organizations can gain deep insight into how their business is performing and shape future strategies accordingly, which can help them win a larger share of the market.

Example: Online retail giant Amazon thrives on user data that reveals its users' online shopping patterns. It uses that knowledge to create products and services that are likely to grow the business and keep it well ahead of its competitors.

3. What does the author mean by "manipulate vast amounts of data" in the above context? An example would be helpful.

Ans. We can manage huge amounts of data with big data platforms, but managing it is not enough. So we use sophisticated tools that help us manipulate the data in such a way that it turns into business insights and ultimately into money.

Example: Clickstream data. This data consists of users' clicks on websites, how much time a user spent on a particular site, on a particular item, and so on. When manipulated properly, all of this yields better business insights about the users and hence more profit.
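
To make the clickstream example above a little more concrete, here is a minimal Python sketch of what "manipulating" such data can look like. The event tuples, the field names, and the summarize function are all hypothetical and made up for illustration; a real pipeline would run the same kind of aggregation on a distributed engine over far more data.

```python
from collections import defaultdict

# Hypothetical clickstream events: (user_id, item, seconds_spent_on_item).
# These values are invented purely for illustration.
events = [
    ("u1", "laptop", 40),
    ("u2", "laptop", 25),
    ("u1", "phone", 10),
    ("u3", "laptop", 35),
    ("u2", "phone", 5),
]


def summarize(events):
    """Turn raw clicks into per-item view counts and average time spent."""
    views = defaultdict(int)
    seconds = defaultdict(int)
    for _user, item, secs in events:
        views[item] += 1
        seconds[item] += secs
    return {
        item: {"views": views[item], "avg_seconds": seconds[item] / views[item]}
        for item in views
    }


summary = summarize(events)
for item, stats in summary.items():
    print(item, stats)
```

The raw events say little on their own; the aggregated summary (which items attract the most views and attention) is the kind of "insight" the answer refers to.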

score 0 · Answer 2 · answered Feb 10 '17 at 10:29

A vast amount of data means data far larger than MBs or GBs; it may run to terabytes. For example, some social networking sites generate approximately 6 TB of data every day.
Organizations used to handle data with traditional RDBMSs, but they are now adopting Hadoop and Spark to manage big data more easily. With the help of these new technologies they keep changing their business tactics, and analyzing the data gives them a clearer view of their customers.
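
To put the roughly 6 TB per day figure above in perspective, here is a back-of-the-envelope sketch in Python. The 6 TB value is the approximate one quoted above; the constant daily rate and the decimal conversion (1 PB = 1000 TB) are assumptions made only for this illustration.

```python
# Approximate daily volume quoted above; treated as constant (an assumption).
DAILY_TB = 6
DAYS_PER_YEAR = 365
TB_PER_PB = 1000  # decimal units, assumed for simplicity

yearly_tb = DAILY_TB * DAYS_PER_YEAR
yearly_pb = yearly_tb / TB_PER_PB

print(f"{yearly_tb} TB per year, about {yearly_pb:.2f} PB")
```

At that rate the data passes the petabyte mark within a year, which is why storage is spread across a cluster rather than kept on a single database server.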

score 0 · Answer 3 · answered Nov 06 '17 at 12:07

Your assumption that "store" means storing the data in a DBMS

was true long ago. I am addressing that aspect in this detailed answer, so that you get the big data concept clear upfront. (I will answer your listed questions in a separate, subsequent answer post.)

It's not just DBMS/RDBMS any more. Data storage now ranges from file systems to data stores.
In the big data context, the term refers to: a) the data itself; b) a storage system, typically a distributed file system that handles far larger volumes than a traditional DBMS in terms of I/O and durable, consistent storage, and whose data need not be homogeneous or of one type (its salient features are high availability, scalability, and fault tolerance, with high throughput and low latency as targets); and, by extension, c) the big data ecosystem: the systems, frameworks, and projects that handle, interact with, or build on the first two. Example: Apache Spark.
It can store any file, including raw files as is. The big data equivalent of a DBMS storage system lets you give structure to data or store structured data.
Just as you store data on any ordinary user device (a computer, a hard disk, or an external hard disk), you can think of a big data store as a cluster (a defined, configurable, networked collection of nodes) of commodity hardware and storage components that together provide a single, aggregated, distributed view of the data or files. Each node needs at least a configurable network IP address, so a storage device or disk usually has to be mounted on or attached to a computer or server to get one.
So the data can be structured (traditional DBMS equivalent), relational structured (RDBMS equivalent), unstructured (e.g., text files and more), or semi-structured (CSV, JSON, XML, etc.).
With respect to big data, these can be flat files, text files, log files, image files, video files, or binary files.
There is also row-oriented and column-oriented data, when structured or semi-structured data is stored or treated as database or data warehouse data. Example: Hive is a data warehouse on Hadoop that allows storing structured relational data, CSV files, and so on, either in their as-is file format or in a specific one such as Parquet, Avro, or ORC.
In terms of volume, individual files can be MBs, GBs, or sometimes TBs (KB-sized files are not recommended), aggregating to TBs and PBs or more of storage across the system at any point in time; there is no official limit as such.
It can be batch data, discrete stream data, or real-time stream data and feeds.
(Wide data goes beyond big data in terms of nature, size, volume, and so on.)
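
The row-oriented versus column-oriented distinction mentioned above can be sketched in a few lines of plain Python. This is only an in-memory illustration with made-up records and field names; it is not how Hive, Parquet, or ORC are implemented, but it shows why a columnar layout suits analytical scans over a single field.

```python
# Hypothetical records; the field names and values are illustrative only.
rows = [
    {"user": "u1", "country": "IN", "amount": 120},
    {"user": "u2", "country": "US", "amount": 80},
    {"user": "u3", "country": "IN", "amount": 50},
]


def to_columns(rows):
    """Pivot a list of row records into a dict of column lists."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented access: fetch one whole record at once.
second_record = rows[1]

# Column-oriented access: scan one field across all records
# without touching the other fields.
total_amount = sum(columns["amount"])

print(second_record)
print(total_amount)
```

Row layout favors reading or writing complete records; column layout favors aggregations over a few fields across many records.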

Book for Beginners: “Big Data For Dummies” is not a bad option. (I have not read it personally, but I know the series and its style from my software engineering degree studies long ago.) I suggest you go for the book "Hadoop: The Definitive Guide". Get the latest edition, which is the 4th Edition (2015). It is based on Hadoop 2.x. Though it has not been updated for the latest 2.x releases, you will find it a really good book to read.

Beyond:

Though Hadoop 3 is in the alpha phase, you need not worry about that just now.
Follow the Apache Hadoop site and documentation, though (ref: http://hadoop.apache.org/). Get to know the Hadoop ecosystem as well.
(Big data and Hadoop are almost synonymous nowadays, though Hadoop is built on the big data concept. Hadoop is an open source Apache project and is used in production.)
The file system I mentioned is HDFS (Hadoop Distributed File System), or similar ones.
Alternatives are cloud object storage systems, including AWS S3, Google Cloud Storage, and Azure Blob Storage.
Big data can also be stored in NoSQL databases, which function as non-relational, flexible-schema data stores but are not optimized for strictly relational data. If you store relational data in them, relational constraints are not enforced by default. They are not inherently SQL-oriented, though SQL-like interfaces are provided. Examples are HBase (on top of HDFS and based on Bigtable), Cassandra, and MongoDB; the choice depends on the type of data (or direct files) to be stored and on which CAP theorem properties are handled.
