What is BIG Data? – PART I
Before learning about BIG Data, we should understand STRUCTURED DATA and UNSTRUCTURED DATA.
What is Structured Data?
Any data that is organized in COLUMNS and ROWS is STRUCTURED DATA. Irrespective of the technology where the data is physically stored, as long as it is in columns and rows we refer to it as Structured Data. Structured data can be physically stored in RDBMS databases, Excel files, text files, in-memory databases, cloud servers or any other technology.
All transactional data is structured data. In general, transactional data is created by systems such as ERP, CRM, Online Transactional Systems, e-Commerce and others.
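To illustrate that the storage technology does not matter, here is a minimal sketch that keeps the same rows and columns first as CSV text and then in an in-memory relational table. The order records, the column names and the `orders` table are made up for illustration; only Python's standard `csv` and `sqlite3` modules are used.

```python
import csv
import io
import sqlite3

# Hypothetical transactional data: each row is one order, each column one attribute.
COLUMNS = ["order_id", "customer", "amount"]
ROWS = [
    (1001, "Asha", 250.0),
    (1002, "Ravi", 99.5),
]


def to_csv_text(columns, rows):
    """Store the rows as CSV text (a text file would hold the same content)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_sqlite(rows):
    """Store the same rows in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


csv_text = to_csv_text(COLUMNS, ROWS)
conn = to_sqlite(ROWS)

# Reading either store back gives the same rows and columns.
csv_rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header row
db_rows = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
```

Both `csv_rows` and `db_rows` hold two rows of three columns. The CSV version returns every value as text, while the database keeps the column types, but the row-and-column shape is identical.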
What is Unstructured Data?
Any Data that is not organized in the form of Columns and Rows is UNSTRUCTURED DATA.
Examples include Word documents, PDF documents, email content, images, audio files, video files and others.
Data created on social networking websites such as Facebook, LinkedIn, Twitter, WhatsApp and others is also unstructured.
So is data created by user searches in search engines such as Google, Bing, Yahoo and others.
All UNSTRUCTURED DATA is referred to as BIG Data, because more than 80% of the world's data is unstructured data.
What are Hadoop and Spark?
Hadoop and Spark are frameworks used to prepare Structured data from Unstructured data.
Some people refer to BIG Data as Hadoop and to Hadoop as BIG Data. Please don’t; I hope you can now see the difference.
Let me add one more comment: we can process Unstructured Data (BIG Data) into Structured Data without Hadoop and Spark. We can write a simple structured program or an object-oriented program to convert Unstructured Data (BIG Data) into Structured Data.
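As a rough sketch of that last point, the small program below turns free-form text into rows and columns using only Python's standard `re` module. The sample messages, the column names (`message_id`, `email`, `date`, `word_count`) and the two regular expressions are assumptions chosen for illustration; real unstructured data would need far more careful parsing.

```python
import re

# Hypothetical free-form messages, standing in for email or chat content.
MESSAGES = [
    "Hi, this is Asha (asha@example.com). Please call me on 2024-03-15 about my order.",
    "Ravi here, reach me at ravi@example.org. The delivery on 2024-04-02 was late.",
    "No contact details in this one.",
]

# Deliberately simple patterns; they are illustrative, not complete validators.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def to_row(message_id, text):
    """Extract a fixed set of columns from one piece of free text."""
    email = EMAIL_RE.search(text)
    date = DATE_RE.search(text)
    return {
        "message_id": message_id,
        "email": email.group(0) if email else None,
        "date": date.group(0) if date else None,
        "word_count": len(text.split()),
    }


# Every message becomes one row with the same columns: structured data.
rows = [to_row(i, text) for i, text in enumerate(MESSAGES, start=1)]

for row in rows:
    print(row)
```

Each message ends up as one row with the same four columns, and a missing value is stored as `None`. Frameworks such as Hadoop and Spark become useful when the volume of such text is too large for a single program on a single machine, but the idea of the conversion is the same.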