Explanation, Overview, Components and Details of BIG Data Hadoop Spark Application Architecture (PDF, PPT)

BIG Data Hadoop Spark Application Simple Architecture

(Note: This post covers a simple BIG Data / Hadoop Spark architecture to support Data Science, Machine Learning and Advanced Analytics. It leaves out the platform aspect, such as how many clusters, nodes, name nodes and data nodes are involved, and focuses on the functional application point of view rather than the platform point of view. If you are looking for the Hadoop HDFS platform-level architecture, please visit: Hadoop HDFS Architecture and Design.)

Pictorial representation of BIG Data Hadoop Spark Application Architecture

Source Data: It contains unstructured, semi-structured and structured data. Examples include e-commerce, trading, click stream, social media and real-time stream data.

ETL / ELT (Extract Transform Load or Extract Load Transform): Depending on the nature of the data, such as its volume, latency and complexity, the ETL / ELT technology stack can vary.

Spark / Scala / Java / Python: Spark is a framework. We can write Scala, Java and Python code on the Spark framework to implement even the most complex transformations.

Sqoop: It is an import / export utility between an RDBMS and HDFS.

Kafka: It is used for real-time streaming of data.

MapReduce: It is slowly becoming outdated. Earlier implementations of the Hadoop ecosystem used MapReduce for ETL / ELT work. The Spark framework can be around 10x faster than Hadoop MapReduce for many workloads, largely because it keeps intermediate data in memory rather than writing it to disk between steps. On a lighter note, some companies are not moving away from MapReduce because of employees who moved from being Java developers to Hadoop developers. They still love to work in the MapReduce and Pig environment :)
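To make the MapReduce model a little more concrete, here is a minimal single-machine sketch of its three phases (map, shuffle, reduce) using a word count. It is plain Python and does not use the Hadoop API. The function names and the sample input are made up for illustration. On a real cluster the same phases run in parallel across many nodes, with intermediate results written to disk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key before reducing."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, only for illustration.
    sample = ["big data hadoop", "hadoop spark", "spark spark"]
    print(word_count(sample))
```

The split into three small functions mirrors how a MapReduce job is usually described: the developer writes the map and reduce logic, and the framework takes care of the grouping step in between.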

HDFS Data Layers: The data in the HDFS file system can be organized into the following four layers.

Raw Data: It is a combination of structured, unstructured and semi-structured data. It is a copy of the source data without any modifications.

Work Data: Data prepared and processed through the ETL / ELT process. It may contain some erroneous data or incomplete transformations, but it is all structured data.
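As a rough illustration of the difference between the Raw and Work layers, the sketch below turns raw lines into structured work records without touching the raw copy. It is plain Python, not HDFS code. The JSON input format, the field names (user, amount) and the is_valid flag are all assumptions made for this example. Records that fail to parse are kept and flagged, which matches the idea that Work Data is structured but may still contain erroneous entries.

```python
import json
from typing import Any, Dict, List


def to_work_record(raw_line: str) -> Dict[str, Any]:
    """Parse one raw line into a structured work-layer record.

    Every record has the same fields. Lines that cannot be parsed are
    kept and flagged with is_valid=False instead of being dropped.
    """
    try:
        payload = json.loads(raw_line)
        return {
            "user": str(payload["user"]),        # hypothetical field
            "amount": float(payload["amount"]),  # hypothetical field
            "is_valid": True,
            "raw": raw_line,
        }
    except (ValueError, KeyError, TypeError):
        # Bad JSON, a missing field, or a value that is not a number.
        return {"user": None, "amount": None, "is_valid": False, "raw": raw_line}


def build_work_layer(raw_layer: List[str]) -> List[Dict[str, Any]]:
    """Build work records from the raw layer; the raw list is not modified."""
    return [to_work_record(line) for line in raw_layer]


if __name__ == "__main__":
    # Hypothetical raw data: one good record, one garbage line, one incomplete record.
    raw_layer = ['{"user": "a", "amount": "10.5"}', "not json", '{"user": "b"}']
    for record in build_work_layer(raw_layer):
        print(record)
```

Keeping the original line in each work record is one possible design choice. It makes it easy to trace a bad record back to its source, at the cost of storing the data twice.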

...Click here for full details


What is BIG Data?