Spark Application Simple Architecture - Explanation, Overview, Components of Architecture, PDF PPT

Spark Application Simple Architecture
Pictorial representation (diagram) of the Spark application architecture

Spark / Scala / Java / Python: Spark is a framework. We can write Scala, Java, or Python code on the Spark framework to implement even the most complex transformations.

Sqoop: An import / export utility for transferring data between relational databases (RDBMS) and HDFS.
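Sqoop is driven from the command line, so the sketch below is not Sqoop itself. It is a minimal stdlib-only Python illustration of the two directions Sqoop handles. An import reads the rows of a relational table and writes them out as comma-delimited text, which is the kind of file an import lands in HDFS. An export loads such text back into a table. An in-memory SQLite database stands in for the RDBMS and a plain string stands in for the HDFS file. The functions `import_table` and `export_text` and the `employees` table are hypothetical names chosen for this example.

```python
import csv
import io
import sqlite3


def import_table(conn: sqlite3.Connection, table: str) -> str:
    """Hypothetical 'import': dump every row of `table` as comma-delimited text."""
    # Table names cannot be bound as SQL parameters, so this toy version
    # trusts `table`; never pass untrusted input here.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY rowid").fetchall()
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def export_text(conn: sqlite3.Connection, table: str, text: str) -> int:
    """Hypothetical 'export': load comma-delimited text into `table`, return row count."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return 0
    placeholders = ", ".join("?" for _ in rows[0])
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical source table standing in for an RDBMS table.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
    src.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])

    text = import_table(src, "employees")  # RDBMS -> delimited text
    print(text, end="")

    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
    print(export_text(dst, "employees", text))  # delimited text -> RDBMS
```

The CSV reader returns every field as a string. SQLite's column type affinity converts `"1"` back to an integer when it is inserted into the `INTEGER` column.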

Kafka: A distributed platform for real-time streaming of data.
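Kafka's core abstraction is the topic. A topic is an append-only log of records. Producers write to it, and consumers read from it at their own pace, each tracking its own position (offset). The toy model below only illustrates that idea in plain Python and uses no Kafka client library. The `TopicLog` and `Consumer` classes and their methods are hypothetical, and they leave out partitions, replication and persistence.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy single-partition topic: an append-only list of records."""
    records: list = field(default_factory=list)

    def produce(self, record) -> int:
        """Append a record and return its offset (its index in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def fetch(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer keeps its own read position, independent of other consumers."""
    log: TopicLog
    offset: int = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.fetch(self.offset, max_records)
        self.offset += len(batch)  # advance past the records just read
        return batch


if __name__ == "__main__":
    topic = TopicLog()
    fast, slow = Consumer(topic), Consumer(topic)
    for event in ["click", "view", "purchase"]:
        topic.produce(event)
    print(fast.poll())               # ['click', 'view', 'purchase']
    print(slow.poll(max_records=1))  # ['click']
    topic.produce("logout")          # a new event arrives while consumers run
    print(fast.poll())               # ['logout']
    print(slow.poll())               # ['view', 'purchase', 'logout']
```

Records are never removed when they are read. This is why a slow consumer can fall behind and later catch up without affecting a fast one.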

MapReduce: It is slowly becoming outdated. Earlier implementations of the Hadoop ecosystem used MapReduce for ETL / ELT work. The Spark framework is often cited as up to 10x faster than Hadoop MapReduce for many workloads, mainly because it can keep intermediate data in memory instead of writing it to disk between processing steps. On a lighter note, some companies have not moved away from MapReduce because their employees, who made the switch from Java developer to Hadoop developer, still love working in the MapReduce and Pig environment :)
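The following single-process sketch shows what the MapReduce programming model looks like in plain Python, using the classic word-count job split into its map, shuffle and reduce phases. It does not use Hadoop. On a real cluster each phase runs in parallel across many nodes. The function names are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list:
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs) -> dict:
    """Shuffle/sort: group all emitted values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: dict) -> dict:
    """Reducer: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines) -> dict:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["Spark is fast", "MapReduce is slow", "spark is popular"]
    print(word_count(docs))
    # {'fast': 1, 'is': 3, 'mapreduce': 1, 'popular': 1, 'slow': 1, 'spark': 2}
```

In Spark the same job is usually written as a short chain of transformations. Hadoop MapReduce instead requires separate mapper and reducer classes, which is part of why developers find Spark code more compact.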

HDFS Data Layers: Data in the HDFS file system can be organized into the following four layers.

Raw Data: A combination of structured, unstructured and semi-structured data. It is an unmodified copy of the source data.

Work Data: Data prepared and processed through the ETL / ELT process. It may still contain some erroneous data or incomplete transformations, but it is all structured data.

Gold Data: Fully transformed, clean, high-quality data.

Business Data: Highly aggregated and summarized for business needs.
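The four layers can be pictured as successive refinements of the same records. The plain-Python sketch below walks a few hypothetical sales records through the raw, work, gold and business layers. The field names, the validation rules and the choice to skip unparseable lines are all invented for illustration. In practice each layer would typically be its own set of HDFS directories or tables, populated by Spark jobs.

```python
import json

# Raw layer: unmodified copies of source data (here JSON lines, one malformed).
RAW = [
    '{"region": "east", "amount": "120.50"}',
    '{"region": "West ", "amount": "80"}',
    '{"region": "east", "amount": "oops"}',
    'not json at all',
]


def to_work(raw_lines):
    """Work layer: parse into structured rows; bad values are flagged, not dropped."""
    rows = []
    for line in raw_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # cannot be structured at all, so skipped in this sketch
        region = rec.get("region", "").strip().lower()
        try:
            amount, valid = float(rec.get("amount")), True
        except (TypeError, ValueError):
            amount, valid = None, False  # erroneous value kept but marked invalid
        rows.append({"region": region, "amount": amount, "valid": valid})
    return rows


def to_gold(work_rows):
    """Gold layer: keep only fully valid, clean rows."""
    return [
        {"region": r["region"], "amount": r["amount"]}
        for r in work_rows
        if r["valid"] and r["region"]
    ]


def to_business(gold_rows):
    """Business layer: aggregate for reporting (total amount per region)."""
    totals = {}
    for r in gold_rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    work = to_work(RAW)
    gold = to_gold(work)
    print(len(work), len(gold))  # 3 2
    print(to_business(gold))     # {'east': 120.5, 'west': 80.0}
```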


What is Big Data?