Explanation, Overview, Components and Details of BIG Data Hadoop Spark Application Architecture

Simple BIG Data Hadoop Spark Application Architecture

(Note: This post describes a simple BIG Data / Hadoop Spark architecture that supports Data Science, Machine Learning and Advanced Analytics. It leaves out the platform aspect, such as how many clusters, nodes, name nodes and data nodes are needed, and focuses on the functional application view rather than the platform view. If you are looking for the Hadoop HDFS platform-level architecture, please see: Hadoop HDFS Architecture and Design.)

Source Data: It contains unstructured, semi-structured and structured data, for example e-commerce, trading, clickstream, social media and real-time stream data.

ETL / ELT (Extract, Transform, Load or Extract, Load, Transform): The ETL / ELT technology stack can vary depending on the nature of the data, such as its volume, latency and complexity.
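To make the ETL vs. ELT distinction concrete, here is a minimal plain-Python sketch, assuming a small in-memory dictionary stands in for the target store (HDFS or a warehouse). The sample records and the functions `extract`, `transform`, `etl` and `elt` are hypothetical and not part of any Hadoop tool; the only difference between the two pipelines is whether the transform runs before or after the load.

```python
import json

# Hypothetical raw source records: structured CSV-style order rows and
# semi-structured JSON clickstream events. Field names are assumptions.
RAW_CSV = ["order_id,amount", "1001,25.50", "1002,14.00"]
RAW_JSON = ['{"user": "u1", "page": "/cart"}', '{"user": "u2", "page": "/home"}']


def extract():
    """Extract: read the raw records from the (hypothetical) sources as-is."""
    header, *rows = RAW_CSV
    cols = header.split(",")
    orders = [dict(zip(cols, r.split(","))) for r in rows]
    clicks = [json.loads(line) for line in RAW_JSON]
    return orders, clicks


def transform(orders, clicks):
    """Transform: cast types and derive a simple per-page aggregate."""
    typed = [{"order_id": int(o["order_id"]), "amount": float(o["amount"])} for o in orders]
    page_views = {}
    for c in clicks:
        page_views[c["page"]] = page_views.get(c["page"], 0) + 1
    return {"orders": typed, "page_views": page_views}


def etl():
    """ETL: transform first, so the target only ever holds cleaned data."""
    target = {}
    target.update(transform(*extract()))
    return target


def elt():
    """ELT: load the raw data first, then transform it inside the target."""
    target = {}
    target["raw_orders"], target["raw_clicks"] = extract()
    target.update(transform(target["raw_orders"], target["raw_clicks"]))
    return target


if __name__ == "__main__":
    print(sorted(etl()))  # ['orders', 'page_views']
    print(sorted(elt()))  # ['orders', 'page_views', 'raw_clicks', 'raw_orders']
```

In the ELT version the raw records remain in the target next to the transformed output, so they can be re-processed later; in the ETL version only the cleaned output is stored.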

Spark / Scala / Java / Python: Spark is a distributed data processing framework. We can write Scala, Java or Python code on the Spark framework to implement even the most complex transformations.
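The kind of transformation Spark distributes can be sketched in plain Python. This is not Spark code: assuming two hard-coded "partitions" of made-up text, it only mimics the flatMap / map / reduceByKey word-count pattern that a Spark job applies across executors. The helper function names are hypothetical; a roughly equivalent PySpark pipeline is shown in a comment for comparison.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input split into "partitions", the way Spark splits a dataset
# across executors. The lines are made-up sample data.
PARTITIONS = [
    ["spark runs on hadoop", "spark is fast"],
    ["hadoop stores data"],
]


def map_partition(lines):
    """flatMap + map: emit a (word, 1) pair for every word in one partition."""
    return [(word, 1) for line in lines for word in line.split()]


def combine_locally(pairs):
    """Map-side combine: pre-aggregate counts inside one partition before merging."""
    local = defaultdict(int)
    for word, n in pairs:
        local[word] += n
    return dict(local)


def merge(acc, partial):
    """Merge step of reduceByKey: add one partition's partial counts to the total."""
    for word, n in partial.items():
        acc[word] = acc.get(word, 0) + n
    return acc


def word_count(partitions):
    partials = [combine_locally(map_partition(p)) for p in partitions]
    return reduce(merge, partials, {})


# Roughly equivalent PySpark pipeline (requires a SparkContext `sc`):
#   (sc.parallelize(lines)
#      .flatMap(lambda l: l.split())
#      .map(lambda w: (w, 1))
#      .reduceByKey(lambda a, b: a + b))

if __name__ == "__main__":
    print(word_count(PARTITIONS))
```

Combining inside each partition before merging is what keeps the amount of data shuffled between nodes small, which is the main reason this pattern scales.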

Sqoop: It is an import / export utility for transferring bulk data between relational databases (RDBMS) and HDFS.
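A Sqoop import reads a table over JDBC, splits it on a key column across parallel map tasks and writes one file per task into the HDFS target directory. The sketch below imitates that flow in plain Python, assuming sqlite3 in place of the RDBMS and a local temporary directory in place of HDFS; the `orders` table and the helper functions are hypothetical. The real command would look something like `sqoop import --connect <jdbc-url> --table orders --target-dir /data/orders --split-by id --num-mappers 2`, where the table and paths are placeholders.

```python
import os
import sqlite3
import tempfile


def make_source_db():
    """Hypothetical source table; sqlite3 stands in for e.g. MySQL or Oracle."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, i * 10.0) for i in range(1, 11)])
    return conn


def split_ranges(lo, hi, num_mappers):
    """Split [lo, hi] into contiguous half-open key ranges, one per mapper."""
    step = (hi - lo + 1) / num_mappers
    bounds = [lo + round(step * i) for i in range(num_mappers)] + [hi + 1]
    return [(bounds[i], bounds[i + 1]) for i in range(num_mappers)]


def import_table(conn, table, split_col, target_dir, num_mappers=2):
    """Write each key range to its own comma-delimited part file.

    Table and column names are interpolated directly, so only pass trusted identifiers.
    """
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    os.makedirs(target_dir, exist_ok=True)
    for m, (start, end) in enumerate(split_ranges(lo, hi, num_mappers)):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} >= ? AND {split_col} < ? "
            f"ORDER BY {split_col}", (start, end)).fetchall()
        with open(os.path.join(target_dir, f"part-m-{m:05d}"), "w") as f:
            for row in rows:
                f.write(",".join(str(v) for v in row) + "\n")
    return sorted(os.listdir(target_dir))


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "orders")
    print(import_table(make_source_db(), "orders", "id", out))
    # ['part-m-00000', 'part-m-00001']
```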

Kafka: It is a distributed event streaming platform used for real-time streaming of data.
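Kafka stores each topic as a set of append-only partition logs; producers append records (the record key decides the partition) and consumers track the offset up to which they have read. The toy in-memory model below illustrates only those ideas. It is not the Kafka client API: the `Topic` and `Consumer` classes are hypothetical, and `zlib.crc32` is a stand-in for Kafka's own key-hashing partitioner.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy model of a Kafka topic: N append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        """Append to the partition chosen by hashing the key; return (partition, offset)."""
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class Consumer:
    """Toy consumer that remembers the next offset to read in each partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read

    def poll(self):
        """Return every record appended since the last poll, then advance the offsets."""
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return batch


if __name__ == "__main__":
    topic = Topic(num_partitions=3)
    consumer = Consumer(topic)
    topic.produce("user-1", "click /home")
    topic.produce("user-2", "click /cart")
    print(len(consumer.poll()))  # 2
    topic.produce("user-1", "click /checkout")
    print(consumer.poll())  # [('user-1', 'click /checkout')]
```

Because records with the same key always hash to the same partition, their relative order is preserved, which mirrors the ordering guarantee Kafka gives within a single partition.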



What is BIG Data?