Hadoop HDFS BIG Data installations and setup, Online Trainings, Certification, Pricing, Downloads, Blogs, Interview Questions, projects and jobs..

Hadoop HDFS Architecture and Design

(Note: This post is mainly focused on the platform level architecture, if you are looking for Hadoop BIG Data Application level architecture visit : BIG Data Hadoop Spark Application level Architecture.)

Hadoop HDFS has a master and slave architecture. HDFS (Hadoop Distributed File System) is designed to run on commodity hardware. These commodity hardware providers can be Dell, HP, IBM, Huawei and others. Though there are similarities between HDFS and other distributed file systems, the unique differences making HDFS a market leader.

The following is the pictorial presentation and diagram of the Hadoop Architecture and Design:

Fault-tolerant:


HDFS is highly fault-tolerant and is designed to be deployed on multiple nodes so that if one node fails the other node is available.
HDFS is good for applications that have large data sets, the throughput is very high in HDFS design. With some additional setup, HDFS can accommodate live streaming ability and real time data processing.

NameNode:


It is repository of HDFS metadata. The user data never flows through the NameNode, this design protects the metadata without corrupting the Name node. The NameNode acts as an arbitrator. The NameNode executes file system operations such as opening, closing and renaming of files and directories. HDFS cluster consists of a single NameNode, a master server that manages file system. Also, the mapping of blocks to DataNodes is determined by the NameNode.

DataNode:


HDFS cluster consists of several DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes.
NameNode determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.

Data Replication:


To maintain the fault tolerance, the replication is an important process. HDFS can store very large files across machines in a large cluster. It stores each file as a sequence of blocks. The size of the block and replication factor are configured per file. In general replication factor is 3.x and block size is 128 MB.
The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Block report contains all blocks on a DataNode. Heartbeat to know whether DataNode is functioning or not.

...Click here for full details

Python for BEGINNERS and DUMMIES

Reusable Define Function with Return


Python 2D Lists and Nested Loops Examples


Python exception handling using
try except with examples


Python For Loop condition with examples


Python while condition with examples


Python if elif else logical condition


Python Tuples vs List


Python Translator with Examples


Python List Functions with Examples


Data Structures and Data Types


Python Exponent Function with examples


Python dictionary with examples


Variables, Expressions & Statements


List of 'Reserved Words' in Python


Boolean Expressions &
Comparison Operators

Python Interview Questions and
Answers


What are the differences between
Python 2.x vs Python 3.x
(2.7 vs 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9),
when to you which version ?


What is Python Variable, Expression
and Statement ?


What is a 'Reserved Word' in Python ?
List of Reserved words in Python ?

How to Install Python, IDE, UI
to Practice


Download and Install Python from
Python.org


Download and Install Anaconda to Use
SPIDER and Jupyter Notebook