Hadoop Multi Node Cluster Setup

Published On: 19 September 2022

What is Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Installation Steps

Prerequisites are:

  • Make sure both the Master and Slave systems have SSH installed and active.
  • To check whether it is installed and active, open a terminal and run the status command shown after this list.
  • If it is not installed, install it with the commands shown below.
  • Java should also be installed; the last command below verifies it.
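A minimal sketch of these checks on Ubuntu/Debian (assuming systemd and the OpenSSH and OpenJDK packages; adjust for your distribution):

    # Check whether the SSH service is installed and active
    sudo systemctl status ssh

    # If it is not installed, install the OpenSSH server
    sudo apt update
    sudo apt install openssh-server

    # Verify that Java is installed
    java -version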

Steps to install the multi-node cluster are:

  • We have a server IP: 192.168.0.186 (Master) and a node or slave IP: 192.168.0.119.

Generation of SSH Keys

  • Generate the SSH key and add all the node keys on every node under /home/username/.ssh/authorized_keys (create this file if it does not exist).
  • Generate the key with ssh-keygen -t rsa and press Enter through the prompts.
  • Exchange the keys between the Master node and the Slave node, i.e. paste the Slave node's key into the authorized_keys file on the Master node and vice versa.
  • You will find your public key in the /home/username/.ssh/id_rsa.pub file. Copy it and paste it on the Master node, and vice versa. A sketch of these commands follows this list.
  • A key looks like:  slave1 – ssh-rsa AAAAB3N………….
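As a sketch, the key generation and exchange can be done like this (the username is a placeholder; ssh-copy-id is simply a shortcut for appending the public key to the other node's authorized_keys file):

    # On each node: generate an RSA key pair (press Enter through the prompts)
    ssh-keygen -t rsa

    # Copy the public key to the other node (run on the Slave, targeting the Master)
    ssh-copy-id username@192.168.0.186

    # And vice versa (run on the Master, targeting the Slave)
    ssh-copy-id username@192.168.0.119

    # Alternatively, copy the contents of ~/.ssh/id_rsa.pub manually and append
    # them to /home/username/.ssh/authorized_keys on the other node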

Download Hadoop & untar the file

  • Download Hadoop from the official website, or open the terminal and use the commands shown below.
  • Configure the Hadoop environment variables (.bashrc).
  • Edit the .bashrc shell configuration file using a text editor of your choice (we will be using nano) and add the variables shown in the sketch below.
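A sketch of the download, extraction, and .bashrc entries; the Hadoop 3.3.4 release and the /home/username install path are assumptions, so substitute the version and paths you actually use:

    # Download and unpack Hadoop (pick your release from https://hadoop.apache.org)
    wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
    tar -xzf hadoop-3.3.4.tar.gz

    # Open .bashrc
    nano ~/.bashrc

    # Typical Hadoop environment variables to append at the end of .bashrc
    export HADOOP_HOME=/home/username/hadoop-3.3.4
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin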
Once you add the variables, save and exit the .bashrc file.

It is vital to apply the changes to the current running environment by using the following command:
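That is usually done with:

    source ~/.bashrc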

Edit hadoop-env.sh File

  • The hadoop-env.sh file serves as a master file to configure YARN, HDFS, MapReduce, and Hadoop-related project settings.
  • When setting up a Hadoop cluster, you need to define which Java implementation is to be utilized. Use the previously created $HADOOP_HOME variable to access the hadoop-env.sh file:
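Assuming the standard layout, the file lives under $HADOOP_HOME/etc/hadoop:

    nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh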

Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to the OpenJDK installation on your system. 
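For example, if OpenJDK 8 is installed in the usual Ubuntu location (this path is an assumption; check your own system, e.g. with readlink -f /usr/bin/javac):

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64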

Edit core-site.xml File

  • The core-site.xml file defines HDFS and Hadoop core properties.
  • Open the core-site.xml file in a text editor:
Add the following configuration on both the Master and Slave nodes, using the IP of the Master node:
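A sketch of the edit, using port 9000 for the NameNode address (a common choice in tutorials, not a requirement) and the Master IP from above:

    nano $HADOOP_HOME/etc/hadoop/core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.0.186:9000</value>
      </property>
    </configuration>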

Edit hdfs-site.xml File

  • The properties in the hdfs-site.xml file govern the location for storing node metadata, fsimage file, and edit log file.
  • Configure the file by defining the NameNode and DataNode storage directories.
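A sketch, assuming local storage directories under /home/username/hadoop_data (both the directories and the replication factor of 2 are assumptions; pick values that fit your cluster):

    nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/username/hadoop_data/nameNode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/username/hadoop_data/dataNode</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>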

Edit mapred-site.xml File

  • Use the following command to access the mapred-site.xml file and define MapReduce values:
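A minimal configuration that tells MapReduce to run on YARN:

    nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>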

Edit yarn-site.xml File

  • The yarn-site.xml file is used to define settings relevant to YARN.
  • It contains configurations for the Node Manager, Resource Manager, Containers, and Application Master.
  • Open the yarn-site.xml file in a text editor:
Append the configuration shown in the sketch below to the file. Then:

  • Edit the workers file (Master node only) and add the IPs of all the nodes that you want to make slaves.
  • Format the NameNode on the Master node.
  • Start Hadoop by running the start scripts in the sbin folder of the Hadoop directory.
  • Type the simple jps command to check whether all the daemons are active and running as Java processes. If everything is working as intended, the resulting list of running Java processes contains all the HDFS and YARN daemons.
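A sketch of these steps; the yarn-site.xml values shown are the common minimal ones, with the Resource Manager pointed at the Master IP used above (Hadoop 3.x uses a workers file, older releases use slaves):

    nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.0.186</value>
      </property>
    </configuration>

    # Workers file (Master node only): list one slave IP per line
    nano $HADOOP_HOME/etc/hadoop/workers
    192.168.0.119

    # Format the NameNode (Master node only, first start only)
    hdfs namenode -format

    # Start HDFS and YARN from the sbin folder
    $HADOOP_HOME/sbin/start-dfs.sh
    $HADOOP_HOME/sbin/start-yarn.sh

    # Check that the daemons are running as Java processes
    jps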

We can also access the Hadoop web UI by visiting a URL in a browser.

The YARN Resource Manager is accessible on port 8088:
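Assuming Hadoop 3.x default ports and the Master IP used above:

    # NameNode web UI (port 9870 in Hadoop 3.x; older 2.x releases use 50070)
    http://192.168.0.186:9870

    # YARN Resource Manager web UI
    http://192.168.0.186:8088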

Conclusion

  • You have successfully installed Hadoop on Ubuntu and deployed it in distributed mode.
  • A multi-node Hadoop deployment is an excellent starting point to explore basic HDFS commands and gain hands-on experience.

 

 


That’s all for this blog