In this guide from Space Hosting, you will learn how to set up Apache Hadoop on Ubuntu 20.04 from start to finish, including its components such as HDFS, YARN, and the DataNodes, along with the localhost configuration.
Table of contents
- What is Apache Hadoop?
- Set up Apache Hadoop on Ubuntu 20.04
- Requirements
- Dependencies
- Hadoop User Account on Ubuntu
- Installation and Configuration of Apache Hadoop
- Configuration of Hadoop
- Formatting and Starting Hadoop
- Access the Apache Hadoop Web Interface
The Ubuntu 20.04 operating system used for this tutorial is running on a Ryzen VPS. If you don’t own a server, you can purchase an AMD VPS.
1. What is Apache Hadoop?
Apache Hadoop is an open-source framework used to efficiently store and process large datasets, ranging in size from gigabytes to petabytes, across clusters of machines. It is Java-based and uses the Hadoop Distributed File System (HDFS) to store data and MapReduce to process it.
Apache Hadoop plays an important role after the ETL (Extract, Transform, Load) steps: it stores the result so it can be processed or used for analysis. Storing data of this size is not easy, and the stored data must also be replicated so it remains available if one node stops working. Hadoop handles both tasks, processing the data and maintaining replicas (backups) of it across the cluster.
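To make this concrete, here is a minimal sketch of how a file is stored and replicated once the cluster built in this guide is running; the file name big-dataset.csv is a hypothetical example:
# create a directory in HDFS and store a local file in it
hdfs dfs -mkdir -p /data
hdfs dfs -put big-dataset.csv /data/
# inspect how the file's blocks are replicated across the cluster
hdfs fsck /data/big-dataset.csv -files -blocks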
2. Set up Apache Hadoop on Ubuntu 20.04
Before starting the installation and configuration of Apache Hadoop, please make sure the following requirements are met.
3. Requirements
- A server with root access
- A fresh Ubuntu/Debian installation
Connect to the server using PuTTY and run the following commands.
4. Dependencies
Update your system and install the dependencies with the following commands:
sudo apt update
sudo apt install default-jdk default-jre -y
java -version
5. Hadoop User Account on Ubuntu
This account will be used by the Hadoop processes; it gets full access to the server through the sudo group. The following commands create a hadoop user and install OpenSSH so the account can later be accessed with SSH keys.
sudo adduser hadoop
sudo usermod -aG sudo hadoop
sudo su - hadoop
sudo apt install openssh-server openssh-client -y
Now, as the hadoop user, generate an RSA key pair and append the public key to the authorized keys so the hadoop user can be accessed over SSH without a password.
sudo su - hadoop
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
sudo chmod 640 ~/.ssh/authorized_keys
ssh localhost
6. Installation and Configuration of Apache Hadoop
Apache Hadoop is designed to handle batch processing efficiently. Instead of one large computer storing and processing the data, Hadoop clusters multiple computers so that massive datasets can be analyzed in parallel, and therefore much faster (a sample MapReduce job demonstrating this is shown at the end of section 8). Download the release, unpack it, and move it into place; if downloads.apache.org no longer carries 3.3.1, the same tarball is available at https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
tar -xvzf hadoop-3.3.1.tar.gz
sudo mv hadoop-3.3.1 /usr/local/hadoop
sudo mkdir /usr/local/hadoop/logs
sudo chown -R hadoop:hadoop /usr/local/hadoop
7. Configuration of Hadoop
The configuration of Hadoop is very important so that the web UI and the other components, such as the DataNode, ResourceManager, and NameNode, work properly. This guide uses the default configuration, so I recommend you follow the exact same steps.
Type the commands below and paste the given text into the text editor.
Note: to save a file in the nano text editor, press CTRL + X, then Y, then Enter.
sudo nano ~/.bashrc
# Paste the following content into the text editor
# Starting
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
# END
source ~/.bashrc
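As a quick sanity check that the variables loaded, the following should print the Hadoop directory:
echo $HADOOP_HOME   # should print /usr/local/hadoop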
Next, find the Java installation path; its parent directory will be used as JAVA_HOME in the next step.
which javac
readlink -f /usr/bin/javac
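If you prefer a single command, the JAVA_HOME directory can be derived from the readlink output; this sketch assumes javac resolves to a path ending in /bin/javac:
dirname "$(dirname "$(readlink -f /usr/bin/javac)")"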
sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Paste the following content into the text editor. It sets the Java home location variable so Hadoop can find the Java installation.
# Starting
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"
# Ending
Hadoop 3.x running on Java 11 also needs the javax.activation API, which was removed from the JDK, so download it into Hadoop's lib directory (if the jcenter.bintray.com mirror is no longer available, the same jar is published on Maven Central), then verify the installation:
cd /usr/local/hadoop/lib
sudo wget https://jcenter.bintray.com/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar
hadoop version
core-site.xml is the main Hadoop configuration file; it defines the default file system URI and its port.
sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml
Paste the following content into the text editor:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
    <description>The default file system URI</description>
  </property>
</configuration>
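To confirm the value is being read from core-site.xml, you can query the configuration; hdfs getconf works even without any daemons running:
hdfs getconf -confKey fs.defaultFS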
sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}
sudo chown -R hadoop:hadoop /home/hadoop/hdfs
sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
The configuration below defines the number of HDFS replicas and the directories where the NameNode and DataNode store their data.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
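As with core-site.xml, a quick spot check confirms these values are picked up:
hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.namenode.name.dir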
sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Paste the following content into the text editor. It tells MapReduce to run on top of YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
YARN (Yet Another Resource Negotiator) handles resource management and job scheduling for Hadoop. The mapreduce_shuffle auxiliary service below lets the NodeManager serve map outputs to the reduce tasks.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
8. Formatting and Starting Hadoop
Switch to the hadoop user, format the NameNode (this is only needed once, on first setup), then start the HDFS and YARN daemons:
sudo su - hadoop
hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps
After running jps, if the screen displays all of the daemons, such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, then Apache Hadoop has been installed successfully.
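To see the parallel batch processing described in section 6 in action, you can run the word-count example that ships with the release. This is a sketch assuming the hadoop-3.3.1 tarball from this guide; sample.txt is a hypothetical test file:
# create an input directory in HDFS and upload a small test file
hdfs dfs -mkdir -p /user/hadoop/input
echo "hello hadoop hello hdfs" > sample.txt
hdfs dfs -put sample.txt /user/hadoop/input/
# run the bundled word-count MapReduce job and print its result
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000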
9. Access the Apache Hadoop Web Interface
Visit the NameNode web UI at: http://server-IP:9870 (the YARN ResourceManager UI is served on port 8088 of the same host).
For example:
- http://localhost:9870
- http://173.51.23.123:9870
- http://127.0.0.1:9870
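If the page does not load, a quick check from the server itself confirms whether the NameNode UI is listening; this assumes curl is installed:
curl -I http://localhost:9870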