[Hadoop] Run Hadoop Cluster on Docker # 1 - Set up Hadoop on CentOS Container
2022, Apr 17
1. Download CentOS Image
- (mac term) On your mac terminal, type the command below to create a new container from the CentOS image (version 7 here)
$ docker run --restart always --name [container_name] -dt centos:7
- now you can see the new centos image in your docker images list (Docker Dashboard)
- a new centos container is created with the name you set via the option --name [container_name] (here, my_centos)
- (mac term) execute the centos container that you've just created
$ docker exec -it my_centos /bin/bash
- you can see the list of running containers with the command
docker ps
- after the exec command is executed, your prompt changes from your host shell to root@[container_id], meaning you are now working inside the container
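- (mac term) a quick sanity check of the round trip, assuming the container name my_centos used above: hostname inside a container prints the container id, and exit returns you to your mac shell
$ docker exec -it my_centos /bin/bash
$ hostname   ## inside the container : prints the container id
$ exit   ## leave the container and return to the mac terminal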
2. Setting Hadoop Base on CentOS Image
- (mac term) create a new container, named 'hadoop_base', that will be your hadoop base
$ docker run -it --name hadoop_base -dt centos:7
- (mac term) exec into hadoop_base
$ docker exec -it hadoop_base /bin/bash
- (container) update yum packages and install all required libraries (a quick version check follows the package list below)
## CentOS container
$ yum update
$ yum install wget -y
$ yum install vim -y
$ yum install openssh-server openssh-clients openssh-askpass -y
$ yum install java-1.8.0-openjdk-devel.x86_64 -y
- wget : free software package for retrieving files using HTTP, HTTPS, FTP and FTPS
- vim : a text editor for editing files in the terminal
- openssh-server openssh-clients openssh-askpass : connectivity tool for remote login with the SSH protocol
- java : select the desired java version (here, OpenJDK 1.8.0)
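- (container) as mentioned above, you can quickly confirm the packages installed correctly; these are the standard version flags for each tool
$ wget --version | head -n 1   ## wget version banner
$ vim --version | head -n 1    ## vim version banner
$ ssh -V                       ## openssh client version
$ javac -version               ## java compiler version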
- (container) type the commands below to allow password-free ssh between containers (the nodes of the hadoop cluster); the first pair generates your user key, the second set generates the sshd host keys
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ssh-keygen -f /etc/ssh/ssh_host_rsa_key -t rsa -N ""
$ ssh-keygen -f /etc/ssh/ssh_host_ecdsa_key -t ecdsa -N ""
$ ssh-keygen -f /etc/ssh/ssh_host_ed25519_key -t ed25519 -N ""
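- (container) a minimal check that password-free login works, assuming the default CentOS 7 sshd config (which permits root login); the -o option just skips the interactive host-key prompt on first connect
$ /usr/sbin/sshd   ## start the ssh daemon
$ ssh -o StrictHostKeyChecking=no localhost hostname   ## should print the container id without asking for a password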
- (container) add the JAVA_HOME directory to PATH
$ readlink -f /usr/bin/javac   ## check your java directory
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/bin/javac
$ vim ~/.bashrc   ## edit your PATH in the terminal using vim
- (vim) type ‘i’ to enter insert mode and add your java directory (note! leave out the ‘/bin/javac’ part, and use the path that readlink printed on your machine; the example below happens to come from an arm64 (aarch64) machine)
...
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.aarch64
export PATH=$PATH:$JAVA_HOME/bin
...
- (vim) to exit from insert mode, press
esc
- (vim) to save the edit and exit from vim, type
:w (save) -> :q (quit)
- (container) apply the edits by sourcing the file you've just changed, so the new variables take effect in your current shell
$ source ~/.bashrc
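- (container) a convenient alternative sketch: derive JAVA_HOME from readlink instead of hard-coding it (two dirname calls strip the trailing /bin/javac), then confirm the setup
$ export JAVA_HOME=$(dirname $(dirname $(readlink -f /usr/bin/javac)))
$ echo $JAVA_HOME   ## should print your jvm directory
$ java -version   ## should print the openjdk 1.8.0 version banner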
3. Install Hadoop and Set Hadoop Configurations on CentOS Image
- (container) download and unpack hadoop under /hadoop_home
$ mkdir /hadoop_home
$ cd /hadoop_home
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz   ## choose the hadoop version you want (here, hadoop-2.7.7)
$ tar -xvzf hadoop-2.7.7.tar.gz   ## unzip
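- (container) you can check that the archive unpacked correctly; the bin directory should contain the hadoop, hdfs, mapred and yarn executables
$ ls /hadoop_home/hadoop-2.7.7/bin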
- (container) add the HADOOP_HOME directory to your PATH
$ vim ~/.bashrc
- (vim)
...
export HADOOP_HOME=/hadoop_home/hadoop-2.7.7
export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

## run sshd
/usr/sbin/sshd
...
- (container)
$ source ~/.bashrc
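- (container) the hadoop commands are now on your PATH; a quick check
$ hadoop version   ## should print Hadoop 2.7.7
$ echo $HADOOP_CONFIG_HOME   ## should print /hadoop_home/hadoop-2.7.7/etc/hadoop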
- (container) create directories (tmp, namenode, datanode) under /hadoop_home
$ mkdir /hadoop_home/tmp
$ mkdir /hadoop_home/namenode
$ mkdir /hadoop_home/datanode
Now, edit hadoop configurations with vim
(container)
$ cd $HADOOP_CONFIG_HOME
## create mapred-site.xml in the $HADOOP_CONFIG_HOME directory from the bundled template
$ cp mapred-site.xml.template mapred-site.xml
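- (container) the three files you will edit should now all exist (we are still in $HADOOP_CONFIG_HOME)
$ ls core-site.xml hdfs-site.xml mapred-site.xml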
1) core-site.xml
(container) open core-site.xml
$ vim $HADOOP_CONFIG_HOME/core-site.xml
(vim)
<!-- core-site.xml -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop_home/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://nn:9000</value> <!-- nn : hostname of the namenode; name it as you want -->
<final>true</final>
</property>
</configuration>
2) hdfs-site.xml
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop_home/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop_home/datanode</value>
<final>true</final>
</property>
</configuration>
3) mapred-site.xml
<!-- mapred-site.xml -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>nn:9001</value>
</property>
</configuration>
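- (container) once the three xml files are saved, you can check that hadoop actually picks them up; hdfs getconf reads core-site.xml and hdfs-site.xml (the mapred key is consumed by the mapreduce daemons instead)
$ hdfs getconf -confKey fs.defaultFS   ## should print hdfs://nn:9000
$ hdfs getconf -confKey dfs.replication   ## should print 2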
- Finally, format the namenode and commit the container to the centos:hadoop image
(container)
$ hadoop namenode -format
$ exit
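- (container) before typing exit, you can optionally confirm the format succeeded; formatting creates a current directory (with VERSION and fsimage files) under the dfs.namenode.name.dir path set above
$ ls /hadoop_home/namenode/current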
(mac term)
$ docker commit -m "hadoop in centos" hadoop_base centos:hadoop
- docker commit -m [message] [container_name] [image_name]
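- (mac term) the new image should now show up in your image list
$ docker images centos   ## centos:hadoop should be listed alongside centos:7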
In the next post, we will create a namenode and multiple datanodes from the hadoop base image we've just created (centos:hadoop).