[Hadoop] Run Hadoop Cluster on Docker # 1 - Set up Hadoop on CentOS Container

2022, Apr 17    

1. Download CentOS Image

  • (mac term) In your Mac terminal, run the command below to create a new container from the CentOS image (version 7 here). Docker pulls the image automatically if it is not available locally.
$ docker run --restart always --name [container_name] -dt centos:7

  • the centos image now appears in your Docker images list (Docker Dashboard)

  • a new centos container is created with the name you passed to the option --name [container_name] (here, my_centos)

  • (mac term) open a shell in the centos container that you’ve just created

    $ docker exec -it my_centos /bin/bash

  • you can list the running containers with the command docker ps

  • (mac term) the general form of the command is
    $ docker exec -it [container_name] /bin/bash

  • after this command runs, your shell prompt changes from base to root@[container_id], which means you are now working inside the container
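
The docker ps check can also be scripted. The following is a minimal sketch, assuming the container is named my_centos as in this post; name_in_list is a hypothetical helper (not part of Docker), and docker ps --format '{{.Names}}' prints one running container name per line.

```shell
#!/bin/sh
# name_in_list NAME: succeed if NAME appears as a whole line on stdin.
# Hypothetical helper for illustration, not part of Docker.
name_in_list() {
    grep -qxF -- "$1"
}

# Only query Docker when the CLI is available.
if command -v docker >/dev/null 2>&1; then
    if docker ps --format '{{.Names}}' 2>/dev/null | name_in_list my_centos; then
        echo "my_centos is running"
    else
        echo "my_centos is not running"
    fi
fi
```

Matching the whole line avoids a false positive when another container name merely contains the one you are looking for.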

2. Setting Hadoop Base on CentOS Image

  • (mac term) create a new container named ‘hadoop_base’ that will serve as your hadoop base
    $ docker run -dit --name hadoop_base centos:7

  • (mac term) open a shell in hadoop_base
    $ docker exec -it hadoop_base /bin/bash
  • (container) update yum packages and install all required libraries
    /* CentOS Container */
    $ yum update
    $ yum install wget -y
    $ yum install vim -y
    $ yum install openssh-server openssh-clients openssh-askpass -y
    $ yum install java-1.8.0-openjdk-devel.x86_64 -y

  • wget : command-line tool for downloading files over HTTP, HTTPS, FTP and FTPS
  • vim : text editor for editing files in the terminal
  • openssh-server openssh-clients openssh-askpass : OpenSSH server and client tools for remote login with the SSH protocol
  • java : choose the java version you want (here, OpenJDK 1.8.0)
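
To confirm that the packages above were installed, you can check that their commands are on PATH. This is a minimal sketch; missing_tools is a hypothetical helper, and the command names are the ones these packages are expected to provide.

```shell
#!/bin/sh
# missing_tools CMD...: print each command name that is not found on PATH.
# Hypothetical helper for illustration.
missing_tools() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Commands expected from the packages installed above.
missing=$(missing_tools wget vim ssh sshd ssh-keygen java javac)
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing
else
    echo "all required tools are installed"
fi
```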

  • (container) run the commands below to allow password-free SSH between containers (nodes of the hadoop cluster)
    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa      ## user key pair with an empty passphrase
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys      ## trust that key for logins
    $ ssh-keygen -f /etc/ssh/ssh_host_rsa_key -t rsa -N ""      ## host keys that sshd needs to start
    $ ssh-keygen -f /etc/ssh/ssh_host_ecdsa_key -t ecdsa -N ""
    $ ssh-keygen -f /etc/ssh/ssh_host_ed25519_key -t ed25519 -N ""
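
Note that cat ... >> authorized_keys appends the key again every time it is run. The following is a minimal sketch of an idempotent variant; add_authorized_key is a hypothetical helper, and the demo runs in a scratch directory with a made-up key line so that it does not touch your real ~/.ssh.

```shell
#!/bin/sh
# add_authorized_key PUBKEY_FILE AUTH_FILE
# Append the public key to AUTH_FILE only if that exact line is not there yet.
# Hypothetical helper for illustration.
add_authorized_key() {
    pub=$1
    auth=$2
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    chmod 600 "$auth"
    if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
        cat "$pub" >> "$auth"
    fi
}

# Demo in a scratch directory with a made-up key line.
demo=$(mktemp -d)
echo "ssh-rsa AAAA-example-key root@demo" > "$demo/key.pub"
add_authorized_key "$demo/key.pub" "$demo/authorized_keys"
add_authorized_key "$demo/key.pub" "$demo/authorized_keys"
wc -l < "$demo/authorized_keys"    # the key is stored once, not twice
rm -rf "$demo"
```

Inside the container you would call it with the public key file generated above and ~/.ssh/authorized_keys.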

  • (container) set JAVA_HOME and add it to PATH
    $ readlink -f /usr/bin/javac     ## check your java directory
    $ vim ~/.bashrc      ## edit ~/.bashrc in the terminal with vim

  • (vim) type ‘i’ to enter insert mode and add your java directory (note! use the path printed by readlink without the trailing ‘/bin/javac’ part)
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-
    export PATH=$PATH:$JAVA_HOME/bin

  • (vim) to leave insert mode, press esc
  • (vim) to save the edit and exit vim, type :w (save) -> :q (quit), or :wq to do both at once
  • (container) reload the file you’ve edited so that the changes take effect in the current shell
    $ source ~/.bashrc
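
The "drop the /bin/javac part" step can be done by the shell instead of by hand. This is a minimal sketch, assuming javac is on PATH inside the container; java_home_from_javac is a hypothetical helper.

```shell
#!/bin/sh
# java_home_from_javac PATH_TO_JAVAC
# Strip the trailing /bin/javac by taking the parent directory twice.
# Hypothetical helper for illustration.
java_home_from_javac() {
    dirname "$(dirname "$1")"
}

# Resolve symlinks first, as readlink -f does in the step above.
if command -v javac >/dev/null 2>&1; then
    JAVA_HOME=$(java_home_from_javac "$(readlink -f "$(command -v javac)")")
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
fi
```

The printed value is what you would put after export JAVA_HOME= in ~/.bashrc.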

3. Install Hadoop and Set Hadoop Configurations on CentOS Image

  • (container) download and extract hadoop
    $ mkdir /hadoop_home
    $ cd /hadoop_home
    ## choose the hadoop version you want (here, hadoop-2.7.7)
    $ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
    $ tar -xvzf hadoop-2.7.7.tar.gz         ## extract the archive

  • (container) add HADOOP_HOME directory to your PATH
    $ vim ~/.bashrc

  • (vim)
    export HADOOP_HOME=/hadoop_home/hadoop-2.7.7
    export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    ## run sshd (assuming the default CentOS 7 location of the binary)
    /usr/sbin/sshd

  • (container) reload the file
    $ source ~/.bashrc
  • (container) create directories (tmp, namenode, datanode) under /hadoop_home
    $ mkdir /hadoop_home/tmp
    $ mkdir /hadoop_home/namenode
    $ mkdir /hadoop_home/datanode

Now, edit the hadoop configuration files with vim


## create mapred-site.xml in the $HADOOP_CONFIG_HOME directory from its template
$ cd $HADOOP_CONFIG_HOME
$ cp mapred-site.xml.template mapred-site.xml

1) core-site.xml

(container) open the file core-site.xml

$ vim $HADOOP_CONFIG_HOME/core-site.xml


<!-- core-site.xml -->

        <value>hdfs://nn:9000</value>      <!-- nn : hostname of the namenode, name it as you want -->
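
The value above belongs inside a property element. The following is a minimal sketch of a complete core-site.xml, written with a shell here-document. It assumes the standard Hadoop keys fs.defaultFS and hadoop.tmp.dir and points the temp directory at /hadoop_home/tmp; these property choices are assumptions, not necessarily the exact settings used in this post. CONF_DIR is a hypothetical variable, so the file lands in a scratch directory rather than in $HADOOP_CONFIG_HOME.

```shell
#!/bin/sh
# Write a sample core-site.xml to a scratch directory.
# CONF_DIR is a hypothetical variable; copy the result to
# $HADOOP_CONFIG_HOME once it looks right.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>            <!-- assumed key for the namenode URI -->
        <value>hdfs://nn:9000</value>        <!-- nn : hostname of the namenode -->
    </property>
    <property>
        <name>hadoop.tmp.dir</name>          <!-- assumed: base for temporary files -->
        <value>/hadoop_home/tmp</value>
    </property>
</configuration>
EOF

echo "wrote $CONF_DIR/core-site.xml"
```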

2) hdfs-site.xml

<!-- hdfs-site.xml -->
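
A minimal sketch of what hdfs-site.xml could contain, again written with a here-document into a scratch directory (CONF_DIR is a hypothetical variable). It assumes the standard keys dfs.replication, dfs.namenode.name.dir and dfs.datanode.data.dir, and reuses the /hadoop_home/namenode and /hadoop_home/datanode directories created earlier. The replication factor of 2 is only an illustrative assumption; pick it to match the number of datanodes you plan to run.

```shell
#!/bin/sh
# Write a sample hdfs-site.xml to a scratch directory.
# CONF_DIR is a hypothetical variable; copy the result to
# $HADOOP_CONFIG_HOME once it looks right.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dfs.replication</name>         <!-- assumed value, adjust to your cluster -->
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>   <!-- where the namenode keeps its metadata -->
        <value>file:///hadoop_home/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>   <!-- where datanodes keep their blocks -->
        <value>file:///hadoop_home/datanode</value>
    </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hdfs-site.xml"
```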



3) mapred-site.xml

<!-- mapred-site.xml -->
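
A minimal sketch of what mapred-site.xml could contain, written the same way into a scratch directory (CONF_DIR is a hypothetical variable). Setting mapreduce.framework.name to yarn is a common choice in Hadoop 2.x, but it is an assumption here, not necessarily the setting used in this post.

```shell
#!/bin/sh
# Write a sample mapred-site.xml to a scratch directory.
# CONF_DIR is a hypothetical variable; copy the result to
# $HADOOP_CONFIG_HOME once it looks right.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>   <!-- assumed: run MapReduce jobs on YARN -->
        <value>yarn</value>
    </property>
</configuration>
EOF

echo "wrote $CONF_DIR/mapred-site.xml"
```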



  • Finally, format the namenode and commit the container to the centos:hadoop image

    (container)

      $ hadoop namenode -format      ## in Hadoop 2.x, hdfs namenode -format is the preferred form
      $ exit

    (mac term)

      $ docker commit -m "hadoop in centos" hadoop_base centos:hadoop
  • general form: docker commit -m [message] [container_name] [image_name]

In the next post, we will create a namenode and multiple datanodes from the hadoop base image created above (centos:hadoop)
