Building a Fully Distributed Hadoop 2.7.2 Deployment from Scratch

Related software downloads

Prepare Four Virtual Machines

Four servers are installed from the CentOS 6.8 64-bit minimal image:

The virtual machine cluster:

  • [192.168.0.10] - master, the cluster master
  • [192.168.0.11] - datanode1, cluster node 1
  • [192.168.0.12] - datanode2, cluster node 2
  • [192.168.0.13] - datanode3, cluster node 3

Basic Server Configuration

Set up the hostname, the hosts file, passwordless SSH key login, and the other cluster basics on each server.

(1) Set the hostname

On master, set the HOSTNAME value in the config file /etc/sysconfig/network to master.
On datanode1, set the HOSTNAME value in /etc/sysconfig/network to datanode1.
On datanode2, set the HOSTNAME value in /etc/sysconfig/network to datanode2.
On datanode3, set the HOSTNAME value in /etc/sysconfig/network to datanode3.

Editing the config file does not change the hostname immediately. To apply it right away, run hostname master and log in again; the file change itself is permanent and survives reboots.
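
Both steps on master might look like this (a minimal sketch for CentOS 6; repeat on each node with its own name):

# Persist the hostname across reboots
sed -i 's/^HOSTNAME=.*/HOSTNAME=master/' /etc/sysconfig/network
# Apply it to the running system immediately
hostname master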

(2) Edit /etc/hosts

On master, datanode1, datanode2, and datanode3, add the following entries to /etc/hosts:

192.168.0.10 master  
192.168.0.11 datanode1  
192.168.0.12 datanode2  
192.168.0.13 datanode3  
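
A quick check that resolution works (run from any node; each name should answer):

# Repeat for master, datanode2, and datanode3
ping -c 1 datanode1
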
(3) Configure passwordless SSH key login

On master, run useradd hadoop (as root), then switch to that user and generate a key pair:

[root@master hadoop]# su hadoop
[hadoop@master ~]$ ssh-keygen 
Generating public/private rsa key pair.  
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):  
Created directory '/home/hadoop/.ssh'.  
Enter passphrase (empty for no passphrase):  
Enter same passphrase again:  
Your identification has been saved in /home/hadoop/.ssh/id_rsa.  
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.  
The key fingerprint is:  
f6:69:98:76:fc:c1:e5:0a:c3:84:9f:b3:b7:a2:8c:44 hadoop@master  
The key's randomart image is:  
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|         .       |
|      E S .   .  |
|     . . O + o   |
|      . + & o .  |
|     . + o.*.o   |
|      . o..o+.   |
+-----------------+
[hadoop@master .ssh]$ cat id_rsa.pub > authorized_keys
[hadoop@master .ssh]$ cd ..
[hadoop@master ~]$ chmod -R 700 .ssh/
[hadoop@master ~]$ ssh master
Last login: Wed Sep 21 16:24:11 2016 from master  
[hadoop@master ~]$ exit

After creating the hadoop user on datanode1, datanode2, and datanode3, install the public key generated on master into /home/hadoop/.ssh/authorized_keys on each node.
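
One way to push the key to all three nodes is ssh-copy-id (a sketch, assuming openssh-clients provides it on master; otherwise append id_rsa.pub to each node's authorized_keys by hand):

# Run on master as the hadoop user; each node asks for its password once
for node in datanode1 datanode2 datanode3; do
    ssh-copy-id hadoop@$node
done

With the keys in place, test the logins from master: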

[hadoop@master .ssh]$ ssh datanode1
The authenticity of host 'datanode1 (192.168.0.11)' can't be established.  
RSA key fingerprint is 9b:73:d5:70:e3:14:bf:61:6c:18:93:a6:5e:1f:e4:d5.  
Are you sure you want to continue connecting (yes/no)? yes  
Warning: Permanently added 'datanode1,192.168.0.11' (RSA) to the list of known hosts.  
[hadoop@datanode1 ~]$ exit
[hadoop@master .ssh]$ ssh datanode2
The authenticity of host 'datanode2 (192.168.0.12)' can't be established.  
RSA key fingerprint is 25:cf:9e:7c:82:d0:2e:bd:7b:52:55:5f:f5:04:b5:c6.  
Are you sure you want to continue connecting (yes/no)? yes  
Warning: Permanently added 'datanode2,192.168.0.12' (RSA) to the list of known hosts.  
[hadoop@datanode2 ~]$ exit
[hadoop@master .ssh]$ ssh datanode3
The authenticity of host 'datanode3 (192.168.0.13)' can't be established.  
RSA key fingerprint is a5:b4:01:d8:ab:bd:9e:b2:47:9f:cd:f3:25:c9:89:77.  
Are you sure you want to continue connecting (yes/no)? yes  
Warning: Permanently added 'datanode3,192.168.0.13' (RSA) to the list of known hosts.  
[hadoop@datanode3 ~]$ exit

Install the JDK

The JDK needs to be installed on all four machines.

Install JDK 1.7 under /usr/local/jdk; the installation is very straightforward:

[root@master ~]# ll
total 134868  
-rw-------. 1 root root      1101 Sep 13 11:05 anaconda-ks.cfg
-rw-r--r--. 1 root root      8837 Sep 13 11:05 install.log
-rw-r--r--. 1 root root      3384 Sep 13 11:05 install.log.syslog
-rw-r--r--. 1 root root 138082565 Sep 21 15:56 jdk-7u79-linux-x64.rpm
[root@master ~]# mkdir /usr/local/jdk
[root@master ~]# rpm -ivh --prefix=/usr/local/jdk/ jdk-7u79-linux-x64.rpm 
Preparing...                ########################################### [100%]  
   1:jdk                    ########################################### [100%]
Unpacking JAR files...  
        rt.jar...
        jsse.jar...
        charsets.jar...
        tools.jar...
        localedata.jar...
        jfxrt.jar...
ln: creating symbolic link `/usr/java/jdk1.7.0_79': No such file or directory  
[root@master ~]# 

The ln complaint at the end appears because the RPM tries to create a symlink under /usr/java, which does not exist when installing to a custom prefix; it is harmless here. Next, edit /etc/profile and add:

export JAVA_HOME=/usr/local/jdk/jdk1.7.0_79  
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar  
export PATH=$PATH:$JAVA_HOME/bin  

Then run source /etc/profile to apply the changes.
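
A quick check that the JDK is wired up (it should print a 1.7.0_79 version string):

# Verify java is on PATH and JAVA_HOME points at the install
java -version
echo $JAVA_HOME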

Install Hadoop

Now we install hadoop-2.7.2. Make sure everything above is working before continuing with the build.

(1) Download and extract to the target directory
[root@master ~]# wget http://apache.fayea.com/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
[root@master ~]# ll
-rw-------. 1 root root      1101 Sep 13 11:05 anaconda-ks.cfg
-rw-r--r--. 1 root root 212046774 Jan 26  2016 hadoop-2.7.2.tar.gz
-rw-r--r--. 1 root root      8837 Sep 13 11:05 install.log
-rw-r--r--. 1 root root      3384 Sep 13 11:05 install.log.syslog
-rw-r--r--. 1 root root 138082565 Sep 21 15:56 jdk-7u79-linux-x64.rpm
[root@master ~]# tar xvzf hadoop-2.7.2.tar.gz -C /usr/local/
[root@master ~]# cd /usr/local/
[root@master local]# ll
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 bin  
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 etc  
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 games  
drwxr-xr-x. 9 10011 10011 4096 Jan 26  2016 hadoop-2.7.2  
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 include  
drwxr-xr-x. 3 root  root  4096 Sep 21 17:20 jdk  
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 lib  
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 lib64  
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 libexec  
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 sbin  
drwxr-xr-x. 5 root  root  4096 Sep 13 11:04 share  
drwxr-xr-x. 2 root  root  4096 Sep 23  2011 src  
[root@master local]# chown hadoop.hadoop hadoop-2.7.2/ -R
(2) Add environment variables and source the file
# Add to /etc/profile
export HADOOP_INSTALL=/usr/local/hadoop-2.7.2  
export PATH=$PATH:$HADOOP_INSTALL/bin  
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL  
export HADOOP_COMMON_HOME=$HADOOP_INSTALL  
export HADOOP_HDFS_HOME=$HADOOP_INSTALL  
export YARN_HOME=$HADOOP_INSTALL  
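
After running source /etc/profile, a quick sanity check (the first line of output should read "Hadoop 2.7.2"):

# Confirm the hadoop binary is on PATH
hadoop version
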
(3) Set JAVA_HOME in hadoop-env.sh

Setting JAVA_HOME only in /etc/profile does not take effect for the Hadoop daemons (most likely because they are started over non-interactive SSH sessions, which do not source /etc/profile); it has to be set in etc/hadoop/hadoop-env.sh as well:

export JAVA_HOME=/usr/local/jdk/jdk1.7.0_79  
(4) Edit hdfs-site.xml

The file is /usr/local/hadoop-2.7.2/etc/hadoop/hdfs-site.xml; change its contents to:

<?xml version="1.0" encoding="UTF-8"?>  
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
<!--  
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->  
<configuration>  
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>

        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>master:9001</value>
        </property>

        <property>
                <name>hadoop.tmp.dir</name>
                <value>/data/hadoop/tmp</value>
        </property>

        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
</configuration>  

On master, create Hadoop's temp directory and hand it to the hadoop user: mkdir -p /data/hadoop/tmp && chown -R hadoop:hadoop /data/hadoop

(5) Edit mapred-site.xml

The file is /usr/local/hadoop-2.7.2/etc/hadoop/mapred-site.xml; it ships as a template, so rename it first:

[root@master hadoop]# mv mapred-site.xml.template mapred-site.xml

Then change its contents to:

<?xml version="1.0"?>  
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
<!--  
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>  
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>

        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>master:10020</value>
        </property>

        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>master:19888</value>
        </property>

        <property>
                <name>mapred.map.tasks.speculative.execution</name>
                <value>false</value>
        </property>

        <property>
                <name>mapred.reduce.tasks.speculative.execution</name>
                <value>false</value>
        </property>
</configuration>  
(6) Edit core-site.xml

The file is /usr/local/hadoop-2.7.2/etc/hadoop/core-site.xml; change its contents to:

<configuration>  
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hadoop.groups</name>
                <value>hadoop</value>
                <description>Allow the hadoop superuser to impersonate members of the hadoop group</description>
        </property>

        <property>
                <name>hadoop.proxyuser.hadoop.hosts</name>
                <value>master,datanode1,datanode2</value>
                <description>The hadoop superuser can impersonate users only when connecting from these hosts</description>
        </property>

        <property>
                <name>hadoop.tmp.dir</name>
                <value>/data/hadoop/tmp</value>
        </property>

</configuration>  
(7) Edit yarn-site.xml

The file is /usr/local/hadoop-2.7.2/etc/hadoop/yarn-site.xml; change its contents to:

<?xml version="1.0"?>  
<!--  
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>  
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>

        <property>                                                               
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>

        <property>
                <name>yarn.resourcemanager.address</name>
                <value>master:8032</value>
        </property>

        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>master:8030</value>
        </property>

        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>master:8031</value>
        </property>

        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>master:8033</value>
        </property>

        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>master:8088</value>
        </property>
</configuration>  
(8) Edit slaves

The slaves file lists the hostnames of every DataNode in the cluster, one per line; add the three node hostnames here:

[root@master hadoop]# cat slaves 
datanode1  
datanode2  
datanode3  
(9) Distribute the Hadoop package to each node

Since the keys are already in place, we can scp the package straight to the three nodes; on the internal network this is fast.

[hadoop@master local]$ tar cvzf hadoop-2.7.2.tar.gz hadoop-2.7.2/
[hadoop@master local]$ scp hadoop-2.7.2.tar.gz hadoop@datanode1:/home/hadoop/
[hadoop@master local]$ scp hadoop-2.7.2.tar.gz hadoop@datanode2:/home/hadoop/ 
[hadoop@master local]$ scp hadoop-2.7.2.tar.gz hadoop@datanode3:/home/hadoop/

Once copied over, extract the package into /usr/local on each of the three nodes: tar xvzf hadoop-2.7.2.tar.gz -C /usr/local/
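
Driving the extraction from master could look like this (a sketch; it assumes you can SSH to the nodes as root, since /usr/local is root-owned):

# Extract the copied tarball on every node
for node in datanode1 datanode2 datanode3; do
    ssh root@$node "tar xzf /home/hadoop/hadoop-2.7.2.tar.gz -C /usr/local/"
done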

(10) Configure the HDFS storage directory on the datanodes

Edit /usr/local/hadoop-2.7.2/etc/hadoop/hdfs-site.xml and add this property:

<property>  
      <name>dfs.data.dir</name>
      <value>/data/hadoop/hdfs</value>
</property>  

Of course, this directory has to be created by hand and owned by the hadoop user and group.
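
On each datanode, something like (a minimal sketch):

# Create the storage directory and hand it to the hadoop user
mkdir -p /data/hadoop/hdfs
chown -R hadoop:hadoop /data/hadoop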

(11) Format the NameNode

On master, format the NameNode with hadoop namenode -format (in 2.x this is a deprecated alias of hdfs namenode -format, but it still works). If the output contains the line below, the format succeeded:

common.Storage: Storage directory /data/hadoop/tmp/dfs/name has been successfully formatted.  
(12) Start the cluster

Proceed as follows:

[hadoop@master hadoop-2.7.2]$ sbin/start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh  
16/09/22 05:41:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
Starting namenodes on [master]  
master: starting namenode, logging to /usr/local/hadoop-2.7.2/logs/hadoop-hadoop-namenode-master.out  
datanode3: starting datanode, logging to /usr/local/hadoop-2.7.2/logs/hadoop-hadoop-datanode-datanode3.out  
datanode2: starting datanode, logging to /usr/local/hadoop-2.7.2/logs/hadoop-hadoop-datanode-datanode2.out  
datanode1: starting datanode, logging to /usr/local/hadoop-2.7.2/logs/hadoop-hadoop-datanode-datanode1.out  
Starting secondary namenodes [master]  
master: starting secondarynamenode, logging to /usr/local/hadoop-2.7.2/logs/hadoop-hadoop-secondarynamenode-master.out  
16/09/22 05:42:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
starting yarn daemons  
starting resourcemanager, logging to /usr/local/hadoop-2.7.2/logs/yarn-hadoop-resourcemanager-master.out  
datanode2: starting nodemanager, logging to /usr/local/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-datanode2.out  
datanode1: starting nodemanager, logging to /usr/local/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-datanode1.out  
datanode3: starting nodemanager, logging to /usr/local/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-datanode3.out  
[hadoop@master hadoop-2.7.2]$ jps
15537 NameNode  
15695 SecondaryNameNode  
16135 Jps  
15849 ResourceManager  
[hadoop@master hadoop-2.7.2]$ 

Open http://master:50070 in a browser to see Hadoop's built-in web UI with the basic state of the cluster.
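
From the shell, the same information is available via dfsadmin; with everything running it should report three live datanodes:

# Summarize HDFS capacity and list registered DataNodes
hdfs dfsadmin -report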

Testing Hello World

Use the README.txt under the Hadoop install directory to run a quick MapReduce test.

Run the following (note: do not create the output directory in advance; the job creates /output itself and fails if it already exists):

[hadoop@master hadoop-2.7.2]$ hdfs dfs -mkdir /test
[hadoop@master hadoop-2.7.2]$ hdfs dfs -put README.txt /test
[hadoop@master hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /test/README.txt /output
16/09/22 06:09:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  
16/09/22 06:09:56 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.10:8032  
16/09/22 06:09:57 INFO input.FileInputFormat: Total input paths to process : 1  
16/09/22 06:09:57 INFO mapreduce.JobSubmitter: number of splits:1  
16/09/22 06:09:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474494126957_0002  
16/09/22 06:09:57 INFO impl.YarnClientImpl: Submitted application application_1474494126957_0002  
16/09/22 06:09:57 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1474494126957_0002/  
16/09/22 06:09:57 INFO mapreduce.Job: Running job: job_1474494126957_0002  
16/09/22 06:10:07 INFO mapreduce.Job: Job job_1474494126957_0002 running in uber mode : false  
16/09/22 06:10:07 INFO mapreduce.Job:  map 0% reduce 0%  
16/09/22 06:10:15 INFO mapreduce.Job:  map 100% reduce 0%  
16/09/22 06:10:22 INFO mapreduce.Job:  map 100% reduce 100%  
16/09/22 06:10:22 INFO mapreduce.Job: Job job_1474494126957_0002 completed successfully  
16/09/22 06:10:22 INFO mapreduce.Job: Counters: 49  
        File System Counters
                FILE: Number of bytes read=1836
                FILE: Number of bytes written=239135
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1465
                HDFS: Number of bytes written=1306
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=4609
                Total time spent by all reduces in occupied slots (ms)=5048
                Total time spent by all map tasks (ms)=4609
                Total time spent by all reduce tasks (ms)=5048
                Total vcore-milliseconds taken by all map tasks=4609
                Total vcore-milliseconds taken by all reduce tasks=5048
                Total megabyte-milliseconds taken by all map tasks=4719616
                Total megabyte-milliseconds taken by all reduce tasks=5169152
        Map-Reduce Framework
                Map input records=31
                Map output records=179
                Map output bytes=2055
                Map output materialized bytes=1836
                Input split bytes=99
                Combine input records=179
                Combine output records=131
                Reduce input groups=131
                Reduce shuffle bytes=1836
                Reduce input records=131
                Reduce output records=131
                Spilled Records=262
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=113
                CPU time spent (ms)=1200
                Physical memory (bytes) snapshot=318619648
                Virtual memory (bytes) snapshot=1681313792
                Total committed heap usage (bytes)=164630528
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=1366
        File Output Format Counters 
                Bytes Written=1306
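
To inspect the results, the single reducer writes its output to part-r-00000 under /output (a quick look):

# Print the first few counted words
hdfs dfs -cat /output/part-r-00000 | head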

And that's it, the installation is complete!