Hadoop 2.6.0 Cluster Setup
===============
Set up a Hadoop 2.6.0 cluster on several servers running 64-bit CentOS/RHEL, using a 64-bit build compiled from the Hadoop 2.6.0 source.
===============
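Notes:
1. Environment: a laptop running two CentOS 6.6 virtual machines, each with 1 GB of RAM and a 20 GB disk. The XML configuration was changed substantially; see the installation and configuration manual: http://blog.csdn.net/licongcong_0224/article/details/12972889
2. Run almost all commands as the hadoop user. Do not take the shortcut of running them as root, because that causes errors.
3. The prompts tell the users apart: # is root, $ is hadoop.
4. /etc/profile is the system-wide profile in /etc and is edited by root. ~/.bash_profile lives in each user's home directory (for example /home/hadoop or /home/hongyuqin).
5. If the configuration is wrong and you need to redo it, first delete ~/hdfs and recreate its directories with mkdir. Then delete ~/hadoop-2.6.0/logs and recreate it. Finally, reformat with hadoop namenode -format.
6. A wordcount test example is included.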
===============
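Step 1: Create the user
On CentOS, add a hadoop user and set its password:
# useradd hadoop
# passwd hadoop
To make it a sudo user, add it to /etc/sudoers. The file is read-only, so make it writable first and restore the permission afterwards:
# chmod u+w /etc/sudoers
# vi /etc/sudoers        (add the line: hadoop ALL=(ALL) ALL)
# chmod u-w /etc/sudoers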
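Supplement: set the host names
# vim /etc/hosts
(Back up the file first. Keep the 127.0.0.1 localhost line and replace the other entries with:)
192.168.136.131 master
192.168.136.130 slave
# vim /etc/selinux/config
SELINUX=disabled
# setenforce 0        (switches SELinux to permissive mode right away; the config change applies after a reboot)
# vim /etc/sysconfig/network
HOSTNAME=slave        (use HOSTNAME=master on the master node; takes effect after a reboot)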
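Editing /etc/hosts by hand on every node is easy to get wrong. The sketch below is a hypothetical helper, not part of CentOS or Hadoop. It appends the two entries used in this guide only when the hostname is not already mapped, so it can be rerun safely and it leaves the localhost lines alone. It assumes a POSIX shell with grep -E, and it takes its IP addresses and names from this section.

```shell
# Hypothetical helper (not part of CentOS/Hadoop): add this guide's host
# entries to a hosts file unless the hostname is already mapped there.
# Existing lines, including "127.0.0.1 localhost", are never modified.
# Entries are the ones from this guide; add a line per extra slave.
CLUSTER_HOSTS="192.168.136.131 master
192.168.136.130 slave"

ensure_hosts_entries() {
  local hosts_file="$1"
  printf '%s\n' "$CLUSTER_HOSTS" | while read -r ip name; do
    [ -n "$name" ] || continue
    # Is there a non-comment line with this hostname as a separate field?
    if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$hosts_file"; then
      printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
  done
}
```

Run it as root on each node after backing up the file, for example `ensure_hosts_entries /etc/hosts`.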
Step 2: Install Java
The existing Linux installation already has Java; after installing the new version, select it as the default.
Download jdk-7u71-linux-x64.rpm
$ sudo rpm -ivh jdk-7u71-linux-x64.rpm
Edit /etc/profile (sudo vim /etc/profile) and add:
export JAVA_HOME=/usr/java/jdk1.7.0_71
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
Register the new JDK as an alternative:
$ sudo update-alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_71/bin/java 300
$ sudo update-alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_71/bin/javac 300
Change the default version:
$ sudo update-alternatives --config java
A list of alternatives appears; choose the version you just added.
Verify:
$ source /etc/profile
$ java -version
$ echo $PATH
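An older Java is already installed, so it is worth confirming which one is actually first on PATH. Below is a minimal sketch with hypothetical function names. It assumes that `java -version` prints a line of the form `java version "1.7.0_71"` (to stderr), and it defaults to the JDK version used in this guide.

```shell
# Hypothetical helpers. Assumes `java -version` prints a line containing
# version "X" (e.g. java version "1.7.0_71") and writes it to stderr.

# Extract the quoted version string from `java -version` output on stdin.
java_version_from_output() {
  sed -n 's/^.*version "\([^"]*\)".*$/\1/p' | head -n 1
}

# Compare the java on PATH with the expected version (default: this guide's JDK).
check_java_version() {
  local want="${1:-1.7.0_71}" got
  got="$(java -version 2>&1 | java_version_from_output)"
  if [ "$got" = "$want" ]; then
    echo "OK: java $got"
  else
    echo "unexpected java version: '$got' (wanted $want)" >&2
    return 1
  fi
}
```

If `check_java_version` fails after `update-alternatives --config java`, open a new login shell so the updated /etc/profile is read.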
Step 3: Configure SSH keys
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh/
Configure sshd and restart it (as root, on every node):
# vi /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
# service sshd restart
Test that logging in to the local machine works:
$ ssh localhost
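Then, on master as the hadoop user (the key belongs to hadoop, not root), copy the public key to each slave machine:
$ ssh-copy-id hadoop@slave
Test the login: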
$ ssh slave
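The two checks below are a sketch for tracking down a password prompt that will not go away. Both function names are hypothetical.

- `check_ssh_perms` compares ~/.ssh and authorized_keys against the 700/600 modes set above. sshd's StrictModes option, on by default, ignores keys whose files are writable by others. The function relies on GNU `stat -c`.
- `check_passwordless_ssh` uses ssh's `BatchMode=yes`, which makes ssh fail instead of prompting, so success means key authentication worked.

```shell
# Hypothetical helpers for debugging passwordless SSH.

# Check the exact modes this guide sets: 700 on the .ssh directory and
# 600 on authorized_keys. Uses GNU stat -c '%a'.
check_ssh_perms() {
  local dir="${1:-$HOME/.ssh}" dmode fmode
  dmode="$(stat -c '%a' "$dir")" || return 1
  fmode="$(stat -c '%a' "$dir/authorized_keys")" || return 1
  if [ "$dmode" != "700" ] || [ "$fmode" != "600" ]; then
    echo "bad modes: $dir=$dmode authorized_keys=$fmode" >&2
    return 1
  fi
}

# BatchMode=yes disables password prompts, so ssh fails unless key auth works.
check_passwordless_ssh() {
  local host rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "OK: $host"
    else
      echo "FAILED: $host" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage, as the hadoop user on master: `check_ssh_perms && check_passwordless_ssh localhost slave`.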
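Step 4: Compile Hadoop
To run a 64-bit Hadoop on a 64-bit machine, you have to build it from source, because the prebuilt Hadoop release only ships 32-bit native binaries. The build needs a toolchain that includes Maven, FindBugs, protobuf, CMake and zlib.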
# yum -y update
# yum -y install svn ncurses-devel 'gcc*'
# yum -y install lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel
# yum -y install gcc make cmake zlib zlib-devel openssl glibc-headers gcc-c++
Then check that the basic packages are installed:
# rpm -ql zlib-devel
/usr/include/zconf.h
/usr/include/zlib.h
/usr/lib64/libz.so
/usr/lib64/pkgconfig/zlib.pc
/usr/share/doc/zlib-devel-1.2.3
/usr/share/doc/zlib-devel-1.2.3/README
/usr/share/doc/zlib-devel-1.2.3/algorithm.txt
/usr/share/doc/zlib-devel-1.2.3/example.c
/usr/share/doc/zlib-devel-1.2.3/minigzip.c
/usr/share/man/man3/zlib.3.gz
# rpm -ql zlib
/lib64/libz.so.1
/lib64/libz.so.1.2.3
/usr/share/doc/zlib-1.2.3
/usr/share/doc/zlib-1.2.3/ChangeLog
/usr/share/doc/zlib-1.2.3/FAQ
/usr/share/doc/zlib-1.2.3/README
# vi + /etc/profile        (add the following)
export LD_LIBRARY_PATH=/lib64:/usr/lib64
export ZLIB_INCLUDE_DIR=/usr/include
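1. Install Maven
wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz
Extract it to /usr/local (this creates /usr/local/apache-maven-3.2.5):
# tar -xzf apache-maven-3.2.5-bin.tar.gz -C /usr/local
# vi + /etc/profile and add: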
export MAVEN_HOME=/usr/local/apache-maven-3.2.5
export PATH=$PATH:$MAVEN_HOME/bin
Run source /etc/profile to apply the changes, then test:
$ echo $MAVEN_HOME
$ mvn -v
2. Install Ant
Download from Baidu Pan: http://pan.baidu.com/s/1c0vjhBy
$ tar xvzf apache-ant-1.9.4-bin.tar.gz
$ sudo mv apache-ant-1.9.4 /usr/local
Then add the environment variables to /etc/profile:
export ANT_HOME=/usr/local/apache-ant-1.9.4
export PATH=$PATH:$ANT_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ANT_HOME/lib
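3. Install Protocol Buffers (Hadoop 2.6.0 needs exactly version 2.5.0)
Baidu Pan: http://pan.baidu.com/s/1pJlZubT
$ tar xzf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./configure
$ make
$ make check
$ sudo make install
By default this installs /usr/local/bin/protoc and the libraries /usr/local/lib/*.so. /usr/local/lib must be on the library path; the /etc/profile listing at the end adds it to LD_LIBRARY_PATH. Finally, check:
$ protoc --version
It should print: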
libprotoc 2.5.0
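The Hadoop 2.6.0 build checks the protoc version, so checking for the exact version string is more useful than just testing that protoc runs. Below is a sketch with hypothetical function names. It assumes `protoc --version` prints a single line like the one above.

```shell
# Hypothetical helpers. Hadoop 2.6.0 expects protoc 2.5.0 exactly.

# $1: output of `protoc --version`; $2: wanted version (default 2.5.0)
protoc_version_ok() {
  [ "$1" = "libprotoc ${2:-2.5.0}" ]
}

check_protoc() {
  local out
  # A missing shared library (e.g. /usr/local/lib not on the library
  # path) makes protoc itself fail, which is reported here.
  out="$(protoc --version 2>&1)" || { echo "protoc not runnable: $out" >&2; return 1; }
  if protoc_version_ok "$out"; then
    echo "OK: $out"
  else
    echo "need libprotoc 2.5.0, found: $out" >&2
    return 1
  fi
}
```

Run `check_protoc` in a fresh shell after `source /etc/profile`. If another protoc is found first on PATH, `command -v protoc` shows which one.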
4. Install FindBugs
$ wget ...
$ tar xvzf findbugs-3.0.0.tar.gz
$ mv findbugs-3.0.0 findbugs
$ sudo mv findbugs /usr/local/
$ sudo vim /etc/profile        (add the following, which puts $FINDBUGS_HOME/bin on PATH)
export FINDBUGS_HOME=/usr/local/findbugs
export PATH=$PATH:$FINDBUGS_HOME/bin
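Before starting the build, check that every tool installed in this step is on PATH in the current shell (after source /etc/profile). Below is a small sketch using the POSIX `command -v` builtin. The function name, and the tool list in the usage line, are assumptions based on this section.

```shell
# Hypothetical helper: report which of the given commands are not on PATH.
check_build_tools() {
  local missing="" t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || missing="$missing $t"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  echo "all tools found"
}
```

For example: `check_build_tools gcc g++ make cmake java mvn ant protoc findbugs`.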
5. Compile Hadoop
# cd ~
Download from the official site: http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.6.0/hadoop-2.6.0-src.tar.gz
# tar zxvf hadoop-2.6.0-src.tar.gz -C /home/hadoop/
# chown -R hadoop /home/hadoop/
# source /etc/profile
Go to the root of the Hadoop source tree:
# cd /home/hadoop/hadoop-2.6.0-src
# mvn clean package -Pdist,native -DskipTests -Dtar
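or, to also build the documentation and skip compiling the tests:
# mvn package -Pdist,native,docs -DskipTests -Dtar -Dmaven.test.skip=true
After a successful build, hadoop-dist/target/hadoop-2.6.0.tar.gz under the source directory is the file you need.
Copy the compiled tarball to each slave machine:
# scp hadoop-2.6.0.tar.gz hadoop@slave:~
On master and on each slave: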
# tar -xzvf hadoop-2.6.0.tar.gz -C /home/hadoop
# chown -R hadoop:hadoop /home/hadoop
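With more than one slave, copying and unpacking the tarball host by host gets repetitive. The following hypothetical helper assumes three things: passwordless SSH from Step 3, a host list with one hostname per line, and the /home/hadoop target used here. Setting DRY_RUN=1 prints the commands instead of running them. Note the `ssh -n`: without it, ssh would read the rest of the host list from stdin and the loop would stop after the first host.

```shell
# Hypothetical helper: copy the tarball to every host in a list and unpack
# it under /home/hadoop as the hadoop user. DRY_RUN=1 only prints commands.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

distribute_tarball() {
  local tarball="$1" hosts_file="$2" name host
  name="$(basename "$tarball")"
  while read -r host; do
    [ -n "$host" ] || continue               # skip blank lines
    maybe_run scp "$tarball" "hadoop@$host:~/" < /dev/null
    # -n keeps ssh from consuming the host list; ~ expands on the remote side.
    maybe_run ssh -n "hadoop@$host" "tar -xzf ~/$name -C /home/hadoop"
  done < "$hosts_file"
}
```

For example: `DRY_RUN=1 distribute_tarball hadoop-dist/target/hadoop-2.6.0.tar.gz hosts.txt` to preview, then the same command without DRY_RUN. The archive is unpacked by the hadoop user itself, so the files already belong to hadoop on those hosts.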
Step 5: Configure Hadoop
First set the default environment variables for the hadoop user. If JAVA_HOME was not set in /etc/profile, it can also go in .bash_profile.
$ cd ~
$ vi + .bash_profile
#Hadoop variables
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HADOOP_INSTALL=/home/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
alias h='cd $HADOOP_HOME'
alias etc='cd $HADOOP_HOME/etc/hadoop'
$ source .bash_profile
Add the following to both hadoop-env.sh and yarn-env.sh (in /home/hadoop/hadoop-2.6.0/etc/hadoop):
$ vim hadoop-env.sh
#modify JAVA_HOME
export JAVA_HOME=/usr/java/jdk1.7.0_71
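The start scripts launch daemons on other nodes over SSH in a non-interactive shell, which does not read /etc/profile or ~/.bash_profile. That is why JAVA_HOME is set directly in these two files. Below is a sketch of a hypothetical helper that sets it in both. It rewrites an existing uncommented `export JAVA_HOME=` line, or appends one if there is none, and it uses GNU `sed -i`.

```shell
# Hypothetical helper: pin JAVA_HOME in hadoop-env.sh and yarn-env.sh.
# $1 = Hadoop config dir, $2 = JDK path. Requires GNU sed (-i).
set_java_home() {
  local conf_dir="$1" jh="$2" f
  for f in "$conf_dir/hadoop-env.sh" "$conf_dir/yarn-env.sh"; do
    [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
    if grep -q '^export JAVA_HOME=' "$f"; then
      # Replace the active (uncommented) setting in place.
      sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$jh|" "$f"
    else
      printf 'export JAVA_HOME=%s\n' "$jh" >> "$f"
    fi
  done
}
```

For example: `set_java_home /home/hadoop/hadoop-2.6.0/etc/hadoop /usr/java/jdk1.7.0_71`.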
.bash_profile lives in /home/hadoop/. Edit it, and copy it to the slave, as the hadoop user:
$ scp .bash_profile slave:~
Repeat this for each of the remaining slave machines.
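Directory layout:
Purpose                          Property                 Path
HDFS NameNode metadata           dfs.namenode.name.dir    /home/hadoop/hdfs/name
HDFS data files                  dfs.datanode.data.dir    /home/hadoop/hdfs/data
NameNode checkpoint directory    fs.checkpoint.dir        /home/hadoop/hdfs/checkpoint
Temporary files                  hadoop.tmp.dir           /home/hadoop/hdfs/tmp
(fs.checkpoint.dir is the deprecated older name of dfs.namenode.checkpoint.dir.)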
Create the directories:
$ mkdir -p /home/hadoop/hdfs/name /home/hadoop/hdfs/data /home/hadoop/hdfs/checkpoint /home/hadoop/hdfs/tmp /home/hadoop/hdfs/tmp/nodemanager/local /home/hadoop/hdfs/tmp/nodemanager/remote /home/hadoop/hdfs/tmp/nodemanager/logs
Check that they are owned by hadoop:
$ ll /home/hadoop/hdfs
drwxrwxr-x 2 hadoop hadoop 4096 Aug 12 09:25 checkpoint
drwxrwxr-x 2 hadoop hadoop 4096 Aug 12 09:25 data
drwxrwxr-x 3 hadoop hadoop 4096 Aug 13 00:23 name
drwxrwxr-x 4 hadoop hadoop 4096 Aug 12 23:13 tmp
Run the same commands on every slave machine.
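The directory creation and the ownership check can be wrapped up so the same thing runs on each node, and so it can be reused when resetting as described in note 5. Below is a sketch with hypothetical function names. The base path defaults to the /home/hadoop/hdfs layout in the table above.

```shell
# Hypothetical helpers. Base directory defaults to this guide's layout.

# Create the HDFS and NodeManager local directories.
make_hdfs_dirs() {
  local base="${1:-/home/hadoop/hdfs}"
  mkdir -p "$base/name" "$base/data" "$base/checkpoint" "$base/tmp" \
    "$base/tmp/nodemanager/local" "$base/tmp/nodemanager/remote" \
    "$base/tmp/nodemanager/logs"
}

# List anything under the base directory NOT owned by the current user
# (numeric uid); no output means ownership is correct.
foreign_owned() {
  find "${1:-/home/hadoop/hdfs}" ! -user "$(id -u)"
}
```

Run as the hadoop user on each node: `make_hdfs_dirs && foreign_owned`. Any path this prints needs a `chown` by root.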
On master, go to the configuration directory: cd /home/hadoop/hadoop-2.6.0/etc/hadoop
Eight files need to be configured: yarn-env.sh and hadoop-env.sh (JAVA_HOME was added to both above), slaves, masters, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.
1. Configure masters and slaves
In the etc/hadoop directory:
$ echo "master" > masters
$ echo "slave" > slaves
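With several slaves, the slaves file simply gets one hostname per line. start-dfs.sh and start-yarn.sh use this file to start the DataNodes and NodeManagers. Below is a hypothetical helper that writes both files, assuming the hostnames already resolve through /etc/hosts.

```shell
# Hypothetical helper: write etc/hadoop/masters and etc/hadoop/slaves.
# $1 = config dir, $2 = master hostname, remaining args = slave hostnames.
write_node_files() {
  local conf_dir="$1" master="$2"
  shift 2
  printf '%s\n' "$master" > "$conf_dir/masters"
  printf '%s\n' "$@" > "$conf_dir/slaves"    # one slave per line
}
```

For example: `write_node_files /home/hadoop/hadoop-2.6.0/etc/hadoop master slave`.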
2. Configure core-site.xml
The hadoop.tmp.dir directory must exist beforehand (/home/hadoop/hdfs/tmp was created above).
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.logfile.size</name>
<value>104857600</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hdfs/tmp</value>
</property>
</configuration>
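Writing the XML from a script avoids typos, such as a missing angle bracket, when the file has to be recreated. Below is a sketch of a hypothetical function, write_core_site, that reproduces the core-site.xml above. The NameNode host and the tmp dir are parameters, with this guide's values as defaults.

```shell
# Hypothetical helper: write core-site.xml with this guide's values.
# $1 = output file, $2 = NameNode host (default master),
# $3 = hadoop.tmp.dir path (default /home/hadoop/hdfs/tmp).
write_core_site() {
  local out="$1" nn_host="${2:-master}" tmp_dir="${3:-/home/hadoop/hdfs/tmp}"
  cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://${nn_host}:9000</value>
</property>
<property>
<name>hadoop.logfile.size</name>
<value>104857600</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:${tmp_dir}</value>
</property>
</configuration>
EOF
}
```

For example: `write_core_site /home/hadoop/hadoop-2.6.0/etc/hadoop/core-site.xml master`.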
3. Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hdfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hdfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
4. Configure mapred-site.xml
First copy mapred-site.xml.template to mapred-site.xml:
$ cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Execution framework set to Hadoop YARN.</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
<description>MapReduce JobHistory Server host:port, default port is 10020.</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
<description>MapReduce JobHistory Server Web UI host:port, default port is 19888.</description>
</property>
</configuration>
5. Configure yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
<description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
<description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
<description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.
</description>
</property>
</configuration>
$ scp mapred-site.xml core-site.xml hdfs-site.xml yarn-site.xml masters slaves hadoop-env.sh yarn-env.sh slave:~/hadoop-2.6.0/etc/hadoop/
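After copying, a quick way to confirm that a node has the values you expect is to pull a property out of the file. The sketch below is a simple text match that only understands the one-tag-per-line layout used in this guide; it is not an XML parser. Once Hadoop is running, `hdfs getconf -confKey <name>` reports the effective value. The function name conf_value is hypothetical.

```shell
# Hypothetical helper: print the <value> of property $2 in file $1.
# Assumes <name> and <value> each sit on their own line, as in this guide.
conf_value() {
  awk -v key="$2" '
    /<name>/ { gsub(/.*<name>|<\/name>.*/, ""); cur = $0; next }
    /<value>/ && cur == key { gsub(/.*<value>|<\/value>.*/, ""); print; exit }
  ' "$1"
}
```

For example, to compare master and slave:

$ conf_value yarn-site.xml yarn.resourcemanager.address
$ ssh slave 'cat ~/hadoop-2.6.0/etc/hadoop/yarn-site.xml' > /tmp/slave-yarn.xml
$ conf_value /tmp/slave-yarn.xml yarn.resourcemanager.address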
Step 6: Start the Hadoop services
On master, format the NameNode (the hadoop and hdfs commands are in /home/hadoop/hadoop-2.6.0/bin/):
$ hadoop namenode -format
or, using the non-deprecated command:
$ hdfs namenode -format
Format only once: formatting again wipes the HDFS metadata, so any data already in HDFS is lost.
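A formatted NameNode directory contains current/VERSION, which records the cluster ID. Formatting again creates a new cluster ID, and DataNodes that still hold the old one then refuse to start ("Incompatible clusterIDs" in their logs). That is why note 5 says to delete ~/hdfs before reformatting. Below is a sketch of a guard, with hypothetical function names and this guide's name directory as the default.

```shell
# Hypothetical helpers guarding against an accidental second format.

# True if the NameNode storage directory has already been formatted.
namenode_is_formatted() {
  [ -f "${1:-/home/hadoop/hdfs/name}/current/VERSION" ]
}

# Format only when the name directory is still empty of metadata.
format_namenode_once() {
  local name_dir="${1:-/home/hadoop/hdfs/name}"
  if namenode_is_formatted "$name_dir"; then
    echo "$name_dir is already formatted; not formatting again" >&2
    return 1
  fi
  hdfs namenode -format
}
```

Run `format_namenode_once` on master instead of calling the format command directly.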
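The services only need to be started on master:
$ start-all.sh
(In Hadoop 2.x start-all.sh is deprecated; it runs start-dfs.sh and start-yarn.sh.)
DataNode information is shown in the NameNode web UI at http://master:50070.
It can also be checked from the command line: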
$ hdfs dfsadmin -report
which prints:
Configured Capacity: 18645180416 (17.36 GB)
Present Capacity: 12578476032 (11.71 GB)
DFS Remaining: 12578250752 (11.71 GB)
DFS Used: 225280 (220 KB)
DFS Used%: 0.00%
Under replicated blocks: 6
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (1):
Name: 192.168.136.130:50010 (slave)
Hostname: slave
Decommission Status : Normal
Configured Capacity: 18645180416 (17.36 GB)
DFS Used: 225280 (220 KB)
Non DFS Used: 6066704384 (5.65 GB)
DFS Remaining: 12578250752 (11.71 GB)
DFS Used%: 0.00%
DFS Remaining%: 67.46%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Aug 13 16:39:54 CST 2015
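To script this check, for example to confirm that every host in slaves has registered, read the live DataNode count from the report header. Below is a sketch that assumes the `Live datanodes (N):` line format shown above. The function name is hypothetical.

```shell
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and
# print the live DataNode count from the "Live datanodes (N):" header
# (prints 0 if the header is missing).
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p' | { read -r n; echo "${n:-0}"; }
}
```

For example: `[ "$(hdfs dfsadmin -report | live_datanodes)" -eq "$(grep -c . $HADOOP_HOME/etc/hadoop/slaves)" ] && echo "all DataNodes are up"`.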
Step 7: Test the Hadoop services
Run the following on master.
Getting core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml right is what matters most.
Start the services on master:
$ start-all.sh
1. Create a local directory and two files:
$ mkdir /home/hadoop/input
$ cd /home/hadoop/input
$ touch wordcount1.txt
$ touch wordcount2.txt
2. Add content:
$ echo "Hello World" > wordcount1.txt
$ echo "Hello Hadoop" > wordcount2.txt
3. Create the input directory in HDFS:
$ hadoop fs -mkdir /input
4. Copy the files to /input:
$ hadoop fs -put /home/hadoop/input/* /input
5. Run the job (the /output directory must not exist yet, or the job fails):
$ hadoop jar /home/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
6. When it finishes, list the output directory:
$ hadoop fs -ls /output
7. View the result; the correct output is:
$ hadoop fs -cat /output/part-r-00000
Hadoop 1
Hello 2
World 1
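To check the job output beyond this tiny example, recompute the expected counts locally and compare. This sketch approximates the example WordCount: it splits on whitespace, sorts in byte order (LC_ALL=C) and prints word<TAB>count. It assumes a single reducer, so all results land in part-r-00000. The function name is hypothetical.

```shell
# Hypothetical helper: local approximation of the WordCount example.
# Splits on whitespace, sorts in byte order, prints "word<TAB>count".
local_wordcount() {
  cat "$@" | tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example:

$ local_wordcount /home/hadoop/input/*.txt > /tmp/expected.txt
$ hadoop fs -cat /output/part-r-00000 | diff - /tmp/expected.txt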
Appendix:
Contents of /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_71
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export LD_LIBRARY_PATH=/lib64:/usr/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export ZLIB_INCLUDE_DIR=/usr/include
export MAVEN_HOME=/usr/local/apache-maven-3.2.5
export PATH=$PATH:$MAVEN_HOME/bin
export ANT_HOME=/usr/local/apache-ant-1.9.4
export PATH=$PATH:$ANT_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ANT_HOME/lib
export FINDBUGS_HOME=/usr/local/findbugs
export PATH=$PATH:$FINDBUGS_HOME/bin
Contents of ~/.bash_profile
#Hadoop variables
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export HADOOP_INSTALL=/home/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
alias h='cd $HADOOP_HOME'
alias etc='cd $HADOOP_HOME/etc/hadoop'
In /home/hadoop/hadoop-2.6.0/etc/hadoop:
Contents of slaves
slave
Contents of masters
master
References:
Installation and configuration manual: http://blog.csdn.net/licongcong_0224/article/details/12972889
Hadoop 2.3 complete installation walkthrough: http://www.debugo.com/hadoop2-3-install/