Table of Contents
- 1. Overview
- 2. Installation Steps
  - 1. Linux environment preparation
    - 1. Basic environment planning
    - 2. hosts configuration and hostnames (all four servers)
    - 3. Installing the JDK
  - 2. Passwordless SSH login
    - 1. Disable the firewall and SELinux on all four servers
    - 2. Passwordless login
      - 1. Make sure each machine can SSH to itself without a password
      - 2. Set up passwordless login between machines
  - 3. Hadoop installation
    - 1. Unpack the archive and create base directories on the master
    - 2. Configure the master's Hadoop environment variables
    - 3. Edit the configuration files
    - 4. Configure the Hadoop environment on the slaves
    - 5. Start the cluster
    - 6. Check the cluster in a browser
- 3. Supplement
  - 1. How to reset
  - 2. JAVA_HOME error
1. Overview
The whole installation breaks down into three steps:
- prepare the Linux environment
- set up passwordless SSH
- install Hadoop
The operating system used here is CentOS 6.8 and the Hadoop version is 2.6.0.
Download Hadoop:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
2. Installation Steps
1. Linux environment preparation
1. Basic environment planning
The operating system is CentOS 6.8, with 8 GB of RAM and 8 cores per machine.
The cluster has 4 nodes: one master and 3 slaves.
The IP address to hostname mapping is:
10.76.0.98 dev-search-01.test
10.76.3.145 dev-search-02.test
10.76.0.129 dev-search-03.test
10.76.5.198 stag-search-03.test
You can run the hostname command directly to check the hostname of the current machine:
hostname
dev-search-01.test
dev-search-01.test is the master; the other three are slave (data) nodes.
The JDK version is 1.8.
The Hadoop version is 2.6.0; the download URL is https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
2. hosts configuration and hostnames (all four servers)
Edit the hosts file on all four servers and add:
10.76.0.98 dev-search-01.test
10.76.3.145 dev-search-02.test
10.76.0.129 dev-search-03.test
10.76.5.198 stag-search-03.test
so that every machine can ping every other machine by hostname (a quick sketch of this follows).
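A minimal sketch of that change, assuming you append the entries directly to /etc/hosts on each node and then verify connectivity (plain Linux commands, not Hadoop-specific):
cat >> /etc/hosts << 'EOF'
10.76.0.98 dev-search-01.test
10.76.3.145 dev-search-02.test
10.76.0.129 dev-search-03.test
10.76.5.198 stag-search-03.test
EOF
# verify name resolution and reachability from this node
ping -c 1 dev-search-02.test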
3. Installing the JDK
The JDK can be installed either with yum or by downloading it manually; the download itself is not covered here (a manual-install sketch follows).
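For the manual route, a minimal sketch, assuming the JDK 8u91 tarball (jdk-8u91-linux-x64.tar.gz, a file name not taken from the original steps) has already been downloaded to the current directory:
# unpack into /usr/local; this produces /usr/local/jdk1.8.0_91,
# which matches the JAVA_HOME used below
tar -xzvf jdk-8u91-linux-x64.tar.gz -C /usr/local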
Configure the environment variables by editing the configuration file (vim /etc/profile):
export JAVA_HOME=/usr/local/jdk1.8.0_91
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Use the source command to make the changes take effect immediately:
source /etc/profile
Once the installation is done, the following should work:
[root@dev-search-01 sbin]# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
If you see this output, the JDK is installed correctly.
2. Passwordless SSH login
This is plain Linux configuration work: the key points are to disable the firewall and to make sure every machine holds the other machines' public keys, so that logins are authenticated with the key pairs instead of passwords.
1. Disable the firewall and SELinux on all four servers
Check the firewall status:
service iptables status
Disable the firewall:
service iptables stop
chkconfig iptables off
Disabling SELinux requires a reboot of the server afterwards.
-- Disable SELinux
# vim /etc/selinux/config
-- Comment out
#SELINUX=enforcing
#SELINUXTYPE=targeted
-- Add
SELINUX=disabled
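Until that reboot happens, SELinux can also be checked and switched to permissive mode for the current session (standard CentOS tools; this extra step is not in the original write-up):
getenforce     # prints Enforcing / Permissive / Disabled
setenforce 0   # permissive until the next reboot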
2. Passwordless login
1. Make sure each machine can SSH to itself without a password
- Generate a key pair
ssh-keygen -t rsa
- Append the public key to the authorized_keys file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Set the correct permissions
chmod 600 ~/.ssh/authorized_keys
- Verify that the machine can SSH to itself without a password
ssh dev-search-01.test
Finally, repeat the same steps on each of the other servers.
2. Set up passwordless login between machines
For dev-search-01.test, take the public keys just generated on
dev-search-02.test
dev-search-03.test
stag-search-03.test
and append them to ~/.ssh/authorized_keys on dev-search-01.test.
The result is that from dev-search-01.test you can run
ssh dev-search-02.test
Do the same for the other servers (a shortcut using ssh-copy-id is sketched below).
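If ssh-copy-id is available (it ships with OpenSSH on CentOS 6; this shortcut is not part of the original steps), the manual cat/append can be replaced by one command per target host, run once for every other machine:
ssh-copy-id root@dev-search-02.test   # asks for the password once, then appends the local public key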
3. Hadoop installation
1. Unpack the archive and create base directories on the master
# Download
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
# Unpack
tar -xzvf hadoop-2.6.0.tar.gz -C /usr/local
# Rename (the archive was unpacked into /usr/local)
mv /usr/local/hadoop-2.6.0 /usr/local/hadoop
2. Configure the master's Hadoop environment variables
Configure the environment variables by editing /etc/profile (vi /etc/profile):
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Make the hadoop command take effect in the current terminal immediately:
source /etc/profile
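As a quick sanity check that the variables took effect (an extra step, not in the original write-up):
hadoop version   # should report Hadoop 2.6.0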
3. Edit the configuration files
All of the files below live under /usr/local/hadoop/etc/hadoop.
1. core-site.xml
Edit the Hadoop core configuration file /usr/local/hadoop/etc/hadoop/core-site.xml: fs.defaultFS specifies the NameNode's address and port, and hadoop.tmp.dir specifies the temporary directory Hadoop uses for its data.
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://dev-search-01.test:9000</value>
  </property>
</configuration>
Note: if hadoop.tmp.dir is not set, the default temporary directory is /tmp/hadoop-${user.name}, and /tmp is typically wiped on every reboot, so you would have to re-run the format step after each restart or the cluster will fail to start.
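Editing the file does not create the directory itself; creating it up front on the master avoids a missing-directory surprise later (an extra step, assuming you keep the path configured above):
mkdir -p /usr/local/hadoop/tmp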
2. hdfs-site.xml
Edit the HDFS configuration file /usr/local/hadoop/etc/hadoop/hdfs-site.xml: dfs.replication sets the replication factor to 3, dfs.name.dir sets the NameNode's storage directory, and dfs.data.dir sets the DataNode's storage directory.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
</configuration>
3. mapred-site.xml
Copy mapred-site.xml.template to mapred-site.xml, then edit it:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://dev-search-01.test:9001</value>
  </property>
</configuration>
4. yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>dev-search-01.test</value>
  </property>
</configuration>
5. The masters file
Create /usr/local/hadoop/etc/hadoop/masters; this file specifies the server the NameNode runs on. Remove localhost and add the NameNode's hostname, dev-search-01.test. Using the hostname rather than the IP address is recommended, because IP addresses may change while hostnames usually don't.
vi /usr/local/hadoop/etc/hadoop/masters
## Contents
dev-search-01.test
6. The slaves file (master only)
Edit /usr/local/hadoop/etc/hadoop/slaves; this file specifies which servers are DataNodes. Remove localhost and add the hostnames of all DataNodes, as shown below.
vi /usr/local/hadoop/etc/hadoop/slaves
## Contents
dev-search-02.test
dev-search-03.test
stag-search-03.test
4. Configure the Hadoop environment on the slaves
The steps below use dev-search-02.test as an example; follow the same steps to configure the other slave servers.
1. Copy Hadoop to the dev-search-02.test node
scp -r /usr/local/hadoop dev-search-02.test:/usr/local/
Log in to dev-search-02.test and remove the slaves file, which is only needed on the master:
rm -rf /usr/local/hadoop/etc/hadoop/slaves
2. Configure the environment variables
vi /etc/profile
## Contents
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Make the hadoop command take effect in the current terminal immediately:
source /etc/profile
Configure the remaining slave servers in the same way (see the loop sketched below).
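A minimal sketch of that loop, assuming passwordless root SSH is already working as configured earlier; it pushes the Hadoop directory to all three slaves and removes the master-only slaves file on each:
for h in dev-search-02.test dev-search-03.test stag-search-03.test; do
  scp -r /usr/local/hadoop $h:/usr/local/
  ssh $h 'rm -f /usr/local/hadoop/etc/hadoop/slaves'
done
# /etc/profile still has to be edited on each slave as shown in step 2 above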
5. Start the cluster
1. Format the HDFS filesystem
On the master, change into /usr/local/hadoop and run:
bin/hadoop namenode -format
This formats the NameNode. It only needs to be run once, before the first start; do not run it again afterwards.
2. Start Hadoop:
sbin/start-all.sh
3. Check the processes with jps
# On the master, jps shows:
12067 NameNode
12347 SecondaryNameNode
25341 Jps
12573 ResourceManager
# On a slave, jps shows:
17104 NodeManager
16873 DataNode
21676 Jps
4. Check the cluster status from the command line
jps only tells you whether the HDFS and YARN/MapReduce daemons started; it does not show the overall state of the cluster. hadoop dfsadmin -report gives a fuller picture: it quickly shows which nodes are down, the configured and used HDFS capacity, and the disk usage of each node.
hadoop dfsadmin -report
Sample output:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/08/05 10:36:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 158262480896 (147.39 GB)
Present Capacity: 106997108736 (99.65 GB)
DFS Remaining: 106996961280 (99.65 GB)
DFS Used: 147456 (144 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (3):
...
...
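Since the output itself warns that the hadoop script is deprecated for this use, the equivalent form is:
hdfs dfsadmin -report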
5. Restart Hadoop
sbin/stop-all.sh
sbin/start-all.sh
6. Check the cluster in a browser
Open http://dev-search-01.test:50070/ in a browser to see the state of the Hadoop/HDFS cluster.
Open http://dev-search-01.test:8088/cluster/nodes to see the state of YARN.
3. Supplement
The format log below shows that the directory /usr/local/hadoop/hdfs/name was created; in other words, this is where the NameNode stores its data.
[root@dev-search-01 hadoop]# bin/hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/08/04 18:51:04 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = dev-search-01.test/10.76.0.98
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /.........trib/capacity-scheduler/*.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1; compiled by 'jenkins' on 2014-11-13T21:10Z
STARTUP_MSG: java = 1.8.0_91
************************************************************/
20/08/04 18:51:04 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
20/08/04 18:51:04 INFO namenode.NameNode: createNameNode [-format]
20/08/04 18:51:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/04 18:51:04 WARN common.Util: Path /usr/local/hadoop/hdfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
20/08/04 18:51:04 WARN common.Util: Path /usr/local/hadoop/hdfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-48367cf0-3528-4278-b35a-5c8b7ce56693
20/08/04 18:51:04 INFO namenode.FSNamesystem: No KeyProvider found.
20/08/04 18:51:04 INFO namenode.FSNamesystem: fsLock is fair:true
20/08/04 18:51:04 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
20/08/04 18:51:04 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/08/04 18:51:04 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/08/04 18:51:04 INFO blockmanagement.BlockManager: The block deletion will start around 2020 Aug 04 18:51:04
20/08/04 18:51:04 INFO util.GSet: Computing capacity for map BlocksMap
20/08/04 18:51:04 INFO util.GSet: VM type = 64-bit
20/08/04 18:51:04 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
20/08/04 18:51:04 INFO util.GSet: capacity = 2^21 = 2097152 entries
20/08/04 18:51:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/08/04 18:51:05 INFO blockmanagement.BlockManager: defaultReplication = 3
20/08/04 18:51:05 INFO blockmanagement.BlockManager: maxReplication = 512
20/08/04 18:51:05 INFO blockmanagement.BlockManager: minReplication = 1
20/08/04 18:51:05 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
20/08/04 18:51:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
20/08/04 18:51:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/08/04 18:51:05 INFO blockmanagement.BlockManager: encryptDataTransfer = false
20/08/04 18:51:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
20/08/04 18:51:05 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
20/08/04 18:51:05 INFO namenode.FSNamesystem: supergroup = supergroup
20/08/04 18:51:05 INFO namenode.FSNamesystem: isPermissionEnabled = true
20/08/04 18:51:05 INFO namenode.FSNamesystem: HA Enabled: false
20/08/04 18:51:05 INFO namenode.FSNamesystem: Append Enabled: true
20/08/04 18:51:05 INFO util.GSet: Computing capacity for map INodeMap
20/08/04 18:51:05 INFO util.GSet: VM type = 64-bit
20/08/04 18:51:05 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
20/08/04 18:51:05 INFO util.GSet: capacity = 2^20 = 1048576 entries
20/08/04 18:51:05 INFO namenode.NameNode: Caching file names occuring more than 10 times
20/08/04 18:51:05 INFO util.GSet: Computing capacity for map cachedBlocks
20/08/04 18:51:05 INFO util.GSet: VM type = 64-bit
20/08/04 18:51:05 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
20/08/04 18:51:05 INFO util.GSet: capacity = 2^18 = 262144 entries
20/08/04 18:51:05 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/08/04 18:51:05 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
20/08/04 18:51:05 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
20/08/04 18:51:05 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/08/04 18:51:05 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/08/04 18:51:05 INFO util.GSet: Computing capacity for map NameNodeRetryCache
20/08/04 18:51:05 INFO util.GSet: VM type = 64-bit
20/08/04 18:51:05 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
20/08/04 18:51:05 INFO util.GSet: capacity = 2^15 = 32768 entries
20/08/04 18:51:05 INFO namenode.NNConf: ACLs enabled? false
20/08/04 18:51:05 INFO namenode.NNConf: XAttrs enabled? true
20/08/04 18:51:05 INFO namenode.NNConf: Maximum size of an xattr: 16384
20/08/04 18:51:05 INFO namenode.FSImage: Allocated new BlockPoolId: BP-766871960-10.76.0.98-1596538265437
20/08/04 18:51:05 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
20/08/04 18:51:05 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/08/04 18:51:05 INFO util.ExitUtil: Exiting with status 0
20/08/04 18:51:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at dev-search-01.test/10.76.0.98
************************************************************/
[root@dev-search-01 hadoop]# ll
total 56
drwxr-xr-x 2 20000 20000 4096 Nov 14 2014 bin
drwxr-xr-x 3 20000 20000 4096 Nov 14 2014 etc
drwxr-xr-x 3 root root 4096 Aug 4 18:51 hdfs
drwxr-xr-x 2 20000 20000 4096 Nov 14 2014 include
drwxr-xr-x 3 20000 20000 4096 Nov 14 2014 lib
drwxr-xr-x 2 20000 20000 4096 Nov 14 2014 libexec
-rw-r--r-- 1 20000 20000 15429 Nov 14 2014 LICENSE.txt
-rw-r--r-- 1 20000 20000 101 Nov 14 2014 NOTICE.txt
-rw-r--r-- 1 20000 20000 1366 Nov 14 2014 README.txt
drwxr-xr-x 2 20000 20000 4096 Nov 14 2014 sbin
drwxr-xr-x 4 20000 20000 4096 Nov 14 2014 share
[root@dev-search-01 hadoop]# cd hdfs/
[root@dev-search-01 hdfs]# ll
total 4
drwxr-xr-x 3 root root 4096 Aug 4 18:51 name
[root@dev-search-01 hdfs]# cd name/
[root@dev-search-01 name]# ll
total 4
drwxr-xr-x 2 root root 4096 Aug 4 18:51 current
[root@dev-search-01 name]# cd current/
[root@dev-search-01 current]# ll
total 16
-rw-r--r-- 1 root root 351 Aug 4 18:51 fsimage_0000000000000000000
-rw-r--r-- 1 root root 62 Aug 4 18:51 fsimage_0000000000000000000.md5
-rw-r--r-- 1 root root 2 Aug 4 18:51 seen_txid
-rw-r--r-- 1 root root 201 Aug 4 18:51 VERSION
[root@dev-search-01 current]#
1. How to reset
Delete both the NameNode directory and the DataNode directories (the paths configured in hdfs-site.xml, shown again below), then re-run the format step.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
</configuration>
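Putting that together, a minimal reset sketch (directory paths taken from the hdfs-site.xml above; the data directory has to be removed on every DataNode):
# on the master, from /usr/local/hadoop
sbin/stop-all.sh
rm -rf /usr/local/hadoop/hdfs/name
# on every slave: rm -rf /usr/local/hadoop/hdfs/data
bin/hadoop namenode -format
sbin/start-all.sh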
2. JAVA_HOME error
Even if JAVA_HOME is already set, starting the cluster may still fail with:
Error: JAVA_HOME is not set and could not be found.
In that case open etc/hadoop/hadoop-env.sh, where you will find the reference
export JAVA_HOME=${JAVA_HOME}
Add a line above it that sets the actual JAVA_HOME explicitly:
JAVA_HOME="/usr/local/jdk1.8.0_91"
export JAVA_HOME=${JAVA_HOME}
Main references:
http://www.ityouknow.com/hadoop/2017/07/24/hadoop-cluster-setup.html
https://juejin.im/post/6854573210311557127