Table of Contents

    • 1. Overview
    • 2. Installation steps
      • 1. Linux environment preparation
        • 1. Base environment planning
        • 2. Hosts configuration and hostnames (all four machines)
        • 3. Install the JDK
      • 2. Passwordless SSH login configuration
        • 1. Disable the firewall and SELinux on all four servers
        • 2. Passwordless login
          • 1. Make sure each machine can SSH to itself without a password
          • 2. Set up passwordless login between machines
      • 3. Hadoop installation
        • 1. On the master: unpack the archive and create the base directories
        • 2. Configure the Hadoop environment variables on the master
        • 3. Edit the configuration files
        • 4. Configure the Hadoop environment on the slaves
        • 5. Start the cluster
        • 6. View the cluster in a browser
    • 3. Supplementary notes
      • 1. How to reset
      • 2. JAVA_HOME error

1. Overview

The installation has three main steps:

  1. Prepare the Linux environment
  2. Set up passwordless SSH
  3. Install Hadoop

The operating system for this installation is CentOS 6.8.
The Hadoop version is 2.6.0.

Download Hadoop:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

2. Installation steps

1. Linux environment preparation

1. Base environment planning

The operating system is CentOS 6.8, on machines with 8 GB RAM and 8 cores.
The cluster consists of 4 nodes: one master and 3 slaves.
The IP-to-hostname mapping is:


10.76.0.98 dev-search-01.test
10.76.3.145 dev-search-02.test
10.76.0.129 dev-search-03.test
10.76.5.198 stag-search-03.test

You can run the hostname command to check the hostname of the current machine:

hostname
dev-search-01.test

dev-search-01.test is the master; the other three machines are the data nodes (slaves).

The JDK version is 1.8.
The Hadoop version is 2.6.0; the download URL is https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

2. Hosts configuration and hostnames (all four machines)

Modify the hosts file on all four servers:


10.76.0.98 dev-search-01.test
10.76.3.145 dev-search-02.test
10.76.0.129 dev-search-03.test
10.76.5.198 stag-search-03.test

so that every machine can ping the others by hostname.
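
For example, a minimal sketch (run as root on each of the four machines; the entries match the plan above) to add the mappings and verify connectivity:

# append the cluster entries to /etc/hosts (skip any that already exist)
cat >> /etc/hosts <<'EOF'
10.76.0.98 dev-search-01.test
10.76.3.145 dev-search-02.test
10.76.0.129 dev-search-03.test
10.76.5.198 stag-search-03.test
EOF

# verify that each hostname resolves and responds
ping -c 1 dev-search-02.test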

3. Install the JDK

The JDK can be installed either with yum or by downloading the package manually;
the download steps are not covered in detail here.
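
If you take the yum route, a minimal sketch would look like the following (an assumption, not part of the original setup: these are the OpenJDK packages, and they install under /usr/lib/jvm rather than the /usr/local/jdk1.8.0_91 path used in the profile below):

# install OpenJDK 1.8 from the CentOS repositories
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
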
Configure the environment variables by editing the profile with vim /etc/profile:

export JAVA_HOME=/usr/local/jdk1.8.0_91
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Run source so the changes take effect immediately:

source /etc/profile

After installation, the following should work:

[root@dev-search-01 sbin]# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

which indicates that the JDK was installed successfully.

2. Passwordless SSH login configuration

This is standard Linux configuration. The core idea is to disable the firewall and to give every machine a copy of the other machines' public keys, so that logins are authenticated with the public/private key pair instead of a password.

1. Disable the firewall and SELinux on all four servers

Check the firewall status:

service iptables status

Stop the firewall and disable it at boot:

service iptables stop
chkconfig iptables off

Disabling SELinux requires a reboot to take full effect.

-- Disable SELinux
# vim /etc/selinux/config
-- Comment out
#SELINUX=enforcing
#SELINUXTYPE=targeted
-- Add
SELINUX=disabled
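
Equivalently, a small sketch that switches SELinux to permissive for the current session and disables it in the config file (assuming the default SELINUX=enforcing line is present):

# permissive until the next reboot
setenforce 0
# disabled permanently after the next reboot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config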

2. Passwordless login

1. Make sure each machine can SSH to itself without a password
  1. Generate a key pair
ssh-keygen -t rsa
  2. Append the public key to the "authorized_keys" file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. Set the permissions
chmod 600 ~/.ssh/authorized_keys
  4. Verify that the machine can log in to itself without a password
    ssh dev-search-01.test
    Finally, repeat these steps on each of the other servers.
2. Set up passwordless login between machines

For server dev-search-01.test, append the public keys just generated on

dev-search-02.test
dev-search-03.test
stag-search-03.test

to ~/.ssh/authorized_keys on dev-search-01.test.
The result is that from dev-search-01.test you can run

ssh dev-search-02.test

Repeat the same procedure on each of the other servers.
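
As an alternative to appending the keys by hand, a minimal sketch using ssh-copy-id (run on every machine as root after ssh-keygen; you will be prompted once for each host's password):

# push this machine's public key to every node in the cluster, including itself
for host in dev-search-01.test dev-search-02.test dev-search-03.test stag-search-03.test; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host
done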

3. Hadoop installation

1. On the master: unpack the archive and create the base directories

# download
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
# extract
tar -xzvf hadoop-2.6.0.tar.gz -C /usr/local
# rename
mv /usr/local/hadoop-2.6.0 /usr/local/hadoop
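
The data directories referenced by the configuration files below can be pre-created on the master with a short sketch like this (optional: the format step later will also create the NameNode directory on its own):

# temporary dir, NameNode metadata dir, and DataNode data dir used in the configs below
mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data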

2. Configure the Hadoop environment variables on the master

Configure the environment variables by editing the profile with vi /etc/profile:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

Make the hadoop command available in the current shell:

source /etc/profile
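
A quick way to confirm the variables are picked up is to ask for the version:

hadoop version
# should print "Hadoop 2.6.0" followed by build information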

3. Edit the configuration files

All of the files below live under /usr/local/hadoop/etc/hadoop.
1. Configure core-site.xml
Edit the core configuration file /usr/local/hadoop/etc/hadoop/core-site.xml. fs.defaultFS specifies the address and port of the NameNode, and hadoop.tmp.dir specifies the directory where Hadoop stores its temporary data.

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://dev-search-01.test:9000</value>
    </property>
</configuration>

Note: if hadoop.tmp.dir is not set, Hadoop falls back to a default temporary directory under /tmp (/tmp/hadoop-${user.name}). That directory is wiped on every reboot, so you would have to re-run the format step each time or the cluster will fail to start.

2. Configure hdfs-site.xml

Edit the HDFS configuration file /usr/local/hadoop/etc/hadoop/hdfs-site.xml. dfs.replication sets the HDFS replication factor to 3, dfs.name.dir sets the storage directory for the NameNode, and dfs.data.dir sets the storage directory for the DataNodes.

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>

3. Configure mapred-site.xml

Copy mapred-site.xml.template to mapred-site.xml, then edit it:

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://dev-search-01.test:9001</value>
    </property>
</configuration>

4. Configure yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>dev-search-01.test</value>
    </property>
</configuration>

5. Configure the masters file

Create the file /usr/local/hadoop/etc/hadoop/masters, which specifies the server that runs the NameNode. Remove localhost and add the hostname of the NameNode, dev-search-01.test. Using the hostname rather than the IP address is recommended, because IP addresses may change while hostnames usually do not.

vi /usr/local/hadoop/etc/hadoop/masters
## contents
dev-search-01.test

6. Configure the slaves file (master only)

Edit /usr/local/hadoop/etc/hadoop/slaves, which specifies which servers are DataNodes. Remove localhost and add the hostnames of all the DataNode machines, as shown below.

vi /usr/local/hadoop/etc/hadoop/slaves
## contents
dev-search-02.test
dev-search-03.test
stag-search-03.test

4. Configure the Hadoop environment on the slaves

The steps below use dev-search-02.test as the example; follow the same steps to configure the other slave servers.

1. Copy Hadoop to the dev-search-02.test node

scp -r /usr/local/hadoop dev-search-02.test:/usr/local/

Log in to dev-search-02.test and remove the slaves file (it is only needed on the master):

rm -rf /usr/local/hadoop/etc/hadoop/slaves

2. Configure the environment variables

vi /etc/profile
## contents
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

Make the hadoop command available in the current shell:

source /etc/profile

Configure the remaining slave servers in the same way; a sketch for doing this in one loop follows.
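
A small sketch that pushes the install from the master to all three slaves in one pass (it assumes passwordless SSH from the earlier step; /etc/profile still has to be edited on each slave by hand):

# run on the master
for host in dev-search-02.test dev-search-03.test stag-search-03.test; do
    scp -r /usr/local/hadoop $host:/usr/local/
    ssh $host "rm -rf /usr/local/hadoop/etc/hadoop/slaves"
done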

5. Start the cluster

1. Format the HDFS filesystem

On the master, change to the /usr/local/hadoop directory and run:

bin/hadoop namenode -format

This formats the NameNode. It is only needed once, before the first start of the cluster; do not run it again afterwards.

2. Start Hadoop:

sbin/start-all.sh

3. Check the running processes with jps

# on the master, run jps to check the processes
12067 NameNode
12347 SecondaryNameNode
25341 Jps
12573 ResourceManager

# on a slave, run jps to check the processes
17104 NodeManager
16873 DataNode
21676 Jps

4. Check the cluster status from the command line

jps only shows whether the HDFS and MapReduce daemons started successfully; it does not show the running state of the cluster as a whole. For that, use hadoop dfsadmin -report. This command quickly shows which nodes are down, the total and used HDFS capacity, and the disk usage of each node.

hadoop dfsadmin -report

Sample output:

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/08/05 10:36:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 158262480896 (147.39 GB)
Present Capacity: 106997108736 (99.65 GB)
DFS Remaining: 106996961280 (99.65 GB)
DFS Used: 147456 (144 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (3):
...
...

5. Restarting Hadoop

sbin/stop-all.sh
sbin/start-all.sh

6. View the cluster in a browser

Open the following URL in a browser:

http://dev-search-01.test:50070/

to see the status of the Hadoop cluster (the NameNode web UI).

Open

http://dev-search-01.test:8088/cluster/nodes

to see the state of YARN (the ResourceManager web UI).
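
If you only need a quick reachability check from the command line (assuming the hostname resolves where you run it), a simple sketch with curl:

# each should return an HTTP status line (200 or a redirect) when the daemons are up
curl -sI http://dev-search-01.test:50070/ | head -n 1
curl -sI http://dev-search-01.test:8088/cluster/nodes | head -n 1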

3. Supplementary notes

The log from the format step below shows that it created the /usr/local/hadoop/hdfs/name directory;
in other words, this directory is where Hadoop stores its NameNode data.

[root@dev-search-01 hadoop]# bin/hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/08/04 18:51:04 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = dev-search-01.test/10.76.0.98
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /.........trib/capacity-scheduler/*.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1; compiled by 'jenkins' on 2014-11-13T21:10Z
STARTUP_MSG:   java = 1.8.0_91
************************************************************/
20/08/04 18:51:04 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
20/08/04 18:51:04 INFO namenode.NameNode: createNameNode [-format]
20/08/04 18:51:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/04 18:51:04 WARN common.Util: Path /usr/local/hadoop/hdfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
20/08/04 18:51:04 WARN common.Util: Path /usr/local/hadoop/hdfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-48367cf0-3528-4278-b35a-5c8b7ce56693
20/08/04 18:51:04 INFO namenode.FSNamesystem: No KeyProvider found.
20/08/04 18:51:04 INFO namenode.FSNamesystem: fsLock is fair:true
20/08/04 18:51:04 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
20/08/04 18:51:04 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/08/04 18:51:04 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/08/04 18:51:04 INFO blockmanagement.BlockManager: The block deletion will start around 2020 Aug 04 18:51:04
20/08/04 18:51:04 INFO util.GSet: Computing capacity for map BlocksMap
20/08/04 18:51:04 INFO util.GSet: VM type       = 64-bit
20/08/04 18:51:04 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
20/08/04 18:51:04 INFO util.GSet: capacity      = 2^21 = 2097152 entries
20/08/04 18:51:05 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/08/04 18:51:05 INFO blockmanagement.BlockManager: defaultReplication         = 3
20/08/04 18:51:05 INFO blockmanagement.BlockManager: maxReplication             = 512
20/08/04 18:51:05 INFO blockmanagement.BlockManager: minReplication             = 1
20/08/04 18:51:05 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
20/08/04 18:51:05 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
20/08/04 18:51:05 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/08/04 18:51:05 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
20/08/04 18:51:05 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
20/08/04 18:51:05 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
20/08/04 18:51:05 INFO namenode.FSNamesystem: supergroup          = supergroup
20/08/04 18:51:05 INFO namenode.FSNamesystem: isPermissionEnabled = true
20/08/04 18:51:05 INFO namenode.FSNamesystem: HA Enabled: false
20/08/04 18:51:05 INFO namenode.FSNamesystem: Append Enabled: true
20/08/04 18:51:05 INFO util.GSet: Computing capacity for map INodeMap
20/08/04 18:51:05 INFO util.GSet: VM type       = 64-bit
20/08/04 18:51:05 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
20/08/04 18:51:05 INFO util.GSet: capacity      = 2^20 = 1048576 entries
20/08/04 18:51:05 INFO namenode.NameNode: Caching file names occuring more than 10 times
20/08/04 18:51:05 INFO util.GSet: Computing capacity for map cachedBlocks
20/08/04 18:51:05 INFO util.GSet: VM type       = 64-bit
20/08/04 18:51:05 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
20/08/04 18:51:05 INFO util.GSet: capacity      = 2^18 = 262144 entries
20/08/04 18:51:05 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/08/04 18:51:05 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
20/08/04 18:51:05 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
20/08/04 18:51:05 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/08/04 18:51:05 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/08/04 18:51:05 INFO util.GSet: Computing capacity for map NameNodeRetryCache
20/08/04 18:51:05 INFO util.GSet: VM type       = 64-bit
20/08/04 18:51:05 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
20/08/04 18:51:05 INFO util.GSet: capacity      = 2^15 = 32768 entries
20/08/04 18:51:05 INFO namenode.NNConf: ACLs enabled? false
20/08/04 18:51:05 INFO namenode.NNConf: XAttrs enabled? true
20/08/04 18:51:05 INFO namenode.NNConf: Maximum size of an xattr: 16384
20/08/04 18:51:05 INFO namenode.FSImage: Allocated new BlockPoolId: BP-766871960-10.76.0.98-1596538265437
20/08/04 18:51:05 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
20/08/04 18:51:05 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/08/04 18:51:05 INFO util.ExitUtil: Exiting with status 0
20/08/04 18:51:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at dev-search-01.test/10.76.0.98
************************************************************/
[root@dev-search-01 hadoop]# ll
total 56
drwxr-xr-x 2 20000 20000  4096 Nov 14  2014 bin
drwxr-xr-x 3 20000 20000  4096 Nov 14  2014 etc
drwxr-xr-x 3 root  root   4096 Aug  4 18:51 hdfs
drwxr-xr-x 2 20000 20000  4096 Nov 14  2014 include
drwxr-xr-x 3 20000 20000  4096 Nov 14  2014 lib
drwxr-xr-x 2 20000 20000  4096 Nov 14  2014 libexec
-rw-r--r-- 1 20000 20000 15429 Nov 14  2014 LICENSE.txt
-rw-r--r-- 1 20000 20000   101 Nov 14  2014 NOTICE.txt
-rw-r--r-- 1 20000 20000  1366 Nov 14  2014 README.txt
drwxr-xr-x 2 20000 20000  4096 Nov 14  2014 sbin
drwxr-xr-x 4 20000 20000  4096 Nov 14  2014 share
[root@dev-search-01 hadoop]# cd hdfs/
[root@dev-search-01 hdfs]# ll
total 4
drwxr-xr-x 3 root root 4096 Aug  4 18:51 name
[root@dev-search-01 hdfs]# cd name/
[root@dev-search-01 name]# ll
total 4
drwxr-xr-x 2 root root 4096 Aug  4 18:51 current
[root@dev-search-01 name]# cd current/
[root@dev-search-01 current]# ll
total 16
-rw-r--r-- 1 root root 351 Aug  4 18:51 fsimage_0000000000000000000
-rw-r--r-- 1 root root  62 Aug  4 18:51 fsimage_0000000000000000000.md5
-rw-r--r-- 1 root root   2 Aug  4 18:51 seen_txid
-rw-r--r-- 1 root root 201 Aug  4 18:51 VERSION
[root@dev-search-01 current]#

1. How to reset

To reset the cluster, delete the NameNode directory and the DataNode data directories
(the dfs.name.dir and dfs.data.dir paths from hdfs-site.xml, shown again below),
then re-run the format step; see the sketch after the configuration.

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>
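
A minimal reset sketch under those assumptions (warning: this destroys all HDFS data, and the data directory has to be cleaned on every DataNode):

sbin/stop-all.sh                        # on the master, stop the cluster first
rm -rf /usr/local/hadoop/hdfs/name/*    # on the master: dfs.name.dir
rm -rf /usr/local/hadoop/hdfs/data/*    # on every slave: dfs.data.dir
rm -rf /usr/local/hadoop/tmp/*          # on every node: hadoop.tmp.dir
bin/hadoop namenode -format             # on the master, then start the cluster again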

2. JAVA_HOME error

Even if JAVA_HOME is already set, the startup scripts may still report:

Error: JAVA_HOME is not set and could not be found.

In that case, open etc/hadoop/hadoop-env.sh;
you will see that it references the variable:

export JAVA_HOME=${JAVA_HOME}

Add a line above it that sets the actual JAVA_HOME:

JAVA_HOME="/usr/local/jdk1.8.0_91"
export JAVA_HOME=${JAVA_HOME}
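
An equivalent one-liner that hard-codes the path directly in hadoop-env.sh (a sketch, assuming the JDK lives at /usr/local/jdk1.8.0_91 as above):

sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/local/jdk1.8.0_91|' /usr/local/hadoop/etc/hadoop/hadoop-env.sh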

Main references:
http://www.ityouknow.com/hadoop/2017/07/24/hadoop-cluster-setup.html
https://juejin.im/post/6854573210311557127