To restart CDH, run the following:
service cloudera-scm-server-db restart
service cloudera-scm-server restart
service cloudera-scm-agent restart (this one also has to be run on every slave node)
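The restart order above could be wrapped in a small script. This is only a sketch: AGENT_HOSTS is a hypothetical space-separated list of slave hostnames, and it assumes root SSH access to each of them, neither of which comes from the original setup. With DRY_RUN=1 the commands are printed rather than executed, which is a cheap way to check the order first.

```shell
#!/bin/sh
# Sketch: restart the Cloudera Manager services in the order listed above.
# AGENT_HOSTS (hypothetical) is a space-separated list of slave hostnames.
# DRY_RUN=1 prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_cdh() {
    # Database first, then the server, then the local agent.
    run service cloudera-scm-server-db restart || return 1
    run service cloudera-scm-server restart || return 1
    run service cloudera-scm-agent restart || return 1
    # The agent also has to be restarted on every slave (assumes root SSH).
    for host in $AGENT_HOSTS; do
        run ssh "root@$host" service cloudera-scm-agent restart || return 1
    done
}

# Example (dry run):
#   DRY_RUN=1 AGENT_HOSTS="node-a node-b" restart_cdh
```

Stopping at the first failure is a deliberate choice here: if server-db does not come up, restarting the server on top of it only reproduces the problem described below.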
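When starting the cloudera-scm-server service, you may find that it dies by itself after a while, and that status then reports "cloudera-scm-server dead but pid file exists".
The case described below is one where the root cause was that cloudera-scm-server-db had not started properly.
【Procedure】
cloudera-scm-server dies by itself some time after being started: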
- [root@gyvm-4 data]# service cloudera-scm-server start
- Starting cloudera-scm-server: [ OK ]
- [root@gyvm-4 data]#
- [root@gyvm-4 data]# service cloudera-scm-server status
- cloudera-scm-server (pid 60761) is running...
- [root@gyvm-4 data]# service cloudera-scm-server status
- cloudera-scm-server (pid 60761) is running...
- [root@gyvm-4 data]# service cloudera-scm-server status
- cloudera-scm-server (pid 60761) is running...
- [root@gyvm-4 data]# service cloudera-scm-server status
- cloudera-scm-server dead but pid file exists
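Instead of re-running status by hand, the check above can be polled. The sketch below classifies the status text using only the messages that appear in this post; any other wording is reported as "unknown". The function names and the 10-second default interval are made up for illustration.

```shell
#!/bin/sh
# classify_status reads `service ... status` output on stdin and prints
# one of: stale-pid, running, stopped, unknown.
classify_status() {
    status_text=$(cat)
    case "$status_text" in
        *"dead but pid file exists"*) echo "stale-pid" ;;
        *"is running"*)               echo "running" ;;
        *"no server running"*)        echo "stopped" ;;
        *)                            echo "unknown" ;;
    esac
}

# watch_server polls cloudera-scm-server until it is no longer "running".
# $1 is the polling interval in seconds (hypothetical default: 10).
watch_server() {
    interval=${1:-10}
    while :; do
        state=$(service cloudera-scm-server status 2>&1 | classify_status)
        echo "$(date '+%H:%M:%S') $state"
        [ "$state" = "running" ] || break
        sleep "$interval"
    done
}

# Example:
#   service cloudera-scm-server start && watch_server 5
```

The timestamp of the first non-running line gives a rough idea of where to look in the server log.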
At this point I wanted to do a full restart of cloudera-scm-server-db and cloudera-scm-server,
but found that cloudera-scm-server-db could not be restarted:
- [root@gyvm-4 data]# service cloudera-scm-server-db stop
- waiting for server to shut down............................................................... failed
- pg_ctl: server does not shut down
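server-db could not be stopped because a leftover pid file was making status report the wrong state. After deleting that file and checking status again, it turned out that server-db had in fact already stopped.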
- [root@gyvm-4 data]# cd /var/lib/cloudera-scm-server-db/data
- [root@gyvm-4 data]# service cloudera-scm-server-db status
- pg_ctl: server is running (PID: 17378)
- /usr/bin/postgres "-D" "/var/lib/cloudera-scm-server-db/data"
- [root@gyvm-4 data]# rm postmaster.pid
- rm: remove regular file `postmaster.pid'? y
- [root@gyvm-4 data]# service cloudera-scm-server-db status
- pg_ctl: no server running
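Before deleting postmaster.pid, it may be worth confirming that the PID it records really is gone, since removing the file while that process is still alive hides a running postmaster from pg_ctl. The first line of postmaster.pid holds the postmaster's PID; the helper names below are hypothetical, and the check assumes it is run as root (as in the session above) so that `kill -0` is not refused for permission reasons.

```shell
#!/bin/sh
# pid_from_file prints the PID stored on the first line of a pid file.
pid_from_file() {
    head -n 1 "$1" | tr -d '[:space:]'
}

# pidfile_is_stale returns 0 when the file exists but no live process
# has the recorded PID, and 1 otherwise.
pidfile_is_stale() {
    [ -f "$1" ] || return 1
    pid=$(pid_from_file "$1")
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;   # empty or non-numeric: treat as stale
    esac
    # kill -0 sends no signal; it only tests whether the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        return 1                   # process is alive: not stale
    fi
    return 0
}

# Example:
#   f=/var/lib/cloudera-scm-server-db/data/postmaster.pid
#   if pidfile_is_stale "$f"; then rm "$f"; else echo "PID still alive"; fi
```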
Starting server-db at this point fails:
- [root@gyvm-4 data]# service cloudera-scm-server-db start
- DB initialization done.
- waiting for server to start...............................................................could not start server
The log shows that TCP/IP port 7432 is already in use:
- [root@gyvm-4 cloudera-scm-server]# tail db.log
- LOG: could not bind IPv4 socket: Address already in use
- HINT: Is another postmaster already running on port 7432? If not, wait a few seconds and retry.
- LOG: could not bind IPv6 socket: Address already in use
- HINT: Is another postmaster already running on port 7432? If not, wait a few seconds and retry.
- WARNING: could not create listen socket for "*"
- FATAL: could not create any TCP/IP sockets
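The port number can also be pulled out of the log mechanically, which helps when the log is long. This sketch only matches the HINT wording shown above; the function name is invented for illustration.

```shell
#!/bin/sh
# conflicting_ports reads PostgreSQL log text on stdin and prints each port
# mentioned in an "already running on port N" hint, one per line, deduplicated.
conflicting_ports() {
    sed -n 's/.*already running on port \([0-9][0-9]*\).*/\1/p' | sort -u
}

# Example:
#   conflicting_ports < db.log
```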
Find the processes occupying that port and kill one:
- [root@gyvm-4 cloudera-scm-server]# netstat -ntp | grep 7432
- tcp 0 0 192.168.1.17:7432 192.168.1.17:49784 ESTABLISHED 37118/postgres
- tcp 0 0 192.168.1.17:7432 192.168.1.8:35818 ESTABLISHED 36807/postgres
- tcp 0 0 192.168.1.17:7432 192.168.1.17:49779 ESTABLISHED 37060/postgres
- tcp 0 0 192.168.1.17:49783 192.168.1.17:7432 ESTABLISHED 36306/java
- tcp 0 0 192.168.1.17:7432 192.168.1.8:35813 ESTABLISHED 36778/postgres
- tcp 0 0 192.168.1.17:49779 192.168.1.17:7432 ESTABLISHED 36306/java
- tcp 0 0 192.168.1.17:49784 192.168.1.17:7432 ESTABLISHED 36306/java
- tcp 0 0 192.168.1.17:49778 192.168.1.17:7432 ESTABLISHED 36306/java
- tcp 0 0 192.168.1.17:7432 192.168.1.17:49778 ESTABLISHED 37059/postgres
- tcp 0 0 192.168.1.17:7432 192.168.1.8:35814 ESTABLISHED 36779/postgres
- tcp 0 0 192.168.1.17:7432 192.168.1.8:35817 ESTABLISHED 36804/postgres
- tcp 0 0 192.168.1.17:7432 192.168.1.17:49783 ESTABLISHED 37117/postgres
- [root@gyvm-4 cloudera-scm-server]# kill -9 37118
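Note that `netstat -ntp` without `-l` lists only established connections, so the process holding the listening socket on 7432 would show up with `netstat -lntp` and may be a different PID from the ones listed above. As a rough sketch, the server-side PIDs can be extracted from either form of output like this (the function name is hypothetical, and the column layout is assumed to match the output shown above):

```shell
#!/bin/sh
# server_side_pids PORT: reads netstat -ntp / -lntp output on stdin and prints
# the unique PIDs whose *local* address uses PORT, sorted numerically.
server_side_pids() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") {
            split($7, owner, "/")          # field 7 looks like "37118/postgres"
            if (owner[1] ~ /^[0-9]+$/) print owner[1]
        }' | sort -un
}

# Example:
#   netstat -lntp | server_side_pids 7432    # process holding the listening socket
#   netstat -ntp  | server_side_pids 7432    # backends serving existing connections
```

Client-side lines (here the java process, whose foreign address is port 7432) are skipped on purpose, because only the local address is compared.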
Start server-db again, which now succeeds, and then start the server, which also succeeds:
- [root@gyvm-4 data]# service cloudera-scm-server-db start
- DB initialization done.
- waiting for server to start.... done
- server started
-
- [root@gyvm-4 data]# service cloudera-scm-server start
- Starting cloudera-scm-server: [ OK ]
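At this point the Cloudera management UI is accessible again.
【Conclusion】
The root cause was that cloudera-scm-server-db had not started properly but had left behind its pid file, postmaster.pid.
As a result, checking the status of cloudera-scm-server-db gave a misleading answer: it reported the service as running.
In that state, every attempt to start cloudera-scm-server failed.
cloudera-scm-server-db itself failed to start because the port it needs was already in use.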