nebula-graphd: Exited

  • NebulaGraph version: 3.4.1
  • Deployment: single machine
  • Installation method: RPM
  • In production: N
  • Hardware
    • High-performance cloud disk, 50 GB
    • 2 cores, 4 GB RAM
  • I edited the nebula-graphd.conf config file and changed the --enable_authorize parameter to true. After nebula.service stop all and nebula.service start all, nebula-graphd has stayed in the Exited state:
    [INFO] nebula-metad(a6dc673): Running as 32513, Listening on 9559
    [INFO] nebula-graphd(a6dc673): Exited
    [INFO] nebula-storaged(a6dc673): Running as 32636, Listening on 9779
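
For reference, the change amounts to one line in the config file plus a restart (paths below assume the default RPM layout under /usr/local/nebula):

    # /usr/local/nebula/etc/nebula-graphd.conf
    # authentication is disabled (false) by default
    --enable_authorize=true

    # then restart everything
    /usr/local/nebula/scripts/nebula.service stop all
    /usr/local/nebula/scripts/nebula.service start all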

Checking shows that ports 9669 and 19669 are already occupied:
nebula-gr 32587 root 67u IPv6 235797 0t0 TCP *:9669 (LISTEN)
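
(The port check can be reproduced with lsof, the same way it is done further down in this thread:)

    # which process is listening on each graphd port?
    lsof -i :9669
    lsof -i :19669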

Running nebula.service stop all and nebula.service start all again gives the same result; after rebooting the machine, nebula.service start all gives the same result; and after kill 32587, nebula.service start graphd still gives the same result.

E20230518 14:43:42.762362  5710 WebService.cpp:152] Failed to start web service: failed to bind to async server socket: 0.0.0.0:19669: Address already in use

E20230518 16:52:18.535353  3385 MetaClient.cpp:772] Send request to "127.0.0.1":9559, exceed retry limit
E20230518 16:52:18.535396  3385 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20230518 16:52:18.535437  3459 MetaClient.cpp:192] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect

This means graphd cannot connect to metad. Is metad configured to listen locally on 9559? Check whether the metad log looks normal, and whether the firewall allows port 9559.
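
Both checks in one minimal sketch (assuming CentOS with firewalld, and nc from the nmap-ncat package; adjust if your setup differs):

    # is anything listening on the metad port, and does it accept connections?
    lsof -i :9559
    nc -zv 127.0.0.1 9559

    # is the firewall active at all, and which ports does it allow?
    systemctl status firewalld
    firewall-cmd --list-ports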

Yes, it's 9559, and metad's ERROR log looks normal:
"""
Log file created at: 2023/05/18 10:17:32
Running on machine: VM-0-14-centos
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20230518 10:17:32.890949 4035 AddHostsProcessor.cpp:72] The host "127.0.0.1":9779 have existed!
"""
Checking metad's INFO log, it is already receiving heartbeats from the graph service on 9669:
"""
I20230519 10:29:23.545069 21645 MetaDaemon.cpp:137] localhost = "127.0.0.1":9559
I20230519 10:29:25.385288 21762 HBProcessor.cpp:33] Receive heartbeat from "127.0.0.1":9669, role = GRAPH
I20230519 10:29:35.388867 21762 HBProcessor.cpp:33] Receive heartbeat from "127.0.0.1":9669, role = GRAPH
I20230519 10:29:45.398845 21762 HBProcessor.cpp:33] Receive heartbeat from "127.0.0.1":9669, role = GRAPH
"""
But graphd is still Exited.

The firewall is disabled.

Is the port already occupied?

Ah, so that's the cause (I was reading on my phone and missed it).

The port is indeed occupied by the nebula-gr service, but graphd still shows Exited when I check the status.
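
That combination (a listening nebula-gr process while status reports Exited) suggests the running graphd is no longer the one the management script tracks: nebula.service status goes by the pid file rather than the port. A sketch of the comparison, assuming the default --pid_file=pids/nebula-graphd.pid setting:

    # the pid the management script has on record
    cat /usr/local/nebula/pids/nebula-graphd.pid

    # the pid actually holding the client port (-t prints bare pids)
    lsof -t -i :9669

If the two pids differ, status will report Exited even though a graphd is serving.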

I killed the service occupying port 9669 and restarted all services, but graphd is still Exited.

What's occupied is 19669; graphd needs to listen on more than just 9669 (19669 is its web service port, per the WebService.cpp error above).

Then how should I solve this?

Check which process is occupying the port and, if possible, kill it.

I killed the process before, but after restarting it still doesn't work.

It might be a non-NebulaGraph process occupying it.
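
Note also that lsof truncates the COMMAND column, so "nebula-gr" may simply be a cut-off "nebula-graphd". A sketch for pinning down what the process really is (11083 stands for whatever pid lsof reports):

    # the full command line behind the truncated COMMAND name
    ps -fp 11083

    # every network endpoint that pid holds (-a ANDs -i with -p)
    lsof -nP -i -a -p 11083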

[root@VM-0-14-centos ~]# kill 32017
[root@VM-0-14-centos ~]# /usr/local/nebula/scripts/nebula.service stop all
[INFO] Stopping nebula-metad...
[INFO] Done
[INFO] Stopping nebula-graphd...
[INFO] Done
[INFO] Stopping nebula-storaged...
[INFO] Done
[root@VM-0-14-centos ~]# /usr/local/nebula/scripts/nebula.service start all
[INFO] Starting nebula-metad...
[INFO] Done
[INFO] Starting nebula-graphd...
[INFO] Done
[root@VM-0-14-centos ~]# /usr/local/nebula/scripts/nebula.service status all
[INFO] nebula-metad(a6dc673): Running as 14347, Listening on 9559
[INFO] nebula-graphd(a6dc673): Exited
[INFO] nebula-storaged(a6dc673): Exited
[root@VM-0-14-centos ~]# lsof -i :19669
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nebula-gr 11083 root 29u IPv4 795253 0t0 TCP *:19669 (LISTEN)
[root@VM-0-14-centos ~]# lsof -i :9669
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nebula-gr 11083 root 67u IPv4 795323 0t0 TCP *:9669 (LISTEN)
[root@VM-0-14-centos ~]# kill 11083
[root@VM-0-14-centos ~]# lsof -i :9669
[root@VM-0-14-centos ~]# lsof -i :19669
[root@VM-0-14-centos ~]# /usr/local/nebula/scripts/nebula.service stop all
[INFO] Stopping nebula-metad...
[INFO] Done
[INFO] Stopping nebula-graphd...
[INFO] Done
[INFO] Stopping nebula-storaged...
[INFO] Done
[root@VM-0-14-centos ~]# /usr/local/nebula/scripts/nebula.service start all
[INFO] Starting nebula-metad...
[INFO] Done
[INFO] Starting nebula-graphd...
[INFO] Done
[INFO] Starting nebula-storaged...
[INFO] Done
[root@VM-0-14-centos ~]# /usr/local/nebula/scripts/nebula.service status all
[INFO] nebula-metad(a6dc673): Running as 14867, Listening on 9559
[INFO] nebula-graphd(a6dc673): Exited
[WARN] nebula-storaged after v3.0.0 will not start service until it is added to cluster.
[WARN] See Manage Storage hosts:ADD HOSTS in https://docs.nebula-graph.io/
[INFO] nebula-storaged(a6dc673): Running as 14997, Listening on 9779
[root@VM-0-14-centos ~]# /usr/local/nebula/scripts/nebula.service status all
[INFO] nebula-metad(a6dc673): Running as 14867, Listening on 9559
[INFO] nebula-graphd(a6dc673): Exited
[INFO] nebula-storaged(a6dc673): Running as 14997, Listening on 9779
[root@VM-0-14-centos ~]#

I killed both processes occupying 9669 and 19669 (32017 and 11083), and after restarting it is still Exited.

First check the logs to see whether it's a port conflict. Are you starting the service from multiple accounts? Which account does nebula-gr belong to?
You could also write a simple script to test whether you can bind to 19669 and 9669.
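
For example, a minimal bind test with nc (one port at a time: if the port is free, nc stays listening until you Ctrl-C; if not, it fails immediately with "Address already in use"):

    nc -l 19669
    nc -l 9669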

Only the nebula-gr process is LISTENing on those ports, only one account (root) is operating, and nebula-gr runs as root.

Either kill that nebula-gr process, or change the graphd config to sidestep the conflict by using different ports.

If startup gets past the port-binding stage and graphd is still Exited, check the logs again to see what else is going on.
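
Changing the ports is two lines in nebula-graphd.conf plus a restart (9679/19679 below are just examples of presumably free ports):

    # /usr/local/nebula/etc/nebula-graphd.conf
    --port=9679
    --ws_http_port=19679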

I killed everything occupying 9669 (the server, graphd, and a java process), then changed graphd's ports to 9679 and 19679. After restarting, it is still Exited.

The graphd log is as follows:

The metad log is as follows:

The storaged log is as follows:

Could someone please help analyze this? Many thanks.

The logs are incomplete.

nebula-graphd.VM-0-14-centos.root.log.INFO.20230522-154445.16940 (1.4 KB)
nebula-metad.VM-0-14-centos.root.log.INFO.20230522-154445.16881 (5.0 KB)
nebula-storaged.VM-0-14-centos.root.log.INFO.20230522-154445.17004 (11.2 KB)

The detailed logs after the restart are attached.