Nebula-Graph terminated abnormally and fails to restart

  • nebula version: 3.6.0

  • Deployment: single machine

  • Installation method: RPM

  • Running in production: N

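  • Problem description
    After running normally for a while, Studio could no longer connect. Checking the processes and logs showed that the graphd process was gone.
    I tried to restart Nebula-Graph, but startup failed: the graphd log keeps reporting that it cannot reach the metad service (heartbeat failed, error code -3521).
    Yet the metad process exists and is running normally.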

  • Relevant meta / storage / graph INFO logs (plain text preferred so they are searchable)
    graph info log:

I20240204 10:23:53.071693 18604 MetaClient.cpp:80] Create meta client to "127.0.0.1":9559
I20240204 10:23:53.071712 18604 MetaClient.cpp:81] root path: /data/nebula-graph/nebula-graph-3.6.0.el7.x86_64, data path size: 0
I20240204 10:23:53.071743 18604 MetaClient.cpp:2664] Send heartbeat to "127.0.0.1":9559, clusterId 0
I20240204 10:23:53.082346 18629 ThriftClientManager-inl.h:53] There is no existing client to "172.16.16.3":9559, trying to create one
I20240204 10:23:53.082397 18629 ThriftClientManager-inl.h:74] Connecting to "127.0.0.1":9559 for 1 times
I20240204 10:23:53.082768 18629 MetaClient.cpp:730] Send request to meta "127.0.0.1":9559
E20240204 10:23:53.083587 18604 MetaClient.cpp:112] Heartbeat failed, status:Unknown error -3521!
I20240204 10:23:53.083851 18604 MetaClient.cpp:137] Waiting for the metad to be ready!
I20240204 10:24:03.083956 18604 MetaClient.cpp:2664] Send heartbeat to "127.0.0.1":9559, clusterId 0
I20240204 10:24:03.084396 18701 ThriftClientManager-inl.h:53] There is no existing client to "172.16.16.3":9559, trying to create one
I20240204 10:24:03.084424 18701 ThriftClientManager-inl.h:74] Connecting to "127.0.0.1":9559 for 1 times
I20240204 10:24:03.084637 18701 MetaClient.cpp:730] Send request to meta "127.0.0.1":9559
E20240204 10:24:03.085335 18604 MetaClient.cpp:112] Heartbeat failed, status:Unknown error -3521!
I20240204 10:24:03.085369 18604 MetaClient.cpp:137] Waiting for the metad to be ready!
I20240204 10:24:13.085503 18604 MetaClient.cpp:2664] Send heartbeat to "127.0.0.1":9559, clusterId 0
I20240204 10:24:13.085958 18774 ThriftClientManager-inl.h:53] There is no existing client to "172.16.16.3":9559, trying to create one
I20240204 10:24:13.085986 18774 ThriftClientManager-inl.h:74] Connecting to "127.0.0.1":9559 for 1 times
I20240204 10:24:13.086153 18774 MetaClient.cpp:730] Send request to meta "127.0.0.1":9559
E20240204 10:24:13.086817 18604 MetaClient.cpp:112] Heartbeat failed, status:Unknown error -3521!
I20240204 10:24:13.086853 18604 MetaClient.cpp:137] Waiting for the metad to be ready!
I20240204 10:24:23.087031 18604 MetaClient.cpp:2664] Send heartbeat to "127.0.0.1":9559, clusterId 0
I20240204 10:24:23.087589 18851 ThriftClientManager-inl.h:53] There is no existing client to "172.16.16.3":9559, trying to create one
I20240204 10:24:23.087635 18851 ThriftClientManager-inl.h:74] Connecting to "127.0.0.1":9559 for 1 times
I20240204 10:24:23.087868 18851 MetaClient.cpp:730] Send request to meta "127.0.0.1":9559
E20240204 10:24:23.088572 18604 MetaClient.cpp:112] Heartbeat failed, status:Unknown error -3521!
I20240204 10:24:23.088728 18629 ThriftClientManager-inl.h:47] Getting a client to "127.0.0.1":9559
I20240204 10:24:23.088752 18629 MetaClient.cpp:730] Send request to meta "127.0.0.1":9559
I20240204 10:24:23.089113 18701 ThriftClientManager-inl.h:47] Getting a client to "127.0.0.1":9559
I20240204 10:24:23.089139 18701 MetaClient.cpp:730] Send request to meta "127.0.0.1":9559
E20240204 10:24:23.090294 18604 MetaClient.cpp:157] Unknown error -3521!
E20240204 10:24:23.090332 18604 GraphService.cpp:49] Failed to wait for meta service ready synchronously.
E20240204 10:24:23.090358 18604 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
I20240204 10:24:23.090391 18604 MetaClient.cpp:89] ~MetaClient
I20240204 10:24:23.090402 18604 ThriftClientManager.h:27] ~ThriftClientManager
E20240204 10:24:23.091830 18604 GraphDaemon.cpp:156] The graph server start failed

metad info log:

I20240204 10:24:23.088223  4759 HBProcessor.cpp:33] Receive heartbeat from "127.0.0.1":29669, role = GRAPH
I20240204 10:24:23.088268  4759 HBProcessor.cpp:89] Update host "127.0.0.1":29669 dir info, root path: /data/nebula-graph/nebula-graph-3.6.0.el7.x86_64, data path size: 0
E20240204 10:24:23.089320  4759 SaveGraphVersionProcessor.cpp:25] Failed to save graph version, errorCode: E_RAFT_WAL_FAIL
I20240204 10:24:25.152861  4759 HBProcessor.cpp:33] Receive heartbeat from "127.0.0.1":9779, role = STORAGE
I20240204 10:24:25.152958  4759 HBProcessor.cpp:89] Update host "127.0.0.1":9779 dir info, root path: /data/nebula-graph/nebula-graph-3.6.0.el7.x86_64, data path size: 1
I20240204 10:24:35.153808  4759 HBProcessor.cpp:33] Receive heartbeat from "127.0.0.1":9779, role = STORAGE
I20240204 10:24:35.153908  4759 HBProcessor.cpp:89] Update host "127.0.0.1":9779 dir info, root path: /data/nebula-graph/nebula-graph-3.6.0.el7.x86_64, data path size: 1

Please post the configuration of all three services: meta, graph, and storage.

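The problem is now solved, and the graphd service starts successfully.
Cause:
Logs from other services grew explosively and used up most of the disk, leaving less than 500 MB of free space.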
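A low-disk condition like this is easy to check before restarting the services. Here is a minimal Python sketch using the standard library's shutil.disk_usage. The 500 MiB threshold is simply the figure reported in this thread, not a documented Nebula limit, and the default path is the install root that appears in the logs above; adjust both for your deployment.

```python
import shutil
import sys

# Threshold taken from this thread (trouble appeared with < ~500 MB free);
# it is an assumption for illustration, not a documented Nebula limit.
MIN_FREE_BYTES = 500 * 1024 * 1024


def free_bytes(path: str) -> int:
    """Bytes available to unprivileged users on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def has_enough_space(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `min_free` bytes free."""
    return free_bytes(path) >= min_free


if __name__ == "__main__":
    # Default to the install root seen in the logs above; pass another path to override.
    root = sys.argv[1] if len(sys.argv) > 1 else "/data/nebula-graph/nebula-graph-3.6.0.el7.x86_64"
    free = free_bytes(root)
    ok = free >= MIN_FREE_BYTES
    print(f"{'OK' if ok else 'LOW'}: {free / 1024**2:.0f} MiB free on {root}")
    sys.exit(0 if ok else 1)
```

The non-zero exit status on low space lets the check gate a restart script or a cron alert.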
Fix:
Cleaned up the logs and restarted the graphd service.
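The thread does not say which service's logs grew. Below is a minimal Python sketch for finding the biggest files under a log directory. It assumes the logs sit in the logs directory of the install root (check --log_dir in each service's config), and it only lists candidates without deleting anything. largest_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Tuple


def largest_files(root: str, pattern: str = "*", top: int = 10) -> List[Tuple[int, Path]]:
    """Return up to `top` (size_in_bytes, path) pairs under `root`, largest first.

    Only lists files; it never deletes anything.
    """
    found: List[Tuple[int, Path]] = []
    for p in Path(root).rglob(pattern):
        try:
            if p.is_file() and not p.is_symlink():
                found.append((p.stat().st_size, p))
        except OSError:
            continue  # file rotated away or unreadable; skip it
    found.sort(key=lambda item: item[0], reverse=True)
    return found[:top]


if __name__ == "__main__":
    import sys

    # Assumes logs live in <install root>/logs (the --log_dir setting); pass another dir to override.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/data/nebula-graph/nebula-graph-3.6.0.el7.x86_64/logs"
    for size, path in largest_files(log_dir):
        print(f"{size / 1024**2:10.1f} MiB  {path}")
```

On Linux, deleting a log file that a running process still holds open does not free its space until that process closes it. Truncating the file, or restarting the service as was done here, releases the space.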

Thanks for the reply. The configuration files were not changed much: they all use the defaults, with only each service's port changed.

OK. You can mark your own reply as the solution.
