Nebula Graphd 3.5 fails to start.

Question template:

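  • NebulaGraph version: 3.5.0
  • Deployment: single machine
  • Installation method: RPM
  • In production: N
  • Hardware
    • Disk (SSD)
    • Linux CentOS 7 x86
  • Problem description
    Today, while reading data from another machine and writing it into NebulaGraph, a large number of sessions were created. I then used the KILL SESSION statement to kill all of them. After that, I could no longer connect through NebulaGraph Studio. Logging in to the machine itself, I found that I cannot connect through the console either.
    How can I recover?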
[root@node3 logs]# cat nebula-graphd.ERROR
Log file created at: 2023/07/19 19:20:27
Running on machine: node3
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20230719 19:20:27.266639 65091 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:20:37.268100 65091 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:20:47.269174 65091 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:20:57.276099 65091 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:20:57.278786 65091 MetaClient.cpp:157] Unknown error!
E20230719 19:20:57.278834 65091 GraphService.cpp:49] Failed to wait for meta service ready synchronously.
E20230719 19:20:57.278853 65091 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20230719 19:20:57.280582 65091 GraphDaemon.cpp:156] The graph server start failed


[root@node3 logs]# cat nebula-metad.ERROR
Log file created at: 2023/07/19 19:20:57
Running on machine: node3
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20230719 19:20:57.278213 65275 SaveGraphVersionProcessor.cpp:25] Failed to save graph version, errorCode: E_RAFT_WAL_FAIL


[root@node3 logs]# cat nebula-storaged.ERROR
Log file created at: 2023/07/19 19:20:27
Running on machine: node3
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20230719 19:20:27.282198 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:20:37.283501 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:20:47.284197 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:20:57.285468 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:21:07.286962 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:21:17.287758 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:21:27.288424 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:21:37.290001 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
E20230719 19:21:47.290750 65154 MetaClient.cpp:112] Heartbeat failed, status:Unknown error!
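All three ERROR logs share the glog line format declared in their headers ([IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg). When triaging many such lines, a small parser helps. The sketch below is a hypothetical helper written against that header format and the sample lines above, not a NebulaGraph tool; it also measures the gap between consecutive heartbeat failures.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical parser for the format stated in the log headers:
# [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


@dataclass
class LogEntry:
    severity: str        # I, W, E or F
    timestamp: datetime
    thread_id: int
    source: str          # "file:line" of the logging call
    msg: str


def parse_glog_line(line):
    """Parse one log line; return None for header lines such as 'Log file created at: ...'."""
    m = GLOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["date"] + " " + m["time"], "%Y%m%d %H:%M:%S.%f")
    return LogEntry(m["sev"], ts, int(m["tid"]), f'{m["file"]}:{m["line"]}', m["msg"])


def heartbeat_gaps(lines):
    """Seconds between consecutive 'Heartbeat failed' entries."""
    entries = [e for e in map(parse_glog_line, lines) if e and "Heartbeat failed" in e.msg]
    return [(b.timestamp - a.timestamp).total_seconds() for a, b in zip(entries, entries[1:])]
```

Feeding it the first two storaged lines above gives a single gap of about 10.0013 s. For comparison, graphd logged four heartbeat failures before reporting that the meta service was not ready and exiting.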


[root@node3 scripts]# ./nebula.service status all
[WARN] The maximum files allowed to open might be too few: 1024
[INFO] nebula-metad(18fd7b1): Running as 65020, Listening on 9559
[INFO] nebula-graphd(18fd7b1): Exited
[WARN] nebula-storaged after v3.0.0 will not start service until it is added to cluster.
[WARN] See Manage Storage hosts:ADD HOSTS in https://docs.nebula-graph.io/
[INFO] nebula-storaged(18fd7b1): Running as 65154, Listening on 9779
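The status output already narrows things down: metad and storaged are running, graphd has exited. When scripting this check, a minimal sketch that parses the line shapes shown above could look like this. The pattern is inferred from this single output sample, not from a documented specification, so other versions of the script may print differently.

```python
import re

# Hypothetical pattern matching the "[INFO] nebula-<svc>(<commit>): ..." lines
# printed by `nebula.service status all` in the sample above.
STATUS_LINE = re.compile(
    r"^\[INFO\] (?P<svc>nebula-\w+)\((?P<commit>\w+)\): "
    r"(?:Running as (?P<pid>\d+), Listening on (?P<port>\d+)|(?P<exited>Exited))"
)


def parse_status(output):
    """Return {service: (pid, port)} for running services and {service: None} for exited ones."""
    result = {}
    for line in output.splitlines():
        m = STATUS_LINE.match(line.strip())
        if not m:
            continue  # [WARN] lines and blank lines are ignored
        if m["exited"]:
            result[m["svc"]] = None
        else:
            result[m["svc"]] = (int(m["pid"]), int(m["port"]))
    return result
```

For the output above, `[s for s, v in parse_status(out).items() if v is None]` yields `["nebula-graphd"]`.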



[root@node3 nebula]# ./nebula-console -addr 192.168.130.7 -port 9669 -u root -p XXXXXXXX
2023/07/19 19:21:06 Fail to initialize the connection pool, host: 192.168.130.7, port: 9669, failed to open connection, error: failed to open transport, error: dial tcp 192.168.130.7:9669: connect: connection refused
panic: Fail to initialize the connection pool, host: 192.168.130.7, port: 9669, failed to open connection, error: failed to open transport, error: dial tcp 192.168.130.7:9669: connect: connection refused

goroutine 1 [running]:
log.Panicf(0xc0001c2000, 0xc5, 0x0, 0x0, 0x0)
        /opt/hostedtoolcache/go/1.16.4/x64/src/log/log.go:361 +0xc5
main.main()
        /home/runner/work/nebula-console/nebula-console/main.go:532 +0xb10
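"connection refused" on 9669 means nothing is listening on the graphd port. That is consistent with graphd having exited rather than with a console-side problem. A plain TCP connect from any machine can check all three service ports at once. This sketch uses only the Python standard library; the port numbers are the ones shown in the status output and console command above, and the helper names are hypothetical.

```python
import socket

# Ports taken from the output above: graphd 9669, metad 9559, storaged 9779.
NEBULA_PORTS = {"graphd": 9669, "metad": 9559, "storaged": 9779}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (what the console hit) and timeouts are both OSError subclasses.
        return False


def probe(host):
    """Map each service name to whether its port accepts TCP connections."""
    return {name: port_open(host, port) for name, port in NEBULA_PORTS.items()}
```

Running `probe("192.168.130.7")` in the situation above should report metad and storaged as reachable and graphd as not, regardless of which machine the console runs on.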

Is graphd down? Try starting it.

I already ran
nebula.service stop all
nebula.service start all
to restart everything, and it still doesn't work.

It looks like metad is not coming up properly. Has metad been restarted?

Running stop all and then start all
should have restarted everything, right?

Could you check the meta logs again?

Could you try running the console from a different machine?

Solved. The disk was full.
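A full disk is consistent with the logs above: metad failed with E_RAFT_WAL_FAIL while saving the graph version, graphd and storaged could not complete heartbeats to metad, and graphd gave up waiting for the meta service and exited. Restarting the services would not help until space was freed. A minimal pre-start check might look like the sketch below. The directory is an assumption: RPM installs typically live under /usr/local/nebula, but the real location is whatever the data directories in your service config files point to (for example their `--data_path` entries), and the 10% threshold is an arbitrary illustrative value.

```python
import shutil

# Assumption: RPM default install prefix; replace with the data directories your
# metad/storaged config files actually use.
DATA_DIR = "/usr/local/nebula/data"


def disk_report(path, min_free_ratio=0.1):
    """Summarize free space on the filesystem holding `path`; flag it if below min_free_ratio."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_ratio = usage.free / usage.total
    return {
        "path": path,
        "total_gb": usage.total / 1024**3,
        "free_gb": usage.free / 1024**3,
        "free_ratio": free_ratio,
        "low": free_ratio < min_free_ratio,
    }
```

Calling `disk_report(DATA_DIR)` before `nebula.service start all` would have surfaced the problem here; `df -h` on the same path gives the same information interactively.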

