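Question template:
- nebula version: 3.4.0
- Deployment: cloud / distributed / standalone
- Installation method: RPM
- In production: N
- Hardware information
- Disk (SSD recommended)
- CPU and memory information
- Detailed description of the problem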
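A single-machine deployment starts normally.
The three-machine cluster does not start properly: the metad and storaged services fail to come up at all, and the graphd service starts but crashes shortly afterwards.
The three machines can ping one another, so basic network connectivity is ruled out as the cause.
- Relevant meta / storage / graph info logs (preferably as text so they are searchable)
meta log (it keeps waiting for leader election):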
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
I20230327 11:25:07.590477 11684 MetaDaemon.cpp:137] localhost = "192.168.122.7":9559
I20230327 11:25:07.626291 11684 NebulaStore.cpp:52] Start the raft service...
I20230327 11:25:07.626813 11684 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 10485760 for each part by default
I20230327 11:25:07.652513 11684 RaftexService.cpp:46] Start raft service on 9560
I20230327 11:25:07.652614 11684 NebulaStore.cpp:86] Scan the local path, and init the spaces_
I20230327 11:25:07.652652 11684 NebulaStore.cpp:93] Scan path "/home/nebula/data/meta/nebula/0"
I20230327 11:25:07.652665 11684 NebulaStore.cpp:272] Init data from partManager for "192.168.122.7":9559
I20230327 11:25:07.652683 11684 NebulaStore.cpp:388] Create data space 0
I20230327 11:25:07.668061 11684 RocksEngine.cpp:107] open rocksdb on /home/nebula/data/meta/nebula/0/data
I20230327 11:25:07.686794 11684 NebulaStore.cpp:451] Space 0, part 0 has been added, asLearner 0
I20230327 11:25:07.686836 11684 NebulaStore.cpp:79] Register handler...
I20230327 11:25:07.686846 11684 MetaDaemonInit.cpp:106] Waiting for the leader elected...
I20230327 11:25:07.686851 11684 MetaDaemonInit.cpp:118] Leader has not been elected, sleep 1s
I20230327 11:25:08.686931 11684 MetaDaemonInit.cpp:118] Leader has not been elected, sleep 1s
I20230327 11:25:09.687039 11684 MetaDaemonInit.cpp:118] Leader has not been elected, sleep 1s
I20230327 11:25:10.687166 11684 MetaDaemonInit.cpp:118] Leader has not been elected, sleep 1s
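For sifting through longer logs, the `Log line format` header above can be turned into a small parser. This is only a sketch based on that header line; `GLOG_RE` and `parse_line` are illustrative names, not part of NebulaGraph or glog.

```python
import re
from datetime import datetime

# Pattern derived from the header:
#   [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{8}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_line(text):
    """Parse one log line into a dict, or return None if it does not match."""
    m = GLOG_RE.match(text)
    if m is None:
        return None
    rec = m.groupdict()
    # Combine the date and time fields into a single datetime value.
    rec["timestamp"] = datetime.strptime(
        rec.pop("date") + " " + rec.pop("time"), "%Y%m%d %H:%M:%S.%f"
    )
    rec["thread"] = int(rec["thread"])
    rec["line"] = int(rec["line"])
    return rec


if __name__ == "__main__":
    sample = (
        "I20230327 11:25:07.686851 11684 MetaDaemonInit.cpp:118] "
        "Leader has not been elected, sleep 1s"
    )
    print(parse_line(sample))
```

Filtering the parsed records on `level == "E"` or on a given `file` makes it easier to compare the logs of the three machines side by side.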
graphd error log (it apparently cannot reach the metad service):
E20230327 11:25:49.703553 11941 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20230327 11:25:49.703676 11712 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20230327 11:25:52.709307 11943 MetaClient.cpp:772] Send request to "192.168.122.9":9559, exceed retry limit
E20230327 11:25:52.709403 11943 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20230327 11:25:52.709540 11712 MetaClient.cpp:150] RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20230327 11:25:52.709599 11712 GraphService.cpp:49] Failed to wait for meta service ready synchronously.
E20230327 11:25:52.709617 11712 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20230327 11:25:52.712695 11712 GraphDaemon.cpp:156] The graph server start failed
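The storaged log looks much the same: it cannot reach the metad service either.
Help needed: in the cluster deployment, why do the three metad services keep waiting for leader election, so that the services never start properly?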
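One caveat about the ping test: ping only exercises ICMP, while `errno = 111 (Connection refused)` in the graphd log is a TCP-level rejection, which usually means nothing is listening on the port or a firewall is rejecting the connection. A TCP check of the ports seen in the logs (9559 for metad, 9560 for raft) may therefore be more telling. Below is a minimal sketch, assuming Python 3 is available on the machines. `META_HOSTS`, `PORTS` and `port_open` are illustrative names, and only the two addresses that appear in the logs are listed, so the third machine's address has to be added by hand.

```python
import socket

# Addresses taken from the logs above. The third metad host does not appear
# in the post, so it must be added here before running.
META_HOSTS = ["192.168.122.7", "192.168.122.9"]
# 9559 is the metad port and 9560 the raft port, both as shown in the meta log.
PORTS = [9559, 9560]


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for host in META_HOSTS:
        for port in PORTS:
            state = "open" if port_open(host, port) else "closed or filtered"
            print(f"{host}:{port} {state}")
```

Running this from each of the three machines against the other two would show whether the raft port is reachable between peers. If it is not, that could explain why no leader gets elected, although this is only one possible cause.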




