Cluster fails to start after setup

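  • nebula version: (to save responders time verifying version details, first-time posters should include the version information as a screenshot)
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: Y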

First machine

# cat nebula-metad.ERROR 
Log file created at: 2022/11/22 15:51:49
Running on machine: ds-asset-nebula-prd-10-209-51-185.v-bj-4.vivo.lan
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20221122 15:51:49.367237 13648 MetaDaemon.cpp:182] Parser God Role error:E_LEADER_CHANGED
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20221122 15:51:51.150089 13793 MetaClient.cpp:758] Send request to "10.209.51.185":9559, exceed retry limit
E20221122 15:51:51.150288 13793 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20221122 15:51:51.150377 13682 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20221122 15:52:04.157528 13798 MetaClient.cpp:758] Send request to "10.209.51.185":9559, exceed retry limit
E20221122 15:52:04.157580 13798 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20221122 15:52:04.157660 13682 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20221122 15:52:17.164840 13682 MetaClient.cpp:112] Heartbeat failed, status:LeaderChanged: Leader changed!
E20221122 15:52:30.172219 13682 MetaClient.cpp:112] Heartbeat failed, status:LeaderChanged: Leader changed!
E20221122 15:52:33.176939 13848 MetaClient.cpp:758] Send request to "10.209.51.185":9559, exceed retry limit
E20221122 15:52:33.177002 13848 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20221122 15:52:33.177103 13682 MetaClient.cpp:150] RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20221122 15:52:33.177170 13682 GraphService.cpp:48] Failed to wait for meta service ready synchronously.
E20221122 15:52:33.177196 13682 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20221122 15:52:33.178915 13682 GraphDaemon.cpp:156] The graph server start failed
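
To see which errors dominate a log like the one above, the lines can be grouped by source location. The following is only a sketch in Python, assuming the prefix described by the "Log line format" header; `parse_glog` and `count_by_source` are hypothetical helper names, not part of NebulaGraph, and `SAMPLE` is copied from the log above.

```python
import re
from collections import Counter

# Matches the prefix described by the "Log line format" header:
# [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{8}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog(text):
    """Return one dict per line that matches the log line format."""
    records = []
    for raw in text.splitlines():
        match = GLOG_LINE.match(raw.strip())
        if match:
            records.append(match.groupdict())
    return records


def count_by_source(records):
    """Count records per (file, line) so repeated errors stand out."""
    return Counter((r["file"], int(r["line"])) for r in records)


# Lines copied from the first machine's log in this thread.
SAMPLE = """\
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20221122 15:52:17.164840 13682 MetaClient.cpp:112] Heartbeat failed, status:LeaderChanged: Leader changed!
E20221122 15:52:30.172219 13682 MetaClient.cpp:112] Heartbeat failed, status:LeaderChanged: Leader changed!
E20221122 15:52:33.178915 13682 GraphDaemon.cpp:156] The graph server start failed
"""

if __name__ == "__main__":
    for (src, line), n in count_by_source(parse_glog(SAMPLE)).most_common():
        print(f"{n:3d}  {src}:{line}")
```

Header lines such as "Log file created at" do not match the pattern and are skipped, so a whole `.ERROR` file can be passed in unchanged.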

Machine configuration:

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=10.209.51.185:9559,10.209.45.93:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.209.51.185
# Network device to listen on
--listen_netdev=any

Second machine

# cat nebula-graphd.ERROR 
Log file created at: 2022/11/22 15:51:56
Running on machine: ds-asset-nebula-prd-10-209-45-93.v-bj-4.vivo.lan
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20221122 15:51:56.596038 29532 MetaClient.cpp:112] Heartbeat failed, status:LeaderChanged: Leader changed!
E20221122 15:52:09.603366 29587 MetaClient.cpp:758] Send request to "10.209.51.185":9559, exceed retry limit
E20221122 15:52:09.603442 29587 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:09.603507 29532 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:22.609419 29589 MetaClient.cpp:758] Send request to "10.209.51.185":9559, exceed retry limit
E20221122 15:52:22.609483 29589 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:22.609553 29532 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:35.615234 29532 MetaClient.cpp:112] Heartbeat failed, status:LeaderChanged: Leader changed!
E20221122 15:52:39.623009 29851 MetaClient.cpp:758] Send request to "10.209.51.185":9559, exceed retry limit
E20221122 15:52:39.623107 29851 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:39.623168 29532 MetaClient.cpp:157] RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:39.623190 29532 GraphService.cpp:48] Failed to wait for meta service ready synchronously.
E20221122 15:52:39.623214 29532 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20221122 15:52:39.625084 29532 GraphDaemon.cpp:156] The graph server start failed

Configuration (excerpt only):

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=10.209.51.185:9559,10.209.45.93:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.209.45.93
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669

Third machine

# cat nebula-graphd.ERROR 
Log file created at: 2022/11/22 15:52:01
Running on machine: ds-asset-nebula-prd-10-209-52-161.v-bj-4.vivo.lan
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20221122 15:52:01.183878 23434 MetaClient.cpp:758] Send request to "10.209.51.185":9559, exceed retry limit
E20221122 15:52:01.184098 23434 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:01.184183 23383 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:14.191946 23383 MetaClient.cpp:112] Heartbeat failed, status:LeaderChanged: Leader changed!
E20221122 15:52:27.198717 23440 MetaClient.cpp:758] Send request to "10.209.51.185":9559, exceed retry limit
E20221122 15:52:27.198776 23440 MetaClient.cpp:759] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:27.198848 23383 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20221122 15:52:40.206787 23383 MetaClient.cpp:112] Heartbeat failed, status:LeaderChanged: Leader changed!
E20221122 15:52:43.215355 23383 MetaClient.cpp:157] LeaderChanged: Leader changed!
E20221122 15:52:43.215420 23383 GraphService.cpp:48] Failed to wait for meta service ready synchronously.
E20221122 15:52:43.215467 23383 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20221122 15:52:43.217020 23383 GraphDaemon.cpp:156] The graph server start failed

Configuration (excerpt only):

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=10.209.51.185:9559,10.209.45.93:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.209.52.161
# Network device to listen on
--listen_netdev=any

With three machines, you need to configure three meta addresses as well.

Here, add this machine's own address too.
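
The check suggested in the replies above can be sketched in Python. This is an illustration only: `parse_flags` and `check_meta_addrs` are hypothetical helper names, not part of NebulaGraph, and the sketch assumes every machine runs nebula-metad on port 9559, so each machine's `--local_ip` should appear in `--meta_server_addrs`. The sample text is taken from the config excerpts posted above.

```python
def parse_flags(conf_text):
    """Parse '--key=value' lines, skipping comments and blank lines."""
    flags = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line.startswith("--") or "=" not in line:
            continue
        key, value = line[2:].split("=", 1)
        flags[key.strip()] = value.strip()
    return flags


def check_meta_addrs(confs, meta_port="9559"):
    """confs maps a node label to its config text; returns a list of problems."""
    problems = []
    parsed = {node: parse_flags(text) for node, text in confs.items()}
    addr_sets = {}
    for node, flags in parsed.items():
        addrs = flags.get("meta_server_addrs", "").split(",")
        addr_sets[node] = frozenset(a.strip() for a in addrs if a.strip())
    # Every node should list the same set of meta servers.
    if len(set(addr_sets.values())) > 1:
        problems.append("meta_server_addrs differs between nodes")
    # Assumption: each node also runs metad, so its own address must be listed.
    for node, flags in parsed.items():
        own = f"{flags.get('local_ip', '')}:{meta_port}"
        if own not in addr_sets[node]:
            problems.append(f"{node}: {own} missing from meta_server_addrs")
    return problems


META = "--meta_server_addrs=10.209.51.185:9559,10.209.45.93:9559\n"
CONFS = {
    "node1": META + "--local_ip=10.209.51.185\n--listen_netdev=any\n",
    "node2": META + "--local_ip=10.209.45.93\n--listen_netdev=any\n--port=9669\n",
    "node3": META + "# Network device to listen on\n--local_ip=10.209.52.161\n",
}

if __name__ == "__main__":
    for problem in check_meta_addrs(CONFS):
        print(problem)
```

If some machines run only graphd or storaged, the second loop does not apply to them and should be dropped for those nodes.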

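Originally all three were configured; I changed it to this later, and the error messages are the same either way. I just learned that the cause may be that I started a standalone instance and changed the user password before setting up the cluster, which ended up altering the metadata state. I'll rebuild from scratch and see.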

Try that first.

Running on machine: ds-asset-nebula-prd-10-209-51-185.v-bj-4.vivo.lan
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20221122 16:30:34.811050 28579 FileUtils.cpp:377] Failed to read the directory "/usr/local/nebula/data/meta/nebula" (2): No such file or directory

This directory does exist on my machine.

./nebula-metad.service status

[INFO] nebula-metad(bb2e684): Running as 2887, Listening on 9559
It probably hasn't actually come up.
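
One way to tell whether metad is really up is to try a TCP connection to its port, since the logs above report "Connection refused" for 10.209.51.185:9559. Below is a minimal Python sketch; `port_is_open` is a hypothetical helper, not a NebulaGraph tool, and a successful connect only shows that something is listening, not that the meta service is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example use with the addresses from --meta_server_addrs in this thread:
#   for host in ("10.209.51.185", "10.209.45.93"):
#       print(host, port_is_open(host, 9559))
```

Running it from each of the three machines shows whether the refusal is local to one host or seen from everywhere.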

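Yes. When you switched from standalone to a cluster, the meta service's metadata changed. Either replace the information in the metadata or just rebuild from scratch, which is the simpler option.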
