Distributed communication problem in nebula-graph?

Question template:

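  • nebula version: 3.1.0
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: Y / N
  • Hardware
    • Disk: SSD
    • CPU: min 500, max 1000; memory: min 512m, max 2048m
  • Problem description: distributed deployment; the graph component fails to start.
    On all three machines graph fails to start, and the error logs consist almost entirely of the same entry.
    The ports are all open.
    The meta service settings in the config file are identical on all three machines: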
# Comma separated Meta Server addresses
--meta_server_addrs=nebula-meta-aliyun.prod:9559,nebula-storage-aliyun.prod:9559,nebula-graph-aliyun.prod:9559
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=nebula-meta-aliyun.prod
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19559
# Port to listen on Storage with HTTP protocol, it corresponds to ws_http_port in storage's configuration file
--ws_storage_http_port=19779

nebula-graph.ERROR is as follows:

Log file created at: 2022/06/29 09:45:11
Running on machine: nebula-meta-aliyun-58745fcf9c-5bvwp
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20220629 09:45:11.121978  1269 MetaClient.cpp:744] Send request to "nebula-meta-aliyun.prod":9559, exceed retry limit
E20220629 09:45:11.137827  1269 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220629 09:45:11.137899  1140 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220629 09:45:24.151472  1323 MetaClient.cpp:744] Send request to "nebula-meta-aliyun.prod":9559, exceed retry limit
E20220629 09:45:24.153199  1323 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220629 09:45:24.153263  1140 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220629 09:45:37.165405  1325 MetaClient.cpp:744] Send request to "nebula-storage-aliyun.prod":9559, exceed retry limit
E20220629 09:45:37.167106  1325 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:45:37.167165  1140 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:45:50.178458  1327 MetaClient.cpp:744] Send request to "nebula-meta-aliyun.prod":9559, exceed retry limit
E20220629 09:45:50.180215  1327 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220629 09:45:50.180290  1140 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220629 09:45:53.190163  1454 MetaClient.cpp:744] Send request to "nebula-meta-aliyun.prod":9559, exceed retry limit
E20220629 09:45:53.190193  1454 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220629 09:45:53.190248  1140 MetaClient.cpp:136] RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220629 09:45:53.190279  1140 GraphService.cpp:48] Failed to wait for meta service ready synchronously.
E20220629 09:45:53.190304  1140 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20220629 09:45:53.198717  1140 GraphDaemon.cpp:164] The graph server start failed

nebula-storage.ERROR is as follows:

Log file created at: 2022/06/29 09:45:11
Running on machine: nebula-meta-aliyun-58745fcf9c-5bvwp
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20220629 09:45:11.318322  1317 MetaClient.cpp:744] Send request to "nebula-storage-aliyun.prod":9559, exceed retry limit
E20220629 09:45:11.335291  1317 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:45:11.335350  1249 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:45:24.364291  1324 MetaClient.cpp:744] Send request to "nebula-meta-aliyun.prod":9559, exceed retry limit
E20220629 09:45:24.365664  1324 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220629 09:45:24.365715  1249 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220629 09:45:37.378767  1326 MetaClient.cpp:744] Send request to "nebula-storage-aliyun.prod":9559, exceed retry limit
E20220629 09:45:37.380332  1326 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:45:37.380386  1249 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:45:50.397750  1328 MetaClient.cpp:744] Send request to "nebula-storage-aliyun.prod":9559, exceed retry limit
E20220629 09:45:50.399629  1328 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:45:50.399679  1249 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:46:03.411484  1455 MetaClient.cpp:744] Send request to "nebula-meta-aliyun.prod":9559, exceed retry limit
E20220629 09:46:03.412739  1455 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220629 09:46:03.412789  1249 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220629 09:46:16.426105  1457 MetaClient.cpp:744] Send request to "nebula-storage-aliyun.prod":9559, exceed retry limit
E20220629 09:46:16.427472  1457 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:46:16.427525  1249 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220629 09:46:29.436530  1459 MetaClient.cpp:744] Send request to "nebula-meta-aliyun.prod":9559, exceed retry limit
E20220629 09:46:29.437741  1459 MetaClient.cpp:745] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220629 09:46:29.437800  1249 MetaClient.cpp:98] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220629 09:46:42.449364  1461 MetaClient.cpp:744] Send request to "nebula-storage-aliyun.prod":9559, exceed retry limit

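I then changed the cluster from 3x3x3 to a single meta service (1x3x3), and the problem went away. Does a cluster have to be started strictly in the order the addresses are written in --meta_server_addrs?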

The startup order is not the cause.
The logs show that neither graph nor storage can connect to meta. Could you post the meta logs?
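
While collecting the meta logs, it may also help to confirm from the graph and storage machines whether anything is actually accepting connections on each address listed in --meta_server_addrs. errno = 111 (Connection refused) usually means the host was reachable but no process was listening on that port, or the connection was actively rejected. Below is a minimal Python sketch of such a check. The helper names parse_meta_addrs, probe, and report are made up for illustration and are not part of NebulaGraph, and the 3-second timeout is an arbitrary choice.

```python
import socket


def parse_meta_addrs(flag_value):
    """Split a --meta_server_addrs value into (host, port) pairs."""
    pairs = []
    for item in flag_value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas or stray spaces
        host, _, port = item.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition as errno 111 in the logs: nothing accepted the connection.
        return "refused"
    except socket.gaierror:
        # The host name could not be resolved from this machine.
        return "dns-failure"
    except OSError as exc:
        # Timeouts, unreachable networks, and other socket errors.
        return f"error: {exc}"


def report(flag_value):
    """Return one 'host:port -> status' line per configured meta address."""
    return [
        f"{host}:{port} -> {probe(host, port)}"
        for host, port in parse_meta_addrs(flag_value)
    ]
```

To use it, run print("\n".join(report("nebula-meta-aliyun.prod:9559,nebula-storage-aliyun.prod:9559,nebula-graph-aliyun.prod:9559"))) on each machine. A "refused" result for an address suggests that no metad process is listening there, which would be worth comparing against the meta logs.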

In the end I switched to a different deployment method, and it worked.