Nebula 3.0 fails to start

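  • Nebula version:
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: N
  • Hardware information
    • Disk (SSD recommended)
    • CPU and memory information
  • Detailed description of the problem
  • Relevant meta / storage / graph INFO logs (plain text preferred, for searchability)
    graph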
Log file created at: 2022/03/01 09:23:17
Running on machine: hg46
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
I20220301 09:23:17.504344  6521 GraphDaemon.cpp:129] Starting Graph HTTP Service
I20220301 09:23:17.518321  6529 WebService.cpp:128] Web service started on HTTP[19669], HTTP2[19670]
I20220301 09:23:17.518417  6521 GraphDaemon.cpp:143] Number of networking IO threads: 48
I20220301 09:23:17.518443  6521 GraphDaemon.cpp:152] Number of worker threads: 48
I20220301 09:23:17.593760  6521 MetaClient.cpp:84] Create meta client to "hg46":9559
I20220301 09:23:17.593838  6521 MetaClient.cpp:85] root path: /usr/local/nebula, data path size: 0
I20220301 09:23:17.607337  6653 ThriftClientManager-inl.h:66] resolve "hg45":9559 as "192.168.100.45":9559
I20220301 09:23:18.610850  6653 ThriftClientManager-inl.h:66] resolve "hg46":9559 as "192.168.100.46":9559
I20220301 09:23:19.614756  6653 ThriftClientManager-inl.h:66] resolve "hg45":9559 as "192.168.100.45":9559
I20220301 09:23:20.618479  6653 ThriftClientManager-inl.h:66] resolve "hg46":9559 as "192.168.100.46":9559
E20220301 09:23:20.618669  6653 MetaClient.cpp:735] Send request to "hg46":9559, exceed retry limit
E20220301 09:23:20.619000  6653 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220301 09:23:20.619105  6521 MetaClient.cpp:121] RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220301 09:23:20.619166  6521 GraphService.cpp:47] Failed to wait for meta service ready synchronously.
E20220301 09:23:20.619204  6521 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20220301 09:23:20.625504  6521 GraphDaemon.cpp:163] The graph server start failed
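
The `errno = 111 (Connection refused)` in the graph log above means nothing was accepting connections on the meta port at that moment. A minimal Python sketch for checking this from the graphd machine; `can_connect` is a hypothetical helper (not part of Nebula), and the host names and port 9559 are the ones that appear in the log:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111), timeouts, and DNS failures.
        return False
```

For example, `can_connect("hg46", 9559)` returning False while nebula-metad is supposedly running would suggest that metad is not listening yet, or that the name resolves to an address it is not bound to.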

meta

I20220301 09:35:27.701663  6500 RaftPart.cpp:1184] [Port: 9560, Space: 0, Part: 0] Sending out an election request (space = 0, part = 0, term = 1, lastLogId = 0, lastLogTerm = 0, candidateIP = hg46, candidatePort = 9560), isPreVote = 1
I20220301 09:35:27.701800  6538 ThriftClientManager-inl.h:66] resolve "hg45":9560 as "192.168.100.45":9560
I20220301 09:35:27.701987  6497 RaftPart.cpp:1150] [Port: 9560, Space: 0, Part: 0] Did not get enough votes from election of term 1, isPreVote = 1
I20220301 09:35:28.639664  6454 MetaDaemonInit.cpp:116] Leader has not been elected, sleep 1s
I20220301 09:35:29.562912  6501 RaftPart.cpp:1184] [Port: 9560, Space: 0, Part: 0] Sending out an election request (space = 0, part = 0, term = 1, lastLogId = 0, lastLogTerm = 0, candidateIP = hg46, candidatePort = 9560), isPreVote = 1
I20220301 09:35:29.563045  6540 ThriftClientManager-inl.h:66] resolve "hg45":9560 as "192.168.100.45":9560
I20220301 09:35:29.563314  6497 RaftPart.cpp:1150] [Port: 9560, Space: 0, Part: 0] Did not get enough votes from election of term 1, isPreVote = 1
I20220301 09:35:29.639751  6454 MetaDaemonInit.cpp:116] Leader has not been elected, sleep 1s

Storage

Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
I20220301 09:23:17.573156  6607 StorageDaemon.cpp:128] localhost = "hg46":9779
I20220301 09:23:17.573493  6607 StorageDaemon.cpp:143] data path= /usr/local/nebula/data/storage
I20220301 09:23:17.625048  6607 MetaClient.cpp:84] Create meta client to "hg46":9559
I20220301 09:23:17.625113  6607 MetaClient.cpp:85] root path: /usr/local/nebula, data path size: 1
I20220301 09:23:17.639215  6672 ThriftClientManager-inl.h:66] resolve "hg46":9559 as "192.168.100.46":9559
I20220301 09:23:18.647588  6672 ThriftClientManager-inl.h:66] resolve "hg45":9559 as "192.168.100.45":9559
I20220301 09:23:19.651396  6672 ThriftClientManager-inl.h:66] resolve "hg45":9559 as "192.168.100.45":9559
I20220301 09:23:20.654996  6672 ThriftClientManager-inl.h:66] resolve "hg46":9559 as "192.168.100.46":9559
E20220301 09:23:20.655189  6672 MetaClient.cpp:735] Send request to "hg46":9559, exceed retry limit
E20220301 09:23:20.655490  6672 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20220301 09:23:20.655577  6607 MetaClient.cpp:121] RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20220301 09:23:20.655628  6607 StorageServer.cpp:188] waitForMetadReady error!
E20220301 09:23:20.655647  6607 StorageDaemon.cpp:176] Storage server start failed

graph configuration

meta configuration

Storage configuration
[screenshot]

At first I thought a port was already in use, but after changing the ports I get the same error. The services run as root.

Run SHOW HOSTS to check the service status.

[screenshot]

Is this version 3.0? Did you run ADD HOSTS to register the storage services? See the documentation: Manage Storage hosts - Nebula Graph Database Manual.

Storage never came up. Even after adding it, it stays OFFLINE.

After adding it, you have to wait a little while before it shows as ONLINE.

[screenshot]

This is the status.

:thinking: The storage service may not have been added successfully. Confirm that the IP address is the machine's real IP, then run ADD HOSTS again.

Confirmed, it is the real IP.

I20220301 10:44:17.329663 10812 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220301 10:44:27.329748 10812 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220301 10:44:27.330127 10812 MetaClient.cpp:102] Heartbeat failed, status:Machine not existed!
I20220301 10:44:27.330168 10812 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220301 10:44:37.330255 10812 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
I20220301 10:44:37.330435 10991 ThriftClientManager-inl.h:66] resolve "hg45":9559 as "192.168.100.45":9559
E20220301 10:44:37.331198 10812 MetaClient.cpp:102] Heartbeat failed, status:Machine not existed!
I20220301 10:44:37.331283 10812 MetaClient.cpp:133] Waiting for the metad to be ready!

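The above is what I just saw in nebula-storaged.INFO.

From the configuration, are you running two nodes?

Yes, because this is only to test how version 3.0.0 behaves and performs.

Does this really depend on the number of nodes?

Can hostnames be used in the configuration files?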

W20220301 11:21:51.561070 11531 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
I20220301 11:21:51.561372 11940 ThriftClientManager-inl.h:66] resolve "hg45":9559 as "192.168.100.45":9559
E20220301 11:21:51.562199 11531 MetaClient.cpp:102] Heartbeat failed, status:Machine not existed!
I20220301 11:21:51.562252 11531 MetaClient.cpp:133] Waiting for the metad to be ready!

:thinking: As far as I remember, yes. 3.0 supports domain names. I'll ask one of the developers to take a look.

This is the nebula-storaged.ERROR log:
E20220301 11:24:01.571911 11531 MetaClient.cpp:102] Heartbeat failed, status:Machine not existed!
E20220301 11:24:11.572906 11531 MetaClient.cpp:102] Heartbeat failed, status:Machine not existed!

This is the nebula-metad.ERROR log:

E20220301 11:25:13.662451 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
E20220301 11:25:14.805325 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
E20220301 11:25:21.576862 11690 HBProcessor.cpp:38] Machine "hg46":9779 is not registed
E20220301 11:25:23.675122 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
E20220301 11:25:24.818002 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
E20220301 11:25:31.577327 11690 HBProcessor.cpp:38] Machine "hg46":9779 is not registed
E20220301 11:25:33.687822 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
E20220301 11:25:34.830827 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
E20220301 11:25:41.577834 11690 HBProcessor.cpp:38] Machine "hg46":9779 is not registed

Please post the configuration and logs of both meta services. BTW, generally you run either 1 or 3 meta services.
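
The 1-or-3 advice follows from Raft's majority rule: a candidate needs votes from more than half of the peers listed in `--meta_server_addrs`. A small sketch of that arithmetic (plain Python for illustration, not Nebula code):

```python
def majority(n_peers: int) -> int:
    """Votes needed to win a Raft election among n_peers members."""
    return n_peers // 2 + 1


def tolerated_failures(n_peers: int) -> int:
    """How many members can be down while a leader can still be elected."""
    return n_peers - majority(n_peers)


for n in (1, 2, 3):
    print(f"{n} metad: need {majority(n)} votes, tolerates {tolerated_failures(n)} down")
```

With two metad processes both votes are required, so the cluster tolerates no failure. That would be consistent with the "Did not get enough votes" lines in the meta log while hg46 was waiting on hg45.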

meta on machine hg46

--local_config=true
########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-metad.pid

########## logging ##########
# The directory to host logging files
--log_dir=logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=metad-stdout.log
--stderr_log_file=metad-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2
# Whether log file names contain a timestamp.
--timestamp_in_logfile_name=true

########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=hg45:9559,hg46:9559
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=hg46
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19559
# HTTP2 service port
--ws_h2_port=19560
# Port to listen on Storage with HTTP protocol, it corresponds to ws_http_port in storage's configuration file
--ws_storage_http_port=19779

########## storage ##########
# Root data path, here should be only single path for metad
--data_path=/home/data/meta

########## Misc #########
# The default number of parts when a space is created
--default_parts_num=10
# The default replica factor when a space is created
--default_replica_factor=1

--heartbeat_interval_secs=10
--agent_heartbeat_interval_secs=60

############## rocksdb Options ##############
--rocksdb_wal_sync=true

meta on machine hg45

--local_config=true
########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-metad.pid

########## logging ##########
# The directory to host logging files
--log_dir=logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=metad-stdout.log
--stderr_log_file=metad-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2
# Whether log file names contain a timestamp.
--timestamp_in_logfile_name=true

########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=hg45:9559,hg46:9559
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=hg45
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19559
# HTTP2 service port
--ws_h2_port=19560
# Port to listen on Storage with HTTP protocol, it corresponds to ws_http_port in storage's configuration file
--ws_storage_http_port=19779

########## storage ##########
# Root data path, here should be only single path for metad
--data_path=/home/data/meta

########## Misc #########
# The default number of parts when a space is created
--default_parts_num=10
# The default replica factor when a space is created
--default_replica_factor=1

--heartbeat_interval_secs=10
--agent_heartbeat_interval_secs=60

############## rocksdb Options ##############
--rocksdb_wal_sync=true

local_ip is configured with a hostname, but ADD HOSTS was run with an IP.

Run this in the console:

add hosts "hg46":9779
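
The idea behind the command above is that the address given to ADD HOSTS should be the same string the storaged reports in its heartbeat, which appears to come from its `--local_ip` and `--port`. A sketch of deriving the statement from a config file; the config text below is a hypothetical fragment (the real nebula-storaged.conf was not posted), and `read_flags` / `add_hosts_statement` are illustrative helpers, not Nebula tools:

```python
import re


def read_flags(conf_text: str) -> dict:
    """Parse gflags-style lines (--name=value), ignoring comments and blanks."""
    flags = {}
    for line in conf_text.splitlines():
        m = re.match(r"^--([A-Za-z0-9_]+)=(.*)$", line.strip())
        if m:
            flags[m.group(1)] = m.group(2).strip()
    return flags


def add_hosts_statement(conf_text: str) -> str:
    """Build an ADD HOSTS statement using the same host string as local_ip."""
    flags = read_flags(conf_text)
    host, port = flags["local_ip"], int(flags["port"])
    # Host names are quoted, as in the command above; dotted IPv4 is left bare.
    is_ipv4 = re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None
    target = host if is_ipv4 else f'"{host}"'
    return f"ADD HOSTS {target}:{port};"


# Hypothetical fragment, consistent with `localhost = "hg46":9779` in the storage log.
example_conf = """
# Local IP used to identify the nebula-storaged process.
--local_ip=hg46
--port=9779
"""
print(add_hosts_statement(example_conf))  # ADD HOSTS "hg46":9779;
```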
# meta log on machine hg46
E20220301 13:47:52.693554 11690 HBProcessor.cpp:38] Machine "hg45":9779 is not registed
I20220301 13:47:54.653616 11690 HBProcessor.cpp:33] Receive heartbeat from "hg46":9669, role = GRAPH
E20220301 13:47:54.656585 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
I20220301 13:47:55.821322 11690 HBProcessor.cpp:33] Receive heartbeat from "hg45":9669, role = GRAPH
E20220301 13:47:55.824025 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
I20220301 13:48:04.666960 11690 HBProcessor.cpp:33] Receive heartbeat from "hg46":9669, role = GRAPH
E20220301 13:48:04.669550 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
I20220301 13:48:05.834424 11690 HBProcessor.cpp:33] Receive heartbeat from "hg45":9669, role = GRAPH
E20220301 13:48:05.836858 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
I20220301 13:48:12.095127 11690 HBProcessor.cpp:33] Receive heartbeat from "hg46":9779, role = STORAGE
E20220301 13:48:12.095171 11690 HBProcessor.cpp:38] Machine "hg46":9779 is not registed
I20220301 13:48:14.679894 11690 HBProcessor.cpp:33] Receive heartbeat from "hg46":9669, role = GRAPH
E20220301 13:48:14.682307 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
I20220301 13:48:15.847230 11690 HBProcessor.cpp:33] Receive heartbeat from "hg45":9669, role = GRAPH
E20220301 13:48:15.849596 11690 ActiveHostsMan.cpp:303] Get last update time failed, error: E_KEY_NOT_FOUND
I20220301 13:48:22.695333 11690 HBProcessor.cpp:33] Receive heartbeat from "hg45":9779, role = STORAGE
E20220301 13:48:22.695384 11690 HBProcessor.cpp:38] Machine "hg45":9779 is not registed

# meta log on machine hg45
I20220301 13:48:32.685746 14125 HBProcessor.cpp:33] Receive heartbeat from "hg45":9779, role = STORAGE
E20220301 13:48:32.685779 14125 HBProcessor.cpp:38] Machine "hg45":9779 is not registed
I20220301 13:48:42.086750 14125 HBProcessor.cpp:33] Receive heartbeat from "hg46":9779, role = STORAGE
E20220301 13:48:42.086783 14125 HBProcessor.cpp:38] Machine "hg46":9779 is not registed
I20220301 13:48:52.686884 14125 HBProcessor.cpp:33] Receive heartbeat from "hg45":9779, role = STORAGE
E20220301 13:48:52.686914 14125 HBProcessor.cpp:38] Machine "hg45":9779 is not registed
I20220301 13:49:02.087882 14125 HBProcessor.cpp:33] Receive heartbeat from "hg46":9779, role = STORAGE
E20220301 13:49:02.087919 14125 HBProcessor.cpp:38] Machine "hg46":9779 is not registed
I20220301 13:49:02.687386 14125 HBProcessor.cpp:33] Receive heartbeat from "hg45":9779, role = STORAGE
E20220301 13:49:02.687408 14125 HBProcessor.cpp:38] Machine "hg45":9779 is not registed
I20220301 13:49:12.088526 14125 HBProcessor.cpp:33] Receive heartbeat from "hg46":9779, role = STORAGE
E20220301 13:49:12.088557 14125 HBProcessor.cpp:38] Machine "hg46":9779 is not registed
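
To see at a glance which storage addresses metad is still rejecting, the "is not registed" lines can be collected from the log. A rough sketch, assuming the message format shown in the logs above (the spelling "registed" is what the log prints); `unregistered_hosts` is a hypothetical helper:

```python
import re

# Matches lines such as: Machine "hg45":9779 is not registed
NOT_REGISTERED = re.compile(r'Machine "([^"]+)":(\d+) is not registed')


def unregistered_hosts(log_text: str) -> list:
    """Return sorted unique (host, port) pairs reported as not registered."""
    found = {(m.group(1), int(m.group(2))) for m in NOT_REGISTERED.finditer(log_text)}
    return sorted(found)


# Three lines copied from the meta log on hg45.
sample = '''
E20220301 13:48:32.685779 14125 HBProcessor.cpp:38] Machine "hg45":9779 is not registed
E20220301 13:48:42.086783 14125 HBProcessor.cpp:38] Machine "hg46":9779 is not registed
E20220301 13:48:52.686914 14125 HBProcessor.cpp:38] Machine "hg45":9779 is not registed
'''
print(unregistered_hosts(sample))  # [('hg45', 9779), ('hg46', 9779)]
```

Each pair in the result is an address that metad has received a heartbeat from but that has not been added with exactly that host string.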