Nebula cluster won't start after a power outage

Question reference template:

  • nebula version: 3.0.1
  • Deployment: distributed
  • Installation method: RPM
  • Production deployment: Y
  • Hardware
    32 GB RAM, 14-core CPU, 1.5 TB HDD, CentOS 7
  • Problem description
    The cluster fails to start after a power outage.
    nebula-graphd log:
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
I20220610 16:32:58.861773  6350 GraphDaemon.cpp:129] Starting Graph HTTP Service
I20220610 16:32:58.875756  6351 WebService.cpp:128] Web service started on HTTP[19669], HTTP2[19670]
I20220610 16:32:58.875952  6350 GraphDaemon.cpp:143] Number of networking IO threads: 14
I20220610 16:32:58.875996  6350 GraphDaemon.cpp:152] Number of worker threads: 14
I20220610 16:32:58.909906  6350 MetaClient.cpp:84] Create meta client to "10.100.100.91":9559
I20220610 16:32:58.910008  6350 MetaClient.cpp:85] root path: /usr/local/nebula, data path size: 0
E20220610 16:33:01.942183  6386 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 16:33:01.949410  6386 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 16:33:01.949594  6350 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 16:33:01.949656  6350 MetaClient.cpp:133] Waiting for the metad to be ready!
E20220610 16:33:14.966187  6387 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 16:33:14.966305  6387 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 16:33:14.966378  6350 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 16:33:14.966408  6350 MetaClient.cpp:133] Waiting for the metad to be ready!
E20220610 16:33:27.984249  6350 MetaClient.cpp:102] Heartbeat failed, status:LeaderChanged: Leader changed!
I20220610 16:33:27.984351  6350 MetaClient.cpp:133] Waiting for the metad to be ready!
E20220610 16:33:40.993811  6389 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 16:33:40.993917  6389 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 16:33:40.994061  6350 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 16:33:40.994836  6350 MetaClient.cpp:144] Register time task for heartbeat!
E20220610 16:33:40.994876  6350 GraphService.cpp:51] Failed to wait for meta service ready synchronously.
E20220610 16:33:40.994906  6350 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20220610 16:33:41.001407  6350 GraphDaemon.cpp:163] The graph server start failed

nebula-storaged log:

W20220610 16:33:08.609481  6277 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220610 16:33:11.627204  6277 MetaClient.cpp:102] Heartbeat failed, status:Machine not existed!
I20220610 16:33:11.627295  6277 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220610 16:33:21.627475  6277 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220610 16:33:24.636989  6277 MetaClient.cpp:102] Heartbeat failed, status:Machine not existed!
I20220610 16:33:24.637080  6277 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220610 16:33:34.642027  6277 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220610 16:33:37.651752  6277 MetaClient.cpp:102] Heartbeat failed, status:Machine not existed!
I20220610 16:33:37.651842  6277 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220610 16:33:47.652036  6277 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220610 16:33:50.659250  6390 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 16:33:50.659341  6390 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 16:33:50.659479  6277 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 16:33:50.659535  6277 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220610 16:34:00.659744  6277 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220610 16:34:03.667017  6391 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 16:34:03.667115  6391 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 16:34:03.667184  6277 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 16:34:03.667228  6277 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220610 16:34:13.667433  6277 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220610 16:34:16.675630  6392 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 16:34:16.675721  6392 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 16:34:16.675837  6277 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 16:34:16.675899  6277 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220610 16:34:26.676117  6277 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220610 16:34:29.683701  6393 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 16:34:29.683799  6393 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 16:34:29.683969  6277 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 16:34:29.684038  6277 MetaClient.cpp:133] Waiting for the metad to be ready!
W20220610 16:34:39.684247  6277 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E20220610 16:34:42.692072  6398 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 16:34:42.692185  6398 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 16:34:42.692358  6277 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 16:34:42.692413  6277 MetaClient.cpp:133] Waiting for the metad to be ready!

nebula-metad log:

E20220610 16:34:39.687139  8079 HBProcessor.cpp:38] Machine "10.100.100.94":9779 is not registed
I20220610 16:34:41.191870  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.92":9779, role = STORAGE
E20220610 16:34:41.191916  8079 HBProcessor.cpp:38] Machine "10.100.100.92":9779 is not registed
I20220610 16:34:41.523411  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.92":9669, role = GRAPH
I20220610 16:34:53.697857  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.94":9779, role = STORAGE
E20220610 16:34:53.697937  8079 HBProcessor.cpp:38] Machine "10.100.100.94":9779 is not registed
I20220610 16:34:54.199491  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.92":9779, role = STORAGE
E20220610 16:34:54.199540  8079 HBProcessor.cpp:38] Machine "10.100.100.92":9779 is not registed
I20220610 16:34:58.004276  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.91":9779, role = STORAGE
E20220610 16:34:58.004352  8079 HBProcessor.cpp:38] Machine "10.100.100.91":9779 is not registed
I20220610 16:35:06.704990  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.94":9779, role = STORAGE
E20220610 16:35:06.705081  8079 HBProcessor.cpp:38] Machine "10.100.100.94":9779 is not registed
I20220610 16:35:06.927048  8079 BaseProcessor-inl.h:426] Put data error on meta server
I20220610 16:35:06.930634  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.91":9669, role = GRAPH
I20220610 16:35:15.829326  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.94":9669, role = GRAPH
I20220610 16:35:17.840256  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.94":9669, role = GRAPH
I20220610 16:35:18.210222  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.92":9779, role = STORAGE
E20220610 16:35:18.210273  8079 HBProcessor.cpp:38] Machine "10.100.100.92":9779 is not registed
I20220610 16:35:22.014569  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.91":9779, role = STORAGE
E20220610 16:35:22.014703  8079 HBProcessor.cpp:38] Machine "10.100.100.91":9779 is not registed
I20220610 16:35:22.969236  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.91":9669, role = GRAPH
I20220610 16:35:24.017827  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.91":9779, role = STORAGE
E20220610 16:35:24.017896  8079 HBProcessor.cpp:38] Machine "10.100.100.91":9779 is not registed
I20220610 16:35:25.571708  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.93":9669, role = GRAPH
I20220610 16:35:26.991458  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.93":9779, role = STORAGE
E20220610 16:35:26.991514  8079 HBProcessor.cpp:38] Machine "10.100.100.93":9779 is not registed
I20220610 16:35:33.222293  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.92":9779, role = STORAGE
E20220610 16:35:33.222376  8079 HBProcessor.cpp:38] Machine "10.100.100.92":9779 is not registed
I20220610 16:35:34.986191  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.91":9669, role = GRAPH
I20220610 16:35:38.996415  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.93":9779, role = STORAGE
E20220610 16:35:38.996480  8079 HBProcessor.cpp:38] Machine "10.100.100.93":9779 is not registed
I20220610 16:35:45.993147  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.91":9669, role = GRAPH
I20220610 16:35:50.032680  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.91":9779, role = STORAGE
E20220610 16:35:50.032759  8079 HBProcessor.cpp:38] Machine "10.100.100.91":9779 is not registed
I20220610 16:35:57.233140  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.92":9779, role = STORAGE
E20220610 16:35:57.233198  8079 HBProcessor.cpp:38] Machine "10.100.100.92":9779 is not registed
I20220610 16:35:57.735810  8079 HBProcessor.cpp:33] Receive heartbeat from "10.100.100.94":9779, role = STORAGE
E20220610 16:35:57.735891  8079 HBProcessor.cpp:38] Machine "10.100.100.94":9779 is not registed

Experts, please take a look; I'm waiting online for a solution.
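
Two of the errors above are worth calling out. The storaged warning FileBasedClusterIdMan.cpp:43] Open file failed suggests the cluster.id file that storaged persists on disk could not be opened (it may have been lost in the power failure), and the metad error "Machine ... is not registed" means the meta service no longer considers those storage hosts registered; in Nebula 3.x, storage hosts must be registered with ADD HOSTS before they can serve. A minimal sketch of checking the file and re-registering the hosts, assuming a default /usr/local/nebula install and nebula-console available on the node (paths, user, and password are assumptions):

# Did storaged's persisted cluster id file survive the power loss?
find /usr/local/nebula -maxdepth 3 -name 'cluster.id' -exec ls -l {} \;

# Re-register the storage hosts with the meta service (Nebula 3.x syntax)
./nebula-console -addr 10.100.100.91 -port 9669 -u root -p nebula \
  -e 'ADD HOSTS 10.100.100.91:9779, 10.100.100.92:9779, 10.100.100.93:9779, 10.100.100.94:9779;'

Note the second step needs a working graphd, so it only applies once graphd can reach a meta leader again.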

Can this disk of yours still do I/O normally?

The disk is fine.

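For reference, a quick way to sanity-check raw I/O on the data disk (a sketch; assumes the sysstat package is installed and that the data directory below matches this install):

# Per-device utilization and await, 5 one-second samples
iostat -x 1 5

# Rough direct-I/O sequential write test into the storaged data directory
dd if=/dev/zero of=/usr/local/nebula/data/storage/.iotest bs=1M count=1024 oflag=direct
rm -f /usr/local/nebula/data/storage/.iotest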

Cluster configuration

We previously had 3 metad nodes configured. After changing to a single metad node it starts up normally. Hoping you can help track down the problem.
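
Worth noting: with --meta_server_addrs listing three metads, the meta Raft group needs a majority (2 of 3) of metad processes up and reachable before it can elect a leader. If the metads on .92 and .93 never came back cleanly after the outage, the one on .91 would keep re-electing, which matches the LeaderChanged and Connection refused errors above. Presumably the single-node workaround was an edit like the following in all three config files on every node (a sketch of the assumed change, not a recommended fix, since it abandons the other two metads' meta data):

# nebula-metad.conf / nebula-graphd.conf / nebula-storaged.conf
--meta_server_addrs=10.100.100.91:9559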

Could you paste the relevant configuration files?

nebula-metad.conf

########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-metad.pid

########## logging ##########
# The directory to host logging files
--log_dir=logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level: 1, 2, 3, 4; the higher the level, the more verbose the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=metad-stdout.log
--stderr_log_file=metad-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2
# Whether logging file names contain a timestamp. If using logrotate to rotate logging files, set this to true.
--timestamp_in_logfile_name=true

########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=10.100.100.91:9559,10.100.100.92:9559,10.100.100.93:9559
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.100.100.91
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19559
# HTTP2 service port
--ws_h2_port=19560
# Port to listen on Storage with HTTP protocol, it corresponds to ws_http_port in storage's configuration file
--ws_storage_http_port=19779

########## storage ##########
# Root data path; there should be only a single path for metad
--data_path=data/meta

########## Misc #########
# The default number of parts when a space is created
--default_parts_num=100
# The default replica factor when a space is created
--default_replica_factor=1

--heartbeat_interval_secs=10
--agent_heartbeat_interval_secs=60


nebula-graphd.conf

########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-graphd.pid
# Whether to enable optimizer
--enable_optimizer=true
# The default charset when a space is created
--default_charset=utf8
# The default collate when a space is created
--default_collate=utf8_bin
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level: 1, 2, 3, 4; the higher the level, the more verbose the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=graphd-stdout.log
--stderr_log_file=graphd-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2
# Whether logging file names contain a timestamp.
--timestamp_in_logfile_name=true
########## query ##########
# Whether to treat partial success as an error.
# This flag is only used for Read-only access, and Modify access always treats partial success as an error.
--accept_partial_success=false
# Maximum sentence length, unit byte
--max_allowed_query_size=4194304

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=10.100.100.91:9559,10.100.100.92:9559,10.100.100.93:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.100.100.91
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
# To turn on SO_REUSEPORT or not
--reuse_port=false
# Backlog of the listen socket, adjust this together with net.core.somaxconn
--listen_backlog=1024
# The number of seconds Nebula service waits before closing the idle connections
--client_idle_timeout_secs=28800
# The number of seconds before idle sessions expire
# The range should be in [1, 604800]
--session_idle_timeout_secs=28800
# The number of threads to accept incoming connections
--num_accept_threads=1
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=0
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=0
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19669
# HTTP2 service port
--ws_h2_port=19670
# storage client timeout
--storage_client_timeout_ms=60000
# Port to listen on Meta with HTTP protocol, it corresponds to ws_http_port in metad's configuration file
--ws_meta_http_port=19559

########## authentication ##########
# Enable authorization
--enable_authorize=true
# User login authentication type, password for nebula authentication, ldap for ldap authentication, cloud for cloud authentication
--auth_type=password

########## memory ##########
# System memory high watermark ratio; memory checking is disabled when the ratio is greater than 1.0
--system_memory_high_watermark_ratio=0.8

########## metrics ##########
--enable_space_level_metrics=false

########## experimental feature ##########
# if use experimental features
--enable_experimental_feature=false

##--query_concurrently=true

nebula-storaged.conf

########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-storaged.pid
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level: 1, 2, 3, 4; the higher the level, the more verbose the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=storaged-stdout.log
--stderr_log_file=storaged-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2
# Whether logging file names contain a timestamp.
--timestamp_in_logfile_name=true

########## networking ##########
# Comma separated Meta server addresses
--meta_server_addrs=10.100.100.91:9559,10.100.100.92:9559,10.100.100.93:9559
# Local IP used to identify the nebula-storaged process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.100.100.91
# Storage daemon listening port
--port=9779
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19779
# HTTP2 service port
--ws_h2_port=19780
# heartbeat with meta service
--heartbeat_interval_secs=10

######### Raft #########
# Raft election timeout
--raft_heartbeat_interval_secs=30
# RPC timeout for raft client (ms)
--raft_rpc_timeout_ms=500
## recycle Raft WAL
--wal_ttl=14400

########## Disk ##########
# Root data path. Split by comma. e.g. --data_path=/disk1/path1/,/disk2/path2/
# One path per Rocksdb instance.
--data_path=data/storage

# Minimum reserved bytes of each data path
--minimum_reserved_bytes=268435456

# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096
# The default block cache size used in BlockBasedTable.
# The unit is MB.
--rocksdb_block_cache=16384
# The type of storage engine, `rocksdb', `memory', etc.
--engine_type=rocksdb

# Compression algorithm, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
# For the sake of binary compatibility, the default value is snappy.
# Recommend to use:
#   * lz4 to gain more CPU performance, with the same compression ratio with snappy
#   * zstd to occupy less disk space
#   * lz4hc for the read-heavy write-light scenario
--rocksdb_compression=lz4

# Set different compressions for different levels
# For example, if --rocksdb_compression is snappy,
# "no:no:lz4:lz4::zstd" is identical to "no:no:lz4:lz4:snappy:zstd:snappy"
# In order to disable compression for level 0/1, set it to "no:no"
--rocksdb_compression_per_level=

# Whether or not to enable rocksdb's statistics, disabled by default
--enable_rocksdb_statistics=false

# Statslevel used by rocksdb to collection statistics, optional values are
#   * kExceptHistogramOrTimers, disable timer stats, and skip histogram stats
#   * kExceptTimers, Skip timer stats
#   * kExceptDetailedTimers, Collect all stats except time inside mutex lock AND time spent on compression.
#   * kExceptTimeForMutex, Collect all stats except the counters requiring to get time inside the mutex lock.
#   * kAll, Collect all stats
--rocksdb_stats_level=kExceptHistogramOrTimers

# Whether or not to enable rocksdb's prefix bloom filter, enabled by default.
--enable_rocksdb_prefix_filtering=true
# Whether or not to enable rocksdb's whole key bloom filter, disabled by default.
--enable_rocksdb_whole_key_filtering=false

############## Key-Value separation ##############
# Whether or not to enable BlobDB (RocksDB key-value separation support)
--rocksdb_enable_kv_separation=false
# RocksDB key value separation threshold. Values at or above this threshold will be written to blob files during flush or compaction.
--rocksdb_kv_separation_threshold=0
# Compression algorithm for blobs, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
--rocksdb_blob_compression=lz4
# Whether to garbage collect blobs during compaction
--rocksdb_enable_blob_garbage_collection=true

############## rocksdb Options ##############
# rocksdb DBOptions in json, each name and value of option is a string, given as "option_name":"option_value" separated by comma
--rocksdb_db_options={}
# rocksdb ColumnFamilyOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_column_family_options={"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
# rocksdb BlockBasedTableOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_block_based_table_options={"block_size":"8192"}

#--query_concurrently=true

Run ./nebula.service status all and check the status.

Right now nebula-metad is set to a single node and it's running fine; previously there were 3 nodes.

Status from when there were 3 nodes:

[root@nebula scripts]# ./nebula.service status all
[INFO] nebula-metad(2e938c7): Running as 7693, Listening on 9559 
[INFO] nebula-graphd(2e938c7): Exited
[INFO] nebula-storaged(2e938c7): Running as 8028, Listening on 9779 
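
Given that the status script reports metad as Running while graphd still gets Connection refused on 9559, it may be worth confirming that each metad is actually accepting connections and reporting itself healthy, not just that a pid exists. A sketch, assuming the ws HTTP port 19559 from the configs above:

for h in 10.100.100.91 10.100.100.92 10.100.100.93; do
  nc -zv $h 9559                           # is the metad RPC port reachable?
  curl -s http://$h:19559/status ; echo    # metad's HTTP status endpoint
done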

nebula-graphd error log from when there were 3 nebula-metad nodes:

Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
I20220610 17:20:09.448449  1903 GraphDaemon.cpp:129] Starting Graph HTTP Service
I20220610 17:20:09.466908  1907 WebService.cpp:128] Web service started on HTTP[19669], HTTP2[19670]
I20220610 17:20:09.467082  1903 GraphDaemon.cpp:143] Number of networking IO threads: 14
I20220610 17:20:09.467114  1903 GraphDaemon.cpp:152] Number of worker threads: 14
I20220610 17:20:09.500087  1903 MetaClient.cpp:84] Create meta client to "10.100.100.91":9559
I20220610 17:20:09.500175  1903 MetaClient.cpp:85] root path: /usr/local/nebula, data path size: 0
E20220610 17:20:12.532909  1949 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 17:20:12.534148  1949 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 17:20:12.534328  1903 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 17:20:12.534407  1903 MetaClient.cpp:133] Waiting for the metad to be ready!
E20220610 17:20:25.554826  1998 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 17:20:25.554915  1998 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 17:20:25.555073  1903 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 17:20:25.555131  1903 MetaClient.cpp:133] Waiting for the metad to be ready!
E20220610 17:20:38.571157  2001 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 17:20:38.571245  2001 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 17:20:38.571391  1903 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 17:20:38.571449  1903 MetaClient.cpp:133] Waiting for the metad to be ready!
E20220610 17:20:51.581373  2003 MetaClient.cpp:735] Send request to "10.100.100.91":9559, exceed retry limit
E20220610 17:20:51.581466  2003 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
E20220610 17:20:51.581617  1903 MetaClient.cpp:102] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Failed to write to remote endpoint. Wrote 0 bytes. AsyncSocketException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
I20220610 17:20:51.582336  1903 MetaClient.cpp:144] Register time task for heartbeat!
E20220610 17:20:51.582378  1903 GraphService.cpp:51] Failed to wait for meta service ready synchronously.
E20220610 17:20:51.582407  1903 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20220610 17:20:51.587744  1903 GraphDaemon.cpp:163] The graph server start failed

So previously graphd wouldn't start, while metad and storaged would?
I'd suggest checking the firewall and whether the ports are in use.
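
A sketch of those checks, to be run on each of the four machines (CentOS 7 with firewalld assumed; ports per the configs above):

ss -tlnp | grep -E '9559|9669|9779'               # what is listening on the Nebula ports
firewall-cmd --state && firewall-cmd --list-all   # firewalld state and rules
nc -zv 10.100.100.91 9559                         # can this node reach the metad on .91?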

It's the meta nodes repeatedly holding elections that keeps graphd from starting, or it starts for a while and then drops offline on its own.

The firewall and ports have both been checked; no problems there.

MetaClient.cpp:133] Waiting for the metad to be ready!

Looking at the log:

Heartbeat failed, status:LeaderChanged: Leader changed!

I20220610 15:03:32.480520  3162 MetaClient.cpp:133] Waiting for the metad to be ready!
E20220610 15:03:45.488211  3162 MetaClient.cpp:102] Heartbeat failed, status:LeaderChanged: Leader changed!
I20220610 15:03:45.489526  3162 MetaClient.cpp:144] Register time task for heartbeat!
E20220610 15:03:45.489578  3162 GraphService.cpp:51] Failed to wait for meta service ready synchronously.
E20220610 15:03:45.489616  3162 GraphServer.cpp:39] Failed to wait for meta service ready synchronously.
E20220610 15:03:45.496313  3162 GraphDaemon.cpp:163] The graph server start failed
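
Consistent with the repeated-election theory above: LeaderChanged during startup usually means the meta Raft group has not settled on a stable leader, which fits a 3-metad quorum whose members are flapping. One way to confirm from each metad host's own log (a sketch; assumes the default glog file names under the install's logs directory):

# On each metad host: look for election / leadership churn
grep -E 'election|[Ll]eader' /usr/local/nebula/logs/nebula-metad.INFO | tail -n 20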