Cannot see the status of services that started successfully

Version: V3.0.1
Deployment: three machines, each running the metad, storaged, and graphd processes
Started the services with /usr/local/nebula/scripts/nebula.service start all
then checked them with /usr/local/nebula/scripts/nebula.service status all
Result:

[INFO] nebula-metad: Exited
[INFO] nebula-graphd: Exited
[INFO] nebula-storaged: Exited

But with ps -ef|grep nebula:

[root@test-nebula-cluster-1 nebula]# ps -ef|grep nebula
 1192 avahi     0:06 avahi-daemon: running [test-nebula-cluster-1.local]
 2987 root      0:23 /usr/local/nebula/bin/nebula-metad --flagfile /usr/local/nebula/etc/nebula-metad.conf
 3057 root      0:04 /usr/local/nebula/bin/nebula-graphd --flagfile /usr/local/nebula/etc/nebula-graphd.conf
 3124 root      4h10 /usr/local/nebula/bin/nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf

the corresponding processes are clearly running, and both the logs and the services look normal.

Please post the config files.

metad config:

--local_config=True
--daemonize=true
--pid_file=pids/nebula-metad.pid

--log_dir=/matrix/data/logs/nebula
--minloglevel=0
--v=4
--logbufsecs=0

--meta_server_addrs=10.210.199.82:45500,10.210.199.83:45500,10.210.199.84:45500
--local_ip=10.210.199.84
--port=45500
--ws_ip=0.0.0.0
--ws_http_port=19559
--ws_h2_port=19560

--data_path=/matrix/data/nebula/meta/

--default_parts_num=100
--default_replica_factor=1

--heartbeat_interval_secs=10

storaged config:


--local_config=True
--daemonize=true
--pid_file=pids/nebula-storaged.pid

--log_dir=/matrix/data/logs/nebula
--minloglevel=0
--v=1
--logbufsecs=0

--meta_server_addrs=10.210.199.82:45500,10.210.199.83:45500,10.210.199.84:45500
--local_ip=10.210.199.84
--port=44500
--ws_ip=0.0.0.0
--ws_http_port=19779
--ws_h2_port=19780

--raft_heartbeat_interval_secs=30
--raft_rpc_timeout_ms=500
--wal_ttl=14400

--data_path=/nebula/storage2,/nebula/storage1

--rocksdb_batch_size=4096
--rocksdb_block_cache=16000
--engine_type=rocksdb

--rocksdb_compression=lz4

--rocksdb_compression_per_level=

--enable_rocksdb_statistics=false

--rocksdb_stats_level=kExceptHistogramOrTimers

--enable_rocksdb_prefix_filtering=false
--enable_rocksdb_whole_key_filtering=true
--rocksdb_filtering_prefix_length=12

--rocksdb_disable_wal=true
--rocksdb_db_options={"max_subcompactions": "10", "max_background_jobs" :"10","stats_dump_period_sec":"200", "write_thread_max_yield_usec":"600"}
--rocksdb_column_family_options={"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456","level0_file_num_compaction_trigger":"10","max_write_buffer_number":"4", "min_write_buffer_number_to_merge":"2", "max_write_buffer_number_to_maintain":"1"}
--rocksdb_block_based_table_options={"block_size":"8192"}
--enable_partitioned_index_filter=True
--max_edge_returned_per_vertex=100
--enable_reservoir_sampling = true

Now I've found another problem: show hosts doesn't show the cluster status, and running it a second time hangs. Details:


(root@nebula) [(none)]> show hosts
Execution succeeded (time spent 0/10441 us)

Mon, 16 May 2022 16:16:16 CST

(root@nebula) [(none)]> show hosts

Current storaged warning log:

Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
W20220516 14:56:27.583027  4871 RaftexService.cpp:179] Cannot find the part 41 in the graph space 37
W20220516 14:56:28.306037  4869 RaftexService.cpp:179] Cannot find the part 62 in the graph space 37
W20220516 14:56:39.546406  4867 RaftPart.cpp:1001] [Port: 44501, Space: 1, Part: 9] Only 0 hosts succeeded, Need to try again
E20220516 15:49:57.402633  4918 Host.cpp:220] [Port: 44501, Space: 37, Part: 90] [Host: 10.210.199.84:44501] Failed to append logs to the host (Err: E_RAFT_UNKNOWN_PART)

metad warning log:

W20220516 14:50:50.221231 25048 RaftexService.cpp:179] Cannot find the part 0 in the graph space 0
W20220516 14:50:51.239122 25048 RaftPart.cpp:1122] [Port: 45501, Space: 0, Part: 0] Receive response about askForVote from "10.210.199.83":45501, error code is E_RAFT_LOG_STALE, isPreVote = 1
W20220516 14:50:53.157428 26301 RaftPart.cpp:361] [Port: 45501, Space: 0, Part: 0] The partition is not a leader
W20220516 14:50:53.157476 26301 RaftPart.cpp:696] [Port: 45501, Space: 0, Part: 0] Cannot append logs, clean the buffer
E20220516 14:56:26.529042 26301 HBProcessor.cpp:38] Machine "10.210.199.82":44500 is not registed
E20220516 15:49:56.323539 26301 HBProcessor.cpp:38] Machine "10.210.199.84":44500 is not registed

It looks like raft consistency is broken and can't be recovered.

Did you skip running add hosts?

We upgraded from 2.0.1, and the upgrade guide doesn't mention this step at all.

:hushed: Starting with v3.0 you do need to run add hosts; see "Manage Storage hosts" in the Nebula Graph Database Manual. Make sure the IP addresses match the storage IP in your config files.
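The metad errors above ("Machine ... is not registed") list the storaged addresses that metad does not know about. A minimal Python sketch, assuming the metad log format shown in this thread: it collects those ip:port pairs and prints one ADD HOSTS statement to paste into nebula-console. The helper names are hypothetical, not part of Nebula, and the ip:port statement form should be checked against the Manage Storage hosts page for your version.

```python
import re
from typing import Dict, Iterable, List

# Matches metad heartbeat errors such as:
#   ... HBProcessor.cpp:38] Machine "10.210.199.82":44500 is not registed
# Only the prefix "regist" is matched, so both the log's spelling ("registed")
# and a corrected "registered" work.
_UNREGISTERED = re.compile(r'Machine "([^"]+)":(\d+) is not regist')


def unregistered_hosts(log_lines: Iterable[str]) -> List[str]:
    """Return unique "ip:port" pairs reported as unregistered, in first-seen order."""
    seen: Dict[str, None] = {}  # a dict keeps insertion order and drops duplicates
    for line in log_lines:
        match = _UNREGISTERED.search(line)
        if match:
            seen[f"{match.group(1)}:{match.group(2)}"] = None
    return list(seen)


def build_add_hosts(hosts: List[str]) -> str:
    """Build a single ADD HOSTS statement with unquoted ip:port entries."""
    if not hosts:
        raise ValueError("no unregistered hosts found")
    return "ADD HOSTS " + ", ".join(hosts) + ";"


if __name__ == "__main__":
    # The two metad error lines quoted earlier in this thread.
    sample = [
        'E20220516 14:56:26.529042 26301 HBProcessor.cpp:38] Machine "10.210.199.82":44500 is not registed',
        'E20220516 15:49:56.323539 26301 HBProcessor.cpp:38] Machine "10.210.199.84":44500 is not registed',
    ]
    print(build_add_hosts(unregistered_hosts(sample)))
    # prints: ADD HOSTS 10.210.199.82:44500, 10.210.199.84:44500;
```

The log only names hosts that happened to send a heartbeat while unregistered, so compare the result with every storaged config (local_ip plus port; presumably 10.210.199.82, .83 and .84 on port 44500 here) before running it.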

The docs do have a note about it, but it's in the FAQ.

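What a headache. How can such an important upgrade step be buried in the FAQ?
The thing is, I followed the upgrade guide and its verification steps, and confirmed after the upgrade that everything worked: I could view, read, and write data normally. But after a while it suddenly stopped working.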


Adding the hosts fails.
The metad log is as follows:

Does show hosts report everything as exited?

The command hangs with no response for a long time.

Try restarting and see?

I've restarted the dev environment many times without success, and I don't dare restart production casually.

I'll ask the developers to take a look.

nebula-console -v

Which version of the console are you using?

nebula-console version Git: 1f32236, Build Time: 2021-03-26T03:07:59+0000
OK, the version is indeed wrong. I thought a mismatched console version would not be able to connect at all.

But the original problem is still unsolved: nebula.service status still can't show the service status.

[root@test-nebula-cluster-1 ~]# /usr/local/nebula/scripts/nebula.service status all
[INFO] nebula-metad: Exited
[INFO] nebula-graphd: Exited
[INFO] nebula-storaged: Exited
[root@test-nebula-cluster-1 ~]# ps -ef |grep nebula
 5768 root      0:01 ssh test-nebula-cluster-2
11626 root      0:00 ssh test-nebula-cluster-3
22222 root      9:28 /usr/local/nebula/bin/nebula-metad --flagfile /usr/local/nebula/etc/nebula-metad.conf
22340 root      2:01 /usr/local/nebula/bin/nebula-graphd --flagfile /usr/local/nebula/etc/nebula-graphd.conf
22480 root     52:57 /usr/local/nebula/bin/nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.con

One more small suggestion: the v3.0.0 console doesn't print any version information.

Check whether the pids recorded in the files under /usr/local/nebula/pids match the ones ps shows.

If they don't match, either fix the pid files by hand, or delete the pid directory and restart the services.
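The pid comparison can be scripted. A minimal Python sketch (Linux only, since it reads /proc), assuming the pid files are /usr/local/nebula/pids/nebula-<service>.pid as suggested by pid_file=pids/nebula-metad.pid in the configs; the graphd file name is inferred by analogy, and the helper names are hypothetical. It only reports mismatches and changes nothing.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumption: pid files follow pid_file=pids/nebula-<service>.pid from the
# posted configs, resolved under the install prefix /usr/local/nebula.
PID_DIR = Path("/usr/local/nebula/pids")
SERVICES = ("metad", "graphd", "storaged")


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the pid stored in pid_file, or None if unreadable or unparsable."""
    try:
        return int(pid_file.read_text().strip())
    except (OSError, ValueError):
        return None


def running_binary(pid: int) -> Optional[str]:
    """Return argv[0] of a live process via /proc, else None."""
    try:
        raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    except OSError:  # no such process, or no /proc on this OS
        return None
    # cmdline is NUL-separated; an empty one (e.g. a zombie) counts as not running.
    return raw.split(b"\0", 1)[0].decode(errors="replace") or None


def check(pid_dir: Path = PID_DIR) -> Dict[str, str]:
    """Compare each pid file with the process it points to."""
    report: Dict[str, str] = {}
    for svc in SERVICES:
        pid_file = pid_dir / f"nebula-{svc}.pid"
        pid = read_pid(pid_file)
        if pid is None:
            report[svc] = f"no readable pid in {pid_file}"
            continue
        binary = running_binary(pid)
        if binary is None:
            report[svc] = f"pid {pid} in {pid_file} is not running (stale pid file)"
        elif not binary.endswith(f"nebula-{svc}"):
            report[svc] = f"pid {pid} belongs to {binary}, not nebula-{svc}"
        else:
            report[svc] = f"ok (pid {pid})"
    return report


if __name__ == "__main__":
    for svc, status in check().items():
        print(f"nebula-{svc}: {status}")
```

A "stale" or "belongs to" line for a service whose binary still shows up in ps -ef is consistent with status printing Exited even though the process is running; the fix is then the one described above.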