Meta service fails to start

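  • Nebula version: 3.1.0
  • Deployment: distributed
  • Installation method: compiled from source
  • In production: N
  • Hardware information
    • Disk: SSD
  • Detailed description of the problem
    The meta service fails to start
  • Relevant meta / storage info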
    nebula-meta.INFO:
Log file created at: 2023/06/13 06:35:17
Running on machine: dbcloud
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
I20230613 06:35:17.112601  2542 MetaDaemon.cpp:135] localhost = "10.28.5.209":8558
I20230613 06:35:17.120615  2542 NebulaStore.cpp:51] Start the raft service...
I20230613 06:35:17.121393  2542 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 10485760 for each part by default
I20230613 06:35:17.177824  2542 RaftexService.cpp:46] Start raft service on 8559
I20230613 06:35:17.177948  2542 NebulaStore.cpp:85] Scan the local path, and init the spaces_
I20230613 06:35:17.178009  2542 NebulaStore.cpp:92] Scan path "/root/Yujie_Lu/db/origin/dis/v3_1_0/data/meta/nebula/0"
I20230613 06:35:17.178023  2542 NebulaStore.cpp:271] Init data from partManager for "10.28.5.209":8558
I20230613 06:35:17.178043  2542 NebulaStore.cpp:387] Create data space 0
I20230613 06:35:17.230391  2542 RocksEngine.cpp:97] open rocksdb on /root/Yujie_Lu/db/origin/dis/v3_1_0/data/meta/nebula/0/data
I20230613 06:35:17.754065  2542 NebulaStore.cpp:459] Space 0, part 0 has been added, asLearner 0
I20230613 06:35:17.754094  2542 NebulaStore.cpp:78] Register handler...
I20230613 06:35:17.754102  2542 MetaDaemonInit.cpp:101] Waiting for the leader elected...
I20230613 06:35:17.754108  2542 MetaDaemonInit.cpp:113] Leader has not been elected, sleep 1s
I20230613 06:35:18.754310  2542 MetaDaemonInit.cpp:140] Get meta version is 4
I20230613 06:35:18.754359  2542 MetaDaemonInit.cpp:157] Nebula store init succeeded, clusterId 2063706823241805567
I20230613 06:35:18.754379  2542 MetaDaemon.cpp:148] Start http service
I20230613 06:35:18.754945  2542 MetaDaemonInit.cpp:162] Starting Meta HTTP Service
I20230613 06:35:18.758653  2832 WebService.cpp:124] Web service started on HTTP[18558]
I20230613 06:35:18.758860  2542 JobDescription.cpp:52] p = player_name_index
I20230613 06:35:18.758896  2542 JobDescription.cpp:52] p = team_name_index
I20230613 06:35:18.758913  2542 JobDescription.cpp:52] p = player_age_index
I20230613 06:35:18.759318  2542 JobManager.cpp:56] JobManager initialized
I20230613 06:35:18.759348  2542 MetaDaemon.cpp:178] Check and init root user
I20230613 06:35:18.759366  2838 JobManager.cpp:119] JobManager::scheduleThread enter
I20230613 06:35:18.759410  2542 RootUserMan.h:28] Root user exists
I20230613 06:35:18.766057  2542 MetaDaemon.cpp:213] The meta daemon start on "10.28.5.209":8558
I20230613 06:35:19.177765  2925 HBProcessor.cpp:33] Receive heartbeat from "10.28.5.209":8668, role = GRAPH
I20230613 06:35:19.177862  2925 HBProcessor.cpp:87] Update host "10.28.5.209":8668 dir info, root path: /root/Yujie_Lu/db/origin/dis/v3_1_0, data path size: 0
I20230613 06:35:19.185786  2925 SessionManagerProcessor.cpp:130] resp list session: 1686635945136929
I20230613 06:35:19.185834  2925 SessionManagerProcessor.cpp:130] resp list session: 1686635340088167
I20230613 06:35:19.185848  2925 SessionManagerProcessor.cpp:130] resp list session: 1686620951878248
I20230613 06:35:19.185860  2925 SessionManagerProcessor.cpp:130] resp list session: 1686625545888389
I20230613 06:35:19.185873  2925 SessionManagerProcessor.cpp:130] resp list session: 1686634878622872
I20230613 06:35:19.185883  2925 SessionManagerProcessor.cpp:130] resp list session: 1686621739734991
I20230613 06:35:19.185895  2925 SessionManagerProcessor.cpp:130] resp list session: 1686637859660277
I20230613 06:35:19.192404  2925 HBProcessor.cpp:33] Receive heartbeat from "10.28.5.209":8209, role = STORAGE
I20230613 06:35:19.192493  2925 HBProcessor.cpp:50] Set clusterId for new host "10.28.5.209":8209!
I20230613 06:35:19.192521  2925 HBProcessor.cpp:87] Update host "10.28.5.209":8209 dir info, root path: /root/Yujie_Lu/db/origin/dis/v3_1_0, data path size: 1
I20230613 06:35:19.199390  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.211":8211
I20230613 06:35:19.199441  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.208":8208
I20230613 06:35:19.199458  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.208":8208
I20230613 06:35:19.199489  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.208":8208
I20230613 06:35:19.199504  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.211":8211
I20230613 06:35:19.199518  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.211":8211
I20230613 06:35:19.199534  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.208":8208
I20230613 06:35:19.199550  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.208":8208
I20230613 06:35:19.199564  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.211":8211
I20230613 06:35:19.199577  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.208":8208
I20230613 06:35:19.199594  2925 ListHostsProcessor.cpp:238] skip inactive host: "10.28.5.208":8208
I20230613 06:35:19.221277  2925 SessionManagerProcessor.cpp:130] resp list session: 1686635945136929
I20230613 06:35:19.221335  2925 SessionManagerProcessor.cpp:130] resp list session: 1686635340088167
I20230613 06:35:19.221354  2925 SessionManagerProcessor.cpp:130] resp list session: 1686620951878248
I20230613 06:35:19.221365  2925 SessionManagerProcessor.cpp:130] resp list session: 1686625545888389
I20230613 06:35:19.221377  2925 SessionManagerProcessor.cpp:130] resp list session: 1686634878622872
I20230613 06:35:19.221390  2925 SessionManagerProcessor.cpp:130] resp list session: 1686621739734991
I20230613 06:35:19.221401  2925 SessionManagerProcessor.cpp:130] resp list session: 1686637859660277
I20230613 06:35:22.317011  2925 HBProcessor.cpp:33] Receive heartbeat from "10.28.5.208":8208, role = STORAGE
I20230613 06:35:22.884994  2925 HBProcessor.cpp:50] Set clusterId for new host "10.28.5.208":8208!
I20230613 06:35:22.885035  2925 HBProcessor.cpp:87] Update host "10.28.5.208":8208 dir info, root path: /root/Yujie_Lu/db/origin/dis/v3_1_0, data path size: 1
I20230613 06:35:29.194869  2925 HBProcessor.cpp:33] Receive heartbeat from "10.28.5.209":8209, role = STORAGE
I20230613 06:35:29.852533  2925 HBProcessor.cpp:33] Receive heartbeat from "10.28.5.209":8668, role = GRAPH
I20230613 06:35:32.650426  2925 HBProcessor.cpp:33] Receive heartbeat from "10.28.5.210":8210, role = STORAGE
I20230613 06:35:32.650530  2925 HBProcessor.cpp:50] Set clusterId for new host "10.28.5.210":8210!
I20230613 06:35:32.650559  2925 HBProcessor.cpp:87] Update host "10.28.5.210":8210 dir info, root path: /root/Yujie_Lu/db/origin/dis/v3_1_0, data path size: 1
I20230613 06:35:32.886866  2925 HBProcessor.cpp:33] Receive heartbeat from "10.28.5.208":8208, role = STORAGE
I20230613 06:35:38.571944  2925 HBProcessor.cpp:33] Receive heartbeat from "10.28.5.211":8211, role = STORAGE
I20230613 06:35:38.572054  2925 HBProcessor.cpp:50] Set clusterId for new host "10.28.5.211":8211!

nebula-storaged.ERROR

E20230613 06:42:29.250694  2731 MetaClient.cpp:102] Expect meta version is 3, but actual is 4
E20230613 06:42:39.251715  2731 MetaClient.cpp:102] Expect meta version is 3, but actual is 4
E20230613 06:42:49.253010  2731 MetaClient.cpp:102] Expect meta version is 3, but actual is 4
E20230613 06:42:59.254179  2731 MetaClient.cpp:102] Expect meta version is 3, but actual is 4
E20230613 06:43:09.255182  2731 MetaClient.cpp:102] Expect meta version is 3, but actual is 4

Please post your configuration file. There may be a problem in the meta configuration.

The meta configuration is as follows:

########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-metad.pid

########## logging ##########
# The directory to host logging files
--log_dir=logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=metad-stdout.log
--stderr_log_file=metad-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2
# Whether log file names contain a timestamp. If using logrotate to rotate log files, then it should be set to true.
--timestamp_in_logfile_name=true

########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=10.28.5.209:8558
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.28.5.209
# Meta daemon listening port
--port=8558
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=18558
# Port to listen on Storage with HTTP protocol, it corresponds to ws_http_port in storage's configuration file
--ws_storage_http_port=18778

########## storage ##########
# Root data path, here should be only single path for metad
--data_path=data/meta

########## Misc #########
# The default number of parts when a space is created
--default_parts_num=100
# The default replica factor when a space is created
--default_replica_factor=1

--heartbeat_interval_secs=10
--agent_heartbeat_interval_secs=60
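
A quick way to check a flag file like the one above is to parse the `--name=value` lines and confirm that this metad's own `local_ip:port` appears in `meta_server_addrs`. This is a rough sketch: `parse_flagfile` and `local_meta_is_listed` are hypothetical helpers, and the parser handles only the simple line format shown here.

```python
def parse_flagfile(text):
    """Parse gflags-style '--name=value' lines, skipping comments and blanks."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("--") or "=" not in line:
            continue
        name, value = line[2:].split("=", 1)
        flags[name] = value
    return flags


def local_meta_is_listed(flags):
    """True if local_ip:port appears in the comma-separated meta_server_addrs."""
    addrs = [a.strip() for a in flags.get("meta_server_addrs", "").split(",") if a.strip()]
    local = f"{flags.get('local_ip')}:{flags.get('port')}"
    return local in addrs


sample_conf = """
# Comma separated Meta Server addresses
--meta_server_addrs=10.28.5.209:8558
--local_ip=10.28.5.209
--port=8558
"""
flags = parse_flagfile(sample_conf)
print(local_meta_is_listed(flags))
```

For the values posted here the check passes, so the single-meta address list is at least self-consistent.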

This database has been built from both the 3.4.1 and the 3.1.0 source code, so the error may be related to that.

The 3.4.1 source contains:

static constexpr int64_t EXPECT_META_VERSION = 4;

The 3.1.0 source contains:

static constexpr int64_t EXPECT_META_VERSION = 3;

However, I do not understand what this meta version actually means, or why the 3.1.0 build reports the wrong version (4).
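
A small sketch of how the two constants quoted above relate to the value metad reports. It assumes the check is a strict equality between a build's EXPECT_META_VERSION and the stored meta version. That assumption is inferred from the error text, not taken from the NebulaGraph source. The dictionary and the function name are made up for illustration.

```python
# Values quoted in the thread; the dictionary itself is an illustrative construct.
EXPECT_META_VERSION = {"3.1.0": 3, "3.4.1": 4}


def builds_accepting(actual_meta_version):
    """List the builds whose expected meta version equals the version reported by metad."""
    return sorted(
        build
        for build, expected in EXPECT_META_VERSION.items()
        if expected == actual_meta_version
    )


# "Get meta version is 4" appears in nebula-meta.INFO above.
print(builds_accepting(4))
```

Under that assumption, only the 3.4.1 build accepts a meta data directory that reports version 4. That would be consistent with the data having been written or upgraded while the 3.4.1 binaries were running, after which the 3.1.0 storaged rejects it.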

Isn't your deployment distributed? :thinking: In that case, the meta configuration here needs to list all of the meta servers.

My cluster uses only one meta and has four storage nodes.
It is probably related to switching versions; the 3.4.1 build starts fine.

Could it be a port conflict, given that it works after you switch versions?
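
One way to rule a port conflict in or out is to try binding the ports that metad uses while the service is stopped. The sketch below uses only the Python standard library. The port numbers come from the log and configuration earlier in the thread (8558 for metad, 8559 for raft, 18558 for HTTP), and `port_is_free` is a hypothetical helper.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


# Run this on the meta host with nebula-metad stopped.
for port in (8558, 8559, 18558):
    print(port, "free" if port_is_free("0.0.0.0", port) else "in use")
```

If any of these ports reports "in use" while metad is stopped, another process (for example a leftover daemon from the other build) is holding it.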

I'll investigate further then. Thank you very much.
