Storage节点loadpart很慢,且启动后storage出现反复offline现象

集群partition数较多时(约3000),storage节点启动时在NebulaStore::loadPartFromDataPath阶段需要几个小时,同时有大量RaftPart选举的日志。
等loadpart完成后,节点状态并没有恢复,还会有反复offline的情况,而且nebula-storaged的cpu占有没有降低(此时没有读写业务),查看线程发现是名为executor-pri3的线程占用cpu,打印堆栈发现该线程一直在做RaftPart::processAppendLogRequest。

什么版本?:thinking: 有 storage 的日志可以提供下么?

版本 3.0.2

启动时Scan path比较慢

I20230329 10:54:28.818670 1861112 NebulaStore.cpp:86] Scan path "/data/starsdb/storagedata/nebula/8962"
I20230329 10:54:28.818744 1861112 RocksEngineConfig.cpp:343] Emplace rocksdb option max_background_jobs=4
I20230329 10:54:28.818753 1861112 RocksEngineConfig.cpp:343] Emplace rocksdb option max_subcompactions=4
I20230329 10:54:28.818840 1861112 RocksEngineConfig.cpp:343] Emplace rocksdb option max_bytes_for_level_base=268435456
I20230329 10:54:28.818846 1861112 RocksEngineConfig.cpp:343] Emplace rocksdb option max_write_buffer_number=4
I20230329 10:54:28.818850 1861112 RocksEngineConfig.cpp:343] Emplace rocksdb option write_buffer_size=67108864
I20230329 10:54:28.818853 1861112 RocksEngineConfig.cpp:343] Emplace rocksdb option disable_auto_compactions=false
I20230329 10:54:28.818936 1861112 RocksEngineConfig.cpp:343] Emplace rocksdb option block_size=8192
I20230329 10:54:28.819415 1861360 RaftPart.cpp:1020] [Port: 9780, Space: 9012, Part: 12] Start leader election, reason: lastMsgDur 2004, term 13
I20230329 10:54:28.819427 1861360 RaftPart.cpp:1186] [Port: 9780, Space: 9012, Part: 12] Sending out an election request (space = 9012, part = 12, term = 14, lastLogId = 1362524, lastLogTerm = 13, candidateIP = 10.33.75.50, candidatePort = 9780), isPreVote = 1
I20230329 10:54:28.819432 1861360 RaftPart.cpp:1141] [Port: 9780, Space: 9012, Part: 12] Partition win prevote of term 14
I20230329 10:54:28.819437 1861360 RaftPart.cpp:1186] [Port: 9780, Space: 9012, Part: 12] Sending out an election request (space = 9012, part = 12, term = 14, lastLogId = 1362524, lastLogTerm = 13, candidateIP = 10.33.75.50, candidatePort = 9780), isPreVote = 0
I20230329 10:54:28.819442 1861360 RaftPart.cpp:1143] [Port: 9780, Space: 9012, Part: 12] Partition is elected as the new leader for term 14
I20230329 10:54:28.838356 1861112 RocksEngine.cpp:142] open rocksdb on /data/starsdb/storagedata/nebula/8962/data
I20230329 10:54:28.838387 1861112 NebulaStore.cpp:138] Need to open 3 parts of space 8962
I20230329 10:54:28.838973 1861349 FileBasedWal.cpp:66] [Port: 9780, Space: 8962, Part: 6] lastLogId in wal is 2451928, lastLogTerm is 78, path is /data/starsdb/storagedata/nebula/8962/wal/6/0000000000002451924.wal
I20230329 10:54:28.839022 1861349 RaftPart.cpp:299] [Port: 9780, Space: 8962, Part: 6] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 2451928, lastLogTerm 78, committedLogId 2451927, committedLogTerm 78, term 78
I20230329 10:54:28.839030 1861349 RaftPart.cpp:307] [Port: 9780, Space: 8962, Part: 6] Add peer "10.33.75.51":9780
I20230329 10:54:28.839038 1861349 RaftPart.cpp:307] [Port: 9780, Space: 8962, Part: 6] Add peer "10.33.75.24":9780
I20230329 10:54:28.839080 1861349 NebulaStore.cpp:142] Load part 8962, 6 from disk
I20230329 10:54:28.841084 1861350 FileBasedWal.cpp:66] [Port: 9780, Space: 8962, Part: 9] lastLogId in wal is 2472543, lastLogTerm is 106, path is /data/starsdb/storagedata/nebula/8962/wal/9/0000000000002472534.wal
I20230329 10:54:28.841135 1861350 RaftPart.cpp:299] [Port: 9780, Space: 8962, Part: 9] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 2472543, lastLogTerm 106, committedLogId 2472542, committedLogTerm 106, term 106
I20230329 10:54:28.841142 1861350 RaftPart.cpp:307] [Port: 9780, Space: 8962, Part: 9] Add peer "10.33.75.51":9780
I20230329 10:54:28.841153 1861350 RaftPart.cpp:307] [Port: 9780, Space: 8962, Part: 9] Add peer "10.33.75.176":9780
I20230329 10:54:28.841190 1861350 NebulaStore.cpp:142] Load part 8962, 9 from disk
I20230329 10:54:29.076620 1861351 RaftPart.cpp:1020] [Port: 9780, Space: 8962, Part: 6] Start leader election, reason: lastMsgDur 238, term 78
I20230329 10:54:29.076650 1861351 RaftPart.cpp:1186] [Port: 9780, Space: 8962, Part: 6] Sending out an election request (space = 8962, part = 6, term = 79, lastLogId = 2451928, lastLogTerm = 78, candidateIP = 10.33.75.50, candidatePort = 9780), isPreVote = 1

loadpart完成后,nebula-storaged的线程持续processAppendLogRequest:

I20230329 14:27:14.946261 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 23, Part: 86] Received logAppend: GraphSpaceId = 23, partition = 86, leaderIp = 10.33.75.227, leaderPort = 9780, current_term = 148, committedLogId = 2171855, lastLogIdSent = 2171855, lastLogTermSent = 148, num_logs = 1, local lastLogId = 2171855, local lastLogTerm = 148, local committedLogId = 2171854, local current term = 148, wal lastLogId = 2171855
I20230329 14:27:14.946318 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 7343, Part: 23] Received logAppend: GraphSpaceId = 7343, partition = 23, leaderIp = 10.33.75.227, leaderPort = 9780, current_term = 173, committedLogId = 1924792, lastLogIdSent = 1924790, lastLogTermSent = 173, num_logs = 3, local lastLogId = 1924791, local lastLogTerm = 173, local committedLogId = 1924790, local current term = 173, wal lastLogId = 1924791
I20230329 14:27:14.946545 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 23, Part: 20] Received logAppend: GraphSpaceId = 23, partition = 20, leaderIp = 10.33.75.227, leaderPort = 9780, current_term = 185, committedLogId = 2172225, lastLogIdSent = 2172222, lastLogTermSent = 185, num_logs = 4, local lastLogId = 2172222, local lastLogTerm = 185, local committedLogId = 2172221, local current term = 185, wal lastLogId = 2172222
I20230329 14:27:14.946597 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 9079, Part: 78] Received logAppend: GraphSpaceId = 9079, partition = 78, leaderIp = 10.33.75.24, leaderPort = 9780, current_term = 109, committedLogId = 277977, lastLogIdSent = 277973, lastLogTermSent = 109, num_logs = 5, local lastLogId = 277977, local lastLogTerm = 109, local committedLogId = 277976, local current term = 109, wal lastLogId = 277977
I20230329 14:27:14.946619 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 9, Part: 153] Received logAppend: GraphSpaceId = 9, partition = 153, leaderIp = 10.33.75.24, leaderPort = 9780, current_term = 150, committedLogId = 2215757, lastLogIdSent = 2215756, lastLogTermSent = 150, num_logs = 2, local lastLogId = 2215757, local lastLogTerm = 150, local committedLogId = 2215756, local current term = 150, wal lastLogId = 2215757
I20230329 14:27:14.946902 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 9, Part: 21] Received logAppend: GraphSpaceId = 9, partition = 21, leaderIp = 10.33.75.24, leaderPort = 9780, current_term = 105, committedLogId = 2215588, lastLogIdSent = 2215588, lastLogTermSent = 105, num_logs = 1, local lastLogId = 2215588, local lastLogTerm = 105, local committedLogId = 2215587, local current term = 105, wal lastLogId = 2215588
I20230329 14:27:14.946945 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 8881, Part: 9] Received logAppend: GraphSpaceId = 8881, partition = 9, leaderIp = 10.33.75.176, leaderPort = 9780, current_term = 135, committedLogId = 2073236, lastLogIdSent = 2073236, lastLogTermSent = 135, num_logs = 1, local lastLogId = 2073236, local lastLogTerm = 135, local committedLogId = 2073235, local current term = 135, wal lastLogId = 2073236
I20230329 14:27:14.946987 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 9, Part: 12] Received logAppend: GraphSpaceId = 9, partition = 12, leaderIp = 10.33.75.227, leaderPort = 9780, current_term = 176, committedLogId = 2216457, lastLogIdSent = 2216457, lastLogTermSent = 176, num_logs = 1, local lastLogId = 2216457, local lastLogTerm = 176, local committedLogId = 2216456, local current term = 176, wal lastLogId = 2216457
I20230329 14:27:14.947017 1861133 RaftPart.cpp:1479] [Port: 9780, Space: 8960, Part: 1] Received logAppend: GraphSpaceId = 8960, partition = 1, leaderIp = 10.33.75.24, leaderPort = 9780, current_term = 167, committedLogId = 1653464, lastLogIdSent = 1653463, lastLogTermSent = 167, num_logs = 2, local lastLogId = 1653463, local lastLogTerm = 167, local committedLogId = 1653462, local current term = 167, wal lastLogId = 1653463

Scan Path 过滤的日志

I20230329 10:42:03.570737 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:03.596880 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:14.345314 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:14.367148 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:17.310905 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:17.335930 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:17.366652 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:18.846082 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:29.938251 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:30.222200 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:30.254093 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:30.296523 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:30.333020 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:38.259353 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:38.295161 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:38.317075 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:40.163077 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:57.132236 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:57.153676 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:57.199170 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:42:57.226083 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:07.997990 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:24.581835 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:41.377270 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:41.401686 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:41.427265 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:41.456195 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:42.005578 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:42.516065 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:43.002496 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:50.822981 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:51.322454 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:51.673316 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:54.887676 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:55.383060 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:55.383379 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:56.903782 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:43:56.927683 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:44:15.303777 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:44:35.006249 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:44:35.042498 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:44:35.085417 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:44:46.867856 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:45:03.938952 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:45:04.577621 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:45:05.221843 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:45:05.847409 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:45:26.401607 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:45:56.779323 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:45:56.802223 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:45:59.831702 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:46:07.289501 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:46:07.317329 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:46:09.565328 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:46:25.999792 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:46:36.011343 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:46:45.385267 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:47:46.740020 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:48:05.118727 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:48:23.457866 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:48:26.381672 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:48:38.490351 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:48:58.761869 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:49:18.188868 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:49:18.209537 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:49:28.256917 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:49:40.499723 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:49:58.422632 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:50:06.901144 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:50:06.926417 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:50:06.950532 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:50:07.603909 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:50:08.230916 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:51:21.100204 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:51:21.122381 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:51:21.144598 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:51:24.251929 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:51:45.740239 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:52:03.724012 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:52:23.524173 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:52:23.551431 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:52:23.590588 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:52:25.398356 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:52:28.726351 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:52:46.508111 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:53:06.040277 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:53:08.993131 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:53:09.021349 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:53:09.053499 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:53:12.704010 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:53:58.610378 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:07.220207 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:07.246476 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:07.281637 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:28.818670 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:36.024328 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:36.045588 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:36.065451 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:48.677271 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:57.498821 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:57.534528 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:54:57.563025 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:55:16.015739 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:56:03.664974 1861112 NebulaStore.cpp:86] Scan path 
I20230329 10:56:03.698127 1861112 NebulaStore.cpp:86] Scan path 

有几个问题:

  1. 版本确定是 3.0.2 吗?日志行数有点对不上。
  2. 你们数据量多大?
  3. 在启动之后,是不是立即有写入进来?

此话题已在最后回复的 30 天后被自动关闭。不再允许新回复。