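After changing the cluster IPs, the storaged service fails to start

Question template:

  • Nebula version: v2.6.1
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: Y
  • Hardware
    • CPU: 8 cores
    • Memory: 16 GB
  • Problem description
    The cluster was originally built on 5 machines. One machine was removed along the way, so there are now only 4 machines:
    31 (removed)
    32
    33
    34
    35

The cluster IPs now need to be changed to:
52
53
54
55

After the change, the cluster's storaged service fails to start.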

show hosts

nebula-storaged.INFO

Log file created at: 2022/05/09 09:06:23
Running on machine: nebula-graph-node2.graph.geyecloud.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0509 09:06:23.490108 10518 StorageDaemon.cpp:125] localhost = "172.16.10.52":9779
I0509 09:06:23.718842 10518 MetaClient.cpp:57] Create meta client to "172.16.10.51":9559
I0509 09:06:25.744046 10518 FileBasedClusterIdMan.cpp:53] Get clusterId: 3669655435052860623
I0509 09:06:27.433560 10518 MetaClient.cpp:3006] Load leader of "172.16.10.52":9779 in 0 space
I0509 09:06:27.433773 10518 MetaClient.cpp:3006] Load leader of "172.16.10.53":9779 in 0 space
I0509 09:06:27.433820 10518 MetaClient.cpp:3006] Load leader of "172.16.10.54":9779 in 0 space
I0509 09:06:27.433845 10518 MetaClient.cpp:3006] Load leader of "172.16.10.55":9779 in 0 space
I0509 09:06:27.433871 10518 MetaClient.cpp:3012] Load leader ok
I0509 09:06:27.434579 10518 MetaClient.cpp:117] Register time task for heartbeat!
I0509 09:06:27.434758 10518 StorageServer.cpp:167] Init schema manager
I0509 09:06:27.434820 10518 StorageServer.cpp:170] Init index manager
I0509 09:06:27.434850 10518 StorageServer.cpp:173] Init kvstore
I0509 09:06:27.435043 10518 NebulaStore.cpp:52] Start the raft service...
I0509 09:06:27.436919 10518 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 8388608 for each part by default
I0509 09:06:27.437820 10518 RaftexService.cpp:62] Init thrift server for raft service, port: 9780
I0509 09:06:27.438207 10582 RaftexService.cpp:93] Starting the Raftex Service
I0509 09:06:27.460773 10582 RaftexService.cpp:83] Starting the Raftex Service on 9780
I0509 09:06:27.460942 10582 RaftexService.cpp:103] Start the Raftex Service successfully
I0509 09:06:27.462630 10518 NebulaStore.cpp:84] Scan the local path, and init the spaces_
I0509 09:06:27.463361 10518 NebulaStore.cpp:90] Scan path "/data/storage/nebula/0"
I0509 09:06:27.463435 10518 NebulaStore.cpp:90] Scan path "/data/storage/nebula/1"
I0509 09:06:27.463479 10518 NebulaStore.cpp:90] Scan path "/data/storage/nebula/2"
I0509 09:06:27.464795 10518 RocksEngineConfig.cpp:299] Emplace rocksdb option max_background_jobs=4
I0509 09:06:27.464946 10518 RocksEngineConfig.cpp:299] Emplace rocksdb option max_subcompactions=4
I0509 09:06:27.465715 10518 RocksEngineConfig.cpp:299] Emplace rocksdb option max_bytes_for_level_base=268435456
I0509 09:06:27.465795 10518 RocksEngineConfig.cpp:299] Emplace rocksdb option max_write_buffer_number=4
I0509 09:06:27.465839 10518 RocksEngineConfig.cpp:299] Emplace rocksdb option write_buffer_size=67108864
I0509 09:06:27.465870 10518 RocksEngineConfig.cpp:299] Emplace rocksdb option disable_auto_compactions=false
I0509 09:06:27.466616 10518 RocksEngineConfig.cpp:299] Emplace rocksdb option block_size=8192
I0509 09:06:36.522929 10518 RocksEngine.cpp:128] open rocksdb on /data/storage/nebula/2/data
I0509 09:06:36.523043 10518 NebulaStore.cpp:114] Load space 2 from disk
I0509 09:06:36.523203 10518 NebulaStore.cpp:142] Need to open 19 parts of space 2
I0509 09:06:38.005262 10579 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 4] lastLogId in wal is 11592752, lastLogTerm is 20638, path is /data/storage/nebula/2/wal/4/0000000000011492914.wal
I0509 09:06:38.005837 10579 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 4] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11592752, lastLogTerm 20638, committedLogId 11592751, term 20638
I0509 09:06:38.005877 10579 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 4] Add peer "172.16.10.55":9780
I0509 09:06:38.005949 10579 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 4] Add peer "172.16.10.53":9780
I0509 09:06:38.006299 10579 NebulaStore.cpp:146] Load part 2, 4 from disk
I0509 09:06:40.609370 10579 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 9] lastLogId in wal is 11598177, lastLogTerm is 20669, path is /data/storage/nebula/2/wal/9/0000000000011404491.wal
I0509 09:06:40.609550 10579 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 9] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11598177, lastLogTerm 20669, committedLogId 11598177, term 20669
I0509 09:06:40.609606 10579 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 9] Add peer "172.16.10.55":9780
I0509 09:06:40.609668 10579 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 9] Add peer "172.16.10.53":9780
I0509 09:06:40.609773 10579 NebulaStore.cpp:146] Load part 2, 9 from disk
I0509 09:06:41.192564 10580 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 5] lastLogId in wal is 13408737, lastLogTerm is 55, path is /data/storage/nebula/2/wal/5/0000000000013036282.wal
I0509 09:06:41.192863 10580 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 5] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 13408737, lastLogTerm 55, committedLogId 13408736, term 55
I0509 09:06:41.192885 10580 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 5] Add peer "172.16.10.53":9780
I0509 09:06:41.192943 10580 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 5] Add peer "172.16.10.54":9780
I0509 09:06:41.193236 10580 NebulaStore.cpp:146] Load part 2, 5 from disk
I0509 09:06:41.241335 10579 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 14] lastLogId in wal is 11595154, lastLogTerm is 285, path is /data/storage/nebula/2/wal/14/0000000000011559606.wal
I0509 09:06:41.241525 10579 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 14] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11595154, lastLogTerm 285, committedLogId 11595153, term 285
I0509 09:06:41.241549 10579 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 14] Add peer "172.16.10.55":9780
I0509 09:06:41.241582 10579 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 14] Add peer "172.16.10.54":9780
I0509 09:06:41.241762 10579 NebulaStore.cpp:146] Load part 2, 14 from disk
I0509 09:06:41.408114 10577 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 1] lastLogId in wal is 11619935, lastLogTerm is 233, path is /data/storage/nebula/2/wal/1/0000000000011247381.wal
I0509 09:06:41.408421 10577 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 1] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11619935, lastLogTerm 233, committedLogId 11619934, term 233
I0509 09:06:41.408449 10577 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 1] Add peer "172.16.10.53":9780
I0509 09:06:41.408497 10577 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 1] Add peer "172.16.10.54":9780
I0509 09:06:41.408808 10577 NebulaStore.cpp:146] Load part 2, 1 from disk
I0509 09:06:41.722622 10578 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 3] lastLogId in wal is 11631989, lastLogTerm is 640, path is /data/storage/nebula/2/wal/3/0000000000011257904.wal
I0509 09:06:41.722899 10578 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 3] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11631989, lastLogTerm 640, committedLogId 11631988, term 640
I0509 09:06:41.722918 10578 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 3] Add peer "172.16.10.54":9780
I0509 09:06:41.722990 10578 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 3] Add peer "172.16.10.55":9780
I0509 09:06:41.723218 10578 NebulaStore.cpp:146] Load part 2, 3 from disk
I0509 09:06:43.890496 10579 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 20] lastLogId in wal is 13340621, lastLogTerm is 45, path is /data/storage/nebula/2/wal/20/0000000000013123807.wal
I0509 09:06:43.890710 10579 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 20] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 13340621, lastLogTerm 45, committedLogId 13340620, term 45
I0509 09:06:43.890735 10579 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 20] Add peer "172.16.10.53":9780
I0509 09:06:43.890769 10579 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 20] Add peer "172.16.10.55":9780
I0509 09:06:43.890868 10579 NebulaStore.cpp:146] Load part 2, 20 from disk
W0509 09:06:44.194176 10555 RaftexService.cpp:165] Cannot find the part 6 in the graph space 2
I0509 09:06:44.199333 10555 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 5] Recieved a VOTING request: space = 2, partition = 5, candidateAddr = 172.16.10.53:9780, term = 56, lastLogId = 13408796, lastLogTerm = 55
I0509 09:06:44.199405 10555 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 5] The partition currently is a Follower, lastLogId 13408737, lastLogTerm 55, committedLogId 13408736, term 55
I0509 09:06:44.199487 10555 RaftPart.cpp:1329] [Port: 9780, Space: 2, Part: 5] The partition will vote for the candidate "172.16.10.53":9780
I0509 09:06:44.202205 10574 RaftPart.cpp:1613] [Port: 9780, Space: 2, Part: 5] The current role is Follower. Will follow the new leader 172.16.10.53:9780 [Term: 56]
I0509 09:06:44.203068 10556 RaftPart.cpp:1490] [Port: 9780, Space: 2, Part: 5] Local is missing logs from id 13408737. Need to catch up
I0509 09:06:45.288973 10578 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 8] lastLogId in wal is 11619104, lastLogTerm is 142, path is /data/storage/nebula/2/wal/8/0000000000011307776.wal
I0509 09:06:45.289157 10578 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 8] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11619104, lastLogTerm 142, committedLogId 11619103, term 142
I0509 09:06:45.289175 10578 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 8] Add peer "172.16.10.54":9780
I0509 09:06:45.289206 10578 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 8] Add peer "172.16.10.55":9780
I0509 09:06:45.289320 10578 NebulaStore.cpp:146] Load part 2, 8 from disk
I0509 09:06:45.413447 10577 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 6] lastLogId in wal is 11615007, lastLogTerm is 368, path is /data/storage/nebula/2/wal/6/0000000000011338508.wal
I0509 09:06:45.413671 10577 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 6] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11615007, lastLogTerm 368, committedLogId 11615006, term 368
I0509 09:06:45.413693 10577 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 6] Add peer "172.16.10.53":9780
I0509 09:06:45.413718 10577 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 6] Add peer "172.16.10.54":9780
I0509 09:06:45.413805 10577 NebulaStore.cpp:146] Load part 2, 6 from disk
I0509 09:06:45.654750 10555 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 4] Recieved a VOTING request: space = 2, partition = 4, candidateAddr = 172.16.10.55:9780, term = 20638, lastLogId = 11537357, lastLogTerm = 20637
I0509 09:06:45.654814 10555 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 4] The partition currently is a Follower, lastLogId 11592752, lastLogTerm 20638, committedLogId 11592751, term 20638
I0509 09:06:45.654825 10555 RaftPart.cpp:1280] [Port: 9780, Space: 2, Part: 4] The partition currently is on term 20638. The term proposed by the candidate is no greater, so it will be rejected
I0509 09:06:45.656147 10555 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 9] Recieved a VOTING request: space = 2, partition = 9, candidateAddr = 172.16.10.55:9780, term = 20669, lastLogId = 11542782, lastLogTerm = 20668
I0509 09:06:45.656208 10555 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 9] The partition currently is a Follower, lastLogId 11598177, lastLogTerm 20669, committedLogId 11598177, term 20669
I0509 09:06:45.656227 10555 RaftPart.cpp:1280] [Port: 9780, Space: 2, Part: 9] The partition currently is on term 20669. The term proposed by the candidate is no greater, so it will be rejected
I0509 09:06:46.154844 10580 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 10] lastLogId in wal is 11634472, lastLogTerm is 105, path is /data/storage/nebula/2/wal/10/0000000000011260958.wal
I0509 09:06:46.155035 10580 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 10] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11634472, lastLogTerm 105, committedLogId 11634472, term 105
I0509 09:06:46.155052 10580 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 10] Add peer "172.16.10.53":9780
I0509 09:06:46.155095 10580 RaftPart.cpp:285] [Port: 9780, Space: 2, Part: 10] Add peer "172.16.10.55":9780
I0509 09:06:46.155236 10580 NebulaStore.cpp:146] Load part 2, 10 from disk
I0509 09:06:47.585716 10556 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 9] Recieved a VOTING request: space = 2, partition = 9, candidateAddr = 172.16.10.55:9780, term = 20670, lastLogId = 11542782, lastLogTerm = 20668
I0509 09:06:47.585831 10556 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 9] The partition currently is a Follower, lastLogId 11598177, lastLogTerm 20669, committedLogId 11598177, term 20669
I0509 09:06:47.585850 10556 RaftPart.cpp:1292] [Port: 9780, Space: 2, Part: 9] The partition's last term to receive a log is 20669, which is newer than the candidate's log 20668. So the candidate will be rejected
I0509 09:06:47.909714 10554 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 10] Recieved a VOTING request: space = 2, partition = 10, candidateAddr = 172.16.10.53:9780, term = 106, lastLogId = 11634532, lastLogTerm = 105
I0509 09:06:47.909816 10554 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 10] The partition currently is a Follower, lastLogId 11634472, lastLogTerm 105, committedLogId 11634472, term 105
I0509 09:06:47.909844 10554 RaftPart.cpp:1329] [Port: 9780, Space: 2, Part: 10] The partition will vote for the candidate "172.16.10.53":9780
I0509 09:06:47.911628 10567 RaftPart.cpp:1613] [Port: 9780, Space: 2, Part: 10] The current role is Follower. Will follow the new leader 172.16.10.53:9780 [Term: 106]
I0509 09:06:47.911828 10554 RaftPart.cpp:1490] [Port: 9780, Space: 2, Part: 10] Local is missing logs from id 11634472. Need to catch up
I0509 09:06:47.919452 10554 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 4] Recieved a VOTING request: space = 2, partition = 4, candidateAddr = 172.16.10.53:9780, term = 20639, lastLogId = 11592811, lastLogTerm = 20638
I0509 09:06:47.919495 10554 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 4] The partition currently is a Follower, lastLogId 11592752, lastLogTerm 20638, committedLogId 11592751, term 20638
I0509 09:06:47.919517 10554 RaftPart.cpp:1329] [Port: 9780, Space: 2, Part: 4] The partition will vote for the candidate "172.16.10.53":9780
I0509 09:06:47.920217 10554 RaftPart.cpp:1613] [Port: 9780, Space: 2, Part: 4] The current role is Follower. Will follow the new leader 172.16.10.53:9780 [Term: 20639]
I0509 09:06:47.920282 10554 RaftPart.cpp:1490] [Port: 9780, Space: 2, Part: 4] Local is missing logs from id 11592752. Need to catch up
I0509 09:06:47.922717 10575 RaftPart.cpp:1613] [Port: 9780, Space: 2, Part: 6] The current role is Follower. Will follow the new leader 172.16.10.53:9780 [Term: 369]
I0509 09:06:47.950902 10554 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 20] Recieved a VOTING request: space = 2, partition = 20, candidateAddr = 172.16.10.55:9780, term = 45, lastLogId = 13285223, lastLogTerm = 44
I0509 09:06:47.950954 10554 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 20] The partition currently is a Follower, lastLogId 13340621, lastLogTerm 45, committedLogId 13340620, term 45
I0509 09:06:47.950973 10554 RaftPart.cpp:1280] [Port: 9780, Space: 2, Part: 20] The partition currently is on term 45. The term proposed by the candidate is no greater, so it will be rejected
I0509 09:06:48.135762 10578 FileBasedWal.cpp:66] [Port: 9780, Space: 2, Part: 13] lastLogId in wal is 11578248, lastLogTerm is 615, path is /data/storage/nebula/2/wal/13/0000000000011381826.wal
I0509 09:06:48.135965 10578 RaftPart.cpp:278] [Port: 9780, Space: 2, Part: 13] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 11578248, lastLogTerm 615, committedLogId 11578247, term 615
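The vote handling visible in the log above can be read as a short chain of checks. The following is a minimal Python sketch of those checks, reconstructed only from the log messages; it is not the actual RaftPart.cpp code. The names `LogState` and `decide_vote` are hypothetical. The third check (comparing last log ids when the last log terms are equal) follows the standard Raft rule and matches a RaftPart.cpp:1302 message that appears in a later excerpt of this thread; whether storaged applies it in exactly this form is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LogState:
    term: int           # current term (local) or proposed term (candidate)
    last_log_term: int  # term of the last log entry
    last_log_id: int    # id of the last log entry


def decide_vote(local: LogState, candidate: LogState) -> str:
    """Return "vote" or a rejection reason, following the log messages."""
    # "The term proposed by the candidate is no greater, so it will be rejected"
    if candidate.term <= local.term:
        return "reject: candidate term is no greater"
    # "The partition's last term to receive a log is ... newer than the candidate's"
    if candidate.last_log_term < local.last_log_term:
        return "reject: local last log term is newer"
    # Assumed standard Raft tie-break: same last term, shorter candidate log.
    if (candidate.last_log_term == local.last_log_term
            and candidate.last_log_id < local.last_log_id):
        return "reject: candidate last log id is smaller"
    return "vote"


# Values copied from the Part 5 entries in the log above.
part5_local = LogState(term=55, last_log_term=55, last_log_id=13408737)
part5_candidate = LogState(term=56, last_log_term=55, last_log_id=13408796)
print(decide_vote(part5_local, part5_candidate))  # -> vote
```

Read this way, the rejections of 172.16.10.55 for parts 4, 9, and 20 suggest that host's copies were behind the local ones, while the requests from 172.16.10.53 were accepted.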

Current state:
On 52, 53, and 54, storaged is saturating the CPU; 55 is normal.

How did you change the IPs? Did you update the information in meta?

I only used
curl -G "http://172.16.10.53:19559/replace?from=172.16.10.32&to=172.16.10.52"
to change the information.
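Since four hosts changed address, the same request has to be issued once per host. Below is a small Python sketch that only builds and prints the curl commands without sending anything. The meta HTTP address and the `/replace?from=...&to=...` form are taken from the command above; the pairing of 33, 34, and 35 with 53, 54, and 55 is an assumption based on the order of the two lists in the question, and `replace_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# metad HTTP address, as used in the command above.
META_HTTP = "http://172.16.10.53:19559"

# Old -> new storage host mapping. Only 32 -> 52 appears in the thread;
# the other three pairs are an assumption based on list order.
IP_MAP = {
    "172.16.10.32": "172.16.10.52",
    "172.16.10.33": "172.16.10.53",
    "172.16.10.34": "172.16.10.54",
    "172.16.10.35": "172.16.10.55",
}


def replace_urls(meta_http, ip_map):
    """Build one /replace URL per old -> new host pair."""
    return [
        f"{meta_http}/replace?{urlencode({'from': old, 'to': new})}"
        for old, new in ip_map.items()
    ]


for url in replace_urls(META_HTTP, IP_MAP):
    # Quote the URL so the shell does not interpret '&'.
    print(f'curl -G "{url}"')
```

Printing the commands instead of running them leaves room to review each pair before touching meta.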

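The cluster ran overnight and data can now be queried, but storaged on the 53 machine is still printing error logs. The other machines seem to be back to normal.

OK, so that is the meta change. You also updated the configuration files accordingly, right?

Yes, the configuration files were updated as well. One machine is still printing error logs: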

E0509 09:06:48.834008 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 21] Receive response about askForVote from "172.16.10.54":9780, error code is E_UNKNOWN_PART
E0509 09:06:48.834106 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 21] Receive response about askForVote from "172.16.10.52":9780, error code is E_UNKNOWN_PART
E0509 09:06:53.573823 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0509 09:06:53.573925 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0509 09:08:04.428822 30913 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 24] Receive response about askForVote from "172.16.10.55":9780, error code is E_UNKNOWN_PART
E0509 09:34:29.919396 30916 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0509 09:43:18.202214 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 11] Receive response about askForVote from "172.16.10.52":9780, error code is E_TERM_OUT_OF_DATE
E0509 09:44:49.221140 30914 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 21] Receive response about askForVote from "172
...skipping...
.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:11.345296 30916 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:11.497965 30915 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:11.498136 30915 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:11.552317 30914 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:11.552429 30914 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:11.677139 30916 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:11.677261 30916 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:12.165814 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:12.165889 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:12.171233 30916 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:12.171276 30916 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:12.584167 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:12.584223 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:12.654099 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:12.654146 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:13.164574 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:13.164701 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:13.273356 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:13.273474 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:14.149354 30916 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:14.149430 30916 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:14.356637 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:14.356752 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:14.820452 30916 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:14.820506 30916 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:15.688313 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:15.688366 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:16.735850 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:16.735909 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:18.713814 30916 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 01:39:18.713883 30916 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
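With this many repeated entries, it may be easier to see which partitions are stuck by counting the askForVote errors per space, part, peer, and error code. The following is a rough Python sketch, assuming each log entry sits on a single line (pager-wrapped entries would need to be joined first). The regular expression is written against the message format shown above, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the askForVote error entries in the format shown above.
VOTE_ERR = re.compile(
    r'Space: (\d+), Part: (\d+)\] Receive response about askForVote '
    r'from "([\d.]+)":\d+, error code is (\w+)'
)


def summarize(lines):
    """Count askForVote errors per (space, part, peer, error code)."""
    counts = Counter()
    for line in lines:
        m = VOTE_ERR.search(line)
        if m:
            space, part, peer, code = m.groups()
            counts[(int(space), int(part), peer, code)] += 1
    return counts


# Three entries copied from the log above.
sample = [
    'E0509 09:06:48.834008 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 21] Receive response about askForVote from "172.16.10.54":9780, error code is E_UNKNOWN_PART',
    'E0509 09:06:53.573823 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE',
    'E0509 09:06:53.573925 30915 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE',
]

for key, n in summarize(sample).most_common():
    print(key, n)
```

On a real file, passing `open("nebula-storaged.ERROR")` (path is an assumption) instead of `sample` would give the same summary for the whole log.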

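How did you remove the machine in the first place? Was the data balanced? From the logs, leader election is failing, and it looks like one replica is missing.

At the time, I removed it with the BALANCE DATA REMOVE 172.16.10.31:9779 command.

Could it be that the balance did not finish?

The balance finished. After the removal, the cluster ran normally for a long time.
However, the 35 machine ran out of memory and was down for a while, which was only discovered recently.

Is the current cluster state considered normal? Is there anything else I need to do?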

nebula-storaged.INFO

I0510 02:04:34.084760 30914 Part.cpp:209] [Port: 9780, Space: 10, Part: 7] Find the new leader "172.16.10.54":9780
I0510 02:04:34.177819 30865 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:34.177932 30865 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:34.178006 30879 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:34.178038 30865 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 6] total time:1763ms, Total send logs: 2
I0510 02:04:34.178136 30865 SlowOpTracker.h:31] [Port: 9780, Space: 10, Part: 6] total time:1763ms, Total send logs: 2
I0510 02:04:34.178200 30865 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 6] total time:1763ms, Total send logs: 2
I0510 02:04:34.178340 30865 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 20] Recieved a VOTING request: space = 2, partition = 20, candidateAddr = 172.16.10.55:9780, term = 113, lastLogId = 13341338, lastLogTerm = 112
I0510 02:04:34.178357 30865 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 20] The partition currently is a Leader, lastLogId 13342118, lastLogTerm 112, committedLogId 13342118, term 112
I0510 02:04:34.178369 30865 RaftPart.cpp:1302] [Port: 9780, Space: 2, Part: 20] The partition's last log id is 13342118. The candidate's last log id 13341338 is smaller, so it will be rejected, candidate is "172.16.10.55":9780
I0510 02:04:35.989073 30879 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:35.989173 30879 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:35.989248 30862 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:37.565536 30862 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:37.565634 30862 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:37.565734 30863 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:39.102746 30863 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:39.102890 30863 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:39.103089 30891 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:40.289386 30914 RaftPart.cpp:957] [Port: 9780, Space: 10, Part: 22] Start leader election, reason: lastMsgDur 36740, term 356
I0510 02:04:40.289536 30914 RaftPart.cpp:1095] [Port: 9780, Space: 10, Part: 22] Sending out an election request (space = 10, part = 22, term = 357, lastLogId = 739512, lastLogTerm = 93, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 02:04:40.290865 30914 RaftPart.cpp:1050] [Port: 9780, Space: 10, Part: 22] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 02:04:40.291004 30914 RaftPart.cpp:1050] [Port: 9780, Space: 10, Part: 22] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 02:04:40.291029 30914 RaftPart.cpp:1162] [Port: 9780, Space: 10, Part: 22] No one is elected, continue the election
I0510 02:04:40.676642 30914 RaftPart.cpp:957] [Port: 9780, Space: 4, Part: 7] Start leader election, reason: lastMsgDur 37128, term 378
I0510 02:04:40.676925 30914 RaftPart.cpp:1095] [Port: 9780, Space: 4, Part: 7] Sending out an election request (space = 4, part = 7, term = 379, lastLogId = 1341324, lastLogTerm = 133, candidateIP = 172.16.10.53, candidatePort = 9780)
I0510 02:04:41.300149 30891 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:41.300243 30891 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:41.300339 30877 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:41.415724 30913 RaftPart.cpp:1095] [Port: 9780, Space: 10, Part: 22] Sending out an election request (space = 10, part = 22, term = 358, lastLogId = 739512, lastLogTerm = 93, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 02:04:41.417413 30913 RaftPart.cpp:1050] [Port: 9780, Space: 10, Part: 22] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 02:04:41.417484 30913 RaftPart.cpp:1050] [Port: 9780, Space: 10, Part: 22] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 02:04:41.417507 30913 RaftPart.cpp:1162] [Port: 9780, Space: 10, Part: 22] No one is elected, continue the election
I0510 02:04:42.495828 30877 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:42.495932 30877 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:42.496019 30876 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is dif
ferent from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:42.496074 30877 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 21] total time:1196ms, Total send logs: 2
I0510 02:04:42.496588 30877 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 25] total time:1196ms, Total send logs: 2
I0510 02:04:42.496671 30877 SlowOpTracker.h:31] [Port: 9780, Space: 6, Part: 6] total time:1196ms, Total send logs: 2
I0510 02:04:42.496738 30877 SlowOpTracker.h:31] [Port: 9780, Space: 10, Part: 21] total time:1196ms, Total send logs: 2
I0510 02:04:42.496853 30877 SlowOpTracker.h:31] [Port: 9780, Space: 10, Part: 6] total time:1195ms, Total send logs: 2
I0510 02:04:42.496912 30877 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 5] total time:1197ms, Total send logs: 2
I0510 02:04:42.496977 30877 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 4] total time:1197ms, Total send logs: 2
I0510 02:04:42.497026 30877 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 6] total time:1196ms, Total send logs: 2
I0510 02:04:42.497073 30877 SlowOpTracker.h:31] [Port: 9780, Space: 4, Part: 15] total time:1197ms, Total send logs: 2
I0510 02:04:42.497150 30877 SlowOpTracker.h:31] [Port: 9780, Space: 4, Part: 6] total time:1197ms, Total send logs: 2
I0510 02:04:42.497200 30877 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 5] total time:1197ms, Total send logs: 2
I0510 02:04:42.497247 30877 SlowOpTracker.h:31] [Port: 9780, Space: 10, Part: 25] total time:1197ms, Total send logs: 2
I0510 02:04:42.497296 30877 SlowOpTracker.h:31] [Port: 9780, Space: 6, Part: 9] total time:1197ms, Total send logs: 2
I0510 02:04:42.497342 30877 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 9] total time:1196ms, Total send logs: 2
I0510 02:04:42.497386 30877 SlowOpTracker.h:31] [Port: 9780, Space: 6, Part: 24] total time:1197ms, Total send logs: 2
I0510 02:04:42.497433 30877 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 24] total time:1197ms, Total send logs: 2
I0510 02:04:42.497478 30877 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 9] total time:1197ms, Total send logs: 2
I0510 02:04:42.497550 30877 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 20] total time:1197ms, Total send logs: 2
I0510 02:04:42.497596 30877 SlowOpTracker.h:31] [Port: 9780, Space: 6, Part: 10] total time:1197ms, Total send logs: 2
I0510 02:04:42.497640 30877 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 10] total time:1196ms, Total send logs: 2
I0510 02:04:44.299540 30876 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:44.299703 30876 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:44.299896 30867 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:45.830327 30867 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:45.830498 30867 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:45.830869 30885 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:47.036535 30885 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:47.036643 30885 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:47.036749 30892 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:48.560137 30892 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:48.560257 30892 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:48.560413 30880 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:48.560634 30892 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 16] Recieved a VOTING request: space = 2, partition = 16, candidateAddr = 172.16.10.54:9780, term = 509, lastLogId = 11600527, lastLogTerm = 508
I0510 02:04:48.560672 30892 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 16] The partition currently is a Leader, lastLogId 11601234, lastLogTerm 508, committedLogId 11601234, term 508
I0510 02:04:48.560700 30892 RaftPart.cpp:1302] [Port: 9780, Space: 2, Part: 16] The partition's last log id is 11601234. The candidate's last log id 11600527 is smaller, so it will be rejected, candidate is "172.16.10.54":9780
I0510 02:04:48.650382 30911 RaftPart.cpp:1613] [Port: 9780, Space: 2, Part: 18] The current role is Candidate. Will follow the new leader 172.16.10.55:9780 [Term: 888]
I0510 02:04:50.993729 30880 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:50.993942 30880 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:50.994139 30864 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:52.689869 30864 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:52.689982 30864 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:52.690100 30869 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:52.690174 30864 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 10] total time:1695ms, Total send logs: 2
I0510 02:04:52.690558 30864 SlowOpTracker.h:31] [Port: 9780, Space: 4, Part: 15] total time:1697ms, Total send logs: 2
I0510 02:04:52.690657 30864 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 9] total time:1697ms, Total send logs: 2
I0510 02:04:52.690732 30864 SlowOpTracker.h:31] [Port: 9780, Space: 10, Part: 21] total time:1697ms, Total send logs: 2
I0510 02:04:52.690825 30864 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 9] total time:1696ms, Total send logs: 2
I0510 02:04:52.690891 30864 SlowOpTracker.h:31] [Port: 9780, Space: 10, Part: 25] total time:1696ms, Total send logs: 2
I0510 02:04:52.690940 30864 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 5] total time:1697ms, Total send logs: 2
I0510 02:04:52.690990 30864 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 25] total time:1697ms, Total send logs: 2
I0510 02:04:52.691052 30864 SlowOpTracker.h:31] [Port: 9780, Space: 6, Part: 6] total time:1696ms, Total send logs: 2
I0510 02:04:53.940541 30869 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:53.940686 30869 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:53.940791 30875 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:54.181964 30916 RaftPart.cpp:957] [Port: 9780, Space: 10, Part: 17] Start leader election, reason: lastMsgDur 30052, term 361
I0510 02:04:54.182070 30916 RaftPart.cpp:1095] [Port: 9780, Space: 10, Part: 17] Sending out an election request (space = 10, part = 17, term = 362, lastLogId = 739528, lastLogTerm = 97, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 02:04:54.183816 30916 RaftPart.cpp:1050] [Port: 9780, Space: 10, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 02:04:54.183984 30916 RaftPart.cpp:1050] [Port: 9780, Space: 10, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 02:04:54.184021 30916 RaftPart.cpp:1162] [Port: 9780, Space: 10, Part: 17] No one is elected, continue the election
I0510 02:04:54.232506 30913 RaftPart.cpp:957] [Port: 9780, Space: 7, Part: 7] Start leader election, reason: lastMsgDur 30100, term 393
I0510 02:04:54.232584 30913 RaftPart.cpp:1095] [Port: 9780, Space: 7, Part: 7] Sending out an election request (space = 7, part = 7, term = 394, lastLogId = 1188118, lastLogTerm = 116, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 02:04:54.234133 30913 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 02:04:54.234192 30913 RaftPart.cpp:1050] [Port: 9780, Space: 7, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
I0510 02:04:54.234215 30913 RaftPart.cpp:1162] [Port: 9780, Space: 7, Part: 7] No one is elected, continue the election
I0510 02:04:55.448468 30875 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:55.448565 30875 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:55.448632 30868 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:57.540938 30868 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:57.541049 30868 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:57.541136 30884 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:04:59.034153 30884 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:04:59.034226 30884 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:04:59.034397 30884 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:05:00.242020 30884 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:05:00.242138 30884 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:05:00.242242 30886 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:05:01.746733 30886 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:05:01.746847 30886 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:05:01.746924 30883 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:05:02.977243 30883 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:05:02.977357 30883 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:05:02.977478 30883 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 5] total time:1231ms, Total send logs: 2
I0510 02:05:02.977500 30890 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:05:02.977584 30883 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 9] total time:1230ms, Total send logs: 2
I0510 02:05:02.977633 30883 SlowOpTracker.h:31] [Port: 9780, Space: 7, Part: 10] total time:1231ms, Total send logs: 2
I0510 02:05:02.977702 30883 SlowOpTracker.h:31] [Port: 9780, Space: 4, Part: 15] total time:1231ms, Total send logs: 2
I0510 02:05:02.977763 30883 SlowOpTracker.h:31] [Port: 9780, Space: 10, Part: 21] total time:1231ms, Total send logs: 2
I0510 02:05:02.977854 30883 SlowOpTracker.h:31] [Port: 9780, Space: 2, Part: 9] total time:1231ms, Total send logs: 2
I0510 02:05:04.513417 30890 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:05:04.513517 30890 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:05:04.513581 30870 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 02:05:06.101167 30870 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 02:05:06.101289 30870 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 02:05:06.101377 30888 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994

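If the node was down for longer than wal_ttl, it will most likely have to pull the data again. I'm not sure whether the cluster is currently in a healthy state. If a data import is running, pause it for now and check again once the data on every node has recovered.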
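
One way to judge whether a node is actually recovering or just repeating itself is to count how often each partition rolls back to the same log id. The following is a minimal Python sketch, assuming the storaged log line format shown in the log above; the function names and the threshold of 3 are illustrative choices, not part of NebulaGraph.

```python
import re
from collections import Counter

# Matches lines such as:
# I0510 02:04:37.565536 30862 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
ROLLBACK_RE = re.compile(
    r"\[Port: \d+, Space: (\d+), Part: (\d+)\] Rollback to log (\d+)"
)


def count_rollbacks(lines):
    """Count rollbacks per (space, part, target log id)."""
    counts = Counter()
    for line in lines:
        match = ROLLBACK_RE.search(line)
        if match:
            space, part, log_id = (int(g) for g in match.groups())
            counts[(space, part, log_id)] += 1
    return counts


def looping_parts(lines, threshold=3):
    """Return (space, part, log id) keys seen at least `threshold` times.

    The threshold is an arbitrary illustrative value.
    """
    return sorted(k for k, n in count_rollbacks(lines).items() if n >= threshold)


# Sample lines copied from the log above.
sample = [
    "I0510 02:04:37.565536 30862 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994",
    "I0510 02:04:39.102746 30863 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994",
    "I0510 02:04:41.300149 30891 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994",
    "I0510 02:04:40.291029 30914 RaftPart.cpp:1162] [Port: 9780, Space: 10, Part: 22] No one is elected, continue the election",
]

print(looping_parts(sample))
```

To run it against a real log, pass an open file instead of `sample`, for example `looping_parts(open("nebula-storaged.INFO"))`. A partition that keeps rolling back to the same log id, as Space 2 Part 1 does here, is probably not making progress.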


OK, thanks a lot. I'll wait a while longer and see.

Right. You can watch whether data is being synchronized between the nodes.

Hi, how can I tell whether data is being synchronized?

Look at the network traffic, and check whether the logs contain any "send snapshot" or "receive snapshot" messages.
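
The log check can be scripted. Below is a rough Python sketch that tallies snapshot send results per partition and records the timestamp of the most recent result. It assumes the sender-side messages read "Send snapshot failed!" and "Send snapshot succeeded!" with a glog-style prefix, which is how they appear in this thread's storaged logs; the function name and the result layout are made up for illustration.

```python
import re

# Partition tag, e.g. "[Port: 9780, Space: 2, Part: 20]"
PART_RE = re.compile(r"\[Port: \d+, Space: (\d+), Part: (\d+)\]")
# glog prefix, e.g. "I0509 13:06:19.817755" (month and day, then time of day)
STAMP_RE = re.compile(r"[IWEF](\d{4}) (\d{2}:\d{2}:\d{2})\.\d+")


def snapshot_outcomes(lines):
    """Tally snapshot send results per (space, part).

    "last" holds the timestamp of the last result line seen in input order.
    """
    outcomes = {}
    for line in lines:
        part = PART_RE.search(line)
        if part is None:
            continue
        if "Send snapshot failed!" in line:
            field = "failed"
        elif "Send snapshot succeeded!" in line:
            field = "succeeded"
        else:
            continue
        key = (int(part.group(1)), int(part.group(2)))
        entry = outcomes.setdefault(
            key, {"failed": 0, "succeeded": 0, "last": None}
        )
        entry[field] += 1
        stamp = STAMP_RE.search(line)
        if stamp:
            entry["last"] = f"{stamp.group(1)} {stamp.group(2)}"
    return outcomes


# Sample lines in the format produced by grep on the storaged log.
sample = [
    "./nebula-storaged.INFO:I0509 12:49:23.172631   827 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!",
    "./nebula-storaged.INFO:I0509 13:06:19.817755   551 Host.cpp:412] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot succeeded!",
    "./nebula-storaged.INFO:I0509 16:28:17.813441   827 Host.cpp:412] [Port: 9780, Space: 6, Part: 7] [Host: 172.16.10.54:9780] Send snapshot succeeded!",
]

for key, entry in sorted(snapshot_outcomes(sample).items()):
    print(key, entry)
```

If the "last" value for every partition is from a previous day, no snapshot transfer has finished recently, which matches the low network traffic symptom.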

The logs only contain sync records from yesterday; there are none from today :sob:
Machine 53

[root@nebula-graph-node3 logs]# grep 'snapshot' ./*
./nebula-metad.INFO:I0509 09:02:51.749081 30559 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 8388608 for each part by default
./nebula-metad.nebula-graph-node3.graph.geyecloud.com.root.log.INFO.20220509-090251.30559:I0509 09:02:51.749081 30559 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 8388608 for each part by default
./nebula-storaged.INFO:I0509 09:06:25.469946 30854 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 8388608 for each part by default
./nebula-storaged.INFO:I0509 10:46:40.482369 30901 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 13286385 in wal, send the snapshot, logIdToSend = 13340718, firstLogId in wal = 13289872, lastLogId in wal = 13340718
./nebula-storaged.INFO:I0509 10:46:40.520201   547 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340717, commitLogTerm = 48
./nebula-storaged.INFO:I0509 10:46:40.520475   547 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 10:46:40.520712   547 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 10:46:40.522856 30901 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340717
./nebula-storaged.INFO:I0509 10:50:31.995023 30895 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340719, firstLogId in wal = 13289872, lastLogId in wal = 13340719
./nebula-storaged.INFO:I0509 10:50:32.018021   551 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340718, commitLogTerm = 48
./nebula-storaged.INFO:I0509 10:50:32.018137   551 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 10:50:32.018262   551 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 12:20:03.029166 30897 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340720, firstLogId in wal = 13289872, lastLogId in wal = 13340720
./nebula-storaged.INFO:I0509 12:20:03.052956   827 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340719, commitLogTerm = 50
./nebula-storaged.INFO:I0509 12:20:03.053051   827 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 12:20:03.053169   827 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 12:20:07.253854 30897 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340719
./nebula-storaged.INFO:I0509 12:25:38.553382 30899 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340721, firstLogId in wal = 13289872, lastLogId in wal = 13340721
./nebula-storaged.INFO:I0509 12:25:38.554544   838 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340720, commitLogTerm = 52
./nebula-storaged.INFO:I0509 12:25:38.554615   838 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 12:25:38.554702   838 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 12:25:38.555092 30899 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340720
./nebula-storaged.INFO:I0509 12:29:38.476153 30906 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340722, firstLogId in wal = 13289872, lastLogId in wal = 13340722
./nebula-storaged.INFO:I0509 12:29:38.476511   547 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340721, commitLogTerm = 52
./nebula-storaged.INFO:I0509 12:29:38.476565   547 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 12:29:38.477155 30906 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340721
./nebula-storaged.INFO:I0509 12:30:57.759305   547 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 12:37:40.257385 30904 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340727, firstLogId in wal = 13289872, lastLogId in wal = 13340727
./nebula-storaged.INFO:I0509 12:37:40.257611   551 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340726, commitLogTerm = 52
./nebula-storaged.INFO:I0509 12:37:40.257705   551 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 12:37:40.257814   551 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 12:37:40.258205 30904 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340726
./nebula-storaged.INFO:I0509 12:45:20.817873 30901 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340735, firstLogId in wal = 13289872, lastLogId in wal = 13340735
./nebula-storaged.INFO:I0509 12:45:20.818670   827 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340734, commitLogTerm = 52
./nebula-storaged.INFO:I0509 12:45:20.818866   827 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 12:45:20.819682 30901 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340734
./nebula-storaged.INFO:I0509 12:49:23.172631   827 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 12:55:56.576490 30902 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340742, firstLogId in wal = 13289872, lastLogId in wal = 13340742
./nebula-storaged.INFO:I0509 12:55:56.576918   838 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340741, commitLogTerm = 52
./nebula-storaged.INFO:I0509 12:55:56.576994   838 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 12:55:56.577095   838 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 12:55:56.577415 30902 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340741
./nebula-storaged.INFO:I0509 12:58:04.333487 30904 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340745, firstLogId in wal = 13289872, lastLogId in wal = 13340745
./nebula-storaged.INFO:I0509 12:58:04.334298   547 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340744, commitLogTerm = 52
./nebula-storaged.INFO:I0509 12:58:04.334367 30904 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340744
./nebula-storaged.INFO:I0509 12:58:04.334439   547 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 12:58:04.334532   547 Host.cpp:414] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot failed!
./nebula-storaged.INFO:I0509 13:02:16.970922 30899 Host.cpp:404] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Can't find log 1 in wal, send the snapshot, logIdToSend = 13340750, firstLogId in wal = 13289872, lastLogId in wal = 13340750
./nebula-storaged.INFO:I0509 13:02:16.971418   551 SnapshotManager.cpp:43] [Port: 9780, Space: 2, Part: 20] Begin to send the snapshot, commitLogId = 13340749, commitLogTerm = 52
./nebula-storaged.INFO:I0509 13:02:16.971549   551 NebulaSnapshotManager.cpp:38] Space 2 Part 20 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 13:02:16.972431 30899 Host.cpp:282] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 13340749
./nebula-storaged.INFO:I0509 13:06:19.817755   551 Host.cpp:412] [Port: 9780, Space: 2, Part: 20] [Host: 172.16.10.55:9780] Send snapshot succeeded!
./nebula-storaged.INFO:I0509 16:28:15.057139 30911 Host.cpp:404] [Port: 9780, Space: 6, Part: 7] [Host: 172.16.10.54:9780] Can't find log 1024306 in wal, send the snapshot, logIdToSend = 1024590, firstLogId in wal = 1024315, lastLogId in wal = 1024590
./nebula-storaged.INFO:I0509 16:28:15.057669   827 SnapshotManager.cpp:43] [Port: 9780, Space: 6, Part: 7] Begin to send the snapshot, commitLogId = 1024590, commitLogTerm = 110
./nebula-storaged.INFO:I0509 16:28:15.057757   827 NebulaSnapshotManager.cpp:38] Space 6 Part 7 start send snapshot, rate limited to 8388608, batch size is 524288
./nebula-storaged.INFO:I0509 16:28:15.066144 30911 Host.cpp:282] [Port: 9780, Space: 6, Part: 7] [Host: 172.16.10.54:9780] The host is waiting for the snapshot, so we need to send log from current committedLogId 1024589
./nebula-storaged.INFO:I0509 16:28:17.813441   827 Host.cpp:412] [Port: 9780, Space: 6, Part: 7] [Host: 172.16.10.54:9780] Send snapshot succeeded!

Machine 55

[root@nebula-graph-node5 logs]# grep 'snapshot' ./*
./nebula-storaged.INFO:I0509 09:06:25.080827  9810 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 8388608 for each part by default
./nebula-storaged.INFO:I0509 10:46:40.512468  9832 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340717
./nebula-storaged.INFO:I0509 10:47:03.514458  9870 RaftPart.cpp:1030] [Port: 9780, Space: 2, Part: 20] The partition is still waitiong snapshot
./nebula-storaged.INFO:I0509 10:47:03.514561  9870 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 10:47:03.514578  9870 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 10:48:20.846554  9832 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340717
./nebula-storaged.INFO:I0509 10:49:33.589480  9870 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 10:49:33.589501  9870 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 10:50:41.490155  9821 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340718
./nebula-storaged.INFO:I0509 10:51:14.057572  9872 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 10:51:14.057600  9872 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 10:56:08.951915  9835 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340719
./nebula-storaged.INFO:I0509 10:56:34.685190  9870 RaftPart.cpp:1030] [Port: 9780, Space: 2, Part: 20] The partition is still waitiong snapshot
./nebula-storaged.INFO:I0509 10:56:34.685292  9870 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 10:56:34.685312  9870 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:20:07.241503  9831 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340719
./nebula-storaged.INFO:I0509 12:20:15.512737  9872 RaftPart.cpp:1030] [Port: 9780, Space: 2, Part: 20] The partition is still waitiong snapshot
./nebula-storaged.INFO:I0509 12:20:15.512871  9872 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:20:15.512892  9872 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:25:38.552585  9835 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340720
./nebula-storaged.INFO:I0509 12:27:13.107079  9869 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:27:13.107091  9869 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:29:38.474421  9833 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340721
./nebula-storaged.INFO:I0509 12:37:35.772302  9871 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:37:35.772317  9871 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:37:40.255578  9835 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340726
./nebula-storaged.INFO:I0509 12:44:35.876726  9871 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:44:35.876757  9871 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:45:20.816546  9833 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340734
./nebula-storaged.INFO:I0509 12:55:52.748584  9872 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:55:52.748597  9872 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:55:56.575637  9835 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340741
./nebula-storaged.INFO:I0509 12:57:37.635936  9871 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:57:37.635951  9871 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 12:58:04.332782  9844 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340744
./nebula-storaged.INFO:I0509 13:01:52.871423  9872 RaftPart.cpp:1200] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 13:01:52.871435  9872 RaftPart.cpp:1220] [Port: 9780, Space: 2, Part: 20] Clean up the snapshot
./nebula-storaged.INFO:I0509 13:02:16.970798  9848 RaftPart.cpp:1405] [Port: 9780, Space: 2, Part: 20] Begin to wait for the snapshot 13340749
./nebula-storaged.INFO:I0509 13:06:19.816519  9833 RaftPart.cpp:1750] [Port: 9780, Space: 2, Part: 20] Receive all snapsho, committedLogId_ 13340749, lastLodId 13340754, lastLogTermId 52
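
To put numbers on the snapshot waits in the excerpt above, each "Begin to wait for the snapshot" line can be paired with the next "Clean up the snapshot" or "Receive all snapsho..." line for the same partition. The following is a minimal sketch, assuming the glog line layout seen in this excerpt. The function name `snapshot_waits` and the placeholder year are illustrative and not part of Nebula.

```python
import re
from datetime import datetime

# glog prefix as seen in the excerpt: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
LINE_RE = re.compile(r"[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)")
PART_RE = re.compile(r"Space: (\d+), Part: (\d+)\]")
BEGIN_RE = re.compile(r"Begin to wait for the snapshot (\d+)")


def parse_time(mmdd, hms):
    # glog omits the year; 2000 is only a placeholder so the date parses.
    return datetime.strptime(f"2000{mmdd} {hms}", "%Y%m%d %H:%M:%S.%f")


def snapshot_waits(lines):
    """Return ((space, part), snapshot_id, seconds, outcome) for each closed wait."""
    results = []
    pending = {}  # (space, part) -> (snapshot_id, start_time)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = parse_time(m.group(1), m.group(2))
        msg = m.group(3)
        p = PART_RE.search(msg)
        if not p:
            continue
        key = (int(p.group(1)), int(p.group(2)))
        begin = BEGIN_RE.search(msg)
        if begin:
            pending[key] = (begin.group(1), ts)
        elif key in pending:
            if "Clean up the snapshot" in msg:
                outcome = "cleaned"
            elif "Receive all snapsho" in msg:
                outcome = "received"
            else:
                continue  # e.g. the "still waiting" notice
            snap_id, start = pending.pop(key)
            results.append((key, snap_id, (ts - start).total_seconds(), outcome))
    return results
```

Feeding it the lines of `nebula-storaged.INFO` (for example `snapshot_waits(open(path))`, where `path` is wherever the log lives) lists how long each attempt lasted and whether it ended in a cleanup or a completed snapshot.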

Network throughput is only a few KB/s.

So is the whole cluster working normally now? Are there still errors in the logs? Does show parts show three replicas for every part?

The cluster is still not in a healthy state.

show parts

show hosts

Machine 52 reports a normal status, but it does not appear in the show hosts output.

Machine 53 is still in its original state.


nebula-storaged.ERROR

E0510 11:00:09.089864 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:09.954449 30913 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:09.954509 30913 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:10.498185 30915 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:10.498241 30915 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:10.927518 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:10.927700 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 18] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:12.176103 30915 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:12.176321 30915 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:12.902981 30913 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:12.903111 30913 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:13.743458 30915 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:00:13.743618 30915 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE

nebula-storaged.INFO

I0510 11:01:01.340401 30901 RaftPart.cpp:1613] [Port: 9780, Space: 2, Part: 17] The current role is Candidate. Will follow the new leader 172.16.10.54:9780 [Term: 647]
I0510 11:01:01.340507 30916 Part.cpp:209] [Port: 9780, Space: 2, Part: 17] Find the new leader "172.16.10.54":9780
I0510 11:01:01.736179 30914 RaftPart.cpp:1095] [Port: 9780, Space: 4, Part: 17] Sending out an election request (space = 4, part = 17, term = 852, lastLogId = 1341325, lastLogTerm = 270, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:01.738008 30914 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:01.738082 30914 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:01.738134 30914 RaftPart.cpp:1162] [Port: 9780, Space: 4, Part: 17] No one is elected, continue the election
I0510 11:01:01.775323 30914 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 2] Sending out an election request (space = 6, part = 2, term = 729, lastLogId = 1008534, lastLogTerm = 88, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:01.776965 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:01.777036 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:01.777071 30914 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 2] No one is elected, continue the election
I0510 11:01:02.046589 30913 RaftPart.cpp:1095] [Port: 9780, Space: 4, Part: 12] Sending out an election request (space = 4, part = 12, term = 685, lastLogId = 1341353, lastLogTerm = 193, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:02.048426 30913 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 12] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:02.048487 30913 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 12] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:02.048512 30913 RaftPart.cpp:1162] [Port: 9780, Space: 4, Part: 12] No one is elected, continue the election
I0510 11:01:02.089490 30914 RaftPart.cpp:1095] [Port: 9780, Space: 2, Part: 2] Sending out an election request (space = 2, part = 2, term = 1251, lastLogId = 11624768, lastLogTerm = 736, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:02.091032 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:02.091110 30914 RaftPart.cpp:1050] [Port: 9780, Space: 2, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:02.091145 30914 RaftPart.cpp:1162] [Port: 9780, Space: 2, Part: 2] No one is elected, continue the election
I0510 11:01:02.135325 30914 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 7] Sending out an election request (space = 6, part = 7, term = 634, lastLogId = 1024592, lastLogTerm = 110, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:02.137017 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:02.137104 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:02.137140 30914 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 7] No one is elected, continue the election
I0510 11:01:02.271531 30916 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 17] Sending out an election request (space = 6, part = 17, term = 734, lastLogId = 1007858, lastLogTerm = 132, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:02.273371 30916 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:02.273438 30916 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:02.273463 30916 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 17] No one is elected, continue the election
I0510 11:01:02.421602 30913 RaftPart.cpp:1095] [Port: 9780, Space: 10, Part: 12] Sending out an election request (space = 10, part = 12, term = 613, lastLogId = 739621, lastLogTerm = 91, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:02.423249 30913 RaftPart.cpp:1050] [Port: 9780, Space: 10, Part: 12] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:02.423305 30913 RaftPart.cpp:1050] [Port: 9780, Space: 10, Part: 12] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:02.423329 30913 RaftPart.cpp:1162] [Port: 9780, Space: 10, Part: 12] No one is elected, continue the election
I0510 11:01:02.459748 30916 RaftPart.cpp:1095] [Port: 9780, Space: 4, Part: 7] Sending out an election request (space = 4, part = 7, term = 662, lastLogId = 1341324, lastLogTerm = 133, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:02.461428 30916 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:02.461483 30916 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:02.461508 30916 RaftPart.cpp:1162] [Port: 9780, Space: 4, Part: 7] No one is elected, continue the election
I0510 11:01:02.544994 30861 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 11:01:02.545095 30861 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 11:01:02.545179 30881 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 11:01:02.545189 30861 SlowOpTracker.h:31] [Port: 9780, Space: 6, Part: 1] total time:1243ms, Total send logs: 2
I0510 11:01:02.665966 30914 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 2] Sending out an election request (space = 6, part = 2, term = 730, lastLogId = 1008534, lastLogTerm = 88, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:02.667101 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:02.667160 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:02.667184 30914 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 2] No one is elected, continue the election
I0510 11:01:02.879510 30914 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 17] Sending out an election request (space = 6, part = 17, term = 735, lastLogId = 1007858, lastLogTerm = 132, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:02.880901 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:02.880959 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:02.880990 30914 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 17] No one is elected, continue the election
I0510 11:01:03.001127 30913 RaftPart.cpp:1095] [Port: 9780, Space: 4, Part: 17] Sending out an election request (space = 4, part = 17, term = 853, lastLogId = 1341325, lastLogTerm = 270, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:03.002326 30913 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 17] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:03.002373 30913 RaftPart.cpp:1050] [Port: 9780, Space: 4, Part: 17] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:03.002398 30913 RaftPart.cpp:1162] [Port: 9780, Space: 4, Part: 17] No one is elected, continue the election
I0510 11:01:03.171483 30914 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 2] Sending out an election request (space = 6, part = 2, term = 731, lastLogId = 1008534, lastLogTerm = 88, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:03.173712 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:03.173885 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:03.173954 30914 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 2] No one is elected, continue the election
I0510 11:01:03.736420 30916 RaftPart.cpp:1095] [Port: 9780, Space: 4, Part: 7] Sending out an election request (space = 4, part = 7, term = 663, lastLogId = 1341324, lastLogTerm = 133, candidateIP = 172.16.10.53, candidatePort = 9780)
I0510 11:01:03.914969 30913 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 7] Sending out an election request (space = 6, part = 7, term = 635, lastLogId = 1024592, lastLogTerm = 110, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:03.917171 30913 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:03.917296 30913 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:03.917382 30913 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 7] No one is elected, continue the election
I0510 11:01:04.309039 30913 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 2] Sending out an election request (space = 6, part = 2, term = 732, lastLogId = 1008534, lastLogTerm = 88, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:04.311182 30913 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 2] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:04.311309 30913 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 2] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:04.311368 30913 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 2] No one is elected, continue the election
I0510 11:01:04.627979 30881 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 11:01:04.628079 30881 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 11:01:04.628227 30876 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 11:01:05.786509 30914 RaftPart.cpp:1095] [Port: 9780, Space: 6, Part: 7] Sending out an election request (space = 6, part = 7, term = 636, lastLogId = 1024592, lastLogTerm = 110, candidateIP = 172.16.10.53, candidatePort = 9780)
E0510 11:01:05.788420 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 7] Receive response about askForVote from "172.16.10.55":9780, error code is E_TERM_OUT_OF_DATE
E0510 11:01:05.788506 30914 RaftPart.cpp:1050] [Port: 9780, Space: 6, Part: 7] Receive response about askForVote from "172.16.10.54":9780, error code is E_TERM_OUT_OF_DATE
I0510 11:01:05.788554 30914 RaftPart.cpp:1162] [Port: 9780, Space: 6, Part: 7] No one is elected, continue the election
I0510 11:01:05.855819 30876 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 11:01:05.855890 30876 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 11:01:05.855978 30870 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 11:01:05.856007 30876 RaftPart.cpp:1241] [Port: 9780, Space: 4, Part: 17] Recieved a VOTING request: space = 4, partition = 17, candidateAddr = 172.16.10.55:9780, term = 438, lastLogId = 1341957, lastLogTerm = 437
I0510 11:01:05.856022 30876 RaftPart.cpp:1268] [Port: 9780, Space: 4, Part: 17] The partition currently is a Candidate, lastLogId 1341325, lastLogTerm 270, committedLogId 1341324, term 773
I0510 11:01:05.856034 30876 RaftPart.cpp:1280] [Port: 9780, Space: 4, Part: 17] The partition is currently proposing term 773. The term proposed by the candidate is no greater, so it will be rejected
I0510 11:01:05.856117 30876 RaftPart.cpp:1241] [Port: 9780, Space: 10, Part: 2] Recieved a VOTING request: space = 10, partition = 2, candidateAddr = 172.16.10.55:9780, term = 283, lastLogId = 740242, lastLogTerm = 282
I0510 11:01:05.856132 30876 RaftPart.cpp:1268] [Port: 9780, Space: 10, Part: 2] The partition currently is a Candidate, lastLogId 739637, lastLogTerm 113, committedLogId 739637, term 605
I0510 11:01:05.856143 30876 RaftPart.cpp:1280] [Port: 9780, Space: 10, Part: 2] The partition is currently proposing term 605. The term proposed by the candidate is no greater, so it will be rejected
I0510 11:01:05.856313 30876 RaftPart.cpp:1241] [Port: 9780, Space: 6, Part: 9] Recieved a VOTING request: space = 6, partition = 9, candidateAddr = 172.16.10.55:9780, term = 20580, lastLogId = 1011128, lastLogTerm = 20579
I0510 11:01:05.856329 30876 RaftPart.cpp:1268] [Port: 9780, Space: 6, Part: 9] The partition currently is a Leader, lastLogId 1012966, lastLogTerm 20579, committedLogId 1012966, term 20579
I0510 11:01:05.856340 30876 RaftPart.cpp:1302] [Port: 9780, Space: 6, Part: 9] The partition's last log id is 1012966. The candidate's last log id 1011128 is smaller, so it will be rejected, candidate is "172.16.10.55":9780
I0510 11:01:05.856395 30876 RaftPart.cpp:1241] [Port: 9780, Space: 2, Part: 20] Recieved a VOTING request: space = 2, partition = 20, candidateAddr = 172.16.10.55:9780, term = 113, lastLogId = 13341706, lastLogTerm = 112
I0510 11:01:05.856410 30876 RaftPart.cpp:1268] [Port: 9780, Space: 2, Part: 20] The partition currently is a Leader, lastLogId 13343503, lastLogTerm 112, committedLogId 13343503, term 112
I0510 11:01:05.856420 30876 RaftPart.cpp:1302] [Port: 9780, Space: 2, Part: 20] The partition's last log id is 13343503. The candidate's last log id 13341706 is smaller, so it will be rejected, candidate is "172.16.10.55":9780
I0510 11:01:07.133628 30901 RaftPart.cpp:1613] [Port: 9780, Space: 6, Part: 17] The current role is Candidate. Will follow the new leader 172.16.10.54:9780 [Term: 665]
I0510 11:01:07.133760 30913 Part.cpp:209] [Port: 9780, Space: 6, Part: 17] Find the new leader "172.16.10.54":9780
I0510 11:01:07.135356 30899 RaftPart.cpp:1613] [Port: 9780, Space: 4, Part: 7] The current role is Candidate. Will follow the new leader 172.16.10.54:9780 [Term: 596]
I0510 11:01:07.135434 30914 Part.cpp:209] [Port: 9780, Space: 4, Part: 7] Find the new leader "172.16.10.54":9780
I0510 11:01:07.138715 30907 RaftPart.cpp:1613] [Port: 9780, Space: 6, Part: 2] The current role is Candidate. Will follow the new leader 172.16.10.54:9780 [Term: 592]
I0510 11:01:07.448817 30870 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 11:01:07.448940 30870 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 11:01:07.449102 30885 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 11:01:08.729571 30885 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 11:01:08.729668 30885 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 11:01:08.729822 30883 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 11:01:10.251667 30883 FileBasedWal.cpp:356] [Port: 9780, Space: 2, Part: 1] Rollback to log 11619994
I0510 11:01:10.251736 30883 RaftPart.cpp:1482] [Port: 9780, Space: 2, Part: 1] Rollback succeeded! lastLogId is 11619994, logLogTerm is 233, committedLogId is 11619994, term is 237
I0510 11:01:10.251816 30867 RaftPart.cpp:1473] [Port: 9780, Space: 2, Part: 1] The local last log term is 233, which is different from the leader's prevLogTerm 235, the prevLogId is 11620014. So need to rollback to last committedLogId_ 11619994
I0510 11:01:10.251852 30883 SlowOpTracker.h:31] [Port: 9780, Space: 4, Part: 16] total time:1521ms, Total send logs: 2
I0510 11:01:10.251951 30883 SlowOpTracker.h:31] [Port: 9780, Space: 4, Part: 6] total time:1521ms, Total send logs: 2
I0510 11:01:10.252038 30883 RaftPart.cpp:1241] [Port: 9780, Space: 6, Part: 22] Recieved a VOTING request: space = 6, partition = 22, candidateAddr = 172.16.10.54:9780, term = 278, lastLogId = 1015328, lastLogTerm = 277
I0510 11:01:10.252055 30883 RaftPart.cpp:1268] [Port: 9780, Space: 6, Part: 22] The partition currently is a Follower, lastLogId 1014712, lastLogTerm 101, committedLogId 1014711, term 607
I0510 11:01:10.252066 30883 RaftPart.cpp:1280] [Port: 9780, Space: 6, Part: 22] The partition currently is on term 607. The term proposed by the candidate is no greater, so it will be rejected
I0510 11:01:10.252218 30883 RaftPart.cpp:1241] [Port: 9780, Space: 6, Part: 25] Recieved a VOTING request: space = 6, partition = 25, candidateAddr = 172.16.10.54:9780, term = 31, lastLogId = 1024176, lastLogTerm = 30
I0510 11:01:10.252233 30883 RaftPart.cpp:1268] [Port: 9780, Space: 6, Part: 25] The partition currently is a Leader, lastLogId 1026009, lastLogTerm 30, committedLogId 1026009, term 30
I0510 11:01:10.252259 30883 RaftPart.cpp:1302] [Port: 9780, Space: 6, Part: 25] The partition's last log id is 1026009. The candidate's last log id 1024176 is smaller, so it will be rejected, candidate is "172.16.10.54":9780
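
The VOTING-request lines above give two distinct rejection reasons: the candidate's term is "no greater" than the local term, or the candidate's last log id is smaller than the local one. The following is a simplified model of those two checks, written only from the log messages and not taken from Nebula's source. The function name `vote_decision` and the lastLogTerm comparison are assumptions borrowed from textbook Raft.

```python
def vote_decision(local_term, local_last_log_term, local_last_log_id,
                  cand_term, cand_last_log_term, cand_last_log_id):
    """Simplified model of the two rejection messages seen in the log excerpt."""
    # "The term proposed by the candidate is no greater, so it will be rejected"
    if cand_term <= local_term:
        return "rejected: term not greater"
    # Assumption (textbook Raft, not confirmed by the log text):
    # a candidate whose last log term is older is also behind.
    if cand_last_log_term < local_last_log_term:
        return "rejected: log behind"
    # "The candidate's last log id ... is smaller, so it will be rejected"
    if (cand_last_log_term == local_last_log_term
            and cand_last_log_id < local_last_log_id):
        return "rejected: log behind"
    return "granted"
```

With the values logged for Space 4, Part 17 (local term 773, candidate term 438) the model takes the term branch; with those for Space 6, Part 9 (local term 20579, candidate term 20580, candidate lastLogId 1011128 against local 1012966) it takes the log branch.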

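It looks like something has gone wrong. Is this a production environment? If it cannot be recovered, rebuild the cluster.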