storaged reports "error code is E_RAFT_UNKNOWN_PART" and "Did not get enough votes"

wzw · March 5, 2026, 06:18

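nebula version: 3.6.0
Deployment: distributed
Installation method: RPM
In production: Y
Detailed description of the problem
storaged would not stop when I shut the service down, so I killed it with kill -9 and then restarted the service. After the restart, storaged CPU usage stayed above 100% and the service never became ready. The log shows the errors "error code is E_RAFT_UNKNOWN_PART" and "Did not get enough votes". It looks like a Raft consistency problem. How can this be resolved? Many graph spaces report this error, so the impact is wide.
Relevant meta / storage / graph info logs (text form preferred so they are searchable)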

I20260304 17:12:18.562439 1528010 RaftPart.cpp:1261] [Port: 9780, Space: 16299, Part: 3] Receive response about askForVote from "7.227.5.1":9780, error code is E_RAFT_UNKNOWN_PART, isPreVote = 1
I20260304 17:12:18.562467 1528010 RaftPart.cpp:1288] [Port: 9780, Space: 16299, Part: 3] Did not get enough votes from election of term 8, isPreVote = 1
I20260304 17:12:18.562502 1528091 RaftPart.cpp:1294] [Port: 9780, Space: 15712, Part: 5] Start leader election...
I20260304 17:12:18.562522 1528091 RaftPart.cpp:1322] [Port: 9780, Space: 15712, Part: 5] Sending out an election request (space = 15712, part = 5, term = 9, lastLogId = 171982, lastLogTerm = 8, candidateIP = 7.227.56.193, candidatePort = 9780), isPreVote = 1
I20260304 17:12:18.562556 1528036 ThriftClientManager-inl.h:47] Getting a client to "10.28.80.26":9780
I20260304 17:12:18.562587 1528036 ThriftClientManager-inl.h:47] Getting a client to "7.227.5.1":9780
I20260304 17:12:18.570798 1528036 CollectNSucceeded-inl.h:59] Set Value [completed=2, total=2, Result list size=2]
I20260304 17:12:18.570847 1528010 RaftPart.cpp:1261] [Port: 9780, Space: 15712, Part: 5] Receive response about askForVote from "7.227.5.1":9780, error code is E_RAFT_UNKNOWN_PART, isPreVote = 1
I20260304 17:12:18.570873 1528010 RaftPart.cpp:1261] [Port: 9780, Space: 15712, Part: 5] Receive response about askForVote from "10.28.80.26":9780, error code is E_RAFT_UNKNOWN_PART, isPreVote = 1
I20260304 17:12:18.570880 1528010 RaftPart.cpp:1288] [Port: 9780, Space: 15712, Part: 5] Did not get enough votes from election of term 9, isPreVote = 1
I20260304 17:12:18.570914 1528091 RaftPart.cpp:1294] [Port: 9780, Space: 15841, Part: 3] Start leader election...
I20260304 17:12:18.570933 1528091 RaftPart.cpp:1322] [Port: 9780, Space: 15841, Part: 3] Sending out an election request (space = 15841, part = 3, term = 15, lastLogId = 165710, lastLogTerm = 14, candidateIP = 7.227.56.193, candidatePort = 9780), isPreVote = 1
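
Since many graph spaces are affected, it can help to tally the errors per space and partition rather than read the log line by line. The following is a minimal sketch, assuming the log lines keep the "[Port: ..., Space: ..., Part: ...]" prefix shown above; the function names and the SAMPLE list are illustrative and not part of NebulaGraph.

```python
import re
from collections import Counter

# Prefix written on each RaftPart.cpp line in the excerpt above.
PART_RE = re.compile(r"\[Port: (\d+), Space: (\d+), Part: (\d+)\]")


def unknown_part_counts(lines):
    """Count E_RAFT_UNKNOWN_PART vote responses per (space, part)."""
    counts = Counter()
    for line in lines:
        if "E_RAFT_UNKNOWN_PART" not in line:
            continue
        match = PART_RE.search(line)
        if match:
            space_id = int(match.group(2))
            part_id = int(match.group(3))
            counts[(space_id, part_id)] += 1
    return counts


def affected_spaces(counts):
    """Return the sorted space IDs that appear in the counts."""
    return sorted({space_id for space_id, _ in counts})


# A few lines copied from the log excerpt, used as sample input.
SAMPLE = [
    'I20260304 17:12:18.562439 1528010 RaftPart.cpp:1261] [Port: 9780, Space: 16299, Part: 3] Receive response about askForVote from "7.227.5.1":9780, error code is E_RAFT_UNKNOWN_PART, isPreVote = 1',
    'I20260304 17:12:18.562467 1528010 RaftPart.cpp:1288] [Port: 9780, Space: 16299, Part: 3] Did not get enough votes from election of term 8, isPreVote = 1',
    'I20260304 17:12:18.570847 1528010 RaftPart.cpp:1261] [Port: 9780, Space: 15712, Part: 5] Receive response about askForVote from "7.227.5.1":9780, error code is E_RAFT_UNKNOWN_PART, isPreVote = 1',
    'I20260304 17:12:18.570873 1528010 RaftPart.cpp:1261] [Port: 9780, Space: 15712, Part: 5] Receive response about askForVote from "10.28.80.26":9780, error code is E_RAFT_UNKNOWN_PART, isPreVote = 1',
]

if __name__ == "__main__":
    result = unknown_part_counts(SAMPLE)
    for (space_id, part_id), n in sorted(result.items()):
        print(f"space {space_id} part {part_id}: {n} E_RAFT_UNKNOWN_PART responses")
    print("affected spaces:", affected_spaces(result))
```

To run it against a real log, pass an open file object instead of SAMPLE, since the function only iterates over lines.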

wzw · March 5, 2026, 10:08

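Deleting cluster.id and restarting resolved the problem, and CPU usage is back to normal. However, the storaged log now reports "Snapshot send failed, the leader changed?". I don't know whether this is normal synchronization between nodes or a sign of a problem. SHOW HOSTS reports every host as online, so I plan to wait a while and see whether the cluster recovers.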

wzw · March 6, 2026, 06:31

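Solved. The graph spaces reporting "the leader changed?" could no longer be found in SHOW HOSTS. metad had probably already dropped them, but storaged was in an abnormal state at the time and did not respond correctly, which left garbage data behind. I stopped the cluster, deleted the graph spaces that were reporting errors, and restarted the cluster. It is back to normal.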
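
One way to find such leftovers is to compare the space directories on a storaged host with the space IDs the cluster still knows about. The sketch below only reports candidates and deletes nothing. It rests on assumptions not stated in this thread: that each space is stored in a numeric directory named after its space ID under the storage data path (for example `<data_path>/nebula/<space_id>`), and that the list of live space IDs is collected separately from the cluster. All function names are hypothetical.

```python
import os


def list_space_dirs(nebula_dir):
    """Return the names of the subdirectories found under nebula_dir."""
    return [
        name
        for name in os.listdir(nebula_dir)
        if os.path.isdir(os.path.join(nebula_dir, name))
    ]


def orphan_space_ids(dir_names, live_space_ids):
    """Return numeric directory names that are not in live_space_ids.

    dir_names: directory names found on disk (assumed to be space IDs).
    live_space_ids: space IDs that the cluster still reports.
    """
    live = set(live_space_ids)
    on_disk = {int(name) for name in dir_names if name.isdigit()}
    return sorted(on_disk - live)


if __name__ == "__main__":
    # Assumed layout; adjust to the --data_path of your storaged.
    nebula_dir = "data/storage/nebula"
    # Placeholder: fill in the space IDs the cluster still reports.
    live_ids = []
    if os.path.isdir(nebula_dir):
        for space_id in orphan_space_ids(list_space_dirs(nebula_dir), live_ids):
            print("on disk but not in live list:", space_id)
```

Check the reported IDs against the storaged log before removing anything, and stop the cluster first, as was done above.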

system · March 13, 2026, 06:32

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.