E_UNKNOWN_PART errors reported inside Raft

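  • nebula version: 2.6.2

  • Deployment: distributed

  • Installation method: compiled from source

  • Production environment: Y

  • Hardware

    • Disk: 512G
    • CPU and memory: 4C16G
  • Problem description
    The storaged log shows E_UNKNOWN_PART errors, from askForVote through to append log.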

  • Relevant meta / storage / graph info logs (preferably as text, so they are searchable)

E0531 02:47:51.248387 29352 RaftPart.cpp:1050] [Port: 9780, Space: 381, Part: 11] Receive response about askForVote from "10.65.101.119":9780, error code is E_UNKNOWN_PART
E0531 02:47:51.319093 29351 RaftPart.cpp:1050] [Port: 9780, Space: 381, Part: 12] Receive response about askForVote from "10.65.101.119":9780, error code is E_UNKNOWN_PART
E0531 02:47:51.566040 29350 RaftPart.cpp:1050] [Port: 9780, Space: 6799, Part: 6] Receive response about askForVote from "10.65.101.119":9780, error code is E_TERM_OUT_OF_DATE
E0531 02:47:51.567747 29350 RaftPart.cpp:1050] [Port: 9780, Space: 381, Part: 15] Receive response about askForVote from "10.65.101.119":9780, error code is E_TERM_OUT_OF_DATE
E0531 02:47:51.650444 29351 RaftPart.cpp:1050] [Port: 9780, Space: 381, Part: 13] Receive response about askForVote from "10.65.101.119":9780, error code is E_TERM_OUT_OF_DATE
E0531 02:47:51.665946 29352 RaftPart.cpp:1050] [Port: 9780, Space: 381, Part: 9] Receive response about askForVote from "10.65.101.119":9780, error code is E_TERM_OUT_OF_DATE
E0531 02:49:06.593873 29336 Host.cpp:348] [Port: 9780, Space: 6929, Part: 3] [Host: 10.65.101.118:9790] Failed to append logs to the host (Err: E_UNKNOWN_PART)
E0531 09:01:00.301543 29341 Host.cpp:348] [Port: 9780, Space: 9, Part: 14] [Host: 10.65.101.118:9790] Failed to append logs to the host(Err: E_WRONG_LEADER)
E0531 09:01:56.787384 29340 Host.cpp:348] [Port: 9780, Space: 7475, Part: 13] [Host: 127.0.0.1:9790] Failed to append logs to the host (Err: E_UNKNOWN_PART)
E0531 09:05:16.856062 29329 Host.cpp:348] [Port: 9780, Space: 7475, Part: 13] [Host: 127.0.0.1:9790] Failed to append logs to the host (Err: E_UNKNOWN_PART)
E0531 09:08:36.924034 29340 Host.cpp:348] [Port: 9780, Space: 7475, Part: 13] [Host: 127.0.0.1:9790] Failed to append logs to the host (Err: E_UNKNOWN_PART)
E0531 09:11:56.959607 29343 Host.cpp:348] [Port: 9780, Space: 7475, Part: 13] [Host: 127.0.0.1:9790] Failed to append logs to the host (Err: E_UNKNOWN_PART)
E0531 09:15:17.022974 29345 Host.cpp:348] [Port: 9780, Space: 7475, Part: 13] [Host: 127.0.0.1:9790] Failed to append logs to the host (Err: E_UNKNOWN_PART)
E0531 09:16:23.143977 29302 IndexKeyUtils.cpp:170] prop error by : classificationGroup. status : Unknown prop
E0531 09:16:23.144207 29302 IndexKeyUtils.cpp:170] prop error by : classificationGroup. status : Unknown prop
E0531 09:16:23.144249 29302 IndexKeyUtils.cpp:170] prop error by : classificationGroup. status : Unknown prop
E0531 09:16:23.144287 29302 IndexKeyUtils.cpp:170] prop error by : classificationGroup. status : Unknown prop

Looking at the code, the error probably originates from the findPart method in RaftexService.cpp:

std::shared_ptr<RaftPart> RaftexService::findPart(GraphSpaceID spaceId, PartitionID partId) {
  folly::RWSpinLock::ReadHolder rh(partsLock_);
  auto it = parts_.find(std::make_pair(spaceId, partId));
  if (it == parts_.end()) {
    // Part not found
    VLOG(4) << "Cannot find the part " << partId << " in the graph space " << spaceId;
    return std::shared_ptr<RaftPart>();
  }

  // Otherwise, return the part pointer
  return it->second;
}
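
To make the suspected path concrete, here is a minimal sketch of how a lookup miss in a (spaceId, partId) map could surface as E_UNKNOWN_PART. Everything in it (PartRegistry, Part, handleRequest, the two-value ErrorCode) is a hypothetical, simplified stand-in and not the actual Nebula code; it only assumes that the request handler answers E_UNKNOWN_PART whenever findPart returns an empty pointer.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration only; not the real Nebula types.
enum class ErrorCode { SUCCEEDED, E_UNKNOWN_PART };

struct Part {
  int64_t term = 0;
};

class PartRegistry {
 public:
  using Key = std::pair<int32_t, int32_t>;  // (spaceId, partId)

  // Register a part on this host.
  void addPart(int32_t spaceId, int32_t partId) {
    parts_[std::make_pair(spaceId, partId)] = std::make_shared<Part>();
  }

  // Drop a part, e.g. after the space is dropped or the part is moved away.
  void removePart(int32_t spaceId, int32_t partId) {
    parts_.erase(std::make_pair(spaceId, partId));
  }

  // Same shape as the findPart above: an empty pointer means "not found".
  std::shared_ptr<Part> findPart(int32_t spaceId, int32_t partId) const {
    auto it = parts_.find(std::make_pair(spaceId, partId));
    if (it == parts_.end()) {
      return nullptr;
    }
    return it->second;
  }

  // Assumed handler behaviour: a lookup miss is reported back to the
  // sender as E_UNKNOWN_PART instead of being processed.
  ErrorCode handleRequest(int32_t spaceId, int32_t partId) const {
    if (!findPart(spaceId, partId)) {
      return ErrorCode::E_UNKNOWN_PART;
    }
    return ErrorCode::SUCCEEDED;
  }

 private:
  std::map<Key, std::shared_ptr<Part>> parts_;
};
```

Under this assumption, the sender sees E_UNKNOWN_PART whenever the receiving process does not hold that (space, part) pair in memory, regardless of why the entry is absent.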

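But I don't know why this error occurs.
Could some entries be missing when the parts are loaded from file into map<pair<spaceId, partId>, RaftPart>?
Or did addPart never actually get persisted to file?
I'd like to understand under what circumstances this error can occur and how to fix it.
Thanks.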

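Could you roughly describe the IPs of the deployment machines and how many replicas you run? And why do two forms of IP, 10.65.101.118:9790 and 127.0.0.1:9790, show up at the same time?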

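3 machines: 10.65.101.118, 10.65.101.119 and 10.65.101.120, each running metad, storaged and graphd.
The 127.0.0.1 is indeed strange. I grepped the config files and found no 127.0.0.1 setting anywhere; I'll check whether the service was ever started that way in the past.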
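
To see which peers the errors cluster on (for example, whether 127.0.0.1:9790 only ever appears with E_UNKNOWN_PART), the storaged log can be tallied by peer and error code. The following is a rough C++17 sketch, not a Nebula tool: RaftError, parseRaftError and tallyByPeer are names made up here, and the regular expressions assume the two line shapes quoted above, with each log entry on a single line.

```cpp
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one Raft error line from the storaged log.
struct RaftError {
  int space = 0;
  int part = 0;
  std::string peer;  // "ip:port" of the remote side
  std::string code;  // e.g. "E_UNKNOWN_PART"
};

// Parse one log line; returns std::nullopt for lines that do not match
// either of the two shapes (askForVote response, append-log failure).
std::optional<RaftError> parseRaftError(const std::string& line) {
  static const std::regex where(R"re(Space: (\d+), Part: (\d+)\])re");
  static const std::regex hostPeer(R"re(\[Host: ([^\]]+)\])re");
  static const std::regex votePeer(R"re(from "([^"]+)":(\d+))re");
  static const std::regex code(R"re((E_[A-Z_]+))re");

  std::smatch m;
  RaftError e;
  if (!std::regex_search(line, m, where)) return std::nullopt;
  e.space = std::stoi(m[1].str());
  e.part = std::stoi(m[2].str());

  if (std::regex_search(line, m, hostPeer)) {
    e.peer = m[1].str();
  } else if (std::regex_search(line, m, votePeer)) {
    e.peer = m[1].str() + ":" + m[2].str();
  } else {
    return std::nullopt;
  }

  if (!std::regex_search(line, m, code)) return std::nullopt;
  e.code = m[1].str();
  return e;
}

// Count occurrences per (peer, error code).
std::map<std::pair<std::string, std::string>, int> tallyByPeer(
    const std::vector<std::string>& lines) {
  std::map<std::pair<std::string, std::string>, int> counts;
  for (const auto& line : lines) {
    if (auto e = parseRaftError(line)) {
      ++counts[std::make_pair(e->peer, e->code)];
    }
  }
  return counts;
}
```

Feeding the log lines into tallyByPeer gives one counter per (peer, error code) pair, which makes an unexpected peer address easy to spot.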

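You can run "show hosts" in the console to check whether this host is listed. Also, Raft in the 2.6.x releases does have some known issues, which were fixed in 3.1; if it is convenient, you could upgrade to 3.1.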

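That host is not listed. We will consider upgrading to 3.1. Regarding the Raft fixes, are these the relevant PRs? I'll study the related code; if there are other related issues, please let me know. Thanks.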

Yes, mainly those.
