Index rebuild fails

  • Nebula version: 2.6.2
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: Y
  • Hardware
    • SSD
  • Problem description:

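Running rebuild tag index fails, and the log shows "This space is building index".
show jobs lists no job in the running state. From the source code below, it looks like the space counts as "building" whenever any entry for it in rebuildIndexGuard_ is in a state other than FINISHED.
Does a job in the FAILED state also count as still building?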

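std::vector<AdminSubTask> tasks;
// Refuse to start if any guard entry belonging to this space is not FINISHED.
// Note that an entry left in a FAILED state also satisfies this condition.
for (auto it = env_->rebuildIndexGuard_->cbegin();
     it != env_->rebuildIndexGuard_->cend(); ++it) {
    if (std::get<0>(it->first) == space_ && it->second != IndexState::FINISHED) {
        LOG(ERROR) << "This space is building index";
        return cpp2::ErrorCode::E_REBUILD_INDEX_FAILED;
    }
}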
  • Relevant meta / storage / graph info logs (preferably as text, so they are searchable)
    These are the logs from when the index rebuild failed:
E1226 10:44:49.166803 424281 UpdateNode.h:183] vertex conflict 169:96:171:0dbb49c96ba38d8d49a9cccca0de9e28
E1226 10:44:52.144032 424285 UpdateNode.h:183] vertex conflict 169:13:171:5b09d9d36aa2d8e5d9c5e15d3c1d8245
E1226 10:47:46.856619 265937 Host.cpp:348] [Port: 9780, Space: 128, Part: 233] [Host: 172.23.236.100:9990] Failed to append logs to the host (Err: E_TERM_OUT_OF_DATE)
E1226 10:49:56.923763 424242 RebuildIndexTask.cpp:63] This space is building index
E1226 10:49:56.923795 424242 AdminTaskManager.cpp:254] job 183, genSubTask failed, err=E_REBUILD_INDEX_FAILED

The log also contains a large number of errors like this:

Failed to append logs to the host (Err: E_TERM_OUT_OF_DATE)

and

E1226 11:19:29.156589 424281 UpdateNode.h:183] vertex conflict 169:21:170:13b689b5cb0c1c415849c250708e4f5e
E1226 11:19:29.284868 424280 UpdateNode.h:183] vertex conflict 169:62:170:2146dc6004a8200ed3c1cca1a7ae7008
E1226 11:19:31.090054 265903 AddVerticesProcessor.cpp:164] The vertex locked : tag 170, vid 3dd7a92570f73d4c8541a6478bbecd65
E1226 11:19:32.039448 424280 UpdateNode.h:183] vertex conflict 169:13:170:1b3b3f5d0387352399feefd670b4c587
E1226 11:19:32.039489 424281 UpdateNode.h:183] vertex conflict 169:13:170:1b3b3f5d0387352399feefd670b4c587
E1226 11:19:32.143121 424282 UpdateNode.h:183] vertex conflict 169:2:170:2290a0580ffec7a1bb30d86dcd366571
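To illustrate the question about FAILED, here is a minimal standalone C++ model of the check quoted in the post. It is a sketch under stated assumptions, not Nebula's code: the IndexState values, the (space, part) key layout, and the isSpaceBuilding helper are simplified or hypothetical stand-ins, and a plain std::map replaces whatever container actually backs rebuildIndexGuard_. Under this model, a single entry left in FAILED by an earlier rebuild is enough to make the space look busy, even when show jobs lists nothing running.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Simplified stand-ins (assumption): not Nebula's real type definitions.
using GraphSpaceID = int;
using PartitionID = int;

// Hypothetical subset of index states; the real enum may differ.
enum class IndexState { STARTING, BUILDING, FINISHED, FAILED };

// Guard keyed by (space, part); std::map replaces the real container.
using IndexGuard = std::map<std::tuple<GraphSpaceID, PartitionID>, IndexState>;

// Same condition as the quoted check: any entry of `space` whose state is
// not FINISHED (STARTING, BUILDING, or FAILED) counts as "building", so a
// new rebuild of that space would be refused.
bool isSpaceBuilding(const IndexGuard& guard, GraphSpaceID space) {
    for (auto it = guard.cbegin(); it != guard.cend(); ++it) {
        if (std::get<0>(it->first) == space && it->second != IndexState::FINISHED) {
            return true;
        }
    }
    return false;
}
```

With a guard holding FINISHED for part 1 and FAILED for part 2 of the same space, isSpaceBuilding returns true for that space, so a second rebuild would be rejected; other spaces are unaffected because the check filters on the space ID.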

So the issue is that the first build failed, and when you ran the build again it reported "This space is building index"?

Yes. After the first rebuild failed, I ran the rebuild again.

This is a known issue in 2.6 that was fixed in 3.x.
Restart the storaged instance(s) on which the index build failed, then run the rebuild again.
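The restart workaround is consistent with the rebuild guard living only in each storaged process's memory, although this thread does not confirm that. Assuming it, the sketch below models which hosts still hold a stale non-FINISHED entry for a space and why restarting only those hosts clears it. StoragedHost, hostsToRestart, and restart are hypothetical names for illustration, and the state enum and (space, part) key are simplified stand-ins, not Nebula's definitions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Simplified stand-ins (assumption): not Nebula's real type definitions.
using GraphSpaceID = int;
using PartitionID = int;
enum class IndexState { STARTING, BUILDING, FINISHED, FAILED };
using IndexGuard = std::map<std::tuple<GraphSpaceID, PartitionID>, IndexState>;

// Hypothetical model of one storaged process whose guard is in memory only.
struct StoragedHost {
    std::string address;
    IndexGuard rebuildIndexGuard;
};

// Returns the hosts whose guard still has a non-FINISHED entry for `space`;
// under this model they would keep rejecting a rebuild of that space.
std::vector<std::string> hostsToRestart(const std::vector<StoragedHost>& hosts,
                                        GraphSpaceID space) {
    std::vector<std::string> result;
    for (const auto& host : hosts) {
        for (const auto& entry : host.rebuildIndexGuard) {
            if (std::get<0>(entry.first) == space &&
                entry.second != IndexState::FINISHED) {
                result.push_back(host.address);
                break;  // one stale entry is enough to block this host
            }
        }
    }
    return result;
}

// A process restart discards in-memory state, modeled as an empty guard.
void restart(StoragedHost& host) {
    host.rebuildIndexGuard = IndexGuard{};
}
```

In this model, only the host carrying the FAILED entry is reported, and after restart(...) on that host the space no longer looks busy anywhere, which matches the advice to restart just the storaged instances where the build failed.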