Storage Error: part: 2, error: E_RPC_FAILURE(-3). get subgraph 2 steps Error

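  • Nebula version: 2.0.1
  • Deployment (distributed / standalone / Docker / DBaaS): distributed
  • Production environment: N
  • Hardware information
    • Disk (SSD recommended)
    • CPU and memory information
  • Detailed description of the problem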
    get subgraph 2 steps from 3489074474838289673

[ERROR (-8)]: Storage Error: part: 13, error: E_RPC_FAILURE(-3).

  • Relevant meta / storage / graph log information
    graphd-stderr.log
E0518 14:31:07.510525 27061 StorageClientBase.inl:214] Request to "10.0.7.140":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0518 14:31:07.523274 27060 StorageClientBase.inl:214] Request to "10.0.7.141":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0518 14:31:07.524279 27062 StorageClientBase.inl:214] Request to "10.0.7.139":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0518 14:31:07.649363 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 25
E0518 14:31:07.655267 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 7
E0518 14:31:07.655334 27072 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 23
E0518 14:31:07.666276 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 18
E0518 14:31:07.673326 27080 QueryInstance.cpp:103] Storage Error: The leader has changed. Try again later
E0518 14:31:07.683270 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 27
E0518 14:31:07.683337 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 16
E0518 14:31:07.683355 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 13
E0518 14:31:07.683370 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 9
E0518 14:31:07.683384 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 28
E0518 14:31:07.683398 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 11
E0518 14:31:07.683413 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 10
E0518 14:31:07.683426 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 3
E0518 14:31:07.683440 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 26
E0518 14:31:07.683454 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 6
E0518 14:31:07.683467 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 29
E0518 14:31:07.683481 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 30
E0518 14:31:07.683497 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 8
E0518 14:31:07.683511 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 2
E0518 14:31:07.683524 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 24
E0518 14:31:07.683550 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 1
E0518 14:31:07.683571 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 4
E0518 14:31:07.683590 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 12
E0518 14:31:07.683609 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 5
E0518 14:31:07.683629 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 19
E0518 14:31:07.683647 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 14
E0518 14:31:07.683672 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 21
E0518 14:31:07.683692 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 22
E0518 14:31:07.683712 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 15
E0518 14:31:07.683730 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 20
E0518 14:31:07.683744 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 23
E0518 14:31:07.683763 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 17
E0518 14:31:07.683786 27077 StorageAccessExecutor.h:112] Storage Error: part: 25, error: E_RPC_FAILURE(-3).
E0518 14:31:07.683909 27085 QueryInstance.cpp:103] Storage Error: part: 25, error: E_RPC_FAILURE(-3).
E0518 14:31:08.674284 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 9
E0518 14:31:08.678251 27077 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 21
E0518 14:31:08.683280 27072 QueryInstance.cpp:103] Storage Error: The leader has changed. Try again later

storage.log

E0518 14:45:24.393728  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 17
E0518 14:45:24.393750  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 20
E0518 14:45:24.393770  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 19
E0518 14:45:24.393790  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 21
E0518 14:45:24.393811  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 22
E0518 14:45:24.393831  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 14
E0518 14:45:24.393858  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 23
E0518 14:45:24.393877  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 25
E0518 14:45:24.393898  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 28
E0518 14:45:24.393916  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 9
E0518 14:45:24.393939  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 10
E0518 14:45:24.393957  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 11
E0518 14:45:24.393976  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 16
E0518 14:45:24.393996  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 27
E0518 14:45:24.394014  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 18
E0518 14:45:24.394033  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 7
E0518 14:45:24.394050  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 13
E0518 14:45:24.402168  9148 StorageAccessExecutor.h:112] Storage Error: part: 2, error: E_RPC_FAILURE(-3).
E0518 14:45:24.405316  9145 QueryInstance.cpp:103] Storage Error: part: 2, error: E_RPC_FAILURE(-3).
E0518 14:45:24.630026  9145 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 19
E0518 14:45:24.630081  9145 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 13
E0518 14:45:24.630143  9143 QueryInstance.cpp:103] Storage Error: The leader has changed. Try again later
E0518 14:45:25.665174  9143 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 23
E0518 14:45:25.665271  9148 QueryInstance.cpp:103] Storage Error: The leader has changed. Try again later
E0518 14:45:25.927491  9143 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 21
E0518 14:45:25.927533  9143 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 9
E0518 14:45:25.927577  9145 QueryInstance.cpp:103] Storage Error: The leader has changed. Try again later
E0518 14:46:25.856166  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 11
E0518 14:46:25.856218  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 27
E0518 14:46:25.856240  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 16
E0518 14:46:25.856259  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 18
E0518 14:46:25.856277  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 28
E0518 14:46:25.856293  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 5
E0518 14:46:25.856318  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 22
E0518 14:46:25.856334  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 14
E0518 14:46:25.856350  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 15
E0518 14:46:25.856366  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 26
E0518 14:46:25.856382  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 20
E0518 14:46:25.856398  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 8
E0518 14:46:25.856415  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 7
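
With this many near-identical lines it can help to tally the failures by executor and error code before reading them one by one. Below is a minimal Python sketch, assuming the log lines keep the "<Executor> failed, error <CODE>, part <N>" shape seen above. The names tally_failures and FAILURE_RE and the short sample list are illustrative only and are not part of Nebula.

```python
import re
from collections import Counter

# Matches executor failure lines of the shape seen in the logs, e.g.
# "... GetNeighborsExecutor failed, error E_RPC_FAILURE, part 17"
FAILURE_RE = re.compile(r"(\w+Executor) failed, error (E_\w+), part (\d+)")


def tally_failures(lines):
    """Count failures per (executor, error code) and collect the affected parts."""
    counts = Counter()
    parts = {}
    for line in lines:
        m = FAILURE_RE.search(line)
        if m is None:
            continue  # e.g. QueryInstance summary lines, which carry no executor name
        key = (m.group(1), m.group(2))
        counts[key] += 1
        parts.setdefault(key, set()).add(int(m.group(3)))
    return counts, parts


# A few lines copied from the logs in this thread, used as sample input.
sample = [
    "E0518 14:45:24.394050  9148 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_RPC_FAILURE, part 13",
    "E0518 14:45:24.630081  9145 StorageAccessExecutor.h:35] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 13",
    "E0518 14:45:24.630143  9143 QueryInstance.cpp:103] Storage Error: The leader has changed. Try again later",
    "E0518 14:46:25.856166  9144 StorageAccessExecutor.h:35] InsertVerticesExecutor failed, error E_LEADER_CHANGED, part 11",
]

counts, parts = tally_failures(sample)
for (executor, error), n in sorted(counts.items()):
    print(executor, error, n, sorted(parts[(executor, error)]))
```

To run it over a whole file, pass the open file object instead of the sample list, since the function accepts any iterable of lines.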

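PS: insert operations are currently in progress.
I checked manually: this vertex has 159 one-hop relationships, and those 159 entities have around 600 relationships within one hop.

Addendum: relationship queries on this entity are fairly fast, 2 s at most, so I don't know why the subgraph query returns nothing. In another space, where the data volume differs by no more than 3M, the response is fast, as shown below: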
image

Running submit job compact solved it.

(admin@nebula) [test]> match(v:test) return v limit 1;
[ERROR (-8)]: Storage Error: part: 16, error: E_RPC_FAILURE(-3).

Wed, 19 May 2021 18:17:54 CST

(admin@nebula) [test]> match(v:test) return v limit 1;
[ERROR (-8)]: Storage Error: The leader has changed. Try again later
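
The second message asks the client to try again later. A client-side retry for that case could look roughly like the Python sketch below. It is only an illustration under assumptions: execute is a hypothetical callable that takes a statement and returns an (ok, message) pair, not a real Nebula client API, and the retry count and delays are arbitrary. Only the leader-changed error is retried; E_RPC_FAILURE is raised immediately, since in this thread it was compaction, not retrying, that cleared it.

```python
import time

# Error text fragments treated as transient (taken from the messages in this thread).
TRANSIENT_MARKERS = ("The leader has changed", "E_LEADER_CHANGED")


def execute_with_retry(execute, statement, retries=3, base_delay=0.5, sleep=time.sleep):
    """Run `statement` via the hypothetical `execute` callable.

    `execute(statement)` is assumed to return (ok, message). Transient
    leader-change errors are retried with exponential backoff; any other
    error, or running out of retries, raises RuntimeError.
    """
    for attempt in range(retries + 1):
        ok, message = execute(statement)
        if ok:
            return message
        transient = any(marker in message for marker in TRANSIENT_MARKERS)
        if not transient or attempt == retries:
            raise RuntimeError(message)
        # Wait base_delay, then 2x, 4x, ... before the next attempt.
        sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so the backoff can be exercised without actually waiting.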

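Wouldn't job compact get stuck in a vicious circle? If requests are failing with "request to xxx failed", then the compaction can't reach storage either, so the compact job would fail too, right?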

I'm running into a similar problem now, but only one of the servers reports "request to xxx failed", and the compact job fails as well.

First, run show hosts to check whether that machine is healthy.
Then go to that machine and look at the storage log.

It is healthy. For details, see this thread: importer fails to import a CSV file; the log reports ErrMsg: Storage Error: part: 48, error: E_RPC_FAILURE(-3)., ErrCode: -8

Could you describe exactly what you did? I've hit a similar problem and would really appreciate some pointers!

At the time I simply ran submit job compact in the console and things were fine afterwards. You can give it a try.