Queries frequently fail with two errors: "Storage Error: The leader has changed. Try again later" and "Storage Error: part: 22, error: E_RPC_FAILURE(-3)"

Nebula version: nebula-graph-2.0.0.el7.x86_64.rpm

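  • Deployment (distributed / standalone / Docker / DBaaS): distributed, three nodes
  • Production environment: N
  • Hardware
    • Disk (SSD recommended): SSD
    • CPU and memory: 64 cores and 512 GB per machine (shared with other applications; about 1 TB of disk space available)
  • Detailed description of the problem
  • Relevant meta / storage / graph info logs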

[ERROR (-8)]: Storage Error: part: 22, error: E_RPC_FAILURE(-3).
[ERROR (-8)]: Storage Error: The leader has changed. Try again later
Recently our queries have often failed with these two errors. Running the query below, for example, triggers them.
I have two questions.

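  1. We changed the heartbeat interval between the servers to 1 second. Is the heartbeat interval related to these two errors?
  2. What usually causes these two errors, and how can they be avoided or resolved?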

match (n:Comment) where n.locationIP contains "196.192.80.169" and n.browserUsed == "Firefox" return n.locationIP as locationIP, n.content as content, n.creationDate as creationDate

[ERROR (-8)]: Storage Error: part: 22, error: E_RPC_FAILURE(-3).

Please post the graphd INFO log. This is most likely a timeout.

[ERROR (-8)]: Storage Error: The leader has changed. Try again later

This is a known issue. We have recorded it as https://github.com/vesoft-inc/nebula-common/issues/526 and will fix it in a later release. For now, the workaround is to retry when the error occurs.
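The retry workaround can be sketched on the client side as follows. This is a minimal sketch, not part of any Nebula client library: `QueryError`, `run_with_retry`, and the `execute` callable are hypothetical names, and it assumes your client surfaces the server's error message as text. Only the "leader has changed" error is retried; other errors are raised immediately.

```python
import time

# Error text taken from the thread; only this error is treated as retryable.
RETRYABLE_MARKER = "The leader has changed"


class QueryError(Exception):
    """Hypothetical error type carrying the server's error message."""


def run_with_retry(execute, statement, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call execute(statement), retrying with exponential backoff on leader changes.

    `execute` is any callable that raises QueryError on failure (an assumption
    about how your client is wrapped).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return execute(statement)
        except QueryError as err:
            if RETRYABLE_MARKER not in str(err):
                raise  # not a leader change: do not hide it behind retries
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise last_error
```

The backoff gives the graph service time to refresh its cached leader information before the next attempt. A query that fails because it times out is unlikely to succeed on retry without being narrowed first.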

We changed the heartbeat interval between the servers to 1 second. Is the heartbeat interval related to these two errors?

Which heartbeat did you change: heartbeat_interval_secs or raft_heartbeat_interval_secs?

@dingding

Where exactly did the timeout happen?

I changed heartbeat_interval_secs.
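To double-check which of the two flags is actually set on a node, a small script can scan a service's configuration file. This is a sketch that assumes the file uses the gflags `--name=value` form; `read_flags` is a hypothetical helper, and the sample text is illustrative, not copied from the poster's cluster.

```python
def read_flags(conf_text, names):
    """Return the requested flags found in gflags-style config text."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip comments, blank lines, and anything that is not --name=value.
        if not line.startswith("--") or "=" not in line:
            continue
        key, value = line[2:].split("=", 1)
        if key in names:
            found[key] = value.strip()
    return found


# Illustrative config text (assumed, not taken from the thread).
sample = """
# Seconds between heartbeats
--heartbeat_interval_secs=1
"""

wanted = {"heartbeat_interval_secs", "raft_heartbeat_interval_secs"}
print(read_flags(sample, wanted))
```

A flag that does not appear in the result is not set in that file, so the service falls back to its built-in default for it.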

There are quite a few log files. Which one do you need? The next time the problem occurs, I will save the logs first.

[nebula@hadoop-node-2 logs]$ ls -l | grep graphd | grep -i  err
-rw-r--r-- 1 nebula nebula   732879 May 21 09:47 graphd-stderr.log
lrwxrwxrwx 1 nebula nebula       66 May 21 09:17 nebula-graphd.ERROR -> nebula-graphd.hadoop-node-2.nebula.log.ERROR.20210521-091700.10898
-rw-rw-r-- 1 nebula nebula      977 Apr 26 16:02 nebula-graphd.hadoop-node-2.nebula.log.ERROR.20210426-154158.47883
-rw-rw-r-- 1 nebula nebula     3181 Apr 27 10:10 nebula-graphd.hadoop-node-2.nebula.log.ERROR.20210426-161716.5549
-rw-rw-r-- 1 nebula nebula    19624 May 13 11:02 nebula-graphd.hadoop-node-2.nebula.log.ERROR.20210427-111555.48765
-rw-rw-r-- 1 nebula nebula   704884 May 20 18:05 nebula-graphd.hadoop-node-2.nebula.log.ERROR.20210513-161953.16517
-rw-rw-r-- 1 nebula nebula     2755 May 21 09:47 nebula-graphd.hadoop-node-2.nebula.log.ERROR.20210521-091700.10898

nebula-graphd.INFO

The problem has occurred again. The log is as follows:

Log file created at: 2021/05/20 18:17:10
Running on machine: hadoop-node-2
.....
I0521 10:04:11.939095 11064 GraphService.cpp:33] Authenticating user root from 192.168.75.57:52970
I0521 10:04:11.948799 11064 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:07:40.355010 11063 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53074
I0521 10:07:40.368127 11062 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:08:38.076272 11062 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53119
I0521 10:08:38.086640 11064 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:08:55.732565 11064 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53141
I0521 10:08:55.742169 11062 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:09:21.628672 11062 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53163
I0521 10:09:21.638429 11063 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:11:44.059904 11063 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53242
I0521 10:11:44.068528 11063 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:12:49.586751 11064 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53291
I0521 10:12:49.595968 11064 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:12:53.134886 11062 GraphService.cpp:33] Authenticating user root from 127.0.0.1:15414
I0521 10:12:53.136036 11062 SwitchSpaceExecutor.cpp:43] Graph switched to `s946c5f88d6614240977fc853037eca6b', space id: 69
I0521 10:13:44.604797 11062 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53327
I0521 10:13:44.615145 11062 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:15:24.574482 11064 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53391
I0521 10:15:24.584547 11062 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:15:44.620091 11062 GraphService.cpp:33] Authenticating user root from 192.168.75.57:53414
I0521 10:15:44.632786 11062 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:17:06.103852 11064 GraphService.cpp:33] Authenticating user root from 127.0.0.1:16696
I0521 10:17:06.105528 11064 SwitchSpaceExecutor.cpp:43] Graph switched to `s946c5f88d6614240977fc853037eca6b', space id: 69
I0521 10:22:31.109411 11064 GraphService.cpp:33] Authenticating user root from 127.0.0.1:18390
E0521 10:22:31.111459 11064 QueryInstance.cpp:103] SemanticError: Yield clauses are not supported : s5
I0521 10:23:44.957406 11064 GraphService.cpp:33] Authenticating user root from 127.0.0.1:18744
I0521 10:23:44.958896 11064 SwitchSpaceExecutor.cpp:43] Graph switched to `s946c5f88d6614240977fc853037eca6b', space id: 69
I0521 10:27:17.529544 11063 GraphService.cpp:33] Authenticating user root from 127.0.0.1:19884
I0521 10:27:17.553930 11063 SwitchSpaceExecutor.cpp:43] Graph switched to `s946c5f88d6614240977fc853037eca6b', space id: 69
I0521 10:31:07.884450 11064 GraphService.cpp:33] Authenticating user root from 127.0.0.1:21025
I0521 10:31:07.886852 11064 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
I0521 10:36:04.643872 11064 GraphService.cpp:33] Authenticating user root from 127.0.0.1:22506
I0521 10:36:04.646517 11063 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
E0521 10:37:04.651248 10981 StorageClientBase.inl:209] Request to "172.80.1.3":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out
I0521 10:37:04.651440 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 1]
I0521 10:37:04.651466 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 2]
I0521 10:37:04.651482 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 3]
I0521 10:37:04.651499 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 4]
I0521 10:37:04.651513 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 5]
I0521 10:37:04.651528 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 6]
I0521 10:37:04.651543 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 8]
I0521 10:37:04.651558 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 9]
I0521 10:37:04.651573 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 10]
I0521 10:37:04.651588 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 12]
I0521 10:37:04.651603 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 14]
I0521 10:37:04.651618 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 15]
I0521 10:37:04.651633 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 16]
I0521 10:37:04.651648 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 17]
I0521 10:37:04.651662 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 18]
I0521 10:37:04.651677 10981 StorageClientBase.inl:161] Invalidate the leader for [772, 19]
E0521 10:37:04.651789 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 17
E0521 10:37:04.651891 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 16
E0521 10:37:04.651922 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 15
E0521 10:37:04.651950 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 14
E0521 10:37:04.651978 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 12
E0521 10:37:04.652007 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 10
E0521 10:37:04.652035 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 9
E0521 10:37:04.652063 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 8
E0521 10:37:04.652170 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 6
E0521 10:37:04.652283 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 5
E0521 10:37:04.652338 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 4
E0521 10:37:04.652374 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 3
E0521 10:37:04.652408 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 19
E0521 10:37:04.652454 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 2
E0521 10:37:04.652489 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 18
E0521 10:37:04.652570 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 1
E0521 10:37:04.652616 11064 StorageAccessExecutor.h:112] Storage Error: part: 17, error: E_RPC_FAILURE(-3).
E0521 10:37:04.652837 11063 QueryInstance.cpp:103] Storage Error: part: 17, error: E_RPC_FAILURE(-3).
I0521 10:42:46.990617 10936 StorageClientBase.inl:142] Update the leader for [772, 1] to "172.80.1.3":9779
I0521 10:42:46.990748 10936 StorageClientBase.inl:142] Update the leader for [772, 2] to "172.80.1.3":9779
I0521 10:42:46.990777 10936 StorageClientBase.inl:142] Update the leader for [772, 3] to "172.80.1.3":9779
I0521 10:42:46.990797 10936 StorageClientBase.inl:142] Update the leader for [772, 4] to "172.80.1.3":9779
I0521 10:42:46.990813 10936 StorageClientBase.inl:142] Update the leader for [772, 5] to "172.80.1.3":9779
I0521 10:42:46.990833 10936 StorageClientBase.inl:142] Update the leader for [772, 6] to "172.80.1.3":9779
I0521 10:42:46.990850 10936 StorageClientBase.inl:142] Update the leader for [772, 8] to "172.80.1.3":9779
I0521 10:42:46.990867 10936 StorageClientBase.inl:142] Update the leader for [772, 9] to "172.80.1.3":9779
I0521 10:42:46.990885 10936 StorageClientBase.inl:142] Update the leader for [772, 10] to "172.80.1.3":9779
I0521 10:42:46.990903 10936 StorageClientBase.inl:142] Update the leader for [772, 12] to "172.80.1.3":9779
I0521 10:42:46.990921 10936 StorageClientBase.inl:142] Update the leader for [772, 14] to "172.80.1.3":9779
I0521 10:42:46.990940 10936 StorageClientBase.inl:142] Update the leader for [772, 15] to "172.80.1.3":9779
I0521 10:42:46.990958 10936 StorageClientBase.inl:142] Update the leader for [772, 16] to "172.80.1.3":9779
I0521 10:42:46.990975 10936 StorageClientBase.inl:142] Update the leader for [772, 17] to "172.80.1.3":9779
I0521 10:42:46.990993 10936 StorageClientBase.inl:142] Update the leader for [772, 18] to "172.80.1.3":9779
I0521 10:42:46.991010 10936 StorageClientBase.inl:142] Update the leader for [772, 19] to "172.80.1.3":9779
E0521 10:42:46.991080 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 19
E0521 10:42:46.991173 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 18
E0521 10:42:46.991210 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 17
E0521 10:42:46.991245 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 16
E0521 10:42:46.991281 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 15
E0521 10:42:46.991315 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 5
E0521 10:42:46.991350 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 4
E0521 10:42:46.991384 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 3
E0521 10:42:46.991433 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 14
E0521 10:42:46.991469 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 1
E0521 10:42:46.991504 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 12
E0521 10:42:46.991539 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 2
E0521 10:42:46.991575 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 6
E0521 10:42:46.991608 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 8
E0521 10:42:46.991643 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 9
E0521 10:42:46.991677 11064 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_LEADER_CHANGED, part 10
E0521 10:42:46.991847 11062 QueryInstance.cpp:103] Storage Error: The leader has changed. Try again later
I0521 10:46:52.761488 11062 GraphService.cpp:33] Authenticating user root from 127.0.0.1:25850
I0521 10:46:52.763069 11064 SwitchSpaceExecutor.cpp:43] Graph switched to `s14f9ba8d5eca4345b74b92c31670f28e', space id: 772
E0521 10:47:52.766535 10949 StorageClientBase.inl:209] Request to "172.80.1.3":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out
I0521 10:47:52.766707 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 1]
I0521 10:47:52.766731 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 2]
I0521 10:47:52.766748 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 3]
I0521 10:47:52.766763 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 4]
I0521 10:47:52.766778 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 5]
I0521 10:47:52.766794 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 6]
I0521 10:47:52.766809 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 8]
I0521 10:47:52.766824 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 9]
I0521 10:47:52.766839 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 10]
I0521 10:47:52.766855 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 12]
I0521 10:47:52.766870 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 14]
I0521 10:47:52.766885 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 15]
I0521 10:47:52.766901 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 16]
I0521 10:47:52.766916 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 17]
I0521 10:47:52.766930 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 18]
I0521 10:47:52.766945 10949 StorageClientBase.inl:161] Invalidate the leader for [772, 19]
E0521 10:47:52.767053 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 17
E0521 10:47:52.767161 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 16
E0521 10:47:52.767203 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 15
E0521 10:47:52.767242 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 14
E0521 10:47:52.767279 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 12
E0521 10:47:52.767316 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 10
E0521 10:47:52.767354 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 9
E0521 10:47:52.767390 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 8
E0521 10:47:52.767447 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 6
E0521 10:47:52.767552 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 5
E0521 10:47:52.767632 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 4
E0521 10:47:52.767671 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 3
E0521 10:47:52.767709 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 19
E0521 10:47:52.767746 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 2
E0521 10:47:52.767784 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 18
E0521 10:47:52.767822 11062 StorageAccessExecutor.h:35] IndexScanExecutor failed, error E_RPC_FAILURE, part 1
E0521 10:47:52.767868 11062 StorageAccessExecutor.h:112] Storage Error: part: 17, error: E_RPC_FAILURE(-3).
E0521 10:47:52.768510 11063 QueryInstance.cpp:103] Storage Error: part: 17, error: E_RPC_FAILURE(-3).

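You are fetching a fairly large amount of data, so the request timed out. For details, see reply #2 by dingding in the thread "MATCH 执行失败 Storage Error" (MATCH execution failed: Storage Error).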
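The timeout can be read directly from the timestamps in the log above. The sketch below pairs each "Timed Out" line with the session authentication line that precedes it and prints the gap. It assumes the query started right after authentication and takes the year from the "Log file created at" header; `glog_time` is a hypothetical helper.

```python
from datetime import datetime


def glog_time(line, year=2021):
    """Parse the timestamp from a glog line such as 'E0521 10:37:04.651248 ...'."""
    month_day = line[1:5]          # 'MMDD' follows the severity letter
    clock = line.split()[1]        # 'HH:MM:SS.uuuuuu'
    return datetime.strptime(f"{year}{month_day} {clock}", "%Y%m%d %H:%M:%S.%f")


# (authentication line, timeout line) pairs copied from the log in this thread.
pairs = [
    ("I0521 10:36:04.643872 11064 GraphService.cpp:33] Authenticating user root from 127.0.0.1:22506",
     'E0521 10:37:04.651248 10981 StorageClientBase.inl:209] Request to "172.80.1.3":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out'),
    ("I0521 10:46:52.761488 11062 GraphService.cpp:33] Authenticating user root from 127.0.0.1:25850",
     'E0521 10:47:52.766535 10949 StorageClientBase.inl:209] Request to "172.80.1.3":9779 failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out'),
]

gaps = [(glog_time(end) - glog_time(start)).total_seconds() for start, end in pairs]
for gap in gaps:
    print(f"{gap:.3f} s")
```

Both gaps come out at almost exactly 60 seconds, which is consistent with a fixed 60-second request timeout between graphd and storaged. The thread does not say which setting controls it.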

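Hi, I ran into the same error: "Storage Error: The leader has changed. Try again later".
However, this was the first INSERT VERTEX statement I had ever executed, so a large data volume cannot be the cause.
I am also running a three-node cluster.
How can I resolve this error?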

@dingding

SHOW HOSTS; shows one machine as OFFLINE, but nebula.service status all reports it as running.

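After every restart I get the error Wrong cluster!, and I have to delete the cluster.id file on all machines before the restart succeeds!
The documentation does not describe cluster.id either.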

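After the restart, all three machines are ONLINE. Then, once I execute an INSERT VERTEX statement, the machine that executed it goes OFFLINE.
There is too little documentation about clusters.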

After every restart I get the error Wrong cluster!, and I have to delete the cluster.id file on all machines before the restart succeeds!

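This happens because you changed the IP or port of metad, or because the storage service had previously joined another cluster. If the documentation does not cover this, we will add it. Thanks for your feedback.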

