Exchange 2 data import reports "The leader has changed. Try again later"

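  • Nebula version: 2.0 GA
  • Deployment (distributed / single-node / Docker / DBaaS): distributed, 3 machines
  • Hardware
    • Disk (SSD recommended): SSD
    • CPU and memory: 32G
  • Problem description
    The first import reported this error, and verification afterwards showed that some data was missing. The second import hit the same error. Why?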
21/03/29 21:19:47 WARN HiveConf: HiveConf of name hive.txn.strict.locking.mode does not exist
21/03/29 21:19:47 WARN HiveConf: HiveConf of name hive.tez.input.generate.consistent.splits does not exist
21/03/29 21:19:56 ERROR NebulaGraphClientWriter: write vertex failed for Storage Error: The leader has changed. Try again later
21/03/29 21:19:56 ERROR NebulaGraphClientWriter: write vertex failed for Storage Error: The leader has changed. Try again later
21/03/29 21:19:56 ERROR NebulaGraphClientWriter: write vertex failed for Storage Error: The leader has changed. Try again later
21/03/29 21:20:19 ERROR NebulaGraphClientWriter: write vertex failed for Storage Error: The leader has changed. Try again later

Could you first check whether the services are running normally?

The services are running normally. After the import finishes, BALANCE LEADER also works.

  1. Can you write data in batches from the console?
  2. Please post your partition and batch settings.

1. Batch writes from the console work. I imported the NBA dataset provided on GitHub through the console.
2. Do you mean these?
batch: 50
partition: 10

One more question:

spark-submit --class com.vesoft.nebula.exchange.Exchange --master local /home/xx/nebula/nebula-exchange-2.0.0.jar -c /home/xxx/nebula_application_xxx.conf -h
After the job finished (judging by the service load and the data volume computed by db_dump), the process did not exit and just stayed there.

With this configuration the import will not be fast. Take a look at the storage logs: a leader change happened during the write.

As for the process not exiting after it finishes: thanks for pointing that out, we will add it.
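
For readers writing to storage from their own client code, the "Try again later" hint in the error can be handled with a bounded retry and exponential backoff. The following is only a minimal sketch and is not taken from Exchange's code: `LeaderChangedError`, `write_with_retry`, and the `write_batch` callable are hypothetical names introduced for illustration, and the attempt count and delays are arbitrary assumptions.

```python
import time


class LeaderChangedError(Exception):
    """Hypothetical exception standing in for the storage error
    "The leader has changed. Try again later"."""


def write_with_retry(write_batch, batch, max_attempts=5, base_delay=0.5,
                     sleep=time.sleep):
    """Call write_batch(batch); on a leader change, wait and try again.

    write_batch is a caller-supplied function (an assumption, not a real
    client API). The delay doubles after each failed attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_batch(batch)
        except LeaderChangedError:
            if attempt == max_attempts:
                # Give up so the caller can record the batch as failed
                # instead of silently losing it.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Re-raising after the last attempt matters here: the thread reports missing data after the import, so a failed batch should be surfaced rather than dropped.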

These are the logs from shortly after 7 this morning, when I ran the nebula-exchange import.
nebula-storaged.ERROR

Machine 1
E0330 07:34:26.572551 14620 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.8.17":9780, error code is -5
E0330 07:34:26.709370 14619 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 6] Receive response about askForVote from "172.31.8.17":9780, error code is -5



Machine 2
E0330 07:34:26.671720 15093 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 4] Receive response about askForVote from "172.31.8.17":9780, error code is -5
E0330 07:34:26.834892 15094 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 1] Receive response about askForVote from "172.31.8.17":9780, error code is -5
E0330 08:35:53.890802 14986 RaftPart.cpp:961] [Port: 9780, Space: 15, Part: 2] processAppendLogResponses failed!


Machine 3
E0330 07:34:29.145084 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 8] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:29.145184 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 8] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:29.145110 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:29.145243 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:29.168062 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 10] Receive response about askForVote from "172.31.1.93":9780, error code is -6
...

A leader change happened during the write. Why is that? Is the cluster load so high that it stopped responding?

At first I used --master local[4], and only later reduced it to local.

This configuration is actually already very low.
We will ask our colleagues who work on storage to take a look at these storage logs.

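Does the information I provided help you analyze the problem?

Feel free to contact me anytime if you need anything. The problem reproduces with high probability.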

Have you turned off auto compaction? Try turning it off.
Set it in the console:
nebula> UPDATE CONFIGS storage:rocksdb_column_family_options = {disable_auto_compactions = true};

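Please provide a few things:

  1. A screenshot of SHOW HOSTS.
  2. Pick one problematic part, for example "Space: 15, Part: 3" from the output above. On each storage machine, grep only the INFO log for "Space: 15, Part: 3", then upload the results.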

Before writing with Exchange I set disable_auto_compactions=true. After the write I set disable_auto_compactions=false and ran SUBMIT JOB COMPACT.

I ran BALANCE LEADER this morning.

Machine 1, IP: 172.31.1.93
grep "Space: 15, Part: 3" nebula-storaged.INFO
E0330 07:34:26.572551 14620 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.8.17":9780, error code is -5

Machine 2, IP: 172.31.4.183
grep "Space: 15, Part: 3" /usr/local/nebula/logs/nebula-storaged.INFO
(no output)

Machine 3, IP: 172.31.8.17
grep "Space: 15, Part: 3" /usr/local/nebula/logs/nebula-storaged.INFO
E0330 07:34:29.145110 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:29.145243 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:29.937732 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:29.937772 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:30.465073 14458 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:30.465112 14458 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:31.858142 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:31.858225 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:33.324142 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:33.324193 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:35.132741 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:35.132827 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:35.646097 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:35.646155 14457 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:37.434142 14455 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:37.434173 14455 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:38.785360 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:38.785396 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:40.221325 14455 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:40.221385 14455 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:41.676453 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:41.676517 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:42.226251 14455 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:42.226311 14455 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:43.988873 14455 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:43.988934 14455 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
E0330 07:34:44.969836 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6
E0330 07:34:44.969894 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6
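
Long runs of askForVote lines like the ones above are easier to compare across machines once they are tallied. The sketch below is an illustration only: the regular expression is inferred from the log lines pasted in this thread (not from any documented log format), `count_vote_errors` is a hypothetical helper, and it makes no claim about what the error codes -5 and -6 mean.

```python
import re
from collections import Counter

# Pattern inferred from the RaftPart.cpp log lines quoted in this thread.
LINE_RE = re.compile(
    r'\[Port: (?P<port>\d+), Space: (?P<space>\d+), Part: (?P<part>\d+)\] '
    r'Receive response about askForVote from "(?P<peer>[\d.]+)":\d+, '
    r'error code is (?P<code>-?\d+)'
)


def count_vote_errors(lines):
    """Count askForVote responses per (space, part, peer, error code)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:  # lines that are not askForVote responses are skipped
            key = (int(m["space"]), int(m["part"]), m["peer"], int(m["code"]))
            counts[key] += 1
    return counts


# Three lines copied from the logs in this thread.
sample = [
    'E0330 07:34:26.572551 14620 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.8.17":9780, error code is -5',
    'E0330 07:34:29.145110 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.1.93":9780, error code is -6',
    'E0330 07:34:29.145243 14456 RaftPart.cpp:1143] [Port: 9780, Space: 15, Part: 3] Receive response about askForVote from "172.31.4.183":9780, error code is -6',
]

for key, n in sorted(count_vote_errors(sample).items()):
    print(key, n)
```

To run it over a whole file, pass an open file object instead of `sample`, since the function accepts any iterable of lines.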

You will probably need to make logging more verbose. Right now only ERROR-level messages are being printed; change minloglevel to 0.

OK, I will try it later.

I tried it, and the problem did not reproduce.

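Hello, I have run into the same import problem. Could you give me some guidance?
1. Cluster


2. Importing the basketball player example script from the official Nebula site
(image)
3. Import command

4. Log output from the import

5. Query results after SUBMIT JOB STATS;

Addendum: the services are definitely running normally and have never been restarted.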

Hello, DEV,
I also have an import problem. Could you take a look: Exchange 2 data import reports "The leader has changed. Try again later" - #18 by yangmeng
Thanks!

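@yangmeng
Judging from the -port 3699 in your console connection, I take it that your Nebula Graph server is version 1.x while your console is version 2.0. Please note that because of thrift changes between stable major versions, the two sides cannot communicate correctly, so you will see all kinds of strange results. Please use a 1.x console to connect to your Nebula Graph cluster.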