Some data fails to insert during bulk import with spark-connector

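  • nebula version: V2.6.1
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: Y
  • Hardware
    • 2 TB SSD
    • 3 nodes, 16C 32G
  • meta, graph, and storage are deployed on all 3 nodes (3*3)
  • I am using spark-connector with 320 concurrent tasks to import nearly 300 million tag records, with executeRetry = 4. I tested tagBatch values of 1000, 256, and 128, and in every case some inserts failed, so the data that ends up in the database is incomplete.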
nebula log
E0125 14:22:20.766925 29658 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 2
E0125 14:22:20.773938 29658 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 1
E0125 14:22:20.773953 29658 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 3
E0125 14:22:20.773970 29658 StorageAccessExecutor.h:124] Storage Error: part: 4, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.774032 29658 QueryInstance.cpp:108] Storage Error: part: 1, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.774282 29654 QueryInstance.cpp:108] Storage Error: part: 1, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.774346 29651 QueryInstance.cpp:108] Storage Error: part: 2, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.774494 29652 QueryInstance.cpp:108] Storage Error: part: 4, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.767105 29657 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 1
E0125 14:22:20.775115 29657 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 2
E0125 14:22:20.775136 29657 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 3
E0125 14:22:20.775151 29657 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 5
E0125 14:22:20.775168 29657 StorageAccessExecutor.h:124] Storage Error: part: 4, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.775193 29657 QueryInstance.cpp:108] Storage Error: part: 4, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.767215 29646 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 4
E0125 14:22:20.778800 29646 StorageAccessExecutor.h:124] Storage Error: part: 2, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.778851 29646 QueryInstance.cpp:108] Storage Error: part: 2, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.768628 29649 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 4
E0125 14:22:20.779284 29649 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 2
E0125 14:22:20.779320 29649 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 3
E0125 14:22:20.779335 29649 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 1
E0125 14:22:20.779350 29649 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 5
E0125 14:22:20.779366 29649 StorageAccessExecutor.h:124] Storage Error: part: 4, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.779402 29649 QueryInstance.cpp:108] Storage Error: part: 4, error: E_CONSENSUS_ERROR(-3001).
E0125 14:22:20.768651 29656 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 1
E0125 14:22:20.780360 29656 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 4
E0125 14:22:20.780385 29656 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 3
E0125 14:22:20.780416 29656 StorageAccessExecutor.h:41] InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 5

Spark log
ERROR [Executor task launch worker for task 2573] NebulaVertexWriter: failed to write INSERT vertex `tag_XXX`  for Storage Error: part: 13, error: E_CONSENSUS_ERROR(-3001).

Hello, could you please post the nebula storage logs from the node that has the problem?

These are the logs from the storage node that reports the errors. It is the same error repeated over and over.

Please post the INFO logs, not the ERROR logs.

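This means the insert rate exceeds the maximum throughput that raft can sustain.
There are two ways to approach it. One is to lower the insert rate, for example by reducing the concurrency.
The other is to retry as many times as possible.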
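
The retry approach can be sketched as a loop with exponential backoff, so that each retry also gives raft a little more time to catch up. This is a minimal Python sketch under stated assumptions, not spark-connector code: `write_with_retry`, `ConsensusError`, and the `write_batch` callable are hypothetical names introduced only for illustration, and the delay values are arbitrary.

```python
import random
import time


class ConsensusError(Exception):
    """Hypothetical stand-in for a write rejected with E_CONSENSUS_ERROR."""


def write_with_retry(write_batch, batch, max_retries=8,
                     base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call write_batch(batch), retrying with exponential backoff and jitter.

    Returns the number of attempts used. If every attempt fails, the last
    error is re-raised so the caller can record the batch instead of
    silently losing it.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            write_batch(batch)
            return attempt
        except ConsensusError:
            if attempt > max_retries:
                raise
            # Double the wait on each retry, capped at max_delay. The jitter
            # keeps parallel tasks from all retrying at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

Re-raising after the final attempt matters here: a batch that is dropped after its retries run out is exactly how records go missing from the final import.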

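Is this throughput limited by the hardware or by the database? Only one of the three servers' storage services reports this error, and I configured graphAddress with all three servers.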

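Usually the database. This is only a guess, but a large vertex (or its edges) may have landed on a single partition, and the leader of that partition then reported the error.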
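
One way to check whether the errors concentrate on a few partitions is to count `E_CONSENSUS_ERROR` lines per part in the logs. The following Python sketch assumes the two log formats shown in this thread (`part 2` and `part: 4`). The function name `errors_per_part` is hypothetical, and because one failed request can be logged on several lines, the counts are only a rough indicator.

```python
import re
from collections import Counter

# Matches both "... E_CONSENSUS_ERROR, part 2" and "part: 4, error: ...".
PART_RE = re.compile(r"part:? (\d+)")


def errors_per_part(lines):
    """Count E_CONSENSUS_ERROR log lines per partition id."""
    counts = Counter()
    for line in lines:
        if "E_CONSENSUS_ERROR" not in line:
            continue
        match = PART_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "E0125 14:22:20.766925 29658 StorageAccessExecutor.h:41] "
    "InsertVerticesExecutor failed, error E_CONSENSUS_ERROR, part 2",
    "E0125 14:22:20.773970 29658 StorageAccessExecutor.h:124] "
    "Storage Error: part: 4, error: E_CONSENSUS_ERROR(-3001).",
    "E0125 14:22:20.774494 29652 QueryInstance.cpp:108] "
    "Storage Error: part: 4, error: E_CONSENSUS_ERROR(-3001).",
]
for part, count in errors_per_part(sample).most_common():
    print(f"part {part}: {count}")
```

If one or two parts dominate the output, that would be consistent with the hot-partition guess. An even spread would point more toward overall write pressure.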

OK, after I reduced the concurrency there were no more failures.