Exchange 2.0: error when importing CSV

  • nebula version: 2.0rc1
  • Deployment (distributed / standalone / Docker / DBaaS): docker swarm
  • Hardware
    • Disk (SSD recommended)
    • CPU and memory:

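I'm hitting an error with Exchange 2.0 that I can't make sense of. I insert data with 2.0 and get the error below; the failed statements are also written to the error directory.
However, I can find the corresponding data in the database, so the insert seems to have actually succeeded.
What could cause an insert to succeed while a failure is reported?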

21/02/03 16:40:55 INFO writer.NebulaGraphClientWriter: Connection to List(10.0.0.4:9559, 10.0.0.2:9559, 10.0.0.7:9559)
21/02/03 16:40:55 ERROR writer.NebulaGraphClientWriter: write vertex failed for -8,Request to storage failed, without failedCodes.
21/02/03 16:40:55 INFO writer.NebulaGraphClientWriter: INSERT VERTEX user(user_id,public,completion_percentage,gender,region,last_login,registration,age) VALUES "224011": (224011, 0, 60, 1, "bratislavsky kraj, malacky", "2012-05-25 13:54:00.0", "2006-01-05 00:00:00.0", 22), "702137": (702137, 0, 28, 0, "kosicky kraj, michalovce", "2012-03-23 22:56:00.0", "2007-12-27 00:00:00.0", 0), "504679": (504679, 1, 79, 1, "banskobystricky kraj, zvolen", "2012-05-15 10:17:00.0", "2009-07-01 00:00:00.0", 18)

cc @nicole @darionyaphet

-8 is the execution-failure error code returned by graphd.

Please post the results of these two statements: DESC SPACE space_name; DESC TAG user;

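One thing I forgot to mention: if I set the batch size to 1, every row is inserted and no error is reported.
If I set it to 50, the error appears, yet all the data is inserted, which means the statement should have executed successfully.
My understanding is that this has nothing to do with DESC SPACE space_name; DESC TAG user; and is related to the batch size.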

(u@nebula) [debug_graph]> desc space debug_graph;
+----+---------------+------------------+----------------+---------+------------+--------------------+
| ID | Name          | Partition Number | Replica Factor | Charset | Collate    | Vid Type           |
+----+---------------+------------------+----------------+---------+------------+--------------------+
| 21 | "debug_graph" | 100              | 3              | "utf8"  | "utf8_bin" | "FIXED_STRING(16)" |
+----+---------------+------------------+----------------+---------+------------+--------------------+
Got 1 rows (time spent 1044/1619 us)

Thu, 04 Feb 2021 09:23:08 UTC

(u@nebula) [debug_graph]> desc tag user;
+-------------------------+----------+-------+-----------+
| Field                   | Type     | Null  | Default   |
+-------------------------+----------+-------+-----------+
| "user_id"               | "int64"  | "YES" | __EMPTY__ |
+-------------------------+----------+-------+-----------+
| "public"                | "int64"  | "YES" | __EMPTY__ |
+-------------------------+----------+-------+-----------+
| "completion_percentage" | "int64"  | "YES" | __EMPTY__ |
+-------------------------+----------+-------+-----------+
| "gender"                | "int64"  | "YES" | __EMPTY__ |
+-------------------------+----------+-------+-----------+
| "region"                | "string" | "YES" | __EMPTY__ |
+-------------------------+----------+-------+-----------+
| "last_login"            | "string" | "YES" | __EMPTY__ |
+-------------------------+----------+-------+-----------+
| "registration"          | "string" | "YES" | __EMPTY__ |
+-------------------------+----------+-------+-----------+
| "age"                   | "int64"  | "YES" | __EMPTY__ |
+-------------------------+----------+-------+-----------+
Got 8 rows (time spent 1119/1683 us)

This needs @yee to take a look: the client executes an insert statement and gets error code -8, but the data is inserted successfully.

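Most likely part of the data in the batch was inserted successfully and part of it failed, so graph always treats the whole batch as failed. Could you check the logs on each storage node and see what the specific error is?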
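
As an illustration of the explanation above, here is a minimal sketch, assuming the rows of a batch are grouped by partition and the statement as a whole is reported as failed if any single partition rejects its share. The hash in `part_of`, the `failing_parts` set and the function names are hypothetical; this does not reproduce Nebula's actual routing.

```python
from collections import defaultdict
import zlib


def part_of(vid: str, num_parts: int) -> int:
    """Hypothetical VID-to-partition mapping (not Nebula's real hash)."""
    return zlib.crc32(vid.encode()) % num_parts + 1


def write_batch(vids, num_parts, failing_parts):
    """Simulate one batched INSERT where each partition commits on its own."""
    by_part = defaultdict(list)
    for vid in vids:
        by_part[part_of(vid, num_parts)].append(vid)

    written, rejected = [], []
    for part, rows in by_part.items():
        # A partition in failing_parts stands for "not a leader" and similar errors.
        (rejected if part in failing_parts else written).extend(rows)

    # The statement succeeds only if every touched partition succeeded.
    return not rejected, written, rejected
```

Under these assumptions a batch of 50 rows touches many partitions, so one failing partition is enough to mark the whole statement as failed while most rows are already stored. A batch of one row touches exactly one partition, so its result is all-or-nothing, which would be consistent with batch=1 never showing the mismatch.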

Here is the storage error log. With batch=1 there are no errors; with batch=50 there are.

E0204 06:41:12.155174    13 RaftPart.cpp:687] [Port: 9780, Space: 18, Part: 36] Cannot append logs, clean the buffer
E0204 06:41:12.161538    17 RaftPart.cpp:367] [Port: 9780, Space: 18, Part: 10] The partition is not a leader
E0204 06:41:12.172740    14 RaftPart.cpp:687] [Port: 9780, Space: 18, Part: 95] Cannot append logs, clean the buffer
E0204 06:41:12.172864    30 RaftPart.cpp:367] [Port: 9780, Space: 18, Part: 77] The partition is not a leader
E0204 06:41:13.163826    27 RaftPart.cpp:687] [Port: 9780, Space: 18, Part: 3] Cannot append logs, clean the buffer
E0204 06:41:13.164686    27 RaftPart.cpp:367] [Port: 9780, Space: 18, Part: 51] The partition is not a leader
E0204 06:41:13.177299    19 RaftPart.cpp:687] [Port: 9780, Space: 18, Part: 92] Cannot append logs, clean the buffer
E0204 06:41:13.177595    37 RaftPart.cpp:367] [Port: 9780, Space: 18, Part: 49] The partition is not a leader
E0204 06:41:14.179781    24 RaftPart.cpp:367] [Port: 9780, Space: 18, Part: 95] The partition is not a leader
E0204 06:41:14.179777    21 RaftPart.cpp:687] [Port: 9780, Space: 18, Part: 51] Cannot append logs, clean the buffer
E0204 06:41:14.198081    28 RaftPart.cpp:687] [Port: 9780, Space: 18, Part: 21] Cannot append logs, clean the buffer
E0204 06:41:14.205024    28 RaftPart.cpp:367] [Port: 9780, Space: 18, Part: 22] The partition is not a leader
E0204 06:41:14.212244    26 RaftPart.cpp:687] [Port: 9780, Space: 18, Part: 21] Cannot append logs, clean the buffer
E0204 06:41:14.212919    26 RaftPart.cpp:367] [Port: 9780, Space: 18, Part: 74] The partition is not a leader
E0204 08:44:43.991329    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 45] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.023383    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 83] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.029975    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 47] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.054409    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 60] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.065016    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 100] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.071357    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 84] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.086710    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 99] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.099277    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 75] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.103790    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 4] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.116744    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 82] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.117372    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 95] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.128237    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 40] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.158421    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 68] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.161113    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 70] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.163620    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 88] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.166801    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 8] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.167631    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 33] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.201018    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 66] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.202373    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 25] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.219158    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 62] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.228258    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 15] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.230314    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 93] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.236876    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 6] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.240433    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 54] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.258884    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 90] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.268848    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 52] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.281461    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 85] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.287914    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 29] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.288565    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 94] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.291141    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 39] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.295259    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 19] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.303706    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 74] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.316123    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 23] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.332510    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 7] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.345424    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 37] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.359369    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 98] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.376909    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 92] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.401540    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 73] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.428186    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 21] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.449779    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 46] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.473398    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 35] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.493059    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 11] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.495424    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 27] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.499500    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 44] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.503783    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 24] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.511098    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 79] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.527762    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 72] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.534207    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 32] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.591652    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 78] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.600106    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 58] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.606600    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 30] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.626714    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 91] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.664268    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 38] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.667356    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 96] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.689697    56 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 97] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.768553    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 80] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.770819    52 Host.cpp:390] [Port: 9780, Space: 21, Part: 80] [Host: 10.0.0.4:9780] Failed to append logs to the host (Err: -5)
E0204 08:44:44.865806    58 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 17] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.897579    57 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 89] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 08:44:44.898264    55 RaftPart.cpp:1143] [Port: 9780, Space: 21, Part: 76] Receive response about askForVote from [10.0.0.4:9780], error code is -5
E0204 09:09:14.855832    34 RaftPart.cpp:687] [Port: 9780, Space: 21, Part: 71] Cannot append logs, clean the buffer
E0204 09:09:14.855882    37 RaftPart.cpp:367] [Port: 9780, Space: 21, Part: 71] The partition is not a leader
E0204 09:09:14.868280    21 RaftPart.cpp:687] [Port: 9780, Space: 21, Part: 65] Cannot append logs, clean the buffer
E0204 09:09:14.868813    21 RaftPart.cpp:367] [Port: 9780, Space: 21, Part: 9] The partition is not a leader
E0204 09:09:15.883616    38 RaftPart.cpp:687] [Port: 9780, Space: 21, Part: 12] Cannot append logs, clean the buffer
E0204 09:09:15.883671    42 RaftPart.cpp:367] [Port: 9780, Space: 21, Part: 81] The partition is not a leader
E0204 09:09:16.895633    27 RaftPart.cpp:367] [Port: 9780, Space: 21, Part: 50] The partition is not a leader
E0204 09:09:16.895632    14 RaftPart.cpp:687] [Port: 9780, Space: 21, Part: 61] Cannot append logs, clean the buffer
E0204 09:09:16.905305    17 RaftPart.cpp:367] [Port: 9780, Space: 21, Part: 36] The partition is not a leader
E0204 09:09:16.905309    21 RaftPart.cpp:687] [Port: 9780, Space: 21, Part: 16] Cannot append logs, clean the buffer
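
A log of this size is easier to read once it is tallied. The following is a minimal sketch, assuming every line follows the layout shown above (severity and date, time, thread id, source location, then `[Port, Space, Part]` and the message). The function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# E0204 06:41:12.161538    17 RaftPart.cpp:367] [Port: 9780, Space: 18, Part: 10] The partition is not a leader
LINE_RE = re.compile(
    r"^E(?P<date>\d{4}) (?P<time>[\d:.]+)\s+\d+ (?P<src>[\w.]+:\d+)\] "
    r"\[Port: (?P<port>\d+), Space: (?P<space>\d+), Part: (?P<part>\d+)\] "
    r"(?P<msg>.*)$"
)


def summarize(lines):
    """Count error messages per (space id, message); skip unparseable lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(int(m.group("space")), m.group("msg"))] += 1
    return counts
```

Calling `summarize(open(path))` on a storage error log (the path is whatever your deployment uses) gives one counter entry per space and message, which makes it easier to separate the leader-related errors from the askForVote ones.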

@critical27 Could election errors like these be related to batch inserts?

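This is also easy to reproduce; it happens almost every time on my side. I can't give you the data, but you can generate a few million rows based on my schema and run them through exchange2.0.
Submit command: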

nohup spark-submit  --class com.vesoft.nebula.exchange.Exchange  --master yarn  --deploy-mode client  --driver-memory 1g  --executor-memory 1g  --executor-cores 1  --num-executors 2 --files import-debug.conf --conf spark.driver.extraClassPath=./ --conf spark.executor.extraClassPath=./ nebula-exchange-2.0.0.jar -c import-debug.conf >test.log 2>&1 &

import-debug.conf file:

 # Processing tags
  tags: [

    # Loading tag from HDFS and data type is csv
    {
      name: user
      type: {
        source: csv
        sink: client
      }
      path: "graph/test/pokec-profiles.csv"
      fields: [_c0, _c1, _c2, _c3, _c4, _c5, _c6, _c7]
      nebula.fields: [user_id,public,completion_percentage,gender,region,last_login,registration,age]
      vertex: _c0
      separator: ","
      header: false
      batch: 50
      partition: 32
    }
  ]

  # Processing edges
  edges: [
    # Loading edge from HDFS and data type is csv
    {
      name: relation
      type: {
        source: csv
        sink: client
      }
      path: "graph/test/pokec-relations.csv"
      fields: [_c0,_c1]
      nebula.fields: [user,has_friend]
      source: _c0
      target: _c1
      separator: ","
      header: false
      batch: 256
      partition: 32
    }
  ]

Does exchange retry now? @nicole @zzz Only part of the batch fails, not all of it; things like leader changes cause these errors.

Yes, it retries the data that failed to import. @zzz @critical27

Then I'd consider this a bug. Batching is a perfectly normal operation, and this problem is bound to show up whenever data is written through graph.

Getting errors is expected. Ideally exchange should report which rows did not succeed, and the user then decides whether to retry.
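
One way to find out which rows did not succeed is to replay a failed batch row by row. The sketch below assumes that re-running INSERT VERTEX for a VID that was already written simply overwrites it (please verify this for your version). `execute` is a hypothetical callable that returns True on success; it is not the Nebula client API, and `insert_with_fallback` is not part of exchange.

```python
from typing import Callable, Dict, List


def insert_with_fallback(
    rows: Dict[str, str],
    execute: Callable[[str], bool],
    tag_clause: str,
) -> List[str]:
    """Try one batched INSERT; on failure, replay each row on its own.

    rows maps a VID to its already formatted value tuple, e.g. '(1, "a")'.
    Returns the VIDs that still failed after the per-row replay.
    """

    def stmt(items):
        values = ", ".join(f'"{vid}": {vals}' for vid, vals in items)
        return f"INSERT VERTEX {tag_clause} VALUES {values}"

    if execute(stmt(rows.items())):
        return []

    failed = []
    for vid, vals in rows.items():
        # Single-row statements either fully succeed or fully fail.
        if not execute(stmt([(vid, vals)])):
            failed.append(vid)
    return failed
```

The returned list is what would need to be logged or retried later; rows that went through during the replay no longer appear in it.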

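As I see it, this is entirely a batch problem and has nothing to do with exchange. It does not happen under extreme conditions but in a very ordinary scenario. For example, I run a normal batch insert while the servers are idle, and this still happens, which is hard to understand.
Also, exchange retries once, and the retry still fails.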

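Run SHOW HOSTS and check whether the leaders have changed. Are you on HDD or SSD? At the very least, an idle cluster should not behave the way you describe.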

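My test data is roughly 1 million rows, and the test environment uses HDD. I don't think 1 million rows is enough to hit a resource bottleneck.
Production uses SSD; we'll see whether it reproduces there.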