nebula-importer: Finished count after CSV import does not match the actual number of rows

Steps:

Build nebula-importer from source

git clone https://github.com/vesoft-inc/nebula-importer.git
cd nebula-importer && git checkout v1.0.0 && make build

Sample rows from tag_order_info.csv

00000060,USER00000048,PRDNO_03,5,1598924701
00000061,USER00000055,PRDNO_00,3,1598924701
00000062,USER00000012,PRDNO_04,0,1598924701
00000063,USER00000005,PRDNO_05,2,1598924701
00000064,USER00000048,PRDNO_05,1,1598924701
00000065,USER00000015,PRDNO_04,4,1598924701
00000066,USER00000020,PRDNO_05,0,1598924701
00000067,USER00000025,PRDNO_01,2,1598924701
00000068,USER00000051,PRDNO_02,4,1598924701
00000069,USER00000047,PRDNO_00,5,1598924701
00000070,USER00000038,PRDNO_03,1,1598924701
00000071,USER00000052,PRDNO_03,4,1598924701
00000072,USER00000003,PRDNO_05,0,1598924701
00000073,USER00000011,PRDNO_04,1,1598924701
00000074,USER00000031,PRDNO_00,4,1598924701
00000075,USER00000004,PRDNO_02,1,1598924701
00000076,USER00000017,PRDNO_01,4,1598924701
00000077,USER00000023,PRDNO_00,2,1598924701
00000078,USER00000021,PRDNO_04,4,1598924701
00000079,USER00000011,PRDNO_04,1,1598924701
00000080,USER00000027,PRDNO_00,3,1598924701
00000081,USER00000045,PRDNO_03,2,1598924701
00000082,USER00000018,PRDNO_05,0,1598924701
00000083,USER00000041,PRDNO_02,1,1598924701
00000084,USER00000039,PRDNO_00,2,1598924701
00000085,USER00000003,PRDNO_01,2,1598924701
00000086,USER00000013,PRDNO_01,4,1598924701
00000087,USER00000022,PRDNO_04,3,1598924701
00000088,USER00000031,PRDNO_01,0,1598924701
00000089,USER00000048,PRDNO_05,0,1598924701
00000090,USER00000025,PRDNO_00,4,1598924701
00000091,USER00000047,PRDNO_04,0,1598924701
00000092,USER00000000,PRDNO_01,0,1598924701
00000093,USER00000005,PRDNO_02,3,1598924701
00000094,USER00000048,PRDNO_05,4,1598924701
00000095,USER00000026,PRDNO_04,5,1598924701
00000096,USER00000005,PRDNO_05,0,1598924701
00000097,USER00000030,PRDNO_00,0,1598924701
00000098,USER00000026,PRDNO_02,5,1598924701
00000099,USER00000023,PRDNO_02,3,1598924701

YAML configuration

version: v1
description: example
removeTempFiles: false
clientSettings:
  retry: 3
  concurrency: 2 # number of graph clients
  channelBufferSize: 1
  space: test100
  connection:
    user: root
    password: nebula
    address: 192.168.15.232:3699,192.168.15.233:3699
  postStart:
    commands: |
      UPDATE CONFIGS storage:wal_ttl=3600;
      UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = true };
      DROP SPACE IF EXISTS test100;
      CREATE SPACE IF NOT EXISTS test100(partition_num=5, replica_factor=1);
      USE test100;
      CREATE TAG tag_order_info(cust_id string ,product_type string, order_status int, create_time timestamp);
      CREATE TAG tag_mobile(mobile string);
      CREATE TAG tag_bank_card(bank_card string);
      CREATE TAG tag_cert_no(cert_no string);
      CREATE EDGE edge_mobile();
      CREATE EDGE edge_cert_no();
      CREATE EDGE edge_bank_card_no();
      #CREATE TAG course(name string, credits int);
      #CREATE TAG building(name string);
      #CREATE TAG student(name string, age int, gender string);
      #CREATE EDGE follow(likeness double);
      #CREATE EDGE choose(grade int);
      #CREATE TAG course_no_props();
      #CREATE TAG building_no_props();
      #CREATE EDGE follow_no_props();
    afterPeriod: 30s
  preStop:
    commands: |
      UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = false };
      UPDATE CONFIGS storage:wal_ttl=86400;
logPath: ./err/test100.log
files:
  - path: ./edge_bank_card_no.csv
    batchSize: 2
    inOrder: false
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: edge
      edge:
        name: edge_bank_card_no
        withRanking: false
        #props:
        #  - name: grade
        #    type: int

  - path: ./tag_card_info.csv
    failDataPath: ./err/tag_card_info.csv
    batchSize: 2
    inOrder: false
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: vertex
      vertex:
        tags:
          - name: tag_bank_card
            props:
              - name: bank_card
                type: string
              #- name: credits
              #  type: int
          #- name: building
          #  props:
          #    - name: name
          #      type: string

  - path: ./tag_idcert_info.csv
    failDataPath: ./err/tag_idcert_info.csv
    batchSize: 2
    inOrder: false
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: vertex
      vertex:
        tags:
          - name: tag_cert_no
            props:
              - name: cert_no
                type: string

  - path: ./edge_cert_no.csv
    failDataPath: ./err/edge_cert_no.csv
    batchSize: 2
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: edge
      edge:
        name: edge_cert_no
        withRanking: false
        #props:
        #  - name: likeness
        #    type: double

  - path: ./edge_mobile.csv
    failDataPath: ./err/edge_mobile.csv
    batchSize: 2
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: edge
      edge:
        name: edge_mobile
        withRanking: true

  - path: ./tag_order_info.csv
    failDataPath: ./err/tag_order_info.csv
    batchSize: 2
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: vertex
      vertex:
        tags:
          - name: tag_order_info
            props:
              - name: cust_id
                type: string
              - name: product_type
                type: string
              - name: order_status
                type: int
              - name: create_time
                type: timestamp

  - path: ./tag_phone_info.csv
    failDataPath: ./err/tag_phone_info.csv
    batchSize: 2
    type: csv
    csv:
      withHeader: false
      withLabel: false
    schema:
      type: vertex
      vertex:
        #vid:
        #  index: 1
        #  function: hash
        tags:
          - name: tag_mobile
            props:
              - name: mobile
                type: string

Run the CSV import

nebula-importer --config path/to/data.yaml

Log output

2020/09/01 18:19:14 [INFO] clientmgr.go:28: Create 4 Nebula Graph clients
2020/09/01 18:19:14 [INFO] reader.go:64: Start to read file(1): /root/testing_data/100/tag_card_info.csv, schema: < :VID,tag_bank_card.bank_card:string >
2020/09/01 18:19:14 [INFO] reader.go:64: Start to read file(6): /root/testing_data/100/tag_phone_info.csv, schema: < :VID,tag_mobile.mobile:string >
2020/09/01 18:19:14 [INFO] reader.go:64: Start to read file(4): /root/testing_data/100/edge_mobile.csv, schema: < :SRC_VID,:DST_VID,:RANK >
2020/09/01 18:19:14 [INFO] reader.go:64: Start to read file(3): /root/testing_data/100/edge_cert_no.csv, schema: < :SRC_VID,:DST_VID >
2020/09/01 18:19:14 [INFO] reader.go:64: Start to read file(0): /root/testing_data/100/edge_bank_card_no.csv, schema: < :SRC_VID,:DST_VID >
2020/09/01 18:19:14 [INFO] reader.go:64: Start to read file(2): /root/testing_data/100/tag_idcert_info.csv, schema: < :VID,tag_cert_no.cert_no:string >
2020/09/01 18:19:14 [INFO] reader.go:64: Start to read file(5): /root/testing_data/100/tag_order_info.csv, schema: < :VID,tag_order_info.cust_id:string,tag_order_info.product_type:string,tag_order_info.order_status:int,tag_order_info.create_time:timestamp >
2020/09/01 18:19:44 [INFO] reader.go:180: Total lines of file(/root/testing_data/100/tag_idcert_info.csv) is: 30, error lines: 0
2020/09/01 18:19:44 [INFO] statsmgr.go:61: Done(/root/testing_data/100/tag_idcert_info.csv): Time(30.09s), Finished(263), Failed(0), Latency AVG(986us), Batches Req AVG(1270us), Rows AVG(8.74/s)
2020/09/01 18:19:44 [INFO] reader.go:180: Total lines of file(/root/testing_data/100/tag_card_info.csv) is: 35, error lines: 0
2020/09/01 18:19:44 [INFO] statsmgr.go:61: Done(/root/testing_data/100/tag_card_info.csv): Time(30.09s), Finished(275), Failed(0), Latency AVG(975us), Batches Req AVG(1259us), Rows AVG(9.14/s)
2020/09/01 18:19:44 [INFO] reader.go:180: Total lines of file(/root/testing_data/100/tag_phone_info.csv) is: 59, error lines: 0
2020/09/01 18:19:44 [INFO] statsmgr.go:61: Done(/root/testing_data/100/tag_phone_info.csv): Time(30.10s), Finished(348), Failed(0), Latency AVG(962us), Batches Req AVG(1249us), Rows AVG(11.56/s)
2020/09/01 18:19:44 [INFO] reader.go:180: Total lines of file(/root/testing_data/100/edge_bank_card_no.csv) is: 100, error lines: 0
2020/09/01 18:19:44 [INFO] statsmgr.go:61: Done(/root/testing_data/100/edge_bank_card_no.csv): Time(30.13s), Finished(520), Failed(0), Latency AVG(939us), Batches Req AVG(1233us), Rows AVG(17.26/s)
2020/09/01 18:19:44 [INFO] reader.go:180: Total lines of file(/root/testing_data/100/edge_cert_no.csv) is: 100, error lines: 0
2020/09/01 18:19:44 [INFO] statsmgr.go:61: Done(/root/testing_data/100/edge_cert_no.csv): Time(30.14s), Finished(534), Failed(0), Latency AVG(938us), Batches Req AVG(1233us), Rows AVG(17.72/s)
2020/09/01 18:19:44 [INFO] reader.go:180: Total lines of file(/root/testing_data/100/tag_order_info.csv) is: 100, error lines: 0
2020/09/01 18:19:44 [INFO] statsmgr.go:61: Done(/root/testing_data/100/tag_order_info.csv): Time(30.14s), Finished(546), Failed(0), Latency AVG(935us), Batches Req AVG(1231us), Rows AVG(18.12/s)
2020/09/01 18:19:44 [INFO] reader.go:180: Total lines of file(/root/testing_data/100/edge_mobile.csv) is: 300, error lines: 0
2020/09/01 18:19:44 [INFO] statsmgr.go:61: Done(/root/testing_data/100/edge_mobile.csv): Time(30.17s), Finished(724), Failed(0), Latency AVG(931us), Batches Req AVG(1226us), Rows AVG(24.00/s)

fetch queries

fetch prop on tag_order_info 69;   (returns a record)
fetch prop on tag_order_info 70;   (no record)
fetch prop on tag_order_info 71;   (no record)

fetch prop on tag_order_info 77;   (no record)

Analysis

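The CSV files contain 724 lines in total, and the log reports Finished(724), Failed(0). In theory the Finished count should equal the number of CSV lines. In practice, in a test with 10 million orders, some of the data could not be found when queried afterwards, yet the importer log showed no errors, and the CSV files generated in the err folder are empty as well.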
[root@nebula233 err]# ll
total 4
-rw-r--r-- 1 root root 0 Sep 1 18:19 edge_bank_card_no.csv
-rw-r--r-- 1 root root 0 Sep 1 18:19 edge_cert_no.csv
-rw-r--r-- 1 root root 0 Sep 1 18:19 edge_mobile.csv
-rw-r--r-- 1 root root 0 Sep 1 18:19 tag_card_info.csv
-rw-r--r-- 1 root root 0 Sep 1 18:19 tag_idcert_info.csv
-rw-r--r-- 1 root root 0 Sep 1 18:19 tag_order_info.csv
-rw-r--r-- 1 root root 0 Sep 1 18:19 tag_phone_info.csv
-rw-r--r-- 1 root root 3461 Sep 1 18:19 test100.log
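
As a cross-check of the counts quoted in the analysis, the per-file totals and Finished values can be pulled out of the log with a short script. This is only a sketch: the log excerpt is copied from the output above with the directory prefix and trailing latency fields trimmed, and the regular expressions and the `summarize` helper are my own, not part of nebula-importer.

```python
import re

# Excerpt of the importer log above (paths and latency fields trimmed).
LOG = """\
reader.go:180: Total lines of file(tag_idcert_info.csv) is: 30, error lines: 0
statsmgr.go:61: Done(tag_idcert_info.csv): Time(30.09s), Finished(263), Failed(0)
reader.go:180: Total lines of file(tag_card_info.csv) is: 35, error lines: 0
statsmgr.go:61: Done(tag_card_info.csv): Time(30.09s), Finished(275), Failed(0)
reader.go:180: Total lines of file(tag_phone_info.csv) is: 59, error lines: 0
statsmgr.go:61: Done(tag_phone_info.csv): Time(30.10s), Finished(348), Failed(0)
reader.go:180: Total lines of file(edge_bank_card_no.csv) is: 100, error lines: 0
statsmgr.go:61: Done(edge_bank_card_no.csv): Time(30.13s), Finished(520), Failed(0)
reader.go:180: Total lines of file(edge_cert_no.csv) is: 100, error lines: 0
statsmgr.go:61: Done(edge_cert_no.csv): Time(30.14s), Finished(534), Failed(0)
reader.go:180: Total lines of file(tag_order_info.csv) is: 100, error lines: 0
statsmgr.go:61: Done(tag_order_info.csv): Time(30.14s), Finished(546), Failed(0)
reader.go:180: Total lines of file(edge_mobile.csv) is: 300, error lines: 0
statsmgr.go:61: Done(edge_mobile.csv): Time(30.17s), Finished(724), Failed(0)
"""

TOTAL_RE = re.compile(r"Total lines of file\((.+?)\) is: (\d+), error lines: (\d+)")
DONE_RE = re.compile(r"Done\((.+?)\): .*?Finished\((\d+)\), Failed\((\d+)\)")


def summarize(log: str):
    """Return ({file: total_lines}, {file: finished_value}) parsed from the log."""
    totals = {m.group(1): int(m.group(2)) for m in TOTAL_RE.finditer(log)}
    finished = {m.group(1): int(m.group(2)) for m in DONE_RE.finditer(log)}
    return totals, finished


totals, finished = summarize(LOG)
print("sum of per-file line counts:", sum(totals.values()))
print("largest Finished value:", max(finished.values()))
for name in totals:
    print(f"{name}: lines={totals[name]}, Finished={finished[name]}")
```

The per-file Finished values (263, 275, ... 724) are larger than the corresponding line counts, and only the last one equals the overall total, so they look like a running counter across all files rather than a per-file count. That reading is an inference from this one log, not something the importer documentation is quoted as saying here.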

Thank you very much for the detailed description!

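First, the values in the first column of your data file are integers, but they all start with 0. In nebula, an integer that starts with 0 is treated as octal, so looking it up with a decimal vid can fail to find the record. Please try querying with 00000070 and see what comes back.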
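
To make the octal reading concrete, here is a minimal Python sketch of the arithmetic only. It does not call Nebula, and `octal_vid_to_decimal` is a hypothetical helper name; it just shows which decimal number a zero-prefixed value would correspond to if it is parsed as base 8, as described above.

```python
def octal_vid_to_decimal(raw: str) -> int:
    """Interpret a zero-prefixed digit string as base 8 (hypothetical helper)."""
    return int(raw, 8)


# First-column values from the sample CSV that failed the decimal lookups above.
for raw in ["00000070", "00000071", "00000077"]:
    print(raw, "->", octal_vid_to_decimal(raw))
# 00000070 -> 56
# 00000071 -> 57
# 00000077 -> 63
```

If the explanation holds, the row whose first column is 00000070 would be stored under the number 56 rather than 70, which would be consistent with `fetch prop on tag_order_info 70` returning nothing.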

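This also triggers a bug in nebula 1.0: when octal data is processed, the digits are not checked for values outside the valid octal range (0 to 7), so an 8 or 9 is not rejected. This will be fixed in a later release.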
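
Until that is fixed, it may help to check the CSV files before importing. The sketch below is an assumption-laden illustration rather than anything nebula-importer provides: `classify_vid` and `strip_leading_zeros` are hypothetical helpers, and stripping the zeros is just one possible workaround, which would have to be applied to the matching edge files as well.

```python
def classify_vid(raw: str) -> str:
    """Classify a first-column value by how a leading zero would affect it."""
    if not raw.isdigit():
        return "not_integer"
    if len(raw) > 1 and raw.startswith("0"):
        # A leading zero means the octal reading applies; 8 and 9 are not octal digits.
        if any(ch in "89" for ch in raw):
            return "invalid_octal"
        return "octal"
    return "decimal"


def strip_leading_zeros(raw: str) -> str:
    """Possible workaround (assumption): drop leading zeros so the value reads as decimal."""
    return raw.lstrip("0") or "0"


# A few rows taken from the tag_order_info.csv sample in the question.
rows = [
    "00000068,USER00000051,PRDNO_02,4,1598924701",
    "00000069,USER00000047,PRDNO_00,5,1598924701",
    "00000070,USER00000038,PRDNO_03,1,1598924701",
    "00000077,USER00000023,PRDNO_00,2,1598924701",
]

for row in rows:
    vid = row.split(",")[0]
    print(vid, classify_vid(vid), strip_leading_zeros(vid))
```

Rows reported as `invalid_octal` are the ones that would hit the missing validation described above; rows reported as `octal` would import under a different number than the one written in the file.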

Thanks again for your feedback!

Thanks