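Nebula Importer fails when importing a CSV file

lhthsbshuai · August 10, 2022, 09:43

Nebula version: 3.2.0
Deployment: single node
How Nebula Importer is run: built from source

Importing a CSV file fails with the following error: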

2022/08/10 17:18:16 --- START OF NEBULA IMPORTER ---
2022/08/10 17:18:16 [INFO] config.go:393: Failed data path: err/companyerr.csv/company.csv
2022/08/10 17:18:16 [INFO] config.go:399: find file: /app/nebula/nebula-bench/nebula-importer/csv/company.csv
2022/08/10 17:18:36 [INFO] clientmgr.go:31: Create 10 Nebula Graph clients
2022/08/10 17:18:36 [INFO] reader.go:49: The delimiter of /app/nebula/nebula-bench/nebula-importer/csv/company.csv is U+002C ','
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x50e882]

CSV file format (three columns in total)

The YAML config file is as follows:

version: v3
description: example
removeTempFiles: false

clientSettings:
  retry: 3
  concurrency: 10 
  channelBufferSize: 128
  space: company_test
  
  connection:
    user: root
    password: nebula
    address: 127.0.0.1:9669
  
  postStart:
    commands: 
      CREATE SPACE IF NOT EXISTS company_test(vid_type = INT64);
      USE company_test;
      CREATE TAG IF NOT EXISTS student(COMPANY_ID string, COMPANY_NM string);
    afterPeriod: 20s

logPath: ./err/test.log

files:
  - path: /app/nebula/nebula-bench/nebula-importer/csv/company.csv
    failDataPath: ./err/companyerr.csv
    batchSize: 10
    limit: 10
    inOrder: true
    type: csv

    csv:
      withHeader: true
      withLabel: false
      delimiter: ","

    schema:
      type: vertex
      vertex:
        vid:
          index: 0
          type: int

        tag:
          - name: COMPANT
            props: 
              - name: COMPANT_ID
              - type: COMPANT_NM
              - index: 1

Aiee · August 10, 2022, 10:17

Which importer version are you using?

lhthsbshuai · August 11, 2022, 01:21

Nebula Importer v3.2.0

Aiee · August 11, 2022, 06:24

Is this the complete error? There should be a call stack as well.

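lhthsbshuai · August 11, 2022, 06:28

It looks like a config file problem. I ran the import through Studio, downloaded the config file it generated, made a few small changes, and the import succeeded.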

Reid00 · August 16, 2022, 11:23

What should the YAML format look like now? My earlier v2 YAML failed, and I ran into this problem too.

wey · August 16, 2022, 11:58

Look at the CREATE TAG statement earlier in the config: the tag is named student, not the name used in the schema (COMPANT), right?

wey · August 16, 2022, 11:59

It is probably a configuration error; the current error message is really unfriendly.
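
To make the suspected config problem concrete, here is a small Python sketch that checks a schema block like the one in the first post. It assumes, based only on the second config posted later in this thread, that each props entry is meant to be a single mapping carrying name, type and index together, and that the tag list lives under the key tags. The check_vertex_schema function and the hard-coded dictionary are hypothetical: the dictionary mirrors what a YAML parser would produce for the first post's schema, and none of this is part of Nebula Importer.

```python
# Hypothetical helper for illustration; not part of Nebula Importer.
# Assumption: every prop entry should carry these three keys in one mapping.
REQUIRED_PROP_KEYS = ("name", "type", "index")


def check_vertex_schema(vertex, declared_tags):
    """Return a list of human-readable problems found in a vertex schema mapping."""
    problems = []
    if "tags" not in vertex:
        problems.append("no 'tags' key found (keys present: %s)" % sorted(vertex))
    # Fall back to the singular key so the entries can still be inspected.
    for tag in vertex.get("tags") or vertex.get("tag") or []:
        name = tag.get("name")
        if name not in declared_tags:
            problems.append("tag %r is not created by postStart" % name)
        for i, prop in enumerate(tag.get("props") or []):
            missing = [k for k in REQUIRED_PROP_KEYS if k not in prop]
            if missing:
                problems.append(
                    "prop #%d of tag %r lacks %s" % (i, name, ", ".join(missing))
                )
    return problems


# What a YAML parser yields for the schema in the first post:
# three separate one-key list items instead of one prop mapping.
first_post_vertex = {
    "vid": {"index": 0, "type": "int"},
    "tag": [{
        "name": "COMPANT",
        "props": [{"name": "COMPANT_ID"}, {"type": "COMPANT_NM"}, {"index": 1}],
    }],
}

# The only tag created in postStart of the first post is "student".
for problem in check_vertex_schema(first_post_vertex, declared_tags={"student"}):
    print(problem)
```

Run against the first post's schema, the sketch flags the singular tag key, the tag name that does not match CREATE TAG, and all three props entries. Whether any of these is what actually triggers the nil pointer panic is not confirmed in this thread.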

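Reid00 · August 17, 2022, 02:01

The error is this one: Failed data path: %v
However, after I downgraded to importer v3.1.0 there was no error and the import ran normally.
I don't know whether anyone else has tested this successfully.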

wey · August 17, 2022, 02:13

@jievince need help here

jievince · August 17, 2022, 02:40

Importer 3.2 did not merge any new code to speak of by comparison.
Your error is probably caused by failDataPath: not being specified.

It is a required field.

Reid00 · August 17, 2022, 02:43

I did specify it: failDataPath: /mnt/csv_file/prod_relation/err/comapny_err.csv
The output is:

2022/08/17 10:31:43 [INFO] config.go:393: Failed data path: /mnt/csv_file/prod_relation/err/comapny_err.csv/company.csv
2022/08/17 10:31:43 [INFO] config.go:399: find file: /mnt/csv_file/prod_relation/company.csv
2022/08/17 10:31:43 [INFO] config.go:393: Failed data path: /mnt/csv_file/prod_relation/err/person_err.csv/person.csv
2022/08/17 10:31:43 [INFO] config.go:399: find file: /mnt/csv_file/prod_relation/person.csv
2022/08/17 10:31:43 [INFO] config.go:393: Failed data path: /mnt/csv_file/prod_relation/err/corporation_err.csv/corporation.csv

Also, with the same unmodified YAML, importer v3.1 imports the data normally.
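
The Failed data path lines in the logs above look like the configured failDataPath with the data file's base name appended. The Python sketch below reproduces the logged strings from the values given in this thread. It only illustrates that pattern: the logged_fail_path function is hypothetical and is a guess at, not a copy of, whatever the importer does at config.go:393. Note also that these lines are logged at [INFO] level.

```python
import posixpath


def logged_fail_path(fail_data_path, data_path):
    """Join failDataPath with the data file's base name, then normalize.

    Hypothetical reconstruction of the pattern seen in the logs.
    """
    joined = posixpath.join(fail_data_path, posixpath.basename(data_path))
    return posixpath.normpath(joined)


# Values from the first post.
print(logged_fail_path(
    "./err/companyerr.csv",
    "/app/nebula/nebula-bench/nebula-importer/csv/company.csv",
))

# Values from the second config.
print(logged_fail_path(
    "/mnt/csv_file/prod_relation/err/comapny_err.csv",
    "/mnt/csv_file/prod_relation/company.csv",
))
```

If this reading is right, the failDataPath value is being treated like a directory here, which would explain why a value ending in .csv shows up as a path component rather than as the file name.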

jievince · August 17, 2022, 02:46

Could you post the YAML file so I can reproduce this locally?

Reid00 · August 17, 2022, 02:48

I have copied the first part below:

# Graph version. Set to v2 when connecting to 2.x.
version: v2

description: Relation Space Create and import data

# Whether to delete the temporarily generated log and error-data files.
removeTempFiles: false

clientSettings:

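  # Number of retries when an nGQL statement fails.
  retry: 3

  # Number of concurrent Nebula Graph clients.
  concurrency: 5

  # Size of the buffer queue for each Nebula Graph client.
  channelBufferSize: 1024

  # Nebula Graph space to import the data into.
  space: Relation

  # Connection information.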
  connection:
    user: root
    password: nebula
    address: 10.0.7.89:9669,10.0.7.92:9669

  postStart:
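    # Operations to run after connecting to the Nebula Graph server and before inserting data.
    commands: |

    # Interval between running the commands above and running the data-insertion commands.
    afterPeriod: 1s

  preStop:
    # Operations to run before disconnecting from the Nebula Graph server.
    commands: |

# File path for error and other log output.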
logPath: /mnt/csv_file/prod_relation/err/test.log

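# CSV file settings.
files:

    # Path to the data file. A relative path is joined with the directory of this config file. The first data file in this example contains vertex data.
  - path: /mnt/csv_file/prod_relation/company.csv

    # Path where rows that failed to insert are stored, so they can be re-imported later.
    failDataPath: /mnt/csv_file/prod_relation/err/comapny_err.csv

    # Number of statements inserted per batch.
    batchSize: 100

    # Limit on the number of rows to read.
    # limit: 10

    # Whether to insert the rows in file order. Setting this to false avoids the import slowdown caused by data skew.
    inOrder: false

    # File type. Currently only csv is supported.
    type: csv

    csv:
      # Whether the file has a header row.
      withHeader: false

      # Whether the file has a LABEL.
      withLabel: false

      # Delimiter of the CSV file. Only single-character string delimiters are supported.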
      delimiter: "\t"

    schema:
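      # Schema type. Possible values are vertex and edge.
      type: vertex

      vertex:

        # Vertex ID settings.
        vid:
           # Index of the CSV column that holds the vertex ID. Column indexes start at 0.
           index: 0

           # Data type of the vertex ID. Possible values are int and string, corresponding to INT64 and FIXED_STRING in Nebula Graph.
           type: int

        # Tag settings.
        tags:
            # Tag name.
          - name: Company

            # Property settings for the tag.
            props:
                # Property name.
              - name: keyno

                # Property data type.
                type: string

                # Index of the CSV column that holds the property.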
                index: 1

              - name: name
                type: string
                index: 2
              - name: shortstatus
                type: string
                index: 3
              - name: econkind
                type: string
                index: 4
              - name: registcapi
                type: string
                index: 5
              - name: hasimage
                type: bool
                index: 6
              - name: companytype
                type: int
                index: 7
              - name: ismain
                type: string
                index: 8
              - name: isinvestor
                type: string
                index: 9
              - name: create_time
                type: int
                index: 10
              - name: credit_code
                type: string
                index: 11
              - name: isipo
                type: string
                index: 12

Reid00 · August 18, 2022, 01:58

Are you able to reproduce it?

wey · September 2, 2022, 21:38

TODO: let me try to reproduce it.

system · October 2, 2022, 21:39

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.