nebula-importer fails, and the error file does not contain the failed rows

cy-dream · May 9, 2021, 14:11

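nebula version: 2.0 GA
Deployment: distributed
Requirement: inserting through my own program is slow, so I am looking into inserting with nebula-importer and have run into a few questions.
Question 1: When importing a csv file, can the delimiter be set to "|"?
Question 2: preStop: commands configures operations to run before disconnecting from the Nebula Graph server. Does it support creating an index and rebuilding it?
Question 3: Every file entry in the configuration sets a path for saving failed rows, but the failed rows do not show up in the error files.

The configuration file is below: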

# The Nebula Graph version to connect to; set to v2 when connecting to 2.x.
version: v2

description: example

# Whether to remove the temporary log and error data files that are generated.
removeTempFiles: false

clientSettings:

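  # Number of retries when an nGQL statement fails.
  retry: 3

  # Number of concurrent Nebula Graph clients.
  concurrency: 10

  # Buffer queue size for each Nebula Graph client.
  channelBufferSize: 128

  # The Nebula Graph graph space to import data into.
  space: student

  # Connection information.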
  connection:
    user: root
    password: nebula
    address: 10.0.6.21:9669,10.0.6.22:9669,10.0.6.23:9669


  postStart:
    # Operations to run after connecting to the Nebula Graph server and before inserting data.
    commands:
      DROP SPACE IF EXISTS student;
      CREATE SPACE IF NOT EXISTS student(partition_num=5, replica_factor=1, vid_type=FIXED_STRING(20));
      USE student;
      CREATE TAG student(name string, age int,gender string);
      CREATE EDGE follow(degree int);

    # Interval between running the commands above and starting to insert data.
    afterPeriod: 15s
  
  preStop:
    # Operations to run before disconnecting from the Nebula Graph server.
    commands:

# File path for error and other log output.
logPath: ./err/test.log

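# CSV file settings.
files:

    # Path of the data file. A relative path is joined with the directory of this configuration file. The first data file in this example contains vertex data.
  - path: ./student_with_header.csv

    # Path for storing data that failed to insert, so that it can be written again later.
    failDataPath: ./err/studenterr.csv

    # Number of statements inserted per batch.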
    batchSize: 10

    inOrder: true

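    # File type; only csv is currently supported.
    type: csv

    csv:
      # Whether the file has a header.
      withHeader: true

      # Whether the file has a LABEL.
      withLabel: false

      # Delimiter of the csv file. Only a single-character string delimiter is supported.
      delimiter: "|"

    schema:
      # Schema type; valid values are vertex and edge.
      type: vertex

    # The second data file in this example contains edge data.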
  - path: ./follow_with_header.csv
    failDataPath: ./err/followerr.csv
    batchSize: 10
    limit: 10
    inOrder: true
    type: csv
    csv:
      withHeader: true
      withLabel: false
      delimiter: "|"
    schema:
      # The schema type is edge.
      type: edge
      edge:
        # Edge type name.
        name: follow

        # Whether a rank is included.
        withRanking: true

This is the problem that appeared after replacing the ',' delimiter with '|'. Some fields in my data contain commas, so I switched to a different delimiter.

2021/05/09 10:36:37 --- START OF NEBULA IMPORTER ---
2021/05/09 10:36:37 [INFO] config.go:404: files[0].schema.vertex is nil
2021/05/09 10:36:37 [INFO] connection_pool.go:74: [nebula-clients] connection pool is initialized successfully
2021/05/09 10:36:37 [INFO] clientmgr.go:28: Create 30 Nebula Graph clients
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x524fdf]

goroutine 86 [running]:
github.com/vesoft-inc/nebula-importer/pkg/config.(*VID).String(0xc00000e660, 0x7e9314, 0x8, 0xc0000185c0, 0x10)
        /home/nebula-importer/pkg/config/config.go:437 +0x7f
github.com/vesoft-inc/nebula-importer/pkg/config.(*Edge).String(0xc000014c80, 0x4, 0xc0000185b4)
        /home/nebula-importer/pkg/config/config.go:576 +0x5a4
github.com/vesoft-inc/nebula-importer/pkg/config.(*Schema).String(0xc00000e5e0, 0x1e, 0xc0000932c0)
        /home/nebula-importer/pkg/config/config.go:367 +0xa4
github.com/vesoft-inc/nebula-importer/pkg/reader.(*FileReader).startLog(0xc000014c40)
        /home/nebula-importer/pkg/reader/reader.go:64 +0x83
github.com/vesoft-inc/nebula-importer/pkg/reader.(*FileReader).Read(0xc000014c40, 0x0, 0x0)
        /home/nebula-importer/pkg/reader/reader.go:156 +0x626
github.com/vesoft-inc/nebula-importer/pkg/cmd.(*Runner).Run.func2(0xc00006c900, 0xc00006c940, 0xc000014c40, 0xc0000932c0, 0x1e)
        /home/nebula-importer/pkg/cmd/runner.go:70 +0x40
created by github.com/vesoft-inc/nebula-importer/pkg/cmd.(*Runner).Run
        /home/nebula-importer/pkg/cmd/runner.go:69 +0x705

The rows that failed to import were not saved:

2021/05/08 11:48:20 [ERROR] reader.go:169: Fail to read file(/config/pathology_report.csv) line 18, error: parse error on line 18, column 121: extraneous or missing " in quoted-field
2021/05/08 11:48:20 [ERROR] reader.go:169: Fail to read file(/config/pathology_report.csv) line 29, error: parse error on line 29, column 187: bare " in non-quoted-field
2021/05/08 11:48:20 [ERROR] reader.go:169: Fail to read file(/config/pathology_report.csv) line 52, error: parse error on line 52, column 203: bare " in non-quoted-field
2021/05/08 11:48:20 [ERROR] reader.go:169: Fail to read file(/config/pathology_report.csv) line 54, error: parse error on line 54, column 176: bare " in non-quoted-field
2021/05/08 11:48:20 [ERROR] reader.go:169: Fail to read file(/config/pathology_report.csv) line 96, error: parse error on line 96, column 214: bare " in non-quoted-field
2021/05/08 11:48:20 [ERROR] reader.go:169: Fail to read file(/config/pathology_report.csv) line 113, error: parse error on line 113, column 130: extraneous or missing " in quoted-field
2021/05/08 11:48:20 [ERROR] reader.go:169: Fail to read file(/config/pathology_report.csv) line 114, error: parse error on line 114, column 119: extraneous or missing " in quoted-field
2021/05/08 11:48:20 [ERROR] reader.go:169: Fail to read file(/config/pathology_report.csv) line 116, error: parse error on line 116, column 126: extraneous or missing " in quoted-field
2021/05/08 11:48:20 [INFO] reader.go:180: Total lines of file(/config/r_visit_info.csv) is: 2641, error lines: 0

yee · May 9, 2021, 14:47

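Question 1:

Specifying a delimiter is supported, exactly as you configured it in your YAML.

Question 2:

Yes. preStop and postStart both simply send nGQL statements to nebula graph; the importer does not care what the statements do.

Question 3:

Your error messages say that the csv data file is malformed. You can check the lines named in the messages to confirm.

Saving failed rows means that if some INSERT statements fail, and still fail after the retries, the data involved in those statements is saved to the specified error file. This does not cover the case where a malformed data file makes the parser fail and the program crash. Because the parser failed, the data could not be read correctly, so there is nothing to save a copy of.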
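
Since parse failures never reach failDataPath, one workaround is to scan the csv file for quoting problems before running the importer. Below is a minimal Python sketch of such a pre-check. It is not the importer's own parser: `check_line` and `find_bad_lines` are hypothetical helper names, the rules are a simplified reading of the usual CSV quoting convention, and it assumes every record sits on a single physical line (no line breaks inside quoted fields).

```python
from typing import List, Optional, Tuple


def check_line(line: str, delimiter: str = "|") -> Optional[str]:
    """Return a description of the first quoting problem, or None if the line looks fine."""
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: scan to the closing quote, treating "" as an escaped quote.
            i += 1
            while True:
                j = line.find('"', i)
                if j == -1:
                    return "missing closing quote"
                if j + 1 < n and line[j + 1] == '"':
                    i = j + 2  # escaped quote, keep scanning
                    continue
                i = j + 1  # position right after the closing quote
                break
            if i < n and line[i] != delimiter:
                return f"unexpected character after closing quote at column {i + 1}"
        else:
            # Unquoted field: it must not contain any double quote.
            j = line.find(delimiter, i)
            end = n if j == -1 else j
            q = line.find('"', i, end)
            if q != -1:
                return f"bare quote in unquoted field at column {q + 1}"
            i = end
        if i >= n:
            return None
        i += 1  # skip the delimiter and continue with the next field


def find_bad_lines(text: str, delimiter: str = "|") -> List[Tuple[int, str]]:
    """Return (1-based line number, problem) for every line that fails the check."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        problem = check_line(line, delimiter)
        if problem is not None:
            problems.append((number, problem))
    return problems


if __name__ == "__main__":
    # Made-up sample: line 2 is valid, lines 3 and 4 are not.
    sample = 'id|note\n1|"said ""hi"""\n2|said "hi"\n3|"said \\"hi\\""\n'
    for number, problem in find_bad_lines(sample):
        print(f"line {number}: {problem}")
```

The lines it reports can be fixed or moved to a separate file by hand before the import, which gives roughly what failDataPath would have provided for parse errors.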

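KaliAlbert · May 10, 2021, 03:01

I ran into the same thing.
(Question 3: Every file entry in the configuration sets a path for saving failed rows, but the failed rows do not show up in the error files.)
How can this be solved so that the failed data is written to the specified file?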

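yee · May 10, 2021, 03:25

To confirm: is the csv file correctly formatted, but rows that fail to insert are still not written to the error file? Or is the csv file malformed, as in the case above?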

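cy-dream · May 10, 2021, 05:56

Fields that contain a double quote (") cannot be parsed, and adding an escape character before the double quotes did not help either.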

yee · May 10, 2021, 06:16

How did you escape the double quotes? In a CSV file, a double quote inside a field is escaped by writing two double quotes.
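
To illustrate the doubled-quote convention, here is a small Python sketch that writes "|"-delimited csv with the standard `csv` module, which quotes a field and doubles the quotes inside it whenever the field contains a double quote or the delimiter. The function name `to_pipe_csv` and the sample rows are made up for this example. The sketch only shows the file format; it does not call the importer.

```python
import csv
import io
from typing import Iterable, List


def to_pipe_csv(rows: Iterable[List[str]]) -> str:
    """Serialize rows as '|'-delimited CSV, doubling any embedded double quotes."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="|",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only the fields that need it
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample data: one field contains double quotes, another contains '|' and ','.
    rows = [
        ["vid", "name", "note"],
        ["s001", 'Li "Leo" Lei', "likes a|b, c"],
    ]
    print(to_pipe_csv(rows), end="")
```

In the output the name field is written as `"Li ""Leo"" Lei"`: the whole field is wrapped in double quotes and each inner quote is doubled. A backslash before the quote is not an escape in this format, which would explain why adding an escape character did not help.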

steam · November 15, 2021, 03:06

2 posts were split into a new topic: Importer 导入 csv 文件报错报错 (Importer reports an error when importing a csv file)