Bulk-deleting data with nebula importer

  • Version: nebula graph 2.0.1
  • Deployment method: standalone (single host)
  • Production environment: Y

nebula importer can bulk-import data, but how do I bulk-delete data? How should deletion be configured in the yaml file?

The importer only imports data; it cannot bulk-delete. (Update: see @HarrisChu's post below; it actually supports both delete and insert semantics, with insert as the default.)
Also note that data supports TTL, so you can set an expiration time and have it deleted automatically.
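For reference, TTL is configured on a tag roughly like this (a minimal sketch with made-up tag and property names; the TTL column has to be an int or timestamp property):

CREATE TAG IF NOT EXISTS t_log(created timestamp) TTL_DURATION = 86400, TTL_COL = "created";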

But I don't need automatic deletion; I need bulk deletion.

I need to bulk-delete data based on a csv file.

Is the only option to loop over the rows and delete them one by one with the Nebula Java Client?

You can also run a query statement with a pipe to do the deletion.

DELETE itself supports bulk/batch operations,
so you can use Java, for example, to build statements with 100 IDs each, which is much more efficient.

DELETE VERTEX <vid0>, <vid1>, ..., <vid100>;
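For example, a rough Java sketch of building such batched statements from a CSV of string vids (the file name, batch size, and quoting are assumptions here; actual execution would go through the nebula-java client's Session.execute):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.StringJoiner;

public class BatchDelete {
    public static void main(String[] args) throws Exception {
        // Assumption: vids.csv holds one vertex ID per line.
        List<String> vids = Files.readAllLines(Paths.get("vids.csv"));
        int batchSize = 100;
        for (int i = 0; i < vids.size(); i += batchSize) {
            // Build one statement per batch: DELETE VERTEX "v1", "v2", ...;
            StringJoiner stmt = new StringJoiner(", ", "DELETE VERTEX ", ";");
            for (String vid : vids.subList(i, Math.min(i + batchSize, vids.size()))) {
                stmt.add("\"" + vid + "\"");  // string vids must be quoted; drop the quotes for int vids
            }
            // In practice, run the statement via a nebula-java Session, e.g. session.execute(stmt.toString());
            System.out.println(stmt.toString());
        }
    }
}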

Bulk-deleting based on a csv via the importer seems like a fair feature request to me. Could you open a github issue on nebula-importer describing your scenario and requirements? It does run counter to the importer's semantics, but undoing the importer's actions based on the same data source also looks reasonable.

https://github.com/vesoft-inc/nebula-importer.git How do I file a feature request there?

Just go to the Issues tab and click New issue.

The importer does support deletion; it's just rarely used.
You can refer to:

https://github.com/vesoft-inc/nebula-importer#label-optional

https://github.com/vesoft-inc/nebula-importer/blob/master/examples/v2/example.yaml#L86


Thanks, @HarrisChu!

@liaochunlin, sorry, as @HarrisChu mentioned, the importer already supports delete syntax. With that, you only need to build the same CSV with a "-" label column and point the yaml at the index of that column.
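For example (a rough sketch with hypothetical values, following the header-mode layout of the example.yaml linked above), the delete CSV carries "-" in the :LABEL column, and the yaml enables csv.withLabel: true:

:LABEL,:VID(string),Industry.code:string
-,in3404040518,3404040518
-,in3404040517,3404040517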

I tried an import with a header row and got an error:
[root@test-oa-app cmd]# ./nebula-importer --config /home/fintech/nebula-importer/examples/v2/Industry.yaml
2021/07/23 14:01:33 --- START OF NEBULA IMPORTER ---
2021/07/23 14:01:33 [INFO] config.go:311: Invalid batch size in file(/home/fintech/nebula-importer/examples/v2/Industry.csv), reset to 128
2021/07/23 14:01:33 [INFO] config.go:404: files[0].schema.vertex is nil
2021/07/23 14:01:33 [INFO] connection_pool.go:77: [nebula-clients] connection pool is initialized successfully
2021/07/23 14:02:23 [INFO] clientmgr.go:28: Create 10 Nebula Graph clients
2021/07/23 14:02:23 [INFO] reader.go:26: The delimiter of /home/fintech/nebula-importer/examples/v2/Industry.csv is U+002C ','
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x531fbd]

goroutine 31 [running]:
github.com/vesoft-inc/nebula-importer/pkg/config.(*VID).String(0xc000488090, 0x7c6b56, 0x8, 0xc0004860f0, 0xc0004c0c58)
/home/fintech/nebula-importer/pkg/config/config.go:437 +0x9d
github.com/vesoft-inc/nebula-importer/pkg/config.(*Edge).String(0xc0004aa300, 0x4, 0xc000484120)
/home/fintech/nebula-importer/pkg/config/config.go:573 +0x725
github.com/vesoft-inc/nebula-importer/pkg/config.(*Schema).String(0xc000488078, 0x47, 0xc000024230)
/home/fintech/nebula-importer/pkg/config/config.go:367 +0x74
github.com/vesoft-inc/nebula-importer/pkg/reader.(*FileReader).startLog(0xc0004aa2c0)
/home/fintech/nebula-importer/pkg/reader/reader.go:64 +0x85
github.com/vesoft-inc/nebula-importer/pkg/reader.(*FileReader).Read(0xc0004aa2c0, 0x0, 0x0)
/home/fintech/nebula-importer/pkg/reader/reader.go:156 +0x613
github.com/vesoft-inc/nebula-importer/pkg/cmd.(*Runner).Run.func2(0xc000026840, 0xc000026880, 0xc0004aa2c0, 0xc000024230, 0x47)
/home/fintech/nebula-importer/pkg/cmd/runner.go:70 +0x45
created by github.com/vesoft-inc/nebula-importer/pkg/cmd.(*Runner).Run
/home/fintech/nebula-importer/pkg/cmd/runner.go:69 +0x6b6

If you don't mind, run head on /home/fintech/nebula-importer/examples/v2/Industry.csv so I can take a look at your data format.
Also please post your importer config file.

The Industry.yaml config is as follows:

version: v2
description: example
removeTempFiles: false
clientSettings:
retry: 3
concurrency: 10 # number of graph clients
channelBufferSize: 128
space: inc_local
connection:
user: user
password: password
address: 100.65.240.45:9669
postStart:
commands: |
CREATE SPACE IF NOT EXISTS inc_local(partition_num=100, replica_factor=1, vid_type=FIXED_STRING(300));
USE inc_local;
CREATE TAG IF NOT EXISTS Industry(code string,name string,parent string,level string,is_leaf string,remark string);
CREATE TAG INDEX IF NOT EXISTS industry_index on Industry(code(300));
CREATE EDGE IF NOT EXISTS IndustryMinceRelationship(type int DEFAULT 3,level string,relation_type string DEFAULT ‘细分’);
CREATE EDGE INDEX IF NOT EXISTS industryMinceRelationship on IndustryMinceRelationship();
afterPeriod: 50s
preStop:
commands: |
USE inc_local;
logPath: ./err/test.log
files:

  • path: ./Industry.csv
    failDataPath: ./err/Industryerr.csv
    batchsize: 10
    limit: 10000000000
    inOrder: true
    type: csv
    csv:
    withHeader: true
    withLabel: true
    delimiter: “,”
    schema:
    type: vertex
  • path: ./IndustryMinceRelationship.csv
    failDataPath: ./err/IndustryMinceRelationshiperr.csv
    batchSize: 10
    limit: 10000000000
    inOrder: true
    type: csv
    csv:
    withHeader: true
    withLabel: true
    schema:
    type: edge
    edge:
    name: IndustryMinceRelationship
    withRanking: false

/Industry.csv is as follows:

:LABEL,:VID(string),Industry.code:string,Industry.name:string,Industry.parent:string,Industry.level:string,Industry.is_leaf:string,Industry.remark:string
+,in3404040518,3404040518,氨糖软骨素加钙片,34040405,5,1,
+,in3404040517,3404040517,多维素片,34040405,5,1,

The data in IndustryMinceRelationship.csv is as follows:

:LABEL,:DST_VID(string),SRC_VID(string),IndustryMinceRelationship.level:string
+,in3404040518,in34040405,5
+,in3404040517,in34040405,5

I've posted it.

The content itself looks fine, but the formatting is messed up so I can't really tell. Could you edit the post again and format it with markdown?


version: v2
description: example
removeTempFiles: false
clientSettings:
  retry: 3
  concurrency: 10 # number of graph clients
  channelBufferSize: 128
  space: inc_local
  connection:
    user: user
    password: password
    address: 100.65.240.45:9669
  postStart:
    commands: |
      CREATE SPACE IF NOT EXISTS inc_local(partition_num=100, replica_factor=1, vid_type=FIXED_STRING(300));
      USE inc_local;
      CREATE TAG IF NOT EXISTS Industry(code string,name string,parent string,level string,is_leaf string,remark string);
      CREATE TAG INDEX IF NOT EXISTS industry_index on Industry(code(300));
      CREATE EDGE IF NOT EXISTS IndustryMinceRelationship(type int DEFAULT 3,level string,relation_type string DEFAULT '细分');
      CREATE EDGE INDEX IF NOT EXISTS industryMinceRelationship on IndustryMinceRelationship();
    afterPeriod: 50s
  preStop:
    commands: |
      USE inc_local;
logPath: ./err/test.log
files:
  - path: ./Industry.csv
    failDataPath: ./err/Industryerr.csv
    batchsize: 10
    limit: 10000000000
    inOrder: true
    type: csv
    csv:
       withHeader: true
       withLabel: true
       delimiter: ","
    schema:
      type: vertex
  - path: ./IndustryMinceRelationship.csv
    failDataPath: ./err/IndustryMinceRelationshiperr.csv
    batchSize: 10
    limit: 10000000000
    inOrder: true
    type: csv
    csv:
      withHeader: true
      withLabel: true
    schema:
      type: edge
      edge:
        name: IndustryMinceRelationship
        withRanking: false

/Industry.csv is as follows:

:LABEL,:VID(string),Industry.code:string,Industry.name:string,Industry.parent:string,Industry.level:string,Industry.is_leaf:string,Industry.remark:string
+,in3404040518,3404040518,氨糖软骨素加钙片,34040405,5,1,
+,in3404040517,3404040517,多维素片,34040405,5,1,

The data in IndustryMinceRelationship.csv is as follows:

:LABEL,:DST_VID(string),SRC_VID(string),IndustryMinceRelationship.level:string
+,in3404040518,in34040405,5
+,in3404040517,in34040405,5
