nebula-importer reports an error when importing CSV

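wey · July 20, 2021, 09:14

The "edge does not exist" error here means that the schema of this edge type does not exist. Most likely the required edge schema was not created, either by your postStart commands or beforehand in the space.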

wey · July 20, 2021, 09:15

  postStart:
    # Commands to run after connecting to the Nebula Graph server and before inserting data.
    commands: |
      DROP SPACE IF EXISTS inc_local;
      CREATE SPACE IF NOT EXISTS inc_local(partition_num=5, replica_factor=1, vid_type=FIXED_STRING(300));
      USE inc_local;
      CREATE TAG IF NOT EXISTS Industry(code string,name string,parent string,level string,is_leaf string,remark string);
      CREATE TAG IF NOT EXISTS IndustryMinceRelationship(type int DEFAULT 3,level string,relation_type string DEFAULT '细分'); # <------------ for example, this line was wrong: it should be CREATE EDGE..., not CREATE TAG
    afterPeriod: 15s
  preStop:
    # Commands to run before disconnecting from the Nebula Graph server.
    commands: USE inc_local; # put some random lines instead of the empty...
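
The fix flagged in the config comment above amounts to declaring IndustryMinceRelationship as an edge type instead of a tag. A minimal sketch of the corrected statement, reusing the property list from the quoted config unchanged:

```ngql
# Declare the relationship as an edge type, not a tag.
# Property names, types, and defaults are copied from the config quoted above.
CREATE EDGE IF NOT EXISTS IndustryMinceRelationship(type int DEFAULT 3, level string, relation_type string DEFAULT '细分');
```

This line would take the place of the last CREATE TAG statement inside postStart.commands.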

liaochunlin · July 20, 2021, 09:17

I have changed CREATE TAG IF NOT EXISTS IndustryMinceRelationship to CREATE EDGE IF NOT EXISTS IndustryMinceRelationship.

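wey · July 20, 2021, 09:18

Does nebula-importer create the tags first and only create the edges once the tags are done? What I want is for all tags to be created before any edge, rather than creating tags and edges at the same time, which is bound to produce a "does not exist" error.

Tags and edges do not depend on each other here, so yes, you can create the tags first and then the edges. The error you hit occurs only because the edge schema had not been created when the edges were inserted (your second file path defines an edge). Inserting data depends on the schema already existing.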

liaochunlin · July 20, 2021, 09:19

What I mean is: how do I configure the YAML so that all the tag data is inserted before the EDGE data?

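liaochunlin · July 20, 2021, 09:21

What I mean is: how do I configure the YAML so that all the tag data is inserted before the EDGE data? That way, when the edges are created, both srcVID and dstVID already exist as tagged vertices.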

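wey · July 20, 2021, 09:26

What I mean is: how do I configure the YAML so that all the tag data is inserted before the EDGE data? That way, when the edges are created, both srcVID and dstVID already exist as tagged vertices.

When an edge is created, the server does not require the srcVID and dstVID vertices to exist.
(Your error says that the schema of the whole edge type does not exist, not that srcVID or dstVID does not exist.)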
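
To illustrate the point above, here is a minimal sketch that inserts an edge between two vertices that were never inserted. It assumes the edge type has already been created with CREATE EDGE; the VIDs "A01" and "A" and the level value are hypothetical.

```ngql
USE inc_local;
# Neither "A01" nor "A" has been inserted as a vertex; the edge insert only needs the edge schema.
# The unlisted properties type and relation_type fall back to their DEFAULT values.
INSERT EDGE IndustryMinceRelationship(level) VALUES "A01"->"A":("2");
# Read the edge back to confirm it was stored.
FETCH PROP ON IndustryMinceRelationship "A01"->"A";
```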

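wey · July 20, 2021, 09:33

Could you run show edges; and show tags; while the import is running?
The error says that neither the tag nor the edge type exists; perhaps the schema has not been synchronized yet.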
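
The check suggested above could look like the following when run from a console or Studio session while the importer is working. This is only a sketch; the space name is taken from the config quoted earlier in the thread.

```ngql
USE inc_local;
# Industry should be listed here once the schema has been synchronized.
SHOW TAGS;
# IndustryMinceRelationship should be listed here once CREATE EDGE has taken effect.
SHOW EDGES;
```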

wey · July 20, 2021, 09:33

If the schema has not been synchronized yet, you can increase afterPeriod: 15s.

liaochunlin · July 20, 2021, 09:34

  afterPeriod: 15s
  preStop:
    # Commands to run before disconnecting from the Nebula Graph server.
    commands: USE inc_local; # put some random lines instead of the empty...

After adding this, I get the following output:

  [root@test-oa-app cmd]# ./nebula-importer --config /home/fintech/nebula-importer/examples/v2/Industry.yaml
2021/07/20 17:39:35 --- START OF NEBULA IMPORTER ---
2021/07/20 17:39:35 [INFO] connection_pool.go:77: [nebula-clients] connection pool is initialized successfully
2021/07/20 17:39:50 [INFO] clientmgr.go:28: Create 10 Nebula Graph clients
2021/07/20 17:39:50 [INFO] reader.go:64: Start to read file(1): /home/fintech/nebula-importer/examples/v2/Industry.csv, schema: < :DST_VID(string),:IGNORE,:SRC_VID(string),IndustryMinceRelationship.level:string >
2021/07/20 17:39:50 [INFO] reader.go:26: The delimiter of /home/fintech/nebula-importer/examples/v2/Industry.csv is U+002C ','
2021/07/20 17:39:50 [INFO] reader.go:64: Start to read file(0): /home/fintech/nebula-importer/examples/v2/Industry.csv, schema: < :VID(string)/Industry.code:string,Industry.name:string,Industry.parent:string,Industry.level:string,Industry.is_leaf:string,Industry.remark:string >
2021/07/20 17:39:50 [INFO] reader.go:180: Total lines of file(/home/fintech/nebula-importer/examples/v2/Industry.csv) is: 10, error lines: 0
2021/07/20 17:39:50 [INFO] reader.go:180: Total lines of file(/home/fintech/nebula-importer/examples/v2/Industry.csv) is: 10, error lines: 0
2021/07/20 17:39:50 [INFO] statsmgr.go:61: Done(/home/fintech/nebula-importer/examples/v2/Industry.csv): Time(15.02s), Finished(19), Failed(0), Latency AVG(1065us), Batches Req AVG(1509us), Rows AVG(1.27/s)
2021/07/20 17:39:50 [INFO] statsmgr.go:61: Done(/home/fintech/nebula-importer/examples/v2/Industry.csv): Time(15.02s), Finished(20), Failed(0), Latency AVG(1033us), Batches Req AVG(1462us), Rows AVG(1.33/s)
2021/07/20 17:39:50 Finish import data, consume time: 15.52s
2021/07/20 17:39:51 --- END OF NEBULA IMPORTER ---

However, running match (n:Industry) return count(n); in Nebula Graph Studio
reports SemanticError: Can't solve the start vids from the sentence: match (n:Industry) return count(n)

liaochunlin · July 20, 2021, 09:38

The tag and the edge have both been created, but there is no data.

liaochunlin · July 20, 2021, 09:40

The tag and the edge type have both been created, but the data was not inserted successfully.

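wey · July 20, 2021, 09:41

OK, the import succeeded.

The error occurs because MATCH relies on an index unless a VID is specified.

If you only want statistics, you can use show stats (note that a job has to be submitted first; see "Job management" in the NebulaGraph Database manual).

If you want to use MATCH (other than with where id(v) == "a_id_of_vertex"), you need to create an index; if the data has already been imported, you also need to rebuild the index.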
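
The two index-free options mentioned above can be sketched as follows. The VID "a_id_of_vertex" is the placeholder from the post, not a real vertex ID, and the statistics job may need a moment to finish before SHOW STATS reflects the imported data.

```ngql
USE inc_local;
# Submit the statistics job first; SHOW STATS reports the result of the latest finished job.
SUBMIT JOB STATS;
SHOW STATS;
# MATCH works without an index when the starting vertex is given by its VID.
MATCH (v) WHERE id(v) == "a_id_of_vertex" RETURN v;
```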

liaochunlin · July 20, 2021, 09:44

Can indexes be created in the YAML file?

liaochunlin · July 20, 2021, 09:45

Can indexes be created in the YAML file? I need to use MATCH.

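wey · July 20, 2021, 09:46

For a bulk import, it is best to create the index after the import and then rebuild it. You can also put the index-creation statements into postStart so that they run before the import, but that slows the import down.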

liaochunlin · July 20, 2021, 09:48

So you mean I should create the index in Nebula Graph Studio first, because putting it in postStart in the YAML would hurt import performance?

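wey · July 20, 2021, 09:54

You can create an index from Studio or with nGQL.
Once an index exists, inserting data is slower than without one.
If the data existed first and the index was created afterwards, the index has to be rebuilt. A rebuild does in one batch the work that makes inserts slower when an index is present (writing the index data).

Unless you need to look up vertices or edges by their properties, do not use an index (it has a cost). An index is only for finding vertices and edges by property; it is not used for filter conditions during traversal (those do not need an index).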
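
The create-then-rebuild flow described above might look like this for the Industry tag. This is a sketch under assumptions: the index name industry_name_index and the index length 64 are hypothetical, and the choice of the name property is only for illustration. Index creation is asynchronous, so running the statements one at a time with a short pause before the rebuild is the safer route.

```ngql
USE inc_local;
# Hypothetical index on the name property; 64 is an assumed index length for the string value.
CREATE TAG INDEX IF NOT EXISTS industry_name_index ON Industry(name(64));
# Rebuild so that vertices imported before the index existed get index entries.
REBUILD TAG INDEX industry_name_index;
# Check whether the rebuild job has finished.
SHOW TAG INDEX STATUS;
```

Once the status shows the rebuild as finished, the earlier query match (n:Industry) return count(n); should be able to find the imported vertices.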

liaochunlin · July 20, 2021, 10:02

After inserting the data, I added an index in Nebula Graph Studio, but match (n:Industry) return count(n) returns 0.

liaochunlin · July 20, 2021, 10:03

After inserting the data, I added an index in Nebula Graph Studio, but match (n:Industry) return count(n) returns 0. How do I write the rebuild command?