Data missing after importing from Neo4j into Nebula Graph

nebula :2.0.1
Exchange:2.0

Command executed:

./spark-2.4.8-bin-hadoop2.7/bin/spark-submit --class com.vesoft.nebula.exchange.Exchange --master local nebula-exchange-2.0-SNAPSHOT.jar -c  neo4j_to_nebula.conf 

The output was as follows:

21/06/24 11:25:44 INFO Executor: Finished task 2.0 in stage 75.0 (TID 410). 1111 bytes result sent to driver
21/06/24 11:25:44 INFO TaskSetManager: Starting task 3.0 in stage 75.0 (TID 411, localhost, executor driver, partition 3, PROCESS_LOCAL, 7816 bytes)
21/06/24 11:25:44 INFO Executor: Running task 3.0 in stage 75.0 (TID 411)
21/06/24 11:25:44 INFO TaskSetManager: Finished task 2.0 in stage 75.0 (TID 410) in 436 ms on localhost (executor driver) (3/10)
21/06/24 11:25:44 INFO NebulaPool: Get connection to 192.168.10.182:9669
21/06/24 11:25:44 INFO GraphProvider: switch space medical
21/06/24 11:25:44 INFO NebulaGraphClientWriter: Connection to List(192.168.10.182:9559)
21/06/24 11:25:44 INFO Driver: Direct driver instance 1622321871 created for server address 192.168.10.132:7687
21/06/24 11:25:44 INFO Driver: Closing driver instance 1622321871
21/06/24 11:25:44 INFO ConnectionPool: Closing connection pool towards 192.168.10.132:7687
21/06/24 11:25:44 INFO EdgeProcessor: spark partition for edge cost time:3-412
21/06/24 11:25:44 INFO Executor: Finished task 3.0 in stage 75.0 (TID 411). 1068 bytes result sent to driver
21/06/24 11:25:44 INFO TaskSetManager: Starting task 4.0 in stage 75.0 (TID 412, localhost, executor driver, partition 4, PROCESS_LOCAL, 7816 bytes)
21/06/24 11:25:44 INFO Executor: Running task 4.0 in stage 75.0 (TID 412)
21/06/24 11:25:44 INFO TaskSetManager: Finished task 3.0 in stage 75.0 (TID 411) in 437 ms on localhost (executor driver) (4/10)
21/06/24 11:25:44 INFO NebulaPool: Get connection to 192.168.10.182:9669
21/06/24 11:25:44 INFO GraphProvider: switch space medical
21/06/24 11:25:44 INFO NebulaGraphClientWriter: Connection to List(192.168.10.182:9559)
21/06/24 11:25:44 INFO Driver: Direct driver instance 1468464864 created for server address 192.168.10.132:7687
21/06/24 11:25:45 INFO Driver: Closing driver instance 1468464864
21/06/24 11:25:45 INFO ConnectionPool: Closing connection pool towards 192.168.10.132:7687
21/06/24 11:25:45 INFO EdgeProcessor: spark partition for edge cost time:4-394
21/06/24 11:25:45 INFO Executor: Finished task 4.0 in stage 75.0 (TID 412). 1068 bytes result sent to driver
21/06/24 11:25:45 INFO TaskSetManager: Starting task 5.0 in stage 75.0 (TID 413, localhost, executor driver, partition 5, PROCESS_LOCAL, 7816 bytes)
21/06/24 11:25:45 INFO Executor: Running task 5.0 in stage 75.0 (TID 413)
21/06/24 11:25:45 INFO TaskSetManager: Finished task 4.0 in stage 75.0 (TID 412) in 412 ms on localhost (executor driver) (5/10)
21/06/24 11:25:45 INFO NebulaPool: Get connection to 192.168.10.182:9669
21/06/24 11:25:45 INFO GraphProvider: switch space medical
21/06/24 11:25:45 INFO NebulaGraphClientWriter: Connection to List(192.168.10.182:9559)
21/06/24 11:25:45 INFO Driver: Direct driver instance 1582562034 created for server address 192.168.10.132:7687
21/06/24 11:25:45 INFO Driver: Closing driver instance 1582562034
21/06/24 11:25:45 INFO ConnectionPool: Closing connection pool towards 192.168.10.132:7687
21/06/24 11:25:45 INFO EdgeProcessor: spark partition for edge cost time:5-385
21/06/24 11:25:45 INFO Executor: Finished task 5.0 in stage 75.0 (TID 413). 1068 bytes result sent to driver
21/06/24 11:25:45 INFO TaskSetManager: Starting task 6.0 in stage 75.0 (TID 414, localhost, executor driver, partition 6, PROCESS_LOCAL, 7816 bytes)
21/06/24 11:25:45 INFO Executor: Running task 6.0 in stage 75.0 (TID 414)
21/06/24 11:25:45 INFO TaskSetManager: Finished task 5.0 in stage 75.0 (TID 413) in 398 ms on localhost (executor driver) (6/10)
21/06/24 11:25:45 INFO NebulaPool: Get connection to 192.168.10.182:9669
21/06/24 11:25:45 INFO GraphProvider: switch space medical
21/06/24 11:25:45 INFO NebulaGraphClientWriter: Connection to List(192.168.10.182:9559)
21/06/24 11:25:45 INFO Driver: Direct driver instance 1345203406 created for server address 192.168.10.132:7687
21/06/24 11:25:45 INFO Driver: Closing driver instance 1345203406
21/06/24 11:25:45 INFO ConnectionPool: Closing connection pool towards 192.168.10.132:7687
21/06/24 11:25:46 INFO EdgeProcessor: spark partition for edge cost time:6-400
21/06/24 11:25:46 INFO Executor: Finished task 6.0 in stage 75.0 (TID 414). 1111 bytes result sent to driver
21/06/24 11:25:46 INFO TaskSetManager: Starting task 7.0 in stage 75.0 (TID 415, localhost, executor driver, partition 7, PROCESS_LOCAL, 7816 bytes)
21/06/24 11:25:46 INFO Executor: Running task 7.0 in stage 75.0 (TID 415)
21/06/24 11:25:46 INFO TaskSetManager: Finished task 6.0 in stage 75.0 (TID 414) in 410 ms on localhost (executor driver) (7/10)
21/06/24 11:25:46 INFO NebulaPool: Get connection to 192.168.10.182:9669
21/06/24 11:25:46 INFO GraphProvider: switch space medical
21/06/24 11:25:46 INFO NebulaGraphClientWriter: Connection to List(192.168.10.182:9559)
21/06/24 11:25:46 INFO Driver: Direct driver instance 962494340 created for server address 192.168.10.132:7687
21/06/24 11:25:46 INFO Driver: Closing driver instance 962494340
21/06/24 11:25:46 INFO ConnectionPool: Closing connection pool towards 192.168.10.132:7687
21/06/24 11:25:46 INFO EdgeProcessor: spark partition for edge cost time:7-395
21/06/24 11:25:46 INFO Executor: Finished task 7.0 in stage 75.0 (TID 415). 1068 bytes result sent to driver
21/06/24 11:25:46 INFO TaskSetManager: Starting task 8.0 in stage 75.0 (TID 416, localhost, executor driver, partition 8, PROCESS_LOCAL, 7816 bytes)
21/06/24 11:25:46 INFO Executor: Running task 8.0 in stage 75.0 (TID 416)
21/06/24 11:25:46 INFO TaskSetManager: Finished task 7.0 in stage 75.0 (TID 415) in 403 ms on localhost (executor driver) (8/10)
21/06/24 11:25:46 INFO NebulaPool: Get connection to 192.168.10.182:9669
21/06/24 11:25:46 INFO GraphProvider: switch space medical
21/06/24 11:25:46 INFO NebulaGraphClientWriter: Connection to List(192.168.10.182:9559)
21/06/24 11:25:46 INFO Driver: Direct driver instance 1658746004 created for server address 192.168.10.132:7687
21/06/24 11:25:46 INFO Driver: Closing driver instance 1658746004
21/06/24 11:25:46 INFO ConnectionPool: Closing connection pool towards 192.168.10.132:7687
21/06/24 11:25:46 INFO EdgeProcessor: spark partition for edge cost time:8-418
21/06/24 11:25:46 INFO Executor: Finished task 8.0 in stage 75.0 (TID 416). 1068 bytes result sent to driver
21/06/24 11:25:46 INFO TaskSetManager: Starting task 9.0 in stage 75.0 (TID 417, localhost, executor driver, partition 9, PROCESS_LOCAL, 7816 bytes)
21/06/24 11:25:46 INFO Executor: Running task 9.0 in stage 75.0 (TID 417)
21/06/24 11:25:46 INFO TaskSetManager: Finished task 8.0 in stage 75.0 (TID 416) in 431 ms on localhost (executor driver) (9/10)
21/06/24 11:25:46 INFO NebulaPool: Get connection to 192.168.10.182:9669
21/06/24 11:25:46 INFO GraphProvider: switch space medical
21/06/24 11:25:46 INFO NebulaGraphClientWriter: Connection to List(192.168.10.182:9559)
21/06/24 11:25:46 INFO Driver: Direct driver instance 593867949 created for server address 192.168.10.132:7687
21/06/24 11:25:47 INFO Driver: Closing driver instance 593867949
21/06/24 11:25:47 INFO ConnectionPool: Closing connection pool towards 192.168.10.132:7687
21/06/24 11:25:47 INFO EdgeProcessor: spark partition for edge cost time:9-396
21/06/24 11:25:47 INFO Executor: Finished task 9.0 in stage 75.0 (TID 417). 1111 bytes result sent to driver
21/06/24 11:25:47 INFO TaskSetManager: Finished task 9.0 in stage 75.0 (TID 417) in 411 ms on localhost (executor driver) (10/10)
21/06/24 11:25:47 INFO TaskSchedulerImpl: Removed TaskSet 75.0, whose tasks have all completed, from pool 
21/06/24 11:25:47 INFO DAGScheduler: ResultStage 75 (foreachPartition at EdgeProcessor.scala:340) finished in 4.015 s
21/06/24 11:25:47 INFO DAGScheduler: Job 56 finished: foreachPartition at EdgeProcessor.scala:340, took 4.016270 s
21/06/24 11:25:47 INFO Exchange$: Client-Import: batchSuccess.need_check: 40
21/06/24 11:25:47 INFO Exchange$: Client-Import: batchFailure.need_check: 0
21/06/24 11:25:47 INFO SparkUI: Stopped Spark web UI at http://192.168.10.17:4040
21/06/24 11:25:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/06/24 11:25:47 INFO MemoryStore: MemoryStore cleared
21/06/24 11:25:47 INFO BlockManager: BlockManager stopped
21/06/24 11:25:47 INFO BlockManagerMaster: BlockManagerMaster stopped
21/06/24 11:25:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/06/24 11:25:47 INFO SparkContext: Successfully stopped SparkContext
21/06/24 11:25:47 INFO ShutdownHookManager: Shutdown hook called
21/06/24 11:25:47 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea1f5114-1fe1-4d21-bb31-a723cd4c8ffb
21/06/24 11:25:47 INFO ShutdownHookManager: Deleting directory /tmp/spark-1f9a3bb0-ff04-49eb-86eb-57c3997212b4

Running match (a:Disease) return count(*) in Nebula Studio returns 0.
Where did my imported data go?

Have you run this command in the console?

nebula :2.0.1
Exchange:2.0
Studio was downloaded from GitHub and started from the v2 directory (cd v2).

Studio has its own version number too. Look at the top-right corner of the UI; the version is shown there.

Here is the result in nebula console:

Welcome to Nebula Graph!

(admin@nebula) [(none)]> show spaces;
+-----------+
| Name      |
+-----------+
| "medical" |
+-----------+
Got 1 rows (time spent 1420/2123 us)

Thu, 24 Jun 2021 11:46:24 CST

(admin@nebula) [(none)]> use medical
Execution succeeded (time spent 1494/2165 us)

Thu, 24 Jun 2021 11:46:34 CST

(admin@nebula) [medical]> match (a:Disease) return count(*)
+----------+
| COUNT(*) |
+----------+
| 0        |
+----------+
Got 1 rows (time spent 2951/3701 us)

Thu, 24 Jun 2021 11:46:50 CST

Another question, about vertex IDs: if I use Neo4j's id(v) as the Nebula VID when importing from Neo4j, should the Nebula Graph vertex ID type be int or fixed_string? Which of the two is more suitable?

What you imported is need_check, but what you queried is Disease.


need_check is an edge, and its source vertex is Disease.

Did you create an index before using the match statement? The docs say that match requires an index.

Importing edges does not automatically add vertex data.

I did create a tag index for Disease.
The log is huge, so earlier I only pasted the last part. Here is the log for the Disease vertices:
21/06/24 11:23:52 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
21/06/24 11:23:52 INFO DAGScheduler: ResultStage 3 (foreachPartition at VerticesProcessor.scala:275) finished in 9.183 s
21/06/24 11:23:52 INFO DAGScheduler: Job 2 finished: foreachPartition at VerticesProcessor.scala:275, took 9.185154 s
21/06/24 11:23:52 INFO Exchange$: Client-Import: batchSuccess.Disease: 10
21/06/24 11:23:52 INFO Exchange$: Client-Import: batchFailure.Disease: 0
21/06/24 11:23:52 INFO Exchange$: Processing Tag Symptom
21/06/24 11:23:52 INFO Exchange$: field keys: name
21/06/24 11:23:52 INFO Exchange$: nebula keys: name
21/06/24 11:23:52 INFO Exchange$: Loading from neo4j config: Neo4J source address: bolt://192.168.10.132:7687, user: neo4j, password: 111111, encryption: false, checkPointPath: Some(/tmp/medical), exec: match (n:Symptom) with id(n) as _id, n.name as name return _id,name order by _id, parallel: 10, database: Some(neo4j)

So at least 10 records were imported successfully, yet the query shows no data at all.
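Since the full log is too large to read by eye, the Client-Import counters can be pulled out with a short script. This is only a sketch: the function name summarize_import and the sample text are made up for illustration, and the regex assumes the counter lines look exactly like the ones pasted in this thread.

```python
import re
from collections import defaultdict

# Matches Exchange summary lines such as
# "INFO Exchange$: Client-Import: batchSuccess.Disease: 10"
COUNTER_RE = re.compile(r"Client-Import: batch(Success|Failure)\.(\S+): (\d+)")


def summarize_import(log_text):
    """Return {name: {"Success": n, "Failure": m}} for each tag or edge in the log."""
    summary = defaultdict(lambda: {"Success": 0, "Failure": 0})
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            outcome, name, count = match.groups()
            summary[name][outcome] = int(count)
    return dict(summary)


# Sample lines copied from the logs in this thread.
sample = """\
21/06/24 11:23:52 INFO Exchange$: Client-Import: batchSuccess.Disease: 10
21/06/24 11:23:52 INFO Exchange$: Client-Import: batchFailure.Disease: 0
21/06/24 11:25:47 INFO Exchange$: Client-Import: batchSuccess.need_check: 40
21/06/24 11:25:47 INFO Exchange$: Client-Import: batchFailure.need_check: 0
"""

for name, counts in summarize_import(sample).items():
    print(name, counts)
```

Any tag or edge with a non-zero Failure counter is worth checking first; here both Disease and need_check report zero failures.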

Did you import only edges…

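Run submit job stats, then show jobs to see whether the job has finished.
If it has finished, run show stats and check whether the tag has any data. That will tell you whether the data was imported.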

Judging from the log line Client-Import: batchSuccess.Disease: 10, it is not only edges. There is also:
21/06/24 11:24:03 INFO Exchange$: Client-Import: batchSuccess.Department: 10
21/06/24 11:24:03 INFO Exchange$: Client-Import: batchFailure.Department: 0
21/06/24 11:24:03 INFO Exchange$: Processing Tag Check
21/06/24 11:24:03 INFO Exchange$: field keys: name
21/06/24 11:24:03 INFO Exchange$: nebula keys: name
The Department vertices were imported too.

Could you try what panda suggested above?

submit job stats: new job id 44
job id:44 status: finish
Following your suggestion I can see that the data is there, but show tag index status returns an empty result; the index I created has no status at all.

I ran:
(admin@nebula) [medical]> match (n:Disease) return n limit 1
Empty set (time spent 1950/2487 us)

Still no result.
But show stats reports a Disease count of 8808.

Rebuild the index, then run the match statement again.
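For reference, the rebuild-and-retry sequence might look like the statements printed by the sketch below. The helper index_repair_statements and the index name disease_index are hypothetical, since the thread never shows the real index name; replace it with whatever show tag indexes lists in your space. The script only prints the statements so they can be pasted into nebula console; it does not connect to Nebula.

```python
def index_repair_statements(index_name, tag):
    """Build the nGQL statements to rebuild a tag index and re-check the query.

    index_name is whatever name was used in CREATE TAG INDEX (hypothetical here).
    """
    return [
        f"REBUILD TAG INDEX {index_name};",    # reindex data written before or without the index
        "SHOW TAG INDEX STATUS;",              # repeat until the rebuild job is reported as finished
        f"MATCH (n:{tag}) RETURN n LIMIT 1;",  # the query that returned an empty set
    ]


# "disease_index" is an assumed name, not one taken from this thread.
for statement in index_repair_statements("disease_index", "Disease"):
    print(statement)
```

Wait for the rebuild to show up as finished in the index status before running the match again, otherwise the query can still come back empty.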