Exchange 2.0: import fails when using the officially provided template and data

  • Nebula version: 1.2.0
  • Deployment: single machine
  • Hardware
    • Disk: SSD 256
    • CPU and memory: 32G
  • How the affected space was created:
-- Create the graph space
CREATE SPACE csv(partition_num=10, replica_factor=1);
-- Switch to the graph space csv
USE csv;
-- Create the tag user
CREATE TAG user(userId int);
-- Create the tag course
CREATE TAG course(courseId int, courseName string);
-- Create the edge type action
CREATE EDGE action (actionId int, duration double, label bool, feature0 double, feature1 double, feature2 double, feature3 double);
20/12/16 16:01:30 INFO GraphProvider: switch space csv
20/12/16 16:01:30 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.RuntimeException: Switch Failed
	at com.vesoft.nebula.tools.importer.writer.NebulaGraphClientWriter.prepare(ServerBaseWriter.scala:135)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor.com$vesoft$nebula$tools$importer$processor$VerticesProcessor$$processEachPartition(VerticesProcessor.scala:67)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/12/16 16:01:30 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 4, localhost, executor driver, partition 1, ANY, 7767 bytes)
20/12/16 16:01:30 INFO Executor: Running task 1.0 in stage 3.0 (TID 4)
20/12/16 16:01:30 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.lang.RuntimeException: Switch Failed
	at com.vesoft.nebula.tools.importer.writer.NebulaGraphClientWriter.prepare(ServerBaseWriter.scala:135)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor.com$vesoft$nebula$tools$importer$processor$VerticesProcessor$$processEachPartition(VerticesProcessor.scala:67)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

20/12/16 16:01:30 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times; aborting job
20/12/16 16:01:30 INFO TaskSchedulerImpl: Cancelling stage 3
20/12/16 16:01:30 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
20/12/16 16:01:30 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
20/12/16 16:01:30 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage cancelled
20/12/16 16:01:30 INFO Executor: Executor is trying to kill task 1.0 in stage 3.0 (TID 4), reason: Stage cancelled
20/12/16 16:01:30 INFO TaskSchedulerImpl: Stage 3 was cancelled
20/12/16 16:01:30 INFO DAGScheduler: ResultStage 3 (foreachPartition at VerticesProcessor.scala:137) failed in 0.177 s due to Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.lang.RuntimeException: Switch Failed
	at com.vesoft.nebula.tools.importer.writer.NebulaGraphClientWriter.prepare(ServerBaseWriter.scala:135)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor.com$vesoft$nebula$tools$importer$processor$VerticesProcessor$$processEachPartition(VerticesProcessor.scala:67)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
20/12/16 16:01:30 INFO NebulaPool: Get connection to 192.168.10.188:3699
20/12/16 16:01:30 INFO GraphProvider: switch space csv
20/12/16 16:01:30 INFO DAGScheduler: Job 2 failed: foreachPartition at VerticesProcessor.scala:137, took 0.344655 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.lang.RuntimeException: Switch Failed
	at com.vesoft.nebula.tools.importer.writer.NebulaGraphClientWriter.prepare(ServerBaseWriter.scala:135)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor.com$vesoft$nebula$tools$importer$processor$VerticesProcessor$$processEachPartition(VerticesProcessor.scala:67)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1925)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1913)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1912)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1912)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:948)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:948)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:948)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2146)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2095)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2084)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:980)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:978)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
	at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:978)
	at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2741)
	at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2741)
	at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2741)
	at org.apache.spark.sql.Dataset$$anonfun$withNewRDDExecutionId$1.apply(Dataset.scala:3355)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
	at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3351)
	at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2740)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor.process(VerticesProcessor.scala:137)
	at com.vesoft.nebula.tools.importer.Exchange$$anonfun$main$2.apply(Exchange.scala:141)
	at com.vesoft.nebula.tools.importer.Exchange$$anonfun$main$2.apply(Exchange.scala:118)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at com.vesoft.nebula.tools.importer.Exchange$.main(Exchange.scala:118)
	at com.vesoft.nebula.tools.importer.Exchange.main(Exchange.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Switch Failed
	at com.vesoft.nebula.tools.importer.writer.NebulaGraphClientWriter.prepare(ServerBaseWriter.scala:135)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor.com$vesoft$nebula$tools$importer$processor$VerticesProcessor$$processEachPartition(VerticesProcessor.scala:67)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at com.vesoft.nebula.tools.importer.processor.VerticesProcessor$$anonfun$process$2.apply(VerticesProcessor.scala:137)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/12/16 16:01:30 INFO Executor: Executor interrupted and killed task 1.0 in stage 3.0 (TID 4), reason: Stage cancelled
20/12/16 16:01:30 WARN TaskSetManager: Lost task 1.0 in stage 3.0 (TID 4, localhost, executor driver): TaskKilled (Stage cancelled)
20/12/16 16:01:30 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 

Are you using Nebula Graph 1.2.0? Exchange 2.0 is meant for Nebula Graph 2.x.
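
A quick pre-flight check along these lines can catch the mismatch before the Spark job is submitted. This is only a minimal sketch: the function names are hypothetical and are not part of Exchange, and the "major versions must match" rule is a simplifying assumption drawn from this thread (Exchange 2.0 targets Nebula Graph 2.x, while Exchange 1.1 worked against 1.2.0).

```python
def major_version(version: str) -> int:
    """Return the leading numeric component of a version string such as '1.2.0' or 'v2.0'."""
    cleaned = version.strip().lstrip("vV")
    return int(cleaned.split(".")[0])


def is_compatible(exchange_version: str, nebula_version: str) -> bool:
    """Assumption: Exchange and Nebula Graph are compatible only when their major versions match."""
    return major_version(exchange_version) == major_version(nebula_version)


if __name__ == "__main__":
    # The combination reported in this thread: Exchange 2.0 against Nebula Graph 1.2.0.
    print(is_compatible("2.0", "1.2.0"))
    # The combination that worked in client mode: Exchange 1.1 against Nebula Graph 1.2.0.
    print(is_compatible("1.1", "1.2.0"))
```

Running such a check before `spark-submit` would flag the setup in this post right away, instead of surfacing later as `Switch Failed` inside a task.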

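The documentation does not state this clearly: https://github.com/vesoft-inc/nebula-spark-utils/tree/main/nebula-exchange
After switching to Exchange 1.1, the client mode works without problems, but the sst mode runs into a nebula_codec problem. See the thread "Exchange reads CSV and writes SST: nebula_codec error".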

Thanks :handshake: We will update the documentation.

As for the nebula_codec error, paging @nicole @darionyaphet

The sst mode has to be run in a Linux environment.
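
If it helps, the Linux requirement can be guarded in a small wrapper script before the job is launched. This is a rough sketch only: `sst_mode_supported` is a hypothetical helper, not an Exchange API, and it checks nothing beyond the operating system name.

```python
import platform


def sst_mode_supported(system_name: str = "") -> bool:
    """Return True when the given (or current) operating system is Linux.

    Assumption: the OS name is the only condition checked here; other
    requirements of the sst mode are out of scope for this sketch.
    """
    name = system_name or platform.system()
    return name == "Linux"


if __name__ == "__main__":
    if not sst_mode_supported():
        print("sst mode needs Linux; current system is", platform.system())
```

A check like this only rules out the wrong platform; an error that still occurs on Linux needs to be diagnosed separately.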

The operating system is Ubuntu 16.04. See the thread "Exchange reads CSV and writes SST: nebula_codec error".