- Nebula version: 2.5.0
- Deployment: distributed (CDH 6.3.2 cluster)
- Production deployment: N
- Hardware
  - Disk: 200 GB
  - CPU / memory: 4 cores, 16 GB RAM
- Problem description
The spark-submit command used:
#!/bin/bash
set -x
spark-submit \
  --master yarn \
  --files /data/nebula-datas/player-csv-sst.conf \
  --deploy-mode cluster \
  --class com.vesoft.nebula.exchange.Exchange \
  /data/softs/nebula-sst/nebula-exchange-2.5.0.jar \
  -c /data/nebula-datas/player-csv-sst.conf
From the error log, the job is submitted successfully, but when the Spark tasks execute, an object fails to serialize. We are using the player sample data from the official manual and followed the documented steps exactly, so I am not sure what might be wrong in the spark-submit command. Our environment runs in yarn-cluster mode.

There is a second issue as well: the job kept reporting that the configuration file does not exist. We run in yarn-cluster mode and pass the configuration file via --files at submit time, so why can the task still not find it when it executes? After I created the corresponding directory on every CDH node, copied the file into it, and again specified --files /data/nebula-datas/player-csv-sst.conf, the job does reach RUNNING, but it then fails with the serialization error described above.
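For reference, the serialization stack below says the closure passed to `Dataset.foreachPartition` in `VerticesProcessor.process` captures a `List` of Guava `com.google.common.net.HostAndPort` values (field `address$1`), and that class is not serializable on this classpath. The following is a minimal, self-contained Scala sketch of that general pattern; the names (`TaskNotSerializableSketch`, `Endpoint`) are hypothetical and this is not Exchange's actual code, just an illustration of how a non-serializable value captured in a Spark closure produces this exception and how such closures are usually restructured.

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.{Row, SparkSession}

// Stands in for a class that does NOT implement java.io.Serializable
// (the role com.google.common.net.HostAndPort plays in the stack trace below).
final class Endpoint(val host: String, val port: Int) {
  override def toString: String = s"$host:$port"
}

object TaskNotSerializableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("task-not-serializable-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(("player100", "Tim Duncan", "42")).toDF("_c0", "_c1", "_c2")

    // A driver-side value referenced inside foreachPartition becomes part of the
    // closure that Spark must serialize and ship to the executors.
    val address: List[Endpoint] = List(new Endpoint("192.168.1.118", 9559))

    try {
      df.foreachPartition { rows: Iterator[Row] =>
        // Referencing `address` forces it into the serialized closure.
        address.foreach(a => println(s"would connect to $a"))
        rows.foreach(_ => ())
      }
    } catch {
      // Fails on the driver before any task runs: "Task not serializable",
      // caused by java.io.NotSerializableException for Endpoint.
      case e: SparkException => println(s"expected failure: ${e.getMessage}")
    }

    // Common restructuring: capture only serializable data (plain strings) and
    // rebuild the non-serializable objects inside each partition.
    val addressStrings: List[String] = address.map(_.toString)
    df.foreachPartition { rows: Iterator[Row] =>
      val perPartition = addressStrings.map { s =>
        val Array(h, p) = s.split(":")
        new Endpoint(h, p.toInt)
      }
      perPartition.foreach(a => println(s"would connect to $a"))
      rows.foreach(_ => ())
    }

    spark.stop()
  }
}
```

The second `foreachPartition` in the sketch only shows the usual restructuring idea (ship strings, rebuild objects per partition); whether the fix here lies in Exchange itself or in the Guava version on the CDH classpath is exactly the open question of this report.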
- Relevant meta / storage / graph info logs (text form preferred so they can be searched):
21/09/13 14:04:33 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.cloudera.hive/hive-site.xml
21/09/13 14:04:33 INFO security.YARNHadoopDelegationTokenManager: Attempting to load user's ticket cache.
21/09/13 14:04:33 INFO yarn.Client: Submitting application application_1628391375886_0163 to ResourceManager
21/09/13 14:04:34 INFO impl.YarnClientImpl: Submitted application application_1628391375886_0163
21/09/13 14:04:35 INFO yarn.Client: Application report for application_1628391375886_0163 (state: ACCEPTED)
21/09/13 14:04:35 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.root
start time: 1631513073997
final status: UNDEFINED
tracking URL: http://CDH01:8088/proxy/application_1628391375886_0163/
user: root
21/09/13 14:04:36 INFO yarn.Client: Application report for application_1628391375886_0163 (state: ACCEPTED)
21/09/13 14:04:37 INFO yarn.Client: Application report for application_1628391375886_0163 (state: ACCEPTED)
21/09/13 14:04:38 INFO yarn.Client: Application report for application_1628391375886_0163 (state: ACCEPTED)
21/09/13 14:04:39 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:39 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: CDH02
ApplicationMaster RPC port: 35514
queue: root.users.root
start time: 1631513073997
final status: UNDEFINED
tracking URL: http://CDH01:8088/proxy/application_1628391375886_0163/
user: root
21/09/13 14:04:40 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:41 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:42 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:43 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:44 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:45 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:46 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:47 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:48 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:49 INFO yarn.Client: Application report for application_1628391375886_0163 (state: ACCEPTED)
21/09/13 14:04:49 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.root
start time: 1631513073997
final status: UNDEFINED
tracking URL: http://CDH01:8088/proxy/application_1628391375886_0163/
user: root
21/09/13 14:04:50 INFO yarn.Client: Application report for application_1628391375886_0163 (state: ACCEPTED)
21/09/13 14:04:51 INFO yarn.Client: Application report for application_1628391375886_0163 (state: ACCEPTED)
21/09/13 14:04:52 INFO yarn.Client: Application report for application_1628391375886_0163 (state: ACCEPTED)
21/09/13 14:04:53 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:53 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: CDH02
ApplicationMaster RPC port: 37239
queue: root.users.root
start time: 1631513073997
final status: UNDEFINED
tracking URL: http://CDH01:8088/proxy/application_1628391375886_0163/
user: root
21/09/13 14:04:54 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:55 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:56 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:57 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:58 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:04:59 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:05:00 INFO yarn.Client: Application report for application_1628391375886_0163 (state: RUNNING)
21/09/13 14:05:01 INFO yarn.Client: Application report for application_1628391375886_0163 (state: FINISHED)
21/09/13 14:05:01 INFO yarn.Client:
client token: N/A
diagnostics: User class threw exception: org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:393)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2346)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:850)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:849)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:849)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:618)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.DeserializeToObjectExec.doExecute(objects.scala:89)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3037)
at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3035)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2735)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2735)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2735)
at org.apache.spark.sql.Dataset$$anonfun$withNewRDDExecutionId$1.apply(Dataset.scala:3349)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3345)
at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2734)
at com.vesoft.nebula.exchange.processor.VerticesProcessor.process(VerticesProcessor.scala:179)
at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:153)
at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:128)
at scala.collection.immutable.List.foreach(List.scala:392)
at com.vesoft.nebula.exchange.Exchange$.main(Exchange.scala:128)
at com.vesoft.nebula.exchange.Exchange.main(Exchange.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: java.io.NotSerializableException: com.google.common.net.HostAndPort
Serialization stack:
- object not serializable (class: com.google.common.net.HostAndPort, value: 192.168.1.118:9559)
- writeObject data (class: scala.collection.immutable.List$SerializationProxy)
- object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@7e86dfd0)
- writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
- object (class scala.collection.immutable.$colon$colon, List(192.168.1.118:9559))
- field (class: com.vesoft.nebula.exchange.processor.VerticesProcessor$$anonfun$process$1, name: address$1, type: class scala.collection.immutable.List)
- object (class com.vesoft.nebula.exchange.processor.VerticesProcessor$$anonfun$process$1, <function1>)
- field (class: org.apache.spark.sql.execution.MapPartitionsExec, name: func, type: interface scala.Function1)
- object (class org.apache.spark.sql.execution.MapPartitionsExec, MapPartitions <function1>, obj#32: scala.Tuple2
+- DeserializeToObject createexternalrow(_c0#10.toString, _c1#11.toString, _c2#12.toString, StructField(_c0,StringType,true), StructField(_c1,StringType,true), StructField(_c2,StringType,true)), obj#31: org.apache.spark.sql.Row
+- Exchange RoundRobinPartitioning(32)
+- *(1) FileScan csv [_c0#10,_c1#11,_c2#12] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://192.168.1.115:8020/data/nebula/vertex_player.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string>
)
- field (class: org.apache.spark.sql.execution.InputAdapter, name: child, type: class org.apache.spark.sql.execution.SparkPlan)
- object (class org.apache.spark.sql.execution.InputAdapter, MapPartitions <function1>, obj#32: scala.Tuple2
+- DeserializeToObject createexternalrow(_c0#10.toString, _c1#11.toString, _c2#12.toString, StructField(_c0,StringType,true), StructField(_c1,StringType,true), StructField(_c2,StringType,true)), obj#31: org.apache.spark.sql.Row
+- Exchange RoundRobinPartitioning(32)
+- *(1) FileScan csv [_c0#10,_c1#11,_c2#12] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://192.168.1.115:8020/data/nebula/vertex_player.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string>
)
- field (class: org.apache.spark.sql.execution.SerializeFromObjectExec, name: child, type: class org.apache.spark.sql.execution.SparkPlan)
- object (class org.apache.spark.sql.execution.SerializeFromObjectExec, SerializeFromObject [input[0, scala.Tuple2, true]._1 AS value#27 AS _1#33, input[0, scala.Tuple2, true]._2 AS value#28 AS _2#34]
+- MapPartitions <function1>, obj#32: scala.Tuple2
+- DeserializeToObject createexternalrow(_c0#10.toString, _c1#11.toString, _c2#12.toString, StructField(_c0,StringType,true), StructField(_c1,StringType,true), StructField(_c2,StringType,true)), obj#31: org.apache.spark.sql.Row
+- Exchange RoundRobinPartitioning(32)
+- *(1) FileScan csv [_c0#10,_c1#11,_c2#12] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://192.168.1.115:8020/data/nebula/vertex_player.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string>
)
- field (class: org.apache.spark.sql.execution.ProjectExec, name: child, type: class org.apache.spark.sql.execution.SparkPlan)
- object (class org.apache.spark.sql.execution.ProjectExec, Project [_1#33 AS key#35, _2#34 AS value#36]
+- SerializeFromObject [input[0, scala.Tuple2, true]._1 AS value#27 AS _1#33, input[0, scala.Tuple2, true]._2 AS value#28 AS _2#34]
+- MapPartitions <function1>, obj#32: scala.Tuple2
+- DeserializeToObject createexternalrow(_c0#10.toString, _c1#11.toString, _c2#12.toString, StructField(_c0,StringType,true), StructField(_c1,StringType,true), StructField(_c2,StringType,true)), obj#31: org.apache.spark.sql.Row
+- Exchange RoundRobinPartitioning(32)
+- *(1) FileScan csv [_c0#10,_c1#11,_c2#12] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://192.168.1.115:8020/data/nebula/vertex_player.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string>
)
- field (class: org.apache.spark.sql.execution.SortExec, name: child, type: class org.apache.spark.sql.execution.SparkPlan)
- object (class org.apache.spark.sql.execution.SortExec, Sort [key#35 ASC NULLS FIRST], false, 0
+- Project [_1#33 AS key#35, _2#34 AS value#36]
+- SerializeFromObject [input[0, scala.Tuple2, true]._1 AS value#27 AS _1#33, input[0, scala.Tuple2, true]._2 AS value#28 AS _2#34]
+- MapPartitions <function1>, obj#32: scala.Tuple2
+- DeserializeToObject createexternalrow(_c0#10.toString, _c1#11.toString, _c2#12.toString, StructField(_c0,StringType,true), StructField(_c1,StringType,true), StructField(_c2,StringType,true)), obj#31: org.apache.spark.sql.Row
+- Exchange RoundRobinPartitioning(32)
+- *(1) FileScan csv [_c0#10,_c1#11,_c2#12] Batched: false, Format: CSV, Location: InMemoryFileIndex[hdfs://192.168.1.115:8020/data/nebula/vertex_player.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string>
)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 4)
- field (class: org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11, name: references$1, type: class [Ljava.lang.Object;)
- object (class org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11, <function2>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
... 47 more
ApplicationMaster host: CDH02
ApplicationMaster RPC port: 37239
queue: root.users.root
start time: 1631513073997
final status: FAILED
tracking URL: http://CDH01:8088/proxy/application_1628391375886_0163/
user: root
21/09/13 14:05:01 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Task not serializable
	... (same "Task not serializable" stack trace as in the diagnostics above, ending at the same "Caused by: java.io.NotSerializableException: com.google.common.net.HostAndPort" line)