nebula exchange: error when submitting via spark-submit on yarn

Question template:

  • Nebula version: 2.5.0
  • Deployment: distributed
  • Production environment: Y
  • Description of the problem
    The job completes successfully when submitted with spark-submit in local mode.
    Submitting with spark-submit in yarn mode always fails. CDH and Nebula are on the same network but on different machines, and the CDH client has already been deployed on the Nebula node. The log is as follows.
2021-10-19 17:07:18 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:19 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-10-19 17:07:19 INFO  RMProxy:133 - Connecting to ResourceManager at bj-azsyb-demo2.cloud.onecloud.io/10.172.7.214:8032
2021-10-19 17:07:19 INFO  Client:57 - Requesting a new application from cluster with 2 NodeManagers
2021-10-19 17:07:19 INFO  Configuration:2662 - resource-types.xml not found
2021-10-19 17:07:19 INFO  ResourceUtils:419 - Unable to find 'resource-types.xml'.
2021-10-19 17:07:19 INFO  Client:57 - Verifying our application has not requested more than the maximum memory capability of the cluster (21800 MB per container)
2021-10-19 17:07:19 INFO  Client:57 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2021-10-19 17:07:19 INFO  Client:57 - Setting up container launch context for our AM
2021-10-19 17:07:19 INFO  Client:57 - Setting up the launch environment for our AM container
2021-10-19 17:07:19 INFO  Client:57 - Preparing resources for our AM container
2021-10-19 17:07:19 WARN  Client:69 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2021-10-19 17:07:19 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:20 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:21 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:22 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:23 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:24 INFO  Client:57 - Uploading resource file:/data1/spark.local2/spark-80840954-daf8-472e-a1c0-7c984d7389de/__spark_libs__7266210689639360240.zip -> hdfs://bj-azsyb-demo2.cloud.onecloud.io:8020/user/root/.sparkStaging/application_1633660048269_0217/__spark_libs__7266210689639360240.zip
2021-10-19 17:07:24 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:25 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:26 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:27 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:28 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:29 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:30 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:31 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:32 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:33 INFO  Client:57 - Uploading resource file:/data1/spark.local2/spark-80840954-daf8-472e-a1c0-7c984d7389de/__spark_conf__3748966254314244853.zip -> hdfs://bj-azsyb-demo2.cloud.onecloud.io:8020/user/root/.sparkStaging/application_1633660048269_0217/__spark_conf__.zip
2021-10-19 17:07:33 INFO  SecurityManager:57 - Changing view acls to: root
2021-10-19 17:07:33 INFO  SecurityManager:57 - Changing modify acls to: root
2021-10-19 17:07:33 INFO  SecurityManager:57 - Changing view acls groups to: 
2021-10-19 17:07:33 INFO  SecurityManager:57 - Changing modify acls groups to: 
2021-10-19 17:07:33 INFO  SecurityManager:57 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2021-10-19 17:07:33 INFO  Client:57 - Submitting application application_1633660048269_0217 to ResourceManager
2021-10-19 17:07:33 INFO  YarnClientImpl:310 - Submitted application application_1633660048269_0217
2021-10-19 17:07:33 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:34 INFO  Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:34 INFO  Client:57 - 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.users.root
	 start time: 1634634453735
	 final status: UNDEFINED
	 tracking URL: http://bj-azsyb-demo2.cloud.onecloud.io:8088/proxy/application_1633660048269_0217/
	 user: root
2021-10-19 17:07:34 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:35 INFO  Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:35 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:36 INFO  Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:36 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:37 INFO  Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:37 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:38 INFO  Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:38 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:39 INFO  Client:57 - Application report for application_1633660048269_0217 (state: RUNNING)
2021-10-19 17:07:39 INFO  Client:57 - 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 10.172.7.205
	 ApplicationMaster RPC port: -1
	 queue: root.users.root
	 start time: 1634634453735
	 final status: UNDEFINED
	 tracking URL: http://bj-azsyb-demo2.cloud.onecloud.io:8088/proxy/application_1633660048269_0217/
	 user: root
2021-10-19 17:07:39 INFO  YarnClientSchedulerBackend:57 - Application application_1633660048269_0217 has started running.
2021-10-19 17:07:39 INFO  SchedulerExtensionServices:57 - Starting Yarn extension services with app application_1633660048269_0217 and attemptId None
2021-10-19 17:07:39 INFO  Utils:57 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 31921.
2021-10-19 17:07:39 INFO  NettyBlockTransferService:57 - Server created on bj-ansyb-crs-15.cloud.onecloud.io:31921
2021-10-19 17:07:39 INFO  BlockManager:57 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2021-10-19 17:07:39 INFO  BlockManagerMaster:57 - Registering BlockManager BlockManagerId(driver, bj-ansyb-crs-15.cloud.onecloud.io, 31921, None)
2021-10-19 17:07:39 INFO  BlockManagerMasterEndpoint:57 - Registering block manager bj-ansyb-crs-15.cloud.onecloud.io:31921 with 366.3 MB RAM, BlockManagerId(driver, bj-ansyb-crs-15.cloud.onecloud.io, 31921, None)
2021-10-19 17:07:39 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:39 INFO  BlockManagerMaster:57 - Registered BlockManager BlockManagerId(driver, bj-ansyb-crs-15.cloud.onecloud.io, 31921, None)
2021-10-19 17:07:39 INFO  BlockManager:57 - Initialized BlockManager: BlockManagerId(driver, bj-ansyb-crs-15.cloud.onecloud.io, 31921, None)
2021-10-19 17:07:39 INFO  YarnClientSchedulerBackend:57 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> bj-azsyb-demo2.cloud.onecloud.io, PROXY_URI_BASES -> http://bj-azsyb-demo2.cloud.onecloud.io:8088/proxy/application_1633660048269_0217), /proxy/application_1633660048269_0217
2021-10-19 17:07:40 INFO  JettyUtils:57 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
2021-10-19 17:07:40 INFO  JettyUtils:57 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
2021-10-19 17:07:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@45658133{/metrics/json,null,AVAILABLE,@Spark}
2021-10-19 17:07:40 INFO  YarnSchedulerBackend$YarnSchedulerEndpoint:57 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2021-10-19 17:07:40 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:41 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:42 INFO  SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:43 INFO  YarnSchedulerBackend$YarnDriverEndpoint:57 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.172.7.205:14554) with ID 2
2021-10-19 17:07:43 WARN  ScriptBasedMapping:254 - Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.172.7.205 
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/data1/ganliang/nebula_importer_demo"): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:939)
	at org.apache.hadoop.util.Shell.run(Shell.java:902)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
	at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
	at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
	at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
	at org.apache.spark.deploy.yarn.SparkRackResolver.coreResolve(SparkRackResolver.scala:72)
	at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:66)
	at org.apache.spark.scheduler.cluster.YarnScheduler.getRacksForHosts(YarnScheduler.scala:35)
	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:372)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$2.apply(CoarseGrainedSchedulerBackend.scala:263)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$2.apply(CoarseGrainedSchedulerBackend.scala:255)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:731)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:255)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receiveAndReply$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:226)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:105)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 22 more
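The repeated "Got an error when resolving hostNames" messages and the IOException above come from YARN's ScriptBasedMapping trying to run the CDH rack-topology script /etc/hadoop/conf.cloudera.yarn/topology.py, which does not exist on the submitting host. This is a warning, not the job failure itself; Spark falls back to /default-rack. A sketch of two possible fixes (assuming the script does exist on the cluster nodes; the hostname below is the one from the log):

```shell
# Option 1: copy the topology script from a cluster node to the submitting host.
scp bj-azsyb-demo2.cloud.onecloud.io:/etc/hadoop/conf.cloudera.yarn/topology.py \
    /etc/hadoop/conf.cloudera.yarn/topology.py
chmod +x /etc/hadoop/conf.cloudera.yarn/topology.py

# Option 2: remove (or empty) the net.topology.script.file.name property in the
# local client's core-site.xml, so Hadoop stops invoking the script and silently
# maps every host to /default-rack.
```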

spark-submit --master yarn --conf spark.local.dir=/data1/spark.local2/ --class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.5.1.jar -c person_demo2.applications -h
Changed to the following:
spark-submit --master yarn --deploy-mode=cluster --conf spark.local.dir=/data1/spark.local2/ --class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.5.1.jar -c person_demo2.applications -h

Do not specify spark.local.dir.
P.S.: --master yarn-cluster is equivalent to --master yarn --deploy-mode=cluster, and --master yarn-client is equivalent to --master yarn --deploy-mode=client.

The main causes:
1. Without --deploy-mode=cluster, the error above occurs.
2. The data volume is large, and /tmp filled up.
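Spark spills shuffle and scratch data to spark.local.dir (default /tmp), so a large import can fill a small /tmp. A sketch of redirecting the scratch space to a larger disk (the path /data1/spark-scratch is an example, not from the original post):

```shell
# Per-job: override the scratch directory on the command line.
spark-submit --master yarn --deploy-mode=cluster \
  --conf spark.local.dir=/data1/spark-scratch \
  --class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.5.1.jar \
  -c person_demo2.applications -h

# Cluster-wide alternative: set it once in spark-defaults.conf:
#   spark.local.dir  /data1/spark-scratch
# Note: on YARN, executor scratch space is controlled by the NodeManager's
# yarn.nodemanager.local-dirs, which overrides spark.local.dir on the nodes.
```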

After switching to yarn submission, the command was: spark-submit --master yarn --deploy-mode=cluster --class com.vesoft.nebula.exchange.Exchange /data1/ganliang/nebula_importer_demo/nebula-exchange-2.5.1.jar -c '/root/person_demo2.applications' -h

The log reported that the file does not exist, so I placed a copy of /root/person_demo2.applications on every node of the Hadoop cluster, including the Nebula client host, but the Spark job log still shows the following error.

2021-10-19 18:15:11 ERROR ApplicationMaster:94 - User class threw exception: java.lang.IllegalArgumentException: /root/person_demo2.applications not exist
java.lang.IllegalArgumentException: /root/person_demo2.applications not exist
	at com.vesoft.nebula.exchange.config.Configs$.parse(Configs.scala:225)
	at com.vesoft.nebula.exchange.Exchange$.main(Exchange.scala:71)
	at com.vesoft.nebula.exchange.Exchange.main(Exchange.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
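In cluster mode the driver runs inside the ApplicationMaster container with its own working directory, so a bare local path like /root/person_demo2.applications is resolved there, not on the submitting host or on the node's /root. A common pattern (a sketch, assuming the config file exists locally on the submitting host) is to ship the file with --files and reference it by basename:

```shell
# --files uploads the config into each container's working directory,
# so -c can reference it by basename from the driver's point of view.
spark-submit --master yarn --deploy-mode=cluster \
  --files /root/person_demo2.applications \
  --class com.vesoft.nebula.exchange.Exchange \
  /data1/ganliang/nebula_importer_demo/nebula-exchange-2.5.1.jar \
  -c person_demo2.applications -h
```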

Next I tried passing the config file as an HDFS path:

spark-submit --master  yarn --deploy-mode=cluster   --class com.vesoft.nebula.exchange.Exchange /data1/ganliang/nebula_importer_demo/nebula-exchange-2.5.1.jar   -c "hdfs:////bj-azsyb-demo2.cloud.onecloud.io:8020/root/person_demo2.applications"  -h

2021-10-19 18:18:38 ERROR ApplicationMaster:94 - User class threw exception: java.lang.IllegalArgumentException: hdfs:/bj-azsyb-demo2.cloud.onecloud.io:8020/root/person_demo2.applications not exist
java.lang.IllegalArgumentException: hdfs:/bj-azsyb-demo2.cloud.onecloud.io:8020/root/person_demo2.applications not exist
	at com.vesoft.nebula.exchange.config.Configs$.parse(Configs.scala:225)
	at com.vesoft.nebula.exchange.Exchange$.main(Exchange.scala:71)
	at com.vesoft.nebula.exchange.Exchange.main(Exchange.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
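Note how the error message shows the URI collapsed to hdfs:/bj-azsyb-demo2...: the extra slashes in hdfs://// leave the authority part empty, so the hostname is treated as the first path segment. A well-formed HDFS URI has exactly two slashes before the namenode host. A sketch of the corrected command (whether Exchange 2.5.1's -c accepts an HDFS URI directly is an assumption here; if it does not, the --files approach above is the fallback):

```shell
#   wrong: hdfs:////bj-azsyb-demo2.cloud.onecloud.io:8020/root/person_demo2.applications
#   right: hdfs://bj-azsyb-demo2.cloud.onecloud.io:8020/root/person_demo2.applications
spark-submit --master yarn --deploy-mode=cluster \
  --class com.vesoft.nebula.exchange.Exchange \
  /data1/ganliang/nebula_importer_demo/nebula-exchange-2.5.1.jar \
  -c "hdfs://bj-azsyb-demo2.cloud.onecloud.io:8020/root/person_demo2.applications" -h
```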
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)

Please follow the replies in your new thread, so we don't have answers in two places; closing this thread.