Question template:
- Nebula version: 2.5.0
- Deployment: distributed
- In production: Y
- Problem description

The job runs to completion when submitted with `spark-submit` in local mode.

Submitting with `spark-submit` in yarn mode always runs into problems. The CDH cluster and Nebula are on the same network but on different machines, and the CDH client has already been deployed on the Nebula node. The log is as follows:
2021-10-19 17:07:18 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:19 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-10-19 17:07:19 INFO RMProxy:133 - Connecting to ResourceManager at bj-azsyb-demo2.cloud.onecloud.io/10.172.7.214:8032
2021-10-19 17:07:19 INFO Client:57 - Requesting a new application from cluster with 2 NodeManagers
2021-10-19 17:07:19 INFO Configuration:2662 - resource-types.xml not found
2021-10-19 17:07:19 INFO ResourceUtils:419 - Unable to find 'resource-types.xml'.
2021-10-19 17:07:19 INFO Client:57 - Verifying our application has not requested more than the maximum memory capability of the cluster (21800 MB per container)
2021-10-19 17:07:19 INFO Client:57 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2021-10-19 17:07:19 INFO Client:57 - Setting up container launch context for our AM
2021-10-19 17:07:19 INFO Client:57 - Setting up the launch environment for our AM container
2021-10-19 17:07:19 INFO Client:57 - Preparing resources for our AM container
2021-10-19 17:07:19 WARN Client:69 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2021-10-19 17:07:19 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:20 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:21 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:22 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:23 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:24 INFO Client:57 - Uploading resource file:/data1/spark.local2/spark-80840954-daf8-472e-a1c0-7c984d7389de/__spark_libs__7266210689639360240.zip -> hdfs://bj-azsyb-demo2.cloud.onecloud.io:8020/user/root/.sparkStaging/application_1633660048269_0217/__spark_libs__7266210689639360240.zip
2021-10-19 17:07:24 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:25 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:26 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:27 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:28 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:29 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:30 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:31 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:32 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:33 INFO Client:57 - Uploading resource file:/data1/spark.local2/spark-80840954-daf8-472e-a1c0-7c984d7389de/__spark_conf__3748966254314244853.zip -> hdfs://bj-azsyb-demo2.cloud.onecloud.io:8020/user/root/.sparkStaging/application_1633660048269_0217/__spark_conf__.zip
2021-10-19 17:07:33 INFO SecurityManager:57 - Changing view acls to: root
2021-10-19 17:07:33 INFO SecurityManager:57 - Changing modify acls to: root
2021-10-19 17:07:33 INFO SecurityManager:57 - Changing view acls groups to:
2021-10-19 17:07:33 INFO SecurityManager:57 - Changing modify acls groups to:
2021-10-19 17:07:33 INFO SecurityManager:57 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2021-10-19 17:07:33 INFO Client:57 - Submitting application application_1633660048269_0217 to ResourceManager
2021-10-19 17:07:33 INFO YarnClientImpl:310 - Submitted application application_1633660048269_0217
2021-10-19 17:07:33 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:34 INFO Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:34 INFO Client:57 -
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.root
start time: 1634634453735
final status: UNDEFINED
tracking URL: http://bj-azsyb-demo2.cloud.onecloud.io:8088/proxy/application_1633660048269_0217/
user: root
2021-10-19 17:07:34 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:35 INFO Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:35 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:36 INFO Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:36 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:37 INFO Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:37 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:38 INFO Client:57 - Application report for application_1633660048269_0217 (state: ACCEPTED)
2021-10-19 17:07:38 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:39 INFO Client:57 - Application report for application_1633660048269_0217 (state: RUNNING)
2021-10-19 17:07:39 INFO Client:57 -
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.172.7.205
ApplicationMaster RPC port: -1
queue: root.users.root
start time: 1634634453735
final status: UNDEFINED
tracking URL: http://bj-azsyb-demo2.cloud.onecloud.io:8088/proxy/application_1633660048269_0217/
user: root
2021-10-19 17:07:39 INFO YarnClientSchedulerBackend:57 - Application application_1633660048269_0217 has started running.
2021-10-19 17:07:39 INFO SchedulerExtensionServices:57 - Starting Yarn extension services with app application_1633660048269_0217 and attemptId None
2021-10-19 17:07:39 INFO Utils:57 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 31921.
2021-10-19 17:07:39 INFO NettyBlockTransferService:57 - Server created on bj-ansyb-crs-15.cloud.onecloud.io:31921
2021-10-19 17:07:39 INFO BlockManager:57 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2021-10-19 17:07:39 INFO BlockManagerMaster:57 - Registering BlockManager BlockManagerId(driver, bj-ansyb-crs-15.cloud.onecloud.io, 31921, None)
2021-10-19 17:07:39 INFO BlockManagerMasterEndpoint:57 - Registering block manager bj-ansyb-crs-15.cloud.onecloud.io:31921 with 366.3 MB RAM, BlockManagerId(driver, bj-ansyb-crs-15.cloud.onecloud.io, 31921, None)
2021-10-19 17:07:39 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:39 INFO BlockManagerMaster:57 - Registered BlockManager BlockManagerId(driver, bj-ansyb-crs-15.cloud.onecloud.io, 31921, None)
2021-10-19 17:07:39 INFO BlockManager:57 - Initialized BlockManager: BlockManagerId(driver, bj-ansyb-crs-15.cloud.onecloud.io, 31921, None)
2021-10-19 17:07:39 INFO YarnClientSchedulerBackend:57 - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> bj-azsyb-demo2.cloud.onecloud.io, PROXY_URI_BASES -> http://bj-azsyb-demo2.cloud.onecloud.io:8088/proxy/application_1633660048269_0217), /proxy/application_1633660048269_0217
2021-10-19 17:07:40 INFO JettyUtils:57 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
2021-10-19 17:07:40 INFO JettyUtils:57 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
2021-10-19 17:07:40 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@45658133{/metrics/json,null,AVAILABLE,@Spark}
2021-10-19 17:07:40 INFO YarnSchedulerBackend$YarnSchedulerEndpoint:57 - ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
2021-10-19 17:07:40 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:41 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:42 INFO SparkRackResolver:57 - Got an error when resolving hostNames. Falling back to /default-rack for all
2021-10-19 17:07:43 INFO YarnSchedulerBackend$YarnDriverEndpoint:57 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.172.7.205:14554) with ID 2
2021-10-19 17:07:43 WARN ScriptBasedMapping:254 - Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.172.7.205
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/data1/ganliang/nebula_importer_demo"): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:939)
at org.apache.hadoop.util.Shell.run(Shell.java:902)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.spark.deploy.yarn.SparkRackResolver.coreResolve(SparkRackResolver.scala:72)
at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:66)
at org.apache.spark.scheduler.cluster.YarnScheduler.getRacksForHosts(YarnScheduler.scala:35)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:372)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$2.apply(CoarseGrainedSchedulerBackend.scala:263)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$2.apply(CoarseGrainedSchedulerBackend.scala:255)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$$withLock(CoarseGrainedSchedulerBackend.scala:731)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:255)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receiveAndReply$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:226)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:105)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 22 more
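The repeated SparkRackResolver warnings and the final stack trace point at the same thing: the YARN client configuration copied to the submission host references a rack-topology script (`/etc/hadoop/conf.cloudera.yarn/topology.py`) that does not exist on that machine, so every rack lookup falls back to `/default-rack`. A minimal check-and-fix sketch for the submission host; this is an assumption about the cause, not a confirmed fix, and `cdh-gateway` is a placeholder host name:

```shell
#!/bin/sh
# Path taken verbatim from the exception in the log above.
script=/etc/hadoop/conf.cloudera.yarn/topology.py

if [ -f "$script" ]; then
    echo "topology script present: $script"
else
    echo "topology script missing: $script"
    # Copy it (and its lookup data, if any) from a CDH node, then make it
    # executable. "cdh-gateway" is a placeholder for a real CDH host:
    #   scp cdh-gateway:/etc/hadoop/conf.cloudera.yarn/topology.py  "$script"
    #   scp cdh-gateway:/etc/hadoop/conf.cloudera.yarn/topology.map /etc/hadoop/conf.cloudera.yarn/
    #   chmod +x "$script"
fi
```

Alternatively, `net.topology.script.file.name` in the client's core-site.xml can be pointed at a script that actually exists on this machine. Note that in the log shown the application still reaches RUNNING and registers an executor, so these warnings alone may not be what ultimately fails the job.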