Hi, the official docs say that Exchange submission requires Spark to be installed. I'd like to ask: if I submit Spark jobs from a standalone server to my CDH cluster, does that server need Hadoop installed in addition to Spark?
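For context, submitting to YARN generally does not require a full Hadoop installation on the client; what `spark-submit` needs is the cluster's Hadoop client configuration files, with `HADOOP_CONF_DIR` pointing at them. A minimal sketch, assuming the CDH configs live under `/etc/hadoop/conf` on the cluster nodes (hostnames and paths here are placeholders, not from my actual setup):

```shell
# Copy the CDH client configs to the submission machine (placeholder host/paths):
mkdir -p /data/softs/hadoop-conf
scp cdh-master:/etc/hadoop/conf/core-site.xml  /data/softs/hadoop-conf/
scp cdh-master:/etc/hadoop/conf/hdfs-site.xml  /data/softs/hadoop-conf/
scp cdh-master:/etc/hadoop/conf/yarn-site.xml  /data/softs/hadoop-conf/

# Point Spark at the directory containing those XML files:
export HADOOP_CONF_DIR=/data/softs/hadoop-conf
```

The key point is that `yarn-site.xml` must contain the real ResourceManager address; otherwise the client falls back to the default `0.0.0.0:8032`.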
I have now installed Spark, but when I submit the job to CDH YARN, it fails with the following:
[root@ecs-5368-0007 /data/softs/spark-2.4.4-bin-hadoop2.7/bin]# ./spark-submit --master "yarn-cluster" --class com.vesoft.nebula.exchange.Exchange /data/softs/nebula-spark-utils-2.5/nebula-exchange/target/nebula-exchange-2.5.0.jar
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
21/08/22 18:17:32 WARN Utils: Your hostname, ecs-5368-0007 resolves to a loopback address: 127.0.0.1; using 192.168.1.118 instead (on interface eth0)
21/08/22 18:17:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/08/22 18:17:35 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/08/22 18:17:36 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/08/22 18:17:37 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/08/22 18:17:38 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/08/22 18:17:39 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/08/22 18:17:40 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/08/22 18:17:41 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
^C21/08/22 18:17:41 INFO ShutdownHookManager: Shutdown hook called
21/08/22 18:17:41 INFO ShutdownHookManager: Deleting directory /tmp/spark-62a41984-3f17-4cd4-afa6-aac5c5e85a6c
[root@ecs-5368-0007 /data/softs/spark-2.4.4-bin-hadoop2.7/bin]#
From the log, it looks like the submission to CDH YARN cannot find the YARN service. On this machine I have installed both Spark and Hadoop, and I added the Hadoop configuration to spark-env.sh as follows:
export JAVA_HOME=/usr/local/jdk1.8.0_151
#export SPARK_MASTER_IP=192.168.1.118
export SPARK_MASTER_IP=117.78.2.196 (CDH master node address)
export SPARK_MASTER_port=7077
export HADOOP_CONF_DIR=/data/softs/hadoop-3.0.0
The CDH cluster has 3 machines, and jobs are submitted from a separate machine. I'd like to know exactly how this submission machine should be configured so that Spark jobs I submit from it run on CDH YARN. Thanks!
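For comparison, here is a hedged sketch of what the spark-env.sh and submit command might look like for YARN mode. Note two likely issues in the config above: `HADOOP_CONF_DIR` points at the Hadoop install root rather than the directory containing `yarn-site.xml` (which is why the client falls back to `0.0.0.0:8032`), and `SPARK_MASTER_IP`/`SPARK_MASTER_port` only apply to Spark standalone mode, not YARN. The config path below is an assumption, not from the original post:

```shell
# spark-env.sh sketch for YARN submission; SPARK_MASTER_* settings removed
# because they are only read by Spark standalone mode.
export JAVA_HOME=/usr/local/jdk1.8.0_151
# Must be the directory that contains core-site.xml / hdfs-site.xml / yarn-site.xml,
# not the Hadoop installation root (assumed path):
export HADOOP_CONF_DIR=/data/softs/hadoop-conf

# Non-deprecated submit syntax ("--master yarn --deploy-mode cluster"):
./spark-submit --master yarn --deploy-mode cluster \
  --class com.vesoft.nebula.exchange.Exchange \
  /data/softs/nebula-spark-utils-2.5/nebula-exchange/target/nebula-exchange-2.5.0.jar
```

Exchange also normally takes a `-c <application.conf>` argument pointing at its job configuration; the original command omits it, so that may be a separate issue to check once the YARN connection works.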