Jar submitted with spark-submit and run against HDFS fails with: Wrong FS: file://spark-warehouse, expected: file:///

Question template for reference:

  • Nebula version: (to save responders the time of verifying version info, first-time posters should show it as a screenshot)
  • Deployment: distributed

Error message:

21/12/30 20:20:51 WARN SparkConf: Setting 'spark.executor.extraClassPath' to '/export/servers/spark' as a work-around.
21/12/30 20:20:51 WARN SparkConf: Setting 'spark.driver.extraClassPath' to '/export/servers/spark' as a work-around.
21/12/30 20:20:51 INFO SecurityManager: Changing view acls to: root
21/12/30 20:20:51 INFO SecurityManager: Changing modify acls to: root
21/12/30 20:20:51 INFO SecurityManager: Changing view acls groups to: 
21/12/30 20:20:51 INFO SecurityManager: Changing modify acls groups to: 
21/12/30 20:20:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/12/30 20:20:51 INFO Utils: Successfully started service 'sparkDriver' on port 57442.
21/12/30 20:20:51 INFO SparkEnv: Registering MapOutputTracker
21/12/30 20:20:51 INFO SparkEnv: Registering BlockManagerMaster
21/12/30 20:20:51 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-33ddfa52-9c8d-4bbe-b7a5-83fbea770fbe
21/12/30 20:20:51 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
21/12/30 20:20:51 INFO SparkEnv: Registering OutputCommitCoordinator
21/12/30 20:20:52 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/12/30 20:20:52 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.226.128:4040
21/12/30 20:20:52 INFO SparkContext: Added JAR file:/shujuqixi.jar at spark://192.168.226.128:57442/jars/shujuqixi.jar with timestamp 1640924452057
21/12/30 20:20:52 INFO Executor: Starting executor ID driver on host localhost
21/12/30 20:20:52 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39372.
21/12/30 20:20:52 INFO NettyBlockTransferService: Server created on 192.168.226.128:39372
21/12/30 20:20:52 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.226.128, 39372)
21/12/30 20:20:52 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.226.128:39372 with 413.9 MB RAM, BlockManagerId(driver, 192.168.226.128, 39372)
21/12/30 20:20:52 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.226.128, 39372)
21/12/30 20:20:52 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
21/12/30 20:20:52 INFO SharedState: Warehouse path is 'file://spark-warehouse'.
21/12/30 20:20:55 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 413.7 MB)
21/12/30 20:20:55 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.6 KB, free 413.7 MB)
21/12/30 20:20:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.226.128:39372 (size: 20.6 KB, free: 413.9 MB)
21/12/30 20:20:55 INFO SparkContext: Created broadcast 0 from json at Test1.scala:14
21/12/30 20:20:55 INFO FileInputFormat: Total input paths to process : 1
21/12/30 20:20:55 INFO SparkContext: Starting job: json at Test1.scala:14
21/12/30 20:20:55 INFO DAGScheduler: Got job 0 (json at Test1.scala:14) with 1 output partitions
21/12/30 20:20:55 INFO DAGScheduler: Final stage: ResultStage 0 (json at Test1.scala:14)
21/12/30 20:20:55 INFO DAGScheduler: Parents of final stage: List()
21/12/30 20:20:55 INFO DAGScheduler: Missing parents: List()
21/12/30 20:20:55 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at json at Test1.scala:14), which has no missing parents
21/12/30 20:20:55 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.2 KB, free 413.7 MB)
21/12/30 20:20:55 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 413.7 MB)
21/12/30 20:20:55 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.226.128:39372 (size: 2.5 KB, free: 413.9 MB)
21/12/30 20:20:55 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1012
21/12/30 20:20:55 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at json at Test1.scala:14)
21/12/30 20:20:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
21/12/30 20:20:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, ANY, 5438 bytes)
21/12/30 20:20:55 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/12/30 20:20:55 INFO Executor: Fetching spark://192.168.226.128:57442/jars/shujuqixi.jar with timestamp 1640924452057
21/12/30 20:20:56 INFO TransportClientFactory: Successfully created connection to /192.168.226.128:57442 after 20 ms (0 ms spent in bootstraps)
21/12/30 20:20:56 INFO Utils: Fetching spark://192.168.226.128:57442/jars/shujuqixi.jar to /tmp/spark-e4e0cb9c-4d7c-4691-99e6-87b9083cc385/userFiles-fc08bce3-923e-4d7a-97c1-fb67c1dbfc60/fetchFileTemp187051986861525205.tmp
21/12/30 20:20:56 INFO Executor: Adding file:/tmp/spark-e4e0cb9c-4d7c-4691-99e6-87b9083cc385/userFiles-fc08bce3-923e-4d7a-97c1-fb67c1dbfc60/shujuqixi.jar to class loader
21/12/30 20:20:56 INFO HadoopRDD: Input split: hdfs://master:9000/platform_data/wm_5902.json:0+10715136
21/12/30 20:20:56 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
21/12/30 20:20:56 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
21/12/30 20:20:56 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
21/12/30 20:20:56 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
21/12/30 20:20:56 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
21/12/30 20:20:57 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 5419 bytes result sent to driver
21/12/30 20:20:57 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1402 ms on localhost (1/1)
21/12/30 20:20:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
21/12/30 20:20:57 INFO DAGScheduler: ResultStage 0 (json at Test1.scala:14) finished in 1.434 s
21/12/30 20:20:57 INFO DAGScheduler: Job 0 finished: json at Test1.scala:14, took 1.597588 s
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://spark-warehouse, expected: file:///
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
	at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
	at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:116)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
	at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
	at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
	at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
	at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
	at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
	at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:287)
	at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:249)
	at com.itcast.Test1$.main(Test1.scala:14)
	at com.itcast.Test1.main(Test1.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/12/30 20:20:58 INFO SparkContext: Invoking stop() from shutdown hook
21/12/30 20:20:58 INFO SparkUI: Stopped Spark web UI at http://192.168.226.128:4040
21/12/30 20:20:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/12/30 20:20:58 INFO MemoryStore: MemoryStore cleared
21/12/30 20:20:58 INFO BlockManager: BlockManager stopped
21/12/30 20:20:58 INFO BlockManagerMaster: BlockManagerMaster stopped
21/12/30 20:20:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/12/30 20:20:58 INFO SparkContext: Successfully stopped SparkContext
21/12/30 20:20:58 INFO ShutdownHookManager: Shutdown hook called
21/12/30 20:20:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-e4e0cb9c-4d7c-4691-99e6-87b9083cc385

Since this is a first post, please don't change the question template; also add the Nebula and Spark Connector version numbers.

Buddy, how about reading what the error message actually says:
Wrong FS: file://spark-warehouse, expected: file:///
It already tells you what is expected: there should be three slashes after "file", i.e. the file:// scheme followed by an absolute path.
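
A minimal sketch of the fix (not from this thread; the warehouse directory below is just a placeholder, and any writable local path works): set spark.sql.warehouse.dir to a well-formed file:/// URI when building the SparkSession. Early Spark 2.0.0 releases are known to default the warehouse dir to a malformed file:// URI, so setting it explicitly (or upgrading) works around that.

    import org.apache.spark.sql.SparkSession

    object Test1 {
      def main(args: Array[String]): Unit = {
        // "file:///tmp/spark-warehouse" = the "file://" scheme plus the
        // absolute path "/tmp/spark-warehouse", hence the three slashes.
        val spark = SparkSession.builder()
          .appName("Test1")
          .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")
          .getOrCreate()

        // Reading from HDFS is unaffected by the warehouse setting.
        val data1 = spark.read.json("hdfs://master:9000/wm.json")
        data1.show()

        spark.stop()
      }
    }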

val data1 = spark.read.json("hdfs://master:9000/wm.json") is my code for reading the HDFS file. This format is fine, right?

That line is fine. The error in your title is about the warehouse path being wrong.

But I don't think I ever set this path. Where is it supposed to be set, in the cluster config or in the code?
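
Either place works. In code, set it on the SparkSession builder as in the sketch above. To set it at submit time instead, pass it as a --conf flag (a sketch: the class name and jar path are taken from the log above, the warehouse directory is a placeholder):

    spark-submit \
      --class com.itcast.Test1 \
      --conf spark.sql.warehouse.dir=file:///tmp/spark-warehouse \
      /shujuqixi.jar

Adding the same key to conf/spark-defaults.conf makes it a cluster-wide default instead of a per-job setting.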

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.