Nebula Exchange error when exporting Hive data

Question template:

  • NebulaGraph version: 3.4.0

  • Deployment: standalone

  • Installation method: Docker

  • In production: N

  • Hardware info:

  • Trying to use Nebula Exchange to export Hive table data to SST files. Hive runs on a remote server; Spark runs locally.

  • Local Spark version: spark-3.3.2-bin-hadoop3

  • JDBC driver: mysql-connector-java-8.0.30

  • Nebula Exchange version: nebula-exchange_spark_3.0-3.4.0.jar

Hive configuration in application.conf:

   hive: {
     warehouse: "hdfs://xxx.xxx.xxx.xxx:8020/hive/warehouse/"
     connectionURL: "jdbc:hive2://xxx.xxx.xxx.xxx:10000/c4"
     connectionDriverName: "com.mysql.jdbc.Driver"
     connectionUserName: "hive"
     connectionPassword: ""
   }
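
Worth noting: connectionURL above points at HiveServer2 (jdbc:hive2://) while connectionDriverName names the MySQL driver. In the Exchange docs' Hive example, these settings describe the RDBMS backing the Hive metastore rather than HiveServer2, so a MySQL-backed metastore would be configured roughly like the sketch below (host, database name, and credentials are placeholder assumptions):

   hive: {
     warehouse: "hdfs://xxx.xxx.xxx.xxx:8020/hive/warehouse/"
     # assumption: the metastore is backed by a MySQL database named "hive" on the default port 3306
     connectionURL: "jdbc:mysql://xxx.xxx.xxx.xxx:3306/hive?characterEncoding=UTF-8"
     # the log below notes that com.mysql.jdbc.Driver is deprecated for Connector/J 8.x
     connectionDriverName: "com.mysql.cj.jdbc.Driver"
     connectionUserName: "hive"
     connectionPassword: "password"
   }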

Command-line run and output:

~/bin/spark-3.3.2-bin-hadoop3/bin/spark-submit --master "local" \
  --class com.vesoft.nebula.exchange.Exchange \
  nebula-exchange_spark_3.0-3.4.0.jar -c application.conf -h

23/03/30 15:22:41 WARN Utils: Your hostname, MacBook-Air.local resolves to a loopback address: 127.0.0.1; using 192.168.0.109 instead (on interface en0)
23/03/30 15:22:41 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
23/03/30 15:22:41 INFO Configs$: DataBase Config com.vesoft.exchange.common.config.DataBaseConfigEntry@50961e34
23/03/30 15:22:41 INFO Configs$: User Config com.vesoft.exchange.common.config.UserConfigEntry@3161b833
23/03/30 15:22:41 INFO Configs$: Connection Config Some(Config(SimpleConfigObject({"timeout":30000})))
23/03/30 15:22:41 INFO Configs$: Execution Config com.vesoft.exchange.common.config.ExecutionConfigEntry@c757796a
23/03/30 15:22:41 INFO Configs$: Source Config Hive source exec: select gid, name, source_flag from c4.graph_node_company
23/03/30 15:22:41 INFO Configs$: Sink Config Hive source exec: select gid, name, source_flag from c4.graph_node_company
23/03/30 15:22:41 INFO Configs$: name company  batch 2000
23/03/30 15:22:41 INFO Configs$: Tag Config: Tag name: company, source: Hive source exec: select gid, name, source_flag from c4.graph_node_company, sink: File sink: from /tmp to hdfs://name_node:9000/sst, vertex field: gid, vertex policy: None, batch: 2000, partition: 60, repartitionWithNebula: true, enableTagless: false.
23/03/30 15:22:41 INFO Configs$: Source Config Hive source exec: select gid, parentgid, r from c4.graph_edge_branch
23/03/30 15:22:41 INFO Configs$: Sink Config Hive source exec: select gid, parentgid, r from c4.graph_edge_branch
23/03/30 15:22:41 INFO Configs$: Edge Config: Edge name: branch, source: Hive source exec: select gid, parentgid, r from c4.graph_edge_branch, sink: File sink: from /tmp to hdfs://name_node:9000/sst, source field: gid, source policy: None, ranking: None, target field: parentgid, target policy: None, batch: 2000, partition: 60.
23/03/30 15:22:41 INFO Exchange$: Config Configs(com.vesoft.exchange.common.config.DataBaseConfigEntry@50961e34,com.vesoft.exchange.common.config.UserConfigEntry@3161b833,com.vesoft.exchange.common.config.ConnectionConfigEntry@5c12072f,com.vesoft.exchange.common.config.ExecutionConfigEntry@c757796a,com.vesoft.exchange.common.config.ErrorConfigEntry@55508fa6,com.vesoft.exchange.common.config.RateConfigEntry@fc4543af,com.vesoft.exchange.common.config.SslConfigEntry@745bf6f4,,List(Tag name: company, source: Hive source exec: select gid, name, source_flag from c4.graph_node_company, sink: File sink: from /tmp to hdfs://name_node:9000/sst, vertex field: gid, vertex policy: None, batch: 2000, partition: 60, repartitionWithNebula: true, enableTagless: false.),List(Edge name: branch, source: Hive source exec: select gid, parentgid, r from c4.graph_edge_branch, sink: File sink: from /tmp to hdfs://name_node:9000/sst, source field: gid, source policy: None, ranking: None, target field: parentgid, target policy: None, batch: 2000, partition: 60.),Some(HiveConfigEntry:{warehouse=hdfs://10.96.20.239:8020/hive/warehouse/, connectionURL=jdbc:hive2://10.96.20.239:10000/c4, connectionDriverName=com.mysql.jdbc.Driver, connectionUserName=hive, connectionPassWord=}))
23/03/30 15:22:41 INFO HiveConf: Found configuration file null
23/03/30 15:22:41 INFO SparkContext: Running Spark version 3.3.2
23/03/30 15:22:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/03/30 15:22:41 INFO ResourceUtils: ==============================================================
23/03/30 15:22:41 INFO ResourceUtils: No custom resources configured for spark.driver.
23/03/30 15:22:41 INFO ResourceUtils: ==============================================================
23/03/30 15:22:41 INFO SparkContext: Submitted application: com.vesoft.nebula.exchange.Exchange
23/03/30 15:22:41 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/03/30 15:22:41 INFO ResourceProfile: Limiting resource is cpu
23/03/30 15:22:41 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/03/30 15:22:41 INFO SecurityManager: Changing view acls to: xiaocj
23/03/30 15:22:41 INFO SecurityManager: Changing modify acls to: xiaocj
23/03/30 15:22:41 INFO SecurityManager: Changing view acls groups to: 
23/03/30 15:22:41 INFO SecurityManager: Changing modify acls groups to: 
23/03/30 15:22:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(xiaocj); groups with view permissions: Set(); users  with modify permissions: Set(xiaocj); groups with modify permissions: Set()
23/03/30 15:22:41 INFO Utils: Successfully started service 'sparkDriver' on port 57798.
23/03/30 15:22:41 INFO SparkEnv: Registering MapOutputTracker
23/03/30 15:22:41 INFO SparkEnv: Registering BlockManagerMaster
23/03/30 15:22:41 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/03/30 15:22:41 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/03/30 15:22:41 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/03/30 15:22:41 INFO DiskBlockManager: Created local directory at /private/var/folders/40/jv_rqtd56jq5x9fpctgkch840000gp/T/blockmgr-c5ef6d73-45a8-4e11-a506-bd2c136623cc
23/03/30 15:22:41 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
23/03/30 15:22:41 INFO SparkEnv: Registering OutputCommitCoordinator
23/03/30 15:22:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/03/30 15:22:41 INFO SparkContext: Added JAR file:/Users/xiaocj/bin/nabula/nebula-exchange_spark_3.0-3.4.0.jar at spark://192.168.0.109:57798/jars/nebula-exchange_spark_3.0-3.4.0.jar with timestamp 1680160961554
23/03/30 15:22:41 INFO Executor: Starting executor ID driver on host 192.168.0.109
23/03/30 15:22:41 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/03/30 15:22:41 INFO Executor: Fetching spark://192.168.0.109:57798/jars/nebula-exchange_spark_3.0-3.4.0.jar with timestamp 1680160961554
23/03/30 15:22:41 INFO TransportClientFactory: Successfully created connection to /192.168.0.109:57798 after 8 ms (0 ms spent in bootstraps)
23/03/30 15:22:41 INFO Utils: Fetching spark://192.168.0.109:57798/jars/nebula-exchange_spark_3.0-3.4.0.jar to /private/var/folders/40/jv_rqtd56jq5x9fpctgkch840000gp/T/spark-7e93f401-01c0-4c3d-a681-55a3ce57162e/userFiles-57e6bd11-0654-4143-9180-1b4ef456f5b8/fetchFileTemp6402631326289553812.tmp
23/03/30 15:22:42 INFO Executor: Adding file:/private/var/folders/40/jv_rqtd56jq5x9fpctgkch840000gp/T/spark-7e93f401-01c0-4c3d-a681-55a3ce57162e/userFiles-57e6bd11-0654-4143-9180-1b4ef456f5b8/nebula-exchange_spark_3.0-3.4.0.jar to class loader
23/03/30 15:22:42 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 57800.
23/03/30 15:22:42 INFO NettyBlockTransferService: Server created on 192.168.0.109:57800
23/03/30 15:22:42 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/03/30 15:22:42 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.109, 57800, None)
23/03/30 15:22:42 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.109:57800 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.0.109, 57800, None)
23/03/30 15:22:42 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.109, 57800, None)
23/03/30 15:22:42 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.109, 57800, None)
23/03/30 15:22:42 INFO Exchange$: Processing Tag company
23/03/30 15:22:42 INFO Exchange$: field keys: gid, name, source_flag
23/03/30 15:22:42 INFO Exchange$: nebula keys: gid, name, source_flag
23/03/30 15:22:42 INFO Exchange$: Loading from Hive and exec select gid, name, source_flag from c4.graph_node_company
23/03/30 15:22:42 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
23/03/30 15:22:42 INFO SharedState: Warehouse path is 'hdfs://10.96.20.239:8020/hive/warehouse'.
23/03/30 15:22:42 INFO HiveUtils: Initializing HiveMetastoreConnection version 2.3.9 using Spark classes.
23/03/30 15:22:42 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.9) is hdfs://10.96.20.239:8020/hive/warehouse
23/03/30 15:22:42 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
23/03/30 15:22:42 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
23/03/30 15:22:42 INFO HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
23/03/30 15:22:42 INFO ObjectStore: ObjectStore, initialize called
23/03/30 15:22:42 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
23/03/30 15:22:42 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
23/03/30 15:22:42 INFO Utils: Supplied authorities: 10.96.20.239:10000
23/03/30 15:22:42 INFO Utils: Resolved authority: 10.96.20.239:10000
23/03/30 15:22:43 INFO Utils: Supplied authorities: 10.96.20.239:10000
23/03/30 15:22:43 INFO Utils: Resolved authority: 10.96.20.239:10000
23/03/30 15:22:43 INFO Utils: Supplied authorities: 10.96.20.239:10000
23/03/30 15:22:43 INFO Utils: Resolved authority: 10.96.20.239:10000
23/03/30 15:22:43 INFO Utils: Supplied authorities: 10.96.20.239:10000
23/03/30 15:22:43 INFO Utils: Resolved authority: 10.96.20.239:10000
23/03/30 15:22:43 WARN Datastore: No Database Adapter was found for your JDBC driver specified. Faling back to the generic DatabaseAdapter!
23/03/30 15:22:43 INFO General: Error in initialisation of RDBMSStoreManager
Error accessing database MetaData
org.datanucleus.exceptions.NucleusDataStoreException: Error accessing database MetaData
	at org.datanucleus.store.rdbms.adapter.BaseDatastoreAdapter.<init>(BaseDatastoreAdapter.java:420)
	at org.datanucleus.store.rdbms.adapter.DatastoreAdapterFactory.getDatastoreAdapter(DatastoreAdapterFactory.java:91)
     ....
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.sql.SQLFeatureNotSupportedException: Method not supported
	at org.apache.hive.jdbc.HiveDatabaseMetaData.getMaxTableNameLength(HiveDatabaseMetaData.java:504)
	at org.datanucleus.store.rdbms.adapter.BaseDatastoreAdapter.<init>(BaseDatastoreAdapter.java:355)
	... 149 more
Nested Throwables StackTrace:
java.sql.SQLFeatureNotSupportedException: Method not supported
	at org.apache.hive.jdbc.HiveDatabaseMetaData.getMaxTableNameLength(HiveDatabaseMetaData.java:504)
   ...
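
The stack trace points at the metastore connection rather than at the query itself: DataNucleus (the ORM the embedded Hive metastore uses for its backing database) calls java.sql.DatabaseMetaData.getMaxTableNameLength, which the Hive JDBC driver's HiveDatabaseMetaData does not implement, so handing a jdbc:hive2:// URL to the metastore layer fails by design. One way to isolate the problem is to check whether plain Spark can read the table at all, bypassing Exchange. A minimal sketch with the spark-sql shell, assuming the remote metastore exposes a thrift endpoint on the default port 9083 (the port and the thrift setup are assumptions):

   # sanity check: can plain Spark reach the Hive metastore and read the table?
   ~/bin/spark-3.3.2-bin-hadoop3/bin/spark-sql \
     --conf spark.hadoop.hive.metastore.uris=thrift://xxx.xxx.xxx.xxx:9083 \
     -e "select gid, name, source_flag from c4.graph_node_company limit 10"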

Is there a hive-jdbc jar under spark_home/jars?

Yes: hive-jdbc-2.3.9.jar

I'm using spark-3.3.2-bin-hadoop3.tgz downloaded from the official site. After unpacking it, I added the "mysql-connector-java-8.0.30.jar" file to the spark_home/jars/ directory myself.
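
Incidentally, the connector jar does not have to be copied into spark_home/jars; spark-submit can also ship it per job via --jars. A sketch assuming the jar sits in the working directory next to the Exchange jar (in client mode, --driver-class-path may additionally be needed when the driver is loaded on the driver side):

   # supply the MySQL driver per job instead of patching spark_home/jars
   ~/bin/spark-3.3.2-bin-hadoop3/bin/spark-submit --master "local" \
     --jars mysql-connector-java-8.0.30.jar \
     --class com.vesoft.nebula.exchange.Exchange \
     nebula-exchange_spark_3.0-3.4.0.jar -c application.conf -h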
