I am using nebula-exchange 3.8.0 to import Hive data into nebula-graph 3.8.0. The full log from the run is below.
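For context, the launch command was roughly the following. This is a sketch reconstructed from the jar path and warnings in the log: app.conf is a placeholder config file name, and -h tells Exchange to read from Hive. The --spark.local.dir form matches the "Ignoring non-spark config property" warning below; the working form would be --conf spark.local.dir=/data/spark_local.

spark-submit --master local \
  --class com.vesoft.nebula.exchange.Exchange \
  --conf --spark.local.dir=/data/spark_local \
  /data/app/nebula380_echange/nebula-exchange_spark_2.4-3.8.0.jar \
  -c app.conf -h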
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/app/spark-2.4.4-bin-hadoop2.6-local/jars/nebula-spark-connector-2.5.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/app/spark-2.4.4-bin-hadoop2.6-local/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Warning: Ignoring non-spark config property: --spark.local.dir=/data/spark_local
log4j:WARN No appenders could be found for logger (com.vesoft.exchange.common.config.Configs$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
24/06/11 10:59:26 INFO SparkContext: Running Spark version 2.4.4
24/06/11 10:59:26 INFO SparkContext: Submitted application: com.vesoft.nebula.exchange.Exchange
24/06/11 10:59:26 INFO SecurityManager: Changing view acls to: admin,modeling-production
24/06/11 10:59:26 INFO SecurityManager: Changing modify acls to: admin,modeling-production
24/06/11 10:59:26 INFO SecurityManager: Changing view acls groups to:
24/06/11 10:59:26 INFO SecurityManager: Changing modify acls groups to:
24/06/11 10:59:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(admin, modeling-production); groups with view permissions: Set(); users with modify permissions: Set(admin, modeling-production); groups with modify permissions: Set()
24/06/11 10:59:26 INFO Utils: Successfully started service 'sparkDriver' on port 34968.
24/06/11 10:59:26 INFO SparkEnv: Registering MapOutputTracker
24/06/11 10:59:26 INFO SparkEnv: Registering BlockManagerMaster
24/06/11 10:59:26 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/06/11 10:59:26 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
24/06/11 10:59:26 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-d5bc25fb-21f8-44fa-9a41-3fb36fbf15dc
24/06/11 10:59:26 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
24/06/11 10:59:27 INFO SparkEnv: Registering OutputCommitCoordinator
24/06/11 10:59:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
24/06/11 10:59:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://test01:4040
24/06/11 10:59:27 INFO SparkContext: Added JAR file:/data/app/nebula380_echange/nebula-exchange_spark_2.4-3.8.0.jar at spark://test01:34968/jars/nebula-exchange_spark_2.4-3.8.0.jar with timestamp 1718103567270
24/06/11 10:59:27 INFO Executor: Starting executor ID driver on host localhost
24/06/11 10:59:27 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 20543.
24/06/11 10:59:27 INFO NettyBlockTransferService: Server created on test01:20543
24/06/11 10:59:27 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
24/06/11 10:59:27 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, test01, 20543, None)
24/06/11 10:59:27 INFO BlockManagerMasterEndpoint: Registering block manager test01:20543 with 366.3 MB RAM, BlockManagerId(driver, test01, 20543, None)
24/06/11 10:59:27 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, test01, 20543, None)
24/06/11 10:59:27 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, test01, 20543, None)
24/06/11 10:59:27 INFO Exchange$: >>>>> Processing Tag idfv
24/06/11 10:59:27 INFO Exchange$: >>>>> field keys: name
24/06/11 10:59:27 INFO Exchange$: >>>>> nebula keys: name
24/06/11 10:59:27 INFO Exchange$: >>>>> Loading from Hive and exec select _vertexid as vid, name from model.xxx_idfv
24/06/11 10:59:27 INFO SharedState: loading hive config file: file:/data/app/spark2-env/hive-site.xml
24/06/11 10:59:27 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
24/06/11 10:59:27 INFO SharedState: Warehouse path is '/user/hive/warehouse'.
24/06/11 10:59:28 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
24/06/11 10:59:28 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
24/06/11 10:59:29 INFO metastore: Trying to connect to metastore with URI thrift://cdh76.akulaku.com:9083
24/06/11 10:59:29 WARN LdapGroupsMapping: Connection is closed, will try to reconnect
24/06/11 10:59:29 WARN LdapGroupsMapping: Connection being closed, reconnecting failed, retryCount = 1
24/06/11 10:59:29 WARN LdapGroupsMapping: Connection being closed, reconnecting failed, retryCount = 2
24/06/11 10:59:29 WARN LdapGroupsMapping: Connection being closed, reconnecting failed, retryCount = 3
24/06/11 10:59:29 WARN UserGroupInformation: No groups available for user modeling-production
24/06/11 10:59:29 INFO metastore: Connected to metastore.
24/06/11 10:59:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/06/11 10:59:32 INFO SessionState: Created local directory: /tmp/bc685e9d-5c59-4e51-8b87-4a1f278e8f97_resources
24/06/11 10:59:32 INFO SessionState: Created HDFS directory: /tmp/hive/modeling-production/bc685e9d-5c59-4e51-8b87-4a1f278e8f97
24/06/11 10:59:32 INFO SessionState: Created local directory: /tmp/admin/bc685e9d-5c59-4e51-8b87-4a1f278e8f97
24/06/11 10:59:32 INFO SessionState: Created HDFS directory: /tmp/hive/modeling-production/bc685e9d-5c59-4e51-8b87-4a1f278e8f97/_tmp_space.db
24/06/11 10:59:32 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
24/06/11 10:59:34 INFO FileSourceStrategy: Pruning directories with:
24/06/11 10:59:34 INFO FileSourceStrategy: Post-Scan Filters:
24/06/11 10:59:34 INFO FileSourceStrategy: Output Data Schema: struct<_vertexId: bigint, name: string>
24/06/11 10:59:34 INFO FileSourceScanExec: Pushed Filters:
24/06/11 10:59:34 INFO CodeGenerator: Code generated in 202.213088 ms
24/06/11 10:59:35 INFO CodeGenerator: Code generated in 35.893224 ms
24/06/11 10:59:35 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 284.8 KB, free 366.0 MB)
24/06/11 10:59:35 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.1 KB, free 366.0 MB)
24/06/11 10:59:35 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on test01:20543 (size: 26.1 KB, free: 366.3 MB)
24/06/11 10:59:35 INFO SparkContext: Created broadcast 0 from show at Exchange.scala:403
24/06/11 10:59:35 INFO FileSourceScanExec: Planning scan with bin packing, max size: 62930671 bytes, open cost is considered as scanning 4194304 bytes.
24/06/11 10:59:35 INFO SparkContext: Starting job: show at Exchange.scala:403
24/06/11 10:59:35 INFO DAGScheduler: Got job 0 (show at Exchange.scala:403) with 1 output partitions
24/06/11 10:59:35 INFO DAGScheduler: Final stage: ResultStage 0 (show at Exchange.scala:403)
24/06/11 10:59:35 INFO DAGScheduler: Parents of final stage: List()
24/06/11 10:59:35 INFO DAGScheduler: Missing parents: List()
24/06/11 10:59:35 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at show at Exchange.scala:403), which has no missing parents
24/06/11 10:59:35 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 10.0 KB, free 366.0 MB)
24/06/11 10:59:35 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.8 KB, free 366.0 MB)
24/06/11 10:59:35 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on test01:20543 (size: 4.8 KB, free: 366.3 MB)
24/06/11 10:59:35 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
24/06/11 10:59:35 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at show at Exchange.scala:403) (first 15 tasks are for partitions Vector(0))
24/06/11 10:59:35 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
24/06/11 10:59:35 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 10766 bytes)
24/06/11 10:59:35 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
24/06/11 10:59:35 INFO Executor: Fetching spark://test01:34968/jars/nebula-exchange_spark_2.4-3.8.0.jar with timestamp 1718103567270
24/06/11 10:59:35 INFO TransportClientFactory: Successfully created connection to test01/192.168.4.15:34968 after 38 ms (0 ms spent in bootstraps)
24/06/11 10:59:35 INFO Utils: Fetching spark://test01:34968/jars/nebula-exchange_spark_2.4-3.8.0.jar to /tmp/spark-321a54d6-6b2e-4fcb-adc3-3a4b31218b3f/userFiles-44d5833d-286b-4837-b008-98248413948a/fetchFileTemp1116339858089345666.tmp
24/06/11 10:59:36 INFO Executor: Adding file:/tmp/spark-321a54d6-6b2e-4fcb-adc3-3a4b31218b3f/userFiles-44d5833d-286b-4837-b008-98248413948a/nebula-exchange_spark_2.4-3.8.0.jar to class loader
24/06/11 10:59:36 INFO FileScanRDD: Reading File path: hdfs://nameservice-nkb/user/hive/warehouse/model.db/xxx_idfv/part-00001-acca10f6-dd6b-4b3d-8d81-8000ef886ab7-c000.snappy.parquet, range: 0-1218, partition values: [empty row]
24/06/11 10:59:37 INFO CodecPool: Got brand-new decompressor [.snappy]
24/06/11 10:59:37 INFO FileScanRDD: Reading File path: hdfs://nameservice-nkb/user/hive/warehouse/model.db/xxx_idfv/part-00002-acca10f6-dd6b-4b3d-8d81-8000ef886ab7-c000.snappy.parquet, range: 0-1171, partition values: [empty row]
24/06/11 10:59:37 INFO FileScanRDD: Reading File path: hdfs://nameservice-nkb/user/hive/warehouse/model.db/xxx_idfv/part-00006-acca10f6-dd6b-4b3d-8d81-8000ef886ab7-c000.snappy.parquet, range: 0-1169, partition values: [empty row]
24/06/11 10:59:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2288 bytes result sent to driver
24/06/11 10:59:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1889 ms on localhost (executor driver) (1/1)
24/06/11 10:59:37 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
24/06/11 10:59:37 INFO DAGScheduler: ResultStage 0 (show at Exchange.scala:403) finished in 2.038 s
24/06/11 10:59:37 INFO DAGScheduler: Job 0 finished: show at Exchange.scala:403, took 2.084592 s
+-------+--------------------+
| vid| name|
+-------+--------------------+
|8264971|80B0B1A5-F638-431...|
|7898386|91F5B5ED-44B9-4FB...|
|8186926|A17D6C41-EBB7-4A4...|
|8278111|289C2681-7B91-40C...|
|2033326|9E6DB9AC-02B8-426...|
|8222386|640D8EE7-A9B0-4C5...|
|8023486|E5A8F81D-9B44-467...|
| 28636|4344859A-FED4-4FF...|
|8273641|D3E257D4-B98D-4A9...|
|8195566|5AA7C10A-48A0-4B2...|
|8271677|483B4941-E91A-478...|
|8186702|6E99028E-F3DB-4D5...|
|8264537|194B49EF-195D-4CF...|
|8186972|AC97971D-7214-4CF...|
|7980407|26A546D0-FB82-4B7...|
|8252057|2244772D-8AF7-404...|
|8186777|B35C2F76-159B-429...|
|8166617|B2BCBAED-0289-4F7...|
|8292842|393F0060-90C2-451...|
|8267796|B520DE5C-7E36-486...|
+-------+--------------------+
only showing top 20 rows
24/06/11 10:59:37 INFO BlockManagerInfo: Removed broadcast_0_piece0 on test01:20543 in memory (size: 26.1 KB, free: 366.3 MB)
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 0
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 16
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 18
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 22
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 24
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 1
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 15
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 19
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 23
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 12
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 27
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 14
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 21
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 8
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 26
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 6
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 10
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 5
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 3
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 4
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 28
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 20
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 9
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 2
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 13
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 7
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 29
24/06/11 10:59:37 INFO BlockManagerInfo: Removed broadcast_1_piece0 on test01:20543 in memory (size: 4.8 KB, free: 366.3 MB)
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 17
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 11
24/06/11 10:59:37 INFO ContextCleaner: Cleaned accumulator 25
24/06/11 10:59:37 INFO FileSourceStrategy: Pruning directories with:
24/06/11 10:59:37 INFO FileSourceStrategy: Post-Scan Filters:
24/06/11 10:59:37 INFO FileSourceStrategy: Output Data Schema: struct<_vertexId: bigint, name: string>
24/06/11 10:59:37 INFO FileSourceScanExec: Pushed Filters:
24/06/11 10:59:37 INFO CodeGenerator: Code generated in 20.054558 ms
24/06/11 10:59:37 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 284.8 KB, free 366.0 MB)
24/06/11 10:59:37 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 26.1 KB, free 366.0 MB)
24/06/11 10:59:37 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on test01:20543 (size: 26.1 KB, free: 366.3 MB)
24/06/11 10:59:37 INFO SparkContext: Created broadcast 2 from rdd at Exchange.scala:380
24/06/11 10:59:37 INFO FileSourceScanExec: Planning scan with bin packing, max size: 62930671 bytes, open cost is considered as scanning 4194304 bytes.
Exception in thread "main" java.lang.NoSuchMethodError: com.vesoft.nebula.meta.ColumnTypeDef.getType()Lcom/vesoft/nebula/PropertyType;
at com.vesoft.exchange.common.MetaProvider$$anonfun$getTagSchema$1.apply(MetaProvider.scala:72)
at com.vesoft.exchange.common.MetaProvider$$anonfun$getTagSchema$1.apply(MetaProvider.scala:71)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.vesoft.exchange.common.MetaProvider.getTagSchema(MetaProvider.scala:71)
at com.vesoft.exchange.common.utils.NebulaUtils$.getDataSourceFieldType(NebulaUtils.scala:43)
at com.vesoft.nebula.exchange.processor.VerticesProcessor.process(VerticesProcessor.scala:115)
at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:160)
at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:116)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach_quick(ParArray.scala:143)
at scala.collection.parallel.mutable.ParArray$ParArrayIterator.foreach(ParArray.scala:136)
at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
24/06/11 10:59:37 INFO SparkContext: Invoking stop() from shutdown hook
24/06/11 10:59:37 INFO SparkUI: Stopped Spark web UI at http://test01:4040
24/06/11 10:59:37 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/06/11 10:59:37 INFO MemoryStore: MemoryStore cleared
24/06/11 10:59:37 INFO BlockManager: BlockManager stopped
24/06/11 10:59:37 INFO BlockManagerMaster: BlockManagerMaster stopped
24/06/11 10:59:37 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/06/11 10:59:37 INFO SparkContext: Successfully stopped SparkContext
24/06/11 10:59:37 INFO ShutdownHookManager: Shutdown hook called
24/06/11 10:59:37 INFO ShutdownHookManager: Deleting directory /tmp/spark-321a54d6-6b2e-4fcb-adc3-3a4b31218b3f
24/06/11 10:59:37 INFO ShutdownHookManager: Deleting directory /tmp/spark-83993ff2-c8a7-412a-af2d-bc82097122d1
Hive table structure:
CREATE TABLE xxx_idfv (
  `_vertexid` bigint,
  name string
);
NebulaGraph schema:
CREATE SPACE `xxx_test_h2n` (partition_num = 5, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = INT64);
CREATE TAG idfv(name string);
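For reference, the tag block of the Exchange configuration consistent with the log above would look roughly like this. It is a sketch reconstructed from the "Processing Tag idfv" log lines; the Nebula connection settings are omitted, and the batch/partition values are placeholders:

tags: [
  {
    name: idfv
    type: {
      source: hive
      sink: client
    }
    exec: "select _vertexid as vid, name from model.xxx_idfv"
    fields: [name]
    nebula.fields: [name]
    # vid is bigint in Hive, matching the space's vid_type = INT64
    vertex: {
      field: vid
    }
    batch: 256       # placeholder
    partition: 32    # placeholder
  }
]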
What could be causing the error above? One thing I notice: the SLF4J warning at the top of the log shows that Spark's jars directory still contains nebula-spark-connector-2.5.0.jar alongside the exchange 3.8.0 jar — could an old nebula client class on the classpath be shadowing the one bundled with Exchange 3.8.0 and triggering the NoSuchMethodError?
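A quick way to check which jar actually supplies the conflicting class — a hypothetical diagnostic sketch, with the path taken from the SLF4J warning in the log:

# List nebula-related jars that Spark puts on the classpath
ls /data/app/spark-2.4.4-bin-hadoop2.6-local/jars | grep -i nebula

# Find every jar that contains the ColumnTypeDef class named in the stack trace
for j in /data/app/spark-2.4.4-bin-hadoop2.6-local/jars/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'com/vesoft/nebula/meta/ColumnTypeDef.class' \
    && echo "$j"
done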