- Nebula version: 2.6.0
- Deployment: single host
- Installation: Docker Compose
- Production environment: No
- Exchange version: 2.6 (nebula-exchange-2.6.3.jar)
- Spark version: 2.4.8
- Scala version: 2.11.12
- Hardware info:
- Problem description:

Importing the basketballplayer dataset with Exchange fails with a connection error.

- Spark error log:
```
[wjc@localhost bin]$ ./spark-submit --master local --class com.vesoft.nebula.exchange.Exchange ~/nebula-exchange-2.6/nebula-exchange/target/nebula-exchange-2.6.3.jar -c ~/nebula-exchange-2.6/nebula-exchange/target/classes/csv_application.conf
22/03/16 11:18:20 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 172.20.10.14 instead (on interface ens33)
22/03/16 11:18:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/03/16 11:18:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (com.vesoft.nebula.exchange.config.Configs$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/03/16 11:18:21 INFO SparkContext: Running Spark version 2.4.8
22/03/16 11:18:21 INFO SparkContext: Submitted application: com.vesoft.nebula.exchange.Exchange
22/03/16 11:18:21 INFO SecurityManager: Changing view acls to: wjc
22/03/16 11:18:21 INFO SecurityManager: Changing modify acls to: wjc
22/03/16 11:18:21 INFO SecurityManager: Changing view acls groups to:
22/03/16 11:18:21 INFO SecurityManager: Changing modify acls groups to:
22/03/16 11:18:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(wjc); groups with view permissions: Set(); users with modify permissions: Set(wjc); groups with modify permissions: Set()
22/03/16 11:18:22 INFO Utils: Successfully started service 'sparkDriver' on port 46097.
22/03/16 11:18:22 INFO SparkEnv: Registering MapOutputTracker
22/03/16 11:18:22 INFO SparkEnv: Registering BlockManagerMaster
22/03/16 11:18:22 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/03/16 11:18:22 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/03/16 11:18:22 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-3250b5a2-226c-486b-b2fd-e65bdcb1e60e
22/03/16 11:18:22 INFO MemoryStore: MemoryStore started with capacity 366.1 MB
22/03/16 11:18:22 INFO SparkEnv: Registering OutputCommitCoordinator
22/03/16 11:18:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/03/16 11:18:22 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.20.10.14:4040
22/03/16 11:18:22 INFO SparkContext: Added JAR file:/home/wjc/nebula-exchange-2.6/nebula-exchange/target/nebula-exchange-2.6.3.jar at spark://172.20.10.14:46097/jars/nebula-exchange-2.6.3.jar with timestamp 1647400702966
22/03/16 11:18:23 INFO Executor: Starting executor ID driver on host localhost
22/03/16 11:18:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33021.
22/03/16 11:18:23 INFO NettyBlockTransferService: Server created on 172.20.10.14:33021
22/03/16 11:18:23 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/03/16 11:18:23 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.20.10.14, 33021, None)
22/03/16 11:18:23 INFO BlockManagerMasterEndpoint: Registering block manager 172.20.10.14:33021 with 366.1 MB RAM, BlockManagerId(driver, 172.20.10.14, 33021, None)
22/03/16 11:18:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.20.10.14, 33021, None)
22/03/16 11:18:23 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.20.10.14, 33021, None)
22/03/16 11:18:23 INFO Exchange$: Processing Tag player
22/03/16 11:18:23 INFO Exchange$: field keys: _c1, _c2
22/03/16 11:18:23 INFO Exchange$: nebula keys: age, name
22/03/16 11:18:23 INFO Exchange$: Loading CSV files from file:///home/wjc/daoshu/dataset/vertex_player.csv
22/03/16 11:18:23 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/wjc/spark-2.4.8-bin-hadoop2.6/bin/spark-warehouse').
22/03/16 11:18:23 INFO SharedState: Warehouse path is 'file:/home/wjc/spark-2.4.8-bin-hadoop2.6/bin/spark-warehouse'.
22/03/16 11:18:24 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
22/03/16 11:18:24 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
22/03/16 11:18:24 INFO InMemoryFileIndex: It took 34 ms to list leaf files for 1 paths.
22/03/16 11:18:24 INFO InMemoryFileIndex: It took 1 ms to list leaf files for 1 paths.
22/03/16 11:18:27 INFO FileSourceStrategy: Pruning directories with:
22/03/16 11:18:27 INFO FileSourceStrategy: Post-Scan Filters: (length(trim(value#0, None)) > 0)
22/03/16 11:18:27 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
22/03/16 11:18:27 INFO FileSourceScanExec: Pushed Filters:
22/03/16 11:18:27 INFO CodeGenerator: Code generated in 262.076914 ms
22/03/16 11:18:28 INFO CodeGenerator: Code generated in 21.848622 ms
22/03/16 11:18:28 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 200.3 KB, free 366.0 MB)
22/03/16 11:18:28 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.0 KB, free 365.9 MB)
22/03/16 11:18:28 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.20.10.14:33021 (size: 21.0 KB, free: 366.1 MB)
22/03/16 11:18:28 INFO SparkContext: Created broadcast 0 from csv at FileBaseReader.scala:86
22/03/16 11:18:28 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4195699 bytes, open cost is considered as scanning 4194304 bytes.
22/03/16 11:18:28 INFO SparkContext: Starting job: csv at FileBaseReader.scala:86
22/03/16 11:18:28 INFO DAGScheduler: Got job 0 (csv at FileBaseReader.scala:86) with 1 output partitions
22/03/16 11:18:28 INFO DAGScheduler: Final stage: ResultStage 0 (csv at FileBaseReader.scala:86)
22/03/16 11:18:28 INFO DAGScheduler: Parents of final stage: List()
22/03/16 11:18:28 INFO DAGScheduler: Missing parents: List()
22/03/16 11:18:28 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at csv at FileBaseReader.scala:86), which has no missing parents
22/03/16 11:18:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.9 KB, free 365.9 MB)
22/03/16 11:18:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.6 KB, free 365.9 MB)
22/03/16 11:18:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.20.10.14:33021 (size: 4.6 KB, free: 366.1 MB)
22/03/16 11:18:29 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1184
22/03/16 11:18:29 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at csv at FileBaseReader.scala:86) (first 15 tasks are for partitions Vector(0))
22/03/16 11:18:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
22/03/16 11:18:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 8265 bytes)
22/03/16 11:18:29 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
22/03/16 11:18:29 INFO Executor: Fetching spark://172.20.10.14:46097/jars/nebula-exchange-2.6.3.jar with timestamp 1647400702966
22/03/16 11:18:29 INFO TransportClientFactory: Successfully created connection to /172.20.10.14:46097 after 43 ms (0 ms spent in bootstraps)
22/03/16 11:18:29 INFO Utils: Fetching spark://172.20.10.14:46097/jars/nebula-exchange-2.6.3.jar to /tmp/spark-ce0772b1-f41c-4aa0-acea-07a42345bfaa/userFiles-469bd7eb-e976-4904-bc17-73aee2d96a75/fetchFileTemp6355599499474469828.tmp
22/03/16 11:18:31 INFO Executor: Adding file:/tmp/spark-ce0772b1-f41c-4aa0-acea-07a42345bfaa/userFiles-469bd7eb-e976-4904-bc17-73aee2d96a75/nebula-exchange-2.6.3.jar to class loader
22/03/16 11:18:31 INFO FileScanRDD: Reading File path: file:///home/wjc/daoshu/dataset/vertex_player.csv, range: 0-1395, partition values: [empty row]
22/03/16 11:18:31 INFO CodeGenerator: Code generated in 17.728156 ms
22/03/16 11:18:31 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1259 bytes result sent to driver
22/03/16 11:18:31 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2375 ms on localhost (executor driver) (1/1)
22/03/16 11:18:31 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/03/16 11:18:31 INFO DAGScheduler: ResultStage 0 (csv at FileBaseReader.scala:86) finished in 2.560 s
22/03/16 11:18:31 INFO DAGScheduler: Job 0 finished: csv at FileBaseReader.scala:86, took 2.675931 s
22/03/16 11:18:31 INFO FileSourceStrategy: Pruning directories with:
22/03/16 11:18:31 INFO FileSourceStrategy: Post-Scan Filters:
22/03/16 11:18:31 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
22/03/16 11:18:31 INFO FileSourceScanExec: Pushed Filters:
22/03/16 11:18:31 INFO CodeGenerator: Code generated in 16.122899 ms
22/03/16 11:18:31 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 200.3 KB, free 365.7 MB)
22/03/16 11:18:31 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 21.0 KB, free 365.7 MB)
22/03/16 11:18:31 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.20.10.14:33021 (size: 21.0 KB, free: 366.1 MB)
22/03/16 11:18:31 INFO SparkContext: Created broadcast 2 from csv at FileBaseReader.scala:86
22/03/16 11:18:31 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4195699 bytes, open cost is considered as scanning 4194304 bytes.
Exception in thread "main" com.facebook.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:206)
at com.vesoft.nebula.client.meta.MetaClient.getClient(MetaClient.java:145)
at com.vesoft.nebula.client.meta.MetaClient.doConnect(MetaClient.java:124)
at com.vesoft.nebula.client.meta.MetaClient.connect(MetaClient.java:113)
at com.vesoft.nebula.exchange.MetaProvider.<init>(MetaProvider.scala:56)
at com.vesoft.nebula.exchange.processor.VerticesProcessor.process(VerticesProcessor.scala:110)
at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:150)
at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:126)
at scala.collection.immutable.List.foreach(List.scala:392)
at com.vesoft.nebula.exchange.Exchange$.main(Exchange.scala:126)
at com.vesoft.nebula.exchange.Exchange.main(Exchange.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.facebook.thrift.transport.TSocket.open(TSocket.java:201)
... 22 more
22/03/16 11:18:31 INFO SparkContext: Invoking stop() from shutdown hook
22/03/16 11:18:31 INFO SparkUI: Stopped Spark web UI at http://172.20.10.14:4040
22/03/16 11:18:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/03/16 11:18:31 INFO MemoryStore: MemoryStore cleared
22/03/16 11:18:31 INFO BlockManager: BlockManager stopped
22/03/16 11:18:31 INFO BlockManagerMaster: BlockManagerMaster stopped
22/03/16 11:18:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/03/16 11:18:31 INFO SparkContext: Successfully stopped SparkContext
22/03/16 11:18:31 INFO ShutdownHookManager: Shutdown hook called
22/03/16 11:18:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-f664c208-638d-4554-831a-273e42352a98
22/03/16 11:18:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-ce0772b1-f41c-4aa0-acea-07a42345bfaa
```
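The import dies when `MetaClient` opens a Thrift connection to metad and gets `Connection refused`, before any data is written. One thing worth checking is whether the metad addresses in my config are reachable at all from the machine running `spark-submit`; a minimal TCP probe (just a sketch, with the endpoints copied from the config below) would be:

```scala
import java.net.{InetSocketAddress, Socket}

// Probe each metad endpoint from the host that runs spark-submit.
// The endpoints are copied from csv_application.conf below.
object MetaProbe {
  def main(args: Array[String]): Unit = {
    val metad = Seq(("172.18.0.3", 49160), ("172.18.0.2", 49159), ("172.18.0.4", 49161))
    for ((host, port) <- metad) {
      val socket = new Socket()
      try {
        socket.connect(new InetSocketAddress(host, port), 3000) // 3-second timeout
        println(s"$host:$port reachable")
      } catch {
        case e: Exception => println(s"$host:$port NOT reachable: ${e.getMessage}")
      } finally {
        socket.close()
      }
    }
  }
}
```

If these probes fail the same way, the problem is network reachability rather than Exchange itself.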
- Exchange configuration file (csv_application.conf):

```
{
  spark: {
    app: {
      name: Nebula Exchange 2.0
    }
    master: local
    driver: {
      cores: 1
      maxResultSize: 128M
    }
    executor: {
      memory: 1513M
    }
    cores: {
      max: 1
    }
  }

  nebula: {
    address: {
      graph: ["172.18.0.9:49177", "172.18.0.8:9669", "172.18.0.10:49178"]
      meta: ["172.18.0.3:49160", "172.18.0.2:49159", "172.18.0.4:49161"]
    }
    user: root
    pswd: nebula
    space: basketballplayer
    connection: {
      timeout: 30000
    }
    error: {
      max: 1
      output: /tmp/errors
    }
    rate: {
      limit: 1024
      timeout: 1000
    }
  }

  tags: [
    {
      name: player
      type: {
        source: csv
        sink: client
      }
      path: "file:///home/wjc/daoshu/dataset/vertex_player.csv"
      fields: [_c1, _c2]
      nebula.fields: [age, name]
      vertex: {
        field: _c0
      }
      separator: ","
      header: false
      batch: 128
      partition: 32
    }
    {
      name: team
      type: {
        source: csv
        sink: client
      }
      path: "file:///home/wjc/daoshu/dataset/vertex_team.csv"
      fields: [_c1]
      nebula.fields: [name]
      vertex: {
        field: _c0
      }
      separator: ","
      header: false
      batch: 128
      partition: 32
    }
  ]

  edges: [
    {
      name: follow
      type: {
        source: csv
        sink: client
      }
      path: "file:///home/wjc/daoshu/dataset/edge_follow.csv"
      fields: [_c2]
      nebula.fields: [degree]
      source: {
        field: _c0
      }
      target: {
        field: _c1
      }
      separator: ","
      header: false
      batch: 128
      partition: 32
    }
    {
      name: serve
      type: {
        source: csv
        sink: client
      }
      path: "file:///home/wjc/daoshu/dataset/edge_serve.csv"
      fields: [_c2, _c3]
      nebula.fields: [start_year, end_year]
      source: {
        field: _c0
      }
      target: {
        field: _c1
      }
      separator: ","
      header: false
      batch: 128
      partition: 32
    }
  ]
}
```
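Since Nebula was deployed with docker-compose, I suspect the `172.18.0.x` addresses above are container-network IPs, while the high ports (49159, 49160, ...) look like host-mapped ports; mixing the two would explain the `Connection refused`. If the Spark driver runs on the host rather than inside that Docker network, I'd expect the address block to need host-reachable endpoints instead, roughly like this (a sketch only; the host IP is the one from the Spark log, and the real port mappings would have to be read from `docker ps`):

```
nebula: {
  address: {
    # Hypothetical values: the host IP from the Spark log plus the ports
    # that `docker ps` reports as mapped to graphd/metad on the host.
    graph: ["172.20.10.14:9669"]
    meta: ["172.20.10.14:49160", "172.20.10.14:49159", "172.20.10.14:49161"]
  }
}
```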