- Nebula version: 2.0
- Deployment: distributed
- Hardware
  - Disk: 2 TB SSD
  - CPU / memory: 12 cores, 128 GB RAM
- Problem description: after submission the Spark job runs normally and no errors are found in the log, but no data is written into the graph database.
Submit command: spark-submit --class com.vesoft.nebula.exchange.Exchange --master yarn --deploy-mode client nebula-exchange-2.0.0.jar -c application-hive-vertexs.conf -h
Configuration:
# Processing tags
# There are tag config examples for different dataSources.
tags: [
  {
    name: ip
    type: {
      source: hive
      sink: client
    }
    exec: "SELECT sip from sip.ods_sip_dns where day=20210127 and hour=09"
    fields: [sip]
    nebula.fields: [ip]
    vertex: {
      field: sip
      # policy: "hash"
    }
    batch: 256
    partition: 32
  }
  {
    name: ip
    type: {
      source: hive
      sink: client
    }
    exec: "SELECT dip from sip.ods_sip_dns where day=20210127 and hour=09"
    fields: [dip]
    nebula.fields: [ip]
    vertex: {
      field: dip
      # policy: "hash"
    }
    batch: 256
    partition: 32
  }
]

# Processing edges
# There are edge config examples for different dataSources.
edges: [
]
Hive table: confirmed that the sip and dip fields are of string type.
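Since the log excerpt further down shows every batch failing (batchSuccess.ip: 0, batchFailure.ip: 1376), it is worth verifying that the Nebula-side schema actually matches this mapping: the space's vid type must be able to hold the string sip/dip values (FIXED_STRING of sufficient length in 2.0), and tag `ip` must have an `ip` property of string type. A sketch of the check with nebula-console, where the space name `your_space` is a hypothetical placeholder (the real name comes from the conf's nebula.space, which is not shown above) and the address is the masked graphd host from the log:

```shell
# Hypothetical space name and masked address - substitute your own values.
# If nebula-console (shipped with NebulaGraph 2.0) is on PATH, this prints the
# space's vid type and the tag schema, which must match the string sip/dip
# values and the `ip` property mapped in the Exchange conf.
if command -v nebula-console >/dev/null 2>&1; then
  MSG=$(nebula-console -addr xxx.81 -port 9669 -u root -p nebula \
    -e 'DESCRIBE SPACE your_space; USE your_space; DESCRIBE TAG ip;')
else
  MSG="nebula-console not found - run the checks from a node that has it"
fi
echo "$MSG"
```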
Log excerpt:
21/03/04 14:53:52 INFO Configs$: DataBase Config com.vesoft.nebula.exchange.config.DataBaseConfigEntry@5e054d22
21/03/04 14:53:52 INFO Configs$: User Config com.vesoft.nebula.exchange.config.UserConfigEntry@3161b833
21/03/04 14:53:52 INFO Configs$: Connection Config Some(Config(SimpleConfigObject({"retry":3,"timeout":3000})))
21/03/04 14:53:52 INFO Configs$: Execution Config com.vesoft.nebula.exchange.config.ExecutionConfigEntry@7f9c3944
21/03/04 14:53:52 INFO Configs$: Source Config Hive source exec: SELECT sip from sip.ods_sip_dns where day=20210127 and hour=09
21/03/04 14:53:52 INFO Configs$: Sink Config Hive source exec: SELECT sip from sip.ods_sip_dns where day=20210127 and hour=09
21/03/04 14:53:52 INFO Configs$: name ip batch 256
21/03/04 14:53:52 INFO Configs$: Tag Config: Tag name: ip, source: Hive source exec: SELECT sip from sip.ods_sip_dns where day=20210127 and hour=09, sink: Nebula sink addresses: [xxx.81:9669, xxx.82:9669, xxx.83:9669], vertex field: sip, vertex policy: None, batch: 256, partition: 32.
21/03/04 14:53:52 INFO Configs$: Source Config Hive source exec: SELECT dip from sip.ods_sip_dns where day=20210127 and hour=09
21/03/04 14:53:52 INFO Configs$: Sink Config Hive source exec: SELECT dip from sip.ods_sip_dns where day=20210127 and hour=09
21/03/04 14:53:52 INFO Configs$: name ip batch 256
21/03/04 14:53:52 INFO Configs$: Tag Config: Tag name: ip, source: Hive source exec: SELECT dip from sip.ods_sip_dns where day=20210127 and hour=09, sink: Nebula sink addresses: [xxx.81:9669, xxx.82:9669, xxx.83:9669], vertex field: dip, vertex policy: None, batch: 256, partition: 32.
21/03/04 14:53:52 INFO Exchange$: Config Configs(com.vesoft.nebula.exchange.config.DataBaseConfigEntry@5e054d22,com.vesoft.nebula.exchange.config.UserConfigEntry@3161b833,com.vesoft.nebula.exchange.config.ConnectionConfigEntry@c419f174,com.vesoft.nebula.exchange.config.ExecutionConfigEntry@7f9c3944,com.vesoft.nebula.exchange.config.ErrorConfigEntry@55508fa6,com.vesoft.nebula.exchange.config.RateConfigEntry@fc4543af,,List(Tag name: ip, source: Hive source exec: SELECT sip from sip.ods_sip_dns where day=20210127 and hour=09, sink: Nebula sink addresses: [xxx.81:9669, xxx.82:9669, xxx.83:9669], vertex field: sip, vertex policy: None, batch: 256, partition: 32., Tag name: ip, source: Hive source exec: SELECT dip from sip.ods_sip_dns where day=20210127 and hour=09, sink: Nebula sink addresses: [xxx.81:9669, xxx.82:9669, xxx.83:9669], vertex field: dip, vertex policy: None, batch: 256, partition: 32.),List(),Some(HiveConfigEntry:{waredir=hdfs://mycluster/apps/hive/warehouse, connectionURL=jdbc:hive2://sz-abdi1-zookeeper-1.novalocal:2181,sz-abdi1-zookeeper-2.novalocal:2181,sz-abdi1-zookeeper-3.novalocal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/sz-abdi1-hive-1.novalocal@EXAMPLE.COM, connectionDriverName=com.mysql.jdbc.Driver, connectionUserName=hive, connectionPassWord=hive}))
21/03/04 14:53:53 INFO SparkContext: Running Spark version 2.3.2.3.1.0.0-78
21/03/04 14:53:53 INFO SparkContext: Submitted application: Nebula Exchange 2.0
21/03/04 14:53:53 INFO SecurityManager: Changing view acls to: root,ambari-qa
21/03/04 14:53:53 INFO SecurityManager: Changing modify acls to: root,ambari-qa
21/03/04 14:53:53 INFO SecurityManager: Changing view acls groups to:
21/03/04 14:53:53 INFO SecurityManager: Changing modify acls groups to:
21/03/04 14:53:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls enabled; users with view permissions: Set(root, ambari-qa); groups with view permissions: Set(); users with modify permissions: Set(root, ambari-qa); groups with modify permissions: Set()
21/03/04 14:53:54 INFO Utils: Successfully started service 'sparkDriver' on port 36788.
21/03/04 14:54:27 INFO YarnClientSchedulerBackend: Application application_1613965514658_0077 has started running.
21/03/04 14:54:27 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33776.
21/03/04 14:54:27 INFO NettyBlockTransferService: Server created on sz-abdi1-hadoop-datanode-1.novalocal:33776
21/03/04 14:54:27 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/04 14:54:27 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, sz-abdi1-hadoop-datanode-1.novalocal, 33776, None)
21/03/04 14:54:27 INFO BlockManagerMasterEndpoint: Registering block manager sz-abdi1-hadoop-datanode-1.novalocal:33776 with 366.3 MB RAM, BlockManagerId(driver, sz-abdi1-hadoop-datanode-1.novalocal, 33776, None)
21/03/04 14:54:27 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, sz-abdi1-hadoop-datanode-1.novalocal, 33776, None)
21/03/04 14:54:27 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, sz-abdi1-hadoop-datanode-1.novalocal, 33776, None)
21/03/04 14:54:28 INFO JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
21/03/04 14:54:28 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5d601832{/metrics/json,null,AVAILABLE,@Spark}
21/03/04 14:54:28 INFO EventLoggingListener: Logging events to hdfs:/spark2-history/application_1613965514658_0077
21/03/04 14:54:28 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
21/03/04 14:54:28 INFO Exchange$: Processing Tag ip
21/03/04 14:54:28 INFO Exchange$: field keys: sip
21/03/04 14:54:28 INFO Exchange$: nebula keys: ip
21/03/04 14:54:28 INFO Exchange$: Loading from Hive and exec SELECT sip from sip.ods_sip_dns where day=20210127 and hour=09
21/03/04 14:54:28 INFO SharedState: loading hive config file: file:/etc/spark2/3.1.0.0-78/0/hive-site.xml
21/03/04 14:54:28 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('/warehouse/tablespace/managed/hive').
21/03/04 14:57:15 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/03/04 14:57:15 INFO DAGScheduler: ResultStage 1 (foreachPartition at VerticesProcessor.scala:243) finished in 131.451 s
21/03/04 14:57:16 INFO DAGScheduler: Job 0 finished: foreachPartition at VerticesProcessor.scala:243, took 146.968890 s
21/03/04 14:57:16 INFO Exchange$: batchSuccess.ip: 0
21/03/04 14:57:16 INFO Exchange$: batchFailure.ip: 1376
21/03/04 14:57:16 INFO Exchange$: Processing Tag ip
21/03/04 14:57:16 INFO Exchange$: field keys: dip
21/03/04 14:57:16 INFO Exchange$: nebula keys: ip
21/03/04 14:57:16 INFO Exchange$: Loading from Hive and exec SELECT dip from sip.ods_sip_dns where day=20210127 and hour=09
21/03/04 14:57:16 INFO HiveMetaStoreClient: Closed a connection to metastore, current connections: 1
21/03/04 14:57:18 INFO FileSourceStrategy: Pruning directories with: isnotnull(day#176),isnotnull(hour#177),(cast(day#176 as int) = 20210127),(cast(hour#177 as int) = 9)
21/03/04 14:57:18 INFO FileSourceStrategy: Post-Scan Filters:
21/03/04 14:57:18 INFO FileSourceStrategy: Output Data Schema: struct<dip: string>
21/03/04 14:57:18 INFO FileSourceScanExec: Pushed Filters:
21/03/04 14:57:18 INFO PrunedInMemoryFileIndex: Selected 2 partitions out of 2, pruned 0.0% partitions.
21/03/04 14:57:18 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 412.6 KB, free 365.4 MB)
21/03/04 14:57:18 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 35.7 KB, free 365.4 MB)
21/03/04 14:57:18 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on sz-abdi1-hadoop-datanode-1.novalocal:33776 (size: 35.7 KB, free: 366.2 MB)
21/03/04 14:57:19 INFO SparkContext: Created broadcast 3 from foreachPartition at VerticesProcessor.scala:243
21/03/04 14:57:19 INFO FileSourceScanExec: Planning scan with bin packing, max size: 91534365 bytes, open cost is considered as scanning 4194304 bytes.
21/03/04 14:57:19 INFO SparkContext: Starting job: foreachPartition at VerticesProcessor.scala:243
21/03/04 14:57:19 INFO DAGScheduler: Registering RDD 11 (foreachPartition at VerticesProcessor.scala:243)
21/03/04 14:57:19 INFO DAGScheduler: Got job 1 (foreachPartition at VerticesProcessor.scala:243) with 32 output partitions
21/03/04 14:57:19 INFO DAGScheduler: Final stage: ResultStage 3 (foreachPartition at VerticesProcessor.scala:243)
21/03/04 14:57:19 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
21/03/04 14:57:19 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2)
21/03/04 14:57:19 INFO DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[11] at foreachPartition at VerticesProcessor.scala:243), which has no missing parents
21/03/04 14:57:19 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 12.6 KB, free 365.4 MB)
21/03/04 14:57:19 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 5.7 KB, free 365.4 MB)
21/03/04 14:57:19 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on sz-abdi1-hadoop-datanode-1.novalocal:33776 (size: 5.7 KB, free: 366.2 MB)
21/03/04 14:57:19 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1039
21/03/04 14:57:19 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[11] at foreachPartition at VerticesProcessor.scala:243) (first 15 tasks are for partitions Vector(0, 1))
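The counters in the log (batchSuccess.ip: 0, batchFailure.ip: 1376) show that every batch was rejected by the graph service even though the Spark job itself completed, so the statement "no errors in the log" only holds for the Spark side. Exchange dumps the nGQL statements it failed to execute under the conf's error.output path; a first-look sketch, assuming the conf keeps the 2.0 template default of /tmp/errors (your path may differ):

```shell
# error.output from the Exchange conf (template default /tmp/errors - an
# assumption here, since the conf's error section is not shown). The dumped
# failed INSERT statements usually reveal the real reason for batchFailure.
ERR_DIR=/tmp/errors
RESULT=$(head -n 5 "$ERR_DIR"/* 2>/dev/null)
[ -n "$RESULT" ] || RESULT="no error dump at $ERR_DIR - check error.output in the conf"
echo "$RESULT"
```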