Nebula Exchange 2.0 reports no errors and the data exists in Hive, but nothing is imported from Hive into the graph database

Nebula version: 2.0-rc1
Deployment: distributed
Hardware:
Disk: 2 TB SSD
CPU / memory: 12 cores, 128 GB RAM
Exchange version: 2.0:
nebula-java-rc1
nebula-exchange-2.0.0.jar
Problem description: after the Spark job is submitted it runs normally and no errors appear in the logs, but no data is written into the graph database.

Submit command:

spark-submit --name ${task_name} \
             --num-executors ${executors} \
             --executor-memory ${memory} \
             --executor-cores ${cores} \
             --master yarn --deploy-mode cluster \
             --queue default \
             --files ${config_path} \
             --class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.0.0.jar -c ${config_file_name} -h

Tag configuration:

    {
      name: domain
      type: {
        source: hive
        sink: client
      }
      exec: "SELECT key, md5(key) as vid, found_time, update_time, data_source as source, str1 as attr, str2 as label from csc.ods_csc_vertex  where graph_type='domain' and day='{day}' and hour='{hour}'"
      fields: [key, found_time, update_time, source, attr, label]
      nebula.fields: [key, found_time, update_time, source, attr, label]
      vertex: {
        field: vid
      }
      batch: 256
      partition: 16
    }

Relevant logs:

21/03/26 09:53:00 INFO Exchange$: Processing Tag domain
21/03/26 09:53:00 INFO Exchange$: field keys: key, found_time, update_time, source, attr, label
21/03/26 09:53:00 INFO Exchange$: nebula keys: key, found_time, update_time, source, attr, label
21/03/26 09:53:00 INFO Exchange$: Loading from Hive and exec SELECT key, md5(key) as vid, found_time, update_time, data_source as source, str1 as attr, str2 as label from csc.dwb_csc_vertex  where graph_type='domain' and day='23' and hour='17'
21/03/26 09:53:00 INFO SharedState: loading hive config file: file:/xxx/abdi/disks/data2/yarn/local/usercache/yunnaosec/filecache/6481/__spark_conf__.zip/__hadoop_conf__/hive-site.xml
21/03/26 09:53:00 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('/warehouse/tablespace/managed/hive').
21/03/26 09:53:00 INFO SharedState: Warehouse path is '/warehouse/tablespace/managed/hive'.
21/03/26 09:53:00 INFO JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL.
21/03/26 09:53:00 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1c9e27c0{/SQL,null,AVAILABLE,@Spark}
21/03/26 09:53:00 INFO JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/json.
21/03/26 09:53:00 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6af39b4e{/SQL/json,null,AVAILABLE,@Spark}
21/03/26 09:53:00 INFO JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution.
21/03/26 09:53:00 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@21f6af31{/SQL/execution,null,AVAILABLE,@Spark}
21/03/26 09:53:00 INFO JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
21/03/26 09:53:00 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4755b56{/SQL/execution/json,null,AVAILABLE,@Spark}
21/03/26 09:53:00 INFO JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
21/03/26 09:53:00 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2595e781{/static/sql,null,AVAILABLE,@Spark}
21/03/26 09:53:02 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
21/03/26 09:53:04 INFO HiveUtils: Initializing HiveMetastoreConnection version 3.0 using file:/usr/hdp/current/spark2-client/standalone-metastore/standalone-metastore-1.21.2.3.1.0.0-78-hive3.jar:file:/usr/hdp/current/spark2-client/standalone-metastore/hive-ud-2.2.0.jar:file:/xxx/abdi/disks/data1/yarn/local/usercache/yunnaosec/appcache/application_1616364960915_0390/container_e217_1616364960915_0390_01_000001/__hive_libs__/standalone-metastore-1.21.2.3.1.0.0-78-hive3.jar:file:/usr/hdp/current/spark2-client/standalone-metastore/standalone-metastore-1.21.2.3.1.0.0-78-hive3.jar:file:/usr/hdp/current/spark2-client/standalone-metastore/hive-ud-2.2.0.jar
21/03/26 09:53:04 INFO HiveConf: Found configuration file file:/usr/hdp/current/spark2-client/conf/hive-site.xml
Hive Session ID = 599d0e97-68a2-4e52-8a22-90736eb8e58f
21/03/26 09:53:05 INFO SessionState: Hive Session ID = 599d0e97-68a2-4e52-8a22-90736eb8e58f
21/03/26 09:53:05 INFO SessionState: Created local directory: /xxx/abdi/disks/data1/yarn/local/usercache/yunnaosec/appcache/application_1616364960915_0390/container_e217_1616364960915_0390_01_000001/tmp/yunnaosec
21/03/26 09:53:05 INFO SessionState: Created HDFS directory: /tmp/spark/yunnaosec/599d0e97-68a2-4e52-8a22-90736eb8e58f
21/03/26 09:53:05 INFO SessionState: Created local directory: /xx/abdi/disks/data1/yarn/local/usercache/yunnaosec/appcache/application_1616364960915_0390/container_e217_1616364960915_0390_01_000001/tmp/yunnaosec/599d0e97-68a2-4e52-8a22-90736eb8e58f
21/03/26 09:53:05 INFO SessionState: Created HDFS directory: /tmp/spark/yunnaosec/599d0e97-68a2-4e52-8a22-90736eb8e58f/_tmp_space.db
21/03/26 09:53:05 INFO HiveClientImpl: Warehouse location for Hive client (version 3.0.0) is /warehouse/tablespace/managed/hive
21/03/26 09:53:07 INFO HiveMetaStoreClient
21/03/26 09:53:48 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 9) in 15070 ms on xxxhadoop-datanode-1.novalocal (executor 3) (14/16)
21/03/26 09:53:48 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 6) in 15313 ms on xxxhadoop-datanode-1.novalocal (executor 3) (15/16)
21/03/26 09:53:48 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 3) in 15325 ms on xxxhadoop-datanode-1.novalocal (executor 3) (16/16)
21/03/26 09:53:48 INFO YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/03/26 09:53:48 INFO DAGScheduler: ResultStage 1 (foreachPartition at VerticesProcessor.scala:243) finished in 16.087 s
21/03/26 09:53:49 INFO DAGScheduler: Job 0 finished: foreachPartition at VerticesProcessor.scala:243, took 16.670279 s
21/03/26 09:53:49 INFO Exchange$: batchSuccess.domain: 0
21/03/26 09:53:49 INFO Exchange$: batchFailure.domain: 0
21/03/26 09:53:49 INFO Exchange$: Processing Tag url

First, try running this SELECT statement in spark-sql and check the result.
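For example, using the statement exactly as it was expanded in the Exchange log above (a minimal sketch; the YARN queue is taken from your submit command):

# Run the tag's exec query directly to confirm it actually returns rows.
spark-sql --master yarn --queue default -e "
SELECT key, md5(key) AS vid, found_time, update_time,
       data_source AS source, str1 AS attr, str2 AS label
FROM csc.dwb_csc_vertex
WHERE graph_type = 'domain' AND day = '23' AND hour = '17'
LIMIT 10;"

If this returns no rows, Exchange has nothing to write, which would match the batchSuccess.domain: 0 line in your log.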

Also: you submitted in yarn-cluster mode, so what you are looking at is the driver log. The logs printed during the actual import into Nebula are on the executors; go to the YARN web UI (port 18088 in your setup) and look for any import error messages there.
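If the application has already finished, the aggregated executor logs can also be pulled from the command line. A minimal sketch, using the application id visible in the driver log above:

# Fetch the aggregated container logs and scan them for import errors.
yarn logs -applicationId application_1616364960915_0390 | grep -iE "error|exception|fail"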


I had assumed it was a Hive SQL problem, so I only checked the Hive table. It turned out that spark-sql returned no results. After re-importing the data, the query returned rows in spark-sql and the import into Nebula succeeded.
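For completeness, once an import finishes the vertex counts can be spot-checked from nebula-console. A sketch only: the graphd address, credentials, and space name below are assumptions, not values from this thread.

# Assumed connection details; replace <graphd_host> and <space> with real values.
nebula-console -addr <graphd_host> -port 9669 -u root -p nebula -e 'USE <space>; SUBMIT JOB STATS;'
# After the stats job finishes, read back the per-tag counts:
nebula-console -addr <graphd_host> -port 9669 -u root -p nebula -e 'USE <space>; SHOW STATS;'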