nebula-exchange 3.6.1: data missing after importing from neo4j

Ian · May 14, 2024 02:56

nebula version: 3.6.0
nebula-exchange version: 3.6.1
Deployment: distributed
Installation: RPM
In production: Y
Hardware
- Disk: SSD
- CPU / memory: 16 cores, 128 GB
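Problem description
I used nebula-exchange to import data from neo4j, and one relationship type, address_book, is missing data. address_book has 23 billion relationships. The nebula-exchange import completed successfully, and the recordSuccess count in the log matches the number of relationships in neo4j.
However, after running submit job compact and submit job stats in nebula, the statistics show only 2 billion edges.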
Relevant meta / storage / graph logs
nebula-graph.ERROR, nebula-storaged.ERROR and nebula-meta.ERROR contain no error entries.
Launch command
/usr/local/datadisk/spark-2.4.8-bin-hadoop2.7/bin/spark-submit --master local[*] --total-executor-cores 30 --executor-cores 2 --executor-memory 2g --conf spark.driver.memory=90G --class com.vesoft.nebula.exchange.Exchange nebula-exchange_spark_2.4/target/nebula-exchange_spark_2.4-3.6.1.jar -c nebula-exchange_spark_2.4/src/main/resources/address_book_edge.conf -h > nebula_edge.log 2>&1 &

{
  # address_book_edge configuration
  spark: {
    app: {
      name: NebulaGraph Exchange 3.6.1
    }
    driver: {
      cores: 1
      maxResultSize: 1G
    }
    cores: {
      max: 32
    }
  }
  # NebulaGraph configuration
  nebula: {
    address:{
      graph: ["172.22.9.1:9669","172.22.9.2:9669","172.22.9.3:9669"]
      meta: ["172.22.9.1:9569","172.22.9.2:9569","172.22.9.3:9569"]
    }
    user: root
    pswd: 
    space: nebula_graph
    connection: {
      timeout: 30000
      retry: 3
    }
    execution: {
      retry: 3
    }
    error: {
      max: 32
      output: /tmp/errors
    }
    rate: {
      limit: 10000
      timeout: 30000
    }
  }
  # Edge data
  edges: [
    {
      name: address_book
      type: {
        source: neo4j
        sink: client
      }
      server: "bolt://172.22.9.4:7687"
      user: neo4j_read
      password: 
      exec: "match (a:uid)-[r:address_book]->(b:phone_number) return toString(a.name) as src, left(b.name,128) as dst"
      fields: []
      nebula.fields: []
      source: {
        field: src
      }
      target: {
        field: dst
      }
      partition: 60
      batch: 2000
      check_point_path: /data/code/nebula-exchange/checkpoint/edge/address_book
    }
  ]
}

MuYi-方扬 · May 15, 2024 01:29

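Please check whether the data contains duplicates, i.e. relationships where both the uid and the phone_number are identical. In NebulaGraph an edge is identified by its source vertex, edge type, rank and destination vertex, and your configuration does not set a rank, so such duplicates share the same key and your current configuration simply overwrites them.
If you need to keep relationships with the same uid and phone_number as separate edges, you can distinguish them with the ranking field.
For details, please look at NebulaGraph's data model.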
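To illustrate the overwrite behaviour described above, here is a minimal, self-contained sketch in plain Python. It does not talk to Neo4j or NebulaGraph. It only models the assumption that an edge is keyed by (src, edge_type, rank, dst), with rank defaulting to 0 when no ranking field is configured. The functions `count_duplicates` and `simulate_import` are hypothetical helpers, and the "rank = occurrence index" scheme is just one illustrative way to produce distinct ranks. In a real import the rank would have to come from a field in the source query that the Exchange `ranking` setting points to.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Illustrative model only: a NebulaGraph edge is keyed by (src, edge_type, rank, dst).
EdgeKey = Tuple[str, str, int, str]


def count_duplicates(records: Iterable[Tuple[str, str]]) -> int:
    """Number of (src, dst) records that repeat an earlier record with the same pair."""
    counts = Counter(records)
    return sum(c - 1 for c in counts.values())


def simulate_import(records: Iterable[Tuple[str, str]],
                    edge_type: str = "address_book",
                    use_rank: bool = False) -> Dict[EdgeKey, int]:
    """Write records into a dict keyed like NebulaGraph edges.

    Without a rank, every record gets rank 0, so repeated (src, dst) pairs
    overwrite each other. With use_rank=True, each repeat gets the next rank
    (a hypothetical scheme), so every record survives as a distinct edge.
    """
    store: Dict[EdgeKey, int] = {}
    seen: Counter = Counter()
    for i, (src, dst) in enumerate(records):
        rank = seen[(src, dst)] if use_rank else 0
        seen[(src, dst)] += 1
        store[(src, edge_type, rank, dst)] = i  # a later write replaces an earlier one
    return store


if __name__ == "__main__":
    sample = [("u1", "p1"), ("u1", "p1"), ("u1", "p2"), ("u2", "p1")]
    print("records written:", len(sample))
    print("duplicates:", count_duplicates(sample))
    print("edges without rank:", len(simulate_import(sample)))
    print("edges with rank:", len(simulate_import(sample, use_rank=True)))
```

With the sample above, four records are written (and would all count as successful writes), but only three distinct edges remain without a rank, while all four survive when duplicates get different ranks. A recordSuccess count that matches neo4j alongside a much smaller stats count is consistent with this pattern, but whether duplicates are really the cause still has to be confirmed against the source data.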

system · June 14, 2024 01:30

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.