Edge data is wrong after importing with Nebula Exchange

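Question template:

  • Nebula version: 1.2.0, Git: 53f56b69, Build Time: Dec 8 2020 11:20:46

  • Deployment method: Docker

  • Exchange version: 1.1.0

  • Hardware information

    • Disk (SSD recommended)
    • CPU and memory information
  • Detailed description of the problem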
    --case
    --tags
    user(user_id string),
    label(tag_type int, tag_value string)
    --edges
    label_edge(optype_1 int, optype_2 int, optype_3 int)
    --data
    ----user
    {"user_id":"16114111701381287670533^4"}
    {"user_id":"199dc87cca97846e986b5043421ccdef38aa3cdf^16"}
    {"user_id":"40a9b0fa14aa2a6b08bbe273848bc8f670007a7e^16"}
    {"user_id":"861608039074305^8"}
    {"user_id":"aaaaaa"}
    {"user_id":"ccccc"}
    {"user_id":"dddd"}
    ----label
    {"tag_value":"111111","tag_type":11,"dest":"11-111111"}
    {"tag_value":"1230","tag_type":11,"dest":"11-1230"}
    {"tag_value":"12741","tag_type":11,"dest":"11-12741"}
    {"tag_value":"14327","tag_type":11,"dest":"11-14327"}
    {"tag_value":"222222","tag_type":11,"dest":"11-222222"}
    {"tag_value":"3789","tag_type":11,"dest":"11-3789"}
    {"tag_value":"6499","tag_type":11,"dest":"11-6499"}
    {"tag_value":"6508","tag_type":11,"dest":"11-6508"}
    ----label_edge
    {"optype_3":64,"user_id":"dddd","ranking":1,"dest":"11-111111"}
    {"optype_3":82,"user_id":"dddd","ranking":2,"dest":"11-111111"}
    {"optype_3":92,"user_id":"ccccc","ranking":1,"dest":"11-222222"}
    {"optype_3":92,"user_id":"ccccc","ranking":2,"dest":"11-222222"}
    {"optype_3":69,"user_id":"aaaaaa","ranking":1,"dest":"11-1230"}
    {"optype_3":65,"user_id":"aaaaaa","ranking":1,"dest":"11-14327"}
    {"optype_3":86,"user_id":"861608039074305^8","ranking":1,"dest":"11-6499"}
    {"optype_3":73,"user_id":"40a9b0fa14aa2a6b08bbe273848bc8f670007a7e^16","ranking":1,"dest":"11-12741"}
    {"optype_3":41,"user_id":"199dc87cca97846e986b5043421ccdef38aa3cdf^16","ranking":1,"dest":"11-6508"}
    {"optype_3":63,"user_id":"16114111701381287670533^4","ranking":1,"dest":"11-3789"}

--Create the index
create edge index label_index on label_edge(optype_3)

--Problems encountered
--Query with lookup
lookup on label_edge where label_edge.optype_3 > 0 | yield count(*) returns 7, so 3 edges are missing
--Query properties with fetch
fetch prop on label_edge hash("dddd")->hash("11-111111") returns no data
fetch prop on label_edge hash("dddd")->hash("11-111111")@1 returns data
fetch prop on label_edge hash("dddd")->hash("11-111111")@2 returns data
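
As a sanity check on the expected count, here is a minimal Python sketch (not part of Exchange or Nebula) that counts the distinct edges in the sample label_edge rows listed above. It assumes an edge of one type is identified by its source, destination, and rank, which correspond to the user_id, dest, and ranking columns. The helper name edge_keys is made up for this illustration.

```python
import json

# Sample label_edge rows copied from the post.
ROWS = """
{"optype_3":64,"user_id":"dddd","ranking":1,"dest":"11-111111"}
{"optype_3":82,"user_id":"dddd","ranking":2,"dest":"11-111111"}
{"optype_3":92,"user_id":"ccccc","ranking":1,"dest":"11-222222"}
{"optype_3":92,"user_id":"ccccc","ranking":2,"dest":"11-222222"}
{"optype_3":69,"user_id":"aaaaaa","ranking":1,"dest":"11-1230"}
{"optype_3":65,"user_id":"aaaaaa","ranking":1,"dest":"11-14327"}
{"optype_3":86,"user_id":"861608039074305^8","ranking":1,"dest":"11-6499"}
{"optype_3":73,"user_id":"40a9b0fa14aa2a6b08bbe273848bc8f670007a7e^16","ranking":1,"dest":"11-12741"}
{"optype_3":41,"user_id":"199dc87cca97846e986b5043421ccdef38aa3cdf^16","ranking":1,"dest":"11-6508"}
{"optype_3":63,"user_id":"16114111701381287670533^4","ranking":1,"dest":"11-3789"}
""".strip().splitlines()


def edge_keys(lines):
    """Return the set of (user_id, dest, ranking) keys found in JSON lines."""
    keys = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        row = json.loads(line)
        keys.add((row["user_id"], row["dest"], row["ranking"]))
    return keys


keys = edge_keys(ROWS)
print(len(ROWS), len(keys))  # 10 rows, 10 distinct edge keys
```

All 10 rows have distinct keys, so under that assumption none of them should overwrite another, and a count of 7 does mean 3 edges are unaccounted for.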

Configuration:

{
  # Spark relation config
  spark: {
    app: {
      name: Spark Writer
    }

    driver: {
      cores: 8
      maxResultSize: 10G
    }

    cores {
      max: 16
    }
  }

  # Nebula Graph relation config
  nebula: {
    address:{
      graph: ["172....153.61:3699"]
      meta: ["172.18.153.61:45500"]
    }
    user: userkg
    pswd: 123456
    space: kg_test_7

    connection {
      timeout: 3000
      retry: 3
    }

    execution {
      retry: 3
    }

    error: {
      max: 32
      output: /tmp/errors
    }

    rate: {
      limit: 1024
      timeout: 1000
    }
  }

  # Processing tags
  tags: [

    # Loading from Hive
    {
      name: user
      type: {
        source: hive
        sink: client
      }
      exec: "select user_id from tmp.kg_nebula_user"
      fields: [user_id]
      nebula.fields: [user_id]
      vertex: {
        field: user_id
        policy: "hash"
      }
      batch: 32
      partition: 100
      isImplicit: true
    }

    {
      name: label
      type: {
        source: hive
        sink: client
      }
      exec: "select dest, tag_type, tag_value from tmp.kg_nebula_label"
      fields: [tag_type, tag_value]
      nebula.fields: [tag_type, tag_value]
      vertex: {
        field: dest
        policy: "hash"
      }
      batch: 32
      partition: 100
      isImplicit: true
    }

  ]

  # Processing edges
  edges: [
    # Loading from Hive
    {
      name: label_edge
      type: {
        source: hive
        sink: client
      }
      exec: "select ranking, user_id, dest, optype_1, optype_2, optype_3 from tmp.kg_nebula_label_edge"
      fields: [optype_1, optype_2, optype_3]
      nebula.fields: [optype_1, optype_2, optype_3]
      source: {
        field: user_id
        policy: "hash"
      }
      target: {
        field: dest
        policy: "hash"
      }
      ranking: ranking
      batch: 32
      partition: 100
    }
  ]
}

Please check the logs from the import. Were any errors logged while the edges were being imported?

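The fetch results are correct. When a statement gives no rank, the rank defaults to 0, and your data has no edge with rank 0, so the first nGQL statement returns no data.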
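
To illustrate the rank behavior, here is a toy Python model, not how Nebula stores edges internally. It assumes each edge is looked up by (source, destination, rank) and that an omitted rank means 0. The fetch_prop function and the EDGES dict are made up for this sketch; the two edges are the "dddd" rows from the sample data.

```python
# Toy edge store keyed by (src, dst, rank); values are the edge properties.
EDGES = {
    ("dddd", "11-111111", 1): {"optype_3": 64},
    ("dddd", "11-111111", 2): {"optype_3": 82},
}


def fetch_prop(src, dst, rank=0):
    """Mimic `fetch prop on label_edge src->dst[@rank]`; rank defaults to 0."""
    return EDGES.get((src, dst, rank))


print(fetch_prop("dddd", "11-111111"))     # None: no edge with rank 0
print(fetch_prop("dddd", "11-111111", 1))  # {'optype_3': 64}
print(fetch_prop("dddd", "11-111111", 2))  # {'optype_3': 82}
```

This mirrors the three fetch statements in the question: only the lookups that name rank 1 or rank 2 find an edge.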
