Error when importing from Neo4j with Exchange

625562580 · January 21, 2021, 03:11

Error: Exception in thread "main" java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

625562580 · January 21, 2021, 03:12

Configuration file:

{
  # Spark-related configuration
  spark: {
    app: {
      name: Spark Writer
    }

    driver: {
      cores: 1
      maxResultSize: 1G
    }

    cores {
      max: 16
    }
  }

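  # Nebula Graph-related configuration
  nebula: {
    address:{
      # IP addresses and ports of the machines hosting the Graph service and the Meta service of Nebula Graph.
      # For multiple addresses, use the format "ip1:port","ip2:port","ip3:port".
      # Separate addresses with commas (,).
      graph:["0.0.0.1:3699"]
      meta:["0.0.0.1:45500"]
    }
    # The account must have write permission on the target Nebula Graph space.
    user: root
    pswd: nebula
    # Name of the Nebula Graph space to write data into.
    space: test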
    connection {
      timeout: 3000
      retry: 3
    }
    execution {
      retry: 3
    }
    error: {
      max: 32
      output: /tmp/errors
    }
    rate: {
      limit: 64M
      timeout: 1000
    }
  }

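  # Process vertex data
  tags: [
    # Tag settings
    {
    name: movie
    # Address of the Neo4j database server. String; the format must be "bolt://ip:port".
    server: "bolt://100.84.85.117:7687"

    # Login account and password for the Neo4j database.
    user: neo4j
    password: sure123

    # Whether to encrypt the transfer. The default is false (no encryption); set to true to encrypt.
    encryption: false

    # Name of the Neo4j database that holds the source data. Neo4j Community Edition does not support this parameter.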
    # database: graph.db

    type: {
        # Data source; set to neo4j.
        source: neo4j
        # How vertex data is imported into Nebula Graph:
        # client (import through the client) or sst (import as SST files).
        # For SST import configuration, see "Import SST files"
        # (https://docs.nebula-graph.com.cn/nebula-exchange/use-exchange/ex-ug-import-sst/).
        sink: client
    }

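    # Property names of the tag in the Nebula Graph schema, given as a list.
    # They map one-to-one to the property names in the tags.fields list.
    # Separate multiple property names with commas (,).
    nebula.fields: [released, tagline, title]

    # Property names in the source data that correspond to the Nebula Graph tag,
    # given as a list and separated with commas (,).
    # They must match the property names listed in exec.
    fields       : [released, tagline, title]

    # Use the value of a source property as the VID of the Nebula Graph vertex.
    # If the property is of type int or long, set the VID column with vertex.
    #vertex: idInt
    # If the data is not of type int, add vertex.policy to specify the VID mapping policy; "hash" is recommended.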
    # vertex {
    #     field: title
    #     policy: "hash"
    #}

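    # Number of Spark partitions. The default is 32.
    partition: 10

    # Number of vertices written to Nebula Graph in a single batch. The default is 256.
    batch: 2000

    # Directory for saving import progress, used to resume an interrupted import.
    # If not set, resuming is disabled.
    check_point_path: "file:///tmp/test"

    # Cypher statement that retrieves the properties of vertices with a given label from the Neo4j database and assigns aliases.
    # The Cypher statement must not end with a semicolon (`;`).
    exec: "match (n:Movie) return n.released as released, n.tagline as tagline, n.title as title order by n.title"
    }
    # For more tags, add more entries following the instructions above.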
  ]

  
}

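nicole · January 21, 2021, 03:39

Please state your Nebula Graph version and the Exchange version you are using.
The tag configuration is missing the vertex setting: in the configuration file you posted, vertex is commented out.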

625562580:

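# Use the value of a source property as the VID of the Nebula Graph vertex.
# If the property is of type int or long, set the VID column with vertex.
#vertex: idInt
# If the data is not of type int, add vertex.policy to specify the VID mapping policy; "hash" is recommended.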
# vertex {
#     field: title
#     policy: "hash"
#}
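
For a string property such as title, the commented-out block quoted above would be enabled roughly as follows. This is only a sketch that uncomments the template's own example; whether title is a suitable VID source depends on your data.

```conf
# Use the string property `title` as the VID source, mapped through the hash policy.
vertex {
    field: title
    policy: "hash"
}
```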

625562580 · January 21, 2021, 06:45

After making that change I get a new error.
Versions: Nebula Graph 1.1.0, Exchange 1.0.1
New error: No configuration setting found for key 'address'

{
  # Spark-related configuration
  spark: {
    app: {
      name: Spark Writer
    }

    driver: {
      cores: 1
      maxResultSize: 1G
    }

    cores {
      max: 16
    }
  }

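  # Nebula Graph-related configuration
  nebula: {
    address:{
      # IP addresses and ports of the machines hosting the Graph service and the Meta service of Nebula Graph.
      # For multiple addresses, use the format "ip1:port","ip2:port","ip3:port".
      # Separate addresses with commas (,).
      graph:["127.0.0.1:3699"]
      meta:["127.0.0.1:45500"]
    }
    # The account must have write permission on the target Nebula Graph space.
    user: root
    pswd: nebula
    # Name of the Nebula Graph space to write data into.
    space: eureka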
    connection {
      timeout: 3000
      retry: 3
    }
    execution {
      retry: 3
    }
    error: {
      max: 32
      output: /tmp/errors
    }
    rate: {
      limit: 64M
      timeout: 1000
    }
  }

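  # Process vertex data
  tags: [
    # Tag settings
    {
    name: movie
    # Address of the Neo4j database server. String; the format must be "bolt://ip:port".
    server: "bolt://100.84.85.117:7687"

    # Login account and password for the Neo4j database.
    user: neo4j
    password: sure123

    # Whether to encrypt the transfer. The default is false (no encryption); set to true to encrypt.
    encryption: false

    # Name of the Neo4j database that holds the source data. Neo4j Community Edition does not support this parameter.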
    # database: graph.db

    type: {
        # Data source; set to neo4j.
        source: neo4j
        # How vertex data is imported into Nebula Graph:
        # client (import through the client) or sst (import as SST files).
        # For SST import configuration, see "Import SST files"
        # (https://docs.nebula-graph.com.cn/nebula-exchange/use-exchange/ex-ug-import-sst/).
        sink: client
    }

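    # Property names of the tag in the Nebula Graph schema, given as a list.
    # They map one-to-one to the property names in the tags.fields list.
    # Separate multiple property names with commas (,).
    nebula.fields: [released, tagline, title]

    # Property names in the source data that correspond to the Nebula Graph tag,
    # given as a list and separated with commas (,).
    # They must match the property names listed in exec.
    fields       : [released, tagline, title]

    # Use the value of a source property as the VID of the Nebula Graph vertex.
    # If the property is of type int or long, set the VID column with vertex.
    # vertex: idInt
    # If the data is not of type int, add vertex.policy to specify the VID mapping policy; "hash" is recommended.
    vertex {
        field: title
        policy: "hash"
    }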

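    # Number of Spark partitions. The default is 32.
    partition: 10

    # Number of vertices written to Nebula Graph in a single batch. The default is 256.
    batch: 2000

    # Directory for saving import progress, used to resume an interrupted import.
    # If not set, resuming is disabled.
    check_point_path: "file:///tmp/test"

    # Cypher statement that retrieves the properties of vertices with a given label from the Neo4j database and assigns aliases.
    # The Cypher statement must not end with a semicolon (`;`).
    exec: "match (n:Movie) return n.released as released, n.tagline as tagline, n.title as title order by n.title"
    }
    # For more tags, add more entries following the instructions above.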
  ]

}

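nicole · January 21, 2021, 06:50

You are using Exchange 1.0.1, but your configuration file is the one for 1.1.0. The configuration format was adjusted in 1.1.0.

It is best to base your configuration on the sample configuration file under exchange/src/resources/, and to use the documentation as a reference for what each setting means.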

625562580 · January 21, 2021, 07:40

How should I fill in type and path?

tags: [

    # Loading tag from HDFS and data type is parquet
    {
      name: movie
  ?   type: parquet
  ?   path: hdfs tag path 0
      fields: {
        hive-field-0: nebula-field-0,
        hive-field-1: nebula-field-1,
        hive-field-2: nebula-field-2
      }
      vertex: hive-field-0
      batch: 2
      separator: ","
      header: true
    }

625562580 · January 21, 2021, 07:44

Where can I find the configuration documentation for Exchange 1.0.1?

nicole · January 21, 2021, 07:45

If you are using Exchange 1.0.1, your local source tree contains a sample configuration file at nebula-java/tools/exchange/src/main/resources/application.conf
Please refer to the settings in that file.

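625562580 · January 21, 2021, 07:50

Hello! My virtual machine is offline. I downloaded an exchange 1.0.1.jar, and the application.conf in its subdirectory is for importing from Hive into the graph. Could you provide a download of a correct application.conf?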

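nicole · January 21, 2021, 07:57

Here is the configuration sample for 1.0.1. The error you reported above means that the nebula.address setting is wrong; change it to nebula.addresses, as in the sample below.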

{
  # Spark relation config
  spark: {
    app: {
      name: Spark Writer
    }

    driver: {
      cores: 1
      maxResultSize: 1G
    }

    cores {
      max: 16
    }
  }

  # Nebula Graph relation config
  nebula: {
    addresses: ["127.0.0.1:3699"]
    user: user
    pswd: password
    space: test

    connection {
      timeout: 3000
      retry: 3
    }

    execution {
      retry: 3
    }

    error: {
      max: 32
      output: /tmp/errors
    }

    rate: {
      limit: 1024
      timeout: 1000
    }
  }

  # Processing tags
  tags: [

    # Loading tag from HDFS and data type is parquet
    {
      name: tag-name-0
      type: {
        source: parquet
        sink: client
      }
      path: hdfs tag path 0
      fields: {
        parquet-field-0: nebula-field-0,
        parquet-field-1: nebula-field-1,
        parquet-field-2: nebula-field-2
      }
      vertex: hive-field-0
      batch: 256
      partition: 32
    }

    # Loading from Hive
    {
      name: tag-name-1
      type: {
        source: hive
        sink: client
      }
      exec: "select hive-field0, hive-field1, hive-field2 from database.table"
      fields: {
        hive-field-0: nebula-field-0,
        hive-field-1: nebula-field-1,
        hive-field-2: nebula-field-2
      }
      vertex: {
        field: hive-field-0
        policy: "hash"
      }
      vertex: hive-field-0
      batch: 256
      partition: 32
    }

    # Loading tag from HDFS and data type is csv
    {
      name: tag-name-2
      type: {
        source: csv
        sink: client
      }
      path: hdfs tag path 2
      fields: {
        csv-field-0: nebula-field-0,
        csv-field-1: nebula-field-1,
        csv-field-2: nebula-field-2
      }
      vertex: csv-field-0
      separator: ","
      header: true
      batch: 256
      partition: 32
    }
  ]

  # Processing edges
  edges: [
    # Loading edge from HDFS and data type is json
    {
      name: edge-name-0
      type: {
        source: json
        sink: client
      }
      path: hdfs edge path 0
      fields: [json-field-0, json-field-1, json-field-2]
      nebula.fields: [nebula-field-0, nebula-field-1, nebula-field-2]
      source: {
        field: json-field-0
        policy: "hash"
      }
      target: {
        field: json-field-1
        policy: "uuid"
      }
      ranking: json-field-2
      batch: 256
      partition: 32
    }

    # Loading from Hive
    {
      name: edge-name-1
      type: {
        source: hive
        sink: client
      }
      exec: "select hive-field0, hive-field1, hive-field2 from database.table"
      fields: {
        hive-field-0: nebula-field-0,
        hive-field-1: nebula-field-1,
        hive-field-2: nebula-field-2
      }
      source: hive-field-0
      target: hive-field-1
      batch: 256
      partition: 32
    }
  ]
}
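
To isolate the setting that the error message points at, the two layouts of the Nebula Graph address setting seen in this thread can be put side by side. This is only a sketch assembled from the configurations posted above; the host and port values are copied from those posts and are not recommendations.

```conf
# Layout from the 1.1.0-style configuration posted earlier (shown commented out):
# nebula: {
#   address: {
#     graph: ["127.0.0.1:3699"]
#     meta: ["127.0.0.1:45500"]
#   }
# }

# Layout from the 1.0.1 sample above: a single list under nebula.addresses.
nebula: {
  addresses: ["127.0.0.1:3699"]
}
```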

625562580 · January 21, 2021, 08:31

But this one is for Hive to Nebula. Is there a conf file that works with 1.0.1?

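nicole · January 21, 2021, 08:54

Look under tags: it contains configurations for several data sources, including csv, json, and hive. Use the one that matches your own data source, and delete or comment out the others.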
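
As an illustration of trimming the sample, a tags list that keeps only one entry might look like the sketch below. CSV is chosen purely as an example; the entry is copied from the 1.0.1 sample above, and the path and field names are that sample's placeholders. The thread does not show what an entry for a Neo4j source looks like in 1.0.1, so none is given here.

```conf
tags: [
  # Keep only the entry for the data source you actually use;
  # the parquet and hive entries from the sample have been removed.
  {
    name: tag-name-2
    type: {
      source: csv
      sink: client
    }
    path: hdfs tag path 2
    fields: {
      csv-field-0: nebula-field-0,
      csv-field-1: nebula-field-1,
      csv-field-2: nebula-field-2
    }
    vertex: csv-field-0
    separator: ","
    header: true
    batch: 256
    partition: 32
  }
]
```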