Exchange errors when importing a CSV from HDFS

  • Nebula version: 2.6.0
  • Deployment: single machine
  • Installation: Docker Compose
  • In production: No
  • Hardware info
    96C200G30T
  • Spark version: 2.4.0
  • Scala version: 2.11.12
  • Problem description
    Exchange errors when importing a CSV from HDFS
  • Error message
  • Import command
spark2-submit \
--name nebula_exchange \
--deploy-mode cluster \
--queue v_wdfxb_d \
--conf spark.yarn.maxAppAttempts=1 \
--files hive_application.conf \
--class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.6.0.jar -c hive_application.conf
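Since the error message above is not included, one quick way to capture the full driver-side stack trace is to rerun the same job in client mode so the driver logs print to the submitting terminal. This is only a debugging sketch; everything except --deploy-mode is copied from the command above:

# same submit command, only --deploy-mode changed to client so the driver error shows up locally
spark2-submit \
--name nebula_exchange \
--deploy-mode client \
--queue v_wdfxb_d \
--conf spark.yarn.maxAppAttempts=1 \
--files hive_application.conf \
--class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.6.0.jar -c hive_application.conf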
  • hive_application.conf
{

  spark: {
    app: {
      name: Nebula Exchange 2.6.0
    }

    driver: {
      cores: 1
      maxResultSize: 1G
    }

    executor: {
        memory: 1G
    }

    cores:{
      max: 16
    }
  }

  nebula: {
    address:{
      graph:["172.18.0.9:9669","172.18.0.8:9669","172.18.0.10:9669"]
      meta:["172.18.0.3:9559","172.18.0.2:9559","172.18.0.4:9559"]
    }
    user: root
    pswd: nebula
    space: basketballplayer

    connection {

      timeout: 30000
    }

    error: {

      max: 1

      output: /tmp/wangjinchao/nebula/exchange/errors
    }

    rate: {

      limit: 1024

      timeout: 1000
    }
  }

  tags: [

    {

      name: player
      type: {

        source: csv

        sink: client
      }

      path: "hdfs://10.234.12.14:8020/tmp/wangjinchao/nebula/vertex_player.csv"

      fields: [_c1, _c2]

      nebula.fields: [age, name]

      vertex: {
        field:_c0

      }

      separator: ","

      header: false

      batch: 128

      partition: 32
    }

    {

      name: team
      type: {

        source: csv

        sink: client
      }

      path: "hdfs://10.234.12.14:8020/tmp/wangjinchao/nebula/vertex_team.csv"

      fields: [_c1]

      nebula.fields: [name]

      vertex: {
        field:_c0

      }

      separator: ","

      header: false

      batch: 128

      partition: 32
    }

  ]

  edges: [

    {

      name: follow
      type: {

        source: csv

        sink: client
      }

      path: "hdfs://10.234.12.14:8020/tmp/wangjinchao/nebula/edge_follow.csv"

      fields: [_c2]

      nebula.fields: [degree]

      source: {
        field: _c0
      }
      target: {
        field: _c1
      }

      separator: ","

      header: false

      batch: 128

      partition: 32
    }

    {

      name: serve
      type: {

        source: csv

        sink: client
      }

      path: "hdfs://10.234.12.14:8020/tmp/wangjinchao/nebula/edge_serve.csv"

      fields: [_c2,_c3]

      nebula.fields: [start_year, end_year]

      source: {
        field: _c0
      }
      target: {
        field: _c1
      }

      separator: ","

      header: false

      batch: 128

      partition: 32
    }

  ]

}

Single machine? Why are there multiple graphd instances then?

Nebula was set up on a single machine with docker-compose, so it's probably because there are three graphd containers.

Back to the question: is your error message incomplete? It looks like an HDFS read problem. I did a quick search for the HDFS error: https://blog.csdn.net/u012037852/article/details/71708925 — see whether the approach in that post works for you.
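Before changing anything else, a sanity check worth doing (my own suggestion, not from the original thread) is to confirm that the user running spark2-submit can actually read those HDFS paths from the gateway node:

# assumes an HDFS client is configured on the node where spark2-submit is run
hdfs dfs -ls hdfs://10.234.12.14:8020/tmp/wangjinchao/nebula/
# print the first few lines to confirm the file is readable and comma-separated
hdfs dfs -cat hdfs://10.234.12.14:8020/tmp/wangjinchao/nebula/vertex_player.csv | head -n 3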

Thanks, but that doesn't seem to work. After I changed the jar's -c argument to an HDFS path, the job wouldn't even start.

The source in my article is also HDFS; you could follow it and try with an HDFS inside your own container?

The error points directly to an HDFS login authentication failure.

This is an HDFS cluster already set up by the company…

Right, from the error it looks like authentication is failing.

Do you mean Kerberos authentication? I had already authenticated before running the submit command.

Not sure whether this will help you; the issue right now is that Spark cannot pass HDFS authentication.
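If the cluster is indeed Kerberized, one thing worth trying is to pass --principal and --keytab to spark-submit so the application obtains and renews its own Kerberos credentials, instead of depending only on the kinit done before submission. This is a sketch under that assumption; the principal, realm, and keytab path below are placeholders:

# hypothetical principal and keytab path; replace with your own
spark2-submit \
--name nebula_exchange \
--deploy-mode cluster \
--queue v_wdfxb_d \
--principal wangjinchao@EXAMPLE.COM \
--keytab /path/to/wangjinchao.keytab \
--conf spark.yarn.maxAppAttempts=1 \
--files hive_application.conf \
--class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.6.0.jar -c hive_application.conf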
