Importing Avro-serialized data into nebula, thanks

  • Nebula version: 3.2.1
  • Deployment: distributed (cluster)
  • Installation method: RPM
  • Production environment: N
  • When importing Avro-serialized data into Nebula, the error below is reported. Is it possible to import Avro-serialized data? Thanks.
INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.streaming.DataStreamWriter.foreachBatch(Lscala/Function2;)Lorg/apache/spark/sql/streaming/DataStreamWriter;
	at com.vesoft.nebula.exchange.processor.VerticesProcessor.process(VerticesProcessor.scala:172)
	at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:123)
	at com.vesoft.nebula.exchange.Exchange$$anonfun$main$2.apply(Exchange.scala:95)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at com.vesoft.nebula.exchange.Exchange$.main(Exchange.scala:95)
	at com.vesoft.nebula.exchange.Exchange.main(Exchange.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
…
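As background (a sketch, not part of the original thread): a java.lang.NoSuchMethodError on DataStreamWriter.foreachBatch(scala.Function2) usually means the jar was compiled against a Spark API that the runtime Spark does not provide; foreachBatch was only added in Spark 2.4. A minimal shell sketch of that version comparison, with the runtime version hard-coded as a sample value:

```shell
# Sketch: compare the Spark version Exchange was built for against the runtime
# version. In real use, take "runtime" from `spark-submit --version` output.
required="2.4"
runtime="2.3.2"   # sample value for illustration
# sort -V sorts version strings numerically; if the smallest is not "required",
# the runtime is older than what the jar was compiled against.
if [ "$(printf '%s\n' "$required" "$runtime" | sort -V | head -1)" != "$required" ]; then
  echo "Spark $runtime is older than $required: foreachBatch will be missing"
fi
```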

Which version of Exchange are you using? Please paste your import code.

spark-submit --master "local" --class com.vesoft.nebula.exchange.Exchange /mnt/sdl/xsn/b1_nebula_test/tool/nebula-exchange_spark_2.4-3.0.0.jar -c
The configuration file is as follows:


{
  spark: {
    app: {
      name: Nebula Exchange 3.0.0
    }
    driver: {
      cores: 1
      maxResultSize: 1G
    }
    executor: {
      memory: 1G
    }

    cores: {
      max: 16
    }
  }

  nebula: {
    address:{
      graph:["192.168.1.35:9669","192.168.1.36:9669","192.168.1.37:9669"]
      meta:["192.168.1.35:9559","192.168.1.36:9559","192.168.1.37:9559"]
    }

    space: est_kafka
    connection: {
      timeout: 3000
      retry: 3
    }
    execution: {
      retry: 3
    }
    error: {
      max: 32
      output: /mnt/errors
    }
    rate: {
      limit: 1024
      timeout: 1000
    }
  }
  tags: [
    {
      name: ioc
      type: {
        source: kafka
        sink: client
      }
      service: "34:6667"
      topic: "vertex_ioc"
      fields: [ioc_type, malicious_type, family, opinion, severity, pattern]
      nebula.fields: [ioc_type, malicious_type, family, opinion, severity, pattern]
      vertex: {
        field: ioc_id
      }
      batch: 256
      partition: 32
      startOffset: latest
    }
  ]
}

What Spark version are you on? The Exchange package you are using only supports Spark 2.4.x.

Spark is 2.4, no problem there; I already tested importing JSON successfully. Does the official tool support importing Avro-serialized data after deserializing it?

The error above has nothing to do with data serialization; let's solve that problem first. Run spark-submit --version and paste the result here.
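The version string can also be pulled out of the banner mechanically. A small sketch, with the banner line from this thread inlined as sample text (in real use you would pipe the output of spark-submit --version 2>&1 instead):

```shell
# Sketch: extract the "version X.Y.Z..." token from a spark-submit banner line.
banner='   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.0.0-78'
echo "$banner" | grep -o 'version [0-9][0-9.-]*'
```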


Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.0.0-78
      /_/

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_271
Branch HEAD
Compiled by user jenkins on 2018-12-06T12:26:34Z
Revision 9b78096afddf26e2d73f0c078a112c9bf979ed53
Url git@github.com:hortonworks/spark2.git

Could this be related to the httpclient dependency? Could you share the dependency versions that Exchange uses?

That's not 2.4, is it?

version 2.3.2.3.1.0.0-78