Does spark-connector support Spark 3?

Nebula: 3.0.0. Our Spark clusters are all going to be upgraded to Spark 3. Can spark-connector support that?

It doesn't look like it's supported yet, mainly because Spark 3 uses Scala 2.12 while spark-connector uses Scala 2.11.

Spark 2.3.0 doesn't seem to be supported either. Can it be supported by compiling from source?

It should be possible to support it by modifying the source code and recompiling.

I downloaded https://github.com/vesoft-inc/nebula-spark-connector/tree/v3.0.0 and changed the Spark version to 2.3.0, but the Maven build then fails with an error. When I changed it to 2.4.x instead, it compiled fine.

As I recall, the supported Spark versions are 2.4 and above.

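Right, but our production environment runs 2.3.0. It works on a 2.4.4 virtual machine I set up myself, but upgrading the production version is difficult, so I'm wondering whether there is any way to make it compatible.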

@darionyaphet Is there a good way to support this?

Build a jar against 2.3 and deploy it to production.

That's what I was thinking too, but after downloading the code from GitHub and changing the version to 2.3.0, the build keeps failing.

To be specific: I edit the Spark version directly in the pom file and then run mvn clean package, and that fails with an error. How should I change it instead?

What's the error?

With Spark version 2.4.4 there is no error. When I change it to 2.3.0, I get the following error: Could not transfer artifact commons-codec:commons-codec:pom:1.15-SNAPSHOT from/to snapshots (https://oss.sonatype.org/content/repositories/snapshots/): Transfer failed for https://oss.sonatype.org/content/repositories/snapshots/commons-codec/commons-codec/1.15-SNAPSHOT/commons-codec-1.15-SNAPSHOT.pom

Retry, or switch to a different Maven repository.

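I switched to a different approach that doesn't call the nebula-spark API. I now use the nebula-java client: Spark loads the data as a Dataset, and I implement ForeachPartitionFunction and override its call method. The main logic is to get a Nebula session and execute the write. It works fine in development and testing, but in production, where the data volume is large, the following problem appears: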

Caused by: com.vesoft.nebula.client.graph.exception.NotValidConnectionException: No extra connection: All servers are broken.
at com.vesoft.nebula.client.graph.net.NebulaPool.getConnection(NebulaPool.java:215)
at com.vesoft.nebula.client.graph.net.NebulaPool.getSession(NebulaPool.java:137)
at graph.write.NebulaForeachPartition.call(NebulaForeachPartition.java:60)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$2.apply(Dataset.scala:2691)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$2.apply(Dataset.scala:2691)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)

Checking the Nebula Graph logs:

I20220331 17:53:00.622897 19428 GraphService.cpp:67] Authenticating user root from [::ffff:10.227.111.121]:44961
I20220331 17:53:00.622998 19425 GraphService.cpp:67] Authenticating user root from [::ffff:10.227.111.121]:44960
I20220331 17:53:02.821435 19428 GraphService.cpp:67] Authenticating user root from [::ffff:10.227.111.125]:42508
I20220331 17:53:02.821532 19426 GraphService.cpp:67] Authenticating user root from [::ffff:10.227.111.125]:42504
E20220331 18:44:26.330394 19548 HeaderServerChannel.cpp:100] Received invalid request from client: apache::thrift::transport::TTransportException: Header transport frame is too large: 4294246397 (hex 0xfff4fffd) (transport apache::thrift::PreReceivedDataAsyncTransportWrapper, address ::ffff:10.227.111.101, port 40594)
E20220331 18:44:26.339790 19548 PeekingManager.h:262] peekSuccess failed, dropping connection: apache::thrift::transport::TTransportException: Channel is !good()
E20220331 18:46:04.687847 19486 HeaderServerChannel.cpp:100] Received invalid request from client: apache::thrift::transport::TTransportException: Header transport frame is too large: 4294246397 (hex 0xfff4fffd) (transport apache::thrift::PreReceivedDataAsyncTransportWrapper, address ::ffff:10.227.111.131, port 39136)
E20220331 18:46:04.688063 19486 PeekingManager.h:262] peekSuccess failed, dropping connection: apache::thrift::transport::TTransportException: Channel is !good()

The statement you are assembling is too large.

I currently write 1000 rows at a time. I'll try reducing the batch size for each write.
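For anyone trying the same thing, here is a minimal sketch of capping each write by both row count and statement size. It only builds the statement strings and makes no nebula-java calls. The class name `BatchedInsert`, the method `buildStatements`, and the `maxRows`/`maxBytes` parameters are hypothetical names made up for illustration, and it assumes each row has already been rendered as an nGQL value tuple such as `"v1":("Tom")`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BatchedInsert {

    /**
     * Groups pre-rendered value tuples into INSERT statements so that each
     * statement holds at most maxRows tuples and roughly at most maxBytes
     * UTF-8 bytes. A single tuple larger than maxBytes is still emitted
     * on its own rather than dropped.
     */
    public static List<String> buildStatements(
            String prefix, List<String> tuples, int maxRows, int maxBytes) {
        int prefixBytes = prefix.getBytes(StandardCharsets.UTF_8).length;
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder(prefix);
        int rows = 0;
        int bytes = prefixBytes;

        for (String tuple : tuples) {
            // Count the ", " separator for every tuple, including the first;
            // this slightly overestimates the size, which is the safe direction.
            int tupleBytes = tuple.getBytes(StandardCharsets.UTF_8).length + 2;

            // Flush the current statement before it exceeds either limit.
            if (rows > 0 && (rows >= maxRows || bytes + tupleBytes > maxBytes)) {
                statements.add(current.toString());
                current = new StringBuilder(prefix);
                rows = 0;
                bytes = prefixBytes;
            }
            if (rows > 0) {
                current.append(", ");
            }
            current.append(tuple);
            rows++;
            bytes += tupleBytes;
        }
        if (rows > 0) {
            statements.add(current.toString());
        }
        return statements;
    }
}
```

Inside the `call` method, each returned string would then be executed as its own statement through the session, for example with a prefix like `INSERT VERTEX person(name) VALUES ` (the tag and property here are made up). Suitable values for `maxRows` and `maxBytes` depend on the row width and the cluster, so they need to be found by testing.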

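OK. If it works after you try that, remember to come back and update the thread :joy: And if it doesn't work, come back and update the thread too.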
