nebula-spark-connector errors when used from PySpark

  • Nebula version: 3.6
  • Deployment: cloud
  • Installation method: RPM
  • In production: N
  • Problem description
    When testing the spark-connector from PySpark, the sample code from GitHub raises an error:
# read vertex
df = (
    spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
    .option("type", "vertex")
    .option("spaceName", "demo_basketballplayer")
    .option("label", "player")
    .option("returnCols", "name,age")
    .option("metaAddress", "meta0:9559")
    .option("partitionNumber", 1)
    .load()
)

Error log:

org.apache.linkis.engineplugin.spark.exception.ExecuteError: errCode: 40003 ,desc: Py4JJavaError: An error occurred while calling o104.load.
: java.util.NoSuchElementException: key not found: operateType
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at org.apache.spark.sql.catalyst.util.CaseInsensitiveMap.default(CaseInsensitiveMap.scala:28)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at org.apache.spark.sql.catalyst.util.CaseInsensitiveMap.apply(CaseInsensitiveMap.scala:28)
	at com.vesoft.nebula.connector.NebulaOptions.<init>(NebulaOptions.scala:38)
	at com.vesoft.nebula.connector.NebulaDataSource.getNebulaOptions(NebulaDataSource.scala:148)
	at com.vesoft.nebula.connector.NebulaDataSource.createReader(NebulaDataSource.scala:45)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$SourceHelpers.createReader(DataSourceV2Relation.scala:155)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:172)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:186)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Is this Spark 3? You probably need to pass the `operateType` option; Spark 3 support was added only recently, and we haven't verified PySpark yet.

You can look up the missing default options in https://github.com/vesoft-inc/nebula-spark-connector/blob/master/nebula-spark-common/src/main/scala/com/vesoft/nebula/connector/NebulaOptions.scala, fill them in, and list them here. We'll add them to the PySpark documentation examples later (PRs to add examples are also welcome).

Spark 2.4.7, Scala 2.11.12
Just tested: it only works after adding `"operateType", "read"`.
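Based on the confirmation above, here is a minimal sketch of the corrected read call. The option names and values come from the original example in this thread; the `build_reader` helper is hypothetical, and an active `SparkSession` with the connector jar on the classpath is assumed:

```python
# Reader options as a plain dict so the fix stands out:
# "operateType": "read" is the key that was missing in the original call
# (reported as required by connector 3.6.0 in this thread).
read_options = {
    "type": "vertex",
    "operateType": "read",
    "spaceName": "demo_basketballplayer",
    "label": "player",
    "returnCols": "name,age",
    "metaAddress": "meta0:9559",
    "partitionNumber": "1",
}

def build_reader(spark, options):
    # Hypothetical helper: apply every option to the DataFrameReader.
    reader = spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader

# With a live cluster you would then call:
# df = build_reader(spark, read_options).load()
```

Keeping the options in a dict also makes it easy to share them between read and write paths, changing only `operateType`.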


@nicole Strange, why isn't this option required in my code? Is it because I'm not using the latest spark connector?

https://github.com/wey-gu/nebulagraph-ai/blob/bc349ec16d8c2d0ac22f7f26c9a40e29ff8383bd/ng_ai/nebula_reader.py#L203

https://github.com/vesoft-inc/nebula-spark-connector/pull/122

The spark-connector I'm using is nebula-spark-connector-3.6.0.jar, downloaded from https://repo1.maven.org/maven2/com/vesoft/nebula-spark-connector/3.6.0/

Thanks. This is likely a behavior change in 3.6.0.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.