Error when adding the nebula-spark-connector dependency

Following the README of nebula-spark-connector, I ran these steps:

$ git clone https://github.com/vesoft-inc/nebula-spark-utils.git
$ cd nebula-spark-utils/nebula-spark-connector
$ mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true

After the build completes, the README says you should see the nebula-spark-connector-2.0.0.jar file under the nebula-spark-utils/nebula-spark-connector/target/ directory.
What my build actually produced was nebula-spark-connector-3.0.0.jar.
I then wrote a Scala file to read Nebula Graph data (the code was posted as a screenshot):
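For what it's worth, a common way to make a locally built jar visible when experimenting is to hand it to spark-shell at startup. A sketch, assuming the jar path from the build step above:

```shell
# Start spark-shell with the freshly built connector on the classpath.
# The jar path matches the build output described above; adjust it to your setup.
spark-shell --jars nebula-spark-utils/nebula-spark-connector/target/nebula-spark-connector-3.0.0.jar
```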


Running it with scala my_reader.scala produced these errors:

/root/spark-project/my_reader.scala:1: error: object NebulaConnectionConfig is not a member of package com.vesoft.nebula
import com.vesoft.nebula.NebulaConnectionConfig
       ^
/root/spark-project/my_reader.scala:2: error: object configs is not a member of package com.vesoft.nebula.client.meta
import com.vesoft.nebula.client.meta.configs.ReadNebulaConfig
                                     ^
/root/spark-project/my_reader.scala:3: error: object sql is not a member of package org.apache.spark
import org.apache.spark.sql.SparkSession
                        ^
/root/spark-project/my_reader.scala:7: error: not found: value SparkSession
    val spark = SparkSession.builder()
                ^
/root/spark-project/my_reader.scala:11: error: not found: value NebulaConnectionConfig
    val config = NebulaConnectionConfig
                 ^
/root/spark-project/my_reader.scala:19: error: not found: type ReadNebulaConfig
    val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
                                ^
/root/spark-project/my_reader.scala:19: error: not found: value ReadNebulaConfig
    val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
                                                   ^

The second line of code in that screenshot is wrong. I don't know whether GPT wrote this code for you, but look at the example code in spark-connector; don't use code made up by an AI bot.

After looking at the example code, I changed my code to:

package com.vesoft.nebula.examples.connector
import com.facebook.thrift.protocol.TCompactProtocol
import com.vesoft.nebula.connector.connector.NebulaDataFrameReader
import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

object NebulaSparkReaderExample {

  private val LOG = LoggerFactory.getLogger(this.getClass)

  def main(args: Array[String]): Unit = {

    val sparkConf = new SparkConf
    sparkConf
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array[Class[_]](classOf[TCompactProtocol]))
    val spark = SparkSession
      .builder()
      .master("local")
      .config(sparkConf)
      .getOrCreate()

    readVertex(spark)
    
    spark.close()
    sys.exit()
  }

  def readVertex(spark: SparkSession): Unit = {
    LOG.info("start to read nebula vertices")
    val config =
      NebulaConnectionConfig
        .builder()
        .withMetaAddress("")
        .withConenctionRetry(2)
        .build()
    val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
      .builder()
      .withSpace("my_space_1")
      .withLabel("team")
      .withNoColumn(false)
      .withReturnCols(List())
      .withLimit(10)
      .withPartitionNum(10)
      .build()
    val vertex = spark.read.nebula(config, nebulaReadVertexConfig).loadVerticesToDF()
    vertex.printSchema()
    vertex.show(20)
    println("vertex count: " + vertex.count())
  }

}

Compiling it with scala my_reader.scala fails with
error: illegal start of definition
package com.vesoft.nebula.examples.connector
That package statement is the very first line of the file, with no whitespace or comments before it. I have never studied Scala and could not find a solution. How should I fix this, or is the code itself wrong?
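For context, the plain scala command runs a file as a script, and the script runner does not accept package declarations, which is exactly what "illegal start of definition" is complaining about. A sketch of compiling the file properly instead (the classpath entries are illustrative; point them at your real jars):

```shell
# `scala <file>` runs it as a script, and scripts cannot contain a `package` clause.
# Compiling with scalac and running the named class avoids that restriction.
scalac -classpath "nebula-spark-connector-3.0.0.jar:$SPARK_HOME/jars/*" my_reader.scala
scala -classpath ".:nebula-spark-connector-3.0.0.jar:$SPARK_HOME/jars/*" \
  com.vesoft.nebula.examples.connector.NebulaSparkReaderExample
```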

So what exactly did you change? There is no need to change things that don't have to change; you only needed to adjust a few of the connection parameters.

Did you delete the ssl import? And there is no need to separate package and import with a blank line.

So I went back to the first version of the code and fixed the second import line according to the code in spark-connector, but it still reports that these classes and objects cannot be found. Is some dependency not installed properly, or is this a version compatibility problem?
Code:

import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}
import org.apache.spark.sql.SparkSession

object MyReader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("My Reader")
      .getOrCreate()

    val config = NebulaConnectionConfig
      .builder()
      .withMetaAddress("")
      .withConnectionRetry(2)
      .withExecuteRetry(2)
      .withTimeout(6000)
      .build()

    val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
      .builder()
      .withSpace("my_space_1")
      .withLabel("team")
      .withNoColumn(false)
      .withReturnCols(List())
      .withLimit(10)
      .withPartitionNum(10)
      .build()

    val vertex = spark.read.nebula(config, nebulaReadVertexConfig).loadVerticesToDF()


    spark.stop()
  }
}
Error:
/root/spark-project/my_reader.scala:1: error: object vesoft is not a member of package com
import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}
           ^
/root/spark-project/my_reader.scala:2: error: object apache is not a member of package org
import org.apache.spark.sql.SparkSession
           ^
/root/spark-project/my_reader.scala:6: error: not found: value SparkSession
    val spark = SparkSession
                ^
/root/spark-project/my_reader.scala:11: error: not found: value NebulaConnectionConfig
    val config = NebulaConnectionConfig
                 ^
/root/spark-project/my_reader.scala:19: error: not found: type ReadNebulaConfig
    val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
                                ^
/root/spark-project/my_reader.scala:19: error: not found: value ReadNebulaConfig
    val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
                                                   ^
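Errors of the form "object vesoft is not a member of package com" and "object apache is not a member of package org" mean the compiler cannot see the connector jar or the Spark jars at all, so this is a classpath problem rather than a code problem. One way to declare both dependencies is an sbt build; the versions below are assumptions and need to match your actual Spark and Scala setup:

```scala
// build.sbt sketch; versions are assumptions, align them with your cluster.
name := "my-reader"
scalaVersion := "2.11.12"  // the Scala version your Spark build uses
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"              % "2.4.8" % "provided",
  "com.vesoft"        % "nebula-spark-connector" % "3.0.0"
)
```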

Could you please point me in the right direction again? Thanks!

After those changes there are no more compile errors, but reading edges from Nebula Graph produces no output at all, and I don't know why.
The code is as follows:

import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}
import org.apache.spark.sql.SparkSession
import com.vesoft.nebula.connector.connector.NebulaDataFrameReader
object MyReader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("My Reader")
      .getOrCreate()
    val config = NebulaConnectionConfig
      .builder()
      .withMetaAddress("127.0.0.1:9669")
      .withConenctionRetry(2)
      .withExecuteRetry(2)
      .withTimeout(6000)
      .build()
    val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
      .builder()
      .withSpace("my_space_1")
      .withLabel("serve")
      .withNoColumn(true)
      .withLimit(10)
      .withPartitionNum(10)
      .build()
    // use the SparkSession created above (not a separate, uninitialized variable)
    val vertex = spark.read.nebula(config, nebulaReadVertexConfig).loadVerticesToDF()
    vertex.printSchema()
    vertex.show()
    assert(vertex.count() == 18)
    assert(vertex.schema.fields.length == 1)
    spark.stop()
  }
}

After running it with Spark, the only output is
defined object MyReader

That field needs the meta port, 9559; the example also uses 9559. Why did you change it to the graphd port?

1. After changing it to 9559 there is still no data output, only
defined object MyReader. Running the data-reading example that ships with spark-connector likewise only prints that the object was defined. The way I run it is to start spark-shell first and then use :load my_reader.scala. Is there something wrong with that method?
2. val nebulaReadVertexConfig: ReadNebulaConfig = ReadNebulaConfig
.builder()
.withSpace("my_space_1")
.withLabel("serve")
.withNoColumn(true)
.withLimit(10)
.withPartitionNum(10)
.build()
Are the settings in here supposed to describe the data I loaded into the graph myself?
3. I am not familiar with Scala. Is there an example that uses Python?
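On point 1, one likely explanation for the missing output, assuming the file is loaded into spark-shell as described: :load only compiles and defines the object, which is exactly what the "defined object MyReader" message reports, so main never runs and no read is ever attempted. Invoking the entry point explicitly after the load should actually start the job:

```scala
// Inside spark-shell, after `:load my_reader.scala` prints "defined object MyReader",
// nothing has executed yet; call the entry point explicitly:
MyReader.main(Array.empty[String])
```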

GitHub - vesoft-inc/nebula-python: Client API of Nebula Graph in Python. Take a look at that README.

The meta service is used for communication; I wonder whether something is misconfigured somewhere else. :thinking:

The settings above depend on your own data: my_space_1, for example, is the name of the graph space you created, and it will error out if no graph space with that name exists. Likewise, the label serve below it is the name of a type you created.
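To double-check those names before putting them into the config, the spaces, tags, and edge types can be listed from nebula-console; the address, port, and credentials below are placeholders:

```shell
# List existing graph spaces, then the tags and edge types inside one of them.
# Connection details are illustrative; substitute your own.
nebula-console -addr 127.0.0.1 -port 9669 -u root -p nebula \
  -e "SHOW SPACES; USE my_space_1; SHOW TAGS; SHOW EDGES;"
```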


OK, thank you very much!