Spark fails to connect to NebulaGraph — can any expert help?
Spark version 2.4.2, NebulaGraph version 3.2.0, connector version 3.4.2.
The steps were:
1. Activate the conda environment: conda activate studio;
2. Change into the target directory: cd /root/MYH/connector/nebula-spark-connector/nebula-spark-connector/target/
3. Submit the job:
spark-submit \
  --master local[4] \
  --jars /root/MYH/connector/nebula-spark-connector/nebula-spark-connector_2.2/target/nebula-spark-connector_2.2-3.4.2-jar-with-dependencies.jar \
  --conf "spark.executor.memory=4g" \
  --conf "spark.driver.memory=2g" \
  Read.py
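
To confirm the connector jar actually lands on the driver classpath, one can check from PySpark whether the connector class is visible to the JVM (a sketch; spark._jvm is PySpark-internal API, used here only as a quick debug probe, and it assumes an active SparkSession named spark):

# Sketch: verify the connector class is visible to the JVM started by spark-submit.
# Raises ClassNotFoundException (surfaced as a Py4JJavaError) if the --jars path is wrong.
clazz = spark._jvm.java.lang.Class.forName("com.vesoft.nebula.connector.NebulaDataSource")
print(clazz.getName())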
The error is:
py4j.protocol.Py4JJavaError: An error occurred while calling o50.load.
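
Only the top-level Py4J message is shown above. To surface the Java-side root cause, the load() call can be wrapped like this (a minimal sketch — py4j attaches the wrapped Java Throwable to the exception as java_exception; it assumes the spark session and reader options from Read.py below):

from py4j.protocol import Py4JJavaError

# Sketch: catch the Py4J wrapper and print the underlying Java exception,
# whose toString() includes the Java exception class and message.
try:
    vertex_df = spark.read.format("com.vesoft.nebula.connector.NebulaDataSource") \
        .option("metaAddress", "localhost:9559") \
        .option("spaceName", "test") \
        .option("type", "vertex") \
        .option("label", "player") \
        .option("returnCols", "*") \
        .option("partitionNumber", "1") \
        .load()
except Py4JJavaError as e:
    print(e.java_exception.toString())
    raise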
The contents of Read.py are:
from pyspark.sql import SparkSession

def main():
    # Create the SparkSession
    spark = SparkSession.builder \
        .appName("Read") \
        .config("spark.driver.extraJavaOptions", "-Xss4m") \
        .config("spark.executor.extraJavaOptions", "-Xss4m") \
        .getOrCreate()

    # Read vertex data from NebulaGraph
    vertex_df = spark.read.format("com.vesoft.nebula.connector.NebulaDataSource") \
        .option("metaAddress", "localhost:9559") \
        .option("spaceName", "test") \
        .option("limit", 100) \
        .option("type", "vertex") \
        .option("label", "player") \
        .option("returnCols", "*") \
        .option("partitionNumber", "1") \
        .load()

    # Print the DataFrame schema to confirm its structure
    print("DataFrame Schema:")
    vertex_df.printSchema()

    # Print the DataFrame contents to confirm the data was read correctly
    print("DataFrame Content:")
    vertex_df.show(truncate=False)  # truncate=False shows full column contents

    # Use count() to check whether the DataFrame contains any rows
    print("Number of rows in DataFrame:")
    print(vertex_df.count())

    # Stop the SparkSession
    spark.stop()

if __name__ == "__main__":
    main()
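
To rule out basic connectivity, a plain socket check against the same address as the metaAddress option can confirm the meta service is reachable at all (a sketch using only the Python standard library):

import socket

# Sketch: confirm something is listening on the NebulaGraph meta port.
# connect_ex() returns 0 on success, an errno otherwise.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(3)
    result = s.connect_ex(("localhost", 9559))
    print("meta port reachable" if result == 0 else f"connect failed, errno={result}")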