Doesn't the Python version also need to match the server's?
That doesn't seem to be mandatory, but PySpark does have a supported version range. Mine runs Python 2.7, and after `export PYSPARK_PYTHON=python3` it can run Python 3 as well; see my Jupyter notebook for reference.
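If you prefer to pin the interpreter from inside the script rather than the shell, a minimal sketch (assuming a `python3` exists on the PATH of every node; PySpark reads these variables when it launches its Python workers, so they must be set before the SparkContext is created):

```python
import os

# Set before SparkSession/SparkContext is created; "python3" is a placeholder
# for whatever interpreter your cluster actually has on its PATH.
os.environ["PYSPARK_PYTHON"] = "python3"         # interpreter for the executors
os.environ["PYSPARK_DRIVER_PYTHON"] = "python3"  # interpreter for the driver
```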
I can see the job I submitted in the Spark UI, but I just can't reach the meta service.
```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("successful2").setMaster('spark://192.168.1.230:7077')\
    .set("--driver-class-path", "/root/download/nebula-spark-connector.jar")\
    .set("--driver-class-path", r"D:\PycharmProjects\pythonProject2\path_to\nebula-spark-connector-3.0.0.jar")\
    .set("--driver-class-path", "/root/download/nebula-algo.jar")\
    .set("--jars", "/root/download/nebula-spark-connector.jar")\
    .set("--jars", r"D:\PycharmProjects\pythonProject2\path_to\nebula-spark-connector-3.0.0.jar")\
    .set("--jars", "/root/download/nebula-algo.jar")
# spark = SparkContext(conf=conf)
# print(spark)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print(spark)
```
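As an aside, you can inspect what this conf actually holds: repeated `.set()` calls on the same key keep only the last value, and `--jars`/`--driver-class-path` are not SparkConf keys at all, which already hints at the problem. A quick hypothetical check against the session created above:

```python
# "--jars" is not a real Spark property, so at best only the last .set() value
# is stored, and Spark never acts on it (see the warnings in the log below).
sc = spark.sparkContext
print(sc.getConf().get("--jars", "dropped"))      # last value, or dropped entirely
print(sc.getConf().get("spark.jars", "not set"))  # the key Spark actually reads
```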
Run result: the session is created, and the job shows up in the Spark UI as well.
But as soon as I try to read from meta, it errors out. The full script:
```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("successful2").setMaster('spark://192.168.1.230:7077')\
    .set("--driver-class-path", "/root/download/nebula-spark-connector.jar")\
    .set("--driver-class-path", r"D:\PycharmProjects\pythonProject2\path_to\nebula-spark-connector-3.0.0.jar")\
    .set("--driver-class-path", "/root/download/nebula-algo.jar")\
    .set("--jars", "/root/download/nebula-spark-connector.jar")\
    .set("--jars", r"D:\PycharmProjects\pythonProject2\path_to\nebula-spark-connector-3.0.0.jar")\
    .set("--jars", "/root/download/nebula-algo.jar")
# spark = SparkContext(conf=conf)
# print(spark)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print(spark)

df = spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")\
    .option("type", "vertex")\
    .option("spaceName", "demo")\
    .option("label", "player")\
    .option("returnCols", "name,age")\
    .option("metaAddress", "192.168.1.230:33473")\
    .option("partitionNumber", 1)\
    .load()
df.show(n=2)
```
Error message:
```
E:\anaconda\envs\successful\python.exe D:/PycharmProjects/pythonProject2/1.py
Warning: Ignoring non-Spark config property: --jars
Warning: Ignoring non-Spark config property: --driver-class-path
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
<pyspark.sql.session.SparkSession object at 0x00000298EA03E4E0>
Traceback (most recent call last):
  File "D:/PycharmProjects/pythonProject2/1.py", line 64, in <module>
    "partitionNumber", 1).load()
  File "D:\spark\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\sql\readwriter.py", line 172, in load
    return self._df(self._jreader.load())
  File "E:\anaconda\envs\successful\lib\site-packages\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "D:\spark\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "E:\anaconda\envs\successful\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o51.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.vesoft.nebula.connector.NebulaDataSource. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.vesoft.nebula.connector.NebulaDataSource.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
    ... 13 more

Process finished with exit code 1
```
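The two warnings at the top of the log point at the root cause: `--jars` and `--driver-class-path` are spark-submit command-line flags, not SparkConf keys, so Spark ignores them, the connector jar never reaches the driver or the executors, and `lookupDataSource` fails with the `ClassNotFoundException` above. A sketch of one way to pass the flags from inside the script, via the `PYSPARK_SUBMIT_ARGS` environment variable (it must end with `pyspark-shell`, and it must be set before the session is created; the jar path is copied from the snippet above, adjust it to your machine):

```python
import os

# spark-submit flags have to reach the JVM launcher, not SparkConf.set().
# PYSPARK_SUBMIT_ARGS is read when PySpark starts the gateway JVM.
jar = r"D:\PycharmProjects\pythonProject2\path_to\nebula-spark-connector-3.0.0.jar"
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--driver-class-path {0} --jars {0} pyspark-shell".format(jar)
)

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("successful2").setMaster("spark://192.168.1.230:7077")
spark = SparkSession.builder.config(conf=conf).getOrCreate()
```

Equivalently, pass the same flags to spark-submit on the command line, or set the corresponding properties `spark.jars` and `spark.driver.extraClassPath` in `spark-defaults.conf`. The `.read.format("com.vesoft.nebula.connector.NebulaDataSource")` call stays as it is; once the jar is actually on the classpath, the data source should resolve.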
My local Spark also matches the server's version, and the pyspark I use is the one shipped under spark/python/pyspark.
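To rule version drift out completely, you can also print both sides from the running session; a quick check:

```python
import pyspark

print(pyspark.__version__)        # PySpark version on the client
print(spark.version)              # Spark version the session reports
print(spark.sparkContext.master)  # confirms which cluster was reached
```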