NoClassDefFoundError when running nebula-algorithm

  • NebulaGraph version: 3.3.0
  • Spark version: 2.4.2 (spark-2.4.2-bin-hadoop2.7/bin/spark-submit)
  • nebula-algorithm version: nebula-algorithm-3.0.0.jar

Everything is deployed on a Linux VM. I want to run the connected-component algorithm from nebula-algorithm; my application.conf is as follows:

{
  # Spark-related configuration
  spark: {
    app: {
      name: "ConnectedComponent-GangBuild"  # application name (customizable)
      partitionNum: 10  # partition count; matches the graph space's partition count (created with 10)
    }
    master: local  # local mode (note: plain "local" uses one core; "local[*]" uses all cores)
  }

  data: {
    source: nebula  # data source: Nebula
    sink: nebula    # result sink: Nebula (key: writes gang_id back to the user vertices)
    hasWeight: false  # no weights needed (gang detection only needs the relationships)
  }

  # Nebula read configuration (reads data from the graph space)
  nebula: {
    read: {
      metaAddress: "127.0.0.1:9559"  # Nebula Meta address (port 9559)
      space: "hyl"                      # target graph space
      labels: ["user_inline"]           # edge type (user-to-associated-number edges)
      weightCols: [""]                  # no weight; left empty
    }

    # Nebula write configuration (writes results back to the user vertices)
    write: {
      graphAddress: "127.0.0.1:9669"  # Nebula Graph address (port 9669)
      metaAddress: "127.0.0.1:9559"   # same Meta address as in read
      user: "root"                       # default Nebula username
      pswd: "nebula"                     # default Nebula password (replace if changed)
      space: "hyl"                       # same graph space as for reading
      tag: "user"                        # write to the user tag (user vertices)
      type: "update"                     # update mode (adds gang_id to existing user vertices)
      graphProperty: "gang_id"           # property to write (the gang-ID field added earlier)
    }
  }

  local: {
    # local read/write configuration (not used here; kept at defaults)
    read:{
      filePath: "file:///tmp/algo_edge.csv"
      srcId:"_c0"
      dstId:"_c1"
      header: false
      delimiter:","
    }
    write:{
      resultPath:/tmp/count
    }
  }

  algorithm: {
    executeAlgo: connectedcomponent  # algorithm to run: connected component (gang detection)
    connectedcomponent: {
      maxIter: 20                     # maximum iterations (default is fine)
      encodeId: true                  # enable ID encoding (needed for string-typed user IDs)
    }

    # parameters for the other algorithms (defaults; not used here)
    pagerank: { maxIter: 10, resetProb: 0.15, encodeId:false }
    louvain: { maxIter: 21, internalIter: 10, tol: 0.5, encodeId:false }
    labelpropagation: { maxIter: 20, encodeId:false }
    shortestpaths: { landmarks: "1" }
    degreestatic: { encodeId:false }
    kcore:{ maxIter:10, degree:1, encodeId:false }
    trianglecount:{ encodeId:false }
    graphtrianglecount:{}
    betweenness:{ maxIter:5, encodeId:false }
    clusteringcoefficient:{ type: local, encodeId:false }
    closeness:{}
    bfs:{ maxIter:5, root:"10", encodeId:false }
    dfs:{ maxIter:5, root:"10", encodeId:false }
    hanp:{ hopAttenuation:0.1, maxIter:10, preference:1.0, encodeId:false }
    node2vec:{ maxIter: 10, lr: 0.025, dataNumPartition: 10, modelNumPartition: 10, dim: 10, window: 3, walkLength: 5, numWalks: 3, p: 1.0, q: 1.0, directed: false, degree: 30, embSeparate: ",", modelPath: "hdfs://127.0.0.1:9000/model", encodeId:false }
    jaccard:{ tol: 1.0, encodeId:false }
  }
}
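
Since write.type is "update", the gang_id property has to already exist on the user tag before the job runs. For reference, this is roughly how I added it beforehand through nebula-console (the console flags and address are from my setup):

nebula-console -addr 127.0.0.1 -port 9669 -u root -p nebula \
  -e 'USE hyl; ALTER TAG user ADD (gang_id string);'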

The error output is as follows:

[admin@qa1-rcspt-nebulaGraph tmp]$ sudo /home/ops/spark24/spark-2.4.2-bin-hadoop2.7/bin/spark-submit \
>   --class com.vesoft.nebula.algorithm.Main \
>   --master local[*] \
>   --executor-memory 8G \
>   /tmp/nebula-algorithm-3.0.0.jar \
>   -p /tmp/application.conf
25/11/19 11:05:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (com.vesoft.nebula.algorithm.Main$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
25/11/19 11:05:04 INFO SparkContext: Running Spark version 2.4.2
25/11/19 11:05:04 INFO SparkContext: Submitted application: ConnectedComponent-GangBuild
25/11/19 11:05:04 INFO SecurityManager: Changing view acls to: root
25/11/19 11:05:04 INFO SecurityManager: Changing modify acls to: root
25/11/19 11:05:04 INFO SecurityManager: Changing view acls groups to: 
25/11/19 11:05:04 INFO SecurityManager: Changing modify acls groups to: 
25/11/19 11:05:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
25/11/19 11:05:04 INFO Utils: Successfully started service 'sparkDriver' on port 46117.
25/11/19 11:05:04 INFO SparkEnv: Registering MapOutputTracker
25/11/19 11:05:04 INFO SparkEnv: Registering BlockManagerMaster
25/11/19 11:05:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
25/11/19 11:05:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
25/11/19 11:05:04 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-608281b4-fd32-4731-9575-81e2e87c3364
25/11/19 11:05:04 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
25/11/19 11:05:04 INFO SparkEnv: Registering OutputCommitCoordinator
25/11/19 11:05:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
25/11/19 11:05:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://qa1-rcspt-nebulaGraph:4040
25/11/19 11:05:04 INFO SparkContext: Added JAR file:/tmp/nebula-algorithm-3.0.0.jar at spark://qa1-rcspt-nebulaGraph:46117/jars/nebula-algorithm-3.0.0.jar with timestamp 1763521504564
25/11/19 11:05:04 INFO Executor: Starting executor ID driver on host localhost
25/11/19 11:05:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34363.
25/11/19 11:05:04 INFO NettyBlockTransferService: Server created on qa1-rcspt-nebulaGraph:34363
25/11/19 11:05:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
25/11/19 11:05:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, qa1-rcspt-nebulaGraph, 34363, None)
25/11/19 11:05:04 INFO BlockManagerMasterEndpoint: Registering block manager qa1-rcspt-nebulaGraph:34363 with 366.3 MB RAM, BlockManagerId(driver, qa1-rcspt-nebulaGraph, 34363, None)
25/11/19 11:05:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, qa1-rcspt-nebulaGraph, 34363, None)
25/11/19 11:05:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, qa1-rcspt-nebulaGraph, 34363, None)
25/11/19 11:05:04 WARN ReadNebulaConfig$: returnCols is empty and your result will contain all properties for user_inline
25/11/19 11:05:04 INFO ReadNebulaConfig$: NebulaReadConfig={space=hyl,label=user_inline,returnCols=List(),noColumn=false,partitionNum=10}
25/11/19 11:05:04 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/tmp/spark-warehouse').
25/11/19 11:05:04 INFO SharedState: Warehouse path is 'file:/tmp/spark-warehouse'.
25/11/19 11:05:05 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
        at com.vesoft.nebula.connector.NebulaOptions.<init>(NebulaOptions.scala:21)
        at com.vesoft.nebula.connector.NebulaDataSource.getNebulaOptions(NebulaDataSource.scala:142)
        at com.vesoft.nebula.connector.NebulaDataSource.createReader(NebulaDataSource.scala:41)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$SourceHelpers.createReader(DataSourceV2Relation.scala:155)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:172)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:206)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
        at com.vesoft.nebula.connector.connector.package$NebulaDataFrameReader.loadEdgesToDF(package.scala:172)
        at com.vesoft.nebula.algorithm.reader.NebulaReader$$anonfun$read$1.apply$mcVI$sp(DataReader.scala:52)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
        at com.vesoft.nebula.algorithm.reader.NebulaReader.read(DataReader.scala:38)
        at com.vesoft.nebula.algorithm.Main$.createDataSource(Main.scala:118)
        at com.vesoft.nebula.algorithm.Main$.main(Main.scala:84)
        at com.vesoft.nebula.algorithm.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 26 more
25/11/19 11:05:05 INFO SparkContext: Invoking stop() from shutdown hook
25/11/19 11:05:05 INFO SparkUI: Stopped Spark web UI at http://qa1-rcspt-nebulaGraph:4040
25/11/19 11:05:05 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/11/19 11:05:05 INFO MemoryStore: MemoryStore cleared
25/11/19 11:05:05 INFO BlockManager: BlockManager stopped
25/11/19 11:05:05 INFO BlockManagerMaster: BlockManagerMaster stopped
25/11/19 11:05:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/11/19 11:05:05 INFO SparkContext: Successfully stopped SparkContext
25/11/19 11:05:05 INFO ShutdownHookManager: Shutdown hook called
25/11/19 11:05:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-a8d9dc76-32cc-49b6-b1a0-4aa156d907c5
25/11/19 11:05:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-d92a3569-9a61-4745-9215-9855a2e70e8b

Could anyone help? I've switched between several versions, and even downgraded NebulaGraph, but it still fails.

The connected-component parameters are documented in detail on GitHub; I'd suggest following the repo: GitHub - vesoft-inc/nebula-algorithm: Nebula-Algorithm is a Spark Application based on GraphX, which enables state of art Graph Algorithms to run on top of NebulaGraph and write back results to NebulaGraph.
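
The submission command in the repo's README looks roughly like this (from memory; check the repo for the exact, current form):

${SPARK_HOME}/bin/spark-submit --master "local" \
  --class com.vesoft.nebula.algorithm.Main \
  nebula-algorithm-3.0.0.jar -p application.conf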

Thanks! I updated my application.conf following the GitHub example; the new config is below:

{
  # 1. Spark configuration (syntax error fixed)
  spark: {
    app: {
      name: "ConnectedComponent-GangBuild"
      partitionNum: 10
    }
    master: "local[*]"  # key fix: local[*] wrapped in double quotes
  }

  # 2. data input/output configuration
  data: {
    source: nebula
    sink: nebula
    hasWeight: false
  }

  # 3. NebulaGraph connection and read/write configuration
  nebula: {
    read: {
      metaAddress: "10.10.102.13:9559"
      space: "hyl"
      labels: ["user_inline"]
      weightCols: []
    }
    write: {
      graphAddress: "10.10.102.13:9669"
      metaAddress: "10.10.102.13:9559"
      user: "root"
      pswd: "nebula"
      space: "hyl"
      tag: "user"
      type: "update"
    }
  }

  # 4. algorithm configuration
  algorithm: {
    executeAlgo: "connectedcomponent"
    connectedcomponent: {
      maxIter: 20
      encodeId: true
    }
  }
}

But it still fails with the same error:

[admin@qa1-rcspt-nebulaGraph ~]$ sudo /home/ops/spark24/spark-2.4.2-bin-hadoop2.7/bin/spark-submit   --class com.vesoft.nebula.algorithm.Main   --master local[*]   --executor-memory 8G   /tmp/nebula-algorithm-3.0.0.jar   -p /tmp/application.conf
25/11/19 15:59:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (com.vesoft.nebula.algorithm.Main$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
25/11/19 15:59:46 INFO SparkContext: Running Spark version 2.4.2
25/11/19 15:59:46 INFO SparkContext: Submitted application: ConnectedComponent-GangBuild
25/11/19 15:59:46 INFO SecurityManager: Changing view acls to: root
25/11/19 15:59:46 INFO SecurityManager: Changing modify acls to: root
25/11/19 15:59:46 INFO SecurityManager: Changing view acls groups to: 
25/11/19 15:59:46 INFO SecurityManager: Changing modify acls groups to: 
25/11/19 15:59:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
25/11/19 15:59:46 INFO Utils: Successfully started service 'sparkDriver' on port 42805.
25/11/19 15:59:46 INFO SparkEnv: Registering MapOutputTracker
25/11/19 15:59:46 INFO SparkEnv: Registering BlockManagerMaster
25/11/19 15:59:46 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
25/11/19 15:59:46 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
25/11/19 15:59:46 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-836f41a4-36ec-4329-a930-48a72f686425
25/11/19 15:59:46 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
25/11/19 15:59:46 INFO SparkEnv: Registering OutputCommitCoordinator
25/11/19 15:59:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
25/11/19 15:59:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://qa1-rcspt-nebulaGraph:4040
25/11/19 15:59:46 INFO SparkContext: Added JAR file:/tmp/nebula-algorithm-3.0.0.jar at spark://qa1-rcspt-nebulaGraph:42805/jars/nebula-algorithm-3.0.0.jar with timestamp 1763539186707
25/11/19 15:59:46 INFO Executor: Starting executor ID driver on host localhost
25/11/19 15:59:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40973.
25/11/19 15:59:46 INFO NettyBlockTransferService: Server created on qa1-rcspt-nebulaGraph:40973
25/11/19 15:59:46 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
25/11/19 15:59:46 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, qa1-rcspt-nebulaGraph, 40973, None)
25/11/19 15:59:46 INFO BlockManagerMasterEndpoint: Registering block manager qa1-rcspt-nebulaGraph:40973 with 366.3 MB RAM, BlockManagerId(driver, qa1-rcspt-nebulaGraph, 40973, None)
25/11/19 15:59:46 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, qa1-rcspt-nebulaGraph, 40973, None)
25/11/19 15:59:46 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, qa1-rcspt-nebulaGraph, 40973, None)
25/11/19 15:59:47 INFO ReadNebulaConfig$: NebulaReadConfig={space=hyl,label=user_inline,returnCols=List(),noColumn=true,partitionNum=10}
25/11/19 15:59:47 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/admin/spark-warehouse').
25/11/19 15:59:47 INFO SharedState: Warehouse path is 'file:/home/admin/spark-warehouse'.
25/11/19 15:59:47 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
        at com.vesoft.nebula.connector.NebulaOptions.<init>(NebulaOptions.scala:21)
        at com.vesoft.nebula.connector.NebulaDataSource.getNebulaOptions(NebulaDataSource.scala:142)
        at com.vesoft.nebula.connector.NebulaDataSource.createReader(NebulaDataSource.scala:41)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$SourceHelpers.createReader(DataSourceV2Relation.scala:155)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:172)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:206)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
        at com.vesoft.nebula.connector.connector.package$NebulaDataFrameReader.loadEdgesToDF(package.scala:172)
        at com.vesoft.nebula.algorithm.reader.NebulaReader$$anonfun$read$1.apply$mcVI$sp(DataReader.scala:52)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
        at com.vesoft.nebula.algorithm.reader.NebulaReader.read(DataReader.scala:38)
        at com.vesoft.nebula.algorithm.Main$.createDataSource(Main.scala:118)
        at com.vesoft.nebula.algorithm.Main$.main(Main.scala:84)
        at com.vesoft.nebula.algorithm.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 26 more
25/11/19 15:59:47 INFO SparkContext: Invoking stop() from shutdown hook
25/11/19 15:59:47 INFO SparkUI: Stopped Spark web UI at http://qa1-rcspt-nebulaGraph:4040
25/11/19 15:59:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/11/19 15:59:47 INFO MemoryStore: MemoryStore cleared
25/11/19 15:59:47 INFO BlockManager: BlockManager stopped
25/11/19 15:59:47 INFO BlockManagerMaster: BlockManagerMaster stopped
25/11/19 15:59:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/11/19 15:59:47 INFO SparkContext: Successfully stopped SparkContext
25/11/19 15:59:47 INFO ShutdownHookManager: Shutdown hook called
25/11/19 15:59:47 INFO ShutdownHookManager: Deleting directory /tmp/spark-3411fa98-aebb-4e62-9881-4226cbe75b83
25/11/19 15:59:47 INFO ShutdownHookManager: Deleting directory /tmp/spark-f8c72f2e-4b8a-493c-bfb2-019c246f6608

When I ask AI tools, they keep telling me my versions are incompatible, but I've seen other users on this forum running Spark 2.4 as well.
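
If the version diagnosis is right, it should be verifiable: the missing class org/apache/spark/internal/Logging$class is a Scala 2.11 trait-implementation artifact, and as far as I can tell Spark 2.4.2 is the one 2.4.x release whose prebuilt binaries were compiled with Scala 2.12, while nebula-algorithm 3.0.0 targets Spark 2.4 on Scala 2.11. A sketch of how to check, and what I could try instead (the spark-2.4.7 path is hypothetical):

# the version banner reports the Scala build, e.g. "Using Scala version 2.12.x"
/home/ops/spark24/spark-2.4.2-bin-hadoop2.7/bin/spark-submit --version

# the Scala binary version also shows in the suffix of Spark's own jars
ls /home/ops/spark24/spark-2.4.2-bin-hadoop2.7/jars/spark-core_*
# -> spark-core_2.12-2.4.2.jar on this build

# other 2.4.x prebuilt binaries (e.g. 2.4.7) were built with Scala 2.11, matching the jar:
/home/ops/spark24/spark-2.4.7-bin-hadoop2.7/bin/spark-submit \
  --class com.vesoft.nebula.algorithm.Main \
  --master 'local[*]' \
  --executor-memory 8G \
  /tmp/nebula-algorithm-3.0.0.jar \
  -p /tmp/application.conf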
