pagerank官方例子报NumberFormatException

  • nebula 版本:(2.0.1)

  • 部署方式(分布式):

  • 是否为线上版本:Y

  • 问题的具体描述
    当使用pagerank时,报如下错误

Caused by: java.lang.NumberFormatException: For input string: "player105"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:589)
	at java.lang.Long.parseLong(Long.java:631)
	at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:277)
	at scala.collection.immutable.StringOps.toLong(StringOps.scala:29)
	at com.vesoft.nebula.algorithm.utils.NebulaUtil$$anonfun$1.apply(NebulaUtil.scala:30)
	at com.vesoft.nebula.algorithm.utils.NebulaUtil$$anonfun$1.apply(NebulaUtil.scala:26)
	at org.apache.spark.sql.execution.MapElementsExec$$anonfun$7$$anonfun$apply$1.apply(objects.scala:236)
	at org.apache.spark.sql.execution.MapElementsExec$$anonfun$7$$anonfun$apply$1.apply(objects.scala:236)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107)
	at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

pagerank配置:

    read: {
        # Nebula metad server address, multiple addresses are split by English comma
        metaAddress: "192.168.5.180:9559,192.168.5.181:9559,192.168.5.182:9559"
        # Nebula space
        space: basketballplayer
        # Nebula edge types, multiple labels means that data from multiple edges will union together
        labels: ["serve"]
        # Nebula edge property name for each edge type, this property will be as weight col for algorithm.
        # Make sure the weightCols are corresponding to labels.
        weightCols: ["start_year"]
    }

使用Pagerank时,vid不能是string类型吗、

2.x 版本的话,vid 可以是 int64 或者是定长 string,:thinking: 这个和你的设置有关系的

您误会我的意思了。我想问,为什么会报这个错误。我应该如何解决。

从报错中可以看出,在做格式转换时,失败了。因为string 不能转Long
Edge(row.get(0).toString.toLong, row.get(1).toString.toLong, 1.0)

这个等图计算这块的研发同学来回复你吧

好 的。感谢。

在官方文档看到使用限制:
点ID的数据必须为整数,即点ID可以是INT类型,或者是String类型但数据本身为整数。

哎。结贴。
不过我借用了1.2.1的方法,使用hash来处理。
建议2.0的文档更新相应使用方法。

2 个赞

感谢反馈,只能用整数是因为GraphX的图计算框架限制,我们会在文档里补充您的方法。

1 个赞

该话题在最后一个回复创建后30天后自动关闭。不再允许新的回复。