
"no nebula_codec in java.library.path" error when using spark-reader

  • Nebula version: v1.2.0
  • Deployment: distributed
  • Hardware
    • Disk: SSD
    • CPU and memory: 192 GB, 20 × 2 cores

The following error is reported when using Spark Reader:
Spark version: v2.3.2
Runtime environment: Linux

Cause of mission failure: 21/01/23 14:33:07 ERROR NebulaCodec [Executor task launch worker for task 1]: no nebula_codec in java.library.path
java.lang.UnsatisfiedLinkError: no nebula_codec in java.library.path

Exit code: 127
Stack trace: ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:334)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Hi, is Exchange also being run in a Linux environment?

Yes.

Please post the complete stack trace. The 1.x spark-reader needs to call JNI for encoding and decoding, and it does not support Windows or macOS; on those systems you get an UnsatisfiedLinkError. On Linux the native call normally works.

Container id: container_e13_1611043359538_2079596_01_000004
Cause of mission failure: 21/01/25 10:11:31 ERROR NebulaCodec [Executor task launch worker for task 0]: no nebula_codec in java.library.path
java.lang.UnsatisfiedLinkError: no nebula_codec in java.library.path

Exit code: 127
Stack trace: ExitCodeException exitCode=127: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:334)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 127

21/01/25 10:11:34 WARN YarnSchedulerBackend$YarnSchedulerEndpoint [dispatcher-event-loop-27]: Requesting driver to remove executor 3 for reason Container marked as failed: container_e13_1611043359538_2079596_01_000004 on host: xxx. Exit status: 127. Diagnostics: Exception from container-launch.
Container id: container_e13_1611043359538_2079596_01_000004
(same failure cause, exit code 127, and stack trace as above)

21/01/25 10:11:34 ERROR YarnClusterScheduler [dispatcher-event-loop-14]: Lost executor 3 on xxx: Container marked as failed: container_e13_1611043359538_2079596_01_000004 on host: xxx. Exit status: 127. Diagnostics: Exception from container-launch.
Container id: container_e13_1611043359538_2079596_01_000004
(same failure cause, exit code 127, and stack trace as above)

Nothing else unusual; it just keeps repeating this error.

I see you submitted in yarn-cluster mode. The libnebula_codec.so file is packaged inside the jar, and the client reads it from the classpath, so normally it does not need to be placed in java.library.path separately.

You can switch to yarn-client or local mode and see whether the same problem occurs. If it does not, you will need to put the libnebula_codec.so file into a java.library.path directory on the machines of the YARN cluster. You can check what java.library.path is by printing System.out.println(System.getProperty("java.library.path")); it is usually /usr/local/lib.
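The check suggested above can be run as a small standalone class (the class name here is made up for illustration; everything it calls is standard JDK):

```java
public class CodecPathCheck {
    public static void main(String[] args) {
        // Directories the JVM searches when loadLibrary("nebula_codec")
        // has to resolve libnebula_codec.so by name.
        System.out.println(System.getProperty("java.library.path"));
        try {
            System.loadLibrary("nebula_codec");
            System.out.println("libnebula_codec.so resolved");
        } catch (UnsatisfiedLinkError e) {
            // Same failure mode as in the executor logs above.
            System.out.println("load failed: " + e.getMessage());
        }
    }
}
```

To diagnose the yarn-cluster case, run this inside an executor task rather than on the driver, since each YARN container has its own library path.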

Hi, the log of my submitted program shows only one error line, ERROR NebulaCodec: no nebula_codec in java.library.path. Does that matter? The log line right above the error seems to be loading the .so file: INFO utils.NativeUtils: Load /…/libnebula_codec.so as libnebula_codec.so. Does that mean the load failed?
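For context, that INFO line suggests the library is being extracted from the jar and loaded by absolute path. The usual pattern for loading a bundled .so looks roughly like the sketch below (a generic illustration, not the actual com.vesoft NativeUtils source; the class name and resource path are hypothetical). Note that System.load takes an absolute path and does not consult java.library.path at all, so if this step succeeded, a later "no nebula_codec in java.library.path" usually means some other code path fell back to System.loadLibrary by name.

```java
import java.io.*;
import java.nio.file.*;

public class JarNativeLoader {
    // Copy a .so bundled on the classpath out to a temp file,
    // then load it by absolute path.
    static boolean loadFromClasspath(String resource) {
        try (InputStream in = JarNativeLoader.class.getResourceAsStream(resource)) {
            if (in == null) {
                return false; // resource is not on the classpath at all
            }
            Path tmp = Files.createTempFile("libnebula_codec", ".so");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            // Absolute path: java.library.path is not involved here.
            System.load(tmp.toAbsolutePath().toString());
            return true;
        } catch (IOException | UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadFromClasspath("/libnebula_codec.so"));
    }
}
```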

Recently, when using spark-reader to read data into GraphX, one edge type keeps coming back with incomplete data; it is the same edge in both production and test, so I assumed the encode/decode was going wrong. But today I read that edge with the ScanEdgeInSpaceExamples.java demo and got all the data. What could explain this? :exploding_head:

Please share your nebula version and deployment method, along with the schema information of the space and the edge, thanks.

The nebula version is 1.2, deployed with Docker Swarm. I raised this before in the thread "Spark reader两次读入的vertice和edge的数量不一致 - #29" by nicole.

I added a space for you: if a link is immediately followed by text, it does not get parsed into a clickable link.

So tags read back stably and correctly, while among the edges one particular edge reads unstably and all the others are stable. In principle they all go through the same logic, so the results should be consistent. Please provide the schema information of the space and the edges, and point out which edge reads unstably.


The space is sy_graph. The edge A0110 is the one that previously read unstably; since I restarted the database the day before yesterday, every edge read returns only 1000 rows :sleepy:

Do the edges A0100 and A0110 have the same schema? A0100 reads normally while A0110 returns less data?
Use db_dump to count how many A0110 edges there are; for db_dump usage, see the manual: Dump Tool - Nebula Graph Database Manual

A0110's schema has one more property than A0100's; A0100 reads normally, and A0110 returns less data.

Please check A0110's data.

PS: run submit job flush before counting, and count the data on every storage machine of the distributed cluster and sum the results.

The amount of data I previously read with the ScanEdgeInSpaceExamples.java demo matched the amount of data written.

We could not reproduce your case where one edge reads correctly and another does not :disappointed_relieved: Could you pass a partition parameter of 1 to the spark connector on your side, to debug and inspect the scanned data?
