Please post the full stack trace. The 1.x spark-reader relies on JNI for encoding/decoding, which is not supported on Windows or macOS; on those systems you will get an UnsatisfiedLinkError. It can generally be loaded fine in a Linux environment.
Container id: container_e13_1611043359538_2079596_01_000004
Cause of task failure: 21/01/25 10:11:31 ERROR NebulaCodec [Executor task launch worker for task 0]: no nebula_codec in java.library.path
java.lang.UnsatisfiedLinkError: no nebula_codec in java.library.path
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:334)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 127
21/01/25 10:11:34 WARN YarnSchedulerBackend$YarnSchedulerEndpoint [dispatcher-event-loop-27]: Requesting driver to remove executor 3 for reason Container marked as failed: container_e13_1611043359538_2079596_01_000004 on host: xxx. Exit status: 127. Diagnostics: Exception from container-launch.
(same container diagnostics and stack trace as above)
21/01/25 10:11:34 ERROR YarnClusterScheduler [dispatcher-event-loop-14]: Lost executor 3 on xxx: Container marked as failed: container_e13_1611043359538_2079596_01_000004 on host: xxx. Exit status: 127. Diagnostics: Exception from container-launch.
(same container diagnostics and stack trace as above)
Nothing else unusual; this error just keeps repeating in a loop.
I see you submitted in yarn-cluster mode. The libnebula_codec.so file is bundled inside the jar, and the client reads it from the classpath, so normally you don't need to put it on java.library.path separately.
You can switch to yarn-client or local mode and check whether the same problem occurs. If it doesn't, you will need to put the libnebula_codec.so file into a directory on java.library.path on every machine in the YARN cluster. You can check what java.library.path is by printing it with System.out.println(System.getProperty("java.library.path"));
it is usually /usr/local/lib.
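The check above can be sketched as a small standalone Java program. The `System.loadLibrary` call at the end is illustrative: it is the kind of call that produces the `no nebula_codec in java.library.path` error when the .so is not in any of the listed directories.

```java
// Minimal sketch: print the JVM's native library search path, then try
// loading the nebula_codec library the way the 1.x spark-reader would.
public class LibPathCheck {
    public static void main(String[] args) {
        // Directories the JVM searches for native libraries
        // (colon-separated on Linux, e.g. /usr/java/.../lib:/usr/local/lib).
        String libPath = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + libPath);

        // This call throws UnsatisfiedLinkError when libnebula_codec.so
        // is in none of those directories.
        try {
            System.loadLibrary("nebula_codec");
            System.out.println("nebula_codec loaded");
        } catch (UnsatisfiedLinkError e) {
            System.out.println("load failed: " + e.getMessage());
        }
    }
}
```

Copying libnebula_codec.so into one of the printed directories on every YARN node should make the `loadLibrary` call succeed.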
Hi, the log of my submitted job only shows one error line, ERROR NebulaCodec: no nebula_codec in java.library.path. Does this matter? The log line right above the error seems to be loading the .so file: INFO utils.NativeUtils: Load /…/libnebula_codec.so as libnebula_codec.so. Does that mean the load failed?
Recently, when using spark-reader to read data into GraphX, one particular edge type always comes back incomplete; the same edge type has this problem in both production and test, so I assumed the encode/decode was going wrong. Today I used the ScanEdgeInSpaceExamples.java demo to read that edge and got all the data. What could be the reason?
Please share your Nebula version and deployment method, as well as the schemas of the space and the edge. Thanks.
I added a space; if a link is immediately followed by text, it doesn't get parsed as a clickable link.
So tags read back stably and correctly, while one specific edge type reads unstably and all the other edges are stable. In principle they all go through the same logic, so the results should be consistent. Please provide the space and edge schema information, and point out which edge reads unstably.
The edges A0100 and A0110 have the same schema; is it that A0100 reads normally while A0110 returns less data?
Use db_dump to count the number of A0110 edges. See the manual for db_dump usage: Dump Tool - Nebula Graph Database manual
A0110's schema has one more property than A0100's; A0100 reads normally, and A0110 returns less data.
Take a look at A0110's data.
P.S. Run submit job flush before counting; you need to count the data on every storage node in the distributed cluster and sum the results.
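As a rough sketch, the counting flow on a 1.x deployment might look like the following. The db_dump flags shown here are as described in the Dump Tool manual, and the data path and space name are assumptions that must be adapted to your deployment:

```
# In the nebula console: flush memtables so the on-disk data is complete.
SUBMIT JOB FLUSH;

# Then on EACH storage node, count the A0110 edges
# (the db_path below is an assumed default install location):
./db_dump --mode=stat \
          --db_path=/usr/local/nebula/data/storage/nebula/ \
          --space=<your_space> \
          --edges=A0110

# Finally, sum the reported edge counts across all storage nodes.
```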
I previously used the ScanEdgeInSpaceExamples.java demo, and the amount of data it read matched the amount that was written.
We couldn't reproduce your case where one edge reads normally and another doesn't. On your side, can you pass a partition parameter of 1 to the Spark connector and debug-inspect the scanned data?
OK, I'll give it a try.
I can't debug locally because I can't connect to the test graph database. I changed the partition parameter, packaged the job, ran it on the cluster, and printed the data; the printed data itself looks fine. With partition set to 1, I can get 80k+ edges (the graph holds 90k+). I tried passing 2, 3, 4, 5, 10, 50, 100, and 150 for partition, and the number of edges retrieved keeps decreasing; with partition set to 50, 100, or 150, the count is 1000 every time.