- Nebula version: 2.0.0
- Deployment: both distributed and standalone deployments show the problem
- Installation: built from source
- Problem description:
1. One thread keeps calling the RPC interface to execute `(GO FROM xxx OVER edge1 WHERE xxx) UNION/MINUS/INTERSECT (GO FROM xxx OVER edge1 WHERE xxx)` while a console session runs `DROP EDGE edge1`. graphd coredumps reliably (see the sketch after this list).
2. Alternatively, run the same query while edge1 does not exist and concurrently `CREATE EDGE edge1`; graphd also cores.
3. The core only happens when dropping the edge named in the GO statement; dropping or creating an unrelated edge causes no problem.
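For reference, a minimal sketch of the repro driver I use. `executeNgql()` is a hypothetical stand-in for whatever RPC client call you have, not a real nebula client API, and the vertex id / property names in the query are placeholders:

```cpp
// Hypothetical repro driver: two threads racing a GO query against DROP EDGE.
#include <atomic>
#include <cstdio>
#include <string>
#include <thread>

// Stand-in for the real client call; assumed to send one nGQL
// statement over RPC and block until the response arrives.
void executeNgql(const std::string& stmt) {
    std::printf("exec: %s\n", stmt.c_str());
}

int main() {
    std::atomic<bool> stop{false};

    // Thread 1: hammer graphd with the compound GO query.
    std::thread worker([&stop] {
        const std::string q =
            "(GO FROM \"v1\" OVER edge1 WHERE edge1.p > 0) "
            "UNION "
            "(GO FROM \"v1\" OVER edge1 WHERE edge1.p > 0)";
        while (!stop.load()) {
            executeNgql(q);
        }
    });

    // Thread 2 (the console in this report): drop the same edge type
    // while the queries are in flight; graphd cores 1~3 s later.
    executeNgql("DROP EDGE edge1");

    stop.store(true);
    worker.join();
    return 0;
}
```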
- Logs:
Too long to paste here. In short, after the DROP statement finishes, other commands still execute normally; the core happens roughly 1~3 s after the drop.
- GDB debugging:
Under gdb the segfault shows up at a different place on each run. So far I have seen three distinct crash sites.
1.
```
#0 0x00007ff708d74428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ff708d7602a in __GI_abort () at abort.c:89
#2 0x0000000002e26e64 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] ()
#3 0x00000000047f5a66 in __cxxabiv1::__terminate(void (*)()) ()
#4 0x00000000047f5ab1 in std::terminate() ()
#5 0x00000000047f610f in __cxa_pure_virtual ()
#6 0x0000000003250287 in nebula::graph::Scheduler::execute (this=0x7ff6d5226500, executor=0x7ff6fe287f40)
at /home/nebula/src/scheduler/Scheduler.cpp:171
#7 0x000000000324f1a7 in nebula::graph::Scheduler::<lambda(nebula::Status)>::operator()(nebula::Status) const (__closure=0x7ff6d3a67648,
stats=...) at /home/nebula/src/scheduler/Scheduler.cpp:126
#8 0x0000000003257212 in nebula::graph::Scheduler::ExecTask<nebula::graph::Scheduler::doSchedule(nebula::graph::Executor*)::<lambda(nebula::Status)> >::operator()(<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881ac4c>) (this=0x7ff6d3a67640,
arg=<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881ac4c>) at /home/nebula/src/scheduler/Scheduler.h:56
#9 0x0000000003252f82 in folly::futures::detail::CoreCallbackState<nebula::Status, nebula::graph::Scheduler::ExecTask<nebula::graph::Scheduler::doSchedule(nebula::graph::Executor*)::<lambda(nebula::Status)> > >::invoke<nebula::Status>(<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881ed1a>) (this=0x7ff6d3a67640,
args#0=<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881ed1a>)
at /opt/vesoft/third-party/include/folly/futures/Future-inl.h:91
```
2.
```
Thread 84 "executor-pri3-2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f61835fe700 (LWP 17162)]
0x000000000309094e in nebula::graph::ExecutionPlan::addProfileStats(long, nebula::ProfilingStats&&) (this=0x0, planNodeId=11,
profilingStats=<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x3bb18f0, DIE 0x3c54314>)
at /home/nebula/src/planner/ExecutionPlan.cpp:98
98 /home/nebula/src/planner/ExecutionPlan.cpp: No such file or directory.
(gdb) where
#0 0x000000000309094e in nebula::graph::ExecutionPlan::addProfileStats(long, nebula::ProfilingStats&&) (this=0x0, planNodeId=11,
profilingStats=<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x3bb18f0, DIE 0x3c54314>)
at /home/nebula/src/planner/ExecutionPlan.cpp:98
#1 0x00000000030dc299 in nebula::graph::Executor::close (this=0x7f616c09d800) at /home/nebula/src/executor/Executor.cpp:549
#2 0x0000000003125c3b in nebula::graph::GetNeighborsExecutor::close (this=0x7f616c09d800)
at /home/nebula/src/executor/query/GetNeighborsExecutor.cpp:39
#3 0x00000000032501ab in nebula::graph::Scheduler::<lambda(nebula::Status)>::operator()(nebula::Status) const (__closure=0x7f616c4a03c0,
s=...) at /home/nebula/src/scheduler/Scheduler.cpp:173
#4 0x0000000003254f23 in folly::futures::detail::CoreCallbackState<nebula::Status, nebula::graph::Scheduler::execute(nebula::graph::Executor*)::<lambda(nebula::Status)> >::invoke<nebula::Status>(<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881dbcc>) (
this=0x7f616c4a03c0, args#0=<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881dbcc>)
at /opt/vesoft/third-party/include/folly/futures/Future-inl.h:91
#5 0x0000000003254f90 in folly::futures::detail::detail_msvc_15_7_workaround::invoke<false, folly::futures::detail::CoreCallbackState<nebula::Status, nebula::graph::Scheduler::execute(nebula::graph::Executor*)::<lambda(nebula::Status)> >, nebula::Status, nebula::Status&&>(folly::futures::detail::CoreCallbackState<nebula::Status, nebula::graph::Scheduler::execute(nebula::graph::Executor*)::<lambda(nebula::Status)> > &, folly::Try<nebula::Status> &) (state=..., t=...) at /opt/vesoft/third-party/include/folly/futures/Future-inl.h:288
```
3.
```
#0 0x00000000030e17ac in nebula::graph::RequestContext<nebula::ExecutionResponse>::runner (this=0x2000000000000)
at /home/nebula/src/service/RequestContext.h:68
#1 0x00000000030dc618 in nebula::graph::Executor::runner (this=0x7fdd6068e400) at /home/nebula/src/executor/Executor.cpp:573
#2 0x00000000030dc3d9 in nebula::graph::Executor::error (this=0x7fdd6068e400, status=...) at /home/nebula/src/executor/Executor.cpp:559
#3 0x000000000324f177 in nebula::graph::Scheduler::<lambda(nebula::Status)>::operator()(nebula::Status) const (__closure=0x7fdd31e84b48,
stats=...) at /home/nebula/src/scheduler/Scheduler.cpp:125
#4 0x0000000003257212 in nebula::graph::Scheduler::ExecTask<nebula::graph::Scheduler::doSchedule(nebula::graph::Executor*)::<lambda(nebula::Status)> >::operator()(<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881ac4c>) (this=0x7fdd31e84b40,
arg=<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881ac4c>) at /home/nebula/src/scheduler/Scheduler.h:56
#5 0x0000000003252f82 in folly::futures::detail::CoreCallbackState<nebula::Status, nebula::graph::Scheduler::ExecTask<nebula::graph::Scheduler::doSchedule(nebula::graph::Executor*)::<lambda(nebula::Status)> > >::invoke<nebula::Status>(<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881ed1a>) (this=0x7fdd31e84b40,
args#0=<unknown type in /usr/local/nebula/bin/nebula-graphd, CU 0x874ed3a, DIE 0x881ed1a>)
at /opt/vesoft/third-party/include/folly/futures/Future-inl.h:91
#6 0x0000000003252fc6 in folly::futures::detail::CoreCallbackState<nebula::Status, nebula::graph::Scheduler::ExecTask<nebula::graph::Scheduler::doSchedule(nebula::graph::Executor*)::<lambda(nebula::Status)> > >::<lambda()>::operator()(void) const (this=0x7fdd31e84b40)
at /opt/vesoft/third-party/include/folly/futures/Future-inl.h:96
```
- Guess:
It looks like a memory-lifetime problem: in the second core `this` is a null pointer (`0x0`) and in the third it is a garbage value (`0x2000000000000`), which suggests objects owned by the query (ExecutionPlan / RequestContext) are destroyed while folly::Future continuations still hold raw pointers to them.
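To make the suspected race concrete, here is a minimal, self-contained sketch of that pattern. The types and names are hypothetical, not the actual nebula-graph code; it only illustrates a raw pointer captured by an async callback outliving its owner, plus the usual weak_ptr guard that avoids it:

```cpp
#include <chrono>
#include <cstdio>
#include <memory>
#include <thread>

struct Plan {
    void addProfileStats(long nodeId) {
        std::printf("profile stats for plan node %ld\n", nodeId);
    }
};

int main() {
    auto plan = std::make_shared<Plan>();

    // Unsafe pattern (what the cores hint at): a continuation captures a
    // raw Plan*, so nothing keeps *plan alive until the callback fires.
    //   Plan* raw = plan.get();
    //   std::thread t([raw] { raw->addProfileStats(11); });  // may use-after-free

    // Safer pattern: capture a weak_ptr and check it at call time, so a
    // late callback degrades gracefully instead of touching freed memory.
    std::weak_ptr<Plan> weak = plan;
    std::thread t([weak] {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        if (auto p = weak.lock()) {
            p->addProfileStats(11);  // plan still alive
        } else {
            std::printf("plan already destroyed; skip profile stats\n");
        }
    });

    plan.reset();  // query errored out: owner releases the plan immediately
    t.join();      // the pending callback no longer dereferences freed memory
    return 0;
}
```

If the query's owning context is torn down on error before all pending continuations drain, this kind of dangling access would explain cores landing at different sites 1~3 s after the DROP.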
- Related observations:
`(LOOKUP xxx) INTERSECT/MINUS (LOOKUP xxx)` does not seem to trigger the problem (the same scenario ran a few dozen times without a core). Comparing the execution plans, the difference is that the GO statement adds an extra Project plan node.
- Question:
Has this been fixed after 2.0? I would like to see where the bug is… it is proving hard to track down.