graphd process crashes and exits abnormally

  • Nebula version: 1.2.0

  • Deployment: distributed

  • Running in production: Y

  • Hardware

    • Disk: SSD
    • CPU and memory: 2 CPUs × 12 cores, 384 GB
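  • Problem description
    The graph database ran for three months without problems, then suddenly became unreachable. Investigation showed that the graphd process had crashed and left a core dump. The crash keeps recurring: once queries or writes start, graphd crashes again shortly afterwards. The core dump from the first crash was over 90 GB; those from later crashes were about 700 MB each.
    Creating a new graph space, inserting data, and then querying and writing to it works fine. Any operation on the old graph space, however, makes graphd crash.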

  • stderr log

RowReader.cpp:166] Check failed: ver == schema->getVersion() (1 vs. 0)
*** Check failure stack trace: ***
    @          0x19919dc  google::LogMessage::Fail()
    @          0x199654d  google::LogMessage::SendToLog()
    @          0x19916ad  google::LogMessage::Flush()
    @          0x1991f08  google::LogMessageFatal::~LogMessageFatal()
    @          0x1578567  nebula::RowReader::getRowReader()
    @           0xfea2f3  nebula::graph::FetchVerticesExecutor::processResult()
    @           0xfebb7f  _ZN5folly6detail8function14FunctionTraitsIFvONS_3TryIN6nebula7storage18StorageRpcResponseINS5_4cpp213QueryResponseEEEEEEE9callSmallIZNS_7futures6detail10FutureBaseIS9_E18thenImplementationIZNOS_6FutureIS9_E9thenValueIRZNS4_5graph21FetchVerticesExecutor13fetchVerticesEvEUlOS9_E_EENSK_INSG_19valueCallableResultIS9_T_E10value_typeEEEOST_EUlSB_E_NSG_14callableResultIS9_SY_EELb1EJSB_EEENSt9enable_ifIXntsrNT0_13ReturnsFutureE5valueENS12_6ReturnEE4typeESX_NSG_9argResultIXT1_EST_JDpT2_EEEEUlSB_E_EEvRNS1_4DataESB_
    @           0xf26481  _ZN5folly6detail8function14FunctionTraitsIFvvEE9callSmallIZNS_7futures6detail4CoreIN6nebula7storage18StorageRpcResponseINSA_4cpp213QueryResponseEEEE10doCallbackEvEUlvE0_EEvRNS1_4DataE
    @          0x10aaee9  apache::thrift::concurrency::FunctionRunner::run()
    @          0x16a3c37  apache::thrift::concurrency::ThreadManager::Task::run()
    @          0x16e8b9d  apache::thrift::concurrency::ThreadManager::ImplT<>::Worker<>::run()
    @          0x17b9e1b  apache::thrift::concurrency::PthreadThread::threadMain()
    @     0x2af05fe74e24  start_thread
    @     0x2af06018134c  __clone
  • Core dump debugging output (gdb)
(gdb) backtrace
#0  0x00002b11df8db1f7 in raise () from /lib64/libc.so.6
#1  0x00002b11df8dc8e8 in abort () from /lib64/libc.so.6
#2  0x0000000000eb1bf6 in ?? ()
#3  0x00000000019919dd in google::LogMessage::Fail() ()
#4  0x000000000199654e in google::LogMessage::SendToLog() ()
#5  0x00000000019916ae in google::LogMessage::Flush() ()
#6  0x0000000001991f09 in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x0000000001578568 in nebula::RowReader::getRowReader(folly::Range<char const*>, std::shared_ptr<nebula::meta::SchemaProviderIf const>) ()
#8  0x0000000000fea2f4 in nebula::graph::FetchVerticesExecutor::processResult(nebula::storage::StorageRpcResponse<nebula::storage::cpp2::QueryResponse>&&) ()
#9  0x0000000000febb80 in _ZN5folly6detail8function14FunctionTraitsIFvONS_3TryIN6nebula7storage18StorageRpcResponseINS5_4cpp213QueryResponseEEEEEEE9callSmallIZNS_7futures6detail10FutureBaseIS9_E18thenImplementationIZNOS_6FutureIS9_E9thenValueIRZNS4_5graph21FetchVerticesExecutor13fetchVerticesEvEUlOS9_E_EENSK_INSG_19valueCallableResultIS9_T_E10value_typeEEEOST_EUlSB_E_NSG_14callableResultIS9_SY_EELb1EJSB_EEENSt9enable_ifIXntsrNT0_13ReturnsFutureE5valueENS12_6ReturnEE4typeESX_NSG_9argResultIXT1_EST_JDpT2_EEEEUlSB_E_EEvRNS1_4DataESB_ ()
#10 0x0000000000f26482 in void folly::detail::function::FunctionTraits<void ()>::callSmall<folly::futures::detail::Core<nebula::storage::StorageRpcResponse<nebula::storage::cpp2::QueryResponse> >::doCallback()::{lambda()#2}>(folly::detail::function::Data&) ()
#11 0x00000000010aaeea in apache::thrift::concurrency::FunctionRunner::run() ()
#12 0x00000000016a3c38 in apache::thrift::concurrency::ThreadManager::Task::run() ()
#13 0x00000000016e8b9e in apache::thrift::concurrency::ThreadManager::ImplT<folly::LifoSemImpl<std::atomic, folly::SaturatingSemaphore<true, std::atomic> > >::Worker<folly::LifoSemImpl<std::atomic, folly::SaturatingSemaphore<true, std::atomic> > >::run() ()
#14 0x00000000017b9e1c in apache::thrift::concurrency::PthreadThread::threadMain(void*) ()
#15 0x00002b11df691e25 in start_thread () from /lib64/libpthread.so.0
#16 0x00002b11df99e34d in clone () from /lib64/libc.so.6
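The failed check in the stderr log is `ver == schema->getVersion() (1 vs. 0)`. It compares the schema version recorded with a row (1) against the version of the schema object graphd handed to `RowReader::getRowReader()` (0). A failed glog `CHECK` is fatal, which matches the `abort()` frame at the top of the backtrace. The sketch below only illustrates that condition, under loudly simplified assumptions. `SchemaStub`, `RowReaderStub`, `rowSchemaVersion` and `tryGetRowReader` are hypothetical names, not Nebula's API. The one-byte version header is an invented layout, not Nebula's real row encoding. As a contrast to aborting, the sketch reports the mismatch as an empty result. It does not describe what the 1.2.1 fix actually changed.

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; NOT Nebula's real classes.
// Assumption: the first byte of an encoded row stores the schema
// version the row was written with.
struct SchemaStub {
    int64_t version;
};

struct RowReaderStub {
    int64_t rowVersion;
    std::shared_ptr<const SchemaStub> schema;
};

// Reads the (assumed) one-byte version header of a row.
std::optional<int64_t> rowSchemaVersion(const std::string& row) {
    if (row.empty()) {
        return std::nullopt;
    }
    return static_cast<unsigned char>(row[0]);
}

// Mirrors the condition in the log: ver == schema->getVersion().
// A failing glog CHECK would abort the whole process; this sketch
// instead returns std::nullopt so a caller could fail only the
// current query.
std::optional<RowReaderStub> tryGetRowReader(
        const std::string& row, std::shared_ptr<const SchemaStub> schema) {
    auto ver = rowSchemaVersion(row);
    if (!ver || !schema || *ver != schema->version) {
        return std::nullopt;
    }
    return RowReaderStub{*ver, std::move(schema)};
}
```

For example, take a one-byte row whose version byte is 1 and pass it to `tryGetRowReader` together with a `SchemaStub{0}`. The result is an empty optional, the same "(1 vs. 0)" mismatch as in the log. With a `SchemaStub{1}` the reader is created.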

Please upgrade to 1.2.1.

We keep wondering whether this is a data problem or a code problem. Could you give a rough analysis of what causes this?

This is probably a bug in graph in version 1.2.0 that was fixed in 1.2.1.
Based on your error log, we believe this is the issue you are hitting.
