nebula-storaged fails to start

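Question template:

  • Nebula version: 3.0.2

  • Deployment: single machine

  • Installation method: compiled from source

  • Production environment: Y

  • Detailed problem description
    nebula-storaged fails to start with the following error: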

F20220520 14:55:31.405052 134050 RocksEngine.cpp:398] Check failed: key.size() == sizeof(PartitionID) + sizeof(NebulaSystemKeyType) (12 vs. 8)
*** Check failure stack trace: ***
*** Aborted at 1653029731 (Unix time, try 'date -d @1653029731') ***
*** Signal 6 (SIGABRT) (0x20ba2) received by PID 134050 (pthread TID 0x7fc9ad46e0c0) (linux TID 134050) (maybe from PID 134050, UID 0) (code: -6), stack trace: ***
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged(_ZN5folly10symbolizer17getStackTraceSafeEPmm+0x31)[0x24fa641]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x26)[0x24f1b46]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged[0x24efad7]
/lib64/libpthread.so.0(+0xf62f)[0x7fc9ac92b62f]
/lib64/libc.so.6(gsignal+0x37)[0x7fc9ac584387]
/lib64/libc.so.6(abort+0x147)[0x7fc9ac585a77]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged[0x2523349]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged[0x2526443]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged[0x2522e18]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged[0x2526af8]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged(_ZN6nebula7kvstore11RocksEngine8allPartsEv+0x226)[0x1b63546]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged(_ZN6nebula7kvstore11RocksEngineC2EiiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_St10shared_ptrIN7rocksdb13MergeOperatorEESA_INSB_23CompactionFilterFactoryEEb+0x374)[0x1b64614]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged(_ZN6nebula7storage13StorageServer21getAdminStoreInstanceEv+0x98)[0x110ff58]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged(_ZN6nebula7storage13StorageServer5startEv+0xc8c)[0x1114aac]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged(main+0xccd)[0x10c95ed]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x7fc9ac570554]
/diska/risk-graph/nebula-graph-3.0.2/bin/nebula-storaged[0x110070d]
(safe mode, symbolizer not available)

Stack trace from the core file:

#0  0x00007f9363d8e387 in raise () from /lib64/libc.so.6
#1  0x00007f9363d8fa78 in abort () from /lib64/libc.so.6
#2  0x000000000252334a in ?? ()
#3  0x0000000002526444 in ?? ()
#4  0x0000000002522e19 in ?? ()
#5  0x0000000002526af9 in ?? ()
#6  0x0000000001b63547 in nebula::kvstore::RocksEngine::allParts() ()
#7  0x0000000001b64615 in nebula::kvstore::RocksEngine::RocksEngine(int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<rocksdb::MergeOperator>, std::shared_ptr<rocksdb::CompactionFilterFactory>, bool) ()
#8  0x000000000110ff59 in nebula::storage::StorageServer::getAdminStoreInstance() ()
#9  0x0000000001114aad in nebula::storage::StorageServer::start() ()
#10 0x00000000010c95ee in main ()

Since this is v3.0.2, have you run ADD HOSTS for the storage service?

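Yes, ADD HOSTS was done. The deployment has been running for almost a month.
Today the nebula services went down, and storage can no longer be restarted.
The graph and meta services start normally.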

One moment, I'll ask a developer to take a look.

OK, please look into it as soon as possible.
The service is still down...

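Have you ever ingested SST files? Keys with this prefix are for internal use, so the data has most likely been polluted, and an ingest or a similar operation is the most probable cause.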

If you built from source, you can print the key just before RocksEngine.cpp:398 to see what it contains:

LOG(INFO) << folly::hexlify(key);
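
The size condition and the hex dump suggested above can also be sketched outside the storaged source tree. This is only an illustration, not the actual RocksEngine code: the type aliases assume that PartitionID and NebulaSystemKeyType are each 4 bytes wide (consistent with the expected size of 8 in the log), and `toHex` and `hasSystemPartKeySize` are hypothetical helpers, with `toHex` standing in for `folly::hexlify`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed widths, chosen to match the "8" in the check-failure message.
using PartitionID = std::int32_t;
using NebulaSystemKeyType = std::uint32_t;

// Hypothetical stand-in for folly::hexlify: lowercase hex of every byte.
std::string toHex(const std::string& key) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(key.size() * 2);
  for (unsigned char c : key) {
    out.push_back(kDigits[c >> 4]);
    out.push_back(kDigits[c & 0x0F]);
  }
  return out;
}

// Mirrors the size condition from the failed check.
bool hasSystemPartKeySize(const std::string& key) {
  return key.size() == sizeof(PartitionID) + sizeof(NebulaSystemKeyType);
}
```

A 12-byte key, as reported in the log, fails this condition; its hex form shows what the 4 extra bytes contain.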
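
  1. No, I have never ingested SST files. All vertices and edges are written in real time with nGQL through the Java client. This morning the graph service suddenly stopped, and after the restart storaged would not come up.

  2. I deployed from a tar package. Is there a way to print the key in that case?

  3. If the data is polluted, how can it be cleaned up without deleting data?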

Hello, is there a good solution for this?
:joy:

Oh, I see it. Look under your data_path: there is a directory for spaceId 0. Delete it and see whether storaged starts successfully.
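
Before deleting anything, it may help to confirm that the directory is really there. A minimal sketch, assuming the storage layout is `<data_path>/nebula/<spaceId>/`; that layout and the helper name `findSpaceZeroDir` are assumptions that this thread does not confirm, so adjust the path to your deployment.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns the directory for space 0 if it exists.
// Assumes spaces are stored under <data_path>/nebula/<spaceId>.
std::optional<fs::path> findSpaceZeroDir(const fs::path& dataPath) {
  const fs::path candidate = dataPath / "nebula" / "0";
  std::error_code ec;  // avoid exceptions on unreadable paths
  if (fs::is_directory(candidate, ec)) {
    return candidate;
  }
  return std::nullopt;
}
```

If the directory is found, moving it to a location outside data_path instead of deleting it outright keeps a fallback while you test whether storaged starts.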