Nebula 3.2.0: MATCH statement crashes storaged

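  • nebula version: 3.2.0
  • Deployment: single machine
  • Installation method: deb
  • Production environment: Y
  • Hardware
    • Disk: SSD
    • CPU: Xeon Silver 4116, 12 cores
    • Memory: 64 GB
  • Problem description
    • Running a MATCH statement fails immediately with Storage Error: part: 1, error: E_RPC_FAILURE(-3). and then storaged goes down. The graph has a vertex index and a single-property vertex index.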
    • Statement executed: MATCH p=(v:ACTOR{name:"Kinno"})-[e:related_incident*1]-(v2) RETURN p LIMIT 2
    • Graph size: 50,000 vertices and 30,000 edges; the storage config limits the maximum number of edges to 2
  • storage error log
I20221116 16:07:03.385926 15244 NebulaStore.cpp:271] Init data from partManager for "127.0.0.1":9779
I20221116 16:07:03.385987 15244 NebulaStore.cpp:369] Data space 45 has existed!
I20221116 16:07:03.386037 15244 NebulaStore.cpp:430] [Space: 45, Part: 1] has existed!
I20221116 16:07:03.386059 15244 NebulaStore.cpp:430] [Space: 45, Part: 2] has existed!
I20221116 16:07:03.386075 15244 NebulaStore.cpp:430] [Space: 45, Part: 3] has existed!
I20221116 16:07:03.386092 15244 NebulaStore.cpp:430] [Space: 45, Part: 4] has existed!
I20221116 16:07:03.386108 15244 NebulaStore.cpp:430] [Space: 45, Part: 5] has existed!
I20221116 16:07:03.386124 15244 NebulaStore.cpp:430] [Space: 45, Part: 6] has existed!
I20221116 16:07:03.386140 15244 NebulaStore.cpp:430] [Space: 45, Part: 7] has existed!
I20221116 16:07:03.386155 15244 NebulaStore.cpp:430] [Space: 45, Part: 8] has existed!
I20221116 16:07:03.386171 15244 NebulaStore.cpp:430] [Space: 45, Part: 9] has existed!
I20221116 16:07:03.386188 15244 NebulaStore.cpp:430] [Space: 45, Part: 10] has existed!
I20221116 16:07:03.386202 15244 NebulaStore.cpp:369] Data space 1 has existed!
I20221116 16:07:03.386221 15244 NebulaStore.cpp:430] [Space: 1, Part: 1] has existed!
I20221116 16:07:03.386237 15244 NebulaStore.cpp:430] [Space: 1, Part: 2] has existed!
I20221116 16:07:03.386277 15244 NebulaStore.cpp:78] Register handler...
I20221116 16:07:03.386294 15244 StorageServer.cpp:228] Init LogMonitor
I20221116 16:07:03.386445 15244 StorageServer.cpp:96] Starting Storage HTTP Service
I20221116 16:07:03.386788 15244 StorageServer.cpp:100] Http Thread Pool started
I20221116 16:07:03.392299 15520 WebService.cpp:124] Web service started on HTTP[19779]
I20221116 16:07:03.392453 15244 TransactionManager.cpp:24] TransactionManager ctor()
I20221116 16:07:03.392707 15244 RocksEngineConfig.cpp:366] Emplace rocksdb option max_bytes_for_level_base=268435456
I20221116 16:07:03.392738 15244 RocksEngineConfig.cpp:366] Emplace rocksdb option max_write_buffer_number=4
I20221116 16:07:03.392755 15244 RocksEngineConfig.cpp:366] Emplace rocksdb option write_buffer_size=67108864
I20221116 16:07:03.393010 15244 RocksEngineConfig.cpp:366] Emplace rocksdb option block_size=8192
I20221116 16:07:03.429795 15244 RocksEngine.cpp:97] open rocksdb on /usr/local/nebula/data/storage/nebula/0/data
I20221116 16:07:03.429946 15244 AdminTaskManager.cpp:22] max concurrent subtasks: 10
I20221116 16:07:03.430173 15244 AdminTaskManager.cpp:40] exit AdminTaskManager::init()
I20221116 16:07:03.430264 15541 AdminTaskManager.cpp:227] waiting for incoming task
E20221116 16:07:19.234073 15580 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type `nebula::Value`.
  • graph info
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20221116 16:07:19.224514 15212 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to serialize a value of type `nebula::Value`.
E20221116 16:07:19.225472 15212 IndexScanRule.cpp:440] No valid index found
E20221116 16:07:21.694936 15413 StorageClientBase-inl.h:206] Request to "127.0.0.1":9779 failed: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E20221116 16:07:21.695276 15212 StorageClientBase-inl.h:135] There some RPC errors: RPC failure in StorageClient: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E20221116 16:07:21.695677 15212 StorageAccessExecutor.h:39] Traverse failed, error E_RPC_FAILURE, part 1
E20221116 16:07:21.695791 15212 StorageAccessExecutor.h:39] Traverse failed, error E_RPC_FAILURE, part 2
E20221116 16:07:21.695840 15212 StorageAccessExecutor.h:136] Storage Error: part: 1, error: E_RPC_FAILURE(-3).
E20221116 16:07:21.696003 15212 QueryInstance.cpp:137] Storage Error: part: 1, error: E_RPC_FAILURE(-3).
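
Since the log line format quoted above puts the severity letter first, the error lines can be pulled out of a long storaged or graphd log with a small filter. This is only a sketch: glog_errors is a hypothetical helper name, and the log path in the usage comment is a placeholder rather than a confirmed location.

```shell
# Print only ERROR/FATAL lines from glog-style input read on stdin.
# A matching line starts with the severity letter (E or F), then yyyymmdd and a space.
# glog_errors is a hypothetical name chosen for this sketch.
glog_errors() {
  grep -E '^[EF][0-9]{8} '
}

# Usage (placeholder path, adjust to wherever your logs are written):
#   glog_errors < /path/to/storaged.log
```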

Could you share the storage log, and the stack trace from the core dump?

How do I look at the core stack trace?

storaged-stderr.log (21.8 KB)
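@liweihui here is the storage log

Sorry, I tagged the wrong person; I meant @liwenhui. What is causing this error? Is it a configuration problem?

Hmm, there still isn't enough information. Can this error be reproduced? For example, if you create a new cluster, import the data, and run the statement, does it happen again?

What other information do you need? Both standalone instances I set up independently show this problem. The statement runs fine on the basketball sample space, but fails on my own space.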

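Please check the following:

  1. Did the machine run out of memory? Look at dmesg for any OOM kill messages.
  2. Are there too few connections available? Check the current maximum number of open files with ulimit -n; if it is too low, new connections may fail to be established.
  3. If the storaged crash is reproducible, set ulimit -c unlimited to capture a core file (it usually ends up near the storaged bin directory; if you can't find it, look up how ulimit -c works). Then open the core file with gdb and post the crash stack trace.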
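
The three checks above can be sketched as a few shell helpers. This is a rough sketch under assumptions: filter_oom and nofile_ok are hypothetical helper names, the 65535 threshold is only an illustrative value and not a NebulaGraph recommendation, and the binary path assumes the default /usr/local/nebula install prefix that appears in the log above.

```shell
# Hypothetical helpers for the checklist; names and thresholds are illustrative.

# Keep only kernel log lines that mention the OOM killer (reads stdin).
filter_oom() {
  grep -iE 'out of memory|oom-kill|killed process' || true
}

# Succeed if the given open-file limit is "unlimited" or at least the given minimum.
nofile_ok() {
  local limit="$1" min="$2"
  [ "$limit" = "unlimited" ] || [ "$limit" -ge "$min" ]
}

# Usage on the affected host (dmesg may require root):
#   dmesg -T | filter_oom
#   nofile_ok "$(ulimit -n)" 65535 || echo "open-file limit looks low"
#   ulimit -c unlimited    # run in the shell that starts storaged, then reproduce the crash
#   gdb /usr/local/nebula/bin/nebula-storaged /path/to/corefile    # assumed binary path; type bt at the gdb prompt
```

Note that ulimit settings apply only to the current shell and the processes it starts, so storaged has to be restarted from that shell for the core file limit to take effect.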
dmesg -T 

Check whether the process was OOM-killed by the system.
