Heartbeat failed appeared in the early morning, queries timed out, and the service is unavailable after a cluster restart

  • nebula version: 2.0.1.el7
  • Deployment: distributed
  • Installation: RPM
  • Production environment: Y
  • Hardware
    • Disk: 500G
    • CPU / memory: 8 cores, 32G

1. In the early morning the graph info log started showing Heartbeat failed, status:LeaderChanged: Leader changed, and queries timed out badly

graph info log:

E1228 04:12:05.345849 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:20:56.903790 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:23:26.385289 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:25:17.969818 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:25:36.983022 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:35:05.218590 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:41:18.639410 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:49:24.752252 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:52:15.506036 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:54:12.941144 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:09:16.360565 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:14:45.133926 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:23:14.778748 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:26:33.099114 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:33:30.971602 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 06:30:40.618229 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 06:31:45.810036 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 06:33:29.017489 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 06:38:00.149250 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 07:15:18.661478 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 07:22:27.766355 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 07:31:16.818842 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
I1228 07:42:58.195263 69805 FileBasedWal.cpp:738] [Port: 9780, Space: 4, Part: 14] Clean wals number 1
E1228 07:47:12.174989 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 08:13:25.373438 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 08:21:13.092036 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 08:22:43.536974 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 08:55:44.702302 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 08:56:23.662308 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 09:09:28.416532 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 09:12:01.267494 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 09:14:43.572655 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 09:18:00.915102 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 09:29:03.364143 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
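
To see how the failures are spread over the morning, the log above can be bucketed by hour. This is only a rough Python sketch: it assumes the usual glog line prefix (level letter, MMDD, time, thread id, file:line), and the function name `heartbeat_failures_per_hour` and the `sample` list are made up for illustration.

```python
import re
from collections import Counter

# Assumed glog prefix: <level><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<hour>\d{2}):\d{2}:\d{2}\.\d+\s+\d+ \S+:\d+\] (?P<msg>.*)$"
)

def heartbeat_failures_per_hour(lines):
    """Count error-level 'Heartbeat failed' lines per hour of day."""
    counts = Counter()
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if m and m.group("level") == "E" and "Heartbeat failed" in m.group("msg"):
            counts[m.group("hour")] += 1
    return dict(sorted(counts.items()))

# A few lines copied from the log above; in practice, read the whole log file.
sample = [
    "E1228 04:12:05.345849 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!",
    "E1228 04:20:56.903790 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!",
    "E1228 05:09:16.360565 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!",
    "I1228 07:42:58.195263 69805 FileBasedWal.cpp:738] [Port: 9780, Space: 4, Part: 14] Clean wals number 1",
]

print(heartbeat_failures_per_hour(sample))
```

Passing an open file object instead of `sample` should work the same way, since the function only iterates over lines.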

meta info log

I1228 04:12:04.360867 69231 Part.cpp:191] [Port: 9560, Space: 0, Part: 0] Find the new leader "10.63.11.34":9560
I1228 04:12:04.361742 69199 FileBasedWal.cpp:400] [Port: 9560, Space: 0, Part: 0] Rollback to log 19473405
I1228 04:12:04.361783 69199 RaftPart.cpp:1628] [Port: 9560, Space: 0, Part: 0] Rollback succeeded! lastLogId is 19473405, logLogTerm is 597, committedLogId is 19473405, term is 614
I1228 04:12:05.371356 69199 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473426. Need to catch up
I1228 04:12:11.051193 69208 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473433, lastLogTerm 614, lastLogIdSent 19473432, lastLogTermSent 614
I1228 04:12:19.416007 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473437, lastLogTerm 614, lastLogIdSent 19473436, lastLogTermSent 614
I1228 04:12:19.416178 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473437, lastLogTerm 614, lastLogIdSent 19473436, lastLogTermSent 614
I1228 04:12:19.416312 69222 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473437, lastLogTerm 614, lastLogIdSent 19473436, lastLogTermSent 614
I1228 04:12:25.572573 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473447, lastLogTerm 614, lastLogIdSent 19473446, lastLogTermSent 614
I1228 04:12:25.572749 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473447, lastLogTerm 614, lastLogIdSent 19473446, lastLogTermSent 614
I1228 04:12:25.572890 69208 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473447, lastLogTerm 614, lastLogIdSent 19473446, lastLogTermSent 614
I1228 04:12:26.625326 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473447, lastLogTerm 614, lastLogIdSent 19473446, lastLogTermSent 614
I1228 04:12:37.833799 69208 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473461, lastLogTerm 614, lastLogIdSent 19473460, lastLogTermSent 614
I1228 04:12:40.586129 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473471, lastLogTerm 614, lastLogIdSent 19473470, lastLogTermSent 614
I1228 04:12:40.586356 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473471, lastLogTerm 614, lastLogIdSent 19473470, lastLogTermSent 614
I1228 04:12:42.719830 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473471, lastLogTerm 614, lastLogIdSent 19473470, lastLogTermSent 614
I1228 04:12:54.395668 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473486, lastLogTerm 614, lastLogIdSent 19473485, lastLogTermSent 614
I1228 04:12:57.802315 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473487, lastLogTerm 614, lastLogIdSent 19473486, lastLogTermSent 614
I1228 04:12:58.808308 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473488, lastLogTerm 614, lastLogIdSent 19473487, lastLogTermSent 614
I1228 04:13:16.168349 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473508, lastLogTerm 614, lastLogIdSent 19473507, lastLogTermSent 614
I1228 04:13:24.068727 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473522, lastLogTerm 614, lastLogIdSent 19473521, lastLogTermSent 614
I1228 04:13:35.108091 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473536, lastLogTerm 614, lastLogIdSent 19473535, lastLogTermSent 614
I1228 04:13:37.412648 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473536, lastLogTerm 614, lastLogIdSent 19473535, lastLogTermSent 614
I1228 04:13:49.087165 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473549, lastLogTerm 614, lastLogIdSent 19473548, lastLogTermSent 614
I1228 04:13:51.648509 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473550, lastLogTerm 614, lastLogIdSent 19473549, lastLogTermSent 614
I1228 04:13:51.648792 69208 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473550, lastLogTerm 614, lastLogIdSent 19473549, lastLogTermSent 614
I1228 04:13:51.648978 69222 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473550, lastLogTerm 614, lastLogIdSent 19473549, lastLogTermSent 614
I1228 04:13:51.649127 69209 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473550, lastLogTerm 614, lastLogIdSent 19473549, lastLogTermSent 614
I1228 04:13:51.706035 69209 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473550, lastLogTerm 614, lastLogIdSent 19473549, lastLogTermSent 614
I1228 04:13:54.949151 69222 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473553, lastLogTerm 614, lastLogIdSent 19473552, lastLogTermSent 614
I1228 04:13:55.758502 69222 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473553. Need to catch up
I1228 04:13:59.094991 69222 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473562, lastLogTerm 614, lastLogIdSent 19473561, lastLogTermSent 614
I1228 04:14:02.729728 69209 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473563, lastLogTerm 614, lastLogIdSent 19473562, lastLogTermSent 614
I1228 04:14:02.729964 69208 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473563, lastLogTerm 614, lastLogIdSent 19473562, lastLogTermSent 614
I1228 04:14:02.745608 69218 RaftPart.cpp:1595] [Port: 9560, Space: 0, Part: 0] Stale log! The log 19473562, term 614 i had committed yet. My committedLogId is 19473571, term is 614
I1228 04:14:09.100823 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473574, lastLogTerm 614, lastLogIdSent 19473573, lastLogTermSent 614
I1228 04:14:28.970634 69218 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473588, lastLogTerm 614, lastLogIdSent 19473587, lastLogTermSent 614
I1228 04:14:35.859918 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473599, lastLogTerm 614, lastLogIdSent 19473598, lastLogTermSent 614
I1228 04:14:49.728612 69231 FileBasedWal.cpp:738] [Port: 9560, Space: 0, Part: 0] Clean wals number 1
I1228 04:14:59.293138 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473633, lastLogTerm 614, lastLogIdSent 19473632, lastLogTermSent 614
I1228 04:15:12.268908 69208 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473646, lastLogTerm 614, lastLogIdSent 19473645, lastLogTermSent 614
I1228 04:15:18.440745 69208 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473657, lastLogTerm 614, lastLogIdSent 19473656, lastLogTermSent 614
I1228 04:15:20.673915 69208 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473658, lastLogTerm 614, lastLogIdSent 19473657, lastLogTermSent 614
I1228 04:15:27.190588 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473660, lastLogTerm 614, lastLogIdSent 19473659, lastLogTermSent 614
I1228 04:15:27.190860 69209 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473660. Need to catch up
I1228 04:15:27.191545 69218 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473660. Need to catch up
I1228 04:15:28.446431 69218 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473660. Need to catch up
I1228 04:15:36.985105 69209 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473671, lastLogTerm 614, lastLogIdSent 19473670, lastLogTermSent 614
I1228 04:15:36.985340 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473671, lastLogTerm 614, lastLogIdSent 19473670, lastLogTermSent 614
I1228 04:15:36.985452 69208 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473671. Need to catch up
I1228 04:15:39.019850 69208 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473671. Need to catch up
I1228 04:15:44.846427 69208 RaftPart.cpp:1361] [Port: 9560, Space: 0, Part: 0] Recieved a VOTING request: space = 0, partition = 0, candidateAddr = 10.63.11.36:9560, term = 615, lastLogId = 19473680, lastLogTerm = 614
I1228 04:15:44.846760 69208 RaftPart.cpp:1394] [Port: 9560, Space: 0, Part: 0] The partition currently is a Follower, lastLogId 19473680, lastLogTerm 614, committedLogId 19473679, term 614
I1228 04:15:44.847272 69208 RaftPart.cpp:1463] [Port: 9560, Space: 0, Part: 0] The partition will vote for the candidate "10.63.11.36":9560
I1228 04:15:44.847694 69208 RaftPart.cpp:1761] [Port: 9560, Space: 0, Part: 0] The current role is Follower. Will follow the new leader 10.63.11.36:9560 [Term: 615]
I1228 04:15:44.847883 69231 Part.cpp:191] [Port: 9560, Space: 0, Part: 0] Find the new leader "10.63.11.36":9560
I1228 04:15:50.240667 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473682, lastLogTerm 615, lastLogIdSent 19473681, lastLogTermSent 615
I1228 04:15:50.241006 69209 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473682. Need to catch up
I1228 04:15:53.383584 69209 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473692, lastLogTerm 615, lastLogIdSent 19473691, lastLogTermSent 615
I1228 04:16:12.112587 69199 RaftPart.cpp:1651] [Port: 9560, Space: 0, Part: 0] Stale log! Local lastLogId 19473708, lastLogTerm 615, lastLogIdSent 19473707, lastLogTermSent 615
I1228 04:16:13.796914 69199 RaftPart.cpp:1361] [Port: 9560, Space: 0, Part: 0] Recieved a VOTING request: space = 0, partition = 0, candidateAddr = 10.63.11.34:9560, term = 616, lastLogId = 19473713, lastLogTerm = 615
I1228 04:16:13.797056 69199 RaftPart.cpp:1394] [Port: 9560, Space: 0, Part: 0] The partition currently is a Follower, lastLogId 19473708, lastLogTerm 615, committedLogId 19473707, term 615
I1228 04:16:13.797086 69199 RaftPart.cpp:1463] [Port: 9560, Space: 0, Part: 0] The partition will vote for the candidate "10.63.11.34":9560
I1228 04:16:13.799127 69199 RaftPart.cpp:1761] [Port: 9560, Space: 0, Part: 0] The current role is Follower. Will follow the new leader 10.63.11.34:9560 [Term: 616]
I1228 04:16:13.799172 69199 RaftPart.cpp:1637] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 19473708. Need to catch up
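
The meta log above shows the leader of Space 0 / Part 0 moving from one meta host to another and back within about 30 seconds. A small Python sketch can pull those elections out of a longer log. It relies only on the "Will follow the new leader ... [Term: N]" wording seen in the lines above; the function name `leader_changes` is made up for illustration.

```python
import re

# Matches follower-side lines such as:
#   I1228 04:15:44.847694 69208 RaftPart.cpp:1761] [...] The current role is
#   Follower. Will follow the new leader 10.63.11.36:9560 [Term: 615]
FOLLOW_RE = re.compile(
    r"^I\d{4} (\d{2}:\d{2}:\d{2})\.\d+ .*"
    r"Will follow the new leader (\S+) \[Term: (\d+)\]"
)

def leader_changes(lines):
    """Return (time, leader address, term) for every 'follow the new leader' line."""
    changes = []
    for line in lines:
        m = FOLLOW_RE.match(line.strip())
        if m:
            changes.append((m.group(1), m.group(2), int(m.group(3))))
    return changes

# Two lines copied from the meta log above.
sample = [
    "I1228 04:15:44.847694 69208 RaftPart.cpp:1761] [Port: 9560, Space: 0, Part: 0] The current role is Follower. Will follow the new leader 10.63.11.36:9560 [Term: 615]",
    "I1228 04:16:13.799127 69199 RaftPart.cpp:1761] [Port: 9560, Space: 0, Part: 0] The current role is Follower. Will follow the new leader 10.63.11.34:9560 [Term: 616]",
]

for when, leader, term in leader_changes(sample):
    print(when, leader, term)
```

Comparing these timestamps with the Heartbeat failed lines might show whether each failure lines up with a meta election, though that correlation is not confirmed anywhere in this thread.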

storaged info log

I1228 00:54:16.425451 69810 EventListner.h:18] Rocksdb start compaction column family: default because of LevelL0FilesNum, status: OK, compacted 11 files into 0, base level is 0, output level is 1
I1228 00:54:16.425695 69810 CompactionFilter.h:62] Do custom minor compaction!
I1228 00:54:16.427359 16712 CompactionFilter.h:62] Do custom minor compaction!
I1228 00:54:16.427392 16713 CompactionFilter.h:62] Do custom minor compaction!
I1228 00:54:16.427400 16714 CompactionFilter.h:66] Do default minor compaction!
I1228 00:54:24.198482 69810 EventListner.h:30] Rocksdb compaction completed column family: default because of LevelL0FilesNum, status: OK, compacted 11 files into 7, base level is 0, output level is 1
I1228 00:54:24.302139 69810 EventListner.h:18] Rocksdb start compaction column family: default because of LevelMaxLevelSize, status: OK, compacted 3 files into 0, base level is 1, output level is 2
I1228 00:54:24.302273 69810 CompactionFilter.h:66] Do default minor compaction!
I1228 00:54:36.353516 69810 EventListner.h:30] Rocksdb compaction completed column family: default because of LevelMaxLevelSize, status: OK, compacted 3 files into 3, base level is 1, output level is 2
E1228 04:12:05.345849 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:20:56.903790 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:23:26.385289 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:25:17.969818 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:25:36.983022 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:35:05.218590 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:41:18.639410 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:49:24.752252 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:52:15.506036 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 04:54:12.941144 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:09:16.360565 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:14:45.133926 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:23:14.778748 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:26:33.099114 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 05:33:30.971602 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 06:30:40.618229 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 06:31:45.810036 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 06:33:29.017489 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 06:38:00.149250 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 07:15:18.661478 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 07:22:27.766355 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 07:31:16.818842 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
I1228 07:42:58.195263 69805 FileBasedWal.cpp:738] [Port: 9780, Space: 4, Part: 14] Clean wals number 1
E1228 07:47:12.174989 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!
E1228 08:13:25.373438 69803 MetaClient.cpp:124] Heartbeat failed, status:LeaderChanged: Leader changed!

2. After restarting the whole cluster, and after several attempts, the services still cannot fully start
storaged-stderr.log

nebula-storaged: /usr/src/third-party/third-party/build/folly/source/folly/detail/MemoryIdler.cpp:134: void folly::detail::fetchStackLimits(): Assertion `false' failed.
*** Aborted at 1640668713 (unix time) try "date -d @1640668713" if you are using GNU date ***
PC: @     0x2afa89dc4207 __GI_raise
*** SIGABRT (@0x11a84) received by PID 72324 (TID 0x2afa8925d700) from PID 72324; stack trace: ***
    @          0x2024271 (unknown)
    @     0x2afa89b815cf (unknown)
    @     0x2afa89dc4207 __GI_raise
    @     0x2afa89dc58f7 __GI_abort
    @     0x2afa89dbd025 __assert_fail_base
    @     0x2afa89dbd0d1 __GI___assert_fail
    @          0x1effad9 folly::detail::MemoryIdler::unmapUnusedStack()
    @          0x1f1d65e folly::fibers::Baton::waitThread()
    @          0x1f1d90f folly::fibers::Baton::wait()
    @          0x14915da folly::futures::detail::waitImpl<>()
    @          0x1494b71 _ZNO5folly6FutureIN6nebula8StatusOrIbEEE3getEv
    @          0x13c5276 nebula::meta::MetaClient::isMetadReady()
    @          0x13c55b2 nebula::meta::MetaClient::waitForMetadReady()
    @          0x1122648 nebula::storage::StorageServer::start()
    @          0x10e6e37 main
    @     0x2afa89db03d4 __libc_start_main
    @          0x111a55d (unknown)
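
The abort message gives the crash time only as a unix timestamp. As an alternative to the `date -d` hint in the trace, a short Python sketch converts it so it can be lined up with the other logs. The helper name `abort_time` and the UTC+8 offset are assumptions for illustration; the server's time zone is not stated in this post.

```python
from datetime import datetime, timedelta, timezone

def abort_time(unix_seconds, utc_offset_hours=0):
    """Convert the unix time from the abort message to a timezone-aware datetime."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_seconds, tz=tz)

# Timestamp copied from the storaged abort message above.
storaged_abort_utc = abort_time(1640668713)
# Assumption: the servers log in UTC+8; adjust the offset to the real time zone.
storaged_abort_local = abort_time(1640668713, utc_offset_hours=8)

print(storaged_abort_utc.isoformat())
print(storaged_abort_local.isoformat())
```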

graphd-stderr.log

E1228 15:06:52.629854 10223 MetaClient.cpp:597] Send request to "10.63.11.36":9559, exceed retry limit
E1228 15:06:52.630111 10230 MetaClient.cpp:124] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1228 15:09:05.647876 10224 MetaClient.cpp:597] Send request to "10.63.11.36":9559, exceed retry limit
E1228 15:09:05.648178 10230 MetaClient.cpp:124] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1228 15:13:18.677623 10225 MetaClient.cpp:597] Send request to "10.63.11.24":9559, exceed retry limit
E1228 15:13:18.678031 10230 MetaClient.cpp:124] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E1228 15:14:36.748534  2851 MemoryIdler.cpp:132] pthread_attr_getstack returned insane stack size 93163914248192
nebula-graphd: /usr/src/third-party/third-party/build/folly/source/folly/detail/MemoryIdler.cpp:134: void folly::detail::fetchStackLimits(): Assertion `false' failed.
*** Aborted at 1640675676 (unix time) try "date -d @1640675676" if you are using GNU date ***
PC: @     0x2b41853bd207 __GI_raise
*** SIGABRT (@0xb23) received by PID 2851 (TID 0x2b4184856640) from PID 2851; stack trace: ***
    @          0x1e66491 (unknown)
    @     0x2b418517a5cf (unknown)
    @     0x2b41853bd207 __GI_raise
    @     0x2b41853be8f7 __GI_abort
    @     0x2b41853b6025 __assert_fail_base
    @     0x2b41853b60d1 __GI___assert_fail
    @          0x1d418a9 folly::detail::MemoryIdler::unmapUnusedStack()
    @          0x1d5f87e folly::fibers::Baton::waitThread()
    @          0x1d5fb2f folly::fibers::Baton::wait()
    @          0x16d78ea folly::futures::detail::waitImpl<>()
    @          0x16dae81 _ZNO5folly6FutureIN6nebula8StatusOrIbEEE3getEv
    @          0x1625ff6 nebula::meta::MetaClient::isMetadReady()
    @          0x1626332 nebula::meta::MetaClient::waitForMetadReady()
    @          0x10cac2d nebula::graph::QueryEngine::init()
    @          0x10c40bf nebula::graph::GraphService::init()
    @          0x108282a main
    @     0x2b41853a93d4 __libc_start_main
    @          0x10988ad (unknown)
E1228 16:07:28.421797 66588 MemoryIdler.cpp:132] pthread_attr_getstack returned insane stack size 93376521891840
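
For scale, the stack size reported by the MemoryIdler.cpp:132 line can be converted to TiB, and the stack limit of the current shell can be read with Python's standard `resource` module (Unix only). This is just a sketch: whether the stack limit of the account that starts the services has anything to do with the assertion is an assumption, not something established in this thread. The helper name `stack_limit` is made up for illustration.

```python
import resource

# Value copied from the "insane stack size" log line above, in bytes.
reported_stack_bytes = 93163914248192
reported_stack_tib = reported_stack_bytes / 2**40

def stack_limit():
    """Return the (soft, hard) stack limits of this process as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return fmt(soft), fmt(hard)

print(f"reported stack size: {reported_stack_tib:.2f} TiB")
print("stack limit (soft, hard):", stack_limit())
```

Running it as the same user, and from the same kind of shell, that launches the nebula services would give the most relevant numbers.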

It looks like meta in 2.0.1 doesn't handle leader change. You'd better upgrade.


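I ran into the same problem. Is a forced upgrade the only option? This is a production environment :grinning_face_with_smiling_eyes:
And all the related dependencies would have to be upgraded together too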

Are you also on version 2.0.1?

Yep. Luckily a restart fixed it for me. Otherwise it would have been a real headache.