meta E_LEADER_CHANGED

nebula version: 3.0.1
Deployment: distributed
Production environment: Y
Disk: SSD

Problem description: meta frequently reports E_LEADER_CHANGED errors (they occur at irregular intervals).

I20230614 13:14:04.771785 55528 BaseProcessor-inl.h:426] Put data error on meta server
E20230614 13:14:04.777541 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player error: E_LEADER_CHANGED
I20230614 13:14:29.000424 55335 FileBasedWal.cpp:673] [Port: 9560, Space: 0, Part: 0] Clean wals number 12
I20230614 14:04:28.026458 55528 BaseProcessor-inl.h:426] Put data error on meta server
E20230614 14:04:28.032284 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player error: E_LEADER_CHANGED

What causes or conditions would frequently trigger E_LEADER_CHANGED?

What operation were you performing when this error was reported? Could you check the meta logs?

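I was running queries. The cluster's CPU and memory usage showed no obvious change, but upstream and downstream network traffic suddenly rose to 7M/s.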

@steam Here are more detailed logs taken from the nebula-metad.INFO file:

I20230614 13:05:51.633414 55528 BaseProcessor-inl.h:426] Put data error on meta server
E20230614 13:05:51.635138 55528 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
E20230614 13:05:51.654628 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player  error: E_LEADER_CHANGED
I20230614 13:07:20.714313 55364 EventListener.h:18] Rocksdb start compaction column family: default because of LevelL0FilesNum, status: OK, compacted 5 files into 0, base level is 0, output level is 1
I20230614 13:07:20.731410 55364 EventListener.h:28] Rocksdb compaction completed column family: default because of LevelL0FilesNum, status: OK, compacted 5 files into 1, base level is 0, output level is 1
I20230614 13:13:20.091154 55528 BaseProcessor-inl.h:426] Put data error on meta server
E20230614 13:13:20.105736 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player error: E_LEADER_CHANGED
I20230614 13:13:20.120905 55528 BaseProcessor-inl.h:426] Put data error on meta server
E20230614 13:13:20.138609 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player error: E_LEADER_CHANGED
I20230614 13:14:03.196099 55528 BaseProcessor-inl.h:426] Put data error on meta server
E20230614 13:14:03.201380 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player error: E_LEADER_CHANGED
I20230614 13:14:03.501564 55528 BaseProcessor-inl.h:426] Put data error on meta server

There is also this:

E20230614 13:05:51.635138 55528 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
E20230614 13:05:51.654628 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player error: E_LEADER_CHANGED
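
To see whether the errors cluster around particular times, the log lines above can be tallied per minute. The following is only a minimal sketch: it assumes the standard glog line prefix seen in the excerpts (severity letter, date, time, thread id, source file and line), and the function name `count_leader_changed` is made up for illustration.

```python
import re
from collections import Counter

# glog prefix as seen in the excerpts above:
#   <severity><YYYYMMDD> <HH:MM:SS.micros> <thread id> <file:line>] <message>
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{8}) (?P<minute>\d{2}:\d{2}):\d{2}\.\d+\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def count_leader_changed(lines):
    """Count error-level lines mentioning E_LEADER_CHANGED per (date, HH:MM)."""
    buckets = Counter()
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if m and m.group("sev") == "E" and "E_LEADER_CHANGED" in m.group("msg"):
            buckets[(m.group("date"), m.group("minute"))] += 1
    return buckets


# Sample lines copied from the logs posted in this thread.
sample = [
    "I20230614 13:14:04.771785 55528 BaseProcessor-inl.h:426] Put data error on meta server",
    "E20230614 13:14:04.777541 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player error: E_LEADER_CHANGED",
    "I20230614 13:14:29.000424 55335 FileBasedWal.cpp:673] [Port: 9560, Space: 0, Part: 0] Clean wals number 12",
    "E20230614 14:04:28.032284 55528 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName player error: E_LEADER_CHANGED",
]

for (date, minute), n in sorted(count_leader_changed(sample).items()):
    print(date, minute, n)
```

To run it against a real log, pass an open file handle instead of `sample`, for example `count_leader_changed(open("nebula-metad.INFO"))`. Comparing the resulting minutes with the network traffic graph may help show whether the two coincide.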

The contents of data/meta/nebula/0/data/LOG are as follows:

** File Read Latency Histogram By Level [default] **
2023/06/14-13:03:02.921807 7f15dd9f7700 [db/db_impl/db_impl.cc:1004] ------- DUMPING STATS -------
2023/06/14-13:03:02.921854 7f15dd9f7700 [db/db_impl/db_impl.cc:1006] 
** DB Stats **
Uptime(secs): 8026220.5 total, 600.0 interval
Cumulative writes: 27M writes, 9126M keys, 27M commit groups, 1.0 writes per commit group, ingest: 1077.94 GB, 0.14 MB/s
Cumulative WAL: 27M writes, 0 syncs, 27190338.00 writes per sync, written: 1077.94 GB, 0.14 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 1606 writes, 280 keys, 1606 commit groups, 1.0 writes per commit group, ingest: 32.01 MB, 0.05 MB/s
Interval WAL: 1606 writes, 0 syncs, 1606.00 writes per sync, written: 0.03 GB, 0.05 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 L0      3/0   757.53 KB   0.8      0.0     0.0      0.0      16.7     16.7       0.0   1.0      0.0      9.3   1845.20           1477.43     20752    0.089       0      0       0.0       0.0
 L1      1/0   407.95 KB   0.0     21.4    16.7      4.7       4.7     -0.0       0.0   0.3     62.1     13.6    352.56            258.87      5188    0.068    556M   436M       0.0       0.0
  Sum      4/0    1.14 MB   0.0     21.4    16.7      4.7      21.4     16.7       0.0   1.3     10.0     10.0   2197.76           1736.30     25940    0.085    556M   436M       0.0       0.0
Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0

** Compaction Stats [default] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Low      0/0    0.00 KB   0.0     21.4    16.7      4.7       4.7     -0.0       0.0   0.0     62.1     13.6    352.56            258.87      5188    0.068    556M   436M       0.0       0.0
High      0/0    0.00 KB   0.0      0.0     0.0      0.0      16.7     16.7       0.0   0.0      0.0      9.3   1845.16           1477.43     20751    0.089       0      0       0.0       0.0
User      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     92.4      0.04              0.00         1    0.039       0      0       0.0       0.0
Blob file count: 0, total size: 0.0 GB

Uptime(secs): 8026220.5 total, 600.0 interval
Flush(GB): cumulative 16.713, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 21.38 GB write, 0.00 MB/s write, 21.39 GB read, 0.00 MB/s read, 2197.8 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count):0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
Block cache LRUCache@0x7f161021dc30#55285 capacity: 1.00 GB collections: 13378 last_copies: 0 last_secs: 0.000752 secs_since: 0
Block cache entry stats(count,size,portion): DataBlock(8,32.62 KB,0.00311136%) Misc(1,0.00 KB,0%)
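
A quick way to check a dump like the one above for write stalls is to extract the "stall" lines. This is a rough sketch that assumes the line format shown in this particular LOG excerpt; the helper name `stall_percentages` is hypothetical.

```python
import re

# Matches lines such as:
#   Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
STALL_RE = re.compile(
    r"^(?P<scope>Cumulative|Interval) stall: (?P<hms>\S+) H:M:S, (?P<pct>[\d.]+) percent"
)


def stall_percentages(lines):
    """Return (scope, percent) pairs for every stall line found."""
    found = []
    for line in lines:
        m = STALL_RE.match(line.strip())
        if m:
            found.append((m.group("scope"), float(m.group("pct"))))
    return found


# Sample lines copied from the LOG excerpt above.
sample = [
    "Cumulative WAL: 27M writes, 0 syncs, 27190338.00 writes per sync, written: 1077.94 GB, 0.14 MB/s",
    "Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent",
    "Interval writes: 1606 writes, 280 keys, 1606 commit groups, 1.0 writes per commit group, ingest: 32.01 MB, 0.05 MB/s",
    "Interval stall: 00:00:0.000 H:M:S, 0.0 percent",
]

print(stall_percentages(sample))
```

For the excerpt posted here, both the cumulative and the interval stall values are 0.0 percent, which matches the all-zero `Stalls(count)` line in the same dump.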
