- Nebula version: the latest version on GitHub
- Deployment (distributed): cluster deployment on 3 machines
- Hardware:
- Disk: NVMe SSD, 3.2 TB
- Memory: 256 GB
(root@nebula) [(none)]> describe space push_new
+----+------------+------------------+----------------+---------+------------+--------------------+-----------+
| ID | Name | Partition Number | Replica Factor | Charset | Collate | Vid Type | Group |
+----+------------+------------------+----------------+---------+------------+--------------------+-----------+
| 2 | "push_new" | 30 | 3 | "utf8" | "utf8_bin" | "FIXED_STRING(24)" | "default" |
+----+------------+------------------+----------------+---------+------------+--------------------+-----------+
Got 1 rows (time spent 1050/1298 us)
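- Problem description:
During a stress test, the leaders of the RaftPart partitions changed on their own, even though no process crashed or dumped core. Is there a parameter that controls this behavior? Yesterday every machine had a leader count of 20. Today the distribution is different: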
(root@nebula) [(none)]> show hosts
+------------------+-------+----------+--------------+------------------------------+------------------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+------------------+-------+----------+--------------+------------------------------+------------------------------+
| "xxx.xxx.xxx.xxx" | 44500 | "ONLINE" | 16 | "push_new:4, push_space:12" | "push_new:30, push_space:30" |
+------------------+-------+----------+--------------+------------------------------+------------------------------+
| "xxx.xxx.xxx.xxx" | 44500 | "ONLINE" | 15 | "push_new:10, push_space:5" | "push_new:30, push_space:30" |
+------------------+-------+----------+--------------+------------------------------+------------------------------+
| "xxx.xxx.xxx.xxx" | 44500 | "ONLINE" | 29 | "push_new:16, push_space:13" | "push_new:30, push_space:30" |
+------------------+-------+----------+--------------+------------------------------+------------------------------+
| "Total" | | | 60 | "push_new:30, push_space:30" | "push_new:90, push_space:90" |
+------------------+-------+----------+--------------+------------------------------+------------------------------+
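To quantify how far the leaders have drifted from the even 20-per-host split, here is a small Python sketch. It is not a Nebula tool. The function names are hypothetical, and the input is simply the "Leader distribution" column copied from the `show hosts` output above, with the host IPs masked as in the post.

```python
from collections import Counter

# "Leader distribution" column copied from the `show hosts` output above.
LEADER_DISTRIBUTION = [
    "push_new:4, push_space:12",
    "push_new:10, push_space:5",
    "push_new:16, push_space:13",
]


def parse_distribution(cell: str) -> dict:
    """Turn a cell like 'push_new:4, push_space:12' into {'push_new': 4, 'push_space': 12}."""
    result = {}
    for item in cell.split(","):
        space, count = item.strip().split(":")
        result[space] = int(count)
    return result


def leader_skew(cells: list) -> list:
    """Return each host's total leader count minus the even share (total leaders / hosts)."""
    per_host = [sum(parse_distribution(c).values()) for c in cells]
    even_share = sum(per_host) / len(per_host)
    return [n - even_share for n in per_host]


def total_per_space(cells: list) -> dict:
    """Sum leaders per space across hosts; should match the 'Total' row."""
    total = Counter()
    for c in cells:
        total.update(parse_distribution(c))
    return dict(total)


if __name__ == "__main__":
    print(leader_skew(LEADER_DISTRIBUTION))      # [-4.0, -5.0, 9.0]
    print(total_per_space(LEADER_DISTRIBUTION))  # {'push_new': 30, 'push_space': 30}
```

The per-space totals still add up to 30 leaders for each space. Only their placement across the three hosts has changed.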
The storaged logs also contain errors, which are presumably caused by the leader changes:
E0127 22:02:59.256729 31565 RaftPart.cpp:367] [Port: 44501, Space: 2, Part: 20] The partition is not a leader
E0127 22:02:59.256877 31565 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 20] Cannot append logs, clean the buffer
E0127 22:02:59.679719 31568 RaftPart.cpp:367] [Port: 44501, Space: 2, Part: 19] The partition is not a leader
E0127 22:02:59.679809 31568 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 19] Cannot append logs, clean the buffer
E0127 22:02:59.680357 31581 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 19] Cannot append logs, clean the buffer
E0127 22:02:59.680359 31552 RaftPart.cpp:367] [Port: 44501, Space: 2, Part: 18] The partition is not a leader
E0127 22:02:59.680780 31552 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 6] Cannot append logs, clean the buffer
E0127 22:02:59.680786 31561 RaftPart.cpp:367] [Port: 44501, Space: 2, Part: 1] The partition is not a leader
E0127 22:02:59.681671 31545 RaftPart.cpp:367] [Port: 44501, Space: 2, Part: 18] The partition is not a leader
E0127 22:02:59.681681 31570 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 19] Cannot append logs, clean the buffer
E0127 22:02:59.681684 31575 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 6] Cannot append logs, clean the buffer
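To see which partitions are affected and how often, the error lines can be grouped per (space, partition, message). The following Python sketch assumes only the log line format shown above. The regex and function name are hypothetical and are not part of Nebula. The sample reuses a few of the lines above.

```python
import re
from collections import Counter

# Matches the RaftPart error format seen above, e.g.
# "... RaftPart.cpp:367] [Port: 44501, Space: 2, Part: 20] The partition is not a leader"
LOG_PATTERN = re.compile(
    r"RaftPart\.cpp:\d+\] \[Port: (?P<port>\d+), Space: (?P<space>\d+), "
    r"Part: (?P<part>\d+)\] (?P<msg>.+)$"
)

SAMPLE_LINES = [
    "E0127 22:02:59.256729 31565 RaftPart.cpp:367] [Port: 44501, Space: 2, Part: 20] The partition is not a leader",
    "E0127 22:02:59.256877 31565 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 20] Cannot append logs, clean the buffer",
    "E0127 22:02:59.679719 31568 RaftPart.cpp:367] [Port: 44501, Space: 2, Part: 19] The partition is not a leader",
    "E0127 22:02:59.679809 31568 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 19] Cannot append logs, clean the buffer",
    "E0127 22:02:59.680357 31581 RaftPart.cpp:687] [Port: 44501, Space: 2, Part: 19] Cannot append logs, clean the buffer",
]


def errors_per_partition(lines):
    """Count log lines per (space, part, message); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = LOG_PATTERN.search(line)
        if m:
            counts[(int(m["space"]), int(m["part"]), m["msg"].strip())] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(errors_per_partition(SAMPLE_LINES).items()):
        print(key, n)
```

Running it over the full storaged log, rather than this sample, would show whether the errors cluster around the moments when leadership moved.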