Balance leader not working in version 2.5

It worked fine on 2.0.1. After upgrading to 2.5.1 I cleared all the data, and balance leader still doesn't work.

Please post a screenshot of SHOW HOSTS.
Hmm, most likely your configuration is still wrong.

Is this error still showing up?

Yes, and there are some other errors as well:
E1028 10:34:17.735702 12478 MetaClient.cpp:635] Send request to "172.19.143.225":9559, exceed retry limit
E1028 10:34:17.736052 12437 MetaClient.cpp:65] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1028 10:35:19.389958 12642 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 16] Receive response about askForVote from "172.19.143.225":9780, error code is E_TERM_OUT_OF_DATE
E1028 10:35:19.390007 12642 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 16] Receive response about askForVote from "172.19.143.226":9780, error code is E_TERM_OUT_OF_DATE

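Hmm, are you with the same people as in this thread? Storaged process fails to start - #7 by fengliang
There is definitely a network problem. Check whether the ports are already in use.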

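No, we're not together. Querying status here shows everything as normal, and this time thousands of new E_TERM_OUT_OF_DATE errors have appeared. They weren't there earlier, when I had the master misconfigured.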

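Try this: stop everything, back up the data_path/nebula/spaceId/wal directory under the data_path of every storage host (each space has its own), then delete those directories, restart, and see whether that works.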
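The WAL backup step above can be sketched as a small script to run on each storage host while every nebula-storaged process is stopped. This is a minimal sketch, not an official tool: `backup_and_remove_wal` is a hypothetical helper, `data_path` is assumed to be the single directory set by `--data_path` in nebula-storaged.conf (loop over each entry if you configured several), and `backup_root` is any location you choose outside data_path.

```python
import shutil
from pathlib import Path


def backup_and_remove_wal(data_path: str, backup_root: str) -> list[str]:
    """Copy each <data_path>/nebula/<spaceId>/wal to <backup_root>/<spaceId>/wal,
    then delete the original WAL directory.

    Hypothetical helper; run only while nebula-storaged is stopped.
    Returns the space IDs whose WAL was moved.
    """
    nebula_dir = Path(data_path) / "nebula"
    moved = []
    for space_dir in sorted(nebula_dir.iterdir()):
        wal_dir = space_dir / "wal"
        if not wal_dir.is_dir():
            continue  # not a space directory, or this space has no WAL
        target = Path(backup_root) / space_dir.name / "wal"
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(wal_dir, target)  # raises if a previous backup is already there
        shutil.rmtree(wal_dir)  # only reached after the copy succeeded
        moved.append(space_dir.name)
    return moved
```

Copying before deleting means a failed copy (for example, a full disk) aborts with an exception and leaves the original WAL in place.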

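That said, I'd still recommend checking the network first, because there is definitely a network problem; even if you do the above, the problem will remain.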

The Heartbeat failed log should not appear at all. Check with netstat or ss.
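Besides netstat/ss, a quick way to reproduce what the heartbeat does is to open a plain TCP connection from the storage host that logged Heartbeat failed to the meta address it complained about. A minimal sketch; `can_connect` is a hypothetical helper, and the host and port are assumed to come from your meta_server_addrs setting.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A refused connection (errno 111, as in the Heartbeat failed log),
    a timeout, or an unresolvable host all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts, DNS errors
        return False
```

For example, `can_connect("172.19.143.225", 9559)` returning False on that host would match the "Connection refused" in the log.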

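That doesn't help. I already tried clearing the data last time and it didn't work. I also have another cluster that uses completely different ports, and it reports exactly the same errors.
This is the log from the other cluster.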

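Then you need to check the configuration and the network; something must be wrong... either a port is misconfigured, duplicated, or already occupied...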

Port 9780?

Could you list the hidden ports Nebula uses? The ports in my config files all seem fine.

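storage occupies 4 ports in total: the configured port, port + 1, port - 1, and port - 2
meta should occupy two: the configured port and port + 1
graph occupies only the configured port
This should be in the documentation.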
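The port rule in the reply above can be turned into a quick conflict check. A minimal sketch that simply encodes the offsets as stated in the reply; `nebula_ports` and `port_in_use` are hypothetical helpers, and ports configured separately in the config files (such as the 19xxx HTTP ports tested later in the thread) are not derived here. Run the check while the Nebula services on that host are stopped, otherwise their own ports naturally show as in use.

```python
import socket

# Offsets from the configured port, as described in the reply above.
OFFSETS = {
    "storage": [0, 1, -1, -2],
    "meta": [0, 1],
    "graph": [0],
}


def nebula_ports(role: str, port: int) -> list[int]:
    """Ports a service of the given role is said to occupy."""
    return [port + off for off in OFFSETS[role]]


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails, i.e. something else holds it.

    No SO_REUSEADDR is set, so on Linux a port with lingering
    TIME_WAIT connections may also be reported as in use.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False
```

For a storage service on 9779 this checks 9779, 9780, 9778 and 9777, which includes the 9780 raft port that appears in the logs.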

The ports all look normal.

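I upgraded directly from 2.0.1 to 2.5.1 without changing any configuration files, and balance leader worked before the upgrade. I just tested ports 19559, 19669, 19779, and 19780, and they are fine too.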

The log on one storage host is different from the other services:

E1028 10:34:44.355659 24824 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 5] Receive response about askForVote from "172.19.143.226":9780, error code is E_UNKNOWN_PART
E1028 10:34:44.429636 24824 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 9] Receive response about askForVote from "172.19.143.226":9780, error code is E_UNKNOWN_PART
E1028 10:34:44.587910 24823 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 4] Receive response about askForVote from "172.19.143.226":9780, error code is E_UNKNOWN_PART
E1028 10:34:44.886487 24822 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 20] Receive response about askForVote from "172.19.143.226":9780, error code is E_UNKNOWN_PART
E1028 10:34:44.900859 24824 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 1] Receive response about askForVote from "172.19.143.226":9780, error code is E_UNKNOWN_PART
E1028 10:35:05.969920 24823 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 15] Receive response about askForVote from "172.19.143.227":9780, error code is E_UNKNOWN_PART
E1028 10:35:06.330148 24817 Host.cpp:375] [Port: 9780, Space: 90, Part: 5] [Host: 172.19.143.227:9780] Failed to append logs to the host (Err: E_UNKNOWN_PART)
E1028 10:35:06.332677 24824 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 19] Receive response about askForVote from "172.19.143.227":9780, error code is E_UNKNOWN_PART
E1028 10:35:06.446908 24824 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 11] Receive response about askForVote from "172.19.143.227":9780, error code is E_UNKNOWN_PART
E1028 10:35:07.217571 24824 RaftPart.cpp:1118] [Port: 9780, Space: 90, Part: 15] Receive response about askForVote from "172.19.143.227":9780, error code is E_UNKNOWN_PART
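With thousands of E_TERM_OUT_OF_DATE and E_UNKNOWN_PART lines, it can help to tally the askForVote errors by peer host and error code, so you can see which host is rejecting votes and why. A minimal sketch that assumes the line layout shown in these logs; `tally_vote_errors` is a hypothetical helper, and the regex accepts both straight and curly quotes around the host because forum copies of the log may contain either.

```python
import re
from collections import Counter

# Matches the RaftPart askForVote error lines quoted in this thread.
VOTE_ERR = re.compile(
    r'\[Port: (?P<port>\d+), Space: (?P<space>\d+), Part: (?P<part>\d+)\] '
    r'Receive response about askForVote from ["“](?P<host>[\d.]+)["”]:(?P<peer_port>\d+), '
    r'error code is (?P<code>E_[A-Z_]+)'
)


def tally_vote_errors(lines):
    """Count askForVote errors by (peer host, error code); other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = VOTE_ERR.search(line)
        if m:
            counts[(m["host"], m["code"])] += 1
    return counts
```

Usage, assuming the storaged error log is at the path you pass in (the file name below is an assumption; point it at your own log directory): `with open("logs/nebula-storaged.ERROR") as f: print(tally_vote_errors(f).most_common())`.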

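Can you still roll back to 2.0.1? Did post #6 mean you changed the configuration? Find the machine that reports Heartbeat failed, then run netstat and grep for the corresponding meta port to see the connection state. The log already says the connection was not established, so solve that first.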
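The "netstat and grep for the meta port" step can also be done by parsing the output, which makes it easy to spot connections stuck in a state other than ESTABLISHED. A minimal sketch, assuming the Linux net-tools `netstat -tan` layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and the default meta port 9559; `meta_connections` is a hypothetical helper.

```python
def meta_connections(netstat_output: str, meta_port: int = 9559):
    """Return (local, foreign, state) for TCP rows whose foreign port is meta_port.

    Expects text printed by `netstat -tan`; header and non-TCP lines are skipped.
    """
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP lines
        local, foreign, state = fields[3], fields[4], fields[5]
        if foreign.rsplit(":", 1)[-1] == str(meta_port):
            rows.append((local, foreign, state))
    return rows
```

One way to feed it on the affected host: `meta_connections(subprocess.run(["netstat", "-tan"], capture_output=True, text=True).stdout)`. Rows in SYN_SENT, or no rows at all toward the meta address named in the log, would point at the connection problem the log describes.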

I didn't change the configuration. Although the log reports errors, netstat shows every connection in the ESTABLISHED state.

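Please follow the method critical 27 described above, then post screenshots of the relevant results - -. Otherwise the issue is stuck here and we can't make any progress.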



It looks fine to me. The first image is the log and the second is the netstat output.