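Question template:
Nebula version: v2.6.1
Deployment: distributed, 3 meta + 6 graph + 6 storage
Installation method: RPM
Production environment: Y
Hardware information
Disk (SSD recommended): SSD
CPU and memory information
Detailed description of the problem
Relevant meta / storage / graph info logs (preferably as text, so they are searchable)
Actions taken: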
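On Thursday, a service in the test environment used the guest account "a", but that account had not been granted access to the corresponding space. This produced a large number of "read space no permission" errors and a small number of "create session error leader changed" errors. (This guest account had previously worked normally.)
Once we realized that the "no permission" errors came from the missing grant, a large number of
"create session error leader changed"
errors appeared, as shown in the screenshot below.
At first I assumed the meta service had an authorization problem, so I restarted the services that evening. After that, the storaged service could no longer start normally.
Below are the logs from node 1: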
meta info log
meta error log
storaged info
storaged err
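First you need to get the 3-replica meta cluster into a healthy state. Right now it has no leader.
Start only meta first, and start the other services once the log shows that a leader has been elected.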
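The "start meta first, then wait for a leader" step can be checked from the metad INFO log rather than by eye. Below is a minimal Python sketch under stated assumptions: the function name find_leader_election is hypothetical, and the search phrase "elected as the leader" is an assumption about the log wording that should be verified against your own metad logs.

```python
import re

# Assumed phrase; verify it against the wording in your own metad INFO log.
LEADER_PATTERN = re.compile(r"elected as the leader", re.IGNORECASE)


def find_leader_election(lines, pattern=LEADER_PATTERN):
    """Return the last line matching the pattern, or None if there is none.

    `lines` can be an open log file or any iterable of strings.
    """
    last = None
    for line in lines:
        if pattern.search(line):
            last = line.rstrip("\n")
    return last
```

To use it, open the newest metad INFO file on each meta host and pass the file object to find_leader_election. A match only shows that an election happened at some point; later E_LEADER_CHANGED errors can still mean the leader has moved since.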
There are errors:
But the info log seems to show that it succeeded. Does that count?
Node 1
Node 2
Node 3
spw
February 19, 2022, 08:41
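How is it going now? Please share more of the meta logs; I cannot tell from these whether it started successfully. Also, text logs would be best.
Let me see whether I can copy them. I am working remotely from home and copy/paste is disabled…
Two hours later I started graphd and storaged on the hosts where meta runs, and they still do not work.
The meta error log shows: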
Log file created at: 2022/02/19 11:41:54
Running on machine: nebula-server-prod-01
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0219 11:41:54.369613 22660 RaftPart.cpp:1050] [Port: 9560, Space: 0, Part: 0] Receive response about askForVote from "10.0.7.205":9560, error code is E_TERM_OUT_OF_DATE
E0219 16:20:17.298580 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:20:22.936306 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:20:22.957576 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:26:37.991048 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:28:03.037576 22743 FTServiceProcessor.cpp:63] List fulltext failed, error: E_LEADER_CHANGED
E0219 16:30:15.057971 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:31:09.541867 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:31:17.061564 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:31:52.440718 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:32:20.555871 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:42:07.907642 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:48:31.397135 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:51:01.723522 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:54:13.709414 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:54:58.461683 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
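Because the log header states the line format ([IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg), an excerpt like the one above can be summarized mechanically instead of being read line by line. The following is a minimal Python sketch; the function name summarize and both regular expressions are my own illustration, not part of Nebula.

```python
import re
from collections import Counter

# Matches the glog layout stated in the log header:
# [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\s:]+:\d+)\] (?P<msg>.*)$"
)
# Error codes in these messages look like E_LEADER_CHANGED.
ERROR_CODE = re.compile(r"\bE_[A-Z_]+\b")


def summarize(lines):
    """Count (source location, error code) pairs among E-level lines."""
    counts = Counter()
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if not m or m.group("level") != "E":
            continue  # skip headers and non-error lines
        code = ERROR_CODE.search(m.group("msg"))
        counts[(m.group("src"), code.group(0) if code else "?")] += 1
    return counts
```

Applied to the 16 error lines above, this groups them into one E_TERM_OUT_OF_DATE from RaftPart.cpp:1050, fourteen E_LEADER_CHANGED from ActiveHostsMan.cpp:246, and one E_LEADER_CHANGED from FTServiceProcessor.cpp:63.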
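Please post the INFO logs from all 3 meta nodes. From what I can see there is a leader now, but the heartbeats sent by storage either are not reaching meta or are not being sent to the meta leader.
For example, in the node 1 storage log you posted, the requests all timed out (the "Timed Out" entries in the storage log). Check the network carefully: firewalls and the like.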
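A quick way to rule out a firewall or routing problem is to try a plain TCP connection from each storage host to each meta address. Below is a minimal Python sketch using only the standard library; the function names can_connect and check_endpoints are hypothetical, and the endpoints to test must come from your own configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it is reachable."""
    return {ep: can_connect(ep[0], ep[1], timeout) for ep in endpoints}
```

For example, check_endpoints([("10.0.7.205", 9560)]) tests the raft address that appears in the meta error log. A successful connect only shows that the port is reachable; it does not prove that heartbeats are being routed to the current meta leader.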
Reid00
February 19, 2022, 11:52
Node 2: there are many INFO log files of 1 GB each
nebula-metad.nebula-server-prod-03.root.log.ERROR.20220219-114211.6430 (1.7 KB)
nebula-metad.nebula-server-prod-03.root.log.INFO.20220219-114210.6430 (7.1 KB)
nebula-metad.nebula-server-prod-03.root.log.WARNING.20220219-114211.6430 (1.9 KB)
They all look like this:
[root@nebula-server-prod-02 logs]# head -n 50 nebula-metad.nebula-server-prod-02.root.log.INFO.20220219-162219.25800
Log file created at: 2022/02/19 16:22:19
Running on machine: nebula-server-prod-02
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0219 16:22:19.623003 25906 SessionManagerProcessor.cpp:131] resp list session: 1638930566301495
I0219 16:22:19.623008 25909 SessionManagerProcessor.cpp:131] resp list session: 1645165727982404
I0219 16:22:19.623014 25905 SessionManagerProcessor.cpp:131] resp list session: 1645101817595442
I0219 16:22:19.623023 25907 SessionManagerProcessor.cpp:131] resp list session: 1645107236869175
I0219 16:22:19.623028 25911 SessionManagerProcessor.cpp:131] resp list session: 1645095833434788
I0219 16:22:19.623255 25911 SessionManagerProcessor.cpp:131] resp list session: 1638344329591460
I0219 16:22:19.623261 25911 SessionManagerProcessor.cpp:131] resp list session: 1645131804113572
I0219 16:22:19.623052 25914 SessionManagerProcessor.cpp:131] resp list session: 1645153300472547
I0219 16:22:19.623059 25904 SessionManagerProcessor.cpp:131] resp list session: 1638521605981233
I0219 16:22:19.623064 25912 SessionManagerProcessor.cpp:131] resp list session: 1645111249910197
I0219 16:22:19.623071 25908 SessionManagerProcessor.cpp:131] resp list session: 1645149749199682
I0219 16:22:19.623080 25915 SessionManagerProcessor.cpp:131] resp list session: 1645108418328815
I0219 16:22:19.623219 25906 SessionManagerProcessor.cpp:131] resp list session: 1639037722838839
I0219 16:22:19.623229 25909 SessionManagerProcessor.cpp:131] resp list session: 1645090734547780
I0219 16:22:19.623239 25905 SessionManagerProcessor.cpp:131] resp list session: 1645156846994994
I0219 16:22:19.623250 25907 SessionManagerProcessor.cpp:131] resp list session: 1645186241565751
I0219 16:22:19.623034 25913 SessionManagerProcessor.cpp:131] resp list session: 1645096393388470
I0219 16:22:19.623044 25910 SessionManagerProcessor.cpp:131] resp list session: 1645084604167844
I0219 16:22:19.623266 25911 SessionManagerProcessor.cpp:131] resp list session: 1645094944635556
I0219 16:22:19.623276 25914 SessionManagerProcessor.cpp:131] resp list session: 1645177409463011
I0219 16:22:19.623282 25904 SessionManagerProcessor.cpp:131] resp list session: 1645113055511601
I0219 16:22:19.623288 25912 SessionManagerProcessor.cpp:131] resp list session: 1645085548853685
I0219 16:22:19.623296 25908 SessionManagerProcessor.cpp:131] resp list session: 1645145923994434
I0219 16:22:19.623306 25915 SessionManagerProcessor.cpp:131] resp list session: 1645099845171439
I0219 16:22:19.623311 25906 SessionManagerProcessor.cpp:131] resp list session: 1645119245797175
I0219 16:22:19.623317 25909 SessionManagerProcessor.cpp:131] resp list session: 1641889860035396
I0219 16:22:19.623323 25905 SessionManagerProcessor.cpp:131] resp list session: 1645104451880498
I0219 16:22:19.623330 25907 SessionManagerProcessor.cpp:131] resp list session: 1645170957718583
I0219 16:22:19.623335 25913 SessionManagerProcessor.cpp:131] resp list session: 1645113238827446
I0219 16:22:19.623344 25910 SessionManagerProcessor.cpp:131] resp list session: 1645153324234404
I0219 16:22:19.623349 25911 SessionManagerProcessor.cpp:131] resp list session: 1645174602988196
I0219 16:22:19.623356 25914 SessionManagerProcessor.cpp:131] resp list session: 1641311586706147
I0219 16:22:19.623363 25904 SessionManagerProcessor.cpp:131] resp list session: 1644918725477425
I0219 16:22:19.623368 25912 SessionManagerProcessor.cpp:131] resp list session: 1645165811251637
I0219 16:22:19.623375 25908 SessionManagerProcessor.cpp:131] resp list session: 1642498177581890
I0219 16:22:19.623381 25915 SessionManagerProcessor.cpp:131] resp list session: 1645098520492271
I0219 16:22:19.623386 25906 SessionManagerProcessor.cpp:131] resp list session: 1645141609891639
I0219 16:22:19.623392 25909 SessionManagerProcessor.cpp:131] resp list session: 1645084863112004
I0219 16:22:19.623399 25905 SessionManagerProcessor.cpp:131] resp list session: 1645116430878258
I0219 16:22:19.623405 25907 SessionManagerProcessor.cpp:131] resp list session: 1645089772770359
I0219 16:22:19.623411 25913 SessionManagerProcessor.cpp:131] resp list session: 1645154410181046
I0219 16:22:19.623417 25910 SessionManagerProcessor.cpp:131] resp list session: 1645124350178980
I0219 16:22:19.623423 25911 SessionManagerProcessor.cpp:131] resp list session: 1645089425259172
I0219 16:22:19.623430 25914 SessionManagerProcessor.cpp:131] resp list session: 1645100335260387
I0219 16:22:19.623436 25904 SessionManagerProcessor.cpp:131] resp list session: 1645093594465329
I0219 16:22:19.623442 25912 SessionManagerProcessor.cpp:131] resp list session: 1645096404826549
I0219 16:22:19.623450 25908 SessionManagerProcessor.cpp:131] resp list session: 1645117218636610
I0219 16:59:12.436491 25912 SessionManagerProcessor.cpp:131] resp list session: 1645148060845784
I0219 16:59:12.436496 25913 SessionManagerProcessor.cpp:131] resp list session: 1645090815046866
I0219 16:59:12.436502 25908 SessionManagerProcessor.cpp:131] resp list session: 1645119606519974
I0219 16:59:12.436511 25907 SessionManagerProcessor.cpp:131] resp list session: 1645139276281759
I0219 16:59:12.436517 25909 SessionManagerProcessor.cpp:131] resp list session: 1645101953281707
I0219 16:59:12.436522 25915 SessionManagerProcessor.cpp:131] resp list session: 1645107632198670
I0219 16:59:12.436529 25905 SessionManagerProcessor.cpp:131] resp list session: 1645156872040811
I0219 16:59:12.436535 25903 SessionManagerProcessor.cpp:131] resp list session: 1640574639094085
I0219 16:59:12.436542 25911 SessionManagerProcessor.cpp:131] resp list session: 1645084128541681
I0219 16:59:12.436547 25900 SessionManagerProcessor.cpp:131] resp list session: 1645091543042570
I0219 16:59:12.436553 25914 SessionManagerProcessor.cpp:131] resp list session: 1638930265437206
I0219 16:59:12.436560 25902 SessionManagerProcessor.cpp:131] resp list session: 1645171079530297
I0219 16:59:12.436568 25906 SessionManagerProcessor.cpp:131] resp list session: 1645154098420340
I0219 16:59:12.436573 25904 SessionManagerProcessor.cpp:131] resp list session: 1645111350619771
I0219 16:59:12.436580 25901 SessionManagerProcessor.cpp:131] resp list session: 1645118056401221
I0219 16:59:12.436585 25912 SessionManagerProcessor.cpp:131] resp list session: 1645170224858840
I0219 16:59:12.436591 25913 SessionManagerProcessor.cpp:131] resp list session: 1645129487643858
I0219 16:59:12.436597 25908 SessionManagerProcessor.cpp:131] resp list session: 1645156466129062
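Node 3: please see the attachments
How are you using sessions? You are not creating one per query, are you? My feeling is that the guest account "a" is simply being used the wrong way…
If the logs are large, you can use "grep -v" to filter out the lines containing
"Receive heartbeat from", then check whether there are any errors nearby.
If the traffic can be stopped, stop all of it. First get the services back to normal.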
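The filtering suggested above (drop the "Receive heartbeat from" lines, then look around whatever errors remain) can also be done in a few lines of Python when the log is too large to read comfortably. This is a minimal sketch; the function name errors_with_context and the context parameter are hypothetical.

```python
import re

# W/E/F-level lines in the glog layout, e.g. "E0219 16:20:17.298580 ...".
PROBLEM_LINE = re.compile(r"^[WEF]\d{4} ")


def errors_with_context(lines, noise="Receive heartbeat from", context=2):
    """Drop lines containing `noise`, then return every W/E/F line
    together with `context` surviving lines before and after it."""
    kept = [l.rstrip("\n") for l in lines if noise not in l]
    wanted = set()
    for i, line in enumerate(kept):
        if PROBLEM_LINE.match(line):
            lo = max(0, i - context)
            hi = min(len(kept), i + context + 1)
            wanted.update(range(lo, hi))
    return [kept[i] for i in sorted(wanted)]
```

Pass an open log file as lines. The context is counted after the noise lines are removed, so the neighbours shown are the nearest lines that are not heartbeats.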
Reid00
February 19, 2022, 12:59
critical27:
If the traffic can be stopped, stop all of it. First get the serv…
a.log (144.4 KB)
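The traffic has already been stopped and all services are down, so these sessions are strange. I have not been able to track down where they come from.
This log is from this morning. I could not find the one for the corresponding time.
I will redo the procedure: stop all meta services, restart them, and post the corresponding logs.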
Reid00
February 19, 2022, 13:02
As for sessions, it is probably one per query: each query gets a session from the pool and then releases it.
Reid00
February 19, 2022, 14:07
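After checking, I found that one incremental job had restarted automatically and had not actually been stopped. Everything is stopped now.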