nebula storaged fails to start

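Question template:

  • nebula version: v2.6.1
  • Deployment: distributed, 3 meta + 6 graph + 6 storage
  • Installation method: RPM
  • Production environment: Y
  • Hardware
    • Disk (SSD recommended): SSD
    • CPU and memory
  • Detailed description of the problem
  • Relevant meta / storage / graph info logs (text preferred, so they are searchable)
    What I did:
    On Thursday, a service in the test environment used the guest a account, but that account had not been granted access to the corresponding space. This produced a large number of "read space no permission" errors and a small number of "create session error leader changed" errors. [This guest account previously worked and could serve requests.]
    I realized that "no permission" meant I had forgotten the grant. After that, a large number of
    create session error leader changed
    errors appeared, as shown in the screenshot below.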

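At first I thought the meta service had an authorization problem, so I restarted the services that evening. Since then, the storaged service has not been able to start normally.

Below are the logs from node 1
meta info log

meta error log

storaged info

storaged err

First you need to get the 3-replica meta cluster into a healthy state. Right now it has no leader.

How do I do that? I have already tried starting the three meta services on their own first.

Start meta first, and only start the other services once you see in the logs that a leader has been elected.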
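
The "wait until the log shows a leader" step can be scripted. Below is a minimal sketch, not an official tool: it polls a metad INFO log until a marker string appears. The marker text `elected as the leader` and the helper name `wait_for_marker` are assumptions made for illustration; replace the marker with whatever line your metad actually prints after a successful election.

```python
import time

# ASSUMPTION: substitute the exact text your metad INFO log prints
# when the meta partition elects a leader.
LEADER_MARKER = "elected as the leader"


def wait_for_marker(log_path, marker=LEADER_MARKER, timeout_s=60.0, poll_s=1.0):
    """Return True once `marker` appears in the file, False on timeout.

    The file is scanned at least once, even when timeout_s is 0.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with open(log_path, errors="replace") as fh:
                if any(marker in line for line in fh):
                    return True
        except FileNotFoundError:
            pass  # metad may not have created the log file yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Hypothetical usage (the path is a placeholder):
#   if wait_for_marker("/path/to/nebula-metad.INFO", timeout_s=120):
#       ...start storaged and graphd...
```

Each poll rescans the whole file, which is fine for the KB-sized metad logs attached here but would be slow on a multi-GB log.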

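OK, I'll try that first.

There are errors:

But judging from the info log, it seems to have succeeded. Does this count?
Node 1


Node 2

Node 3

And it keeps printing this log line

How is it now? Please share more of the meta logs; from these I can't really tell whether it started successfully. Also, text logs would be best.

I'll see whether there is a way to copy them. I'm working remotely from home and copy-paste is disabled…
Two hours later, starting graphd and storaged on the nodes that host meta still fails.
A meta error appeared: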

Log file created at: 2022/02/19 11:41:54
Running on machine: nebula-server-prod-01
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0219 11:41:54.369613 22660 RaftPart.cpp:1050] [Port: 9560, Space: 0, Part: 0] Receive response about askForVote from "10.0.7.205":9560, error code is E_TERM_OUT_OF_DATE
E0219 16:20:17.298580 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:20:22.936306 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:20:22.957576 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:26:37.991048 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:28:03.037576 22743 FTServiceProcessor.cpp:63] List fulltext failed, error: E_LEADER_CHANGED
E0219 16:30:15.057971 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:31:09.541867 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:31:17.061564 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:31:52.440718 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:32:20.555871 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:42:07.907642 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:48:31.397135 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:51:01.723522 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:54:13.709414 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
E0219 16:54:58.461683 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED
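
For a long ERROR log like the one above, it can help to collapse repeated lines into counts. The sketch below parses the glog header format stated in the file (`[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`) and tallies E/F lines by source location and message. The function name `summarize_errors` is an illustrative choice, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the glog line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]\s]+:\d+)\] (?P<msg>.*)$"
)


def summarize_errors(lines):
    """Count ERROR/FATAL lines, keyed by (file:line, message)."""
    counts = Counter()
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if m and m["level"] in ("E", "F"):
            counts[(m["source"], m["msg"])] += 1
    return counts


sample = [
    'E0219 11:41:54.369613 22660 RaftPart.cpp:1050] [Port: 9560, Space: 0, Part: 0] '
    'Receive response about askForVote from "10.0.7.205":9560, error code is E_TERM_OUT_OF_DATE',
    "E0219 16:20:17.298580 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED",
    "E0219 16:20:22.936306 22743 ActiveHostsMan.cpp:246] Get last update time failed, error: E_LEADER_CHANGED",
]

for (source, msg), n in summarize_errors(sample).most_common():
    print(f"{n:5d}  {source}  {msg}")
```

Non-matching lines (the three header lines at the top of each glog file, for example) are skipped silently.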

I've saved the meta log and storage log from node 1. Please take a look.
nebula-metad.nebula-server-prod-01.root.log.ERROR.20220219-114154.22619 (1.8 KB)
nebula-metad.nebula-server-prod-01.root.log.INFO.20220219-114143.22619 (12.9 KB)
nebula-metad.nebula-server-prod-01.root.log.WARNING.20220219-114154.22619 (2.1 KB)
nebula-storaged.nebula-server-prod-01.root.log.ERROR.20220219-162213.5633 (3.3 KB)
nebula-storaged.nebula-server-prod-01.root.log.INFO.20220219-162009.5633 (4.6 KB)
nebula-storaged.nebula-server-prod-01.root.log.ERROR.20220219-162213.5633 (3.3 KB)

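OK, we'll take a look.

Please post the INFO logs from all 3 metad instances. So far it looks like there is a leader, but the heartbeats sent by storage are either not being received by meta or not being sent to the meta leader.
For example, the node 1 storage log you posted is full of timeouts (the "Timed Out" entries in the storage log). Check the network carefully: firewalls and the like.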
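
One quick way to start the network check is a plain TCP connect from each storaged host to each metad endpoint. This is a minimal sketch using only the Python standard library. The address `10.0.7.205` and port `9560` are taken from the error log above; port `9559` is assumed to be metad's service port (adjust it to your `nebula-metad.conf`), and the helper names are illustrative.

```python
import socket


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout_s=3.0):
    """Map each (host, port) pair to its reachability."""
    return {ep: can_connect(ep[0], ep[1], timeout_s) for ep in endpoints}


# Hypothetical usage, run from a storaged host:
#   for ep, ok in check_endpoints([("10.0.7.205", 9559), ("10.0.7.205", 9560)]).items():
#       print(ep, "reachable" if ok else "UNREACHABLE")
```

A successful connect only shows that the port is open from this host. It does not prove that heartbeats reach the current meta leader, so it narrows the search rather than settling it.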

Node 2 has many 1 GB info logs
nebula-metad.nebula-server-prod-03.root.log.ERROR.20220219-114211.6430 (1.7 KB)
nebula-metad.nebula-server-prod-03.root.log.INFO.20220219-114210.6430 (7.1 KB)
nebula-metad.nebula-server-prod-03.root.log.WARNING.20220219-114211.6430 (1.9 KB)
They all look like this:

[root@nebula-server-prod-02 logs]# head -n 50 nebula-metad.nebula-server-prod-02.root.log.INFO.20220219-162219.25800
Log file created at: 2022/02/19 16:22:19
Running on machine: nebula-server-prod-02
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0219 16:22:19.623003 25906 SessionManagerProcessor.cpp:131] resp list session: 1638930566301495
I0219 16:22:19.623008 25909 SessionManagerProcessor.cpp:131] resp list session: 1645165727982404
I0219 16:22:19.623014 25905 SessionManagerProcessor.cpp:131] resp list session: 1645101817595442
I0219 16:22:19.623023 25907 SessionManagerProcessor.cpp:131] resp list session: 1645107236869175
I0219 16:22:19.623028 25911 SessionManagerProcessor.cpp:131] resp list session: 1645095833434788
I0219 16:22:19.623255 25911 SessionManagerProcessor.cpp:131] resp list session: 1638344329591460
I0219 16:22:19.623261 25911 SessionManagerProcessor.cpp:131] resp list session: 1645131804113572
I0219 16:22:19.623052 25914 SessionManagerProcessor.cpp:131] resp list session: 1645153300472547
I0219 16:22:19.623059 25904 SessionManagerProcessor.cpp:131] resp list session: 1638521605981233
I0219 16:22:19.623064 25912 SessionManagerProcessor.cpp:131] resp list session: 1645111249910197
I0219 16:22:19.623071 25908 SessionManagerProcessor.cpp:131] resp list session: 1645149749199682
I0219 16:22:19.623080 25915 SessionManagerProcessor.cpp:131] resp list session: 1645108418328815
I0219 16:22:19.623219 25906 SessionManagerProcessor.cpp:131] resp list session: 1639037722838839
I0219 16:22:19.623229 25909 SessionManagerProcessor.cpp:131] resp list session: 1645090734547780
I0219 16:22:19.623239 25905 SessionManagerProcessor.cpp:131] resp list session: 1645156846994994
I0219 16:22:19.623250 25907 SessionManagerProcessor.cpp:131] resp list session: 1645186241565751
I0219 16:22:19.623034 25913 SessionManagerProcessor.cpp:131] resp list session: 1645096393388470
I0219 16:22:19.623044 25910 SessionManagerProcessor.cpp:131] resp list session: 1645084604167844
I0219 16:22:19.623266 25911 SessionManagerProcessor.cpp:131] resp list session: 1645094944635556
I0219 16:22:19.623276 25914 SessionManagerProcessor.cpp:131] resp list session: 1645177409463011
I0219 16:22:19.623282 25904 SessionManagerProcessor.cpp:131] resp list session: 1645113055511601
I0219 16:22:19.623288 25912 SessionManagerProcessor.cpp:131] resp list session: 1645085548853685
I0219 16:22:19.623296 25908 SessionManagerProcessor.cpp:131] resp list session: 1645145923994434
I0219 16:22:19.623306 25915 SessionManagerProcessor.cpp:131] resp list session: 1645099845171439
I0219 16:22:19.623311 25906 SessionManagerProcessor.cpp:131] resp list session: 1645119245797175
I0219 16:22:19.623317 25909 SessionManagerProcessor.cpp:131] resp list session: 1641889860035396
I0219 16:22:19.623323 25905 SessionManagerProcessor.cpp:131] resp list session: 1645104451880498
I0219 16:22:19.623330 25907 SessionManagerProcessor.cpp:131] resp list session: 1645170957718583
I0219 16:22:19.623335 25913 SessionManagerProcessor.cpp:131] resp list session: 1645113238827446
I0219 16:22:19.623344 25910 SessionManagerProcessor.cpp:131] resp list session: 1645153324234404
I0219 16:22:19.623349 25911 SessionManagerProcessor.cpp:131] resp list session: 1645174602988196
I0219 16:22:19.623356 25914 SessionManagerProcessor.cpp:131] resp list session: 1641311586706147
I0219 16:22:19.623363 25904 SessionManagerProcessor.cpp:131] resp list session: 1644918725477425
I0219 16:22:19.623368 25912 SessionManagerProcessor.cpp:131] resp list session: 1645165811251637
I0219 16:22:19.623375 25908 SessionManagerProcessor.cpp:131] resp list session: 1642498177581890
I0219 16:22:19.623381 25915 SessionManagerProcessor.cpp:131] resp list session: 1645098520492271
I0219 16:22:19.623386 25906 SessionManagerProcessor.cpp:131] resp list session: 1645141609891639
I0219 16:22:19.623392 25909 SessionManagerProcessor.cpp:131] resp list session: 1645084863112004
I0219 16:22:19.623399 25905 SessionManagerProcessor.cpp:131] resp list session: 1645116430878258
I0219 16:22:19.623405 25907 SessionManagerProcessor.cpp:131] resp list session: 1645089772770359
I0219 16:22:19.623411 25913 SessionManagerProcessor.cpp:131] resp list session: 1645154410181046
I0219 16:22:19.623417 25910 SessionManagerProcessor.cpp:131] resp list session: 1645124350178980
I0219 16:22:19.623423 25911 SessionManagerProcessor.cpp:131] resp list session: 1645089425259172
I0219 16:22:19.623430 25914 SessionManagerProcessor.cpp:131] resp list session: 1645100335260387
I0219 16:22:19.623436 25904 SessionManagerProcessor.cpp:131] resp list session: 1645093594465329
I0219 16:22:19.623442 25912 SessionManagerProcessor.cpp:131] resp list session: 1645096404826549
I0219 16:22:19.623450 25908 SessionManagerProcessor.cpp:131] resp list session: 1645117218636610
I0219 16:59:12.436491 25912 SessionManagerProcessor.cpp:131] resp list session: 1645148060845784
I0219 16:59:12.436496 25913 SessionManagerProcessor.cpp:131] resp list session: 1645090815046866
I0219 16:59:12.436502 25908 SessionManagerProcessor.cpp:131] resp list session: 1645119606519974
I0219 16:59:12.436511 25907 SessionManagerProcessor.cpp:131] resp list session: 1645139276281759
I0219 16:59:12.436517 25909 SessionManagerProcessor.cpp:131] resp list session: 1645101953281707
I0219 16:59:12.436522 25915 SessionManagerProcessor.cpp:131] resp list session: 1645107632198670
I0219 16:59:12.436529 25905 SessionManagerProcessor.cpp:131] resp list session: 1645156872040811
I0219 16:59:12.436535 25903 SessionManagerProcessor.cpp:131] resp list session: 1640574639094085
I0219 16:59:12.436542 25911 SessionManagerProcessor.cpp:131] resp list session: 1645084128541681
I0219 16:59:12.436547 25900 SessionManagerProcessor.cpp:131] resp list session: 1645091543042570
I0219 16:59:12.436553 25914 SessionManagerProcessor.cpp:131] resp list session: 1638930265437206
I0219 16:59:12.436560 25902 SessionManagerProcessor.cpp:131] resp list session: 1645171079530297
I0219 16:59:12.436568 25906 SessionManagerProcessor.cpp:131] resp list session: 1645154098420340
I0219 16:59:12.436573 25904 SessionManagerProcessor.cpp:131] resp list session: 1645111350619771
I0219 16:59:12.436580 25901 SessionManagerProcessor.cpp:131] resp list session: 1645118056401221
I0219 16:59:12.436585 25912 SessionManagerProcessor.cpp:131] resp list session: 1645170224858840
I0219 16:59:12.436591 25913 SessionManagerProcessor.cpp:131] resp list session: 1645129487643858
I0219 16:59:12.436597 25908 SessionManagerProcessor.cpp:131] resp list session: 1645156466129062

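For node 3, please see the attachments.

After posting I noticed the order got mixed up: node 3 is the attachments, node 2 is the text.

How are you using sessions? Not one per query, I hope? My feeling is that the guest a account was simply being used the wrong way...
If the logs are large, you can use "grep -v" to filter out the lines containing
"Receive heartbeat from" and then check whether there are any errors nearby.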
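
The same idea as `grep -v` followed by a look around each error can be sketched in Python: drop the noisy lines first, then return every ERROR/FATAL line together with a few neighbouring lines. The function name and the default context size are illustrative choices, not part of any Nebula tooling.

```python
import re

# glog ERROR/FATAL lines start with E or F followed by mmdd and a space.
ERROR_LINE = re.compile(r"^[EF]\d{4} ")


def errors_with_context(lines, noise=("Receive heartbeat from",), context=2):
    """Drop lines containing any `noise` substring, then return a window of
    `context` lines before and after each remaining ERROR/FATAL line."""
    kept = [
        ln.rstrip("\n")
        for ln in lines
        if not any(pattern in ln for pattern in noise)
    ]
    windows = []
    for i, ln in enumerate(kept):
        if ERROR_LINE.match(ln):
            start = max(0, i - context)
            windows.append(kept[start:i + context + 1])
    return windows


# Hypothetical usage on a large metad log (the path is a placeholder):
#   with open("/path/to/nebula-metad.INFO", errors="replace") as fh:
#       for window in errors_with_context(fh, noise=("Receive heartbeat from",
#                                                    "resp list session")):
#           print("\n".join(window), end="\n---\n")
```

Adding `"resp list session"` to `noise`, as in the usage comment, also removes the session listing lines that fill the 1 GB logs in this thread. The filtered lines are held in memory, so this suits logs that shrink a lot after filtering.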

If you can stop the traffic, stop all of it. Get the services back to normal first.

a.log (144.4 KB)

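Traffic is already stopped. All services are down, so these sessions are rather strange; I haven't tracked down where they come from.
This one is from this morning. I can no longer find the log for the corresponding time.
I'll redo the procedure: stop all meta services, restart them, and upload the corresponding logs.

As for sessions, it is probably one per query: get a session from the pool, then release it.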
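
To get a rough idea of how many sessions metad is tracking, the `resp list session: <id>` lines can be counted. The sketch below extracts the ids and reports how many are distinct. The `id_as_utc` helper rests on an assumption that this thread does not confirm: that a session id is a creation time in microseconds since the Unix epoch, which the magnitude of the values suggests. Treat its output as a hint only.

```python
import re
from datetime import datetime, timezone

SESSION_RE = re.compile(r"resp list session: (\d+)")


def session_ids(lines):
    """Extract the numeric ids from 'resp list session: <id>' log lines."""
    ids = []
    for line in lines:
        m = SESSION_RE.search(line)
        if m:
            ids.append(int(m.group(1)))
    return ids


def summarize_sessions(lines):
    """Return how many listing lines were seen and how many ids are distinct."""
    ids = session_ids(lines)
    distinct = set(ids)
    return {
        "lines": len(ids),
        "distinct": len(distinct),
        "smallest_id": min(distinct) if distinct else None,
    }


def id_as_utc(session_id):
    # ASSUMPTION: the id is microseconds since the Unix epoch.
    return datetime.fromtimestamp(session_id / 1_000_000, tz=timezone.utc)


sample = [
    "I0219 16:22:19.623003 25906 SessionManagerProcessor.cpp:131] resp list session: 1638930566301495",
    "I0219 16:22:19.623008 25909 SessionManagerProcessor.cpp:131] resp list session: 1645165727982404",
]
print(summarize_sessions(sample))
```

If the distinct count is very large, or if `id_as_utc` under the stated assumption points to sessions that are weeks old, that would be consistent with sessions being created and never signed out.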

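I restarted the meta services at 9:06 PM. The logs are below.
Node 2 is a bit odd: it was still producing logs at 9:05, including a 1 GB log like the one above, all session-management entries and no heartbeat information. All the other logs from this time are here: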
nebula-metad.nebula-server-prod-01.root.log.ERROR.20220219-114154.22619 (2.0 KB)
nebula-metad.nebula-server-prod-01.root.log.INFO.20220219-114143.22619 (17.6 KB)
nebula-metad.nebula-server-prod-01.root.log.INFO.20220219-210606.21456 (10.3 KB)
nebula-metad.nebula-server-prod-01.root.log.WARNING.20220219-114154.22619 (2.3 KB)
nebula-metad.nebula-server-prod-02.root.log.ERROR.20220219-210540.25800 (450 bytes)
nebula-metad.nebula-server-prod-02.root.log.ERROR.20220219-210633.25586 (491 bytes)
nebula-metad.nebula-server-prod-02.root.log.INFO.20220219-210632.25586 (4.4 KB)
nebula-metad.nebula-server-prod-02.root.log.WARNING.20220219-210518.25800 (31.7 KB)
nebula-metad.nebula-server-prod-02.root.log.WARNING.20220219-210632.25586 (586 bytes)
nebula-metad.nebula-server-prod-03.root.log.ERROR.20220219-210632.6303 (305 bytes)
nebula-metad.nebula-server-prod-03.root.log.INFO.20220219-210621.6303 (5.7 KB)
nebula-metad.nebula-server-prod-03.root.log.ERROR.20220219-210632.6303 (305 bytes)

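I checked and found that an incremental job had restarted automatically and had never actually been stopped. Everything is stopped now.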


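Hello. Building on the metad logs from last night's startup, I just started the graphd and storaged services at around 9:45.
This time it is node 3 that has the "resp list session" problem. After filtering those lines out, the logs from starting graphd and storaged on the three nodes, after metad had elected a leader, are as follows: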
nebula-storaged.nebula-server-prod-01.root.log.ERROR.20220220-094100.29775 (999 bytes)
nebula-storaged.nebula-server-prod-01.root.log.INFO.20220220-093857.29775 (1.6 KB)
nebula-storaged.nebula-server-prod-01.root.log.ERROR.20220220-094100.29775 (999 bytes)
nebula-storaged.nebula-server-prod-02.root.log.ERROR.20220220-094215.2934 (900 bytes)
nebula-storaged.nebula-server-prod-02.root.log.INFO.20220220-093911.2934 (1.3 KB)
nebula-storaged.nebula-server-prod-02.root.log.ERROR.20220220-094215.2934 (900 bytes)
nebula-storaged.nebula-server-prod-03.root.log.ERROR.20220220-094224.16420 (1.1 KB)
nebula-storaged.nebula-server-prod-03.root.log.INFO.20220220-093920.16420 (1.7 KB)
nebula-storaged.nebula-server-prod-03.root.log.ERROR.20220220-094224.16420 (1.1 KB)
prod-03-metad.log (6.3 KB)
nebula-graphd.nebula-server-prod-01.root.log.INFO.20220220-093857.29757 (1.3 KB)
nebula-graphd.nebula-server-prod-01.root.log.WARNING.20220220-094101.29757 (685 bytes)
nebula-graphd.nebula-server-prod-02.root.log.ERROR.20220220-094115.2916 (507 bytes)
nebula-graphd.nebula-server-prod-02.root.log.INFO.20220220-093911.2916 (1.1 KB)
nebula-graphd.nebula-server-prod-02.root.log.ERROR.20220220-094115.2916 (507 bytes)
nebula-graphd.nebula-server-prod-03.root.log.ERROR.20220220-094223.16398 (678 bytes)
nebula-graphd.nebula-server-prod-03.root.log.INFO.20220220-093920.16398 (1.3 KB)
nebula-graphd.nebula-server-prod-03.root.log.ERROR.20220220-094223.16398 (678 bytes)
nebula-metad.nebula-server-prod-01.root.log.ERROR.20220220-093911.21456 (1.1 KB)
nebula-metad.nebula-server-prod-01.root.log.INFO.20220219-210606.21456 (12.4 KB)
nebula-metad.nebula-server-prod-01.root.log.WARNING.20220220-093911.21456 (1.3 KB)
nebula-metad.nebula-server-prod-02.root.log.ERROR.20220219-210633.25586 (1.2 KB)
nebula-metad.nebula-server-prod-02.root.log.INFO.20220219-210632.25586 (5.9 KB)
nebula-metad.nebula-server-prod-02.root.log.WARNING.20220219-210632.25586 (1.5 KB)