metad and storaged pods keep restarting after installing Nebula with Helm

  • NebulaGraph version:
    nebula-docker-compose-3.3.0
    image:
      graphd:
        repository: vesoft/nebula-graphd
        tag: v2.0.0
        pullPolicy: IfNotPresent
      metad:
        repository: vesoft/nebula-metad
        tag: v2.0.0
        pullPolicy: IfNotPresent
      storaged:
        repository: vesoft/nebula-storaged
        tag: v2.0.0
        pullPolicy: IfNotPresent

  • Deployment platform: Kubernetes v1.23.5

  • Installation method: Helm chart (install command sketched below this list)

  • Production environment: No

  • Hardware info
    4 nodes (one master, the rest workers), all 8C16G virtual machines
    PVs are provided by sig-storage-local-static-provisioner-2.4.0

  • Problem description
    The metad and storaged pods keep restarting and the cluster is unusable.
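
For reference, the chart was installed along these lines, where <chart-dir> is a placeholder for wherever the downloaded chart was extracted and values.yaml holds the image settings shown above (a sketch only, not the exact command used in the original setup):

helm install nebula <chart-dir> -f values.yaml --namespace default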

[root@n08k8s01 ~]# kubectl get pv | grep nebula
local-pv-443a2aca                          9951Mi     RWO            Delete           Bound       default/metad-nebula-metad-1         fast-disks                   12h
local-pv-8ac7063                           9951Mi     RWO            Delete           Bound       default/storaged-nebula-storaged-1   fast-disks                   12h
local-pv-8f1f9727                          9951Mi     RWO            Delete           Bound       default/metad-nebula-metad-0         fast-disks                   65m
local-pv-98198883                          9951Mi     RWO            Delete           Bound       default/storaged-nebula-storaged-0   fast-disks                   64m
local-pv-f6c6e3ae                          9951Mi     RWO            Delete           Bound       default/storaged-nebula-storaged-2   fast-disks                   12h
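
For context, the fast-disks class that these PVs are bound to would typically be the stock no-provisioner StorageClass used with sig-storage-local-static-provisioner, roughly the sketch below (the Delete reclaim policy matches the column in the listing above; the exact settings in the actual cluster may differ):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-disks
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete

Note that only two metad PVs appear in this listing, which lines up with nebula-metad-2 staying Pending in the pod list below.
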
[root@n08k8s01 ~]# kubectl get pods -o wide | grep nebula
nebula-graphd-b4dfd484f-gp6rz             1/1     Running                  0                12h     10.244.3.5     n08k8s04   <none>           <none>
nebula-graphd-b4dfd484f-nt2b9             1/1     Running                  0                12h     10.244.1.156   n08k8s02   <none>           <none>
nebula-metad-0                            0/1     CrashLoopBackOff         17 (103s ago)    12h     10.244.3.6     n08k8s04   <none>           <none>
nebula-metad-1                            0/1     CrashLoopBackOff         17 (3m23s ago)   12h     10.244.2.234   n08k8s03   <none>           <none>
nebula-metad-2                            0/1     Pending                  0                12h     <none>         <none>     <none>           <none>
nebula-storaged-0                         0/1     CrashLoopBackOff         19 (3m46s ago)   12h     10.244.3.7     n08k8s04   <none>           <none>
nebula-storaged-1                         1/1     Running                  19 (81s ago)     12h     10.244.1.157   n08k8s02   <none>           <none>
nebula-storaged-2                         0/1     CrashLoopBackOff         19 (73s ago)     12h     10.244.2.235   n08k8s03   <none>           <none>
  • Relevant meta / storage / graph logs (text form preferred for easier searching)
[root@n08k8s01 ~]# kubectl logs nebula-metad-0 --previous
++ hostname
+ exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --daemonize=false --local_ip=nebula-metad-0.nebula-metad.default.svc.cluster.local
[root@n08k8s01 ~]# kubectl logs nebula-metad-1 --previous
++ hostname
+ exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --daemonize=false --local_ip=nebula-metad-1.nebula-metad.default.svc.cluster.local
[root@n08k8s01 ~]# kubectl logs nebula-metad-2 --previous
[root@n08k8s01 ~]# kubectl logs nebula-storaged-0 --previous
++ hostname
+ exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --daemonize=false --local_ip=nebula-storaged-0.nebula-storaged.default.svc.cluster.local
[root@n08k8s01 ~]# kubectl logs nebula-storaged-1
++ hostname
+ exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --daemonize=false --local_ip=nebula-storaged-1.nebula-storaged.default.svc.cluster.local
[root@n08k8s01 ~]# kubectl describe pod nebula-metad-0
  Warning  Unhealthy  6m31s (x127 over 163m)  kubelet  Liveness probe failed: Get "http://10.244.3.6:19559/status": dial tcp 10.244.3.6:19559: connect: connection refused
  Warning  BackOff    102s (x490 over 153m)   kubelet  Back-off restarting failed container

[root@n08k8s01 ~]# kubectl describe pod nebula-storaged-0
  Warning  Unhealthy  12m (x139 over 167m)    kubelet  Liveness probe failed: Get "http://10.244.3.7:19779/status": dial tcp 10.244.3.7:19779: connect: connection refused
  Warning  BackOff    2m43s (x543 over 161m)  kubelet  Back-off restarting failed container

Hi, are you using a Helm chart you defined yourself? Please check whether local_ip and meta_server_addrs are configured correctly for each service. In a Kubernetes environment we recommend deploying with nebula-operator: What is NebulaGraph Operator - NebulaGraph Database manual
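
For example, something along these lines can be used to check those settings (a sketch: the config path and the service DNS name are taken from the startup lines above, and tools such as getent may or may not be present in the image):

# Dump the addressing-related flags from the effective config file
kubectl exec nebula-storaged-1 -- grep -E 'local_ip|meta_server_addrs|port' /usr/local/nebula/etc/nebula-storaged.conf

# Check that the headless-service DNS name used as local_ip resolves inside the cluster
kubectl exec nebula-storaged-1 -- getent hosts nebula-metad-0.nebula-metad.default.svc.cluster.local

# Look at the last crash output of a failing pod
kubectl logs nebula-metad-0 --previous --tail=100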


It is not a self-defined chart; I used the Helm package downloaded from https://codeload.github.com/vesoft-inc/nebula-docker-compose/zip/refs/tags/v3.3.0

I see. That Helm chart is quite old: it deploys the services with plain StatefulSets and does not support NebulaGraph 3.x. We recommend deploying Nebula with the latest Operator instead; see this document: Deploy NebulaGraph Operator - NebulaGraph Database manual
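
With the Operator, the cluster is described by a NebulaCluster resource instead of the old chart. A minimal sketch is below; the field layout follows the apps.nebula-graph.io/v1alpha1 examples in the Operator docs, but exact field names can vary between Operator releases, and the replica counts, versions, and storage sizes here are placeholders:

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
  namespace: default
spec:
  # reference, schedulerName and imagePullPolicy mirror the doc example
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  imagePullPolicy: IfNotPresent
  graphd:
    replicas: 2
    image: vesoft/nebula-graphd
    version: v3.3.0
    logVolumeClaim:
      storageClassName: fast-disks
      resources:
        requests:
          storage: 2Gi
  metad:
    replicas: 3
    image: vesoft/nebula-metad
    version: v3.3.0
    dataVolumeClaim:
      storageClassName: fast-disks
      resources:
        requests:
          storage: 5Gi
    logVolumeClaim:
      storageClassName: fast-disks
      resources:
        requests:
          storage: 2Gi
  storaged:
    replicas: 3
    image: vesoft/nebula-storaged
    version: v3.3.0
    dataVolumeClaims:
    - storageClassName: fast-disks
      resources:
        requests:
          storage: 10Gi
    logVolumeClaim:
      storageClassName: fast-disks
      resources:
        requests:
          storage: 2Gi

Once the Operator itself is installed, a manifest like this would be applied with kubectl apply -f nebula-cluster.yaml.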

OK, thanks for the help, I will give it a try.
