nebula 2.0.0-rc1 storaged in a restart loop

  • nebula version: 2.0.0-rc1
  • deployment: k8s

The storaged pods keep restarting:

[root@p47022v log]# kubectl get pods |grep nebula
nebula-graphd-7559878d8b-cw542            1/1     Running            0          13d
nebula-graphd-7559878d8b-fglzh            1/1     Running            0          13d
nebula-graphd-7559878d8b-l9899            1/1     Running            0          13d
nebula-metad-0                            1/1     Running            0          13d
nebula-metad-1                            1/1     Running            0          13d
nebula-metad-2                            1/1     Running            0          13d
nebula-storaged-0                         0/1     CrashLoopBackOff   1196       13d
nebula-storaged-1                         0/1     CrashLoopBackOff   2081       13d
nebula-storaged-2                         0/1     CrashLoopBackOff   2005       13d
Log file created at: 2021/01/14 12:29:56
Running on machine: p47022v.hulk.shbt.qihoo.net
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0114 12:29:56.662881     1 FileUtils.cpp:384] Failed to read the directory "data/storage/nebula" (2): No such file or directory
E0115 01:13:56.055048    55 MetaClient.cpp:121] Heartbeat failed, status:LeaderChanged: Leader changed!
E0115 01:14:56.785461    55 MetaClient.cpp:121] Heartbeat failed, status:LeaderChanged: Leader changed!
E0121 23:15:00.626580    55 MetaClient.cpp:121] Heartbeat failed, status:LeaderChanged: Leader changed!

Could you paste the output of kubectl describe pod nebula-storaged-0 so we can take a look?
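While collecting that, it may also be worth checking whether the metad leader is flapping, since the storaged heartbeats fail with LeaderChanged. A generic check (pod names taken from the listing above, grep pattern is just a suggestion):

```shell
# Look for repeated leader elections in each metad replica's log.
for i in 0 1 2; do
  echo "=== nebula-metad-$i ==="
  kubectl logs "nebula-metad-$i" | grep -i leader | tail -n 5
done
```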

Name:           nebula-storaged-0
Namespace:      default
Priority:       0
Node:           p51341v.hulk.shbt.qihoo.net/10.217.63.168
Start Time:     Tue, 23 Feb 2021 16:50:05 +0800
Labels:         app=nebula-storaged
                controller-revision-hash=nebula-storaged-576d9cd859
                statefulset.kubernetes.io/pod-name=nebula-storaged-0
Annotations:    <none>
Status:         Running
IP:             10.217.63.168
Controlled By:  StatefulSet/nebula-storaged
Containers:
  nebula-storaged:
    Container ID:  docker://05d145a181151f14fcbe352956dbed32d336504c12f9e509b1f84331bc55b528
    Image:         vesoft/nebula-storaged:v2.0.0-rc1
    Image ID:      docker-pullable://vesoft/nebula-storaged@sha256:49d7d4a7d05294fc29b27937f9e8b057a2f0c13a7c1fed19840a9ad4165a6f0f
    Ports:         45600/TCP, 45601/TCP, 12000/TCP, 12002/TCP
    Host Ports:    45600/TCP, 45601/TCP, 12000/TCP, 12002/TCP
    Command:
      /bin/bash
      -ecx
      exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --local_ip=$(hostname -i) --v=0 --minloglevel=2 --daemonize=false
    State:          Running
      Started:      Tue, 23 Feb 2021 16:50:58 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Liveness:  http-get http://:12000/status delay=30s timeout=5s period=10s #success=1 #failure=3
    Environment:
      USER:  root
    Mounts:
      /etc/localtime from timezone (rw)
      /usr/local/nebula/data from data (rw)
      /usr/local/nebula/etc/ from config (rw)
      /usr/local/nebula/logs from log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-szpfj (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-nebula-storaged-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebula-storaged
    Optional:  false
  timezone:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:  
  log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  
  default-token-szpfj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-szpfj
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  nebula=yes
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                                  Message
  ----     ------            ----               ----                                  -------
  Warning  FailedScheduling  14m (x2 over 14m)  default-scheduler                     pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal   Scheduled         14m                default-scheduler                     Successfully assigned default/nebula-storaged-0 to p51341v.hulk.shbt.qihoo.net
  Normal   Pulling           14m                kubelet, p51341v.hulk.shbt.qihoo.net  Pulling image "vesoft/nebula-storaged:v2.0.0-rc1"
  Normal   Pulled            13m                kubelet, p51341v.hulk.shbt.qihoo.net  Successfully pulled image "vesoft/nebula-storaged:v2.0.0-rc1"
  Normal   Created           13m                kubelet, p51341v.hulk.shbt.qihoo.net  Created container nebula-storaged
  Normal   Started           13m                kubelet, p51341v.hulk.shbt.qihoo.net  Started container nebula-storaged

Please paste the storaged log as well.

[root@p51339v nebula]# kubectl logs nebula-storaged-0
++ hostname -i
+ exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --local_ip=10.217.63.168 --v=0 --minloglevel=2 --daemonize=false
E0223 16:51:01.502826    49 MetaClient.cpp:581] Send request to [10.217.63.69:44500], exceed retry limit
E0223 16:51:01.503618     1 MetaClient.cpp:60] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0223 16:51:06.532591    50 MetaClient.cpp:581] Send request to [10.217.63.172:44500], exceed retry limit
E0223 16:51:06.532765     1 MetaClient.cpp:60] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0223 16:51:11.567957    51 MetaClient.cpp:581] Send request to [10.217.63.69:44500], exceed retry limit
E0223 16:51:11.568163     1 MetaClient.cpp:60] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0223 16:51:14.659925     1 FileUtils.cpp:384] Failed to read the directory "data/storage/nebula" (2): No such file or directory

metad also has errors:

[root@p51339v nebula]# kubectl logs nebula-metad-0
++ hostname -i
+ exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --local_ip=10.217.63.69 --v=0 --minloglevel=2 --daemonize=false
E0223 16:50:58.922847     1 FileUtils.cpp:384] Failed to read the directory "data/meta/nebula" (2): No such file or directory
E0223 16:51:13.053920   133 RaftPart.cpp:367] [Port: 44501, Space: 0, Part: 0] The partition is not a leader
E0223 16:51:13.054005   133 RaftPart.cpp:687] [Port: 44501, Space: 0, Part: 0] Cannot append logs, clean the buffer
E0223 17:11:58.104730    54 GeneratedCodeHelper.cpp:73] Method name listSpaces not found in function process
E0223 17:13:33.483300    55 GeneratedCodeHelper.cpp:73] Method name listSpaces not found in function process
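One thing that stands out (an observation, not confirmed in this thread): both storaged and metad log Failed to read the directory with a relative path ("data/storage/nebula", "data/meta/nebula"), while the PVC is mounted at /usr/local/nebula/data. That would be consistent with the data path in the flagfile being relative instead of absolute. A quick way to inspect the flagfiles shipped in the ConfigMap (ConfigMap name taken from the describe output above; the data_path flag name is from the stock nebula config and should be verified against your version):

```shell
# Check what data_path each flagfile points at; an absolute path under the
# mounted volume (/usr/local/nebula/data) is what you'd expect here.
kubectl get configmap nebula-storaged -o yaml | grep -i data_path
```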

You can deploy Nebula 2.0 on Kubernetes with the newer helm charts.
2.0 supports communication between components via domain names. Also pay attention to the storageClass you use: it needs to be adjusted for your environment.
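As a sketch of what overriding the storageClass might look like (the chart path and values keys here are placeholders, not the chart's actual key names; check values.yaml in the chart version you use):

```shell
# Placeholder chart path and value keys -- adjust to the real nebula helm chart.
helm install nebula ./nebula-chart \
  --set storaged.storageClass=your-storage-class \
  --set metad.storageClass=your-storage-class
```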