Cannot use Nebula after K8s deployment: connect: connection refused

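  • Nebula version: 2.0
  • Deployment method: k8s
  • Installation guide: https://mp.weixin.qq.com/s/LUv3AM4a2jV1Z9lRTZK2oQ
  • Detailed description of the problem
    The deployment succeeded and all pods are Running, but connecting fails with connection refused
  • Relevant meta / storage / graph info logs
    Client error: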
[root@p47022v nebula]# docker run --rm -ti --entrypoint=/bin/sh vesoft/nebula-console:v2-nightly
/ # nebula-console -addr 10.217.61.189 -port 31559 -u root -p vesoft
2021/03/03 09:26:16 [INFO] connection pool is initialized successfully
2021/03/03 09:26:16 Fail to create a new session from connection pool, Failed to open transport, error: dial tcp 10.217.61.189:31559: connect: connection refused
panic: Fail to create a new session from connection pool, Failed to open transport, error: dial tcp 10.217.61.189:31559: connect: connection refused

goroutine 1 [running]:
log.Panicf(0x7ad9da, 0x35, 0xc0000c1eb8, 0x1, 0x1)
        /usr/local/go/src/log/log.go:345 +0xc0
main.main()
        /usr/src/main.go:337 +0x89a

Status of each pod

[root@p47022v nebula]# kubectl get pods|grep nebula
nebula-graphd-6c54f8666f-6qcs8            1/1     Running            0          4m57s
nebula-graphd-6c54f8666f-jz56m            1/1     Running            0          4m57s
nebula-graphd-6c54f8666f-p2jpd            1/1     Running            0          4m57s
nebula-metad-0                            1/1     Running            0          4m57s
nebula-metad-1                            1/1     Running            0          4m57s
nebula-metad-2                            1/1     Running            0          4m57s
nebula-storaged-0                         1/1     Running            0          4m57s
nebula-storaged-1                         1/1     Running            0          4m56s
nebula-storaged-2                         1/1     Running            0          4m56s

Services

[root@p47022v nebula]# kubectl get service |grep nebula
nebula-graphd        NodePort    10.104.237.76    <none>        9669:31559/TCP,19669:32199/TCP,19670:30573/TCP                5m32s
nebula-metad         ClusterIP   None             <none>        9559/TCP,19559/TCP,19560/TCP                                  5m32s
nebula-storaged      ClusterIP   None             <none>        9779/TCP,19779/TCP,19780/TCP                                  5m32s

Logs:

[root@p47022v templates]# kubectl logs nebula-graphd-6c54f8666f-6qcs8
+ exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf --v=4 --minloglevel=0 --daemonize=false
[root@p47022v templates]# kubectl logs nebula-metad-0
++ hostname
+ exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --v=4 --minloglevel=0 --daemonize=false --local_ip=nebula-metad-0.nebula-metad.default.svc.cluster.local
[root@p47022v templates]# kubectl logs nebula-storaged-0
++ hostname
+ exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --v=4 --minloglevel=0 --daemonize=false --local_ip=nebula-storaged-0.nebula-storaged.default.svc.cluster.local

Pod details:

graphd

[root@p47022v templates]# kubectl describe pod nebula-graphd-6c54f8666f-6qcs8
Name:           nebula-graphd-6c54f8666f-6qcs8
Namespace:      default
Priority:       0
Node:           p47022v.hulk.shbt.qihoo.net/10.217.61.189
Start Time:     Wed, 03 Mar 2021 17:24:12 +0800
Labels:         app.kubernetes.io/component=nebula-graphd
                pod-template-hash=6c54f8666f
Annotations:    <none>
Status:         Running
IP:             10.244.0.83
Controlled By:  ReplicaSet/nebula-graphd-6c54f8666f
Containers:
  nebula-graphd:
    Container ID:  docker://8e3915dbac767744d0859095a4a80b05f5d7a55aced5f628911fc6f06ec1ba18
    Image:         vesoft/nebula-graphd:v2-nightly
    Image ID:      docker-pullable://vesoft/nebula-graphd@sha256:d894ea7192d022cb5fe7849a0dfcbb1405c5b55ab371a500c7c92bf793ec86cc
    Ports:         9669/TCP, 19669/TCP, 19670/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -ecx
    Args:
      exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf --v=4 --minloglevel=0 --daemonize=false
    State:          Running
      Started:      Wed, 03 Mar 2021 17:24:14 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Liveness:  http-get http://:19669/status delay=30s timeout=10s period=10s #success=1 #failure=3
    Environment:
      USER:  root
    Mounts:
      /etc/localtime from timezone (rw)
      /usr/local/nebula/etc/ from config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from nebula-token-h49fh (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  timezone:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:  
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebula-graphd
    Optional:  false
  nebula-token-h49fh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nebula-token-h49fh
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  nebula=cloud
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                                  Message
  ----    ------     ----  ----                                  -------
  Normal  Scheduled  32m   default-scheduler                     Successfully assigned default/nebula-graphd-6c54f8666f-6qcs8 to p47022v.hulk.shbt.qihoo.net
  Normal  Pulled     32m   kubelet, p47022v.hulk.shbt.qihoo.net  Container image "vesoft/nebula-graphd:v2-nightly" already present on machine
  Normal  Created    32m   kubelet, p47022v.hulk.shbt.qihoo.net  Created container nebula-graphd
  Normal  Started    32m   kubelet, p47022v.hulk.shbt.qihoo.net  Started container nebula-graphd

metad

[root@p47022v templates]# kubectl describe pods nebula-metad-0
Name:           nebula-metad-0
Namespace:      default
Priority:       0
Node:           p22147v.hulk.shbt.qihoo.net/10.202.252.47
Start Time:     Wed, 03 Mar 2021 17:24:12 +0800
Labels:         app.kubernetes.io/component=nebula-metad
                controller-revision-hash=nebula-metad-6f7c97f459
                statefulset.kubernetes.io/pod-name=nebula-metad-0
Annotations:    <none>
Status:         Running
IP:             10.244.1.44
Controlled By:  StatefulSet/nebula-metad
Containers:
  nebula-metad:
    Container ID:  docker://dfaf898964d86a9baf58692e938f39b8fba8fa8db184c23d107a904e2bb14831
    Image:         vesoft/nebula-metad:v2-nightly
    Image ID:      docker-pullable://vesoft/nebula-metad@sha256:13157b4dfd82bfc18db795237101b24a17942390d79819772b362275213bf49e
    Ports:         9559/TCP, 19559/TCP, 19560/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -ecx
    Args:
      exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --v=4 --minloglevel=0 --daemonize=false --local_ip=$(hostname).nebula-metad.default.svc.cluster.local
    State:          Running
      Started:      Wed, 03 Mar 2021 17:24:16 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Liveness:  http-get http://:19559/status delay=30s timeout=10s period=10s #success=1 #failure=3
    Environment:
      USER:  root
    Mounts:
      /etc/localtime from timezone (rw)
      /usr/local/nebula/data from metad (rw,path="data")
      /usr/local/nebula/etc/ from config (rw)
      /usr/local/nebula/logs from metad (rw,path="logs")
      /var/run/secrets/kubernetes.io/serviceaccount from nebula-token-h49fh (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  metad:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  metad-nebula-metad-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebula-metad
    Optional:  false
  timezone:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:  
  nebula-token-h49fh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nebula-token-h49fh
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  nebula=cloud
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                                  Message
  ----    ------     ----  ----                                  -------
  Normal  Scheduled  33m   default-scheduler                     Successfully assigned default/nebula-metad-0 to p22147v.hulk.shbt.qihoo.net
  Normal  Pulled     33m   kubelet, p22147v.hulk.shbt.qihoo.net  Container image "vesoft/nebula-metad:v2-nightly" already present on machine
  Normal  Created    32m   kubelet, p22147v.hulk.shbt.qihoo.net  Created container nebula-metad
  Normal  Started    32m   kubelet, p22147v.hulk.shbt.qihoo.net  Started container nebula-metad

storaged

[root@p47022v templates]# kubectl describe pods  nebula-storaged-0
Name:           nebula-storaged-0
Namespace:      default
Priority:       0
Node:           p22147v.hulk.shbt.qihoo.net/10.202.252.47
Start Time:     Wed, 03 Mar 2021 17:24:13 +0800
Labels:         app.kubernetes.io/component=nebula-storaged
                controller-revision-hash=nebula-storaged-5dc5976745
                statefulset.kubernetes.io/pod-name=nebula-storaged-0
Annotations:    <none>
Status:         Running
IP:             10.244.1.45
Controlled By:  StatefulSet/nebula-storaged
Containers:
  nebula-storaged:
    Container ID:  docker://630a2f65bce02d3ae1c1a1c6ba2ddb0b8e55808a496b744744b13da95ee7c200
    Image:         vesoft/nebula-storaged:v2-nightly
    Image ID:      docker-pullable://vesoft/nebula-storaged@sha256:9b028d0ebec637e508a07f64756a50906878d04a5949e94ddf0ee2c06b0a9bf8
    Ports:         9779/TCP, 19779/TCP, 19780/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -ecx
    Args:
      exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --v=4 --minloglevel=0 --daemonize=false --local_ip=$(hostname).nebula-storaged.default.svc.cluster.local
    State:          Running
      Started:      Wed, 03 Mar 2021 17:24:17 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Liveness:  http-get http://:19779/status delay=30s timeout=10s period=10s #success=1 #failure=3
    Environment:
      USER:  root
    Mounts:
      /etc/localtime from timezone (rw)
      /usr/local/nebula/data from storaged (rw,path="data")
      /usr/local/nebula/etc/ from config (rw)
      /usr/local/nebula/logs from storaged (rw,path="logs")
      /var/run/secrets/kubernetes.io/serviceaccount from nebula-token-h49fh (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  storaged:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storaged-nebula-storaged-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebula-storaged
    Optional:  false
  timezone:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:  
  nebula-token-h49fh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nebula-token-h49fh
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  nebula=cloud
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                                  Message
  ----    ------     ----  ----                                  -------
  Normal  Scheduled  34m   default-scheduler                     Successfully assigned default/nebula-storaged-0 to p22147v.hulk.shbt.qihoo.net
  Normal  Pulled     34m   kubelet, p22147v.hulk.shbt.qihoo.net  Container image "vesoft/nebula-storaged:v2-nightly" already present on machine
  Normal  Created    34m   kubelet, p22147v.hulk.shbt.qihoo.net  Created container nebula-storaged
  Normal  Started    33m   kubelet, p22147v.hulk.shbt.qihoo.net  Started container nebula-storaged

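@kevin.qiao please take a look at this issue

Can you connect with telnet 10.217.61.189 31559 from inside the nebula-console container? You need to confirm that there are no network restrictions between the machine where nebula-console runs and the Nebula cluster nodes.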


The nebula-console container doesn't seem to have a telnet command,
and there is no yum or apt either :cold_face:
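
When telnet is not available, the same reachability check can be done with a few lines of Python from any machine that has an interpreter. This is only a minimal sketch: the function name probe_tcp is made up for illustration, and it assumes Python 3 is installed on the machine you test from (the nebula-console image itself may not ship it). It distinguishes "refused" (something answered and rejected the connection) from "timeout" (packets were dropped or never routed).

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect and classify the result.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target host answered with a reset: nothing is accepting on that port.
        return "refused"
    except socket.timeout:
        # No answer within `timeout` seconds, e.g. dropped by a firewall.
        return "timeout"
    except OSError as exc:
        # Anything else, such as an unresolvable name or unreachable network.
        return f"error: {exc}"


# Example calls with the addresses from this thread (not executed here):
#   probe_tcp("10.217.61.189", 31559)   # node IP + nodePort
#   probe_tcp("10.104.237.76", 9669)    # service IP + service port
```

Running both example calls from the node and from a pod would show which of the two paths is actually blocked.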

If you are inside the k8s cluster, you can also connect through the graphd service port:

$ kubectl get service nebula-graphd 
NAME            TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                                          AGE
nebula-graphd   NodePort   10.101.238.113   <none>        9669:30159/TCP,19669:30078/TCP,19670:31913/TCP   12d

# Start a nebula-console container on a node in the k8s cluster
/ # nebula-console -addr 10.101.238.113  -port 9669 -u root -p vesoft
2021/03/04 03:37:57 [INFO] connection pool is initialized successfully

Welcome to Nebula Graph v2.0.0-rc1!

(root@nebula) [(none)]> show hosts;
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| Host                                                          | Port | Status   | Leader count | Leader distribution  | Partition distribution |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-0.nebula-storaged.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-1.nebula-storaged.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-2.nebula-storaged.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "Total"                                                       |      |          | 0            |                      |                        |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
Got 4 rows (time spent 3153/4542 us)

Thu, 04 Mar 2021 03:37:59 UTC

You can also test with telnet on a node to check whether graphd's nodePort is reachable.

Unknown host

[root@p47022v chenpeng6]# kubectl get pods|grep nebula
nebula-graphd-6c54f8666f-6qcs8            1/1     Running   0          20h
nebula-graphd-6c54f8666f-jz56m            1/1     Running   0          20h
nebula-graphd-6c54f8666f-p2jpd            1/1     Running   0          20h
nebula-metad-0                            1/1     Running   0          20h
nebula-metad-1                            1/1     Running   0          20h
nebula-metad-2                            1/1     Running   0          20h
nebula-storaged-0                         1/1     Running   0          20h
nebula-storaged-1                         1/1     Running   0          20h
nebula-storaged-2                         1/1     Running   0          20h
[root@p47022v chenpeng6]# 
[root@p47022v chenpeng6]# kubectl get service|grep nebula
nebula-graphd        NodePort    10.104.237.76    <none>        9669:31559/TCP,19669:32199/TCP,19670:30573/TCP                20h
nebula-metad         ClusterIP   None             <none>        9559/TCP,19559/TCP,19560/TCP                                  20h
nebula-storaged      ClusterIP   None             <none>        9779/TCP,19779/TCP,19780/TCP                                  20h
[root@p47022v chenpeng6]# telnet 10.104.237.76:31559
telnet: 10.104.237.76:31559: Name or service not known
10.104.237.76:31559: Unknown host

10.104.237.76 is graphd's service IP, and it goes with port 9669. If you use the nodePort 31559 instead, you need to use the node's IP, 10.217.61.189. Please test both.
The telnet syntax is telnet [domainname or ip] [port], without the colon.
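
To make the address/port pairing concrete, here is a small sketch that reads the PORT(S) column printed by kubectl get service and lists which IP belongs with which port. The helper names parse_ports and candidate_endpoints are hypothetical, and the sketch assumes a NodePort service whose entries look like 9669:31559/TCP.

```python
from typing import List, Optional, Tuple


def parse_ports(ports_column: str) -> List[Tuple[int, Optional[int], str]]:
    """Split a PORT(S) value into (service_port, node_port, protocol) tuples.

    node_port is None for entries without a nodePort, e.g. "9559/TCP".
    """
    parsed = []
    for item in ports_column.split(","):
        mapping, _, protocol = item.strip().partition("/")
        port, _, node_port = mapping.partition(":")
        parsed.append((int(port), int(node_port) if node_port else None, protocol))
    return parsed


def candidate_endpoints(cluster_ip: str, node_ip: str, ports_column: str):
    """Pair the service IP with service ports and the node IP with nodePorts."""
    endpoints = []
    for port, node_port, _ in parse_ports(ports_column):
        # Inside the cluster: service (cluster) IP + service port.
        endpoints.append(("in-cluster", cluster_ip, port))
        if node_port is not None:
            # From outside: any node's IP + nodePort.
            endpoints.append(("external", node_ip, node_port))
    return endpoints


if __name__ == "__main__":
    ports = "9669:31559/TCP,19669:32199/TCP,19670:30573/TCP"
    for scope, ip, port in candidate_endpoints("10.104.237.76", "10.217.61.189", ports):
        print(scope, ip, port)
```

For the graphd client port, this gives 10.104.237.76 with 9669 for in-cluster access and 10.217.61.189 with 31559 for access from outside.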

[root@p47022v nebula]# kubectl get service|grep nebula
nebula-graphd        NodePort    10.104.237.76    <none>        9669:31559/TCP,19669:32199/TCP,19670:30573/TCP                23h
nebula-metad         ClusterIP   None             <none>        9559/TCP,19559/TCP,19560/TCP                                  23h
nebula-storaged      ClusterIP   None             <none>        9779/TCP,19779/TCP,19780/TCP                                  23h
[root@p47022v nebula]# telnet 10.104.237.76 9669
Trying 10.104.237.76...
telnet: connect to address 10.104.237.76: Connection timed out
[root@p47022v nebula]# telnet p47022v.hulk.shbt.qihoo.net 31559
Trying 10.217.61.189...
telnet: connect to address 10.217.61.189: Connection refused

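This is a k8s networking issue. Another option is to start nebula-console in a pod and try connecting to 10.104.237.76 9669. Since the Nebula cluster started successfully, we can at least be sure that access through service addresses inside the cluster is not affected.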


Starting nebula-console in a pod works:

[root@p47022v nebulaK8s2]# kubectl get pods |grep nebula
nebula-console                            1/1     Running            0          7s
nebula-graphd-6c54f8666f-6qcs8            1/1     Running            0          42h
nebula-graphd-6c54f8666f-jz56m            1/1     Running            0          42h
nebula-graphd-6c54f8666f-p2jpd            1/1     Running            0          42h
nebula-metad-0                            1/1     Running            0          42h
nebula-metad-1                            1/1     Running            0          42h
nebula-metad-2                            1/1     Running            0          42h
nebula-storaged-0                         1/1     Running            0          42h
nebula-storaged-1                         1/1     Running            0          42h
nebula-storaged-2                         1/1     Running            0          42h
[root@p47022v nebulaK8s2]# kubectl exec -it nebula-console /bin/sh
/ # nebula-console -addr 10.104.237.76 -port 9669 -u root -p xxx
2021/03/05 03:41:42 [INFO] connection pool is initialized successfully

Welcome to Nebula Graph v2.0.0-rc1!

(root@nebula) [(none)]> show hosts
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| Host                                                          | Port | Status   | Leader count | Leader distribution  | Partition distribution |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-0.nebula-storaged.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-1.nebula-storaged.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-2.nebula-storaged.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "Total"                                                       |      |          | 0            |                      |                        |
+---------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
Got 4 rows (time spent 3530/5011 us)

Fri, 05 Mar 2021 03:42:01 UTC

(root@nebula) [(none)]> 

How can I use it from outside the cluster?

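NodePort is one of the simpler ways to access the cluster from outside, but you would need to fix the NodePort connectivity problem according to your environment. The other option is Ingress: you can use nginx ingress and configure a TCP routing rule that forwards to the graphd pods. If you need it, I can provide the nginx ingress configuration file.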

Yes, please share it. Thanks.

OK, I'll test it here and then post the configuration.

Since NodePort access has problems in your environment, I'm using a DaemonSet + HostNetwork combination to give you access to graphd.
Steps

  1. kubectl label node [NodeName] nginx-ingress=true
  2. kubectl create ns nginx-ingress
  3. kubectl apply -f nginx-ingress-daemonset-hostnetwork.yaml -n nginx-ingress

nginx-ingress-daemonset-hostnetwork.yaml (8.1 KB)

OK, thanks.