- graphd keeps reporting the following:
Readiness probe failed: Get http://172.20.2.92:19669/status: dial tcp 172.20.2.92:19669: connect: connection refused
- Even after I raised the initial probe delay to 200 seconds, it was the same.
- I then removed the readiness probe and the pod showed as normal, but after exec'ing into the pod none of the relevant ports were listening.
Could you paste your deployment steps? Which guide did you follow?
I followed this document: nebula-operator/nebula_cluster_helm_guide.md at master · vesoft-inc/nebula-operator · GitHub
No other changes were made.
Hi, could you paste your nebula-cluster YAML config? Generally, as long as the graphd process starts normally, the probe will pass.
helm install nebula nebula-operator/nebula-cluster \
  --namespace nebula --create-namespace --version 0.1.0 \
  --set nameOverride=nebula \
  --set nebula.storageClassName=managed-nfs-storage
I deployed it directly like this.
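For reference, the same overrides can be expressed as a values file instead of repeated --set flags (standard Helm behavior; the file name values.yaml and the key layout below are inferred from the flags in the command above, not taken from the chart's docs):

```yaml
# values.yaml -- equivalent to the --set flags in the install command above
# (sketch; only the keys used in that command are shown).
nameOverride: nebula
nebula:
  storageClassName: managed-nfs-storage
```

It would then be installed with: helm install nebula nebula-operator/nebula-cluster --namespace nebula --create-namespace --version 0.1.0 -f values.yaml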
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    nebula-graph.io/last-applied-configuration: '{"podManagementPolicy":"Parallel","replicas":2,"selector":{"matchLabels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"serviceName":"nebula-graphd-svc","template":{"metadata":{"annotations":{"nebula-graph.io/cm-hash":"94ea457be88fae25"},"creationTimestamp":null,"labels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"spec":{"containers":[{"command":["/bin/bash","-ecx","exec
      /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf
      --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-1.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-2.nebula-metad-headless.nebula.svc.cluster.local:9559
      --local_ip=$(hostname).nebula-graphd-svc.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-graphd-svc.nebula.svc.cluster.local
      --minloglevel=1 --v=0 --daemonize=false"],"image":"vesoft/nebula-graphd:v2.0.0","imagePullPolicy":"IfNotPresent","name":"graphd","ports":[{"containerPort":9669,"name":"thrift"},{"containerPort":19669,"name":"http"},{"containerPort":19670,"name":"http2"}],"readinessProbe":{"httpGet":{"path":"/status","port":19669,"scheme":"HTTP"},"initialDelaySeconds":20,"periodSeconds":10,"timeoutSeconds":5},"resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"500m","memory":"500Mi"}},"volumeMounts":[{"mountPath":"/usr/local/nebula/logs","name":"graphd","subPath":"logs"},{"mountPath":"/usr/local/nebula/etc","name":"nebula-graphd"}]}],"schedulerName":"default-scheduler","topologySpreadConstraints":[{"labelSelector":{"matchLabels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"maxSkew":1,"topologyKey":"kubernetes.io/hostname","whenUnsatisfiable":"ScheduleAnyway"}],"volumes":[{"name":"graphd","persistentVolumeClaim":{"claimName":"graphd"}},{"configMap":{"items":[{"key":"nebula-graphd.conf","path":"nebula-graphd.conf"}],"name":"nebula-graphd"},"name":"nebula-graphd"}]}},"updateStrategy":{"rollingUpdate":{"partition":2},"type":"RollingUpdate"},"volumeClaimTemplates":[{"metadata":{"creationTimestamp":null,"name":"graphd"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":"managed-nfs-storage"},"status":{}}]}'
  labels:
    app.kubernetes.io/cluster: nebula
    app.kubernetes.io/component: graphd
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
  name: nebula-graphd
  namespace: nebula
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/cluster: nebula
      app.kubernetes.io/component: graphd
      app.kubernetes.io/managed-by: nebula-operator
      app.kubernetes.io/name: nebula-graph
  serviceName: nebula-graphd-svc
  template:
    metadata:
      annotations:
        nebula-graph.io/cm-hash: 94ea457be88fae25
      creationTimestamp: null
      labels:
        app.kubernetes.io/cluster: nebula
        app.kubernetes.io/component: graphd
        app.kubernetes.io/managed-by: nebula-operator
        app.kubernetes.io/name: nebula-graph
    spec:
      containers:
      - command:
        - /bin/bash
        - -ecx
        - exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf
          --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-1.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-2.nebula-metad-headless.nebula.svc.cluster.local:9559
          --local_ip=$(hostname).nebula-graphd-svc.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-graphd-svc.nebula.svc.cluster.local
          --minloglevel=1 --v=0 --daemonize=false
        image: vesoft/nebula-graphd:v2.0.0
        imagePullPolicy: IfNotPresent
        name: graphd
        ports:
        - containerPort: 9669
          name: thrift
          protocol: TCP
        - containerPort: 19669
          name: http
          protocol: TCP
        - containerPort: 19670
          name: http2
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /status
            port: 19669
            scheme: HTTP
          initialDelaySeconds: 20
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/nebula/logs
          name: graphd
          subPath: logs
        - mountPath: /usr/local/nebula/etc
          name: nebula-graphd
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/cluster: nebula
            app.kubernetes.io/component: graphd
            app.kubernetes.io/managed-by: nebula-operator
            app.kubernetes.io/name: nebula-graph
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - name: graphd
        persistentVolumeClaim:
          claimName: graphd
      - configMap:
          defaultMode: 420
          items:
          - key: nebula-graphd.conf
            path: nebula-graphd.conf
          name: nebula-graphd
        name: nebula-graphd
  updateStrategy:
    rollingUpdate:
      partition: 2
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: graphd
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: managed-nfs-storage
      volumeMode: Filesystem
    status:
      phase: Pending
This is the kubectl -o yaml output. The other two components came up fine; only this one keeps failing the readiness probe.
Could you run kubectl describe on the problematic pod? It's probably not a YAML configuration issue.
[root@k8s-77-64 etc]# kubectl describe pods -n nebula nebula-graphd-0
Name:         nebula-graphd-0
Namespace:    nebula
Priority:     0
Node:         172.16.77.189/172.16.77.189
Start Time:   Thu, 15 Jul 2021 15:50:01 +0800
Labels:       app.kubernetes.io/cluster=nebula
              app.kubernetes.io/component=graphd
              app.kubernetes.io/managed-by=nebula-operator
              app.kubernetes.io/name=nebula-graph
              controller-revision-hash=nebula-graphd-ffddc8f75
              statefulset.kubernetes.io/pod-name=nebula-graphd-0
Annotations:  nebula-graph.io/cm-hash: 94ea457be88fae25
Status:       Running
IP:           172.20.4.100
IPs:
  IP:  172.20.4.100
Controlled By:  StatefulSet/nebula-graphd
Containers:
  graphd:
    Container ID:  docker://53a5b9d1474c1790f2b85557d28efd40223ea58a15f07be4f8d9ea536b7cf849
    Image:         vesoft/nebula-graphd:v2.0.0
    Image ID:      docker-pullable://vesoft/nebula-graphd@sha256:9033aa72f0ec1d8c0a7aaf3dc3db6b9089dcfde7257487f2ba9f4dacda135f52
    Ports:         9669/TCP, 19669/TCP, 19670/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -ecx
      exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-1.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-2.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=$(hostname).nebula-graphd-svc.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-graphd-svc.nebula.svc.cluster.local --minloglevel=1 --v=0 --daemonize=false
    State:          Running
      Started:      Thu, 15 Jul 2021 15:50:02 +0800
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     500m
      memory:  500Mi
    Readiness:    http-get http://:19669/status delay=20s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /usr/local/nebula/etc from nebula-graphd (rw)
      /usr/local/nebula/logs from graphd (rw,path="logs")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r6pqt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  graphd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  graphd-nebula-graphd-0
    ReadOnly:   false
  nebula-graphd:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebula-graphd
    Optional:  false
  default-token-r6pqt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-r6pqt
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  42m                   default-scheduler  running "VolumeBinding" filter plugin for pod "nebula-graphd-0": pod has unbound immediate PersistentVolumeClaims
  Warning  FailedScheduling  42m                   default-scheduler  running "VolumeBinding" filter plugin for pod "nebula-graphd-0": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         42m                   default-scheduler  Successfully assigned nebula/nebula-graphd-0 to 172.16.77.189
  Normal   Pulled            42m                   kubelet            Container image "vesoft/nebula-graphd:v2.0.0" already present on machine
  Normal   Created           42m                   kubelet            Created container graphd
  Normal   Started           42m                   kubelet            Started container graphd
  Warning  Unhealthy         2m6s (x239 over 41m)  kubelet            Readiness probe failed: Get http://172.20.4.100:19669/status: dial tcp 172.20.4.100:19669: connect: connection refused
Now it's fairly certain the problem is with starting the graphd process. You can kubectl exec into the container and check the logs folder; graphd's startup logs are saved there.
terminate called after throwing an instance of 'std::system_error'
what(): Failed to resolve address for 'nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
*** Aborted at 1626335403 (unix time) try "date -d @1626335403" if you are using GNU date ***
PC: @ 0x7f5c679e5387 __GI_raise
*** SIGABRT (@0x1) received by PID 1 (TID 0x7f5c688d28c0) from PID 1; stack trace: ***
@ 0x1e5f9c1 (unknown)
@ 0x7f5c67d8c62f (unknown)
@ 0x7f5c679e5387 __GI_raise
@ 0x7f5c679e6a77 __GI_abort
@ 0x107f647 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x2219b85 __cxxabiv1::__terminate()
@ 0x2219bd0 std::terminate()
@ 0x2219d03 __cxa_throw
@ 0x1063e8b (unknown)
@ 0x1d12292 folly::SocketAddress::getAddrInfo()
@ 0x1d122b3 folly::SocketAddress::setFromHostPort()
@ 0x19fe77e nebula::WebService::start()
@ 0x1080872 main
@ 0x7f5c679d1554 __libc_start_main
@ 0x1096b4d (unknown)
These are the logs: it cannot resolve the internal domain name.
I think I've roughly found the cause. This cluster's DNS is configured with search nebula.svc.cluster.local. svc.cluster.local. cluster.local. — each entry carries a trailing dot — but nebula-graph by default resolves names without the trailing dot. That mismatch is probably what's causing this. Is there any way to configure for this situation in advance?
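The trailing-dot behavior described here follows standard resolver semantics (resolv.conf(5)): a name ending in "." is rooted/absolute and bypasses the search list entirely, while a relative name is expanded through the search domains (Kubernetes pods default to ndots:5). A small illustrative sketch of that candidate-name logic, not NebulaGraph code and not a real DNS lookup:

```python
# Sketch of glibc-style search-list handling (per resolv.conf(5)).
# Hypothetical helper for illustration only; it builds the candidate
# FQDNs a resolver would try, it does not query DNS.

def candidate_names(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the fully-qualified names a resolver would try, in order."""
    if name.endswith("."):
        # Rooted (absolute) name: the search list is never applied.
        return [name]
    candidates: list[str] = []
    if name.count(".") >= ndots:
        # "Enough dots": try the name as-is (rooted) first.
        candidates.append(name + ".")
    # Expand through each search domain in turn.
    candidates += [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") < ndots:
        # Few dots: the as-is attempt comes last.
        candidates.append(name + ".")
    return candidates

search = ["nebula.svc.cluster.local", "svc.cluster.local", "cluster.local"]

# Relative pod name: expanded through the search list.
print(candidate_names("nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local", search))
# Rooted name (trailing dot): exactly one candidate, no search expansion.
print(candidate_names("nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local.", search))
```

This is why appending the dot (as done later in the thread) sidesteps a broken or unusual search-domain configuration: the resolver then consults the configured nameserver directly with the absolute name.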
Hi, the pod domain names are currently concatenated by the operator, so this can only be changed on the external DNS side. I suspect it may be some other network issue, because storaged and metad also communicate internally via domain names; if graphd were blocked by DNS, the other two components should be affected as well. You can run kubectl get ep nebula-graphd-svc -n nebula to check whether the endpoint count matches.
spec:
  containers:
  - command:
    - /bin/bash
    - -ecx
    - exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf
      --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local.:9559,nebula-metad-1.nebula-metad-headless.nebula.svc.cluster.local.:9559,nebula-metad-2.nebula-metad-headless.nebula.svc.cluster.local.:9559
      --local_ip=$(hostname).nebula-graphd-svc.nebula.svc.cluster.local. --ws_ip=$(hostname).nebula-graphd-svc.nebula.svc.cluster.local.
      --minloglevel=1 --v=0 --daemonize=false
    image: vesoft/nebula-graphd:v2.0.0
I changed the DNS names to the trailing-dot form above and then graphd was able to start. I plan to change the other components the same way. When I have time I'll try again on a cluster with a default DNS setup.
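As an alternative to hard-coding trailing dots into the flags, Kubernetes allows per-pod resolver overrides via the standard PodSpec dnsConfig field; a hedged sketch (these are real Pod fields, but whether nebula-operator exposes a way to set them on its managed pods is not confirmed in this thread):

```yaml
# Standard PodSpec fields (pod template level), shown for illustration.
dnsPolicy: ClusterFirst
dnsConfig:
  searches:
  - nebula.svc.cluster.local   # search entries without trailing dots
  options:
  - name: ndots
    value: "5"
```

This would only help where the cluster DNS itself behaves normally; in the case above the node-level search list carried trailing dots, which is why rewriting the names was the workaround that worked.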
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.