- NebulaGraph version: 3.6.0
- Deployment method: manual deployment on k8s via YAML
- Problem description and related information

For special requirements, I deployed storaged by applying YAML files directly.
The storaged YAML file is as follows:
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/cluster: nebula
    app.kubernetes.io/component: storaged
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
  name: nebula-storaged-headless
  namespace: nebula-latest
spec:
  clusterIP: None
  ports:
  - name: thrift
    port: 9779
    protocol: TCP
    targetPort: 9779
  - name: http
    port: 19779
    protocol: TCP
    targetPort: 19779
  - name: admin
    port: 9778
    protocol: TCP
    targetPort: 9778
  publishNotReadyAddresses: true
  selector:
    app.kubernetes.io/cluster: nebula
    app.kubernetes.io/component: storaged
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  generation: 1
  labels:
    app.kubernetes.io/cluster: nebula
    app.kubernetes.io/component: storaged
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
  name: nebula-storaged
  namespace: nebula-latest
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/cluster: nebula
      app.kubernetes.io/component: storaged
      app.kubernetes.io/managed-by: nebula-operator
      app.kubernetes.io/name: nebula-graph
  serviceName: nebula-storaged-headless
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/cluster: nebula
        app.kubernetes.io/component: storaged
        app.kubernetes.io/managed-by: nebula-operator
        app.kubernetes.io/name: nebula-graph
    spec:
      containers:
      - command:
        - /bin/sh
        - -ecx
        - exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf
          --meta_server_addrs=nebula-metad-0.nebula-metad-headless:9559
          --local_ip=$(hostname).nebula-storaged-headless
          --ws_ip=$(hostname).nebula-storaged-headless --daemonize=false
        image: vesoft/nebula-storaged:v3.6.0
        imagePullPolicy: Always
        name: storaged
        ports:
        - containerPort: 9779
          name: thrift
          protocol: TCP
        - containerPort: 19779
          name: http
          protocol: TCP
        - containerPort: 9778
          name: admin
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /status
            port: 19779
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/nebula/data
          name: storaged-claim0
          subPath: data
        - mountPath: /usr/local/nebula/logs
          name: storaged-claim1
          subPath: logs
      - command:
        - /bin/sh
        - -ecx
        - sh /logrotate.sh; exec cron -f
        env:
        - name: LOGROTATE_ROTATE
          value: "5"
        - name: LOGROTATE_SIZE
          value: 100M
        image: vesoft/nebula-agent:latest
        imagePullPolicy: Always
        name: ng-agent
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/nebula/logs
          name: storaged-claim1
          subPath: logs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/cluster: nebula
            app.kubernetes.io/component: storaged
            app.kubernetes.io/managed-by: nebula-operator
            app.kubernetes.io/name: nebula-graph
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
  volumeClaimTemplates:
  - metadata:
      name: storaged-claim0
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
  - metadata:
      name: storaged-claim1
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
The result of `kubectl apply` is that the pod's health check fails:
Connecting manually also fails:
The related logs:
nebula-storaged.ERROR
nebula-storaged.WARNING
nebula-storaged.INFO
kqzh
2
Hi, you could try the operator's YAML plus a NebulaCluster YAML; the operator does a lot of work internally. For example, this error says the service started without `ADD HOSTS` having been executed.
Could you give me some further pointers? On which service do I need to execute `ADD HOSTS`?
It shouldn't be done through nebula-console, should it?
I took the Service and StatefulSet YAML from a successful operator deployment, modified it slightly, and applied it.
The metad YAML is as follows:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nebula-metad-headless
  name: nebula-metad-headless
  namespace: nebula-latest
spec:
  clusterIP: None
  ports:
  - name: thrift
    port: 9559
    protocol: TCP
    targetPort: 9559
  - name: http
    port: 19559
    protocol: TCP
    targetPort: 19559
  publishNotReadyAddresses: true
  selector:
    app: nebula-metad-headless
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: nebula-metad-headless
  name: nebula-metad
  namespace: nebula-latest
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nebula-metad-headless
  serviceName: nebula-metad-headless
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nebula-metad-headless
    spec:
      containers:
      - command:
        - /bin/sh
        - -ecx
        - exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf
          --meta_server_addrs=nebula-metad-0.nebula-metad-headless:9559
          --local_ip=$(hostname).nebula-metad-headless --ws_ip=$(hostname).nebula-metad-headless
          --daemonize=false
        image: vesoft/nebula-metad:v3.6.0
        imagePullPolicy: IfNotPresent
        name: metad
        ports:
        - containerPort: 9559
          name: thrift
          protocol: TCP
        - containerPort: 19559
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /status
            port: 19559
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/nebula/data
          name: metad-claim0
          subPath: data
        - mountPath: /usr/local/nebula/logs
          name: metad-claim1
          subPath: logs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
  volumeClaimTemplates:
  - metadata:
      name: metad-claim0
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
  - metadata:
      name: metad-claim1
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
The storaged YAML is as follows:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nebula-storaged-headless
  name: nebula-storaged-headless
  namespace: nebula-latest
spec:
  clusterIP: None
  ports:
  - name: storaged-thrift
    port: 9779
    protocol: TCP
    targetPort: 9779
  - name: storaged-http
    port: 19779
    protocol: TCP
    targetPort: 19779
  publishNotReadyAddresses: true
  selector:
    app: nebula-metad-headless
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: nebula-storaged-headless
  name: nebula-storaged
  namespace: nebula-latest
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nebula-storaged-headless
  serviceName: nebula-storaged-headless
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nebula-storaged-headless
    spec:
      containers:
      - command:
        - /bin/sh
        - -ecx
        - exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf
          --meta_server_addrs=nebula-metad-0.nebula-metad-headless:9559
          --local_ip=$(hostname).nebula-storaged-headless
          --ws_ip=$(hostname).nebula-storaged-headless --daemonize=false
        image: vesoft/nebula-storaged:v3.6.0
        imagePullPolicy: IfNotPresent
        name: storaged
        ports:
        - containerPort: 9779
          name: storaged-thrift
          protocol: TCP
        - containerPort: 19779
          name: storaged-http
          protocol: TCP
        - containerPort: 9778
          name: storaged-admin
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /status
            port: 19779
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/nebula/data
          name: storaged-claim0
          subPath: data
        - mountPath: /usr/local/nebula/logs
          name: storaged-claim1
          subPath: logs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
  volumeClaimTemplates:
  - metadata:
      name: storaged-claim0
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
  - metadata:
      name: storaged-claim1
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
kqzh
5
Hi, you can use nebula-console to connect to graphd and then execute an `ADD HOSTS` statement to register the storaged node with meta. See the documentation: Manage Storage hosts - NebulaGraph Database manual.
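For example, from a machine that can reach graphd (the address and credentials below are placeholders for a default deployment, not values from this thread; substitute your own):

```
# Connect to graphd with nebula-console:
nebula-console -addr <graphd-address> -port 9669 -u root -p <password>

# Inside the console, register the storaged host with meta, then verify:
ADD HOSTS "<storaged-address>":9779;
SHOW HOSTS;
```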
I already tried that approach, and it doesn't work.
Because the storaged pod never comes up, after `ADD HOSTS` the result of `SHOW HOSTS` stays OFFLINE.
The pod still won't start.
wey
7
After `ADD HOSTS`, what does storaged's own log say?
storaged's logs after `ADD HOSTS`:
nebula-storaged.ERROR
nebula-storaged.WARNING
nebula-storaged.INFO
metad's nebula-metad.INFO:
Later, from the entries in nebula-metad.INFO, I worked out what the problem probably was. So I ran:
ADD HOSTS "nebula-storaged-0.nebula-storaged-headless":9779
and the host's status actually became ONLINE.
What puzzles me is: why does ADD HOSTS "nebula-storaged-headless":9779 always stay OFFLINE?
I then tried a different approach, and strangely, storaged still would not come up.
The deployment YAML is as follows:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nebula-metad-headless
  name: nebula-metad-headless
  namespace: nebula-latest
spec:
  clusterIP: None
  ports:
  - name: thrift
    port: 9559
    protocol: TCP
    targetPort: 9559
  - name: http
    port: 19559
    protocol: TCP
    targetPort: 19559
  - name: graphd-thrift
    port: 9669
    protocol: TCP
    targetPort: 9669
  - name: graphd-http
    port: 19669
    protocol: TCP
    targetPort: 19669
  - name: storaged-thrift
    port: 9779
    protocol: TCP
    targetPort: 9779
  - name: storaged-http
    port: 19779
    protocol: TCP
    targetPort: 19779
  - name: storaged-admin
    port: 9778
    protocol: TCP
    targetPort: 9778
  publishNotReadyAddresses: true
  selector:
    app: nebula-metad-headless
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: nebula-metad-headless
  name: nebula-metad
  namespace: nebula-latest
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nebula-metad-headless
  serviceName: nebula-metad-headless
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nebula-metad-headless
    spec:
      containers:
      - command:
        - /bin/sh
        - -ecx
        - exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf
          --meta_server_addrs=nebula-metad-0.nebula-metad-headless:9559
          --local_ip=$(hostname).nebula-metad-headless --ws_ip=$(hostname).nebula-metad-headless
          --daemonize=false
        image: vesoft/nebula-metad:v3.6.0
        imagePullPolicy: IfNotPresent
        name: metad
        ports:
        - containerPort: 9559
          name: thrift
          protocol: TCP
        - containerPort: 19559
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /status
            port: 19559
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/nebula/data
          name: metad-claim0
          subPath: data
        - mountPath: /usr/local/nebula/logs
          name: metad-claim1
          subPath: logs
      - command:
        - /bin/sh
        - -ecx
        - exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf
          --meta_server_addrs=nebula-metad-0.nebula-metad-headless:9559
          --local_ip=nebula-metad-0.nebula-metad-headless.nebula-latest.svc.cluster.local --ws_ip=nebula-metad-0.nebula-metad-headless.nebula-latest.svc.cluster.local
          --daemonize=false
        image: vesoft/nebula-graphd:v3.6.0
        imagePullPolicy: IfNotPresent
        name: graphd
        ports:
        - containerPort: 9669
          name: graphd-thrift
          protocol: TCP
        - containerPort: 19669
          name: graphd-http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /status
            port: 19669
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/nebula/logs
          name: graphd-claim0
          subPath: logs
      - command:
        - /bin/sh
        - -ecx
        - exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf
          --meta_server_addrs=nebula-metad-0.nebula-metad-headless:9559
          --local_ip=nebula-metad-0.nebula-metad-headless
          --ws_ip=nebula-metad-0.nebula-metad-headless
        image: vesoft/nebula-storaged:v3.6.0
        imagePullPolicy: IfNotPresent
        name: storaged
        ports:
        - containerPort: 9779
          name: storaged-thrift
          protocol: TCP
        - containerPort: 19779
          name: storaged-http
          protocol: TCP
        - containerPort: 9778
          name: storaged-admin
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 500Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/local/nebula/data
          name: storaged-claim0
          subPath: data
        - mountPath: /usr/local/nebula/logs
          name: storaged-claim1
          subPath: logs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
  volumeClaimTemplates:
  - metadata:
      name: metad-claim0
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
  - metadata:
      name: metad-claim1
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
  - metadata:
      name: graphd-claim0
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
  - metadata:
      name: storaged-claim0
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
  - metadata:
      name: storaged-claim1
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard-nfs-storage
      resources:
        requests:
          storage: 100Mi
Here are some of the symptoms:
storaged's logs are completely empty,
metad has no storaged-related logs either,
and the added hosts stay OFFLINE the whole time.
kqzh
11
Hi, it looks like the `ADD HOSTS` statement is the problem. The storaged domain name should be nebula-storaged-xxx; in the screenshot you used metad's domain name.
kqzh
12
Because storaged registers itself under the domain name nebula-storaged-0.nebula-storaged-headless, while meta has nebula-storaged-headless on record, the two don't match.
So it looks like I got it wrong.
Do I need to adjust the command in the container spec?
I'm stuck here now; any help would be appreciated.
I noticed one more thing: storaged cannot be started in the same pod as graphd or metad at the same time, otherwise storaged fails to start.
kqzh
15
Right. You can `DROP HOSTS` the wrong entry first, then `ADD HOSTS` the correct address.
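Concretely, using the addresses from this thread, something like:

```
DROP HOSTS "nebula-storaged-headless":9779;
ADD HOSTS "nebula-storaged-0.nebula-storaged-headless":9779;
SHOW HOSTS;
```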
kqzh
16
I'd still recommend deploying a NebulaCluster with nebula-operator; it should save you from a lot of pitfalls.
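For reference, a minimal NebulaCluster sketch in the spirit of the operator's sample config (the field names here are recalled from the samples, not taken from this thread; check the nebula-operator documentation for the authoritative spec):

```
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
  namespace: nebula-latest
spec:
  graphd:
    replicas: 1
    image: vesoft/nebula-graphd
    version: v3.6.0
  metad:
    replicas: 1
    image: vesoft/nebula-metad
    version: v3.6.0
    dataVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: standard-nfs-storage
  storaged:
    replicas: 1
    image: vesoft/nebula-storaged
    version: v3.6.0
    dataVolumeClaims:
    - resources:
        requests:
          storage: 2Gi
      storageClassName: standard-nfs-storage
  schedulerName: default-scheduler
  imagePullPolicy: Always
```

From this one resource, the operator generates and maintains the headless Services, the StatefulSets, the --local_ip wiring, and the host registration that this thread reproduces by hand.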
wey
17
This is because the service-discovery mechanism here works as follows:
- Each service's ID is the address written in its own configuration.
- Whatever storaged writes in its own config is, by exact match, the ID that represents the service; when it talks to meta, it only ever says "I am <the configured address>".
- During ADD HOSTS, meta compares the given address against the addresses the services actively report.
So even when several IPs / domain names all resolve and are reachable, only the configured address can be used for ADD HOSTS, as sketched below.
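Applied to this deployment: storaged is started with --local_ip=$(hostname).nebula-storaged-headless, which in pod 0 expands to nebula-storaged-0.nebula-storaged-headless, and that exact string is the only one meta will match:

```
# Matches what storaged reports as its identity, so it goes ONLINE:
ADD HOSTS "nebula-storaged-0.nebula-storaged-headless":9779;
# Resolves fine in-cluster but never matches the reported identity, so it stays OFFLINE:
ADD HOSTS "nebula-storaged-headless":9779;
```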
One thing I don't quite understand: why can't storaged be put in the same container as graphd or metad without the pod failing to start, while graphd and metad together are fine?
wey
20
Technically speaking, other factors aside, putting them in the same container is possible. If it fails to start, something is wrong somewhere; you can configure the errors to be forwarded to stderr and see what the actual error is.
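A minimal sketch of that, using logging flags the NebulaGraph daemons accept via gflags (verify them against your nebula-storaged.conf before relying on them):

```
# Added to nebula-storaged.conf or appended to the start command:
--redirect_stdout=false   # keep stdout/stderr on the container console instead of stdout.log/stderr.log
--stderrthreshold=0       # mirror INFO and above to stderr (glog levels: 0=INFO, 1=WARNING, 2=ERROR, 3=FATAL)
```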
It is probably a conflict over something (a port, a file); in theory you can always find a hack to get around it.
I don't know the reason for this customization. Stateful workloads are hard in k8s to begin with; getting things up on day one is not the whole story, and maintainability afterwards matters a lot. The operator takes all of that trouble off your hands and is maintained by the community, which has many benefits; it only introduces some control-plane overhead, which is entirely manageable.
If the issue is that using a storage provider is inconvenient, you can also hack local disks yourself, and as I recall the operator will support local disks soon. I strongly recommend against going through all this trouble.
If you really don't want to use the operator, you'd be better off with docker compose/swarm.
wey
22
Try a single pod with multiple containers?