Deploying the latest Nebula Operator (0.9) fails

  • Nebula version: v2.6.0
  • Deployment method: Helm
  • Installation method: K8s
  • Production deployment: Yes
  • Problem description
    The previously deployed Nebula Operator and nebula-cluster were version 0.8.0 and deployed fine, but that version seemed to have a memory-management issue, so when Nebula Operator 0.9.0 was released last week I upgraded right away. Now the nebula-operator-controller-manager pod reports an error.
  • k8s: 1.18.0
  • CoreDNS: 1.6.7
  • Helm: 3.7.0
  • CertManager: v1.6.0
  • OpenKruise: 0.10.0
  • nebula-operator-controller-manager logs
[root@master ~]# kubectl logs -f nebula-operator-controller-manager-deployment-654b94ffb9-r4n7j -n nebula-operator-system -c controller-manager
2021-11-26T05:19:42.369Z        INFO    setup   command-line-arguments/main.go:131      Welcome to Nebula Operator.
2021-11-26T05:19:42.369Z        INFO    setup   command-line-arguments/main.go:132      Nebula Operator Version {"version": {"gitVersion":"770bc38bca1680ca-dev","gitCommit":"770bc38bca1680ca206f6964dfa366868cf96eaf","gitDate":"2021-11-24T05:43:05Z","buildDate":"2021-11-24T18:17:09Z","goVersion":"go1.16","compiler":"gc","platform":"linux/amd64"}}
I1126 05:19:43.471041       1 request.go:645] Throttling request took 1.090331924s, request: GET:https://241.254.0.1:443/apis/apps/v1?timeout=32s
2021-11-26T05:19:43.823Z        INFO    controller-runtime.metrics      metrics/listener.go:44  metrics server is starting to listen    {"addr": ":8080"}
2021-11-26T05:19:45.279Z        INFO    setup   command-line-arguments/main.go:181      starting manager
I1126 05:19:45.280001       1 leaderelection.go:243] attempting to acquire leader lease  nebula-operator-system/nebula-controller-manager-leader...
2021-11-26T05:19:45.280Z        INFO    controller-runtime.manager      manager/internal.go:406 starting metrics server {"path": "/metrics"}
I1126 05:20:02.700084       1 leaderelection.go:253] successfully acquired lease nebula-operator-system/nebula-controller-manager-leader
2021-11-26T05:20:02.700Z        DEBUG   controller-runtime.manager.events       recorder/recorder.go:98 Normal  {"object": {"kind":"ConfigMap","namespace":"nebula-operator-system","name":"nebula-controller-manager-leader","uid":"c969be71-3d1d-45b1-8005-4885eb03adde","apiVersion":"v1","resourceVersion":"54306854"}, "reason": "LeaderElection", "message": "nebula-operator-controller-manager-deployment-654b94ffb9-r4n7j_a26ee5b7-f930-4254-931c-010d31b1aaf4 became leader"}
2021-11-26T05:20:02.700Z        DEBUG   controller-runtime.manager.events       recorder/recorder.go:98 Normal  {"object": {"kind":"Lease","namespace":"nebula-operator-system","name":"nebula-controller-manager-leader","uid":"884a382d-f34c-4353-a3e9-5eee0cbf71db","apiVersion":"coordination.k8s.io/v1","resourceVersion":"54306855"}, "reason": "LeaderElection", "message": "nebula-operator-controller-manager-deployment-654b94ffb9-r4n7j_a26ee5b7-f930-4254-931c-010d31b1aaf4 became leader"}
2021-11-26T05:20:02.700Z        INFO    controller-runtime.manager.controller.nebulacluster     controller/controller.go:158    Starting EventSource    {"reconciler group": "apps.nebula-graph.io", "reconciler kind": "NebulaCluster", "source": "kind source: /, Kind="}
2021-11-26T05:20:02.801Z        INFO    controller-runtime.manager.controller.nebulacluster     controller/controller.go:158    Starting EventSource    {"reconciler group": "apps.nebula-graph.io", "reconciler kind": "NebulaCluster", "source": "kind source: /, Kind="}
2021-11-26T05:20:03.303Z        INFO    controller-runtime.manager.controller.nebulacluster     controller/controller.go:158    Starting EventSource    {"reconciler group": "apps.nebula-graph.io", "reconciler kind": "NebulaCluster", "source": "kind source: /, Kind="}
2021-11-26T05:20:03.405Z        INFO    controller-runtime.manager.controller.nebulacluster     controller/controller.go:158    Starting EventSource    {"reconciler group": "apps.nebula-graph.io", "reconciler kind": "NebulaCluster", "source": "kind source: /, Kind="}
2021-11-26T05:20:03.506Z        INFO    controller-runtime.manager.controller.nebulacluster     controller/controller.go:165    Starting Controller     {"reconciler group": "apps.nebula-graph.io", "reconciler kind": "NebulaCluster"}
2021-11-26T05:20:03.506Z        INFO    controller-runtime.manager.controller.nebulacluster     controller/controller.go:192    Starting workers        {"reconciler group": "apps.nebula-graph.io", "reconciler kind": "NebulaCluster", "worker count": 3}
2021-11-26T05:20:03.506Z        INFO    controller.nebulacluster        nebulacluster/nebula_cluster_controller.go:159  Start to reconcile      {"namespace": "test-nebula", "name": "test-nebula"}
I1126 05:20:04.559999       1 request.go:645] Throttling request took 1.048911945s, request: GET:https://241.254.0.1:443/apis/kibana.k8s.elastic.co/v1?timeout=32s
2021-11-26T05:20:04.973Z        INFO    controller.nebulacluster        nebulacluster/nebula_cluster_controller.go:143  Finished reconcile nebulaCluster        {"namespace": "test-nebula", "name": "test-nebula", "spendTime": "1.467130477s"}
E1126 05:20:04.973606       1 runtime.go:76] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 525 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1adea20, 0x2bd6530)
        k8s.io/apimachinery@v0.19.2/pkg/util/runtime/runtime.go:72 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        k8s.io/apimachinery@v0.19.2/pkg/util/runtime/runtime.go:46 +0x86
panic(0x1adea20, 0x2bd6530)
        runtime/panic.go:965 +0x1b9
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.(*metadComponent).GetLogStorageResources(0xc0016567b0, 0xc000889470)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_metad.go:104 +0x38
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.(*metadComponent).GenerateVolumeClaim(0xc0016567b0, 0xc0008894a0, 0x11, 0x1, 0x0, 0x0)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_metad.go:270 +0x89
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.generateStatefulSet(0x1fc4658, 0xc0016567b0, 0xc0016a9040, 0x1f32701, 0x1, 0x1d1ed8b, 0x2)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_common.go:297 +0x4e2
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.generateWorkload(0x1fc4658, 0xc0016567b0, 0xc0008da100, 0x4, 0xc0008da10c, 0x2, 0xc001769fc0, 0xb, 0xc0016a9040, 0x1, ...)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_common.go:357 +0x948
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.(*metadComponent).GenerateWorkload(0xc0016567b0, 0xc0008da100, 0x4, 0xc0008da10c, 0x2, 0xc001769fc0, 0xb, 0xc0016a9040, 0x1, 0xc001769fc0, ...)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_metad.go:311 +0x7b
github.com/vesoft-inc/nebula-operator/pkg/controller/component.(*metadCluster).syncMetadWorkload(0xc0007571c0, 0xc0002f8900, 0x0, 0x0)
        github.com/vesoft-inc/nebula-operator/pkg/controller/component/metad_cluster.go:96 +0x571
github.com/vesoft-inc/nebula-operator/pkg/controller/component.(*metadCluster).Reconcile(0xc0007571c0, 0xc0002f8900, 0x4, 0x4)
        github.com/vesoft-inc/nebula-operator/pkg/controller/component/metad_cluster.go:64 +0x85
github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster.(*defaultNebulaClusterControl).updateNebulaCluster(0xc0004f2310, 0xc0002f8900, 0x1, 0xc0004f2700)
        github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster/nebula_cluster_control.go:87 +0x18c
github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster.(*defaultNebulaClusterControl).UpdateNebulaCluster(0xc0004f2310, 0xc0002f8900, 0x12, 0x0)
        github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster/nebula_cluster_control.go:68 +0x8e
github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster.(*ClusterReconciler).syncNebulaCluster(...)
        github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster/nebula_cluster_controller.go:191
github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster.(*ClusterReconciler).Reconcile(0xc000757200, 0x1f9f4b8, 0xc000ac6330, 0xc00074d480, 0xb, 0xc00074d464, 0xb, 0xc000ac6300, 0x0, 0x0, ...)
        github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster/nebula_cluster_controller.go:165 +0x6f1
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0006ded20, 0x1f9f410, 0xc0007573c0, 0x1b47820, 0xc00019e520)
        sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:263 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0006ded20, 0x1f9f410, 0xc0007573c0, 0x0)
        sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:235 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x1f9f410, 0xc0007573c0)
        sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:198 +0x4a
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:185 +0x37
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007bdf50)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001503f50, 0x1f5ece0, 0xc000ac6240, 0xc000757301, 0xc000546000)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007bdf50, 0x3b9aca00, 0x0, 0xc000402401, 0xc000546000)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x1f9f410, 0xc0007573c0, 0xc000a987f0, 0x3b9aca00, 0x0, 0x1df9f01)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:185 +0xa6
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x1f9f410, 0xc0007573c0, 0xc000a987f0, 0x3b9aca00)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:99 +0x57
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:195 +0x497
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1491318]

goroutine 525 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        k8s.io/apimachinery@v0.19.2/pkg/util/runtime/runtime.go:53 +0x109
panic(0x1adea20, 0x2bd6530)
        runtime/panic.go:965 +0x1b9
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.(*metadComponent).GetLogStorageResources(0xc0016567b0, 0xc000889470)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_metad.go:104 +0x38
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.(*metadComponent).GenerateVolumeClaim(0xc0016567b0, 0xc0008894a0, 0x11, 0x1, 0x0, 0x0)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_metad.go:270 +0x89
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.generateStatefulSet(0x1fc4658, 0xc0016567b0, 0xc0016a9040, 0x1f32701, 0x1, 0x1d1ed8b, 0x2)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_common.go:297 +0x4e2
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.generateWorkload(0x1fc4658, 0xc0016567b0, 0xc0008da100, 0x4, 0xc0008da10c, 0x2, 0xc001769fc0, 0xb, 0xc0016a9040, 0x1, ...)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_common.go:357 +0x948
github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1.(*metadComponent).GenerateWorkload(0xc0016567b0, 0xc0008da100, 0x4, 0xc0008da10c, 0x2, 0xc001769fc0, 0xb, 0xc0016a9040, 0x1, 0xc001769fc0, ...)
        github.com/vesoft-inc/nebula-operator/apis/apps/v1alpha1/nebulacluster_metad.go:311 +0x7b
github.com/vesoft-inc/nebula-operator/pkg/controller/component.(*metadCluster).syncMetadWorkload(0xc0007571c0, 0xc0002f8900, 0x0, 0x0)
        github.com/vesoft-inc/nebula-operator/pkg/controller/component/metad_cluster.go:96 +0x571
github.com/vesoft-inc/nebula-operator/pkg/controller/component.(*metadCluster).Reconcile(0xc0007571c0, 0xc0002f8900, 0x4, 0x4)
        github.com/vesoft-inc/nebula-operator/pkg/controller/component/metad_cluster.go:64 +0x85
github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster.(*defaultNebulaClusterControl).updateNebulaCluster(0xc0004f2310, 0xc0002f8900, 0x1, 0xc0004f2700)
        github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster/nebula_cluster_control.go:87 +0x18c
github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster.(*defaultNebulaClusterControl).UpdateNebulaCluster(0xc0004f2310, 0xc0002f8900, 0x12, 0x0)
        github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster/nebula_cluster_control.go:68 +0x8e
github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster.(*ClusterReconciler).syncNebulaCluster(...)
        github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster/nebula_cluster_controller.go:191
github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster.(*ClusterReconciler).Reconcile(0xc000757200, 0x1f9f4b8, 0xc000ac6330, 0xc00074d480, 0xb, 0xc00074d464, 0xb, 0xc000ac6300, 0x0, 0x0, ...)
        github.com/vesoft-inc/nebula-operator/pkg/controller/nebulacluster/nebula_cluster_controller.go:165 +0x6f1
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0006ded20, 0x1f9f410, 0xc0007573c0, 0x1b47820, 0xc00019e520)
        sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:263 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0006ded20, 0x1f9f410, 0xc0007573c0, 0x0)
        sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:235 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x1f9f410, 0xc0007573c0)
        sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:198 +0x4a
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:185 +0x37
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007bdf50)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001503f50, 0x1f5ece0, 0xc000ac6240, 0xc000757301, 0xc000546000)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007bdf50, 0x3b9aca00, 0x0, 0xc000402401, 0xc000546000)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x1f9f410, 0xc0007573c0, 0xc000a987f0, 0x3b9aca00, 0x0, 0x1df9f01)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:185 +0xa6
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x1f9f410, 0xc0007573c0, 0xc000a987f0, 0x3b9aca00)
        k8s.io/apimachinery@v0.19.2/pkg/util/wait/wait.go:99 +0x57
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        sigs.k8s.io/controller-runtime@v0.7.0/pkg/internal/controller/controller.go:195 +0x497

Operator v0.9.0 is not compatible with v0.8.0; the documentation specifically calls this out. The main change is that the storageClaim in the CRD was split, so logs and data are each mounted on their own independent storage volume instead of sharing one.
I suggest you deploy a fresh cluster with 0.9.0.
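
For reference, a quick way to check which spec layout the CRD currently installed in the cluster uses (a sketch; the old single field is storageClaim, and the split field names logVolumeClaim / dataVolumeClaim are assumed from the v0.9.0 chart and the validation messages further down this thread):

# Inspect the installed NebulaCluster CRD for the volume-claim fields.
# 0.8.0 schema: a single storageClaim per component.
# 0.9.0 schema: separate logVolumeClaim and dataVolumeClaim.
kubectl get crd nebulaclusters.apps.nebula-graph.io -o yaml \
  | grep -oE 'storageClaim|logVolumeClaim|dataVolumeClaim' | sort | uniq -c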

I have already uninstalled the old 0.8.0 release and deployed 0.9.0 on its own with Helm, using this command:

helm install nebula-operator nebula-operator/nebula-operator \
  --namespace nebula-operator-system --create-namespace \
  --version 0.9.0 \
  --set image.kubeRBACProxy.image=kubesphere/kube-rbac-proxy:v0.8.0 \
  --set image.kubeScheduler.image=kubesphere/kube-scheduler:v1.18.8
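
(For reference, a quick check that the operator actually came up after this install; the names assume the default deployment and namespace shown above:)

# Verify the operator pods are running and tail the controller logs:
kubectl get pods -n nebula-operator-system
kubectl logs deployment/nebula-operator-controller-manager-deployment \
  -n nebula-operator-system -c controller-manager --tail=20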

My experience is that you also need to manually apply the CRD:

# clone and cd to v0.9.0 operator repo directory

kubectl apply -f config/crd/bases/apps.nebula-graph.io_nebulaclusters.yaml

Bro, do you mean I should apply the CRD first and then deploy operator v0.9.0?

Either order works. It just saves you from a pitfall later when you create the NebulaCluster; it has nothing to do with your controller error.

I haven't even gotten to the NebulaCluster yet; the controller errors out, so I can't go any further.

@kevin.qiao what is causing the controller error here?

Is it still the error above?
If you use operator 0.9.0 to manage a Nebula cluster that was deployed with 0.8.0, there will be problems.
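
A quick way to see whether an old 0.8.0-era NebulaCluster is still around for the new operator to pick up (a sketch; the cluster name below is taken from the panic log above):

# List existing NebulaCluster resources; the panic above was hit while
# reconciling "test-nebula" in namespace "test-nebula":
kubectl get nebulaclusters --all-namespaces

# If that cluster was created by the 0.8.0 chart and is no longer needed,
# one option is to remove it before letting the 0.9.0 operator take over:
# kubectl delete nebulaclusters test-nebula -n test-nebula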

The operator is fine now, but the cluster deployment fails.

export NEBULA_CLUSTER_NAME=mdm-nebula      
export NEBULA_CLUSTER_NAMESPACE=nebula-system  
export STORAGE_CLASS_NAME=share-storage     
helm install "${NEBULA_CLUSTER_NAME}" nebula-operator/nebula-cluster \
    --namespace="${NEBULA_CLUSTER_NAMESPACE}" \
    --set nameOverride=${NEBULA_CLUSTER_NAME} \
    --set nebula.storageClassName="${STORAGE_CLASS_NAME}"

Error message:

Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: error validating "": 
	error validating data: [ValidationError(NebulaCluster.spec.graphd): unknown field "logVolumeClaim" in io.nebula-graph.apps.v1alpha1.NebulaCluster.spec.graphd, 
	ValidationError(NebulaCluster.spec.metad): unknown field "dataVolumeClaim" in io.nebula-graph.apps.v1alpha1.NebulaCluster.spec.metad, 
	ValidationError(NebulaCluster.spec.metad): unknown field "logVolumeClaim" in io.nebula-graph.apps.v1alpha1.NebulaCluster.spec.metad, 
	ValidationError(NebulaCluster.spec.storaged): unknown field "dataVolumeClaim" in io.nebula-graph.apps.v1alpha1.NebulaCluster.spec.storaged, 
	ValidationError(NebulaCluster.spec.storaged): unknown field "logVolumeClaim" in io.nebula-graph.apps.v1alpha1.NebulaCluster.spec.storaged]

Just do what I mentioned above and you'll be fine.
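
A minimal sketch of the re-check before retrying the cluster install, assuming the CRD is re-applied from a checkout of the v0.9.0 repo as described earlier:

# Re-apply the v0.9.0 CRD from the repo checkout:
kubectl apply -f config/crd/bases/apps.nebula-graph.io_nebulaclusters.yaml

# Once the API server has refreshed its published schema (this can lag for a
# short while), the split volume-claim fields should be visible:
kubectl explain nebulaclusters.spec.metad.logVolumeClaim
kubectl explain nebulaclusters.spec.metad.dataVolumeClaim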

Oh, thanks bro. I've been installing everything online; is downloading the charts and deploying them locally actually better suited for a production environment?

No need for a manual install; keep installing the operator via Helm. Only this CRD currently needs a manual apply, which is one extra step, I'm afraid :sob:

The problem has been recorded in an issue: compatibility: v0.8.0 --> v0.9 · Issue #103 · vesoft-inc/nebula-operator · GitHub

[root@master ~]# kubectl apply -f https://github.com/vesoft-inc/nebula-operator/blob/v0.9.0/config/crd/bases/apps.nebula-graph.io_nebulaclusters.yaml
error: error parsing https://github.com/vesoft-inc/nebula-operator/blob/v0.9.0/config/crd/bases/apps.nebula-graph.io_nebulaclusters.yaml: error converting YAML to JSON: yaml: line 145: mapping values are not allowed in this context

Change it to this:

kubectl apply -f https://github.com/vesoft-inc/nebula-operator/raw/v0.9.0/config/crd/bases/apps.nebula-graph.io_nebulaclusters.yaml

Your original URL points to a GitHub page, not the raw file.
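
(A quick way to sanity-check that a URL serves the raw YAML rather than the rendered GitHub page:)

# The blob/ URL returns an HTML page; the raw/ URL returns the YAML itself:
curl -sL https://github.com/vesoft-inc/nebula-operator/raw/v0.9.0/config/crd/bases/apps.nebula-graph.io_nebulaclusters.yaml \
  | head -n 3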

Oh, I see :sweat_smile:

Thanks bro, it works now. But have you ever run into the situation below? It happens intermittently. Is the intermittent failure caused by transferring too much data, or is there some other reason?

ERRO[2021-12-01 09:35:12] graphservice_grpc.pb_impl.go:148 错误原因:Failed to open transport, error: dial tcp 170.18.9.200:31969: connect: connection refused 

On my side, every request acquires a session before executing. What's your strategy here? Could that be a possible cause?

Please open a new thread to describe your situation~

OK, thank you very much for your help :laughing: