k8s cluster: storaged, metad, and graphd config changes not taking effect

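ianhe · July 27, 2023, 09:01

nebula version: v3.5.0
operator version: v1.4.2
Deployment: cloud
Installation method: k8s
In production: N
Problem description

In our k8s deployment we changed the default log levels, setting minloglevel to 1 and v to 3, but only graphd picked up the change; the metad and storaged configs were not updated.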

kqzh · August 1, 2023, 10:39

Hi, could you post your nebula-cluster yaml?

ianhe · August 2, 2023, 06:33

nebula:
  logRotate:
    rotate: 5
    size: 200M
  graphd:
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: graphd
              topologyKey: kubernetes.io/hostname
            weight: 100
    config:
      auth_type: password
      enable_authorize: "true"
      max_sessions_per_ip_per_user: "300"
      minloglevel: "2"
      session_idle_timeout_secs: "120"
      system_memory_high_watermark_ratio: "0.9"
      v: "2"
    env:
      - name: TZ
        value: Asia/Shanghai
    image: vesoft/nebula-graphd
    logStorage: 50Gi
    nodeSelector:
      nebula: ""
    replicas: 3
    resources:
      limits:
        cpu: "8"
        memory: 16Gi
      requests:
        cpu: "8"
        memory: 16Gi
    sidecarContainers:
      - command:
          - sh
          - -ce
          - |-
            version=3.5.0
            wget -O /usr/local/bin/nebula-console https://ghproxy.com/github.com/vesoft-inc/nebula-console/releases/download/v$version/nebula-console-linux-amd64-v$version
            chmod a+x /usr/local/bin/nebula-console
            while true; do find logs/ -size +1048576k -type f -delete;sleep 1h; done
        image: alpine:edge
        name: nebula-console
        volumeMounts:
          - mountPath: /logs
            name: graphd-log
            subPath: logs
  imagePullPolicy: IfNotPresent
  metad:
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: metad
              topologyKey: kubernetes.io/hostname
            weight: 100
    config:
      minloglevel: "2"
      v: "2"
    dataStorage: 100Gi
    env:
      - name: TZ
        value: Asia/Shanghai
    image: vesoft/nebula-metad
    logStorage: 50Gi
    nodeSelector:
      nebula: ""
    replicas: 3
    resources:
      limits:
        cpu: "8"
        memory: 16Gi
      requests:
        cpu: "8"
        memory: 16Gi
    sidecarContainers:
      - command:
          - sh
          - -ce
          - while true; do find logs/ -size +1048576k -type f -delete;sleep 1h; done
        image: alpine:edge
        name: clean-logs
        volumeMounts:
          - mountPath: /logs
            name: metad-log
            subPath: logs
  schedulerName: default-scheduler # default-scheduler, nebula-scheduler
  storageClassName: local-path-retain
  storaged:
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: storaged
              topologyKey: kubernetes.io/hostname
            weight: 100
    config:
      minloglevel: "2"
      v: "2"
    dataStorage: 100Gi
    env:
      - name: TZ
        value: Asia/Shanghai
    image: vesoft/nebula-storaged
    logStorage: 50Gi
    nodeSelector:
      nebula: ""
    replicas: 3
    resources:
      limits:
        cpu: "8"
        memory: 16Gi
      requests:
        cpu: "8"
        memory: 16Gi
    sidecarContainers:
      - command:
          - sh
          - -ce
          - while true; do find logs/ -size +1048576k -type f -delete;sleep 1h; done
        image: alpine:edge
        name: clean-logs
        volumeMounts:
          - mountPath: /logs
            name: storaged-log
            subPath: logs
  version: v3.5.0

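kqzh · August 3, 2023, 02:07

Hi, the yaml looks fine. Could you also post the metad and storaged configmaps? If the settings are missing from the configmaps, try deleting the pods manually and check whether the configmaps get updated.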

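ianhe · August 14, 2023, 08:44

I have since tried everything, including deleting the pods and deleting the configmaps so that the operator recreates them. None of it updates the log settings.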

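kqzh · August 15, 2023, 07:00

Hi, it works as expected in my local test: changing the config field of the nebulacluster triggers a configmap update and a container restart. Also, operator 1.5.0 was released last week, so you could try upgrading to the latest version.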

ianhe · August 15, 2023, 07:29

Upgrading to 1.5.0 fails with an error:

I0815 07:23:11.151002       1 cm.go:98] configMap [nebula/nebula-cluster-metad] updated successfully                                                                                                  
I0815 07:23:11.151130       1 nebula_cluster_controller.go:143] Finished reconciling NebulaCluster [nebula/nebula-cluster], spendTime: (4.440012706s)                                                 
I0815 07:23:11.151206       1 controller.go:118]  "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "NebulaCluster"={"name":"nebula-cluster","n
panic: runtime error: invalid memory address or nil pointer dereference [recovered]                                                                                                                   
    panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                           
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x14a32a8]

kqzh · August 15, 2023, 07:50

Hi, did you update the CRD? Could you post the full error log? This may be a compatibility issue.

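ianhe · August 15, 2023, 07:59

The CRD had not been updated. After fixing that, the operator came up, but the config update still failed.

After deleting all the configmaps, it is still only graphd, which is created last, that gets the modified minloglevel and v. metad and storaged, which are created first, are still at the defaults of 0 and 0.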

kqzh · August 15, 2023, 08:13

Understood. Could you run kubectl get nc -o yaml manually to confirm that the config in the spec has actually changed?

ianhe · August 15, 2023, 08:32

I checked with our ops colleague: the config shown by kubectl get nc -o yaml has indeed all been changed to 3, but in practice only graphd's settings became 3. The others are still 0.

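kqzh · August 15, 2023, 09:19

Hi, could you exec into the storaged container and curl the ip:19779/flags endpoint to see whether the corresponding config has been updated? I looked at the source: config items are currently split into two kinds, those that need a restart to take effect and those that do not. Only the restart-required ones update the configmap; the ones that do not need a restart are updated directly through the HTTP interface.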
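To compare the live flags against the intended config, the response from the flags endpoint can be parsed and diffed. The following is a minimal sketch in Go, assuming the endpoint returns one `name=value` pair per line with string values in double quotes; that format, the sample body, and the helper names `parseFlags` and `diffFlags` are assumptions for illustration, not part of the operator.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseFlags turns "name=value" lines into a map.
// The line format is an assumption about the /flags response body.
func parseFlags(body string) map[string]string {
	flags := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		name, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // skip blank or malformed lines
		}
		flags[name] = strings.Trim(value, `"`)
	}
	return flags
}

// diffFlags returns the wanted entries whose live value is missing or different.
func diffFlags(want, live map[string]string) map[string]string {
	out := make(map[string]string)
	for name, value := range want {
		if got, ok := live[name]; !ok || got != value {
			out[name] = value
		}
	}
	return out
}

func main() {
	// Hypothetical response body, as if saved from: curl ip:19779/flags
	body := "minloglevel=0\nv=3\nlog_dir=\"logs\"\n"
	want := map[string]string{"minloglevel": "3", "v": "3"}
	for name, value := range diffFlags(want, parseFlags(body)) {
		fmt.Printf("%s: want %s\n", name, value)
	}
}
```

Running the same check against each service shows quickly which flags the operator has left at a value other than the one in the spec.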

ianhe · August 15, 2023, 10:11

I curled it. v returns 3 for both storaged and metad. storaged's minloglevel is still 0, but metad's minloglevel was changed to 3, which is a bit puzzling.

ianhe · August 15, 2023, 10:37

I changed the log settings of all three services manually by curling the flags endpoint, but after a restart the operator set them back to 0 (@_@)

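ianhe · August 15, 2023, 11:01

Well, our ops colleague dug into it and found that the problem is in the dynamic config handling.

The code here resets the config, because the config contains only minloglevel and v.

We then added log_dir=logs (effectively a no-op) to the config, and with that the metad and storaged config files do get modified.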

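abby-cyber-fairy · August 16, 2023, 06:45

https://docs.nebula-graph.com.cn/3.6.0/nebula-operator/8.custom-cluster-configurations/8.1.custom-conf-parameter/#_3

PS: Online modification of the cluster configuration is triggered only when every item in config is a parameter that can be changed dynamically at runtime. If config contains any parameter that cannot be changed dynamically, the cluster configuration is updated by restarting the Pods.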
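The rule quoted above also explains why adding log_dir changed the outcome. A minimal sketch in Go of that decision, assuming minloglevel and v are dynamic and log_dir is not, as this thread suggests; the `dynamicFlags` set and the `updateMode` helper are illustrative and not the operator's actual list or API.

```go
package main

import "fmt"

// dynamicFlags is an illustrative set, NOT the operator's real list of
// runtime-modifiable parameters.
var dynamicFlags = map[string]bool{"minloglevel": true, "v": true}

// updateMode applies the documented rule: "online" only when every item
// in config is dynamic, otherwise "restart".
func updateMode(config map[string]string) string {
	for name := range config {
		if !dynamicFlags[name] {
			return "restart"
		}
	}
	return "online"
}

func main() {
	onlyLog := map[string]string{"minloglevel": "3", "v": "3"}
	withLogDir := map[string]string{"minloglevel": "3", "v": "3", "log_dir": "logs"}
	fmt.Println(updateMode(onlyLog))    // every item is dynamic
	fmt.Println(updateMode(withLogDir)) // log_dir forces the restart path
}
```

Under these assumptions, a config holding only log levels takes the online path, while the same config plus log_dir takes the restart path, which is the one that rewrites the config files.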

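ianhe · August 16, 2023, 10:18

The thing is, it would be fine if the dynamic update actually applied the right values. But once the operator resets the flags to their defaults, that is the end of it, and the custom values never get applied ...

updateDynamicFlags

Look, right after "reset dynamic flags successfully" it returns nil and ignores the custom config.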
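The control flow described above can be illustrated with a small self-contained sketch in Go. This is not the operator's source: `flagStore`, `resetOnly`, and `resetThenApply` are hypothetical names, and the defaults of 0 are taken from the values reported earlier in this thread.

```go
package main

import "fmt"

// flagStore is a hypothetical stand-in for one service's runtime flags.
type flagStore map[string]string

// defaults holds the values reported in this thread, for illustration only.
var defaults = map[string]string{"minloglevel": "0", "v": "0"}

// reset puts every known flag back to its default value.
func (s flagStore) reset() {
	for name, value := range defaults {
		s[name] = value
	}
}

// apply writes the custom values over whatever is currently set.
func (s flagStore) apply(custom map[string]string) {
	for name, value := range custom {
		s[name] = value
	}
}

// resetOnly mirrors the behavior as described in this thread:
// reset, then return early without applying the custom values.
func resetOnly(s flagStore, custom map[string]string) error {
	s.reset()
	return nil
}

// resetThenApply is one possible alternative: reset, then apply.
func resetThenApply(s flagStore, custom map[string]string) error {
	s.reset()
	s.apply(custom)
	return nil
}

func main() {
	custom := map[string]string{"minloglevel": "3", "v": "3"}
	a := flagStore{"minloglevel": "1", "v": "1"}
	b := flagStore{"minloglevel": "1", "v": "1"}
	_ = resetOnly(a, custom)
	_ = resetThenApply(b, custom)
	fmt.Println(a["minloglevel"], b["minloglevel"])
}
```

With the early return, the store ends at the default; with the second variant, it ends at the custom value. Whether the real updateDynamicFlags should behave like the second variant is a design question for the operator maintainers.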
