Cluster deployed with nebula-operator goes down after writing data through java-client for a while

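  • nebula version: 2.0.1
  • Deployment method (distributed / standalone / Docker / DBaaS): nebula-operator
  • Production environment: Y
    The cluster goes down about 3 minutes after writes begin. One storaged node goes into CrashLoopBackOff; after waiting a while, the node returns to Running.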
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  graphd:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-graphd
    version: v2.0.1
    service:
      type: NodePort
      externalTrafficPolicy: Local
    storageClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  metad:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-metad
    version: v2.0.1
    storageClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  storaged:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 3
    image: vesoft/nebula-storaged
    version: v2.0.1
    storageClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  imagePullPolicy: IfNotPresent

[INFO ][2021-06-04 15:27:23,399][com.sf.nebula.FileUtilsTest] :start
[INFO ][2021-06-04 15:27:32,832][com.sf.nebula.util.NebulaGraphUtil] :NebulaGraphUtil create appMode=k8s space=lineage_name2
[INFO ][2021-06-04 15:27:33,763][com.vesoft.nebula.client.graph.net.NebulaPool] :Get connection to 10.202.77.197:31928
[INFO ][2021-06-04 15:27:39,005][com.sf.nebula.FileUtilsTest] :handler data total :500
[INFO ][2021-06-04 15:27:45,920][com.sf.nebula.FileUtilsTest] :handler data total :1000
[INFO ][2021-06-04 15:28:20,216][com.sf.nebula.FileUtilsTest] :handler data total :1500
[INFO ][2021-06-04 15:28:28,923][com.sf.nebula.FileUtilsTest] :handler data total :2000
[INFO ][2021-06-04 15:28:53,600][com.sf.nebula.FileUtilsTest] :handler data total :2500
[INFO ][2021-06-04 15:28:58,484][com.sf.nebula.FileUtilsTest] :handler data total :3000
[INFO ][2021-06-04 15:29:47,762][com.sf.nebula.FileUtilsTest] :handler data total :3500
[INFO ][2021-06-04 15:29:56,902][com.sf.nebula.FileUtilsTest] :handler data total :4000
[INFO ][2021-06-04 15:30:45,525][com.sf.nebula.FileUtilsTest] :handler data total :4500
[INFO ][2021-06-04 15:30:54,303][com.sf.nebula.FileUtilsTest] :handler data total :5000
[ERROR][2021-06-04 15:30:54,742][com.sf.nebula.util.NebulaGraphUtil] :executeWith3Tries failed, msg=RPC failure in StorageClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused, nGql=UPSERT VERTEX "bdp.tmp_dm_hr.glt_emp_count_5011_5015" set table.dropFlag = true, table.name = "bdp.tmp_dm_hr.glt_emp_count_5011_5015";
[ERROR][2021-06-04 15:30:54,843][com.sf.nebula.util.NebulaGraphUtil] :executeWith3Tries failed, msg=Storage Error: part: 14, error: E_RPC_FAILURE(-3)., nGql=GO FROM "bdp.tmp_dm_hr.glt_emp_count_5011_5015" OVER contain
[ERROR][2021-06-04 15:30:55,139][com.sf.nebula.util.NebulaGraphUtil] :executeWith3Tries failed, msg=RPC failure in StorageClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused, nGql=UPSERT VERTEX "bdp.tmp_dm_hr.glt_emp_count_1311_1315" set table.dropFlag = true, table.name = "bdp.tmp_dm_hr.glt_emp_count_1311_1315";
[ERROR][2021-06-04 15:30:55,238][com.sf.nebula.util.NebulaGraphUtil] :executeWith3Tries failed, msg=Storage Error: The leader has changed. Try again later, nGql=GO FROM "bdp.tmp_dm_hr.glt_emp_count_1311_1315" OVER contain
[ERROR][2021-06-04 15:30:55,491][com.sf.nebula.util.NebulaGraphUtil] :executeWith3Tries failed, msg=RPC failure in StorageClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type 

nebula-metad-0 log

nebula-graphd-0 log

nebula-storaged-0 log

nebula-storaged-1 log

At this point, connecting to the cluster with the console shows that some queries return abnormal results.

Do you have the logs from when storaged crashed? If so, please post them.

I doubled the CPU and memory, and that solved it.
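
For readers hitting the same symptom, here is a rough sketch of what that change could look like in the storaged section of the manifest above. The reply does not say which components were resized or what the final values were, so both the choice of storaged and the numbers below are assumptions (simply the original values roughly doubled). The CrashLoopBackOff pattern is consistent with a container exceeding its memory limit, but that was not confirmed in this thread.

```yaml
  storaged:
    resources:
      requests:
        cpu: "1"        # was "500m" (assumed new value)
        memory: "1Gi"   # was "500Mi" (assumed new value, roughly doubled)
      limits:
        cpu: "2"        # was "1" (assumed new value)
        memory: "2Gi"   # was "1Gi" (assumed new value)
    replicas: 3
    image: vesoft/nebula-storaged
    version: v2.0.1
    storageClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
```

The same doubling could be applied to the graphd and metad sections if they were also resized. Only the resources block changes; the rest of the section is copied unchanged from the original manifest.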
