An instance in the cluster restarted; the real-time write job retried, but data was actually lost

  • nebula version: 3.8.0
  • Deployment: distributed (3 instances: meta*3, graph*3, storage*3)
  • Installation method: RPM
  • In production: Y
  • Hardware
    • Disk: SSD
    • CPU and memory: 16c 128G
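  • Problem description
    1. One instance in the cluster restarted, and the nebula-java write job retried, but my program never received an exception saying the retry limit had been exceeded. What puzzles me: I configured a 100 ms interval with 3 retries, and a service restart does not finish that quickly, so the retries should have been exhausted. Since my program received no exception, can I assume the retries succeeded?
    2. Some of the data written during the restart window turned out to be missing. Is this an internal synchronization problem between the instances?
  • Relevant meta / storage / graph info logs (text form preferred, for easier searching)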
    execute error, code: -1005, message: Storage Error: RPC failure, probably timeout., retry: 1
    execute error, code: -1005, message: Storage Error: RPC failure, probably timeout., retry: 2
    execute error, code: -1005, message: Storage Error: RPC failure, probably timeout., retry: 3
    execute error, code: -1005, message: RPC failure in StorageClient with timeout: TTransportException: Timed out, retry: 1
    execute failed for IOErrorException, message: java.net.ConnectException: Connection timed out (Connection timed out), retry: 1
    execute failed for IOErrorException, message: java.net.ConnectException: Connection refused (Connection refused), retry: 1
My code is as follows:
    SessionPoolConfig sessionPoolConfig = new SessionPoolConfig(addresses, spaceName, username, password);
    sessionPoolConfig.setMinSessionSize(10)
            .setMaxSessionSize(100)
            .setRetryConnectTimes(3)
            .setWaitTime(100)
            .setRetryTimes(3)       // retry 3 times
            .setIntervalTime(100);  // 100 ms
    sessionPool = new SessionPool(sessionPoolConfig);
    try {
        StopWatch stopWatch = StopWatch.createStarted();
        result = sessionPool.execute(ngql);
        stopWatch.stop();
        if (result != null && result.isSucceeded()) {
            // Only log statements that took longer than 50 ms.
            if (stopWatch.getTime() > 50) {
                log.info("ngql:{} execute success, spend time:{}", ngql, stopWatch.getTime());
            }
            break; // leaves the enclosing loop (not shown here)
        } else {
            JSONObject failedMsg = new JSONObject();
            failedMsg.put("nebula_ngql", ngql);
            failedMsg.put("exec_code", result != null ? result.getErrorCode() : 0);
            failedMsg.put("exec_msg", result != null ? result.getErrorMessage() : "system error");
            throw new Exception(failedMsg.toJSONString());
        }
    } catch (Exception e) {
        // The exception is logged here and not rethrown.
        log.error("nebula execute ngql:{} error", ngql, e);
    }
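
To make the question concrete, here is a minimal sketch of what I expected "retry 3 times, 100 ms apart, then fail loudly" to look like. It does not use nebula-java at all: `BoundedRetry`, `callWithRetry`, and `RetriesExhaustedException` are hypothetical names invented for this illustration, and the simulated write simply throws an `IOException`. It is not a description of how `SessionPool` retries internally.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class BoundedRetry {

    /** Hypothetical exception: thrown once every attempt has failed. */
    public static class RetriesExhaustedException extends Exception {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the task up to maxAttempts times, sleeping intervalMillis between
     * attempts. Returns the first successful result; if every attempt throws,
     * the last failure is wrapped and rethrown so the caller cannot miss it.
     */
    public static <T> T callWithRetry(Callable<T> task, int maxAttempts, long intervalMillis)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                throw e; // do not treat interruption as a retryable failure
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(intervalMillis);
                }
            }
        }
        throw new RetriesExhaustedException("all " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated write that fails twice, then succeeds on the third attempt.
        String outcome = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Connection refused");
            }
            return "written";
        }, 3, 100);
        System.out.println(outcome + " after " + calls[0] + " attempts");

        // Simulated write that never succeeds: the failure reaches the caller.
        try {
            callWithRetry(() -> {
                throw new IOException("Connection refused");
            }, 3, 100);
        } catch (RetriesExhaustedException e) {
            System.out.println("failed: " + e.getMessage() + ", cause: " + e.getCause());
        }
    }
}
```

With these settings the sleeps between attempts add up to only about 200 ms, which is far shorter than a service restart, so in this sketch a write issued while the instance is down would end in `RetriesExhaustedException` unless the caller catches and discards it.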