A cluster instance restarted and the real-time write job retried, but data was actually lost

Ian · March 31, 2026 09:10

Nebula version: 3.8.0
Deployment: distributed (3 instances; meta*3, graph*3, storage*3)
Installation method: RPM
In production: Y
Hardware
- Disk: SSD
- CPU and memory: 16 cores, 128 GB
Problem description
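1. One instance in the cluster restarted, and the nebula-java write job retried, but my program never received an exception saying the retry limit had been exceeded. I configured a 100 ms interval with 3 retries. A service restart does not finish that quickly, so the retries should have been exhausted. Since my program received no exception, can I assume the retries succeeded?
2. Some of the data written during the restart window is missing. Could this be an internal synchronization problem between the instances?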
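To put numbers on point 1, here is a rough sketch of how much wall-clock time the configured retries could cover. It assumes the client sleeps for the interval once per retry and that each attempt may also block for some per-attempt timeout. `RetryBudget` and its parameters are hypothetical names for illustration and are not part of nebula-java; the 1000 ms timeout is an assumed figure, not taken from the post.

```java
import java.time.Duration;

/** Hypothetical helper: estimates how long a fixed-interval retry policy keeps trying. */
final class RetryBudget {
    private RetryBudget() {}

    /** Time spent sleeping between attempts only (assumes one sleep per retry). */
    static Duration sleepBudget(int retryTimes, long intervalMillis) {
        return Duration.ofMillis(retryTimes * intervalMillis);
    }

    /** Upper bound when every attempt (1 initial + retryTimes) also blocks for attemptTimeoutMillis. */
    static Duration worstCase(int retryTimes, long intervalMillis, long attemptTimeoutMillis) {
        long attempts = retryTimes + 1L;
        return Duration.ofMillis(attempts * attemptTimeoutMillis + retryTimes * intervalMillis);
    }

    public static void main(String[] args) {
        // Values from the post: 3 retries, 100 ms apart.
        System.out.println(sleepBudget(3, 100).toMillis() + " ms of sleeping");
        // Assumed per-attempt timeout of 1000 ms, for illustration only.
        System.out.println(worstCase(3, 100, 1000).toMillis() + " ms worst case");
    }
}
```

With the posted settings the sleeps alone add up to 300 ms, so whether the retries outlast a restart depends mostly on how long each failed attempt blocks before it returns.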
Relevant meta / storage / graph info logs (plain text preferred, for easier searching)
execute error, code: -1005, message: Storage Error: RPC failure, probably timeout., retry: 1
execute error, code: -1005, message: Storage Error: RPC failure, probably timeout., retry: 2
execute error, code: -1005, message: Storage Error: RPC failure, probably timeout., retry: 3
execute error, code: -1005, message: RPC failure in StorageClient with timeout: TTransportException: Timed out, retry: 1
execute failed for IOErrorException, message: java.net.ConnectException: Connection timed out (Connection timed out), retry: 1
execute failed for IOErrorException, message: java.net.ConnectException: Connection refused (Connection refused), retry: 1

# My code is as follows:
SessionPoolConfig sessionPoolConfig = new SessionPoolConfig(addresses, spaceName, username, password);
sessionPoolConfig.setMinSessionSize(10).setMaxSessionSize(100).setRetryConnectTimes(3)
        .setWaitTime(100)
        .setRetryTimes(3)
        .setIntervalTime(100);
sessionPool = new SessionPool(sessionPoolConfig);
try {
    StopWatch stopWatch = StopWatch.createStarted();
    result = sessionPool.execute(ngql);
    stopWatch.stop();
    if (result != null && result.isSucceeded()) {
        // Only log statements that took longer than 50 ms.
        if (stopWatch.getTime() > 50) {
            log.info("ngql:{} execute success, spend time:{}", ngql, stopWatch.getTime());
        }
        break;
    } else {
        // Collect the statement and the server's error details for the failure message.
        JSONObject failedMsg = new JSONObject();
        failedMsg.put("nebula_ngql", ngql);
        failedMsg.put("exec_code", result != null ? result.getErrorCode() : 0);
        failedMsg.put("exec_msg", result != null ? result.getErrorMessage() : "system error");
        throw new Exception(failedMsg.toJSONString());
    }
} catch (Exception e) {
    log.error("nebula execute ngql:{} error", ngql, e);
}
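One thing the snippet above makes visible: the `Exception` thrown in the `else` branch is caught by the `catch (Exception e)` directly below it, which only logs, so the calling code never sees a failure. A minimal sketch of a wrapper that retries and then rethrows once its attempts are used up might look like this. `WriteAttempt`, `RetryingExecutor`, `maxAttempts`, and `failed` are hypothetical names; `WriteAttempt` stands in for a call such as `sessionPool.execute(ngql)` followed by an `isSucceeded()` check, and none of this is part of nebula-java.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one write attempt; returns true on success. */
@FunctionalInterface
interface WriteAttempt {
    boolean run() throws Exception;
}

/** Hypothetical wrapper: retries a statement and surfaces the failure instead of swallowing it. */
final class RetryingExecutor {
    private final int maxAttempts;
    private final long intervalMillis;
    /** Labels of statements that failed every attempt, kept so they can be replayed later. */
    final List<String> failed = new ArrayList<>();

    RetryingExecutor(int maxAttempts, long intervalMillis) {
        this.maxAttempts = maxAttempts;
        this.intervalMillis = intervalMillis;
    }

    void execute(String label, WriteAttempt attempt) throws Exception {
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                if (attempt.run()) {
                    return; // success: stop retrying
                }
                last = new IllegalStateException("statement reported failure: " + label);
            } catch (Exception e) {
                last = e; // remember the cause, but keep retrying
            }
            if (i < maxAttempts) {
                Thread.sleep(intervalMillis); // wait before the next attempt
            }
        }
        failed.add(label);
        // Rethrow so the caller knows the write did not land.
        throw new Exception("gave up after " + maxAttempts + " attempts: " + label, last);
    }
}
```

A caller could catch the exception from `execute` and push the statement onto a replay queue, so that writes issued during a restart are retried later rather than dropped silently.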

system · April 30, 2026 09:11

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.