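NebulaGraph version 3.8.0. My NebulaGraph cluster runs on 2 servers. After an unexpected power outage the cluster would not start: the meta service on one machine kept failing to start and kept reporting errors.
I then deleted the meta directory under data on both NebulaGraph machines, after which NebulaGraph started successfully. However, adding the storage hosts with add host fails, both storage nodes stay offline, and the data is gone, even though the storage directory under data is still there. Here is the storage log: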
Could someone please take a look? Can my NebulaGraph data still be recovered? The storage data is still on disk.
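A 2-machine cluster fails very easily: once either machine goes down, the remaining one is no longer a majority (more than half) of the cluster.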
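The quorum argument in that reply can be made concrete with a little arithmetic. The sketch below assumes the meta service replicates its data through a single Raft group that needs a strict majority (more than half) of its members to elect a leader and commit writes. The function names majority and tolerated_failures are hypothetical helpers for illustration, not part of any NebulaGraph API.

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of an n-member group."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can be lost while the survivors can still form a majority."""
    return n - majority(n)


if __name__ == "__main__":
    # Compare group sizes 1..5 under the strict-majority assumption.
    for n in range(1, 6):
        print(f"{n} node(s): majority = {majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this assumption, 2 nodes tolerate no failure at all, exactly like 1 node, while 3 nodes tolerate one. This is why Raft-based services are usually deployed with an odd number of replicas, at least 3.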
Whether the data can be recovered, I'm not sure. Marking this thread.
OK. I've now reinstalled NebulaGraph on the servers and am re-importing the data. Later I'll see whether adding more servers helps.
Same version here: NebulaGraph 3.8.0
Windows 11 (WSL 2, Ubuntu 22 and 24)
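Unlike the original poster, my case was not caused by a power outage. I run the NebulaGraph services with docker-compose in a development environment, and I often shut the computer down without bringing the containers down first.
Most of the time, shutting down without bringing the containers down causes no problems, but yesterday, when I tried to bring the services up, the following errors appeared: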
E20250105 07:35:19.738706 146 SaveGraphVersionProcessor.cpp:25] Failed to save graph version, errorCode: E_LEADER_CHANGED
E20250105 07:35:20.744900 146 SaveGraphVersionProcessor.cpp:25] Failed to save graph version, errorCode: E_LEADER_CHANGED
E20250105 07:35:21.746572 146 SaveGraphVersionProcessor.cpp:25] Failed to save graph version, errorCode: E_LEADER_CHANGED
E20250105 07:35:28.478281 146 SaveGraphVersionProcessor.cpp:25] Failed to save graph version, errorCode: E_LEADER_CHANGED
E20250105 07:38:08.194293 146 SaveGraphVersionProcessor.cpp:25] Failed to save graph version, errorCode: E_LEADER_CHANGED
E20250105 07:38:38.511175 146 SaveGraphVersionProcessor.cpp:25] Failed to save graph version, errorCode: E_LEADER_CHANGED
E20250105 07:34:55.404174 93 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20250105 07:34:55.404657 1 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20250105 07:35:08.476162 94 MetaClient.cpp:772] Send request to "metad2":9559, exceed retry limit
E20250105 07:35:08.476723 94 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20250105 07:35:08.477655 1 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20250105 07:35:21.496620 95 MetaClient.cpp:772] Send request to "metad1":9559, exceed retry limit
E20250105 07:35:21.497597 95 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20250105 07:35:21.499795 1 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20250105 07:35:34.517278 96 MetaClient.cpp:772] Send request to "metad1":9559, exceed retry limit
E20250105 07:35:34.517863 96 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20250105 07:35:34.518396 1 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20250105 07:35:47.599869 48 MetaClient.cpp:772] Send request to "metad1":9559, exceed retry limit
E20250105 07:35:47.601718 48 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E20250105 07:35:47.604538 1 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: Dropping unsent request. Connection closed after: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connect
E20250105 07:38:06.199206 54 MetaClient.cpp:772] Send request to "metad2":9559, exceed retry limit
E20250105 07:38:06.199898 54 MetaClient.cpp:773] RpcResponse exception: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20250105 07:38:06.200489 1 MetaClient.cpp:112] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: TTransportException: Timed out
I20250106 03:19:23.378823 1 StorageDaemon.cpp:147] data path= /data/storage
I20250106 03:19:23.396061 1 MetaClient.cpp:80] Create meta client to "metad0":9559
I20250106 03:19:23.396488 1 MetaClient.cpp:81] root path: /usr/local/nebula, data path size: 1
W20250106 03:19:23.396960 1 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
I20250106 03:19:23.413105 49 ThriftClientManager-inl.h:67] resolve "metad1":9559 as "172.18.0.2":9559
I20250106 03:19:24.415612 49 ThriftClientManager-inl.h:67] resolve "metad2":9559 as "172.18.0.4":9559
I20250106 03:19:25.418324 49 ThriftClientManager-inl.h:67] resolve "metad1":9559 as "172.18.0.2":9559
I20250106 03:19:26.436275 49 ThriftClientManager-inl.h:67] resolve "metad0":9559 as "172.18.0.3":9559
E20250106 03:19:26.437835 1 MetaClient.cpp:112] Heartbeat failed, status:Machine not existed!
I20250106 03:19:26.442168 1 MetaClient.cpp:137] Waiting for the metad to be ready!
Both approaches get the services to start normally.
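I'd like to ask: is this because some critical data was corrupted, leaving the storaged service unable to start? Admittedly, apart from a power outage like the original poster's, servers are rarely force-shut-down.
We haven't hit a similar problem in production yet, but running into it in the development environment makes me somewhat worried.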
Ubuntu 24.04 LTS
Understood.