balance data fails after scaling out storage

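Question template:

  • nebula version: v2.6.1
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: Y
  • Hardware information
    • Disk (SSD recommended)
    • CPU and memory information
  • Detailed description of the problem
    I have a 3-node cluster and am scaling it out to six nodes. The graph and storage services on the new nodes have joined the cluster, but when I run balance data, every task fails.
    storage log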
Log file created at: 2022/10/13 10:35:36
Running on machine: idc-nebula-02
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1013 10:35:36.979822 22614 MetaClient.cpp:636] Send request to "172.16.54.68":9559, exceed retry limit
E1013 10:35:36.980209 22572 MetaClient.cpp:70] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: write timed out during connection, type = Timed out
E1013 10:35:47.148746 22572 FileUtils.cpp:369] Failed to read the directory "data/storage/nebula" (2): No such file or directory
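The log header above documents the line format as `[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`. As a minimal sketch for pulling only the warning and error lines out of a long log like this one, the following parser could be used. The names `parse_line`, `LogEntry`, and `select_levels` are hypothetical helpers written for illustration and are not part of Nebula Graph.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Regex for the header format printed at the top of the log file:
# [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    level: str   # one of I, W, E, F
    date: str    # mmdd
    time: str    # hh:mm:ss.uuuuuu
    thread: int
    source: str  # file:line
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for headers and other non-matching lines."""
    m = GLOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    return LogEntry(m["level"], m["date"], m["time"],
                    int(m["thread"]), m["source"], m["msg"])


def select_levels(lines: Iterable[str], levels: str = "WEF") -> List[LogEntry]:
    """Keep only the entries whose level is in `levels` (default: W, E, F)."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level in levels]
```

Feeding it the lines of a storage log and printing `e.time`, `e.source`, and `e.msg` for each result gives a compact view of what went wrong and when.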

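What I know so far:

  1. The original three nodes are installed under /data/nebula, while the three new nodes are installed under /usr/local/nebula. I am not sure whether that is the cause.
  2. show hosts graph / storage both report everything as normal.
  3. The meta log only shows lastLogTerm / LastLogIndex messages, nothing useful.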

Sorry, I misread. You are on 2.6.1...
Your meta is unreachable. Did it really start up?

Judging from the graph and storage logs, load leader succeeded.

I1013 13:37:26.869794 22642 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 13:37:26.869910 22642 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 13:37:26.869977 22642 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 13:37:26.869988 22642 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 13:37:26.869998 22642 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 13:37:26.870007 22642 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 13:37:26.870016 22642 MetaClient.cpp:3012] Load leader ok

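  1. When balance data failed, what was the error code? Did the command line return any error message?

  2. Did you filter this error log to Error level only? Could you post more of it?

There is not much information to go on.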

The console did not report any error; all partitions simply failed.

For more logs, I pulled the storage Info log as well and am posting it here.
graph

Log file created at: 2022/10/13 10:33:42
Running on machine: idc-nebula-01
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1013 10:33:42.856554  7969 GraphDaemon.cpp:124] Starting Graph HTTP Service
I1013 10:33:42.870653  7970 WebService.cpp:124] Web service started on HTTP[19669], HTTP2[19670]
I1013 10:33:42.870749  7969 GraphDaemon.cpp:138] Number of networking IO threads: 48
I1013 10:33:42.870776  7969 GraphDaemon.cpp:147] Number of worker threads: 48
I1013 10:33:42.871076  7969 MetaClient.cpp:57] Create meta client to "172.16.51.66":9559
I1013 10:33:46.319619  7969 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 1 space
I1013 10:33:46.319681  7969 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:33:46.319689  7969 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:33:46.319695  7969 MetaClient.cpp:3012] Load leader ok
I1013 10:33:46.320432  7969 MetaClient.cpp:117] Register time task for heartbeat!
I1013 10:33:46.431037  7969 GraphDaemon.cpp:180] Starting nebula-graphd on 172.18.131.141:9669
I1013 10:43:20.299113  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 1 space
I1013 10:43:20.299253  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:43:20.299330  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:43:20.299342  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:43:20.299352  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:43:20.299360  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:43:20.299369  8018 MetaClient.cpp:3012] Load leader ok
I1013 10:45:41.543970  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:45:41.544098  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:45:41.544160  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:45:41.544171  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:45:41.544181  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:45:41.544190  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:45:41.544200  8018 MetaClient.cpp:3012] Load leader ok
I1013 10:46:11.645377  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:46:11.645501  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:46:11.645566  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:46:11.645577  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:46:11.645587  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:46:11.645596  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:46:11.645606  8018 MetaClient.cpp:3012] Load leader ok
I1013 10:52:47.403580  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:52:47.403695  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:52:47.403753  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:52:47.403764  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:52:47.403774  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:52:47.403784  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:52:47.403792  8018 MetaClient.cpp:3012] Load leader ok
I1013 10:54:07.588161  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:54:07.588271  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:54:07.588326  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:54:07.588336  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:54:07.588346  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:54:07.588367  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:54:07.588376  8018 MetaClient.cpp:3012] Load leader ok
I1013 11:24:44.116171  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 11:24:44.116293  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 11:24:44.116351  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 11:24:44.116361  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 11:24:44.116370  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 11:24:44.116380  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 11:24:44.116389  8018 MetaClient.cpp:3012] Load leader ok
I1013 13:38:26.946231  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 13:38:26.946350  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 13:38:26.946409  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 13:38:26.946420  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 13:38:26.946430  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 13:38:26.946439  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 13:38:26.946449  8018 MetaClient.cpp:3012] Load leader ok
I1013 14:55:00.641844  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 14:55:00.641980  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 14:55:00.642036  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 14:55:00.642046  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 14:55:00.642055  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 14:55:00.642063  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 14:55:00.642072  8018 MetaClient.cpp:3012] Load leader ok
I1013 14:57:16.656226  8018 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 14:57:16.656353  8018 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 14:57:16.656430  8018 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 14:57:16.656442  8018 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 14:57:16.656453  8018 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 14:57:16.656464  8018 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 14:57:16.656474  8018 MetaClient.cpp:3012] Load leader ok

storage

Log file created at: 2022/10/13 10:36:26
Running on machine: idc-nebula-01
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1013 10:36:26.437961  8309 StorageDaemon.cpp:125] localhost = "172.18.131.141":9779
I1013 10:36:26.483320  8309 MetaClient.cpp:57] Create meta client to "172.16.51.67":9559
W1013 10:36:26.487628  8309 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E1013 10:36:31.544806  8351 MetaClient.cpp:636] Send request to "172.16.54.68":9559, exceed retry limit
E1013 10:36:31.545328  8309 MetaClient.cpp:70] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: write timed out during connection, type = Timed out
I1013 10:36:31.545387  8309 MetaClient.cpp:106] Waiting for the metad to be ready!
W1013 10:36:41.545483  8309 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E1013 10:36:46.562779  8352 MetaClient.cpp:636] Send request to "172.16.54.68":9559, exceed retry limit
E1013 10:36:46.562892  8309 MetaClient.cpp:70] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: write timed out during connection, type = Timed out
I1013 10:36:46.562944  8309 MetaClient.cpp:106] Waiting for the metad to be ready!
W1013 10:36:56.563041  8309 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
E1013 10:37:01.585503  8355 MetaClient.cpp:636] Send request to "172.16.54.68":9559, exceed retry limit
E1013 10:37:01.585635  8309 MetaClient.cpp:70] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: write timed out during connection, type = Timed out
I1013 10:37:01.585690  8309 MetaClient.cpp:106] Waiting for the metad to be ready!
W1013 10:37:11.585772  8309 FileBasedClusterIdMan.cpp:43] Open file failed, error No such file or directory
I1013 10:37:14.602250  8357 MetaClient.cpp:2373] Persisit the cluster Id from metad 5228811176313035658
I1013 10:37:14.602321  8357 FileBasedClusterIdMan.cpp:21] Remove the existed file cluster.id
I1013 10:37:14.602528  8357 FileBasedClusterIdMan.cpp:34] Persiste clusterId 5228811176313035658 succeeded!
I1013 10:37:14.732300  8309 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 1 space
I1013 10:37:14.732357  8309 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:37:14.732367  8309 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:37:14.732376  8309 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:37:14.732384  8309 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:37:14.732393  8309 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:37:14.732400  8309 MetaClient.cpp:3012] Load leader ok
I1013 10:37:14.733191  8309 MetaClient.cpp:117] Register time task for heartbeat!
I1013 10:37:14.733213  8309 StorageServer.cpp:167] Init schema manager
I1013 10:37:14.733222  8309 StorageServer.cpp:170] Init index manager
I1013 10:37:14.733228  8309 StorageServer.cpp:173] Init kvstore
I1013 10:37:14.733318  8309 NebulaStore.cpp:52] Start the raft service...
I1013 10:37:14.741581  8309 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 8388608 for each part by default
I1013 10:37:14.741909  8309 RaftexService.cpp:62] Init thrift server for raft service, port: 9780
I1013 10:37:14.746490  8551 RaftexService.cpp:93] Starting the Raftex Service
I1013 10:37:14.756789  8551 RaftexService.cpp:83] Starting the Raftex Service on 9780
I1013 10:37:14.756821  8551 RaftexService.cpp:103] Start the Raftex Service successfully
I1013 10:37:14.757330  8309 NebulaStore.cpp:84] Scan the local path, and init the spaces_
E1013 10:37:14.757365  8309 FileUtils.cpp:369] Failed to read the directory "data/storage/nebula" (2): No such file or directory
I1013 10:37:14.757402  8309 NebulaStore.cpp:170] Init data from partManager for "172.18.131.141":9779
I1013 10:37:14.757431  8309 NebulaStore.cpp:77] Register handler...
I1013 10:37:14.757454  8309 StorageServer.cpp:85] Starting Storage HTTP Service
I1013 10:37:14.759675  8309 StorageServer.cpp:89] Http Thread Pool started
I1013 10:37:14.777688  8556 WebService.cpp:124] Web service started on HTTP[19779], HTTP2[19780]
I1013 10:37:14.777796  8309 TransactionManager.cpp:25] TransactionManager ctor()
I1013 10:37:14.779012  8309 RocksEngineConfig.cpp:299] Emplace rocksdb option max_background_jobs=4
I1013 10:37:14.779042  8309 RocksEngineConfig.cpp:299] Emplace rocksdb option max_subcompactions=4
I1013 10:37:14.779191  8309 RocksEngineConfig.cpp:299] Emplace rocksdb option max_bytes_for_level_base=268435456
I1013 10:37:14.779201  8309 RocksEngineConfig.cpp:299] Emplace rocksdb option max_write_buffer_number=4
I1013 10:37:14.779206  8309 RocksEngineConfig.cpp:299] Emplace rocksdb option write_buffer_size=67108864
I1013 10:37:14.779211  8309 RocksEngineConfig.cpp:299] Emplace rocksdb option disable_auto_compactions=false
I1013 10:37:14.779294  8309 RocksEngineConfig.cpp:299] Emplace rocksdb option block_size=8192
I1013 10:37:14.801596  8309 RocksEngine.cpp:128] open rocksdb on data/storage/nebula/0/data
I1013 10:37:14.801810  8309 AdminTaskManager.cpp:22] max concurrenct subtasks: 10
I1013 10:37:14.801939  8309 AdminTaskManager.cpp:35] exit AdminTaskManager::init()
I1013 10:37:14.803510  8584 AdminTaskManager.cpp:231] waiting for incoming task
I1013 10:37:14.808884  8588 StorageServer.cpp:286] The internal storage service start(same with admin) on "172.18.131.141":9777
I1013 10:37:14.809557  8587 StorageServer.cpp:261] The admin service start on "172.18.131.141":9778
I1013 10:37:14.859745  8586 StorageServer.cpp:232] The storage service start on "172.18.131.141":9779
I1013 10:43:18.350253  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 1 space
I1013 10:43:18.350418  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:43:18.350507  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:43:18.350522  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:43:18.350533  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:43:18.350543  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:43:18.350553  8545 MetaClient.cpp:3012] Load leader ok
I1013 10:45:48.796270  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:45:48.796380  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:45:48.796427  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:45:48.796437  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:45:48.796447  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:45:48.796456  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:45:48.796464  8545 MetaClient.cpp:3012] Load leader ok
I1013 10:46:09.043150  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:46:09.043265  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:46:09.043331  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:46:09.043344  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:46:09.043355  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:46:09.043366  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:46:09.043376  8545 MetaClient.cpp:3012] Load leader ok
I1013 10:46:19.203305  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:46:19.203404  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:46:19.203462  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:46:19.203473  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:46:19.203483  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:46:19.203507  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:46:19.203517  8545 MetaClient.cpp:3012] Load leader ok
I1013 10:52:55.598269  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:52:55.598384  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:52:55.598443  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:52:55.598453  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:52:55.598464  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:52:55.598472  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:52:55.598481  8545 MetaClient.cpp:3012] Load leader ok
I1013 10:54:06.172732  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 10:54:06.172832  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 10:54:06.172888  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 10:54:06.172899  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 10:54:06.172909  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 10:54:06.172917  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 10:54:06.172925  8545 MetaClient.cpp:3012] Load leader ok
I1013 11:24:46.024549  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 11:24:46.024688  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 11:24:46.024746  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 11:24:46.024756  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 11:24:46.024766  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 11:24:46.024775  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 11:24:46.024783  8545 MetaClient.cpp:3012] Load leader ok
I1013 13:38:28.260293  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 13:38:28.260406  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 13:38:28.260465  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 13:38:28.260475  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 13:38:28.260484  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 13:38:28.260493  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 13:38:28.260502  8545 MetaClient.cpp:3012] Load leader ok
I1013 14:54:54.448084  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 14:54:54.448446  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 14:54:54.448515  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 14:54:54.448526  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 14:54:54.448535  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 14:54:54.448545  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 14:54:54.448554  8545 MetaClient.cpp:3012] Load leader ok
I1013 14:57:16.654443  8545 MetaClient.cpp:3006] Load leader of "172.16.54.66":9779 in 4 space
I1013 14:57:16.654563  8545 MetaClient.cpp:3006] Load leader of "172.16.54.67":9779 in 4 space
I1013 14:57:16.654623  8545 MetaClient.cpp:3006] Load leader of "172.16.54.68":9779 in 4 space
I1013 14:57:16.654634  8545 MetaClient.cpp:3006] Load leader of "172.18.131.141":9779 in 0 space
I1013 14:57:16.654644  8545 MetaClient.cpp:3006] Load leader of "172.18.131.142":9779 in 0 space
I1013 14:57:16.654654  8545 MetaClient.cpp:3006] Load leader of "172.18.131.143":9779 in 0 space
I1013 14:57:16.654664  8545 MetaClient.cpp:3012] Load leader ok

meta
I have put all the meta logs since the scale-out started here,
with the many repeats of the following line removed:

I1013 13:37:21.936331 3358261 SessionManagerProcessor.cpp:131] resp list session: 1645856541618405

meta2.log (6.0 MB)

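There seem to be many errors like the one below. Can you check whether the ports are reachable, and check the service status as well?
It looks like the tasks failed because the hosts cannot reach each other.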

E1013 14:53:47.658854 2344221 BalanceTask.cpp:96] [1665629751, 39:18, 172.16.54.66:9779->172.18.131.143:9779] Open part failed, status ["172.18.131.143":9778] RPC failure in AdminClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 113 (No route to host): No route to host
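One way to check whether the ports are reachable is a plain TCP connect test run from each node toward every other node, which exercises the same kind of connection that the failing RPC needs. The sketch below is only an illustration: `can_connect` and `report` are hypothetical helpers, and the port list is taken from the logs in this thread (9559 for metad; 9777, 9778, 9779, and 9780 for storaged), so adjust it if your configuration differs.

```python
import socket
from typing import Dict, Iterable, Mapping, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "No route to host" (errno 113).
        return False


# Ports seen in the logs of this thread; an assumption, not a complete list.
PORTS = {
    "metad": 9559,
    "storaged internal": 9777,
    "storaged admin": 9778,
    "storaged": 9779,
    "raft": 9780,
}


def report(hosts: Iterable[str],
           ports: Mapping[str, int] = PORTS) -> Dict[Tuple[str, str], bool]:
    """Map every (host, service name) pair to whether its port accepted a connection."""
    return {
        (host, name): can_connect(host, port)
        for host in hosts
        for name, port in ports.items()
    }


if __name__ == "__main__":
    # Example: run this on an old node against one of the new nodes.
    for (host, name), ok in report(["172.18.131.143"]).items():
        print(f"{host} {name}: {'open' if ok else 'UNREACHABLE'}")
```

A host that answers ping can still fail this test, because ping uses ICMP while the services need their TCP ports to be open.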

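I just looked into it. The machines have two network cards, and the externally facing NIC differs between them.
Both meta IPs can be pinged now, but the error is still the same.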

Has the IP changed? A changed IP can also cause problems. For example, check whether the IPs shown by show storage hosts match the ones currently configured.

I changed the meta IPs everywhere to the ones displayed by show hosts. Previously the IPs in show hosts and the meta IPs configured on the new nodes did not match; now they do.
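A consistency check of this kind can be scripted. The sketch below assumes the meta addresses are set through the `--meta_server_addrs` flag in the service configuration file and compares them with a list of addresses you trust, for example the ones reported by show hosts. The helper names and the sample addresses are illustrative only.

```python
from typing import Iterable, List


def meta_addrs_from_conf(conf_text: str) -> List[str]:
    """Extract the comma-separated addresses from the --meta_server_addrs line."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("--meta_server_addrs="):
            value = line.split("=", 1)[1]
            return [addr.strip() for addr in value.split(",") if addr.strip()]
    return []  # flag not present in this file


def mismatched(configured: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the configured addresses that are not in the expected set, sorted."""
    return sorted(set(configured) - set(expected))


if __name__ == "__main__":
    # Sample text standing in for the contents of a real configuration file.
    conf = "--meta_server_addrs=172.16.51.66:9559,172.16.54.67:9559\n"
    expected = ["172.16.54.66:9559", "172.16.54.67:9559", "172.16.54.68:9559"]
    print(mismatched(meta_addrs_from_conf(conf), expected))
```

An empty result means every configured meta address is one the cluster itself reports.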

Found the cause: the firewall had not been turned off.
