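- Nebula version: 2.5.1
- Deployment: distributed
- Installation: compiled from source
- Production environment: Y
- Hardware
    - SSD
    - 20 * CPU 16c, 64 GB memory
- Problem description:
Added 7 storage nodes and ran balance data. Within 1 minute every subtask reported success, and after that none of the data could be queried. Logging in to the newly added storage machines and checking the data and wal directories showed no data files for any partition.
- Additional information:
    - All data was imported via SST
    - The problem occurs with both a single replica and three replicas
    - Because of the capacity limit of a single disk, each storage node is configured with multiple paths: --data_path=/data/graphdb/storage/disk1,/data/graphdb/storage/disk2,/data/graphdb/storage/disk3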
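The check of the data and wal directories described above can be repeated across all configured paths with a small script. This is only a sketch: it assumes that each data path contains `nebula/<space_id>/wal/<part_id>` subdirectories, which is an assumption about the on-disk layout and should be verified against an actual node. The function name `list_part_dirs` is made up for illustration.

```python
from pathlib import Path


def list_part_dirs(data_paths, space_id):
    """Map each data path to the sorted part IDs that have a WAL directory.

    Assumed layout (not confirmed by the post):
    <data_path>/nebula/<space_id>/wal/<part_id>
    """
    result = {}
    for root in data_paths:
        wal_root = Path(root) / "nebula" / str(space_id) / "wal"
        parts = []
        if wal_root.is_dir():
            # Keep only numeric directory names, which are taken to be part IDs.
            parts = sorted(
                int(p.name)
                for p in wal_root.iterdir()
                if p.is_dir() and p.name.isdigit()
            )
        result[root] = parts
    return result


if __name__ == "__main__":
    paths = [
        "/data/graphdb/storage/disk1",
        "/data/graphdb/storage/disk2",
        "/data/graphdb/storage/disk3",
    ]
    for path, parts in list_part_dirs(paths, space_id=1).items():
        print(path, len(parts), "parts", parts[:10])
```

An empty list for every path on a new node would match the symptom reported here.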
- Meta log
I0115 01:06:52.248580 1567 Balancer.cpp:43] Start to invoke balance plan 1642179863
I0115 01:06:52.259410 854836 BalanceTask.cpp:216] [1642179863, 1:98, 11.145.4.123:22509->30.42.50.151:22509] Part has been moved successfully!
I0115 01:06:52.261832 1567 HBProcessor.cpp:35] Receive heartbeat from "11.145.4.199":22509, role = STORAGE
I0115 01:06:52.265578 854836 BalanceTask.cpp:46] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Start to move part, check the peers firstly!
I0115 01:06:52.279281 854836 BalanceTask.cpp:216] [1642179863, 1:93, 11.145.5.171:22509->30.42.50.211:22509] Part has been moved successfully!
I0115 01:06:52.281384 7197 BalanceTask.cpp:63] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Ask the src to give up the leadership.
I0115 01:06:52.285619 854836 BalanceTask.cpp:216] [1642179863, 1:93, 30.42.41.101:22509->30.42.74.36:22509] Part has been moved successfully!
I0115 01:06:52.293191 7197 BalanceTask.cpp:90] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Open the part as learner on dst.
I0115 01:06:52.298393 854836 BalanceTask.cpp:46] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Start to move part, check the peers firstly!
I0115 01:06:52.307433 7197 BalanceTask.cpp:104] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Add learner dst.
I0115 01:06:52.311659 854836 BalanceTask.cpp:216] [1642179863, 1:184, 11.145.4.123:22509->30.42.50.151:22509] Part has been moved successfully!
I0115 01:06:52.313736 7200 BalanceTask.cpp:63] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Ask the src to give up the leadership.
I0115 01:06:52.318099 7197 AdminClient.cpp:452] Return leader change from "11.145.5.72":22508, new leader is "11.145.5.96":22508, retry 0, limit 30
I0115 01:06:52.320493 7197 BalanceTask.cpp:118] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Waiting for the data catch up.
I0115 01:06:52.326460 854836 BalanceTask.cpp:216] [1642179863, 1:184, 11.145.4.151:22509->30.42.50.170:22509] Part has been moved successfully!
I0115 01:06:52.327986 7200 BalanceTask.cpp:90] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Open the part as learner on dst.
I0115 01:06:52.333217 7197 AdminClient.cpp:452] Return leader change from "11.145.5.72":22508, new leader is "11.145.5.96":22508, retry 0, limit 3
I0115 01:06:52.334256 7197 BalanceTask.cpp:132] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Send member change request to the leader, it will add the new member on dst host
I0115 01:06:52.338730 854836 BalanceTask.cpp:216] [1642179863, 1:184, 11.145.4.199:22509->30.42.74.36:22509] Part has been moved successfully!
I0115 01:06:52.340914 7200 BalanceTask.cpp:104] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Add learner dst.
I0115 01:06:52.345783 7197 AdminClient.cpp:452] Return leader change from "11.145.5.72":22508, new leader is "11.145.5.96":22508, retry 0, limit 30
I0115 01:06:52.350668 7197 BalanceTask.cpp:147] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Send member change request to the leader, it will remove the old member on src host
I0115 01:06:52.351603 854836 BalanceTask.cpp:216] [1642179863, 1:175, 11.145.4.200:22509->30.42.50.85:22509] Part has been moved successfully!
I0115 01:06:52.352754 7200 AdminClient.cpp:452] Return leader change from "30.42.54.173":22508, new leader is "30.42.74.36":22508, retry 0, limit 30
I0115 01:06:52.355341 7200 BalanceTask.cpp:118] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Waiting for the data catch up.
I0115 01:06:52.358749 7197 AdminClient.cpp:452] Return leader change from "11.145.5.72":22508, new leader is "11.145.5.96":22508, retry 0, limit 30
I0115 01:06:52.362947 7197 BalanceTask.cpp:163] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Update meta for part.
I0115 01:06:52.363605 854836 BalanceTask.cpp:216] [1642179863, 1:175, 11.145.5.121:22509->30.42.50.84:22509] Part has been moved successfully!
I0115 01:06:52.364440 7200 AdminClient.cpp:452] Return leader change from "30.42.54.173":22508, new leader is "30.42.74.36":22508, retry 0, limit 3
I0115 01:06:52.366465 7200 BalanceTask.cpp:132] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Send member change request to the leader, it will add the new member on dst host
I0115 01:06:52.370198 7197 AdminClient.cpp:197] [space:1, part:98] Update original peers "11.145.5.72":22509,"11.145.5.96":22509,"30.42.50.151":22509, remove "11.145.5.72":22509, add "30.42.50.211":22509
I0115 01:06:52.370225 7197 AdminClient.cpp:203] remove [1, 98] from "11.145.5.72":22509
I0115 01:06:52.370234 7197 AdminClient.cpp:210] add [1, 98] to "30.42.50.211":22509
I0115 01:06:52.376374 854836 BalanceTask.cpp:46] [1642179863, 1:175, 11.145.5.171:22509->30.42.50.211:22509] Start to move part, check the peers firstly!
I0115 01:06:52.377463 7200 AdminClient.cpp:452] Return leader change from "30.42.54.173":22508, new leader is "30.42.74.36":22508, retry 0, limit 30
I0115 01:06:52.379523 1567 HBProcessor.cpp:35] Receive heartbeat from "11.145.4.151":22509, role = STORAGE
I0115 01:06:52.381835 7200 BalanceTask.cpp:147] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Send member change request to the leader, it will remove the old member on src host
I0115 01:06:52.382894 7201 BalanceTask.cpp:173] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Update meta succeeded!
I0115 01:06:52.383102 7201 BalanceTask.cpp:182] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Close part on src host, srcLived.
I0115 01:06:52.389657 854836 BalanceTask.cpp:216] [1642179863, 1:181, 11.145.5.33:22509->30.42.50.170:22509] Part has been moved successfully!
I0115 01:06:52.390666 7200 AdminClient.cpp:452] Return leader change from "30.42.54.173":22508, new leader is "30.42.74.36":22508, retry 0, limit 30
I0115 01:06:52.391752 7204 BalanceTask.cpp:63] [1642179863, 1:175, 11.145.5.171:22509->30.42.50.211:22509] Ask the src to give up the leadership.
I0115 01:06:52.394785 7200 BalanceTask.cpp:163] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Update meta for part.
I0115 01:06:52.401362 7201 BalanceTask.cpp:202] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Check the peers...
I0115 01:06:52.401628 7200 AdminClient.cpp:197] [space:1, part:93] Update original peers "30.42.54.173":22509,"30.42.50.211":22509,"30.42.74.36":22509, remove "30.42.54.173":22509, add "30.42.74.26":22509
I0115 01:06:52.401660 7200 AdminClient.cpp:203] remove [1, 93] from "30.42.54.173":22509
I0115 01:06:52.401672 7200 AdminClient.cpp:210] add [1, 93] to "30.42.74.26":22509
I0115 01:06:52.401980 854836 BalanceTask.cpp:216] [1642179863, 1:181, 11.145.5.72:22509->30.42.50.151:22509] Part has been moved successfully!
I0115 01:06:52.402979 7204 BalanceTask.cpp:90] [1642179863, 1:175, 11.145.5.171:22509->30.42.50.211:22509] Open the part as learner on dst.
I0115 01:06:52.411836 7201 BalanceTask.cpp:216] [1642179863, 1:98, 11.145.5.72:22509->30.42.50.211:22509] Part has been moved successfully!
I0115 01:06:52.415293 7205 BalanceTask.cpp:173] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Update meta succeeded!
I0115 01:06:52.415303 854836 BalanceTask.cpp:216] [1642179863, 1:181, 11.145.5.96:22509->30.42.74.26:22509] Part has been moved successfully!
I0115 01:06:52.415431 7205 BalanceTask.cpp:182] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Close part on src host, srcLived.
I0115 01:06:52.417254 7204 BalanceTask.cpp:104] [1642179863, 1:175, 11.145.5.171:22509->30.42.50.211:22509] Add learner dst.
I0115 01:06:52.421459 7201 BalanceTask.cpp:46] [1642179863, 1:98, 11.145.5.96:22509->30.42.74.26:22509] Start to move part, check the peers firstly!
I0115 01:06:52.427121 854836 BalanceTask.cpp:216] [1642179863, 1:164, 11.145.5.121:22509->30.42.50.84:22509] Part has been moved successfully!
I0115 01:06:52.428076 1567 HBProcessor.cpp:35] Receive heartbeat from "11.134.210.102":13708, role = GRAPH
I0115 01:06:52.428192 7204 AdminClient.cpp:452] Return leader change from "11.145.5.171":22508, new leader is "30.42.50.84":22508, retry 0, limit 30
I0115 01:06:52.430778 7204 BalanceTask.cpp:118] [1642179863, 1:175, 11.145.5.171:22509->30.42.50.211:22509] Waiting for the data catch up.
I0115 01:06:52.433650 7205 BalanceTask.cpp:202] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Check the peers...
I0115 01:06:52.435528 7201 BalanceTask.cpp:63] [1642179863, 1:98, 11.145.5.96:22509->30.42.74.26:22509] Ask the src to give up the leadership.
I0115 01:06:52.439270 854836 BalanceTask.cpp:216] [1642179863, 1:164, 11.145.5.171:22509->30.42.50.211:22509] Part has been moved successfully!
I0115 01:06:52.440263 7204 AdminClient.cpp:452] Return leader change from "11.145.5.171":22508, new leader is "30.42.50.84":22508, retry 0, limit 3
I0115 01:06:52.442173 7204 BalanceTask.cpp:132] [1642179863, 1:175, 11.145.5.171:22509->30.42.50.211:22509] Send member change request to the leader, it will add the new member on dst host
I0115 01:06:52.448174 7205 BalanceTask.cpp:216] [1642179863, 1:93, 30.42.54.173:22509->30.42.74.26:22509] Part has been moved successfully!
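To see how quickly each part move finished, the meta log above can be summarized per task. The sketch below pairs each "Start to move part" line with the matching "Part has been moved successfully!" line and reports the elapsed time. The regular expression is written against the line format visible in this excerpt only, and `task_durations` is a hypothetical helper name. It compares times of day, so it does not handle logs that cross midnight.

```python
import re
from datetime import datetime

# Matches BalanceTask.cpp lines such as:
# I0115 01:06:52.265578 854836 BalanceTask.cpp:46] [plan, space:part, src->dst] message
LINE_RE = re.compile(
    r"^I\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ BalanceTask\.cpp:\d+\] "
    r"\[(\d+), (\d+):(\d+), (\S+)->(\S+)\] (.*)$"
)


def task_durations(lines):
    """Return {(space, part, src, dst): elapsed milliseconds} for finished tasks."""
    started = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        key = (int(m.group(3)), int(m.group(4)), m.group(5), m.group(6))
        msg = m.group(7)
        if msg.startswith("Start to move part"):
            started[key] = ts
        elif msg.startswith("Part has been moved successfully") and key in started:
            durations[key] = (ts - started.pop(key)).total_seconds() * 1000.0
    return durations


if __name__ == "__main__":
    import sys

    for key, ms in sorted(task_durations(sys.stdin).items()):
        print(key, f"{ms:.3f} ms")
```

In the excerpt, the move of part 1:98 from 11.145.5.72 to 30.42.50.211 starts at 01:06:52.265578 and is reported successful at 01:06:52.411836, about 146 ms later, with the "Waiting for the data catch up" step included in that interval.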
- Storage log (newly added node)
I0115 01:07:01.143492 1218 RaftPart.cpp:494] [Port: 22510, Space: 1, Part: 141] The host "30.42.50.170":22510 has been existed as learner, promote it!
I0115 01:07:01.155961 1218 Part.cpp:426] [Port: 22510, Space: 1, Part: 141] preprocess remove peer "30.42.54.173":22510
I0115 01:07:01.155979 1218 Host.h:32] [Port: 22510, Space: 1, Part: 141] [Host: 30.42.54.173:22510] The host has been destroyed!
I0115 01:07:01.155987 1218 RaftPart.cpp:524] [Port: 22510, Space: 1, Part: 141] Remove peer "30.42.54.173":22510
I0115 01:07:01.196606 1218 AdminProcessor.h:329] Check peers for space 1, part 141
I0115 01:07:01.196630 1218 RaftPart.cpp:2064] [Port: 22510, Space: 1, Part: 141] Check host "30.42.50.211":22510
I0115 01:07:01.196637 1218 RaftPart.cpp:2064] [Port: 22510, Space: 1, Part: 141] Check host "30.42.50.170":22510
I0115 01:07:01.196645 1218 RaftPart.cpp:2072] [Port: 22510, Space: 1, Part: 141] Add peer "30.42.50.211":22510 if not exist!
I0115 01:07:01.196651 1218 RaftPart.cpp:499] [Port: 22510, Space: 1, Part: 141] The host "30.42.50.211":22510 has been existed as follower!
I0115 01:07:01.196658 1218 RaftPart.cpp:2072] [Port: 22510, Space: 1, Part: 141] Add peer "30.42.50.151":22510 if not exist!
I0115 01:07:01.196666 1218 RaftPart.cpp:481] [Port: 22510, Space: 1, Part: 141] I am already in the raft group!
I0115 01:07:01.196671 1218 RaftPart.cpp:2072] [Port: 22510, Space: 1, Part: 141] Add peer "30.42.50.170":22510 if not exist!
I0115 01:07:01.196677 1218 RaftPart.cpp:499] [Port: 22510, Space: 1, Part: 141] The host "30.42.50.170":22510 has been existed as follower!
I0115 01:07:01.596065 1214 SlowOpTracker.h:33] [Port: 22510, Space: 1, Part: 14] total time:59ms, Total send logs: 2
I0115 01:07:02.235462 1214 Part.cpp:390] [Port: 22510, Space: 1, Part: 144] Skip stale add learner "11.145.5.96":22510, the part is opened at 1642180020696, but the log timestamp is 1642179856700
I0115 01:07:02.235505 1214 Part.cpp:416] [Port: 22510, Space: 1, Part: 144] Skip stale add peer "11.145.5.96":22510, the part is opened at 1642180020696, but the log timestamp is 1642179856726
I0115 01:07:02.235517 1214 Part.cpp:429] [Port: 22510, Space: 1, Part: 144] Skip stale remove peer "11.145.5.14":22510, the part is opened at 1642180020696, but the log timestamp is 1642179856844
I0115 01:07:02.235524 1214 Part.cpp:403] [Port: 22510, Space: 1, Part: 144] Skip stale transfer leader "11.145.5.72":22510, the part is opened at 1642180020696, but the log timestamp is 1642179857048
I0115 01:07:02.235534 1214 Part.cpp:307] [Port: 22510, Space: 1, Part: 144] Skip commit stale remove peer "11.145.5.14":22510, the part is opened at 1642180020696, but the log timestamp is 1642179856844
I0115 01:07:02.235541 1214 Part.cpp:295] [Port: 22510, Space: 1, Part: 144] Skip commit stale transfer leader "11.145.5.72":22510, the part is opened at 1642180020696, but the log timestamp is 1642179857048
I0115 01:07:02.237572 1214 Part.cpp:387] [Port: 22510, Space: 1, Part: 144] preprocess add learner "30.42.50.170":22510
I0115 01:07:02.237583 1214 RaftPart.cpp:394] [Port: 22510, Space: 1, Part: 144] The host "30.42.50.170":22510 has been existed as group member
I0115 01:07:02.237592 1214 Part.cpp:413] [Port: 22510, Space: 1, Part: 144] preprocess add peer "30.42.50.170":22510
I0115 01:07:02.237601 1214 RaftPart.cpp:499] [Port: 22510, Space: 1, Part: 144] The host "30.42.50.170":22510 has been existed as follower!
I0115 01:07:02.237614 1214 Part.cpp:426] [Port: 22510, Space: 1, Part: 144] preprocess remove peer "11.145.5.33":22510
I0115 01:07:02.237622 1214 RaftPart.cpp:515] [Port: 22510, Space: 1, Part: 144] The peer "11.145.5.33":22510 not exist!
I0115 01:07:02.237632 1214 Part.cpp:400] [Port: 22510, Space: 1, Part: 144] preprocess trans leader "11.145.5.96":22510
I0115 01:07:02.237639 1214 RaftPart.cpp:401] [Port: 22510, Space: 1, Part: 144] Pre process transfer leader to "11.145.5.96":22510
I0115 01:07:02.237645 1214 RaftPart.cpp:405] [Port: 22510, Space: 1, Part: 144] I am follower, just wait for the new leader.
I0115 01:07:02.237654 1214 RaftPart.cpp:595] [Port: 22510, Space: 1, Part: 144] I am Follower, skip remove peer in commit
I0115 01:07:02.237660 1214 RaftPart.cpp:429] [Port: 22510, Space: 1, Part: 144] Commit transfer leader to "11.145.5.96":22510
I0115 01:07:02.237668 1214 RaftPart.cpp:449] [Port: 22510, Space: 1, Part: 144] I am Follower, just wait for the new leader!
I0115 01:07:02.239614 1214 Part.cpp:387] [Port: 22510, Space: 1, Part: 144] preprocess add learner "30.42.50.151":22510
I0115 01:07:02.239627 1214 RaftPart.cpp:384] [Port: 22510, Space: 1, Part: 144] I am learner!
I0115 01:07:02.241616 1214 Part.cpp:413] [Port: 22510, Space: 1, Part: 144] preprocess add peer "30.42.50.151":22510
I0115 01:07:02.241628 1214 RaftPart.cpp:481] [Port: 22510, Space: 1, Part: 144] I am already in the raft group!
I0115 01:07:02.241636 1214 Part.cpp:426] [Port: 22510, Space: 1, Part: 144] preprocess remove peer "11.145.5.72":22510
I0115 01:07:02.241642 1214 RaftPart.cpp:515] [Port: 22510, Space: 1, Part: 144] The peer "11.145.5.72":22510 not exist!
I0115 01:07:04.225697 1230 Host.h:32] [Port: 22510, Space: 1, Part: 50] [Host: 11.145.5.96:22510] The host has been destroyed!
I0115 01:07:05.423733 1214 RaftPart.cpp:595] [Port: 22510, Space: 1, Part: 141] I am Follower, skip remove peer in commit
I0115 01:07:05.576086 1214 RaftPart.cpp:595] [Port: 22510, Space: 1, Part: 144] I am Follower, skip remove peer in commit
I0115 01:07:07.365478 1218 SlowOpTracker.h:33] [Port: 22510, Space: 1, Part: 40] total time:61ms, Total send logs: 2
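The "Skip stale ..." lines in the storage log compare the time the part was opened with the timestamp carried by the log entry. The sketch below extracts both values and prints the gap between them. It assumes both numbers are epoch timestamps in milliseconds (they have 13 digits), and `stale_gaps` is a hypothetical helper name. The pattern is derived from the lines in this excerpt only.

```python
import re

# Matches both "Skip stale <op>" and "Skip commit stale <op>" lines.
STALE_RE = re.compile(
    r'Part: (\d+)\] Skip (?:commit )?stale (.+?) "([\d.]+)":(\d+), '
    r'the part is opened at (\d+), but the log timestamp is (\d+)'
)


def stale_gaps(lines):
    """Return (part, operation, peer, opened_at - log_timestamp) per stale line."""
    out = []
    for line in lines:
        m = STALE_RE.search(line)
        if m:
            part, op, host, port, opened, logged = m.groups()
            out.append((int(part), op, f"{host}:{port}", int(opened) - int(logged)))
    return out


if __name__ == "__main__":
    import sys

    for part, op, peer, gap_ms in stale_gaps(sys.stdin):
        # gap_ms is in milliseconds under the assumption stated above.
        print(f"part {part}: {op} {peer} skipped, gap {gap_ms / 1000:.3f} s")
```

For the first such line above (part 144, add learner "11.145.5.96"), the gap is 1642180020696 - 1642179856700 = 163996, so the part on the new node was opened roughly 164 seconds after that membership-change entry was written, if the millisecond assumption holds.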