Get neighbors failed on 3-hop queries

  • nebula version: 1.1.0
  • Deployment: distributed, 3 machines
  • Hardware
    • Disk: HDD
    • CPU and memory:
===============================================================================
| ID  | Name         | Partition number | Replica Factor | Charset | Collate  |
===============================================================================
| 542 | twitter_test | 100              | 3              | utf8    | utf8_bin |
-------------------------------------------------------------------------------

After importing 1.4 billion edges and 70 million vertices, I ran a query to count the 3-hop neighbors:

go 3 steps from hash('21250581') over link yield 1 as n | group by $-.n yield $-.n,  COUNT(*)

It fails with: [ERROR (-8)]: Get neighbors failed

The 2-hop count query fails intermittently and sometimes succeeds:

go 2 steps from hash('21250581') over link yield 1 as n | group by $-.n yield $-.n,  COUNT(*)
===================
| $-.n | COUNT(*) |
===================
| 1    | 3382561  |
-------------------

E1021 11:01:24.107187 60271 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: writev() failed (peer=10.57.36.17:44500, local=10.57.36.18:20092), type = Internal error, errno = 32 (Broken pipe): Broken pipe
E1021 11:02:24.097913 60271 StorageClient.inl:123] Request to [10.57.36.18:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E1021 11:02:24.132746 60271 StorageClient.inl:123] Request to [10.57.36.19:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E1021 11:02:24.154722 60297 ExecutionPlan.cpp:80] Execute failed: Get neighbors failed
E1021 11:14:31.907526 60272 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1021 11:14:31.907812 60300 ExecutionPlan.cpp:80] Execute failed: Get neighbors failed
E1021 11:14:33.704226 60274 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1021 11:14:33.704403 60300 ExecutionPlan.cpp:80] Execute failed: Get neighbors failed
E1021 11:14:34.735734 60275 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1021 11:14:34.735885 60297 ExecutionPlan.cpp:80] Execute failed: Get neighbors failed
E1021 11:14:35.613775 60300 ExecutionPlan.cpp:80] Execute failed: Get neighbors failed
E1021 11:14:36.417253 60279 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1021 11:14:37.771032 60300 GoExecutor.cpp:554] part: 22error code: -3
E1021 11:14:37.771070 60300 GoExecutor.cpp:554] part: 93error code: -3
E1021 11:14:37.771087 60300 GoExecutor.cpp:554] part: 81error code: -3
E1021 11:14:37.771106 60300 GoExecutor.cpp:554] part: 11error code: -3
E1021 11:14:56.984179 60276 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E1021 11:14:58.075486 60300 ExecutionPlan.cpp:80] Execute failed: Get neighbors failed
E1021 11:28:45.671701 60278 StorageClient.inl:123] Request to [10.57.36.18:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E1021 11:28:45.982828 60297 ExecutionPlan.cpp:80] Execute failed: Get neighbors failed
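
To see which storage host the failures cluster on, the `Request to [...] failed` lines in the graphd log can be tallied per host and failure kind. This is a minimal sketch, assuming the glog line format shown above; the helper names (`classify`, `tally_failures`) and the failure categories are illustrative, not part of Nebula.

```python
import re
from collections import Counter

# Matches graphd lines of the form:
#   ... StorageClient.inl:123] Request to [<ip>:<port>] failed: <exception text>
REQUEST_FAILED = re.compile(
    r"Request to \[(?P<host>[\d.]+:\d+)\] failed: (?P<detail>.*)"
)


def classify(detail: str) -> str:
    """Map the exception text to a coarse, illustrative failure kind."""
    if "Timed Out" in detail:
        return "timeout"
    if "Connection refused" in detail:
        return "connection refused"
    if "Broken pipe" in detail:
        return "broken pipe"
    if "Channel got EOF" in detail:
        return "eof"
    return "other"


def tally_failures(lines):
    """Count failed storage requests per (host, failure kind)."""
    counts = Counter()
    for line in lines:
        match = REQUEST_FAILED.search(line)
        if match:
            key = (match.group("host"), classify(match.group("detail")))
            counts[key] += 1
    return counts


# Four lines copied verbatim from the graphd log above.
SAMPLE_LINES = [
    "E1021 11:01:24.107187 60271 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: writev() failed (peer=10.57.36.17:44500, local=10.57.36.18:20092), type = Internal error, errno = 32 (Broken pipe): Broken pipe",
    "E1021 11:02:24.097913 60271 StorageClient.inl:123] Request to [10.57.36.18:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Timed Out",
    "E1021 11:14:31.907526 60272 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.",
    "E1021 11:14:36.417253 60279 StorageClient.inl:123] Request to [10.57.36.17:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused",
]

if __name__ == "__main__":
    for (host, kind), n in sorted(tally_failures(SAMPLE_LINES).items()):
        print(f"{host}  {kind}: {n}")
```

In practice the function would be fed the whole graphd ERROR log file rather than a hand-picked sample. A host that accumulates EOF and connection-refused entries, as opposed to timeouts only, is a candidate for a storaged process that restarted or went down.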

E1021 11:16:26.040318 60174 RaftPart.cpp:1075] [Port: 44501, Space: 464, Part: 56] Receive response about askForVote from [10.57.36.17:44501], error code is -5
E1021 11:16:26.093868 60167 Host.cpp:390] [Port: 44501, Space: 464, Part: 80] [Host: 10.57.36.17:44501] Failed to append logs to the host (Err: -5)
E1021 11:16:26.178503 60167 Host.cpp:390] [Port: 44501, Space: 464, Part: 80] [Host: 10.57.36.17:44501] Failed to append logs to the host (Err: -5)
E1021 11:16:26.261907 60167 Host.cpp:390] [Port: 44501, Space: 464, Part: 80] [Host: 10.57.36.17:44501] Failed to append logs to the host (Err: -5)
E1021 11:16:26.351147 60167 Host.cpp:390] [Port: 44501, Space: 464, Part: 80] [Host: 10.57.36.17:44501] Failed to append logs to the host (Err: -5)
E1021 11:16:26.433197 60153 Host.cpp:390] [Port: 44501, Space: 464, Part: 98] [Host: 10.57.36.17:44501] Failed to append logs to the host (Err: -5)
E1021 11:16:40.405119 60162 Host.cpp:390] [Port: 44501, Space: 542, Part: 48] [Host: 10.57.36.17:44501] Failed to append logs to the host (Err: -5)
E1021 11:17:12.984526 60159 Host.cpp:390] [Port: 44501, Space: 542, Part: 67] [Host: 10.57.36.17:44501] Failed to append logs to the host (Err: -5)
E1021 11:38:43.447788 60082 RaftPart.cpp:773] [Port: 44501, Space: 542, Part: 68] Replicate logs failed
E1021 11:38:44.399224 60076 RaftPart.cpp:773] [Port: 44501, Space: 233, Part: 91] Replicate logs failed
E1021 11:38:44.896698 60069 RaftPart.cpp:773] [Port: 44501, Space: 542, Part: 55] Replicate logs failed
E1021 11:38:45.066574 60069 RaftPart.cpp:773] [Port: 44501, Space: 426, Part: 48] Replicate logs failed
E1021 11:38:45.808149 60071 RaftPart.cpp:773] [Port: 44501, Space: 542, Part: 1] Replicate logs failed
E1021 11:38:45.808360 60065 RaftPart.cpp:773] [Port: 44501, Space: 567, Part: 59] Replicate logs failed

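This is usually caused by background compaction, which triggers leader changes.

The import finished at 3 a.m. Does compaction really take this long?

Hmm… HDDs are slower, but with your data volume it shouldn't take that long… Is any machine under heavy load right now?

No.

Is storage still writing logs even when no queries are running?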

W1021 13:58:48.710702 3341 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 68] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.726805 3310 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 44] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.742282 3341 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 26] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.758975 3334 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 26] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.774626 3310 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 68] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.791939 3313 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 68] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.807279 3313 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 68] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.824036 3334 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 92] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.839331 3340 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 56] Only 0 hosts succeeded, Need to try again
I1021 13:58:48.841239 3378 RaftPart.cpp:1124] [Port: 44501, Space: 464, Part: 14] Sending out an election request (space = 464, part = 14, term = 8397, lastLogId = 18833, lastLogTerm = 1982, candidateIP = 10.57.36.17, candidatePort = 44501)
I1021 13:58:48.842393 3378 RaftPart.cpp:1206] [Port: 44501, Space: 464, Part: 14] No one is elected, continue the election
W1021 13:58:48.855551 3340 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 20] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.870972 3313 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 56] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.887307 3313 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 26] Only 0 hosts succeeded, Need to try again
I1021 13:58:48.893294 3381 RaftPart.cpp:1124] [Port: 44501, Space: 464, Part: 80] Sending out an election request (space = 464, part = 80, term = 6411, lastLogId = 19026, lastLogTerm = 10, candidateIP = 10.57.36.17, candidatePort = 44501)
I1021 13:58:48.894944 3381 RaftPart.cpp:1206] [Port: 44501, Space: 464, Part: 80] No one is elected, continue the election
W1021 13:58:48.903038 3313 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 26] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.918720 3341 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 68] Only 0 hosts succeeded, Need to try again
I1021 13:58:48.929338 3380 RaftPart.cpp:1124] [Port: 44501, Space: 464, Part: 55] Sending out an election request (space = 464, part = 55, term = 7783, lastLogId = 0, lastLogTerm = 0, candidateIP = 10.57.36.17, candidatePort = 44501)
I1021 13:58:48.931006 3380 RaftPart.cpp:1206] [Port: 44501, Space: 464, Part: 55] No one is elected, continue the election
I1021 13:58:48.933115 3380 RaftPart.cpp:1124] [Port: 44501, Space: 464, Part: 43] Sending out an election request (space = 464, part = 43, term = 7756, lastLogId = 0, lastLogTerm = 0, candidateIP = 10.57.36.17, candidatePort = 44501)
W1021 13:58:48.934497 3313 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 44] Only 0 hosts succeeded, Need to try again
I1021 13:58:48.934659 3380 RaftPart.cpp:1206] [Port: 44501, Space: 464, Part: 43] No one is elected, continue the election
W1021 13:58:48.950464 3334 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 68] Only 0 hosts succeeded, Need to try again
W1021 13:58:48.966373 3313 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 62] Only 0 hosts succeeded, Need to try again

It keeps printing this log continuously.
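
One way to check which spaces are stuck in this retry and election loop is to count the Raft messages per space ID. This is a rough sketch, assuming the `[Port: ..., Space: ..., Part: ...]` prefix shown in the storaged log; the function name `raft_activity_by_space` is made up for illustration.

```python
import re
from collections import Counter

# Matches the Raft prefix used in the storaged log, followed by the message.
RAFT_LINE = re.compile(
    r"\[Port: (?P<port>\d+), Space: (?P<space>\d+), Part: (?P<part>\d+)\] (?P<msg>.*)"
)


def raft_activity_by_space(lines):
    """Return (retries, elections): counters of log lines keyed by space ID."""
    retries, elections = Counter(), Counter()
    for line in lines:
        match = RAFT_LINE.search(line)
        if not match:
            continue
        space = int(match.group("space"))
        msg = match.group("msg")
        if msg.startswith("Only 0 hosts succeeded"):
            retries[space] += 1
        elif msg.startswith("Sending out an election request"):
            elections[space] += 1
    return retries, elections


# Three lines copied verbatim from the storaged log above.
SAMPLE_LINES = [
    "W1021 13:58:48.710702 3341 RaftPart.cpp:963] [Port: 44501, Space: 464, Part: 68] Only 0 hosts succeeded, Need to try again",
    "I1021 13:58:48.841239 3378 RaftPart.cpp:1124] [Port: 44501, Space: 464, Part: 14] Sending out an election request (space = 464, part = 14, term = 8397, lastLogId = 18833, lastLogTerm = 1982, candidateIP = 10.57.36.17, candidatePort = 44501)",
    "I1021 13:58:48.842393 3378 RaftPart.cpp:1206] [Port: 44501, Space: 464, Part: 14] No one is elected, continue the election",
]

if __name__ == "__main__":
    retries, elections = raft_activity_by_space(SAMPLE_LINES)
    for space in sorted(set(retries) | set(elections)):
        print(f"space {space}: retries={retries[space]} elections={elections[space]}")
```

In the excerpt posted here, every retry and election line belongs to Space 464, while the space being queried (`twitter_test`) has ID 542, so a per-space count may help separate noise from other spaces from problems in the one under test.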

We have 3 machines. Two of them keep printing the log above; the third does not print continuously, and what it prints is the log below:
I1021 11:44:20.514497 60075 RaftPart.cpp:1566] [Port: 44501, Space: 542, Part: 32] Stale log! Local lastLogId 36356682, lastLogTerm 3, lastLogIdSent 36356681, lastLogTermSent 3
I1021 11:44:22.352598 60063 RaftPart.cpp:1566] [Port: 44501, Space: 464, Part: 27] Stale log! Local lastLogId 18930, lastLogTerm 3, lastLogIdSent 18929, lastLogTermSent 3
I1021 11:44:22.352612 60079 RaftPart.cpp:1566] [Port: 44501, Space: 277, Part: 53] Stale log! Local lastLogId 25399, lastLogTerm 4, lastLogIdSent 25398, lastLogTermSent 4
I1021 11:44:22.413326 60063 RaftPart.cpp:1566] [Port: 44501, Space: 542, Part: 80] Stale log! Local lastLogId 36915302, lastLogTerm 3, lastLogIdSent 36915301, lastLogTermSent 3
I1021 11:44:22.427775 60069 RaftPart.cpp:1566] [Port: 44501, Space: 542, Part: 18] Stale log! Local lastLogId 36123369, lastLogTerm 3, lastLogIdSent 36123368, lastLogTermSent 3
I1021 11:44:22.467690 60063 RaftPart.cpp:1566] [Port: 44501, Space: 83, Part: 1] Stale log! Local lastLogId 115106, lastLogTerm 319, lastLogIdSent 115105, lastLogTermSent 319
I1021 11:44:22.472838 60063 RaftPart.cpp:1566] [Port: 44501, Space: 92, Part: 68] Stale log! Local lastLogId 425405, lastLogTerm 5, lastLogIdSent 425404, lastLogTermSent 5
I1021 11:44:22.703178 60079 RaftPart.cpp:1566] [Port: 44501, Space: 542, Part: 73] Stale log! Local lastLogId 36649014, lastLogTerm 3, lastLogIdSent 36649013, lastLogTermSent 3
I1021 11:44:22.721462 60058 RaftPart.cpp:1566] [Port: 44501, Space: 542, Part: 34] Stale log! Local lastLogId 36699901, lastLogTerm 3, lastLogIdSent 36699900, lastLogTermSent 3
I1021 11:44:29.482468 60079 SlowOpTracker.h:33] [Port: 44501, Space: 464, Part: 86] , total time:67ms, Total send logs: 2

The machine that is not producing that log output has gone into the offline state :innocent:

Has this issue been resolved? Are there any remaining problems?