- nebula version: 2.5.1
- Deployment: distributed, 5-machine cluster; metad and graphd are deployed on 3 of the machines, and storaged is deployed on all 5 servers
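- Installation method: RPM
- Production environment: Y
- Hardware info
- Mechanical disks (HDD)
- 32-core CPU, 256G
- Data volume: roughly four or five vertex types with tens of millions of records; 40+ edge types with 200 to 300 million records
- After running for a while, the system becomes unstable: neither the API nor Studio can connect to the graphd service, and data can be neither written nor read
- When the problem occurs, the backend logs are flooded with errors like these: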
【storaged.ERROR】
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0607 09:15:52.634805 13196 MetaClient.cpp:635] Send request to "x.x.x.x5":9559, exceed retry limit
E0607 09:15:52.635146 13154 MetaClient.cpp:65] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0607 09:16:05.642508 13199 MetaClient.cpp:635] Send request to "x.x.x.x4":9559, exceed retry limit
E0607 09:16:05.642588 13154 MetaClient.cpp:65] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0607 09:16:41.334895 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 10] Receive response about askForVote from "x.x.x.x2":9780, error code is E_TERM_OUT_OF_DATE
E0607 09:16:41.334940 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 10] Receive response about askForVote from "x.x.x.x3":9780, error code is E_TERM_OUT_OF_DATE
E0607 09:16:41.335492 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 15] Receive response about askForVote from "x.x.x.x2":9780, error code is E_TERM_OUT_OF_DATE
E0607 09:16:41.335503 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 15] Receive response about askForVote from "x.x.x.x3":9780, error code is E_TERM_OUT_OF_DATE
E0607 09:16:41.336407 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 2] Receive response about askForVote from "x.x.x.x6":9780, error code is E_TERM_OUT_OF_DATE
E0607 09:16:41.336426 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 2] Receive response about askForVote from "x.x.x.x5":9780, error code is E_TERM_OUT_OF_DATE
E0607 09:16:41.340544 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 52] Receive response about askForVote from "x.x.x.x6":9780, error code is E_TERM_OUT_OF_DATE
E0607 09:16:41.340564 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 52] Receive response about askForVote from "x.x.x.x5":9780, error code is E_TERM_OUT_OF_DATE
E0607 09:16:41.340998 13454 RaftPart.cpp:1118] [Port: 9780, Space: 3, Part: 40] Receive response about askForVote from "x.x.x.x2":9780, error code is E_TERM_OUT_OF_DATE
E0
【metad.ERROR】
E0607 09:16:04.458173 13091 RaftPart.cpp:1118] [Port: 9560, Space: 0, Part: 0] Receive response about askForVote from "x.x.x.x":9560, error code is E_UNKNOWN_PART
E0607 09:16:06.166172 13092 RaftPart.cpp:1118] [Port: 9560, Space: 0, Part: 0] Receive response about askForVote from "x.x.x.x":9560, error code is E_UNKNOWN_PART
E0607 09:16:15.475127 13354 ActiveHostsMan.cpp:256] Get last update time failed, error: E_LEADER_CHANGED
E0607 09:16:18.703186 13354 ActiveHostsMan.cpp:256] Get last update time failed, error: E_LEADER_CHANGED
E0607 09:16:18.703187 13353 ActiveHostsMan.cpp:256] Get last update time failed, error: E_LEADER_CHANGED
E0607 09:16:18.703197 13350 ActiveHostsMan.cpp:256] Get last update time failed, error: E_LEADER_CHANGED
【graphd.ERROR】
E0607 09:19:40.356612 13534 MetaClient.cpp:635] Send request to "53.80.6.95":9559, exceed retry limit
E0607 09:19:40.356714 13439 MetaClient.cpp:131] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0607 09:19:40.821357 13531 GraphSessionManager.cpp:108] Create session failed:LeaderChanged: Leader changed!
E0607 09:19:40.821442 13531 GraphService.cpp:89] Create session for userName: root, ip: ::ffff:x.x.x.x failed: Create session failed: LeaderChanged: Leader changed!
E0607 09:19:55.825130 13531 GraphSessionManager.cpp:108] Create session failed:LeaderChanged: Leader changed!
E0607 09:19:55.825212 13531 GraphService.cpp:89] Create session for userName: root, ip: ::ffff:x.x.x.x failed: Create session failed: LeaderChanged: Leader changed!
E0607 09:20:10.826726 13537 MetaClient.cpp:635] Send request to "53.80.6.95":9559, exceed retry limit
E0607 09:20:10.826838 13531 GraphSessionManager.cpp:108] Create session failed:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0607 09:20:10.826887 13531 GraphService.cpp:89] Create session for userName: root, ip: ::ffff:x.x.x.x failed: Create session failed: RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0607 09:20:10.827867 13542 MetaClient.cpp:635] Send request to "53.80.6.95":9559, exceed retry limit
E0607 09:20:10.827927 13531 GraphSessionManager.cpp:108] Create session failed:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0607 09:20:10.827952 13531 GraphService.cpp:89] Create session for userName: root, ip: ::ffff:x.x.x.x failed: Create session failed: RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0607 09:20:18.313988 13531 GraphSessionManager.cpp:108] Create session failed:LeaderChanged: Leader changed!
E0607 09:20:18.314088 13531 GraphService.cpp:89] Create session for userName: root, ip: ::ffff:x.x.x.x failed: Create session failed: LeaderChanged: Leader changed!
There are too many logs to paste them all here. In addition, many entries like the following also show up during normal use:
E0607 04:49:25.925011 2854 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type -58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:49:26.935583 2858 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type -58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:49:26.935657 2832 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type 58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:49:26.935684 2832 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type -58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:49:27.949560 2855 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type -58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:49:27.949683 2862 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type -58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:49:28.959389 2862 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type -58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:50:12.937350 2850 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type 58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:50:12.937407 2850 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type 58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:50:12.937497 2845 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type 58, rank 0, dst RY-001-YYYYYYYYYY
E0607 04:50:12.937737 2850 AddEdgesProcessor.cpp:164] edge locked : src RY-001-xxxxxxxxx, type -58, rank 0, dst RY-001-YYYYYYYYYY
I don't know whether these entries are related to the problem above.
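
One way to check whether the two groups of errors are related is to count ERROR lines per minute and per source file across the storaged, metad, and graphd logs, then see whether the `edge locked` bursts line up with the heartbeat and leader-change errors. Below is a minimal sketch, assuming only the glog line format quoted above (`[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`). The function name `tally_errors` and the sample lines are illustrative and not part of any nebula tooling.

```python
import re
from collections import Counter

# glog prefix: level letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line] msg
GLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<hh>\d{2}):(?P<mi>\d{2}):\d{2}\.\d{6}\s+"
    r"(?P<tid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def tally_errors(lines):
    """Count E-level log lines per (mmdd hh:mm, source file)."""
    counts = Counter()
    for raw in lines:
        m = GLOG_RE.match(raw.strip())
        if m is None or m.group("level") != "E":
            continue  # skip headers, truncated lines, and non-error levels
        minute = f'{m.group("mmdd")} {m.group("hh")}:{m.group("mi")}'
        counts[(minute, m.group("file"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the lines from the *.ERROR files.
    sample = [
        'E0607 09:15:52.634805 13196 MetaClient.cpp:635] Send request failed',
        'E0607 09:15:52.635146 13154 MetaClient.cpp:65] Heartbeat failed',
        'E0607 04:49:25.925011 2854 AddEdgesProcessor.cpp:164] edge locked',
    ]
    for (minute, src), n in sorted(tally_errors(sample).items()):
        print(minute, src, n)
```

Running this over each service's ERROR log and comparing the per-minute counts side by side would show whether the write-side `edge locked` errors cluster around the same times as the metad leader changes, or whether they occur independently.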