nebula storaged fails to start

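Increase meta_client_timeout_ms on both graph and storage. The default is 60000 (60 s). Meta appears to be receiving heartbeats normally, but you probably have so many sessions that even fetching the sessions times out.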

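OK, I'll try that first. I haven't yet pinned down where sessions are still in use; the business side should all be stopped.
One more question: in the client, what is the difference between a connection in the connection pool and the session you get from connection.get_session? If we init a connection pool with 100 connections, immediately call connection.get_session on all of them, and keep the resulting sessions in a list that we maintain and reuse ourselves, are there any drawbacks? For example, would the maximum and minimum connection counts stop taking effect?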

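The meta service is now up on all three meta nodes, but storaged on the other three nodes still won't start.
Also, meta on node 1 keeps writing the log below. Why is that? Is some request trying to create sessions?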

[root@nebula-server-prod-01 logs]# tail nebula-metad.nebula-server-prod-01.root.log.INFO.20220221-114052.18987
I0221 11:42:30.394129 19197 SessionManagerProcessor.cpp:131] resp list session: 1645090444108735
I0221 11:42:30.394148 19197 SessionManagerProcessor.cpp:131] resp list session: 1645099067728831
I0221 11:42:30.394141 19191 SessionManagerProcessor.cpp:131] resp list session: 1645097382055360
I0221 11:42:30.394160 19191 SessionManagerProcessor.cpp:131] resp list session: 1641976551574976
I0221 11:42:30.394153 19197 SessionManagerProcessor.cpp:131] resp list session: 1645121197335487
I0221 11:42:30.394172 19197 SessionManagerProcessor.cpp:131] resp list session: 1645103717376959
I0221 11:42:30.394165 19191 SessionManagerProcessor.cpp:131] resp list session: 1645151740366272
I0221 11:42:30.394184 19191 SessionManagerProcessor.cpp:131] resp list session: 1645166907100608
I0221 11:42:30.394177 19197 SessionManagerProcessor.cpp:131] resp list session: 1645082813031359
I0221 11:42:30.394196 19197 SessionManagerProcessor.cpp:131] resp list session: 1645128850629567

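Because I'm not sure where requests are still using sessions, I want to change the account password. But after startup the console connection just hangs, possibly because there are too many sessions. How can I change the account password in this situation?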

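I've now found that this does not seem to be related to graph, because it happens without the graph service running.
I start the three meta nodes one by one and watch the meta logs.
Then, as soon as I start one storage node, the resp messages above are printed continuously.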

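If graph is not started, could you wait a while for all sessions to time out and expire automatically? The timeout parameter is in nebula-graphd.conf; the default is roughly 8h.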

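That log just lists the sessions that are in use. Right now, whether you start graph or storage, you will see logs like this. Have you tried changing the parameter?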
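To get a rough idea of how many sessions meta is listing, the `resp list session` lines in the metad INFO log can be tallied. The following is a minimal sketch, not part of Nebula's tooling: the function names are hypothetical, and the decoding step assumes that a session ID is its creation time as a Unix timestamp in microseconds. That fits the magnitude of the IDs in the log above but is not confirmed anywhere in this thread.

```python
import re
from datetime import datetime, timezone

# Matches the metad log lines shown above, e.g.
# "... SessionManagerProcessor.cpp:131] resp list session: 1645090444108735"
SESSION_LINE = re.compile(r"resp list session: (\d+)")


def distinct_session_ids(log_lines):
    """Return the set of session IDs mentioned in metad log lines."""
    ids = set()
    for line in log_lines:
        match = SESSION_LINE.search(line)
        if match:
            ids.add(int(match.group(1)))
    return ids


def assumed_creation_time(session_id):
    """Read a session ID as microseconds since the Unix epoch (an assumption).

    The result is truncated to whole seconds and expressed in UTC.
    """
    return datetime.fromtimestamp(session_id // 1_000_000, tz=timezone.utc)
```

Passing an open log file to `distinct_session_ids` and taking `len(...)` of the result gives the number of distinct sessions seen in that file. If the timestamp assumption holds, the smallest IDs indicate how long sessions have been accumulating.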

I haven't had a chance to change it yet. Is it --storage_client_timeout_ms=60000?

# Seconds before the idle connections are closed, 0 for never closed
--client_idle_timeout_secs=0
# Seconds before the idle sessions are expired, 0 for no expiration
--session_idle_timeout_secs=0
# The number of threads to accept incoming connections
--num_accept_threads=1
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=0
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=0
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19669
# HTTP2 service port
--ws_h2_port=19670
# storage client timeout
--storage_client_timeout_ms=60000
# Port to listen on Meta with HTTP protocol, it corresponds to ws_http_port in metad's configuration file

No, it is a parameter called

meta_client_timeout_ms

If it is not there, you can add an extra line to the storaged and graphd conf files, for example

--meta_client_timeout_ms=600000

and then restart storaged and graphd.
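The edit described above can be sketched as a small helper that sets a flag in the text of a gflags-style conf file, replacing an existing `--name=value` line or appending one when the flag is absent. `set_flag` is a hypothetical helper written for illustration, not a Nebula tool; the value 600000 is the one suggested above.

```python
def set_flag(conf_text, name, value):
    """Set --name=value in gflags-style conf text, appending it if absent."""
    prefix = f"--{name}="
    new_line = f"{prefix}{value}"
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Only active flag lines are rewritten; '#' comment lines never match.
        if line.strip().startswith(prefix):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

The same call would be applied to the contents of both the storaged and the graphd conf file, for example `set_flag(text, "meta_client_timeout_ms", 600000)`, before writing the text back and restarting the two services.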

I had already added this earlier and retried.

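Here is an update on the current state:

  1. The services have started successfully.
  2. One meta keeps printing the messages below. I'd like to confirm: this keeps appearing even when the graph service is not started,
    so what is it related to? Can we conclude from this that it has nothing to do with request traffic from the business side?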
:42:30.394148 19197 SessionManagerProcessor.cpp:131] resp list session: 1645099067728831
I0221 11:42:30.394141 19191 SessionManagerProcessor.cpp:131] resp list session: 1645097382055360
I0221 11:42:30.394160 19191 SessionManagerProcessor.cpp:131] resp list session: 1641976551574976
I0221 11:42:30.394153 19197 SessionManagerProcessor.cpp:131] resp list session: 1645121197335487
I0221 11:42:30.394172 19197 SessionManagerProcessor.cpp:131] resp list session: 1645103717376959
I0221 11:42:30.394165 19191 SessionManagerProcessor.cpp:131] resp list session: 1645151740366272
I0221 11:42:30.394184 19191 SessionManagerProcessor.cpp:131] resp list session: 1645166907100608
  3. show hosts sometimes reports some hosts as OFFLINE and sometimes as ONLINE, and it is very slow
(root@nebula) [(none)]> show hosts
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
| Host         | Port | Status    | Leader count | Leader distribution                                   | Partition distribution                                                |
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
| "10.0.7.204" | 9779 | "OFFLINE" | 0            | "No valid partition"                                  | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
| "10.0.7.205" | 9779 | "ONLINE"  | 10           | "ContactsRadarSplit:4, Relation:3, SupplyChainProd:3" | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
| "10.0.7.206" | 9779 | "OFFLINE" | 0            | "No valid partition"                                  | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
| "10.0.7.207" | 9779 | "ONLINE"  | 0            | "No valid partition"                                  | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
| "10.0.7.208" | 9779 | "OFFLINE" | 0            | "No valid partition"                                  | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
| "10.0.7.209" | 9779 | "ONLINE"  | 0            | "No valid partition"                                  | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
| "Total"      |      |           | 10           | "ContactsRadarSplit:4, Relation:3, SupplyChainProd:3" | "ContactsRadarSplit:90, Relation:90, RiskInfo:90, SupplyChainProd:90" |
+--------------+------+-----------+--------------+-------------------------------------------------------+-----------------------------------------------------------------------+
Got 7 rows (time spent 2124014/2130563 us)

Mon, 21 Feb 2022 18:56:36 CST

(root@nebula) [(none)]> show hosts
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
| Host         | Port | Status    | Leader count | Leader distribution  | Partition distribution                                                |
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
| "10.0.7.204" | 9779 | "OFFLINE" | 0            | "No valid partition" | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
| "10.0.7.205" | 9779 | "OFFLINE" | 0            | "No valid partition" | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
| "10.0.7.206" | 9779 | "OFFLINE" | 0            | "No valid partition" | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
| "10.0.7.207" | 9779 | "OFFLINE" | 0            | "No valid partition" | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
| "10.0.7.208" | 9779 | "OFFLINE" | 0            | "No valid partition" | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
| "10.0.7.209" | 9779 | "ONLINE"  | 0            | "No valid partition" | "ContactsRadarSplit:15, Relation:15, RiskInfo:15, SupplyChainProd:15" |
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
| "Total"      |      |           | 0            |                      | "ContactsRadarSplit:90, Relation:90, RiskInfo:90, SupplyChainProd:90" |
+--------------+------+-----------+--------------+----------------------+-----------------------------------------------------------------------+
Got 7 rows (time spent 48795258/48781185 us)

Mon, 21 Feb 2022 18:58:00 CST

The graphd error log is as follows:

E0220 09:44:28.155577  2916 MetaClient.cpp:203] Load sessions Failed
*** Aborted at 1645321806 (Unix time, try 'date -d @1645321806') ***
*** Signal 15 (SIGTERM) (0xec2) received by PID 2916 (pthread TID 0x7f1c912f4900) (linux TID 2916) (maybe from PID 3778, UID 0) (code: 0), stack trace: ***
/mnt/nebula/bin/nebula-graphd(_ZN5folly10symbolizer17getStackTraceSafeEPmm+0x31)[0x1dcbb81]
/mnt/nebula/bin/nebula-graphd(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x1b)[0x1dc39eb]
/mnt/nebula/bin/nebula-graphd[0x1dc1d37]
/lib64/libpthread.so.0(+0xf5ef)[0x7f1c907b25ef]
/lib64/libc.so.6(syscall+0x19)[0x7f1c904cdba9]
/mnt/nebula/bin/nebula-graphd(_ZN5folly6detail13futexWaitImplEPKSt6atomicIjEjPKNSt6chrono10time_pointINS5_3_V212system_clockENS5_8durationIlSt5ratioILl1ELl1000000000EEEEEEPKNS6_INS7_12steady_clockESC_EEj+0x79)[0x1cf19a9]
/mnt/nebula/bin/nebula-graphd(_ZN5folly6fibers5Baton10waitThreadEv+0x95)[0x1d246b5]
/mnt/nebula/bin/nebula-graphd(_ZN5folly6fibers5Baton4waitEv+0xc7)[0x1d24927]
/mnt/nebula/bin/nebula-graphd(_ZN5folly7futures6detail8waitImplINS_10SemiFutureIN6nebula8StatusOrINS4_4meta4cpp216ListSessionsRespEEEEES9_EEvRT_+0x108)[0xefc6e8]
/mnt/nebula/bin/nebula-graphd(_ZNR5folly10SemiFutureIN6nebula8StatusOrINS1_4meta4cpp216ListSessionsRespEEEE4waitEv+0x307)[0xefca77]
/mnt/nebula/bin/nebula-graphd(_ZNO5folly6FutureIN6nebula8StatusOrINS1_4meta4cpp216ListSessionsRespEEEE3getEv+0x1f)[0xefd11f]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula4meta10MetaClient12loadSessionsEv+0x34)[0x12c0db4]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula4meta10MetaClient8loadDataEv+0x1c7)[0x12c1767]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula4meta10MetaClient12isMetadReadyEv+0x227)[0x12c3747]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula4meta10MetaClient17waitForMetadReadyEii+0x102)[0x12c3912]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula5graph12GraphService4initESt10shared_ptrIN5folly20IOThreadPoolExecutorEERKNS_8HostAddrE+0x3a6)[0xee05c6]
/mnt/nebula/bin/nebula-graphd(main+0x950)[0xe8d150]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x7f1c903f7504]
/mnt/nebula/bin/nebula-graphd[0xeae86d]
(safe mode, symbolizer not available)
E0221 11:08:00.808331 23529 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 11:08:00.808565 23529 MetaClient.cpp:203] Load sessions Failed
E0221 11:09:20.454676 23325 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0221 11:09:20.454766 23183 GraphService.cpp:50] Failed to initialize session manager: Load sessions from meta failed.
E0221 11:09:20.454803 23183 GraphDaemon.cpp:158] Failed to initialize session manager: Load sessions from meta failed.
E0221 11:12:37.361236 23529 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 11:12:37.361280 23529 MetaClient.cpp:203] Load sessions Failed
E0221 11:24:35.743541 24812 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0221 11:24:35.743780 24589 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:24:35.743818 24589 MetaClient.cpp:203] Load sessions Failed
E0221 11:37:47.618883 25231 MetaClient.cpp:636] Send request to "10.0.7.204":9559, exceed retry limit
E0221 11:37:47.618964 25585 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:37:47.619001 25585 MetaClient.cpp:203] Load sessions Failed
E0221 11:37:52.643747 25340 MetaClient.cpp:636] Send request to "10.0.7.204":9559, exceed retry limit
E0221 11:37:52.643801 25340 GraphSessionManager.cpp:205] Update sessions failed: RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:37:52.644075 25586 GraphSessionManager.cpp:229] Update sessions failed: Update sessions failed: RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:44:47.422334 25343 MetaClient.cpp:636] Send request to "10.0.7.204":9559, exceed retry limit
E0221 11:44:47.422387 25343 GraphSessionManager.cpp:205] Update sessions failed: RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:44:47.422432 25586 GraphSessionManager.cpp:229] Update sessions failed: Update sessions failed: RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:47:50.203390 24795 MetaClient.cpp:636] Send request to "10.0.7.204":9559, exceed retry limit
E0221 11:47:50.203470 25585 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:47:50.203493 25585 MetaClient.cpp:203] Load sessions Failed
E0221 11:50:02.486596 24810 GraphSessionManager.cpp:205] Update sessions failed: LeaderChanged: Leader changed!
E0221 11:50:02.486680 25586 GraphSessionManager.cpp:229] Update sessions failed: Update sessions failed: LeaderChanged: Leader changed!
E0221 11:53:28.418248 25585 MetaClient.cpp:515] List fulltext indexes failed, status:LeaderChanged: Leader changed!
E0221 11:53:28.418294 25585 MetaClient.cpp:198] Load fulltext indexes Failed
E0221 11:53:37.993968 25343 GraphSessionManager.cpp:205] Update sessions failed: LeaderChanged: Leader changed!
E0221 11:53:37.994057 25586 GraphSessionManager.cpp:229] Update sessions failed: Update sessions failed: LeaderChanged: Leader changed!
E0221 11:55:58.859166 25344 MetaClient.cpp:636] Send request to "10.0.7.204":9559, exceed retry limit
E0221 11:55:58.859262 25647 GraphSessionManager.cpp:102] Create session failed:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:55:58.859316 25647 GraphService.cpp:86] Create session for userName: root, ip: 172.16.188.142 failed: Create session failed: RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 11:56:36.810287 24812 MetaClient.cpp:636] Send request to "10.0.7.204":9559, exceed retry limit
E0221 11:56:36.810374 25585 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift21TApplicationExceptionE: server shutting down
E0221 11:56:36.810415 25585 MetaClient.cpp:203] Load sessions Failed
E0221 13:12:12.334329 31844 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 13:12:12.334525 31844 MetaClient.cpp:203] Load sessions Failed
E0221 13:17:25.371588 31856 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0221 13:17:25.371677 31844 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 13:17:25.371708 31844 MetaClient.cpp:203] Load sessions Failed
E0221 13:20:59.390576 31844 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 13:20:59.390619 31844 MetaClient.cpp:203] Load sessions Failed
E0221 13:27:01.770040  1104 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 13:27:01.770092  1104 MetaClient.cpp:203] Load sessions Failed
E0221 13:30:34.809782  1104 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 13:30:34.809825  1104 MetaClient.cpp:203] Load sessions Failed
E0221 13:34:07.840799  1104 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 13:34:07.840842  1104 MetaClient.cpp:203] Load sessions Failed
E0221 13:35:22.413681 31967 GraphSessionManager.cpp:205] Update sessions failed: LeaderChanged: Leader changed!
E0221 13:35:22.413771  1105 GraphSessionManager.cpp:229] Update sessions failed: Update sessions failed: LeaderChanged: Leader changed!
E0221 13:49:51.293385  3896 MetaClient.cpp:636] Send request to "10.0.7.204":9559, exceed retry limit
E0221 13:49:51.293632  3815 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 13:49:51.293670  3815 MetaClient.cpp:203] Load sessions Failed
*** Aborted at 1645422646 (Unix time, try 'date -d @1645422646') ***
*** Signal 15 (SIGTERM) (0x1258) received by PID 3815 (pthread TID 0x7f5dba942900) (linux TID 3815) (maybe from PID 4696, UID 0) (code: 0), stack trace: ***
/mnt/nebula/bin/nebula-graphd(_ZN5folly10symbolizer17getStackTraceSafeEPmm+0x31)[0x1dcbb81]
/mnt/nebula/bin/nebula-graphd(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x1b)[0x1dc39eb]
/mnt/nebula/bin/nebula-graphd[0x1dc1d37]
/lib64/libpthread.so.0(+0xf5ef)[0x7f5db9e005ef]
/lib64/libc.so.6(syscall+0x19)[0x7f5db9b1bba9]
/mnt/nebula/bin/nebula-graphd(_ZN5folly6detail13futexWaitImplEPKSt6atomicIjEjPKNSt6chrono10time_pointINS5_3_V212system_clockENS5_8durationIlSt5ratioILl1ELl1000000000EEEEEEPKNS6_INS7_12steady_clockESC_EEj+0x79)[0x1cf19a9]
/mnt/nebula/bin/nebula-graphd(_ZN5folly6fibers5Baton10waitThreadEv+0x95)[0x1d246b5]
/mnt/nebula/bin/nebula-graphd(_ZN5folly6fibers5Baton4waitEv+0xc7)[0x1d24927]
/mnt/nebula/bin/nebula-graphd(_ZN5folly7futures6detail8waitImplINS_10SemiFutureIN6nebula8StatusOrINS4_4meta4cpp216ListSessionsRespEEEEES9_EEvRT_+0x108)[0xefc6e8]
/mnt/nebula/bin/nebula-graphd(_ZNR5folly10SemiFutureIN6nebula8StatusOrINS1_4meta4cpp216ListSessionsRespEEEE4waitEv+0x307)[0xefca77]
/mnt/nebula/bin/nebula-graphd(_ZNO5folly6FutureIN6nebula8StatusOrINS1_4meta4cpp216ListSessionsRespEEEE3getEv+0x1f)[0xefd11f]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula4meta10MetaClient12loadSessionsEv+0x34)[0x12c0db4]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula4meta10MetaClient8loadDataEv+0x1c7)[0x12c1767]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula4meta10MetaClient12isMetadReadyEv+0x227)[0x12c3747]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula4meta10MetaClient17waitForMetadReadyEii+0x102)[0x12c3912]
/mnt/nebula/bin/nebula-graphd(_ZN6nebula5graph12GraphService4initESt10shared_ptrIN5folly20IOThreadPoolExecutorEERKNS_8HostAddrE+0x3a6)[0xee05c6]
/mnt/nebula/bin/nebula-graphd(main+0x950)[0xe8d150]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x7f5db9a45504]
/mnt/nebula/bin/nebula-graphd[0xeae86d]
(safe mode, symbolizer not available)
E0221 15:04:29.911459 11196 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0221 15:04:29.911741 11174 MetaClient.cpp:94] RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0221 15:04:29.911782 11174 GraphService.cpp:43] Failed to wait for meta service ready synchronously.
E0221 15:04:29.911798 11174 GraphDaemon.cpp:158] Failed to wait for meta service ready synchronously.
E0221 15:18:11.964200 13645 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 15:18:11.964403 13645 MetaClient.cpp:203] Load sessions Failed
E0221 15:20:24.997524 13645 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 15:20:24.997575 13645 MetaClient.cpp:203] Load sessions Failed
E0221 15:26:12.384086 13663 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0221 15:26:12.384172 13645 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 15:26:12.384205 13645 MetaClient.cpp:203] Load sessions Failed
E0221 15:29:06.667616 13645 MetaClient.cpp:3534] List sessions failed, status:LeaderChanged: Leader changed!
E0221 15:29:06.667663 13645 MetaClient.cpp:203] Load sessions Failed
E0221 15:29:06.667788 13645 GraphService.cpp:43] Failed to wait for meta service ready synchronously.
E0221 15:29:06.667804 13645 GraphDaemon.cpp:158] Failed to wait for meta service ready synchronously.
E0221 18:54:47.277004 21793 GraphSessionManager.cpp:205] Update sessions failed: LeaderChanged: Leader changed!
E0221 18:54:47.277238 21857 GraphSessionManager.cpp:229] Update sessions failed: Update sessions failed: LeaderChanged: Leader changed!
E0221 18:57:01.916738 21792 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0221 18:57:01.916827 21856 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 18:57:01.916868 21856 MetaClient.cpp:203] Load sessions Failed
E0221 19:01:42.662103 21794 GraphSessionManager.cpp:205] Update sessions failed: LeaderChanged: Leader changed!
E0221 19:01:42.662197 21857 GraphSessionManager.cpp:229] Update sessions failed: Update sessions failed: LeaderChanged: Leader changed!
E0221 19:08:38.116742 21786 GraphSessionManager.cpp:205] Update sessions failed: LeaderChanged: Leader changed!
E0221 19:08:38.116813 21857 GraphSessionManager.cpp:229] Update sessions failed: Update sessions failed: LeaderChanged: Leader changed!
E0221 19:09:40.969022 21855 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0221 19:09:40.969115 21856 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 19:09:40.969161 21856 MetaClient.cpp:203] Load sessions Failed
E0221 19:22:02.125290 21849 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0221 19:22:02.125368 21856 MetaClient.cpp:3534] List sessions failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: Timed Out
E0221 19:22:02.125406 21856 MetaClient.cpp:203] Load sessions Failed

Could you please take a look?

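Graph can start, right? Run show sessions and have a look; there should be a lot of sessions.
First change the timeout below from 0 (no expiration) to a finite value such as 3600, so that the system deletes timed-out sessions automatically.

--session_idle_timeout_secs=0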

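I'm trying again now, and the meta log still keeps printing resp list session.
Also, don't sessions require the graph service to be running? In my earlier test, starting meta first and then storage was already enough to make it print this continuously.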

Yes, it will. Storage also uses the meta client, and load sessions is something the meta client does during loadData.

After changing the parameter it is still printing resp list session. What should I do about it? Just wait and see?

Did you restart after changing the parameter? And yes, you need to wait for the earlier sessions to time out automatically.

It has been more than an hour and it is still printing, and graphd tends to exit.
graphderr.log

E0222 11:21:19.211961 16580 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212050 16579 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212172 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212225 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212244 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212260 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212276 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212292 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212314 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212330 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.212347 16605 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.213248 16580 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.213284 16580 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.213353 16579 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.213409 16579 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.213410 16581 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214097 16576 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214125 16576 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214495 16607 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214546 16580 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214911 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214933 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214938 16581 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214949 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214972 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214983 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.214994 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215005 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215016 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215027 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215039 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215049 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215060 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215070 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215081 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215093 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215104 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215116 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215126 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215142 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215157 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215168 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215178 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215198 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215209 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215214 16582 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215219 16578 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215247 16582 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215283 16582 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215301 16582 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215317 16582 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limit
E0222 11:21:19.215332 16582 MetaClient.cpp:636] Send request to "10.0.7.205":9559, exceed retry limi

And metad on the 205 machine is the one showing the resp list session problem.

Can we first determine whether this is caused by storaged calling the meta client or by graph calling the meta client?
Does the reply above mean it is caused by storaged?

Update: after changing --session_idle_timeout_secs to 600s, I can connect now. The services seem to be back to normal.
But show sessions crashes.

(root@nebula) [(none)]> show sessions
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x65f815, 0x16)
        /usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.sysMap(0xc258000000, 0x3c000000, 0x811858)
        /usr/local/go/src/runtime/mem_linux.go:169 +0xc6
runtime.(*mheap).sysAlloc(0x7f71c0, 0x3b000000, 0x42b457, 0x7f71c8)
        /usr/local/go/src/runtime/malloc.go:727 +0x1e5
runtime.(*mheap).grow(0x7f71c0, 0x1d79d, 0x0)
        /usr/local/go/src/runtime/mheap.go:1344 +0x85
runtime.(*mheap).allocSpan(0x7f71c0, 0x1d79d, 0xc000090100, 0x811868, 0x40)
        /usr/local/go/src/runtime/mheap.go:1160 +0x6b6
runtime.(*mheap).alloc.func1()
        /usr/local/go/src/runtime/mheap.go:907 +0x65
runtime.(*mheap).alloc(0x7f71c0, 0x1d79d, 0x460001, 0x7f71c0)
        /usr/local/go/src/runtime/mheap.go:901 +0x85
runtime.largeAlloc(0x3af3a000, 0x460100, 0xc1dd66c680)
        /usr/local/go/src/runtime/malloc.go:1177 +0x92
runtime.mallocgc.func1()
        /usr/local/go/src/runtime/malloc.go:1071 +0x46
runtime.systemstack(0x0)
        /usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1116

goroutine 1 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc00005d340 sp=0xc00005d338 pc=0x467840
runtime.mallocgc(0x3af3a000, 0x0, 0x0, 0x0)
        /usr/local/go/src/runtime/malloc.go:1070 +0x938 fp=0xc00005d3e0 sp=0xc00005d340 pc=0x40d938
runtime.growslice(0x613a60, 0xc0000152f0, 0x0, 0x25, 0x3af38d0b, 0x0, 0x659937, 0x0)
        /usr/local/go/src/runtime/slice.go:224 +0x152 fp=0xc00005d448 sp=0xc00005d3e0 pc=0x44bf52
fmt.(*buffer).writeString(...)
        /usr/local/go/src/fmt/print.go:82
fmt.(*fmt).padString(0xc13125a1e0, 0xc212880000, 0x3af38d0b)
        /usr/local/go/src/fmt/format.go:110 +0xf6 fp=0xc00005d4d0 sp=0xc00005d448 pc=0x4bbeb6
fmt.(*fmt).fmtS(0xc13125a1e0, 0xc212880000, 0x3af38d0b)
        /usr/local/go/src/fmt/format.go:359 +0x65 fp=0xc00005d508 sp=0xc00005d4d0 pc=0x4bcd85
fmt.(*pp).fmtString(0xc13125a1a0, 0xc212880000, 0x3af38d0b, 0x76)
        /usr/local/go/src/fmt/print.go:447 +0x131 fp=0xc00005d558 sp=0xc00005d508 pc=0x4c0271
fmt.(*pp).printArg(0xc13125a1a0, 0x6138e0, 0xc1dd6c7f90, 0x76)
        /usr/local/go/src/fmt/print.go:698 +0x7e7 fp=0xc00005d5f0 sp=0xc00005d558 pc=0x4c2647
fmt.(*pp).doPrintln(0xc13125a1a0, 0xc00005d788, 0x1, 0x1)
        /usr/local/go/src/fmt/print.go:1173 +0xb1 fp=0xc00005d660 sp=0xc00005d5f0 pc=0x4c69f1
fmt.Fprintln(0x6a0fa0, 0xc00000e018, 0xc00005d788, 0x1, 0x1, 0x0, 0x0, 0x8)
        /usr/local/go/src/fmt/print.go:264 +0x58 fp=0xc00005d6c8 sp=0xc00005d660 pc=0x4becf8
fmt.Println(...)
        /usr/local/go/src/fmt/print.go:274

How is it now?

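Sorry, I've been busy. After changing session_idle_timeout the services start and everything is back to normal.
I tested this more carefully afterwards. It looks like with session_idle_timeout=0, having too many sessions causes meta leader changes, which makes the service unavailable.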
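With idle expiration disabled, a session presumably stays registered until the client releases it, so client code can help by releasing every session it opens. A minimal sketch of that pattern follows. `borrowed_session` and `open_session` are hypothetical names, and the only thing assumed about the session object is a `release()` method; check that the client version you use provides one.

```python
from contextlib import contextmanager


@contextmanager
def borrowed_session(open_session):
    """Open a session, hand it to the caller, and always release it."""
    session = open_session()
    try:
        yield session
    finally:
        # Runs on success and on error, so the session is not leaked.
        session.release()
```

With a connection pool object, `open_session` could be something like `lambda: pool.get_session(user, password)`, used as `with borrowed_session(...) as session:` around each unit of work.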
