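There is a problem: sometimes a statement is not well optimized or performs poorly, and during the query everything simply freezes. The machine hangs as well, and the statement cannot be stopped from the front-end page.
Our scenario has 1.5 million tagged vertices and 3 million edges, and queries need to traverse between 2 and 10 hops (these vertices and edges cannot be partitioned any further). After a first exploratory query, the machine went down outright.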
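If a statement hangs, do you have the specific query? Could you add the data volume and the statement, and run it with Profile so our developers can take a look?
Please also add your machine configuration.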
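I just went to the server room to reboot the machine, haha.
Machine configuration: 4 cores, 2.40 GHz, 40 GB memory, about 450 GB SSD
Data volume: 130,000 vertices and 1.5 million edges;
nebula: single-node deployment in Docker containers, one instance of each service;
Cause of failure: I entered a vid in Studio and ran a 6-hop graph exploration; the page froze, and then the machine went down;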
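A rough back-of-the-envelope sketch of why a 6-hop exploration can get expensive on this data volume. It assumes every vertex has the average degree, which is a simplification (real graphs are usually skewed, and repeated vertices are counted again), and the function name `estimate_paths` is purely illustrative, not part of NebulaGraph.

```python
def estimate_paths(num_vertices: int, num_edges: int, hops: int,
                   bidirectional: bool = False) -> float:
    """Estimate how many paths a traversal from one vertex may expand.

    Assumption (hypothetical simplification): every vertex has exactly
    the average degree, so the count grows as avg_degree ** hops.
    """
    avg_degree = num_edges / num_vertices
    if bidirectional:
        # Following edges in both directions doubles the fan-out per step.
        avg_degree *= 2
    return avg_degree ** hops


if __name__ == "__main__":
    # Figures reported in this thread: 130,000 vertices, 1.5 million edges.
    for hops in range(2, 11):
        out_only = estimate_paths(130_000, 1_500_000, hops)
        both = estimate_paths(130_000, 1_500_000, hops, bidirectional=True)
        print(f"{hops:2d} hops: ~{out_only:.2e} paths (outgoing), "
              f"~{both:.2e} (both directions)")
```

Under these assumptions, 6 hops already gives an estimate in the millions of paths in one direction and over a hundred million in both directions, which would be consistent with the memory pressure described here.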
Could you pull the error logs and paste them here?
storage
I20220419 02:09:56.685847 35 SlowOpTracker.h:31] [Port: 9780, Space: 1, Part: 21] total time:541ms, Write WAL, total 2
I20220419 02:09:58.441030 41 SlowOpTracker.h:31] [Port: 9780, Space: 1, Part: 1] total time:1186ms, Write WAL, total 2
I20220419 02:09:58.441000 39 SlowOpTracker.h:31] [Port: 9780, Space: 1, Part: 68] total time:1186ms, Write WAL, total 2
I20220419 02:09:59.229022 43 SlowOpTracker.h:31] [Port: 9780, Space: 1, Part: 65] total time:401ms, Write WAL, total 2
I20220419 02:10:04.826126 38 SlowOpTracker.h:31] [Port: 9780, Space: 1, Part: 86] total time:1362ms, Write WAL, total 2
I20220419 02:10:04.826122 35 SlowOpTracker.h:31] [Port: 9780, Space: 1, Part: 1] total time:2335ms, Total send logs: 2
graph
E20220419 02:15:48.885459 59 MetaClient.cpp:735] Send request to "metad0":9559, exceed retry limit
E20220419 02:16:17.542914 59 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:16:42.546355 59 GraphSessionManager.cpp:222] Update sessions failed: RPC failure in MetaClient: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:17:00.522236 52 GraphSessionManager.cpp:246] Update sessions failed: Update sessions failed: RPC failure in MetaClient: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:21:05.131572 42 MetaClient.cpp:735] Send request to "metad0":9559, exceed retry limit
E20220419 02:21:14.495889 42 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:21:28.106245 51 MetaClient.cpp:171] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:24:26.130075 43 MetaClient.cpp:735] Send request to "metad0":9559, exceed retry limit
E20220419 02:24:46.109652 43 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:25:10.510530 43 GraphSessionManager.cpp:222] Update sessions failed: RPC failure in MetaClient: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:25:35.496979 52 GraphSessionManager.cpp:246] Update sessions failed: Update sessions failed: RPC failure in MetaClient: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:27:53.872589 44 MetaClient.cpp:735] Send request to "metad0":9559, exceed retry limit
E20220419 02:28:12.221868 44 MetaClient.cpp:736] RpcResponse exception: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:28:30.082262 51 MetaClient.cpp:171] Heartbeat failed, status:RPC failure in MetaClient: apache::thrift::transport::TTransportException: TTransportException: Timed out
E20220419 02:32:54.326568 45 MetaClient.cpp:735] Send request to "metad0":9559, exceed retry limit
meta
I20220419 02:09:38.831596 123 HBProcessor.cpp:33] Receive heartbeat from "storaged0":9779, role = STORAGE
I20220419 02:09:45.977432 123 HBProcessor.cpp:33] Receive heartbeat from "graphd":9669, role = GRAPH
I20220419 02:09:47.476292 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:101ms, Total commit: 1
I20220419 02:09:48.866307 123 HBProcessor.cpp:33] Receive heartbeat from "storaged0":9779, role = STORAGE
I20220419 02:09:57.907231 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:729ms, Write WAL, total 2
I20220419 02:10:13.864542 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:3851ms, Total send logs: 2
I20220419 02:10:17.368753 123 HBProcessor.cpp:33] Receive heartbeat from "graphd":9669, role = GRAPH
I20220419 02:10:27.829978 122 HBProcessor.cpp:33] Receive heartbeat from "storaged0":9779, role = STORAGE
I20220419 02:10:29.152884 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:10731ms, Total commit: 1
I20220419 02:10:30.693095 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:319ms, Write WAL, total 2
I20220419 02:10:34.235693 42 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:661ms, Total send logs: 2
I20220419 02:10:40.078974 42 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:4431ms, Total commit: 1
I20220419 02:10:46.363931 42 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:139ms, Write WAL, total 2
I20220419 02:10:48.154943 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:714ms, Total send logs: 2
I20220419 02:10:51.888608 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:2111ms, Total commit: 1
I20220419 02:10:56.706192 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:806ms, Write WAL, total 2
I20220419 02:10:58.987819 42 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:442ms, Total send logs: 2
I20220419 02:11:03.965934 42 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:2467ms, Total commit: 1
I20220419 02:11:12.359587 42 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:299ms, Write WAL, total 2
I20220419 02:11:13.151219 123 HBProcessor.cpp:33] Receive heartbeat from "graphd":9669, role = GRAPH
I20220419 02:11:17.564379 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:1304ms, Total send logs: 2
I20220419 02:11:24.064764 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:5186ms, Total commit: 1
I20220419 02:11:25.734196 43 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:271ms, Write WAL, total 2
I20220419 02:11:29.816751 42 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:1474ms, Total send logs: 2
I20220419 02:11:38.079864 42 SlowOpTracker.h:31] [Port: 9560, Space: 0, Part: 0] total time:5505ms, Total commit: 1
I20220419 02:11:42.891696 121 HBProcessor.cpp:33] Receive heartbeat from "storaged0":9779, role = STORAGE
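For anyone reading logs like the ones above, here is a minimal sketch that summarizes the `SlowOpTracker` lines per operation. The regular expression is inferred only from the lines pasted in this thread, so treat the format as an assumption; `summarize_slow_ops` is a hypothetical helper, not a NebulaGraph tool.

```python
import re
from collections import defaultdict

# Pattern inferred from the log lines in this thread (an assumption,
# not an official log format specification).
SLOW_OP = re.compile(
    r"SlowOpTracker\.h:\d+\] \[Port: (\d+), Space: (\d+), Part: (\d+)\] "
    r"total time:(\d+)ms, (.+)$"
)


def summarize_slow_ops(lines):
    """Return {operation: (count, max_ms)} for SlowOpTracker log lines."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = SLOW_OP.search(line.strip())
        if match is None:
            continue  # heartbeat lines, errors, and so on
        elapsed_ms = int(match.group(4))
        # "Write WAL, total 2" -> "Write WAL"; "Total commit: 1" -> "Total commit"
        operation = match.group(5).split(",")[0].split(":")[0].strip()
        stats[operation][0] += 1
        stats[operation][1] = max(stats[operation][1], elapsed_ms)
    return {op: (count, max_ms) for op, (count, max_ms) in stats.items()}
```

Feeding it the meta log above would show, for example, the slowest `Total commit` taking more than 10 seconds, which matches the heartbeat timeouts on the graph side.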
Was it graph or storage that hit OOM?
Which version is this?
3.0.2
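Here is an example I just ran; see the screenshot:
This query ran with no time limit and still has not returned. Earlier I used a count() aggregation, but it immediately reported insufficient system memory, even though I did put some restrictions in the query.
Is this a problem with the statement? Does it need optimizing? The execution plan is below.
With limit 100 it returned in 11 s. In Docker I saw that at the start of the query storaged memory usage went above 50% with CPU at 100%+, and then graphd was at 100%+ CPU and 80% memory.
Take a look with explain.
There is an image above, the one in dot format.
Not the dot format, please.
There is not much to optimize in the plan. It looks like some unnecessary edge properties and the Filter operator could be removed, but that will have to wait for the next release.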