1. Does the full table scan happen for queries that can't pick a suitable index?
It should be; no index was created or used. In theory, when the vid is specified, no index should be needed, right?
For example, this one?
match (v)-[e:edge_attack*1..6]->(v2) where id(v)== "172.16.33.1" return e limit 100
2. The memory issue
From what I can see so far, the full table scan drives memory usage up and then graphd's docker container restarts.
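As a side note, SHOW EDGE INDEXES lists the edge indexes that exist in the current space; judging from the IndexScan on IndexID 335 in the plan further down, an edge index on edge_attack does seem to exist.
// sketch: list the edge indexes defined in the current space
SHOW EDGE INDEXES;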
Another example:
match (v)-[e:edge_attack]->(v2) return e limit 10
On nebula the response takes 10s,
while on neo4j it only takes 2ms.
From nebula's profile you can see that a full table scan really does happen. Beyond that, there are many other cases where the limit of a match statement cannot achieve an early stop.
The one above should be the simplest example, so match statements are basically in an unusable state. Even if they are used, it is probably only for offline data analysis; for online scenarios, statements like this are completely unusable.
Got 10 rows (time spent 14.305845s/14.320751453s)
Execution Plan (optimize time 242 us)
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
| id | name | dependencies | profiling data | operator info |
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
| 11 | Project | 10 | { | outputVar: { |
| | | | "execTime": "2169(us)", | "colNames": [ |
| | | | "rows": 10, | "e" |
| | | | "totalTime": "2178(us)", | ], |
| | | | "version": 0 | "type": "DATASET", |
| | | | } | "name": "__Limit_8" |
| | | | | } |
| | | | | inputVar: __Limit_10 |
| | | | | columns: [ |
| | | | | "$-.e[0]" |
| | | | | ] |
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
| 10 | Limit | 5 | { | outputVar: { |
| | | | "execTime": "170941(us)", | "colNames": [ |
| | | | "rows": 10, | "v", |
| | | | "totalTime": "170954(us)", | "e", |
| | | | "version": 0 | "v2" |
| | | | } | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Limit_10" |
| | | | | } |
| | | | | inputVar: __AppendVertices_5 |
| | | | | offset: 0 |
| | | | | count: 10 |
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
| 5 | AppendVertices | 4 | { | outputVar: { |
| | | | "execTime": "810858(us)", | "colNames": [ |
| | | | "resp[0]": { | "v", |
| | | | "exec": "14550(us)", | "e", |
| | | | "host": "storaged:9779", | "v2" |
| | | | "total": "15381(us)" | ], |
| | | | }, | "type": "DATASET", |
| | | | "rows": 736460, | "name": "__AppendVertices_5" |
| | | | "totalTime": "826505(us)", | } |
| | | | "total_rpc": "15933(us)", | inputVar: __Traverse_4 |
| | | | "version": 0 | space: 330 |
| | | | } | dedup: true |
| | | | | limit: -1 |
| | | | | filter: |
| | | | | orderBy: [] |
| | | | | src: none_direct_dst($-.e) |
| | | | | props: [ |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 332 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 331 |
| | | | | } |
| | | | | ] |
| | | | | exprs: |
| | | | | vertex_filter: |
| | | | | if_track_previous_path: true |
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
| 4 | Traverse | 2 | { | outputVar: { |
| | | | "execTime": "4746657(us)", | "colNames": [ |
| | | | "rows": 736460, | "v", |
| | | | "step[1]": [ | "e" |
| | | | { | ], |
| | | | "exec": "4015017(us)", | "type": "DATASET", |
| | | | "host": "storaged:9779", | "name": "__Traverse_4" |
| | | | "storage_detail": { | } |
| | | | "GetNeighborsNode": "4013851(us)", | inputVar: __Dedup_2 |
| | | | "HashJoinNode": "4648(us)", | space: 330 |
| | | | "RelNode": "4013879(us)", | dedup: true |
| | | | "SingleEdgeNode": "4551(us)" | limit: -1 |
| | | | }, | filter: |
| | | | "total": "6667161(us)", | orderBy: [] |
| | | | "total_rpc_time": "6667544(us)", | src: $_vid |
| | | | "vertices": 86 | edgeTypes: [] |
| | | | } | edgeDirection: OUT_EDGE |
| | | | ], | vertexProps: |
| | | | "totalTime": "11414242(us)", | edgeProps: [ |
| | | | "version": 0 | { |
| | | | } | "props": [ |
| | | | | "_src", |
| | | | | "_type", |
| | | | | "_rank", |
| | | | | "_dst", |
| | | | | "name", |
| | | | | "proto", |
| | | | | "app_proto", |
| | | | | "tool", |
| | | | | "attacker_port", |
| | | | | "victim_port", |
| | | | | "phase", |
| | | | | "result", |
| | | | | "tag", |
| | | | | "severity", |
| | | | | "event_time", |
| | | | | "flow_id" |
| | | | | ], |
| | | | | "type": 334 |
| | | | | } |
| | | | | ] |
| | | | | statProps: |
| | | | | exprs: |
| | | | | random: false |
| | | | | steps: 1..1 |
| | | | | vertex filter: |
| | | | | edge filter: |
| | | | | if_track_previous_path: false |
| | | | | first step filter: |
| | | | | tag filter: |
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
| 2 | Dedup | 1 | { | outputVar: { |
| | | | "execTime": "121272(us)", | "colNames": [ |
| | | | "rows": 86, | "_vid" |
| | | | "totalTime": "121284(us)", | ], |
| | | | "version": 0 | "type": "DATASET", |
| | | | } | "name": "__Dedup_2" |
| | | | | } |
| | | | | inputVar: __IndexScan_1 |
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
| 1 | IndexScan | 3 | { | outputVar: { |
| | | | "execTime": "0(us)", | "colNames": [ |
| | | | "resp[0]": { | "_vid" |
| | | | "exec": "1505342(us)", | ], |
| | | | "host": "storaged:9779", | "type": "DATASET", |
| | | | "storage_detail": { | "name": "__IndexScan_1" |
| | | | "IndexEdgeScanNode(IndexID=335, Path=())": "1227079(us)", | } |
| | | | "IndexLimitNode(limit=9223372036854775807)": "1227075(us)", | inputVar: |
| | | | "IndexProjectionNode(projectColumn=[_src])": "1227078(us)" | space: 330 |
| | | | }, | dedup: false |
| | | | "total": "1690639(us)" | limit: 9223372036854775807 |
| | | | }, | filter: |
| | | | "rows": 736460, | orderBy: [] |
| | | | "totalTime": "1768582(us)", | schemaId: 334 |
| | | | "version": 0 | isEdge: true |
| | | | } | returnCols: [ |
| | | | | "_src" |
| | | | | ] |
| | | | | indexCtx: [ |
| | | | | { |
| | | | | "columnHints": [], |
| | | | | "filter": "", |
| | | | | "index_id": 335 |
| | | | | } |
| | | | | ] |
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
| 3 | Start | | { | outputVar: { |
| | | | "execTime": "1(us)", | "colNames": [], |
| | | | "rows": 0, | "type": "DATASET", |
| | | | "totalTime": "48(us)", | "name": "__Start_3" |
| | | | "version": 0 | } |
| | | | } | |
-----+----------------+--------------+-------------------------------------------------------------------+---------------------------------
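If the goal is just to sample a handful of edges of one type, a LOOKUP over the edge index with a piped LIMIT might be a cheaper interim workaround (a sketch, assuming NebulaGraph 3.x LOOKUP YIELD syntax; whether the LIMIT gets pushed down to the storage-side index scan depends on the version, so it is worth verifying with PROFILE):
// sketch only: sample 10 edge_attack edges via the existing edge index
LOOKUP ON edge_attack YIELD edge AS e | LIMIT 10;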
Tue, 28 Mar 2023 07:58:23 UTC
Thanks for this feedback. nebula indeed still has a lot of optimization work to do. For this statement there are two optimizations that are not handled well yet: one is column pruning, the other is limit pushdown. I have filed issues for both and will add the corresponding rules to the optimizer.
The query below, however, should not involve a full table scan:
match (v)-[e:edge_attack*1..6]->(v2) where id(v)== "172.16.33.1" return e limit 100
If it's convenient, could you paste the corresponding plan again? Locally I see it expands directly from the given id.
Thanks again for the feedback.
The scenario here is that there are quite a few edges with different ranks between the two vertices, so 1..6 currently makes graphd's memory usage too high and the process restarts.
So I shortened the number of steps:
profile match (v)-[e:edge_attack*1..2]->(v2) where id(v)== "172.16.33.1" return e limit 100
Got 100 rows (time spent 28.081838s/28.107994682s)
Execution Plan (optimize time 222 us)
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
| id | name | dependencies | profiling data | operator info |
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
| 11 | Project | 10 | { | outputVar: { |
| | | | "execTime": "23118(us)", | "colNames": [ |
| | | | "rows": 100, | "e" |
| | | | "totalTime": "23124(us)", | ], |
| | | | "version": 0 | "type": "DATASET", |
| | | | } | "name": "__Limit_8" |
| | | | | } |
| | | | | inputVar: __Limit_10 |
| | | | | columns: [ |
| | | | | "[__VAR_1 IN $-.e WHERE is_edge($__VAR_1)]" |
| | | | | ] |
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
| 10 | Limit | 5 | { | outputVar: { |
| | | | "execTime": "4363810(us)", | "colNames": [ |
| | | | "rows": 100, | "v", |
| | | | "totalTime": "4363820(us)", | "e", |
| | | | "version": 0 | "v2" |
| | | | } | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Limit_10" |
| | | | | } |
| | | | | inputVar: __AppendVertices_5 |
| | | | | offset: 0 |
| | | | | count: 100 |
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
| 5 | AppendVertices | 4 | { | outputVar: { |
| | | | "execTime": "11375379(us)", | "colNames": [ |
| | | | "resp[0]": { | "v", |
| | | | "exec": "30692(us)", | "e", |
| | | | "host": "storaged:9779", | "v2" |
| | | | "total": "32106(us)" | ], |
| | | | }, | "type": "DATASET", |
| | | | "rows": 12332977, | "name": "__AppendVertices_5" |
| | | | "totalTime": "11407795(us)", | } |
| | | | "total_rpc": "33668(us)", | inputVar: __Traverse_4 |
| | | | "version": 0 | space: 330 |
| | | | } | dedup: true |
| | | | | limit: -1 |
| | | | | filter: |
| | | | | orderBy: [] |
| | | | | src: none_direct_dst($-.e) |
| | | | | props: [ |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 332 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 331 |
| | | | | } |
| | | | | ] |
| | | | | exprs: |
| | | | | vertex_filter: |
| | | | | if_track_previous_path: true |
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
| 4 | Traverse | 2 | { | outputVar: { |
| | | | "execTime": "12027550(us)", | "colNames": [ |
| | | | "rows": 12332977, | "v", |
| | | | "step[1]": [ | "e" |
| | | | { | ], |
| | | | "exec": "158854(us)", | "type": "DATASET", |
| | | | "host": "storaged:9779", | "name": "__Traverse_4" |
| | | | "storage_detail": { | } |
| | | | "GetNeighborsNode": "157079(us)", | inputVar: __Dedup_2 |
| | | | "HashJoinNode": "6590(us)", | space: 330 |
| | | | "RelNode": "157080(us)", | dedup: true |
| | | | "SingleEdgeNode": "6587(us)" | limit: -1 |
| | | | }, | filter: |
| | | | "total": "260731(us)", | orderBy: [] |
| | | | "total_rpc_time": "261870(us)", | src: $-._vid |
| | | | "vertices": 1 | edgeTypes: [] |
| | | | } | edgeDirection: OUT_EDGE |
| | | | ], | vertexProps: |
| | | | "step[2]": [ | edgeProps: [ |
| | | | { | { |
| | | | "exec": "7853(us)", | "props": [ |
| | | | "host": "storaged:9779", | "_src", |
| | | | "storage_detail": { | "_type", |
| | | | "GetNeighborsNode": "7322(us)", | "_rank", |
| | | | "HashJoinNode": "970(us)", | "_dst", |
| | | | "RelNode": "7323(us)", | "name", |
| | | | "SingleEdgeNode": "962(us)" | "proto", |
| | | | }, | "app_proto", |
| | | | "total": "12948(us)", | "tool", |
| | | | "total_rpc_time": "13017(us)", | "attacker_port", |
| | | | "vertices": 7 | "victim_port", |
| | | | } | "phase", |
| | | | ], | "result", |
| | | | "totalTime": "12284637(us)", | "tag", |
| | | | "version": 0 | "severity", |
| | | | } | "event_time", |
| | | | | "flow_id" |
| | | | | ], |
| | | | | "type": 334 |
| | | | | } |
| | | | | ] |
| | | | | statProps: |
| | | | | exprs: |
| | | | | random: false |
| | | | | steps: 1..2 |
| | | | | vertex filter: |
| | | | | edge filter: |
| | | | | if_track_previous_path: false |
| | | | | first step filter: |
| | | | | tag filter: |
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
| 2 | Dedup | 1 | { | outputVar: { |
| | | | "execTime": "13(us)", | "colNames": [ |
| | | | "rows": 1, | "_vid" |
| | | | "totalTime": "14(us)", | ], |
| | | | "version": 0 | "type": "DATASET", |
| | | | } | "name": "__Dedup_2" |
| | | | | } |
| | | | | inputVar: __VAR_0 |
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
| 1 | PassThrough | 3 | { | outputVar: { |
| | | | "execTime": "7(us)", | "colNames": [ |
| | | | "rows": 0, | "_vid" |
| | | | "totalTime": "10(us)", | ], |
| | | | "version": 0 | "type": "DATASET", |
| | | | } | "name": "__VAR_0" |
| | | | | } |
| | | | | inputVar: |
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
| 3 | Start | | { | outputVar: { |
| | | | "execTime": "0(us)", | "colNames": [], |
| | | | "rows": 0, | "type": "DATASET", |
| | | | "totalTime": "22(us)", | "name": "__Start_3" |
| | | | "version": 0 | } |
| | | | } | |
-----+----------------+--------------+-------------------------------------------+------------------------------------------------
Tue, 28 Mar 2023 11:03:42 UTC
It feels like the match scan is breadth-first and the filtering only starts after all the edges have been matched, so maybe this is hard to optimize? I haven't tested neo4j's limit mechanism, so I don't know whether it works the same way.
That is also why I wanted to switch to a GO statement earlier, but GO's limit is in turn applied before the where condition.
Here are the statistics:
(root@nebula) [quanxi]> show stats
+---------+---------------+--------+
| Type    | Name          | Count  |
+---------+---------------+--------+
| "Tag"   | "tag_attack"  | 0      |
| "Tag"   | "tag_ip"      | 163    |
| "Edge"  | "edge_attack" | 736460 |
| "Edge"  | "edge_ref"    | 0      |
| "Space" | "vertices"    | 163    |
| "Space" | "edges"       | 736460 |
+---------+---------------+--------+
Have a look.
Yes, nebula's expansion is always done via breadth-first search. If the original GO is converted into a MATCH, there is actually an additional filter condition, similar to the statement below:
MATCH (v)-[e:edge_attack*1..6]-(v2)
WHERE id(v)=="172.16.33.1" AND all(i in e WHERE i.event_time<1679888603187 AND i.event_time>1679888501000)
RETURN DISTINCT e
If this optimization is merged later, I believe it will alleviate your problem.
As for the limit optimization, I have also filed an issue and will push it forward; the approach under consideration is switching to depth-first traversal.
Thanks for providing your scenario; we hope to polish nebula together and make it easier and better to use.
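As an aside, a rough sketch of the GO form referred to above (assuming NebulaGraph 3.x YIELD edge AS e syntax) could look like the following; note that the piped | LIMIT is applied only after the traversal and the WHERE filter, so it does not give the early-stop behaviour discussed in this thread:
// rough sketch of the GO equivalent of the MATCH above
GO 1 TO 6 STEPS FROM "172.16.33.1" OVER edge_attack
WHERE edge_attack.event_time < 1679888603187 AND edge_attack.event_time > 1679888501000
YIELD DISTINCT edge AS e
| LIMIT 100;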
Thanks for the reply. Also, is there a plan to improve GO's limit soon, i.e. the ordering issue between limit and the where condition?
Regarding the execution order of GO's limit and where: our internal conclusion is that we will not adjust it any further; as I explained before, the reason lies in the original design of GO's semantics.
Going forward, more of the optimization is expected to land in match, where there is more room for improvement. If your need is urgent right now, you can, following your original setup, use a somewhat larger limit for the time being to avoid truncating the results.
OK, understood. I still hope this can be optimized soon, since it is the difference between unusable and usable; otherwise match statements really become dispensable.
Also, setting a somewhat larger limit does not solve the problem, because the filter conditions are not fixed. So for now this feature is also, in my view, in an unusable state.