Since rank is part of the edge key, why is filtering by rank still slow when traversing from a given vertex over a specified edge type? For example:
GO FROM 'a_1' OVER operate WHERE rank(edge) <= 1694422913 YIELD src(edge), dst(edge), rank(edge);
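I ran PROFILE on it. The filter does not appear to be pushed down, so the filtering happens only after the edges have been fetched.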
Just rephrase the condition and it works: change it to WHERE operate._rank <= 1694422913
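Applied to the query from the question, the rewrite might look like the sketch below. It assumes the same space, the `operate` edge type and the vertex ID 'a_1' from the original post, and it keeps the `<=` comparison of the original query. If the pushdown takes effect, the ExpandAll operator in the PROFILE output should show a non-empty `filter:` field and a `FilterNode` entry under `storage_detail`.

```ngql
# Same traversal as in the question, with the rank condition written as an
# edge property reference (operate._rank) instead of the rank(edge) function.
PROFILE GO FROM 'a_1' OVER operate WHERE operate._rank <= 1694422913 YIELD src(edge), dst(edge), rank(edge);
```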
before
profile GO FROM 'player100' OVER follow WHERE rank(edge) < 100 YIELD src(edge), dst(edge), rank(edge)
-----+-----------+--------------+--------------------------------------+------------------------------
| 3 | ExpandAll | 2 | { | outputVar: { |
| | | | "execTime": "117(us)", | "colNames": [ |
| | | | "graphExpandAllTime+2": "57(us)", | "EDGE", |
| | | | "resp[0]": { | "__COL_0" |
| | | | "exec": "584(us)", | ], |
| | | | "host": "storaged2:9779", | "type": "DATASET", |
| | | | "storage_detail": { | "name": "__ExpandAll_3" |
| | | | "GetNeighborsNode": "182(us)", | } |
| | | | "HashJoinNode": "160(us)", | inputVar: __Expand_2 |
| | | | "RelNode": "183(us)", | space: 1 |
| | | | "SingleEdgeNode": "157(us)" | dedup: 0 |
| | | | }, | limit: -1 |
| | | | "total": "1596(us)" | filter: | # <---- the filter is empty: the rank(edge) expression was not inferred, so nothing was pushed down
| | | | }, | orderBy: [] |
| | | | "rows": 2, | sample: false |
| | | | "totalTime": "1924(us)", | joinInput: false |
| | | | "version": 0 | maxSteps: 1 |
| | | | } | edgeProps: [ |
| | | | | { |
| | | | | "props": [ |
| | | | | "_dst", |
| | | | | "_rank", |
| | | | | "_src", |
| | | | | "_type", |
| | | | | "degree" |
| | | | | ], |
| | | | | "type": 5 |
| | | | | } |
| | | | | ] |
| | | | | stepLimits: [] |
| | | | | minSteps: 1 |
| | | | | vertexProps: |
| | | | | vertexColumns: [] |
| | | | | edgeColumns: [ |
| | | | | "EDGE AS EDGE", |
| | | | | "*._rank AS __COL_0" |
| | | | | ] |
-----+-----------+--------------+--------------------------------------+------------------------------
after
profile GO FROM 'player100' OVER follow WHERE follow._rank < 100 YIELD src(edge), dst(edge), rank(edge)
-----+-----------+--------------+-------------------------------------+-------------------------------
| 6 | ExpandAll | 2 | { | outputVar: { |
| | | | "execTime": "112(us)", | "colNames": [ |
| | | | "graphExpandAllTime+2": "62(us)", | "__COL_0", |
| | | | "resp[0]": { | "EDGE", |
| | | | "exec": "371(us)", | "__COL_1" |
| | | | "host": "storaged2:9779", | ], |
| | | | "storage_detail": { | "type": "DATASET", |
| | | | "FilterNode": "68(us)", | "name": "__Filter_4" |
| | | | "GetNeighborsNode": "85(us)", | } |
| | | | "HashJoinNode": "63(us)", | inputVar: __Expand_2 |
| | | | "RelNode": "86(us)", | space: 1 |
| | | | "SingleEdgeNode": "60(us)" | dedup: 0 |
| | | | }, | limit: -1 |
| | | | "total": "1367(us)" | filter: (follow._rank<100) | # the filter condition is now applied in storage
| | | | }, | orderBy: [] |
| | | | "rows": 2, | sample: false |
| | | | "totalTime": "1728(us)", | joinInput: false |
| | | | "version": 0 | maxSteps: 1 |
| | | | } | edgeProps: [ |
| | | | | { |
| | | | | "props": [ |
| | | | | "_dst", |
| | | | | "_rank", |
| | | | | "_src", |
| | | | | "_type", |
| | | | | "degree" |
| | | | | ], |
| | | | | "type": 5 |
| | | | | } |
| | | | | ] |
| | | | | stepLimits: [] |
| | | | | minSteps: 1 |
| | | | | vertexProps: |
| | | | | vertexColumns: [] |
| | | | | edgeColumns: [ |
| | | | | "follow._rank AS __COL_0", |
| | | | | "EDGE AS EDGE", |
| | | | | "*._rank AS __COL_1" |
| | | | | ] |
-----+-----------+--------------+-------------------------------------+-------------------------------
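This is a typical optimization approach mentioned in the nGQL crash course, part 2, "nGQL execution plans explained and tuned" (nGQL 简明教程,第二期 nGQL 执行计划详解与调优 - siwei.io).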
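Thanks. One more question: once the rank condition is pushed down, does storaged still have to scan all edges (all edges of the specified type from the specified source vertex) and then filter out the ones that satisfy the condition? Or can it use the rank condition to scan only the matching edges?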
As for the scan, it should be a full scan of those edges, but what gets returned to graphd is the result after the filter.
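Is there a noticeable difference before and after the change? I see the total time was 1924(us) before and 1728(us) after, which does not look like much of a difference.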
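Those numbers are just from wey's example, not a comparison on zhangyw's data. The actual gap depends on zhangyw's data volume. wey's example is based on the official basketball dataset, which is small to begin with.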
OK, I'll try it on my own data. Thanks for your reply.
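Because this is not a full-graph scan but a typical graph traversal, a difference only shows up when a single vertex has, say, tens of thousands of outgoing edges. With a typical average out-degree of a few dozen there is no real gap. My example is mainly about the filter information on the operator; scroll to the right to see it.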
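To judge whether the pushdown is likely to matter for a given start vertex, one could first check its out-degree over the edge type. A minimal sketch, assuming the `operate` edge type and the vertex ID 'a_1' from the original question; `d` and `out_degree` are illustrative aliases only.

```ngql
# Count the outgoing operate edges of the start vertex.
GO FROM 'a_1' OVER operate YIELD dst(edge) AS d | YIELD count(*) AS out_degree;
```

If the count is in the tens of thousands or more, comparing the PROFILE output of both forms of the WHERE clause on that vertex should show whether the pushdown makes a measurable difference.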