在某个 tag 上对于某个字段创建索引,使用 lookup 语句查询改字段在某个区间内的排序后的前 n 个 tag 记录,语句大概如下:
LOOKUP ON tag_name WHERE tag_name.index_field >= 0 and tag_name.index_field < 1672719040743 YIELD id(VERTEX) as id, properties(vertex).index_field| order by $-.index_field | LIMIT 20,explain 发现分成了 TagIndexRangeScan 和 TopN 两步,这种场景是不是可以把 TopN 和 T agIndexRangeScan 合并成一步,或者是我们使用姿势有问题?
你的语句里有 order by ,这样需要先排序在取top n,应该是合并不了。
如果把order by 去掉,limit 会在TagIndexRangeScan里。
(root@nebula) [nba]> explain LOOKUP ON player WHERE player.age >= 0 and player.age < 100 YIELD id(VERTEX) as id, properties(vertex).age as age | LIMIT 20
Execution succeeded (time spent 4180/5071 us)
Execution Plan (optimize time 2998 us)
-----+-------------------+--------------+----------------+------------------------------------
| id | name | dependencies | profiling data | operator info |
-----+-------------------+--------------+----------------+------------------------------------
| 7 | Project | 8 | | outputVar: { |
| | | | | "colNames": [ |
| | | | | "id", |
| | | | | "age" |
| | | | | ], |
| | | | | "name": "__Limit_4", |
| | | | | "type": "DATASET" |
| | | | | } |
| | | | | inputVar: __Limit_6 |
| | | | | columns: [ |
| | | | | "id(VERTEX) AS id", |
| | | | | "properties(VERTEX).age AS age" |
| | | | | ] |
-----+-------------------+--------------+----------------+------------------------------------
| 8 | Limit | 9 | | outputVar: { |
| | | | | "colNames": [ |
| | | | | "_vid", |
| | | | | "player._tag", |
| | | | | "player.age", |
| | | | | "player.name" |
| | | | | ], |
| | | | | "name": "__Limit_6", |
| | | | | "type": "DATASET" |
| | | | | } |
| | | | | inputVar: __TagIndexRangeScan_9 |
| | | | | offset: 0 |
| | | | | count: 20 |
-----+-------------------+--------------+----------------+------------------------------------
| 9 | TagIndexRangeScan | 0 | | outputVar: { |
| | | | | "colNames": [ |
| | | | | "_vid", |
| | | | | "player._tag", |
| | | | | "player.age", |
| | | | | "player.name" |
| | | | | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__TagIndexRangeScan_9" |
| | | | | } |
| | | | | inputVar: |
| | | | | space: 7 |
| | | | | dedup: false |
| | | | | limit: 20 |
| | | | | filter: |
| | | | | orderBy: [] |
| | | | | schemaId: 8 |
| | | | | isEdge: false |
| | | | | returnCols: [ |
| | | | | "_vid", |
| | | | | "_tag", |
| | | | | "age", |
| | | | | "name" |
| | | | | ] |
| | | | | indexCtx: [ |
| | | | | { |
| | | | | "columnHints": [ |
| | | | | { |
| | | | | "includeEnd": false, |
| | | | | "endValue": 100, |
| | | | | "beginValue": 0, |
| | | | | "includeBegin": true, |
| | | | | "column": "age", |
| | | | | "scanType": "RANGE" |
| | | | | } |
| | | | | ], |
| | | | | "filter": "", |
| | | | | "index_id": 15 |
| | | | | } |
| | | | | ] |
-----+-------------------+--------------+----------------+------------------------------------
| 0 | Start | | | outputVar: { |
| | | | | "colNames": [], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Start_0" |
| | | | | } |
-----+-------------------+--------------+----------------+------------------------------------
去掉 order by 应该达不到取全局有序前 20 条数据的效果,感觉整个语句只是对索引字段做 reverse scan 取 topN 条记录即可,我们这边有合适的方式把 scan 和 topN executor 合并吗?这样应该对语句性能提升帮助很大
目前版本已经有了一定的 topN 下推到 scan 的能力,还需要加强一下。欢迎向社区版提代码!
能举个 topN 下推到 scan 的例子吗? 感谢
目前的 TopN 下推应该只有这种模式的可以,你可以验证一下
LOOKUP ON tag_name
WHERE tag_name.index_field >= 0 and tag_name.index_field < 1672719040743
YIELD properties(vertex).index_field |
order by $-.index_field | LIMIT 20
1 个赞
这个语句 explain 后还是一个 tag index range scan executor + topN executor,我这边需求是如果 topN 是针对 tag index 字段的,是不是可以把两个 executor 合并?这样在大数据量下型男应该会有提升
改成这个应该下推的,我验证过,去掉了 properties() 函数
不好意思,我看错了,topN 确实有合并到 index range scan executor,目前只支持这样操作吗,或者后续会继续丰富下推场景吗?根据按索引排序查找前 n 条记录这种操作是不是比较常见?
目前没有这样的计划,这个查询其实是非常非常 SQL/Tabular 的查询(算是 Graph Query 的反面),如果感兴趣可以参考
实现来贡献呢哈
1 个赞
了解,感谢大佬解答
1 个赞