TagIndexRangeScan 与 TopN 是否可以尝试合并

在某个 tag 上对于某个字段创建索引,使用 lookup 语句查询改字段在某个区间内的排序后的前 n 个 tag 记录,语句大概如下:
LOOKUP ON tag_name WHERE tag_name.index_field >= 0 and tag_name.index_field < 1672719040743 YIELD id(VERTEX) as id, properties(vertex).index_field| order by $-.index_field | LIMIT 20,explain 发现分成了 TagIndexRangeScan 和 TopN 两步,这种场景是不是可以把 TopN 和 T agIndexRangeScan 合并成一步,或者是我们使用姿势有问题?

你的语句里有 order by ,这样需要先排序在取top n,应该是合并不了。

如果把order by 去掉,limit 会在TagIndexRangeScan里。

(root@nebula) [nba]> explain LOOKUP ON player WHERE player.age >= 0 and player.age < 100 YIELD id(VERTEX) as id, properties(vertex).age as age | LIMIT 20
Execution succeeded (time spent 4180/5071 us)

Execution Plan (optimize time 2998 us)

-----+-------------------+--------------+----------------+------------------------------------
| id | name              | dependencies | profiling data | operator info                     |
-----+-------------------+--------------+----------------+------------------------------------
|  7 | Project           | 8            |                | outputVar: {                      |
|    |                   |              |                |   "colNames": [                   |
|    |                   |              |                |     "id",                         |
|    |                   |              |                |     "age"                         |
|    |                   |              |                |   ],                              |
|    |                   |              |                |   "name": "__Limit_4",            |
|    |                   |              |                |   "type": "DATASET"               |
|    |                   |              |                | }                                 |
|    |                   |              |                | inputVar: __Limit_6               |
|    |                   |              |                | columns: [                        |
|    |                   |              |                |   "id(VERTEX) AS id",             |
|    |                   |              |                |   "properties(VERTEX).age AS age" |
|    |                   |              |                | ]                                 |
-----+-------------------+--------------+----------------+------------------------------------
|  8 | Limit             | 9            |                | outputVar: {                      |
|    |                   |              |                |   "colNames": [                   |
|    |                   |              |                |     "_vid",                       |
|    |                   |              |                |     "player._tag",                |
|    |                   |              |                |     "player.age",                 |
|    |                   |              |                |     "player.name"                 |
|    |                   |              |                |   ],                              |
|    |                   |              |                |   "name": "__Limit_6",            |
|    |                   |              |                |   "type": "DATASET"               |
|    |                   |              |                | }                                 |
|    |                   |              |                | inputVar: __TagIndexRangeScan_9   |
|    |                   |              |                | offset: 0                         |
|    |                   |              |                | count: 20                         |
-----+-------------------+--------------+----------------+------------------------------------
|  9 | TagIndexRangeScan | 0            |                | outputVar: {                      |
|    |                   |              |                |   "colNames": [                   |
|    |                   |              |                |     "_vid",                       |
|    |                   |              |                |     "player._tag",                |
|    |                   |              |                |     "player.age",                 |
|    |                   |              |                |     "player.name"                 |
|    |                   |              |                |   ],                              |
|    |                   |              |                |   "type": "DATASET",              |
|    |                   |              |                |   "name": "__TagIndexRangeScan_9" |
|    |                   |              |                | }                                 |
|    |                   |              |                | inputVar:                         |
|    |                   |              |                | space: 7                          |
|    |                   |              |                | dedup: false                      |
|    |                   |              |                | limit: 20                         |
|    |                   |              |                | filter:                           |
|    |                   |              |                | orderBy: []                       |
|    |                   |              |                | schemaId: 8                       |
|    |                   |              |                | isEdge: false                     |
|    |                   |              |                | returnCols: [                     |
|    |                   |              |                |   "_vid",                         |
|    |                   |              |                |   "_tag",                         |
|    |                   |              |                |   "age",                          |
|    |                   |              |                |   "name"                          |
|    |                   |              |                | ]                                 |
|    |                   |              |                | indexCtx: [                       |
|    |                   |              |                |   {                               |
|    |                   |              |                |     "columnHints": [              |
|    |                   |              |                |       {                           |
|    |                   |              |                |         "includeEnd": false,      |
|    |                   |              |                |         "endValue": 100,          |
|    |                   |              |                |         "beginValue": 0,          |
|    |                   |              |                |         "includeBegin": true,     |
|    |                   |              |                |         "column": "age",          |
|    |                   |              |                |         "scanType": "RANGE"       |
|    |                   |              |                |       }                           |
|    |                   |              |                |     ],                            |
|    |                   |              |                |     "filter": "",                 |
|    |                   |              |                |     "index_id": 15                |
|    |                   |              |                |   }                               |
|    |                   |              |                | ]                                 |
-----+-------------------+--------------+----------------+------------------------------------
|  0 | Start             |              |                | outputVar: {                      |
|    |                   |              |                |   "colNames": [],                 |
|    |                   |              |                |   "type": "DATASET",              |
|    |                   |              |                |   "name": "__Start_0"             |
|    |                   |              |                | }                                 |
-----+-------------------+--------------+----------------+------------------------------------

去掉 order by 应该达不到取全局有序前 20 条数据的效果,感觉整个语句只是对索引字段做 reverse scan 取 topN 条记录即可,我们这边有合适的方式把 scan 和 topN executor 合并吗?这样应该对语句性能提升帮助很大

目前版本已经有了一定的 topN 下推到 scan 的能力,还需要加强一下。欢迎向社区版提代码!

能举个 topN 下推到 scan 的例子吗? 感谢

目前的 TopN 下推应该只有这种模式的可以,你可以验证一下

LOOKUP ON tag_name
WHERE tag_name.index_field >= 0 and tag_name.index_field < 1672719040743
YIELD properties(vertex).index_field |
order by $-.index_field | LIMIT 20

参考 https://github.com/vesoft-inc/nebula/pull/3499/files#diff-24c5f96f18fee906869d0003c247b0cae309e85ce83fab869385482cd56a798c

1 个赞

这个语句 explain 后还是一个 tag index range scan executor + topN executor,我这边需求是如果 topN 是针对 tag index 字段的,是不是可以把两个 executor 合并?这样在大数据量下型男应该会有提升

改成这个应该下推的,我验证过,去掉了 properties() 函数

不好意思,我看错了,topN 确实有合并到 index range scan executor,目前只支持这样操作吗,或者后续会继续丰富下推场景吗?根据按索引排序查找前 n 条记录这种操作是不是比较常见?

目前没有这样的计划,这个查询其实是非常非常 SQL/Tabular 的查询(算是 Graph Query 的反面),如果感兴趣可以参考

实现来贡献呢哈 :slight_smile:

1 个赞

了解,感谢大佬解答

1 个赞