索引的命中
在本篇结束之前,我还想再给一个索引查询的例子。首先,假设环境里只存在 player() 和 player.name 上的索引,没有 player.age 上的索引。
(root@nebula) [basketballplayer]> SHOW TAG INDEXES
+------------------+----------+----------+
| Index Name | By Tag | Columns |
+------------------+----------+----------+
| "player_index_0" | "player" | [] |
| "player_index_1" | "player" | ["name"] |
+------------------+----------+----------+
Got 2 rows (time spent 2440/19653 us)
在 NebulaGraph 中,只有不具备 ID 条件的起点才涉及索引,而这类起点查询其实是一个典型的非图库查询。相反,点的拓展和 NebulaGraph 索引是无关的。
下面的例子只涉及起点查询,类似于 SQL 中 SELECT * FROM player WHERE ...
的表达。这三个查询分别是:
MATCH (n:player) WHERE n.player.age > 50 RETURN n
MATCH (n:player) WHERE n.player.name > "T" RETURN n
MATCH (n:team) WHERE n.team.name > "T" RETURN n
之中只有 1. 和 2. 是允许的查询,3. 因为查询涉及数据全扫描而被 NebulaGraph 禁止,因为不存在 team 之中的索引,这样的全扫描是很昂贵的:
(root@nebula) [basketballplayer]> match (n:team) WHERE n.team.name > "T" RETURN n
[ERROR (-1005)]: IndexNotFound: No valid index found
虽然都能被执行,但 1. 与 2. 的情况又有不同:1. 中的过滤条件 n.player.age > 50
涉及未被索引的字段,这个过滤是无法被下推到 IndexScan
算子的,这意味着所有的 player 都会被扫描到 graphd 中,然后再进一步进行 Filter
算子的过滤计算。
下图是索引文中介绍的点索引在 RocksDB 中的数据结构。
(root@nebula) [basketballplayer]> profile MATCH (n:player) WHERE n.player.age > 50 RETURN n
+---+
| n |
+---+
+---+
Empty set (time spent 8468/25278 us)
Execution Plan (optimize time 442 us)
-----+----------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| id | name | dependencies | profiling data | operator info |
-----+----------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 6 | Project | 5 | ver: 0, rows: 0, execTime: 19us, totalTime: 22us | outputVar: { |
| | | | | "colNames": [ |
| | | | | "n" |
| | | | | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Project_6" |
| | | | | } |
| | | | | inputVar: __Filter_5 |
| | | | | columns: [ |
| | | | | "$n" |
| | | | | ] |
-----+----------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 5 | Filter | 9 | ver: 0, rows: 0, execTime: 74us, totalTime: 75us | outputVar: { |
| | | | | "colNames": [ |
| | | | | "n" |
| | | | | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Filter_5" |
| | | | | } |
| | | | | inputVar: __Project_4 |
| | | | | condition: (n.player.age>50) |
| | | | | isStable: false |
-----+----------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 9 | AppendVertices | 7 | { | outputVar: { |
| | | | ver: 0, rows: 51, execTime: 841us, totalTime: 4481us | "colNames": [ |
| | | | total_rpc: 4018(us) | "n" |
| | | | "storaged1":9779 exec/total: 1591(us)/2868(us) | ], |
| | | | "storaged0":9779 exec/total: 1532(us)/2672(us) | "type": "DATASET", |
| | | | "storaged2":9779 exec/total: 2510(us)/3733(us) | "name": "__Project_4" |
| | | | } | } |
| | | | | inputVar: __IndexScan_1 |
| | | | | space: 49 |
| | | | | dedup: true |
| | | | | limit: -1 |
| | | | | filter: player._tag IS NOT EMPTY |
| | | | | orderBy: [] |
| | | | | src: $_vid |
| | | | | props: [ |
| | | | | { |
| | | | | "props": [ |
| | | | | "name", |
| | | | | "age", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 59 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "geo", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 60 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "name", |
| | | | | "age", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 50 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "name", |
| | | | | "email", |
| | | | | "phone_num", |
| | | | | "birthday", |
| | | | | "address", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 61 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "name", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 51 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "address", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 62 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "uuid", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 63 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 64 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 65 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 66 |
| | | | | } |
| | | | | ] |
| | | | | exprs: |
| | | | | vertex_filter: |
| | | | | if_track_previous_path: false |
-----+----------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 7 | IndexScan | 2 | { | outputVar: { |
| | | | ver: 0, rows: 51, execTime: 0us, totalTime: 2475us | "colNames": [ |
| | | | "storaged2":9779 exec/total: 640(us)/1968(us) | "_vid" |
| | | | "storaged1":9779 exec/total: 709(us)/2040(us) | ], |
| | | | "storaged0":9779 exec/total: 599(us)/2075(us) | "type": "DATASET", |
| | | | storage_detail: {IndexLimitNode(limit=9223372036854775807):318(us),IndexProjectionNode(projectColumn=[_vid]):317(us),IndexVertexScanNode(IndexID=54, Path=()):320(us)} | "name": "__IndexScan_1" |
| | | | } | } |
| | | | | inputVar: |
| | | | | space: 49 |
| | | | | dedup: false |
| | | | | limit: 9223372036854775807 |
| | | | | filter: |
| | | | | orderBy: [] |
| | | | | schemaId: 50 |
| | | | | isEdge: false |
| | | | | returnCols: [ |
| | | | | "_vid" |
| | | | | ] |
| | | | | indexCtx: [ |
| | | | | { |
| | | | | "columnHints": [], |
| | | | | "filter": "", |
| | | | | "index_id": 54 |
| | | | | } |
| | | | | ] |
-----+----------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 2 | Start | | ver: 0, rows: 0, execTime: 0us, totalTime: 36us | outputVar: { |
| | | | | "colNames": [], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Start_2" |
| | | | | } |
-----+----------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
Tue, 13 Sep 2022 15:28:31 CST
(root@nebula) [basketballplayer]>
(root@nebula) [basketballplayer]> profile MATCH (n:player) WHERE n.player.name > "T" RETURN n
+------------------------------------------------------------------------------------------+
| n |
+------------------------------------------------------------------------------------------+
| ("player127" :player{age: 42, name: "Vince Carter"}) |
| ("player109" :player{age: 34, name: "Tiago Splitter"}) |
| ("player100" :actor{age: 42, name: "Tim Duncan X"} :player{age: 42, name: "Tim Duncan"}) |
| ("player101" :player{age: 36, name: "Tony Parker"}) |
| ("player133" :player{age: 38, name: "Yao Ming"}) |
| ("player114" :player{age: 39, name: "Tracy McGrady"}) |
+------------------------------------------------------------------------------------------+
Got 6 rows (time spent 5438/13004 us)
Execution Plan (optimize time 333 us)
-----+----------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| id | name | dependencies | profiling data | operator info |
-----+----------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 6 | Project | 5 | ver: 0, rows: 6, execTime: 24us, totalTime: 26us | outputVar: { |
| | | | | "colNames": [ |
| | | | | "n" |
| | | | | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Project_6" |
| | | | | } |
| | | | | inputVar: __Filter_5 |
| | | | | columns: [ |
| | | | | "$n" |
| | | | | ] |
-----+----------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 5 | Filter | 9 | ver: 0, rows: 6, execTime: 28us, totalTime: 30us | outputVar: { |
| | | | | "colNames": [ |
| | | | | "n" |
| | | | | ], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Filter_5" |
| | | | | } |
| | | | | inputVar: __Project_4 |
| | | | | condition: (n.player.name>"T") |
| | | | | isStable: false |
-----+----------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 9 | AppendVertices | 7 | { | outputVar: { |
| | | | ver: 0, rows: 6, execTime: 488us, totalTime: 2146us | "colNames": [ |
| | | | "storaged1":9779 exec/total: 395(us)/1461(us) | "n" |
| | | | "storaged0":9779 exec/total: 375(us)/1291(us) | ], |
| | | | "storaged2":9779 exec/total: 607(us)/1615(us) | "type": "DATASET", |
| | | | total_rpc: 1860(us) | "name": "__Project_4" |
| | | | } | } |
| | | | | inputVar: __IndexScan_1 |
| | | | | space: 49 |
| | | | | dedup: true |
| | | | | limit: -1 |
| | | | | filter: player._tag IS NOT EMPTY |
| | | | | orderBy: [] |
| | | | | src: $_vid |
| | | | | props: [ |
| | | | | { |
| | | | | "props": [ |
| | | | | "name", |
| | | | | "age", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 59 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "geo", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 60 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "name", |
| | | | | "age", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 50 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "name", |
| | | | | "email", |
| | | | | "phone_num", |
| | | | | "birthday", |
| | | | | "address", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 61 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "name", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 51 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "address", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 62 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "uuid", |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 63 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 64 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 65 |
| | | | | }, |
| | | | | { |
| | | | | "props": [ |
| | | | | "_tag" |
| | | | | ], |
| | | | | "tagId": 66 |
| | | | | } |
| | | | | ] |
| | | | | exprs: |
| | | | | vertex_filter: |
| | | | | if_track_previous_path: false |
-----+----------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 7 | IndexScan | 2 | { | outputVar: { |
| | | | ver: 0, rows: 6, execTime: 0us, totalTime: 1966us | "colNames": [ |
| | | | "storaged2":9779 exec/total: 431(us)/1564(us) | "_vid" |
| | | | "storaged1":9779 exec/total: 443(us)/1561(us) | ], |
| | | | "storaged0":9779 exec/total: 413(us)/1696(us) | "type": "DATASET", |
| | | | storage_detail: {IndexLimitNode(limit=9223372036854775807):269(us),IndexProjectionNode(projectColumn=[_vid]):269(us),IndexVertexScanNode(IndexID=55, Path=(name=("T",INF])):272(us)} | "name": "__IndexScan_1" |
| | | | } | } |
| | | | | inputVar: |
| | | | | space: 49 |
| | | | | dedup: false |
| | | | | limit: 9223372036854775807 |
| | | | | filter: |
| | | | | orderBy: [] |
| | | | | schemaId: 50 |
| | | | | isEdge: false |
| | | | | returnCols: [ |
| | | | | "_vid" |
| | | | | ] |
| | | | | indexCtx: [ |
| | | | | { |
| | | | | "columnHints": [ |
| | | | | { |
| | | | | "includeEnd": false, |
| | | | | "includeBegin": false, |
| | | | | "endValue": "__EMPTY__", |
| | | | | "beginValue": "T", |
| | | | | "scanType": "RANGE", |
| | | | | "column": "name" |
| | | | | } |
| | | | | ], |
| | | | | "filter": "", |
| | | | | "index_id": 55 |
| | | | | } |
| | | | | ] |
-----+----------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
| 2 | Start | | ver: 0, rows: 0, execTime: 0us, totalTime: 36us | outputVar: { |
| | | | | "colNames": [], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Start_2" |
| | | | | } |
-----+----------------+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------
通过两个查询的 profile 分析,我们可以很清楚地看到:
-
IndexScan
在 1. 查询中没有columnHints
信息,扫描的行数为rows: 52
(这是所有的 player 顶点数量) -
IndexScan
在 2. 查询中有columnHints
信息,扫描的行数只有rows: 6
所以,我们在真的需要这种从属性反查图探索起点的时候,要根据实际查询需求去斟酌索引的创建。
小结
在理解了 NebulaGraph 的基本架构设计、存储格式、查询的简单调用流程和常见的优化规则之后,结合 PROFILE
/EXPLAIN
,我们可以一点点去设计更适合不同场景的图建模与图查询。
优化原则:减少模糊,增加确定,越早越好。
欢迎大家在本文的评论区讨论、提供更多优化查询的例子。
延伸阅读
- NebulaGraph 源码阅读 https://discuss.nebula-graph.com.cn/t/topic/4963
- NebulaGraph Index 实践 https://discuss.nebula-graph.com.cn/t/topic/689
- nGQL 引用属性文档 https://docs.nebula-graph.com.cn/3.2.0/3.ngql-guide/4.variable-and-composite-queries/3.property-reference/
谢谢你读完本文 (///▽///)
想来看看还有什么神奇的执行查询语句么?现在可以用 NebulaGraph Cloud 来搭建自己的图数据系统,领略各种查询语句的独特之处哟~ NebulaGraph 阿里云计算巢现 30 天免费使用中,点击链接来用用图数据库吧~
想看源码的小伙伴可以前往 GitHub 阅读、使用、(^з^)-☆ star 它 → GitHub;和其他的 NebulaGraph 用户一起交流图数据库技术和应用技能,留下「你的名片」一起玩耍呢~