关于match的索引命中问题

我创建了field_A和field_B的单独索引以及两者的联合索引。版本是2.6.0

match (v:TA)<-[e:EA]-(v1)<-[e10…3]-(v2)<-[e2:EB]-(v3:TB)-[e3:EC]->(v4:TA) where v.field_A == “value_A” or v.field_B== “value_B” return distinct v4.field_A;
业务数据的点和边被我替换了名字,我的查询时这样的,找终点的某个属性,查询时间是0.17s。
match (v:TA)<-[e:EA]-(v1)<-[e1
0…3]-(v2)<-[e2:EB]-(v3:TB)-[e3:EC]->(v4:TA{field_B: value_C}) where v.field_A == “value_A” or v.field_B== “value_B” return distinct v4.field_A;
但是当我在终点加上一个筛选条件的时候,查询时间变成了15s,请问这个是索引未命中吗?我应该怎么做呢?

你创建了什么索引呢。以及补充下版本号

我创建了field_A和field_B的单独索引以及两者的联合索引。版本是2.6.0

来个人呢,我再简化一下问题吧。
match (v:A)-[e1:B]-(v1:C)-[e2:D]-(v2:A) return distinct v2
v和v2的类型相同,他们分别有field_1和field_2两个字段,我建立了这两个字段的分别索引和联合索引
where v.filed_1 == “value_1” and v.filed_2 == “value_2”,这个查询很快,毫秒级返回
where v.filed_1 == “value_1” and v.filed_2 == “value_2” and v2.field_2 == “value_3”,这个查询很慢,需要15秒左右。
v和v2的类型相同,筛选的属性有一个相同,且有联合索引,为什么后面这个这么慢呢?

应该是优化规则有问题,能麻烦把两个 query 分别 profile 一下分享出来么?

正在使用的数据,稍等我修改一下,看如何去掉敏感信息后贴出来

1 个赞


请问这样的数据能够看出来吗,最后一列operator_info不是很好处理。

可以用 profile format="row" MATCH ... 在 console 里执行哈 ,dot格式可以在 https://dreampuf.github.io/GraphvizOnline/ 里渲染成图

我渲染了那个图,但是看不太懂,而且比较小,不好截图,我把原始查询和console内容发出来吧。
请问你们有关于执行计划方面的文档吗,执行计划的输出都不太能看的明白。
索引:entity_name和code_book_type有分别的索引和联合索引。
查询语句:
profile format=“dot” match (v:tag1)<-[e:edge1]-(v1:tag2)<-[e1:edge2]-(v2:tag2)<-[e2:edge3]-(v3:tag3)-[e3:edge4]->(v4:tag1) where v.entity_name == “A” or v.entity_name == “B” return distinct v3.entity_name, v4.entity_name, v4.code_book_type;
执行耗时:90ms
执行计划:digraph exec_plan {
rankdir=BT;
“DataCollect_39”[label="{DataCollect_39|outputVar: [{“colNames”:[“v3.entity_name”,“v4.entity_name”,“v4.code_book_type”],“type”:“DATASET”,“name”:"__DataCollect_39"}]|inputVar: [{“colNames”:[“v3.entity_name”,“v4.entity_name”,“v4.code_book_type”],“type”:“DATASET”,“name”:"__Dedup_38"}]}", shape=Mrecord];
“Dedup_38”->“DataCollect_39”;
“Dedup_38”[label="{Dedup_38|outputVar: [{“colNames”:[“v3.entity_name”,“v4.entity_name”,“v4.code_book_type”],“type”:“DATASET”,“name”:"__Dedup_38"}]|inputVar: __Project_37}", shape=Mrecord];
“Project_37”->“Dedup_38”;
“Project_37”[label="{Project_37|outputVar: [{“colNames”:[“v3.entity_name”,“v4.entity_name”,“v4.code_book_type”],“type”:“DATASET”,“name”:"__Project_37"}]|inputVar: __Filter_36}", shape=Mrecord];
“Filter_36”->“Project_37”;
“Filter_36”[label="{Filter_36|outputVar: [{“colNames”:[“v”,“e”,“v1”,“e1”,“v2”,“e2”,“v3”,“e3”,“v4”,"__COL_0"],“type”:“DATASET”,“name”:"__Filter_36"}]|inputVar: __Project_35}", shape=Mrecord];
“Project_35”->“Filter_36”;
“Project_35”[label="{Project_35|outputVar: [{“colNames”:[“v”,“e”,“v1”,“e1”,“v2”,“e2”,“v3”,“e3”,“v4”,"__COL_0"],“type”:“DATASET”,“name”:"__Project_35"}]|inputVar: __InnerJoin_34}", shape=Mrecord];
“InnerJoin_34”->“Project_35”;
“InnerJoin_34”[label="{InnerJoin_34|outputVar: [{“colNames”:["_path_0","_path_1","_path_2","_path_3","_path_4"],“type”:“DATASET”,“name”:"__InnerJoin_34"}]|inputVar: {“rightVar”:{"__Project_33":“0”},“leftVar”:{"__InnerJoin_28":“0”}}}", shape=Mrecord];
“Project_33”->“InnerJoin_34”;
“Project_33”[label="{Project_33|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_33"}]|inputVar: __Filter_32}", shape=Mrecord];
“Filter_32”->“Project_33”;
“Filter_32”[label="{Filter_32|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_32"}]|inputVar: __GetVertices_31}", shape=Mrecord];
“GetVertices_42”->“Filter_32”;
“GetVertices_42”[label="{GetVertices_42|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetVertices_31"}]|inputVar: __InnerJoin_28}", shape=Mrecord];
“InnerJoin_28”->“GetVertices_42”;
“InnerJoin_28”[label="{InnerJoin_28|outputVar: [{“colNames”:["_path_0","_path_1","_path_2","_path_3"],“type”:“DATASET”,“name”:"__InnerJoin_28"}]|inputVar: {“rightVar”:{"__Filter_27":“0”},“leftVar”:{"__InnerJoin_21":“0”}}}", shape=Mrecord];
“Filter_27”->“InnerJoin_28”;
“Filter_27”[label="{Filter_27|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Filter_27"}]|inputVar: __Project_26}", shape=Mrecord];
“Project_26”->“Filter_27”;
“Project_26”[label="{Project_26|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_26"}]|inputVar: __Filter_25}", shape=Mrecord];
“Filter_25”->“Project_26”;
“Filter_25”[label="{Filter_25|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_25"}]|inputVar: __GetNeighbors_24}", shape=Mrecord];
“GetNeighbors_50”->“Filter_25”;
“GetNeighbors_50”[label="{GetNeighbors_50|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetNeighbors_24"}]|inputVar: __InnerJoin_21}", shape=Mrecord];
“InnerJoin_21”->“GetNeighbors_50”;
“InnerJoin_21”[label="{InnerJoin_21|outputVar: [{“colNames”:["_path_0","_path_1","_path_2"],“type”:“DATASET”,“name”:"__InnerJoin_21"}]|inputVar: {“rightVar”:{"__Filter_20":“0”},“leftVar”:{"__InnerJoin_14":“0”}}}", shape=Mrecord];
“Filter_20”->“InnerJoin_21”;
“Filter_20”[label="{Filter_20|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Filter_20"}]|inputVar: __Project_19}", shape=Mrecord];
“Project_19”->“Filter_20”;
“Project_19”[label="{Project_19|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_19"}]|inputVar: __Filter_18}", shape=Mrecord];
“Filter_18”->“Project_19”;
“Filter_18”[label="{Filter_18|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_18"}]|inputVar: __GetNeighbors_17}", shape=Mrecord];
“GetNeighbors_49”->“Filter_18”;
“GetNeighbors_49”[label="{GetNeighbors_49|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetNeighbors_17"}]|inputVar: __InnerJoin_14}", shape=Mrecord];
“InnerJoin_14”->“GetNeighbors_49”;
“InnerJoin_14”[label="{InnerJoin_14|outputVar: [{“colNames”:["_path_0","_path_1"],“type”:“DATASET”,“name”:"__InnerJoin_14"}]|inputVar: {“rightVar”:{"__Filter_13":“0”},“leftVar”:{"__Filter_7":“0”}}}", shape=Mrecord];
“Filter_13”->“InnerJoin_14”;
“Filter_13”[label="{Filter_13|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Filter_13"}]|inputVar: __Project_12}", shape=Mrecord];
“Project_12”->“Filter_13”;
“Project_12”[label="{Project_12|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_12"}]|inputVar: __Filter_11}", shape=Mrecord];
“Filter_11”->“Project_12”;
“Filter_11”[label="{Filter_11|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_11"}]|inputVar: __GetNeighbors_10}", shape=Mrecord];
“GetNeighbors_48”->“Filter_11”;
“GetNeighbors_48”[label="{GetNeighbors_48|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetNeighbors_10"}]|inputVar: __Filter_7}", shape=Mrecord];
“Filter_7”->“GetNeighbors_48”;
“Filter_7”[label="{Filter_7|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Filter_7"}]|inputVar: __Project_6}", shape=Mrecord];
“Project_6”->“Filter_7”;
“Project_6”[label="{Project_6|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_6"}]|inputVar: __Filter_5}", shape=Mrecord];
“Filter_5”->“Project_6”;
“Filter_5”[label="{Filter_5|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_5"}]|inputVar: __GetNeighbors_4}", shape=Mrecord];
“GetNeighbors_47”->“Filter_5”;
“GetNeighbors_47”[label="{GetNeighbors_47|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetNeighbors_4"}]|inputVar: __IndexScan_1}", shape=Mrecord];
“IndexScan_40”->“GetNeighbors_47”;
“IndexScan_40”[label="{IndexScan_40|outputVar: [{“colNames”:["_vid"],“type”:“DATASET”,“name”:"__IndexScan_1"}]|inputVar: __VAR_0}", shape=Mrecord];
“Start_0”->“IndexScan_40”;
“Start_0”[label="{Start_0|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Start_0"}]|inputVar: }", shape=Mrecord];
}

执行语句:(相比上个,在v4上增加了一个筛选)
profile format=“dot” match (v:tag1)<-[e:edge1]-(v1:tag2)<-[e1:edge2]-(v2:tag2)<-[e2:edge3]-(v3:tag3)-[e3:edge4]->(v4:tag1{code_book_type: “C”}) where v.entity_name == “A” or v.entity_name == “B” return distinct v3.entity_name, v4.entity_name, v4.code_book_type;
执行耗时:11000ms
执行计划:
digraph exec_plan {
rankdir=BT;
“DataCollect_40”[label="{DataCollect_40|outputVar: [{“colNames”:[“v3.entity_name”,“v4.entity_name”,“v4.code_book_type”],“type”:“DATASET”,“name”:"__DataCollect_40"}]|inputVar: [{“colNames”:[“v3.entity_name”,“v4.entity_name”,“v4.code_book_type”],“type”:“DATASET”,“name”:"__Dedup_39"}]}", shape=Mrecord];
“Dedup_39”->“DataCollect_40”;
“Dedup_39”[label="{Dedup_39|outputVar: [{“colNames”:[“v3.entity_name”,“v4.entity_name”,“v4.code_book_type”],“type”:“DATASET”,“name”:"__Dedup_39"}]|inputVar: __Project_38}", shape=Mrecord];
“Project_38”->“Dedup_39”;
“Project_38”[label="{Project_38|outputVar: [{“colNames”:[“v3.entity_name”,“v4.entity_name”,“v4.code_book_type”],“type”:“DATASET”,“name”:"__Project_38"}]|inputVar: __Filter_37}", shape=Mrecord];
“Filter_42”->“Project_38”;
“Filter_42”[label="{Filter_42|outputVar: [{“colNames”:[“v”,“e”,“v1”,“e1”,“v2”,“e2”,“v3”,“e3”,“v4”,"__COL_0"],“type”:“DATASET”,“name”:"__Filter_37"}]|inputVar: __Project_35}", shape=Mrecord];
“Project_35”->“Filter_42”;
“Project_35”[label="{Project_35|outputVar: [{“colNames”:[“v”,“e”,“v1”,“e1”,“v2”,“e2”,“v3”,“e3”,“v4”,"__COL_0"],“type”:“DATASET”,“name”:"__Project_35"}]|inputVar: __InnerJoin_34}", shape=Mrecord];
“InnerJoin_34”->“Project_35”;
“InnerJoin_34”[label="{InnerJoin_34|outputVar: [{“colNames”:["_path_5","_path_8","_path_7","_path_6","_path_9"],“type”:“DATASET”,“name”:"__InnerJoin_34"}]|inputVar: {“rightVar”:{"__Project_33":“0”},“leftVar”:{"__InnerJoin_28":“0”}}}", shape=Mrecord];
“Project_33”->“InnerJoin_34”;
“Project_33”[label="{Project_33|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_33"}]|inputVar: __Filter_32}", shape=Mrecord];
“Filter_32”->“Project_33”;
“Filter_32”[label="{Filter_32|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_32"}]|inputVar: __GetVertices_31}", shape=Mrecord];
“GetVertices_44”->“Filter_32”;
“GetVertices_44”[label="{GetVertices_44|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetVertices_31"}]|inputVar: __InnerJoin_28}", shape=Mrecord];
“InnerJoin_28”->“GetVertices_44”;
“InnerJoin_28”[label="{InnerJoin_28|outputVar: [{“colNames”:["_path_5","_path_8","_path_7","_path_6"],“type”:“DATASET”,“name”:"__InnerJoin_28"}]|inputVar: {“rightVar”:{"__Filter_27":“0”},“leftVar”:{"__InnerJoin_21":“0”}}}", shape=Mrecord];
“Filter_27”->“InnerJoin_28”;
“Filter_27”[label="{Filter_27|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Filter_27"}]|inputVar: __Project_26}", shape=Mrecord];
“Project_26”->“Filter_27”;
“Project_26”[label="{Project_26|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_26"}]|inputVar: __Filter_25}", shape=Mrecord];
“Filter_25”->“Project_26”;
“Filter_25”[label="{Filter_25|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_25"}]|inputVar: __GetNeighbors_24}", shape=Mrecord];
“GetNeighbors_52”->“Filter_25”;
“GetNeighbors_52”[label="{GetNeighbors_52|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetNeighbors_24"}]|inputVar: __InnerJoin_21}", shape=Mrecord];
“InnerJoin_21”->“GetNeighbors_52”;
“InnerJoin_21”[label="{InnerJoin_21|outputVar: [{“colNames”:["_path_5","_path_8","_path_7"],“type”:“DATASET”,“name”:"__InnerJoin_21"}]|inputVar: {“rightVar”:{"__Filter_20":“0”},“leftVar”:{"__InnerJoin_14":“0”}}}", shape=Mrecord];
“Filter_20”->“InnerJoin_21”;
“Filter_20”[label="{Filter_20|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Filter_20"}]|inputVar: __Project_19}", shape=Mrecord];
“Project_19”->“Filter_20”;
“Project_19”[label="{Project_19|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_19"}]|inputVar: __Filter_18}", shape=Mrecord];
“Filter_18”->“Project_19”;
“Filter_18”[label="{Filter_18|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_18"}]|inputVar: __GetNeighbors_17}", shape=Mrecord];
“GetNeighbors_51”->“Filter_18”;
“GetNeighbors_51”[label="{GetNeighbors_51|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetNeighbors_17"}]|inputVar: __InnerJoin_14}", shape=Mrecord];
“InnerJoin_14”->“GetNeighbors_51”;
“InnerJoin_14”[label="{InnerJoin_14|outputVar: [{“colNames”:["_path_5","_path_8"],“type”:“DATASET”,“name”:"__InnerJoin_14"}]|inputVar: {“rightVar”:{"__Filter_13":“0”},“leftVar”:{"__Filter_7":“0”}}}", shape=Mrecord];
“Filter_13”->“InnerJoin_14”;
“Filter_13”[label="{Filter_13|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Filter_13"}]|inputVar: __Project_12}", shape=Mrecord];
“Project_12”->“Filter_13”;
“Project_12”[label="{Project_12|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_12"}]|inputVar: __Filter_11}", shape=Mrecord];
“Filter_11”->“Project_12”;
“Filter_11”[label="{Filter_11|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_11"}]|inputVar: __GetNeighbors_10}", shape=Mrecord];
“GetNeighbors_50”->“Filter_11”;
“GetNeighbors_50”[label="{GetNeighbors_50|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetNeighbors_10"}]|inputVar: __Filter_7}", shape=Mrecord];
“Filter_7”->“GetNeighbors_50”;
“Filter_7”[label="{Filter_7|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Filter_7"}]|inputVar: __Project_6}", shape=Mrecord];
“Project_6”->“Filter_7”;
“Project_6”[label="{Project_6|outputVar: [{“colNames”:["_path"],“type”:“DATASET”,“name”:"__Project_6"}]|inputVar: __Filter_5}", shape=Mrecord];
“Filter_5”->“Project_6”;
“Filter_5”[label="{Filter_5|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Filter_5"}]|inputVar: __GetNeighbors_4}", shape=Mrecord];
“GetNeighbors_49”->“Filter_5”;
“GetNeighbors_49”[label="{GetNeighbors_49|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__GetNeighbors_4"}]|inputVar: __IndexScan_1}", shape=Mrecord];
“IndexScan_41”->“GetNeighbors_49”;
“IndexScan_41”[label="{IndexScan_41|outputVar: [{“colNames”:["_vid"],“type”:“DATASET”,“name”:"__IndexScan_1"}]|inputVar: __VAR_0}", shape=Mrecord];
“Start_0”->“IndexScan_41”;
“Start_0”[label="{Start_0|outputVar: [{“colNames”:[],“type”:“DATASET”,“name”:"__Start_0"}]|inputVar: }", shape=Mrecord];
}

first.txt (21.1 KB)
second.txt (21.1 KB)
分别是第一次和第二次的plan
似乎下载的图片太大,无法上传。
格式好像是被这个编辑器给格式化掉了。

profile format=“dot” 根本看不出索引相关的信息

1 个赞

你就 explain 一下,然后看 IndexScan 的 IndexCtx,把这个和你的语句贴出来就行

1 个赞

抱歉,我误导你了, dot format 没有需要的信息,可以按照 @kyle 直接在 console 打印 explain 的结果贴一下么?

explain match (v:icd_code_book_tag)<-[e:classification_code_medical_service_item_edge]-(v1:procedure_tag)<-[e1:subclass_of_edge]-(v2:procedure_tag)<-[e2:medicare_treatment_plan_edge]-(v3:disorder_tag)-[e3:icd_10_edge]->(v4:icd_code_book_tag) where v.entity_name == “002401000040400” or v.entity_name == “003310050160200” return distinct v3.entity_name, v4.entity_name, v4.code_book_type;

explain match (v:icd_code_book_tag)<-[e:classification_code_medical_service_item_edge]-(v1:procedure_tag)<-[e1:subclass_of_edge]-(v2:procedure_tag)<-[e2:medicare_treatment_plan_edge]-(v3:disorder_tag)-[e3:icd_10_edge]->(v4:icd_code_book_tag{code_book_type: “疾病分类与代码国家临床版2.0”}) where v.entity_name == “002401000040400” or v.entity_name == “003310050160200” return distinct v3.entity_name, v4.entity_name, v4.code_book_type;

1 个赞

@kyle

看起来先搜索右边作为起点扫单 code_book_type 条件的 cost 大很多,这个 rule 在 RBO 角度是合理的么,还是这里缺失了优先匹配多条件(符合索引)作为起点的 rule?多条件的时候增加优先匹配复合索引(至少都是 ==的情况下)的 rule 是不是合理的?

还是说在 Cost Based 引入之前这个情况不好过度通过 Rule 优化?

这个不是复合索引,只是多趟 or。
RBO 中以上索引选取是符合预期的,性能差的原因确实是 code_book_type 的选择度比较低,只能通过 CBO 或者 index hint 去做,但现在 nebula 都还不支持。

可以尝试以下语句写法:

MATCH (v:icd_code_book_tag)<-[e:classification_code_medical_service_item_edge]-(v1:procedure_tag)<-[e1:subclass_of_edge]-(v2:procedure_tag)<-[e2:medicare_treatment_plan_edge]-(v3:disorder_tag)-[e3:icd_10_edge]->(v4:icd_code_book_tag) 
  WHERE v.entity_name == “002401000040400” or v.entity_name == “003310050160200” 
WITH v3.entity_name AS v3_ename, v4.entity_name AS v4_ename, v4.code_book_type AS v4_cbt
  WHERE v4_cbt == “疾病分类与代码国家临床版2.0”
RETURN v3_name, v4_ename, v4_cbt
1 个赞

@kyle @wey 有关于执行计划方面的说明文档吗,我看文档里面只有这个https://docs.nebula-graph.com.cn/2.6.1/3.ngql-guide/17.query-tuning-statements/1.explain-and-profile/,但是这个似乎说的很简单,看了这个也不知道怎么看执行计划,完全没法做执行计划方面的优化。

好问题,这方面我们确实可以准备一些文章、文档。希望可以尽快分享出来。
在那之前,你可以通过源码解读文章有一些概念上的认识(可能您已经扫过这些文章了)

https://nebula-graph.com.cn/tags/源码解读/

好的,希望你们有空还是完善一下文档方面的内容,我们最近读下来发现很多地方不是很好理解,希望未来可以更好。

3 个赞

此话题已在最后回复的 7 天后被自动关闭。不再允许新回复。