LIMIT is not pushed down; how can this be optimized?

Question template:

  • NebulaGraph version: 3.4.1
  • Deployment: distributed cluster
  • Installation method: RPM
  • In production: Y
  • Hardware info
    • Disk (SSD recommended)
    • CPU and memory
  • Problem description
    The original query is as follows; the logic is simply to fetch everything within one hop of this vertex, but the amount of data is large:
GO FROM 815677140545765099 OVER Relation BIDIRECT WHERE Relation.name=="竞争" YIELD DISTINCT "候选人", $$.Company.keyno as keyno, $$.Company.name as name, Relation.data as tender, Relation.publish_time as ts | ORDER BY $-.ts desc

After the query we GROUP BY keyno in memory, but there is too much data, so we want to add a LIMIT. The statement is as follows:

GO FROM -4010402952414171612 OVER Relation BIDIRECT WHERE Relation.name=="竞争" YIELD DISTINCT "候选人", $$.Company.keyno as keyno, $$.Company.name as name |
YIELD $-.keyno, $-.name, count(*) as cnt | ORDER BY $-.cnt desc | limit 0, 1

After adding the LIMIT, it looks like the LIMIT is not pushed down. The PROFILE output is below. Is there any way to optimize this?

id	name	dependencies	profiling data	operator info
12	TopN	9	ver: 0, rows: 1, execTime: 9204us, totalTime: 9207us	outputVar: {
  "colNames": [
    "$-.keyno",
    "$-.name",
    "cnt"
  ],
  "type": "DATASET",
  "name": "__Limit_11"
}
inputVar: __Aggregate_9
factors: [
  {
    "cnt": "DESCEND"
  }
]
offset: 0
count: 1
9	Aggregate	8	ver: 0, rows: 67667, execTime: 269129us, totalTime: 269133us	outputVar: {
  "colNames": [
    "$-.keyno",
    "$-.name",
    "cnt"
  ],
  "type": "DATASET",
  "name": "__Aggregate_9"
}
inputVar: __Dedup_8
groupKeys: [
  "$-.keyno",
  "$-.name"
]
groupItems: [
  {
    "expr": "$-.keyno"
  },
  {
    "expr": "$-.name"
  },
  {
    "expr": "count(*)"
  }
]
8	Dedup	7	ver: 0, rows: 67667, execTime: 2093495us, totalTime: 2093502us	outputVar: {
  "colNames": [
    "\"中标候选人\"",
    "keyno",
    "name"
  ],
  "type": "DATASET",
  "name": "__Dedup_8"
}
inputVar: __Project_7
7	Project	14	ver: 0, rows: 4289604, execTime: 2174478us, totalTime: 2174487us	outputVar: {
  "colNames": [
    "\"中标候选人\"",
    "keyno",
    "name"
  ],
  "type": "DATASET",
  "name": "__Project_7"
}
inputVar: __Filter_6
columns: [
  "\"中标候选人\"",
  "$__COL_1 AS keyno",
  "$__COL_2 AS name"
]
14	LeftJoin	13	ver: 0, rows: 4289604, execTime: 2484443us, totalTime: 2484448us	outputVar: {
  "colNames": [
    "__COL_0",
    "JOIN_DST_VID",
    "__COL_1",
    "__COL_2",
    "DST_VID"
  ],
  "type": "DATASET",
  "name": "__Filter_6"
}
inputVar: {
  "rightVar": {
    "__Project_4": 0
  },
  "leftVar": {
    "__Filter_13": 0
  }
}
hashKeys: [
  "COLUMN[-1]"
]
probeKeys: [
  "COLUMN[-1]"
]
kind: LeftJoin
13	Filter	4	ver: 0, rows: 4289604, execTime: 209276us, totalTime: 209284us	outputVar: {
  "colNames": [
    "__COL_0",
    "JOIN_DST_VID"
  ],
  "type": "DATASET",
  "name": "__Filter_13"
}
inputVar: __Project_2
condition: ($__Project_2.__COL_0=="招投标竞争")
isStable: false
4	Project	3	ver: 0, rows: 102594, execTime: 45481us, totalTime: 45490us	outputVar: {
  "colNames": [
    "__COL_1",
    "__COL_2",
    "DST_VID"
  ],
  "type": "DATASET",
  "name": "__Project_4"
}
inputVar: __GetVertices_3
columns: [
  "$$.Company.keyno AS __COL_1",
  "$$.Company.name AS __COL_2",
  "COLUMN[0] AS DST_VID"
]
3	GetVertices	2	{
ver: 0, rows: 102594, execTime: 459035us, totalTime: 492206us
resp[4]: {
  "exec": "18838(us)",
  "host": "10.0.20.168:9779",
  "total": "42262(us)"
}
total_rpc: 145581(us)
resp[2]: {
  "exec": "21372(us)",
  "host": "10.0.20.166:9779",
  "total": "43438(us)"
}
resp[1]: {
  "exec": "18496(us)",
  "host": "10.0.20.170:9779",
  "total": "38311(us)"
}
resp[0]: {
  "exec": "19398(us)",
  "host": "10.0.20.169:9779",
  "total": "42008(us)"
}
resp[3]: {
  "exec": "19931(us)",
  "host": "10.0.20.167:9779",
  "total": "40349(us)"
}
resp[5]: {
  "exec": "20970(us)",
  "host": "10.0.20.171:9779",
  "total": "40131(us)"
}
}	outputVar: {
  "colNames": [],
  "type": "DATASET",
  "name": "__GetVertices_3"
}
inputVar: __Project_2
space: 16
dedup: true
limit: 9223372036854775807
filter: 
orderBy: []
src: COLUMN[-1]
props: [
  {
    "props": [
      "keyno",
      "name"
    ],
    "tagId": 17
  }
]
exprs: 
2	Project	1	ver: 0, rows: 4372373, execTime: 1564262us, totalTime: 1564272us	outputVar: {
  "colNames": [
    "__COL_0",
    "JOIN_DST_VID"
  ],
  "type": "DATASET",
  "name": "__Project_2"
}
inputVar: __GetNeighbors_1
columns: [
  "Relation.name AS __COL_0",
  "*._dst AS JOIN_DST_VID"
]
1	GetNeighbors	0	{
ver: 0, rows: 4372373, execTime: 46us, totalTime: 3167862us
resp[0]: {
  "exec": "1814403(us)",
  "host": "10.0.20.168:9779",
  "storage_detail": {
    "GetNeighborsNode": "1810652(us)",
    "HashJoinNode": "90(us)",
    "RelNode": "1810652(us)",
    "SingleEdgeNode": "88(us)"
  },
  "total": "3167654(us)",
  "vertices": 1
}
total_rpc_time: 3167798(us)
}	outputVar: {
  "colNames": [],
  "type": "DATASET",
  "name": "__GetNeighbors_1"
}
inputVar: __VAR_0
space: 16
dedup: false
limit: -1
filter: 
orderBy: []
src: COLUMN[0]
edgeTypes: []
edgeDirection: OUT_EDGE
vertexProps: 
edgeProps: [
  {
    "props": [
      "_dst",
      "name"
    ],
    "type": -19
  },
  {
    "props": [
      "_dst",
      "name"
    ],
    "type": 19
  }
]
statProps: 
exprs: 
random: false
0	Start		ver: 0, rows: 0, execTime: 0us, totalTime: 26us	outputVar: {
  "colNames": [],
  "type": "DATASET",
  "name": "__Start_0"
}

The query contains an Aggregate, so the LIMIT cannot be pushed down past it: if it were, the aggregate would only see part of the rows and the counts would come out wrong.

Understood. Is there any other way to optimize it?

Looking at the plan, property pruning is basically already in place. You could try breaking the query up: replace the bidirectional traversal with two single-direction traversals and UNION the results, and see whether that meets the business requirement.

Thanks! I tried the UNION of the two directions:

  1. With UNION it seems to take roughly as long as not using LIMIT at all and just fetching the full result; I have not tested this systematically, only spot-checked a few vertices.
  2. How can UNION guarantee that the overall result is ordered by cnt? The statement below cannot guarantee that:
GO FROM -4010402952414171612 OVER Relation BIDIRECT WHERE Relation.name=="竞争" YIELD DISTINCT "候选人", $$.Company.keyno as keyno, $$.Company.name as name |
YIELD $-.keyno, $-.name, count(*) as cnt | ORDER BY $-.cnt desc | limit 0, 5 UNION
GO FROM -4010402952414171612 OVER Relation BIDIRECT WHERE Relation.name=="竞争" YIELD DISTINCT "候选人", $$.Company.keyno as keyno, $$.Company.name as name |
YIELD $-.keyno, $-.name, count(*) as cnt | ORDER BY $-.cnt desc | limit 0, 5 | ORDER BY $-.cnt desc

What are step 7 (Project) and step 14 (LeftJoin) for? They seem to take quite a long time.

You can try doing the sorting at the end, something like this:

$var = GO FROM -4010402952414171612 OVER Relation WHERE Relation.name=="竞争" YIELD DISTINCT "候选人", $$.Company.keyno as keyno, $$.Company.name as name |
YIELD $-.keyno as keyno, $-.name as name, count(*) as cnt | limit 0, 5
UNION
GO FROM -4010402952414171612 OVER Relation REVERSELY WHERE Relation.name=="竞争" YIELD DISTINCT "候选人", $$.Company.keyno as keyno, $$.Company.name as name |
YIELD $-.keyno as keyno, $-.name as name, count(*) as cnt | limit 0, 5;

YIELD $var.keyno as keyno, $var.name as name, $var.cnt as cnt | order by $-.cnt desc

Also, the LeftJoin in the plan above is mainly there to join the edge rows with the destination-vertex data. Nebula stores the source vertex's and the edge's properties together, so when a query uses destination-vertex properties, such as $$.Company.keyno in your query, they have to be fetched with a separate request (the GetVertices step) and joined back in.
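If the business logic allows, one way to keep that join small is to aggregate on the neighbor vertex ID first and fetch the Company properties only for the top IDs afterwards. A rough, untested sketch (it assumes keyno and name are determined by the vertex ID, so grouping by the ID is equivalent; it also drops the DISTINCT, so cnt counts edges per neighbor; and <ids-from-step-1> is a placeholder to fill in):

// Step 1: no $$.* properties are referenced, so the plan should not need GetVertices + LeftJoin.
// Assumption: counting edges per neighbor (no DISTINCT) matches the intended semantics.
GO FROM -4010402952414171612 OVER Relation BIDIRECT WHERE Relation.name=="竞争" YIELD id($$) AS dst |
YIELD $-.dst AS dst, count(*) AS cnt | ORDER BY $-.cnt DESC | LIMIT 0, 10

// Step 2: fetch display properties only for the few IDs returned by step 1 (placeholder below).
FETCH PROP ON Company <ids-from-step-1> YIELD Company.keyno AS keyno, Company.name AS name

Whether this matches the original semantics needs to be checked against the business requirement.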

Isn't the logic of this a bit off? The original requirement is to find the top-10 results by cnt after the GROUP BY. Each branch's LIMIT 5 keeps a more or less arbitrary 5 rows, so the final ORDER BY cannot guarantee that the returned rows are the largest over the whole result, can it?
ps:
The statement above also reports an error, saying $var does not exist:
-1009:SemanticError: $var.keyno', not exist variable var'

Then the LIMITs before and after the UNION can also be moved to the end. As for the syntax error, check whether you ran the $var assignment and the later YIELD as two separate statements; they need to be executed within a single query, something like the following:
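A rough, untested sketch of that combined single-statement form (the trailing sum() step assumes counts for the same company coming from the two directions should be merged; adjust or drop it according to the actual business semantics):

$var = GO FROM -4010402952414171612 OVER Relation WHERE Relation.name=="竞争" YIELD DISTINCT "候选人", $$.Company.keyno as keyno, $$.Company.name as name |
YIELD $-.keyno as keyno, $-.name as name, count(*) as cnt
UNION
GO FROM -4010402952414171612 OVER Relation REVERSELY WHERE Relation.name=="竞争" YIELD DISTINCT "候选人", $$.Company.keyno as keyno, $$.Company.name as name |
YIELD $-.keyno as keyno, $-.name as name, count(*) as cnt;
// sum() merges the two per-direction counts for the same company (assumption about the intended semantics)
YIELD $var.keyno as keyno, $var.name as name, $var.cnt as cnt |
YIELD $-.keyno as keyno, $-.name as name, sum($-.cnt) as cnt |
ORDER BY $-.cnt desc | LIMIT 0, 10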
