ngql中的count使用表达式，统计数量不对

Anyzm · 2021 年8 月 17 日 07:43

nebula版本：2.0.1

看使用手册，count函数中应该是可以使用表达式过滤的，我在ngql中的count使用表达式，统计数量不对，具体查询代码和数据如下，isrh字段是string类型，我用in和==匹配，统计到的数据都不对。
请问是我的使用方式不对吗？应该如何正确在count中使用条件过滤并统计数据。

另外如果可以的话，建议在sum、avg等函数中也同样可以传入表达式与计算字段两个参数，因为有时候TAG的属性比较多，需要计算的属性比较多，如果不支持条件过滤，需要提前过滤的话，一次查询可能要拆分成很多条ngql（因为每个属性计算的过滤条件不一致）

(root@nebula) [social_book]> go from   "UR4498673263107723264"  over e_call BIDIRECT yield distinct $$.user.user_no as userNo,$$.user.birth_date as birthDate,$$.user.date_appl_submit as dateApplSubmit,$$.user.isrh as isrh | yield count($-.isrh in ["0","1"]) as isrhCnt
+---------+
| isrhCnt |
+---------+
| 5       |
+---------+
Got 1 rows (time spent 2146/53245 us)

Tue, 17 Aug 2021 14:33:58 CST
(root@nebula) [social_book]> go from   "UR4498673263107723264"  over e_call BIDIRECT yield distinct $$.user.user_no as userNo,$$.user.birth_date as birthDate,$$.user.date_appl_submit as dateApplSubmit,$$.user.isrh as isrh

+-------------------------+--------------+----------------+------+
| userNo                  | birthDate    | dateApplSubmit | isrh |
+-------------------------+--------------+----------------+------+
| "UR4497938639351881728" | "1997-05-10" | 1551020683     | "1"  |
+-------------------------+--------------+----------------+------+
| "UR4498067645875040256" | "1994-12-17" | 1557064981     | "4"  |
+-------------------------+--------------+----------------+------+
| "UR4498304571505025024" | "1974-12-05" | 1536969424     | "1"  |
+-------------------------+--------------+----------------+------+
| "UR4498353365152079872" | "1978-07-04" | 1606261094     | "4"  |
+-------------------------+--------------+----------------+------+
| "UR4497872659661627392" | "1993-09-21" | 1605229438     | "1"  |
+-------------------------+--------------+----------------+------+
Got 5 rows (time spent 2217/53483 us)

kyle · 2021 年8 月 18 日 03:51

结果是对的啊，不就五行吗

Anyzm · 2021 年8 月 18 日 03:53

Anyzm:

+-------------------------+--------------+----------------+------+
| "UR4498067645875040256" | "1994-12-17" | 1557064981     | "4"  |
+-------------------------+--------------+----------------+------+
| "UR4498304571505025024" | "1974-12-05" | 1536969424     | "1"  |
+-------------------------+--------------+----------------+------+

我的count里面加了过滤条件啊，满足条件的只有三条！也就是说count()里面其实不能接收条件过滤的表达式，接收了语法不会报错，但是过率条件不会生效

kyle · 2021 年8 月 18 日 03:53

你如果想过滤掉除去“4”这条数据，应该把他当作谓词。

go from   "UR4498673263107723264"  over e_call BIDIRECT 
where $$.user.isrh in ["0","1"] 
yield distinct $$.user.user_no as userNo,$$.user.birth_date as birthDate,$$.user.date_appl_submit as dateApplSubmit,$$.user.isrh as isrh 
| yield count(*) as isrhCnt

kyle · 2021 年8 月 18 日 03:55

你 count 里边加了过滤条件也是五行结果啊。count 内部表达式执行结果依次是 true、false、true、false、true。

Anyzm · 2021 年8 月 18 日 03:56

嗯，昨天讨论了这个，如果只是查询统计统一过滤条件的时候，这样确实可以，但是我们有几十个统计指标，而且统计过滤条件不一，如果是按照这种方法就要写五六十行ngql了，性能没有保障，传输ngql也比较大，编写也比较麻烦。
所以希望官方能够提供count、avg、sum等接收过滤条件参数的函数

kyle · 2021 年8 月 18 日 03:56

Aggregate 函数也是函数是允许嵌套的，所以不会报错。in 表达式生效了，但它没有作为过滤条件。

kyle · 2021 年8 月 18 日 03:57

你是说类似 sql 里边的 having？

Anyzm · 2021 年8 月 18 日 04:00

类似吧，但还是不一样。因为图数据库很多是用来做图计算，图计算的过滤指标很多不一致，所以会有这样的需求。
比如 go from “id” over e_edge where xxx yield xxx,xxxx,xxxxx,xxxxx | yield count(满足条件过滤1的数量), avg(满足过滤条件2的，x的平均值)，sum（满足过滤条件3，y的总和）……

min.wu · 2021 年8 月 18 日 05:22

有gremlin或者cypher的例子嘛？

kyle · 2021 年8 月 18 日 06:05

如果你希望对聚合函数结果做过滤的话，go 语句目前是不支持的。
cypher 的写法类似:

MATCH (v)-[e_call]-(n) 
WHERE id(v)=="UR4498673263107723264" 
WITH count(n.isrh) AS isrhCnt WHERE n.isrh in ["0", "1"]
RETURN isrhCnt

kyle · 2021 年8 月 18 日 06:09

你这种语句写法跟你的描述完全不一致，这种写法的语句有它自己的语义。

Anyzm · 2021 年8 月 18 日 12:05

你觉得它的语义是哪样的呢？我觉得它表达的语义就应该是我所描述的那样的，不然在count中传递一个表达式就毫无意义了

kyle · 2021 年8 月 19 日 02:04

建议先熟悉一下基本 sql

mysql 测试:

mysql> select * from t1;
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.01 sec)

mysql> select count(a > 2) from t1;
+--------------+
| count(a > 2) |
+--------------+
|            3 |
+--------------+
1 row in set (0.00 sec)

cypher 测试:

neo4j@neo4j> match (v:Person) where v.born in [1966,1967,1968] return count(v.born);
+---------------+
| count(v.born) |
+---------------+
| 16            |
+---------------+


neo4j@neo4j> match (v:Person) return count(v.born in [1966,1967,1968]);
+------------------------------------+
| count( v.born in [1966,1967,1968]) |
+------------------------------------+
| 126                                |
+------------------------------------+

blacklovebear · 2021 年8 月 19 日 02:21

msyql count 里面支持 case when 这种表达式，可以满足要求。ng 如果要实现类似的功能，怎么去实现呢？

select count(*),
count(city_name),
count(distinct city_name),
count(case
when xs_code like ‘23%’ or xs_code like ‘24%’ then
city_name
end),
count(distinct case
when xs_code like ‘23%’ or xs_code like ‘24%’ then
city_name
end)
from tb_county

kyle · 2021 年8 月 19 日 02:25

nebula 是支持的，你可以测试一下

kyle · 2021 年8 月 19 日 02:27

(czp@nebula) [nba]> match (v:player) where v.age in [41,42,43] return count(v.age)
+--------------+
| count(v.age) |
+--------------+
| 4            |
+--------------+
Got 1 rows (time spent 16552/16874 us)

(czp@nebula) [nba]> match (v:player) return count(case when v.age in [41,42,43] then true end)
+------------------------------------------------------+
| count(CASE WHEN (v.age IN [41,42,43]) THEN true END) |
+------------------------------------------------------+
| 4                                                    |
+------------------------------------------------------+
Got 1 rows (time spent 15062/15367 us)

blacklovebear · 2021 年8 月 19 日 02:33

果然能解决问题，感谢

Anyzm · 2021 年8 月 19 日 02:35

这就是我想要的效果，nice，多谢

Chen-Qiong · 2021 年8 月 23 日 08:28

谢谢，确实可以