ngql中的count使用表达式,统计数量不对

nebula版本:2.0.1

看使用手册,count函数中应该是可以使用表达式过滤的,我在ngql中的count使用表达式,统计数量不对,具体查询代码和数据如下,isrh字段是string类型,我用in和==匹配,统计到的数据都不对。
请问是我的使用方式不对吗?应该如何正确在count中使用条件过滤并统计数据。

另外如果可以的话,建议在sum、avg等函数中也同样可以传入表达式与计算字段两个参数,因为有时候TAG的属性比较多,需要计算的属性比较多,如果不支持条件过滤,需要提前过滤的话,一次查询可能要拆分成很多条ngql(因为每个属性计算的过滤条件不一致)

(root@nebula) [social_book]> go from   "UR4498673263107723264"  over e_call BIDIRECT yield distinct $$.user.user_no as userNo,$$.user.birth_date as birthDate,$$.user.date_appl_submit as dateApplSubmit,$$.user.isrh as isrh | yield count($-.isrh in ["0","1"]) as isrhCnt
+---------+
| isrhCnt |
+---------+
| 5       |
+---------+
Got 1 rows (time spent 2146/53245 us)

Tue, 17 Aug 2021 14:33:58 CST
(root@nebula) [social_book]> go from   "UR4498673263107723264"  over e_call BIDIRECT yield distinct $$.user.user_no as userNo,$$.user.birth_date as birthDate,$$.user.date_appl_submit as dateApplSubmit,$$.user.isrh as isrh

+-------------------------+--------------+----------------+------+
| userNo                  | birthDate    | dateApplSubmit | isrh |
+-------------------------+--------------+----------------+------+
| "UR4497938639351881728" | "1997-05-10" | 1551020683     | "1"  |
+-------------------------+--------------+----------------+------+
| "UR4498067645875040256" | "1994-12-17" | 1557064981     | "4"  |
+-------------------------+--------------+----------------+------+
| "UR4498304571505025024" | "1974-12-05" | 1536969424     | "1"  |
+-------------------------+--------------+----------------+------+
| "UR4498353365152079872" | "1978-07-04" | 1606261094     | "4"  |
+-------------------------+--------------+----------------+------+
| "UR4497872659661627392" | "1993-09-21" | 1605229438     | "1"  |
+-------------------------+--------------+----------------+------+
Got 5 rows (time spent 2217/53483 us)

结果是对的啊,不就五行吗

我的count里面加了过滤条件啊,满足条件的只有三条!也就是说count()里面其实不能接收条件过滤的表达式,接收了语法不会报错,但是过率条件不会生效

你如果想过滤掉除去“4”这条数据,应该把他当作谓词。

go from   "UR4498673263107723264"  over e_call BIDIRECT 
where $$.user.isrh in ["0","1"] 
yield distinct $$.user.user_no as userNo,$$.user.birth_date as birthDate,$$.user.date_appl_submit as dateApplSubmit,$$.user.isrh as isrh 
| yield count(*) as isrhCnt

你 count 里边加了过滤条件也是五行结果啊。count 内部表达式执行结果依次是 true、false、true、false、true。

嗯,昨天讨论了这个,如果只是查询统计统一过滤条件的时候,这样确实可以,但是我们有几十个统计指标,而且统计过滤条件不一,如果是按照这种方法就要写五六十行ngql了,性能没有保障,传输ngql也比较大,编写也比较麻烦。
所以希望官方能够提供count、avg、sum等接收过滤条件参数的函数

Aggregate 函数也是函数是允许嵌套的,所以不会报错。in 表达式生效了,但它没有作为过滤条件。

你是说类似 sql 里边的 having?

类似吧,但还是不一样。因为图数据库很多是用来做图计算,图计算的过滤指标很多不一致,所以会有这样的需求。
比如 go from “id” over e_edge where xxx yield xxx,xxxx,xxxxx,xxxxx | yield count(满足条件过滤1的数量), avg(满足过滤条件2的,x的平均值),sum(满足过滤条件3,y的总和)……

有gremlin或者cypher的例子嘛?

如果你希望对聚合函数结果做过滤的话,go 语句目前是不支持的。
cypher 的写法类似:

MATCH (v)-[e_call]-(n) 
WHERE id(v)=="UR4498673263107723264" 
WITH count(n.isrh) AS isrhCnt WHERE n.isrh in ["0", "1"]
RETURN isrhCnt
1 个赞

你这种语句写法跟你的描述完全不一致,这种写法的语句有它自己的语义。

你觉得它的语义是哪样的呢?我觉得它表达的语义就应该是我所描述的那样的,不然在count中传递一个表达式就毫无意义了

建议先熟悉一下基本 sql

mysql 测试:

mysql> select * from t1;
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.01 sec)

mysql> select count(a > 2) from t1;
+--------------+
| count(a > 2) |
+--------------+
|            3 |
+--------------+
1 row in set (0.00 sec)

cypher 测试:

neo4j@neo4j> match (v:Person) where v.born in [1966,1967,1968] return count(v.born);
+---------------+
| count(v.born) |
+---------------+
| 16            |
+---------------+


neo4j@neo4j> match (v:Person) return count(v.born in [1966,1967,1968]);
+------------------------------------+
| count( v.born in [1966,1967,1968]) |
+------------------------------------+
| 126                                |
+------------------------------------+
1 个赞

msyql count 里面支持 case when 这种表达式,可以满足要求。ng 如果要实现类似的功能,怎么去实现呢?

select count(*),
count(city_name),
count(distinct city_name),
count(case
when xs_code like ‘23%’ or xs_code like ‘24%’ then
city_name
end),
count(distinct case
when xs_code like ‘23%’ or xs_code like ‘24%’ then
city_name
end)
from tb_county

nebula 是支持的,你可以测试一下

(czp@nebula) [nba]> match (v:player) where v.age in [41,42,43] return count(v.age)
+--------------+
| count(v.age) |
+--------------+
| 4            |
+--------------+
Got 1 rows (time spent 16552/16874 us)

(czp@nebula) [nba]> match (v:player) return count(case when v.age in [41,42,43] then true end)
+------------------------------------------------------+
| count(CASE WHEN (v.age IN [41,42,43]) THEN true END) |
+------------------------------------------------------+
| 4                                                    |
+------------------------------------------------------+
Got 1 rows (time spent 15062/15367 us)
7 个赞

果然能解决问题,感谢 :laughing:

这就是我想要的效果,nice,多谢

谢谢,确实可以