match多语句查询

贴一下我本地的执行结果,不过版本是最新的 master:

(user@nebula) [nba]> match (v:player) where v.name CONTAINS 'Tim' RETURN v.age as age, id(v) as vid | GO FROM $-.vid OVER like REVERSELY YIELD $-.age as age, ("Tony Parker" == like._dst) as liked, $-.vid as vid, like._dst as dst
+-----+-------+--------------+---------------------+
| age | liked | vid          | dst                 |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "Aron Baynes"       |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "Boris Diaw"        |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "Danny Green"       |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "Dejounte Murray"   |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "LaMarcus Aldridge" |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "Manu Ginobili"     |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "Marco Belinelli"   |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "Shaquile O'Neal"   |
+-----+-------+--------------+---------------------+
| 42  | false | "Tim Duncan" | "Tiago Splitter"    |
+-----+-------+--------------+---------------------+
| 42  | true  | "Tim Duncan" | "Tony Parker"       |
+-----+-------+--------------+---------------------+
Got 10 rows (time spent 4982/5480 us)

Sat, 08 May 2021 17:58:27 CST

@yee 你这也一样,查出来都是重复的数据。

根据你上面语句,我们目标是查询名字包含 “Tim” 的用户,你上面实际查出来只有一个 “Tim Duncan”。
所以实际上要查到的数据应该只有 “Tim Duncan” 一条数据而已的。

我们要的是根据关键词搜索出来的用户,而不是搜索出来的用户的like用户,本末倒置了。查询like只是为了判断而已的,不是输出。

语义上讲理想的结果应该类似于这样:

+-----+-------+--------------+---------------------+
|      vid      | age |         liked          
+-----+-------+--------------+---------------------+
| "Tim Duncan"  | 42  | "Tony Parker" in [likes] ? true : false  
+-----+-------+--------------+---------------------+

可以利用group bytoString() 做进一步hack,如果不满意result里是套在一起,可以进一步处理的。

match (v:player) where v.name CONTAINS 'a' RETURN v.age as age, id(v) as vid | \
  GO FROM $-.vid OVER follow REVERSELY YIELD $-.age as age, (toString("player148" == follow._dst) + "," + toString($-.vid)) as vid, follow._dst as dst | \
  GROUP BY $-.vid YIELD $-.vid as result, count(*) as count

+-------------------+-------+
| result            | count |
+-------------------+-------+
| "false,player114" | 3     |
+-------------------+-------+
| "false,player125" | 4     |
+-------------------+-------+
| "false,player120" | 3     |
+-------------------+-------+
| "false,player119" | 1     |
+-------------------+-------+
| "false,player145" | 1     |
+-------------------+-------+
| "false,player143" | 1     |
+-------------------+-------+
| "false,player141" | 2     |
+-------------------+-------+
| "false,player144" | 1     |
+-------------------+-------+
| "false,player139" | 1     |
+-------------------+-------+
| "false,player138" | 1     |
+-------------------+-------+
| "false,player129" | 3     |
+-------------------+-------+
| "false,player101" | 5     |
+-------------------+-------+
| "false,player100" | 10    |
+-------------------+-------+
| "false,player150" | 1     |
+-------------------+-------+
| "false,player104" | 2     |
+-------------------+-------+
| "false,player136" | 2     |
+-------------------+-------+
| "false,player116" | 6     |
+-------------------+-------+
| "false,player105" | 2     |
+-------------------+-------+
| "false,player131" | 1     |
+-------------------+-------+
| "false,player103" | 1     |
+-------------------+-------+
| "false,player121" | 4     |
+-------------------+-------+
| "false,player115" | 2     |
+-------------------+-------+
| "false,player128" | 2     |
+-------------------+-------+
| "true,player136"  | 1     |
+-------------------+-------+
| "true,player127"  | 1     |
+-------------------+-------+
| "false,player148" | 3     |
+-------------------+-------+
| "false,player124" | 1     |
+-------------------+-------+
| "false,player147" | 1     |
+-------------------+-------+
| "false,player140" | 1     |
+-------------------+-------+
| "false,player102" | 3     |
+-------------------+-------+

这里给出分割开来的例子, c那一列的count数字是没有用处的。

match (v:player) where v.name CONTAINS 'a' RETURN id(v) as vid | \
  GO FROM $-.vid OVER follow REVERSELY YIELD (toString("player148" == follow._dst) + "," + toString($-.vid)) as vid, follow._dst as dst | \
  GROUP BY $-.vid YIELD split($-.vid, ',')[0] as is_liked, split($-.vid, ',')[1] as vid

+----------+-------------+
| is_liked | vid         |
+----------+-------------+
| "false"  | "player144" |
+----------+-------------+
| "false"  | "player141" |
+----------+-------------+
| "false"  | "player119" |
+----------+-------------+
| "false"  | "player145" |
+----------+-------------+
| "false"  | "player139" |
+----------+-------------+
| "false"  | "player143" |
+----------+-------------+
| "false"  | "player115" |
+----------+-------------+
| "false"  | "player121" |
+----------+-------------+
| "false"  | "player125" |
+----------+-------------+
| "true"   | "player127" |
+----------+-------------+
| "false"  | "player148" |
+----------+-------------+
| "false"  | "player105" |
+----------+-------------+
| "false"  | "player116" |
+----------+-------------+
| "false"  | "player147" |
+----------+-------------+
| "false"  | "player124" |
+----------+-------------+
| "false"  | "player103" |
+----------+-------------+
| "true"   | "player136" |
+----------+-------------+
| "false"  | "player128" |
+----------+-------------+
| "false"  | "player100" |
+----------+-------------+
| "false"  | "player101" |
+----------+-------------+
| "false"  | "player114" |
+----------+-------------+
| "false"  | "player129" |
+----------+-------------+
| "false"  | "player138" |
+----------+-------------+
| "false"  | "player120" |
+----------+-------------+
| "false"  | "player104" |
+----------+-------------+
| "false"  | "player136" |
+----------+-------------+
| "false"  | "player150" |
+----------+-------------+
| "false"  | "player131" |
+----------+-------------+
| "false"  | "player102" |
+----------+-------------+
| "false"  | "player140" |
+----------+-------------+

这个情况有一个是不完美就是有那种同一个vid既有 true,又有 false的情况,我做了尝试想利用count把它合并成true,结果失败了。。。
只能在client测加逻辑处理一下,如果存在有过true,就是点赞过的。

接下来,我就针对这个需求来写查询,首先根据关键词获取到所有的文章列表:

MATCH (v:post)
WHERE v.title CONTAINS '人工智能'
RETURN v.title AS title, id(v) AS vid

判断当前用户是否已经点赞过该文章:

GO FROM $-.vid OVER like REVERSELY YIELD like._dst AS user_id, $-.title AS title |
YIELD any(d IN COLLECT($-.user_id) WHERE d=="current_user_id")  AS liked, $-.title AS title;

最后汇总起来就是:

MATCH (v:post)
WHERE v.title CONTAINS '人工智能'
RETURN v.title AS title, id(v) AS vid |
GO FROM $-.vid OVER like REVERSELY YIELD like._dst AS user_id, $-.title AS title |
YIELD any(d IN COLLECT($-.user_id) WHERE d=="current_user_id") AS liked, $-.title AS title;

使用本地的数据测试如下:

(user@nebula) [nba]> match (v:player) where v.name CONTAINS 'Tim' RETURN v.age as age, id(v) as vid | GO FROM $-.vid OVER like REVERSELY YIELD $-.age AS age, like._dst AS dst | YIELD any(d IN COLLECT(DISTINCT $-.dst) WHERE d=='Tony Parker') AS d, $-.age as age
+------+-----+
| d    | age |
+------+-----+
| true | 42  |
+------+-----+
Got 1 rows (time spent 5491/5977 us)

Sun, 09 May 2021 12:51:37 CST

PS: 目前必须承认这种解决方案不是很友好,LOOKUP 的设计初衷没有太多的考虑模糊匹配,也非常理解用户的心情,相信这些麻烦是暂时的,后面的完善已经在路上了,只是需要一点点时间。MATCH 的功能会越来越完善,我们也会把用户的这些查询的 case 不断的补充到我们的用例中,再次感谢你对 nebula 的批评和建议。

3赞

@yee 谢谢,刚试了下这样的确可行的,暂时用着先。
nebula 目前用下来,在简单的读写层面是比较方便的,但是在面对复杂查询时比较吃力,有时候都不知道怎么写,或者说无法实现。
LOOKUP模糊查询我觉得要尽可能地支持,除非match的性能可以赶上LOOKUP,且支持多match。
es全文索引很多时候是用不到的,而模糊查询的使用需求就非常多了。
和es相比,我建议nebula能够支持sonic,这是es的替代品,但更轻量,性能更好,资源消耗非常低,只需要几MB。

期待nebula早日完善,

3赞

好的,我将 Sonic 作为全文索引后端的需求在 Github 中提了 issue,后续在 issue 中更新进展。

MATCH 的优化其实一直在做,现在用户面临的很多性能问题,我们也在从多方面进行改进,比如执行计划,优化器和表达式等,这些会一点点的进入到主干中。

4赞

我也用类似写法实现查询的,但是指导文档中说不建议match 与 lookup、go混用

浙ICP备20010487号