Question about counting query results

test · September 27, 2020, 10:36

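Hi all, I have a question.
Requirement:
Given a specified word, count how many times each word that immediately follows it occurs across a set of sentences.
Example:
Sentence 1: 尝试将平台上的制造商转变为品牌商， ("Try to turn the manufacturers on the platform into brand owners,")
Sentence 2: 滴滴平台上的不合规网约车辆占比超过82%， ("Non-compliant ride-hailing vehicles on the DiDi platform account for more than 82%,")
Specified word: 平台上 ("on the platform")
Problem:
In the example above, the word after "平台上" is "的" in both sentences. In my current schema, "平台上" and "的" each have the same vid in both sentences, so the next word of "平台上" can no longer be found for sentence 1. Could anyone suggest how to write a query that returns it, or a different schema that would work?

Many thanks in advance!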
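
For reference, the result the query is expected to return can be written down in plain Python, independent of any graph schema. This is only a sketch of the requirement: the token lists are copied from the `kw` vertices in this post, and `next_word_counts` is a hypothetical helper name, not part of Nebula Graph.

```python
from collections import Counter

# Tokenized sentences, copied from the kw vertices in this thread (keyed by sid).
sentences = {
    2084: ["尝试", "将", "平台上", "的", "制造商", "转变为", "品牌商"],
    20659: ["滴滴", "平台上", "的", "不合规", "网约车", "占比", "超过", "82%"],
}


def next_word_counts(target, sentences):
    """Count how often each word directly follows `target` across all sentences."""
    counts = Counter()
    for words in sentences.values():
        # Pair every word with its successor inside the same sentence.
        for current, following in zip(words, words[1:]):
            if current == target:
                counts[following] += 1
    return counts


counts = next_word_counts("平台上", sentences)
print(counts)  # Counter({'的': 2})
```

Any schema and query that solve the problem should reproduce this count of 2 for "的".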

Here are the creation statements:

// Create the tags and edge types
create tag sentence(sid int);
create tag kw(kwid int, word string, attr string, pos int);
create edge posinsentence(pos int);
create edge nextword(sid int, start_pos int, end_pos int, start bool, end bool);

// Insert the sentence vertices
insert vertex sentence(sid) values uuid('s_2084'):(2084);


insert vertex sentence(sid) values uuid('s_20659'):(20659);

// Insert the word vertices
insert vertex kw(kwid, word, attr, pos) values uuid('k_33885'):(33885, '尝试', '7', 1);
insert vertex kw(kwid, word, attr, pos) values uuid('k_296151'):(296151, '将', '7', 2);
insert vertex kw(kwid, word, attr, pos) values uuid('k_1644671'):(1644671, '平台上', '201', 3);
insert vertex kw(kwid, word, attr, pos) values uuid('k_50234'):(50234, '的', '188', 4);
insert vertex kw(kwid, word, attr, pos) values uuid('k_273416'):(273416, '制造商', '18', 5);
insert vertex kw(kwid, word, attr, pos) values uuid('k_279440'):(279440, '转变为', '7', 6);
insert vertex kw(kwid, word, attr, pos) values uuid('k_928677'):(928677, '品牌商', '18', 7);


insert vertex kw(kwid, word, attr, pos) values uuid('k_51011'):(51011, '滴滴', '100', 1);
insert vertex kw(kwid, word, attr, pos) values uuid('k_1644671'):(1644671, '平台上', '201', 2);
insert vertex kw(kwid, word, attr, pos) values uuid('k_50234'):(50234, '的', '188', 3);
insert vertex kw(kwid, word, attr, pos) values uuid('k_1348093'):(1348093, '不合规', '7', 4);
insert vertex kw(kwid, word, attr, pos) values uuid('k_1956279'):(1956279, '网约车', '99', 5);
insert vertex kw(kwid, word, attr, pos) values uuid('k_1392599'):(1392599, '占比', '7', 6);
insert vertex kw(kwid, word, attr, pos) values uuid('k_34511'):(34511, '超过', '7', 7);
insert vertex kw(kwid, word, attr, pos) values uuid('k_976199'):(976199, '82%', '128', 8);

// Insert the word-to-sentence edges
insert edge posinsentence(pos) values uuid('k_33885') -> uuid('s_2084'):(1);
insert edge posinsentence(pos) values uuid('k_296151') -> uuid('s_2084'):(2);
insert edge posinsentence(pos) values uuid('k_1644671') -> uuid('s_2084'):(3);
insert edge posinsentence(pos) values uuid('k_50234') -> uuid('s_2084'):(4);
insert edge posinsentence(pos) values uuid('k_273416') -> uuid('s_2084'):(5);
insert edge posinsentence(pos) values uuid('k_279440') -> uuid('s_2084'):(6);
insert edge posinsentence(pos) values uuid('k_928677') -> uuid('s_2084'):(7);


insert edge posinsentence(pos) values uuid('k_51011') -> uuid('s_20659'):(1);
insert edge posinsentence(pos) values uuid('k_1644671') -> uuid('s_20659'):(2);
insert edge posinsentence(pos) values uuid('k_50234') -> uuid('s_20659'):(3);
insert edge posinsentence(pos) values uuid('k_1348093') -> uuid('s_20659'):(4);
insert edge posinsentence(pos) values uuid('k_1956279') -> uuid('s_20659'):(5);
insert edge posinsentence(pos) values uuid('k_1392599') -> uuid('s_20659'):(6);
insert edge posinsentence(pos) values uuid('k_34511') -> uuid('s_20659'):(7);
insert edge posinsentence(pos) values uuid('k_976199') -> uuid('s_20659'):(8);

// Insert the word-to-word edges
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_33885') -> uuid('k_296151'):(2084, 1, 2, true, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_296151') -> uuid('k_1644671'):(2084, 2, 3, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_1644671') -> uuid('k_50234'):(2084, 3, 4, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_50234') -> uuid('k_273416'):(2084, 4, 5, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_273416') -> uuid('k_279440'):(2084, 5, 6, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_279440') -> uuid('k_928677'):(2084, 6, 7, false, true);


insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_51011') -> uuid('k_1644671'):(20659, 1, 2, true, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_1644671') -> uuid('k_50234'):(20659, 2, 3, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_50234') -> uuid('k_1348093'):(20659, 3, 4, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_1348093') -> uuid('k_1956279'):(20659, 4, 5, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_1956279') -> uuid('k_1392599'):(20659, 5, 6, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_1392599') -> uuid('k_34511'):(20659, 6, 7, false, false);
insert edge nextword(sid, start_pos, end_pos, start, end) values uuid('k_34511') -> uuid('k_976199'):(20659, 7, 8, false, true);

min.wu · September 28, 2020, 01:59

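This looks a bit like LINE?

If I understand correctly, your question is this:

Given the following graph:
A->B->C
E->B->D

Given A, who are the upstream vertices of B (A and E)? How many upstream vertices are there in total (2)?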

test · September 28, 2020, 05:02

Yes, that's roughly it, but I don't know how to write the query.

test · September 28, 2020, 05:15

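One more question:
When inserting, the same edge may be inserted more than once. Is there a way to find out how many times that edge has been overwritten?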

min.wu · September 29, 2020, 02:01

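Given the following graph:
A->B->C
E->B->D

Given A, who are the upstream vertices of B (A and E)? How many upstream vertices are there in total (2)?

Answer 1: For the downstream of A, use GO FROM $A OVER $edge
Answer 2: For the upstream of B, use GO FROM $B OVER $edge REVERSELY
Answer 3: Subqueries can be chained together with the pipe |
Answer 4: Counting can be done with count
You can work out the rest from the manual.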
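
The four hints above can be sketched in plain Python on the toy graph. This only models the logic (forward step, reverse step, chaining, counting) and is not nGQL; the helper names `downstream` and `upstream` are made up for illustration.

```python
# Toy edge list from the example: A->B->C and E->B->D.
edges = [("A", "B"), ("B", "C"), ("E", "B"), ("B", "D")]


def downstream(vertex):
    """Forward step: vertices reached by following edges out of `vertex`."""
    return {dst for src, dst in edges if src == vertex}


def upstream(vertex):
    """Reverse step: vertices that have an edge pointing at `vertex`."""
    return {src for src, dst in edges if dst == vertex}


# "Pipe" the forward step into the reverse step, then count the results.
upstream_of_children = set()
for child in downstream("A"):
    upstream_of_children |= upstream(child)

print(sorted(upstream_of_children))  # ['A', 'E']
print(len(upstream_of_children))     # 2
```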

min.wu · September 29, 2020, 02:03

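No, you can't tell how many times it was overwritten, because underneath it is essentially a key-value store.
There is also no atomic increment operation such as atomic_inc yet.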
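
A small Python model of that key-value behavior may help show why the two `nextword` edges for 平台上 -> 的 in the original post collapse into one. It rests on two assumptions that should be checked against the Nebula Graph documentation for your version: that an edge is keyed by (source, edge type, rank, destination), and that the rank is 0 when the insert does not specify one. `insert_edge` is a hypothetical helper, not a Nebula Graph API.

```python
def insert_edge(store, src, dst, props, edge_type="nextword", rank=0):
    """Plain put: a later write to the same key replaces the earlier value.

    Assumption: the key is (src, edge_type, rank, dst), with rank defaulting to 0.
    """
    store[(src, edge_type, rank, dst)] = props


# Both inserts from the original post use the same src, dst and (default) rank.
default_rank = {}
insert_edge(default_rank, "k_1644671", "k_50234", {"sid": 2084})
insert_edge(default_rank, "k_1644671", "k_50234", {"sid": 20659})
print(len(default_rank))  # 1
print(default_rank[("k_1644671", "nextword", 0, "k_50234")]["sid"])  # 20659

# With a distinct rank per sentence (here the sid), the keys no longer collide.
sid_as_rank = {}
insert_edge(sid_as_rank, "k_1644671", "k_50234", {"sid": 2084}, rank=2084)
insert_edge(sid_as_rank, "k_1644671", "k_50234", {"sid": 20659}, rank=20659)
print(len(sid_as_rank))  # 2
```

In this model nothing records how many writes hit a key, which matches the answer above: the store only keeps the latest value.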

nicole · September 29, 2020, 02:28

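Is your requirement to query the downstream of A, on the condition that the downstream of A's downstream is C? In other words, query the nextword of "平台上", where that nextword belongs to sentence 1.

I'm not sure whether this statement meets your requirement:
go from uuid('k_1644671') over nextword YIELD nextword._dst AS id, $$.kw.word AS wd | go from $-.id over posinsentence WHERE $$.sentence.sid == 2084 YIELD $-.wd

The query result is as follows: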

test · September 29, 2020, 03:31

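Thanks for the replies, everyone.
This is what I have at the moment, but once the number of vertices and edges goes above ten million the query no longer finishes.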

go from uuid('k_平台上') over nextword where $$.kw.attr == '7' yield nextword._dst as nid, nextword._src as oid, $$.kw.word as word | go from $-.nid over posinsentence yield posinsentence._dst as sentence_t, $-.oid as oid, $-.word as word | go from $-.oid over posinsentence where $-.sentence_t == posinsentence._dst yield $-.word as word | group by $-.word yield $-.word, count($-.word) as num;

nicole · September 29, 2020, 03:39

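Yes. This way of writing it effectively pulls out all the data first and only then filters it, so it is inefficient. Let's see whether anyone has a better way to push the query conditions down to the server so they are processed there.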
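
To make the cost concrete, here is a rough Python model of the intermediate rows produced by the three-step pipeline in test's query. It assumes the final sentence-matching condition is applied only after the third step has produced its rows, and it ignores the `attr` filter. The sentence sets are made-up toy data for illustration, not measurements.

```python
# Hypothetical toy data: the set of sentence ids each word occurs in.
sentences_of = {
    "平台上": {2084, 20659},
    "的": set(range(1, 1001)) | {2084, 20659},  # a very common word
}

# Step 1: next words of the target (one row per distinct next-word vertex).
next_words = ["的"]

# Step 2: one row per (next word, sentence containing that word).
step2 = [(w, s) for w in next_words for s in sentences_of[w]]

# Step 3: every step-2 row fans out over the target word's sentences,
# and only afterwards is the sentence-matching condition applied.
step3_before_filter = [(w, s, t) for (w, s) in step2 for t in sentences_of["平台上"]]
step3 = [(w, s) for (w, s, t) in step3_before_filter if s == t]

print(len(step2))                # 1002
print(len(step3_before_filter))  # 2004
print(len(step3))                # 2
```

In this model the work grows with how common the next word is, even though only two rows survive the final filter.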