nebula cypher 机器oom

kyle · 2022 年12 月 30 日 08:25

先梳理一下你的问题：

match (m:`USER`) where id(m) in ["1510615318"]  
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(a1:APPL)-[:APPLY_ON]-(t1:`TIME`) 
  where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
with m,collect(distinct x) as col_appl_1st_v4 
unwind case when size(col_appl_1st_v4) == 0 then [null] else col_appl_1st_v4 end as x 
match (x)-[r2:CALL_TO]-(n:`USER`) 
  where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
with m,col_appl_1st_v4,collect(distinct n) as col_user_2nd_v4
with m,col_appl_1st_v4,col_user_2nd_v4, [uu in col_user_2nd_v4 WHERE "MOBILE" in labels(n)] as col_mobile_2nd_v4
return size(col_appl_1st_v4) as n_appl_1st_v4, size(col_user_2nd_v4) as n_user_2nd_v4

这个语句是最开始的写法，并且会 OOM。

match (m:USER) where id(m) in [“1510615318”]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:TIME)
  where r1.first_callmark_on<=20220102 and t1.TIME.date < ‘2022-01-02’
optional match (x)-[r2:CALL_TO]-(n:USER)
  where r2.first_callmark_on<=20220102 and id(n)<>‘1510615318’
with m, collect(distinct x) as col_appl_1st_v4, collect(distinct n) as col_user_2nd_v4
with m, col_appl_1st_v4, col_user_2nd_v4, n_cbkin_State_enterprise_leader_2nd_v4,[uu in col_user_2nd_v4 WHERE “MOBILE” in labels(uu)] as col_mobile_2nd_v4
return size(col_appl_1st_v4) as n_appl_1st_v4, size(col_user_2nd_v4) as n_user_2nd_v4

这个语句是 @yee 给出的改写方案，并且被验证是不会 OOM 的。
所以，你希望用类似的方式把你们原有的更复杂的 cypher 改写成 nebula cypher 语句，你说的 “后面的不会写，要走索引，怎么绕过索引” 只是遇到了语法报错？

dillon.wang · 2022 年12 月 30 日 08:44

dillon.wang:

unwind case when col_mobile_2nd_v4 =[] then [null] else col_mobile_2nd_v4 end as n
optional match (n:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME) where replace(t2.date,'-','')<='20220602'
with m,col_appl_1st_v4,col_user_2nd_v4,col_mobile_2nd_v4,n_cbkin_State_enterprise_leader_2nd_v4,n_cbkin_Engineer_2nd_v4,
collect(distinct case when a2 is not null then n else null end) as col_appl_2nd_v4

从这里开始照之前那种改写就会oomunwind case when col_mobile_2nd_v4 =[] then [null] else col_mobile_2nd_v4 end as n optional match (n:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME) where replace(t2.date,'-','')<='20220602' with m,col_appl_1st_v4,col_user_2nd_v4,col_mobile_2nd_v4,n_cbkin_State_enterprise_leader_2nd_v4,n_cbkin_Engineer_2nd_v4, collect(distinct case when a2 is not null then n else null end) as col_appl_2nd_v4

kyle · 2022 年12 月 30 日 09:22

上面改写后的语句是内存有明显下降吗？

dillon.wang · 2022 年12 月 30 日 13:44

dillon.wang:

  match (m:`USER`) where id(m) in ["1510615318"]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`)
where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
optional match (n)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(n2:`USER`)
where r3.first_cbkmark_on<=20220602 and id(n2)<>'1510615318'
optional match (n2)<-[r4:CBK_ENGINEER]-(n3:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n3)<>'1510615318'
with
  m,
  collect(distinct x) as col_appl_1st_v4,
  collect(distinct n) as col_user_2nd_v4,
  count(distinct n2) as n_cbkin_State_enterprise_leader_2nd_v4,
  count(distinct case when n2 is not null then n3 else null end) as n_cbkin_Engineer_2nd_v4
with
  m,
  col_appl_1st_v4,
  col_user_2nd_v4,
  n_cbkin_State_enterprise_leader_2nd_v4,
  [uu in col_user_2nd_v4 WHERE "MOBILE" in labels(uu)] as col_mobile_2nd_v4,
  n_cbkin_Engineer_2nd_v4
return
  size(col_appl_1st_v4) as n_appl_1st_v4,
  size(col_user_2nd_v4) as n_user_2nd_v4,
  size(col_mobile_2nd_v4) as n_mobile_2nd_v4,
  n_cbkin_State_enterprise_leader_2nd_v4,
  n_cbkin_Engineer_2nd_v4

我的意思是这个意思，比如我继续往后改写，后面的怎么写呢

  match (m:`USER`) where id(m) in ["1510615318"]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`)
where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
optional match (n)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(n2:`USER`)
where r3.first_cbkmark_on<=20220602 and id(n2)<>'1510615318'
optional match (n2)<-[r4:CBK_ENGINEER]-(n3:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n3)<>'1510615318'
optional match (n4:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME)
where t2.`TIME`.`date` < '2022-01-02'
with
  m,
  collect(distinct x) as col_appl_1st_v4,
  collect(distinct n) as col_user_2nd_v4,
  count(distinct n2) as n_cbkin_State_enterprise_leader_2nd_v4,
  count(distinct case when n2 is not null then n3 else null end) as n_cbkin_Engineer_2nd_v4
  count(distinct case when n4 is not null then n3 else null end) as col_appl_2nd_v4
with
  m,
  col_appl_1st_v4,
  col_user_2nd_v4,
  n_cbkin_State_enterprise_leader_2nd_v4,
  [uu in col_user_2nd_v4 WHERE "MOBILE" in labels(uu)] as col_mobile_2nd_v4,
  n_cbkin_Engineer_2nd_v4,
  col_appl_2nd_v4
return
  size(col_appl_1st_v4) as n_appl_1st_v4,
  size(col_user_2nd_v4) as n_user_2nd_v4,
  size(col_mobile_2nd_v4) as n_mobile_2nd_v4,
  n_cbkin_State_enterprise_leader_2nd_v4,
  n_cbkin_Engineer_2nd_v4,
  col_appl_2nd_v4

这个是我改写的

但是这么写就有问题了就会oom

kyle · 2023 年1 月 3 日 02:43

得看一下你之前改写的语句内存吃了多少，如果两个 match 能跑出来但是内存已经很高的话，你改后面的没什么意义。

dillon.wang · 2023 年1 月 3 日 02:54

我圈起来的去掉，是可以很快跑出来的，但是后面按照这个改就不行

kyle · 2023 年1 月 3 日 02:55

看下内存占用，能跑出来的那条语句

dillon.wang · 2023 年1 月 3 日 03:02

内存占用很低，正常的，看过了

dillon.wang · 2023 年1 月 3 日 03:02

我圈起来的这些地方怎么改呢

dillon.wang · 2023 年1 月 3 日 03:06

可能是因为后续的走了索引，所以内存直接爆了我是没有创建索引的

kyle · 2023 年1 月 3 日 03:08

你前面改的语句结果是对的？

dillon.wang · 2023 年1 月 3 日 03:09

是正确的

kyle · 2023 年1 月 3 日 03:21


match (m:`USER`) where id(m) in ["1510615318"]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`)
where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
optional match (n)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(n2:`USER`)
where r3.first_cbkmark_on<=20220602 and id(n2)<>'1510615318'
optional match (n2)<-[r4:CBK_ENGINEER]-(n3:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n3)<>'1510615318'
with m, collect(distinct x) as col_appl_1st_v4, collect(distinct n) as col_user_2nd_v4,count(distinct n2) as n_cbkin_State_enterprise_leader_2nd_v4,count(distinct case when n2 is not null then n3 else null end) as n_cbkin_Engineer_2nd_v4
unwind col_user_2nd_v4 as n4
optional match (n4:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME)
where t2.`TIME`.`date` < '2022-01-02'
RETURN m,col_appl_1st_v4,col_user_2nd_v4,n_cbkin_State_enterprise_leader_2nd_v4,n_cbkin_Engineer_2nd_v4,count(distinct case when n4 is not null then n3 else null end) as col_appl_2nd_v4

dillon.wang · 2023 年1 月 3 日 03:57


match (m:`USER`) where id(m) in ["1510615318"]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`)
where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
optional match (n)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(n2:`USER`)
where r3.first_cbkmark_on<=20220602 and id(n2)<>'1510615318'
optional match (n2)<-[r4:CBK_ENGINEER]-(n3:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n3)<>'1510615318'
with 
  m, 
  collect(distinct x) as col_appl_1st_v4, 
  collect(distinct n) as col_user_2nd_v4,
  count(distinct n2) as n_cbkin_State_enterprise_leader_2nd_v4,
  count(distinct case when n2 is not null then n3 else null end) as n_cbkin_Engineer_2nd_v4
  unwind col_user_2nd_v4 as n4
optional match (n4:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME)
where t2.`TIME`.`date` < '2022-01-02'
RETURN 
  m,
  col_appl_1st_v4,
  col_user_2nd_v4,
  n_cbkin_State_enterprise_leader_2nd_v4,
  n_cbkin_Engineer_2nd_v4,
  count(distinct case when n4 is not null then n3 else null end) as col_appl_2nd_v4

运行不了

yee · 2023 年1 月 3 日 05:52

看了原始的 query，其中有些逻辑改写起来不是很方便的，比如 case when y is not null then n else null end 此类的表达式，如果可能看看能否使用最新的开发版来做些测试，对应的 match 的相关问题已经被 fix，而且你们对应的 query 改动也比较少。

dillon.wang · 2023 年1 月 3 日 06:05

目前只能这个改写，我们我们这个里面导入的数据量很大，前期也是这个版本做的其他测试，随意改动版本很麻烦，目前导入了十多亿点上百亿边，随意更换版本很麻烦

dillon.wang · 2023 年1 月 3 日 06:23

辛苦帮我看下，我这个在目前的版本上该如何改写

yee · 2023 年1 月 3 日 12:54

match (m:USER {mobileId:'1bb7d145a19749a9e1b1f12dc613233115d5df3e22c741309ea12e71256d03d6'})
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(a1:APPL)-[:APPLY_ON]-(t1:TIME)
where r1.first_callmark_on<=20220602 and t1.`TIME`.`date`<='2022-06-02'
optional match (x)-[r2:CALL_TO]-(n:USER) 
where r2.first_callmark_on<=20220602 and id(m)!=id(n)
optional match (n:MOBILE)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(y1:USER) 
where r3.first_cbkmark_on<=20220602 and id(y1)!=id(n)
optional match (n:MOBILE)<-[r4:CBK_ENGINEER]-(y2:USER)
where r4.first_cbkmark_on<=20220602 and id(y2)!=id(n)
optional match (n:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME) 
where t2.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a3:LOAN_BK)-[:APPLY_ON]->(t3:TIME) 
where t3.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a4:LOAN_BK)-[:M2_ON]->(t4:TIME) 
where t4.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a5:A_BLACK_LIST)-[:APPLY_ON]->(t5:TIME)
where t5.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a6:C_BLACK_LIST)-[:APPLY_ON]->(t6:TIME) 
where t6.`TIME`.`date`<='2022-06-02'
return 
    count(distinct x) as n_appl_1st_v4,
    count(distinct n) as n_user_2nd_v4,
    count(case when "MOBILE" in labels(n) then n else null end) as n_mobile_2nd_v4,
    count(distinct a2) as n_appl_2nd_v4,
    count(distinct y1) AS n_cbkin_State_enterprise_leader_2nd_v4,
    count(distinct y2) AS n_cbkin_Engineer_2nd_v4,
    count(distinct a3) as n_bk_2nd_v4,
    count(distinct a4) AS n_bad30p_2nd_v4,
    count(distinct a5) AS n_ablk_2nd_v4,
    count(distinct a6) AS n_cblk_2nd_v4,
    count(case when 'ORG' in labels(n) then n else null end) as n_org_2nd_v4,
    count(case when 'EXPRESS' in labels(n) then n else null end) as n_express_2nd_v4,
    count(case when 'AGENCY' in labels(n) then n else null end) as n_agency_2nd_v4,
    count(case when 'BNK_ORG' in labels(n) then n else null end) as n_bnk_org_2nd_v4,
    count(case when 'OTHER_ORG' in labels(n) then n else null end) as n_other_org_2nd_v4

我结合着自己的理解，改了下你最初的语句，你看看能不能满足要求！

dillon.wang · 2023 年1 月 4 日 02:35

match (m:`USER`) where id(m) in ["1510615318"] 
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(a1:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220602 and t1.`TIME`.`date`<='2022-06-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`) 
where r2.first_callmark_on<=20220602 and id(m)!=id(n)
return m

这样写貌似不行 and id(m)!=id(n)

我改成这样的了，是可以运行就是不知道是否等价

dillon.wang · 2023 年1 月 4 日 02:52

我改成这样了，看起来，应该是一致的，需要老师帮忙确认下那个and id(m)!=id(n)是否等价

match (m:`USER`) where id(m) in ["1510615318"] 
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(a1:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220602 and t1.`TIME`.`date`<='2022-06-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`) 
where r2.first_callmark_on<=20220602 and id(n)<>'1510615318'
optional match (n:MOBILE)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(y1:`USER`) 
where r3.first_cbkmark_on<=20220602 and id(n)<>'1510615318'
optional match (n:MOBILE)<-[r4:CBK_ENGINEER]-(y2:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n)<>'1510615318'
optional match (n:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:`TIME`) 
where t2.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a3:LOAN_BK)-[:APPLY_ON]->(t3:`TIME`) 
where t3.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a4:LOAN_BK)-[:M2_ON]->(t4:`TIME`) 
where t4.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a5:A_BLACK_LIST)-[:APPLY_ON]->(t5:`TIME`)
where t5.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a6:C_BLACK_LIST)-[:APPLY_ON]->(t6:`TIME`) 
where t6.`TIME`.`date`<='2022-06-02'
return 
count(distinct x) as n_appl_1st_v4,
count(distinct n) as n_user_2nd_v4,
count(case when "MOBILE" in labels(n) then n else null end) as n_mobile_2nd_v4,
count(distinct a2) as n_appl_2nd_v4,
count(distinct y1) AS n_cbkin_State_enterprise_leader_2nd_v4,
count(distinct y2) AS n_cbkin_Engineer_2nd_v4,
count(distinct a3) as n_bk_2nd_v4,
count(distinct a4) AS n_bad30p_2nd_v4,
count(distinct a5) AS n_ablk_2nd_v4,
count(distinct a6) AS n_cblk_2nd_v4,
count(case when 'ORG' in labels(n) then n else null end) as n_org_2nd_v4,
count(case when 'EXPRESS' in labels(n) then n else null end) as n_express_2nd_v4,
count(case when 'AGENCY' in labels(n) then n else null end) as n_agency_2nd_v4,
count(case when 'BNK_ORG' in labels(n) then n else null end) as n_bnk_org_2nd_v4,
count(case when 'OTHER_ORG' in labels(n) then n else null end) as n_other_org_2nd_v4