nebula cypher 机器oom

先梳理一下你的问题:

match (m:`USER`) where id(m) in ["1510615318"]  
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(a1:APPL)-[:APPLY_ON]-(t1:`TIME`) 
  where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
with m,collect(distinct x) as col_appl_1st_v4 
unwind case when size(col_appl_1st_v4) == 0 then [null] else col_appl_1st_v4 end as x 
match (x)-[r2:CALL_TO]-(n:`USER`) 
  where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
with m,col_appl_1st_v4,collect(distinct n) as col_user_2nd_v4
with m,col_appl_1st_v4,col_user_2nd_v4, [uu in col_user_2nd_v4 WHERE "MOBILE" in labels(n)] as col_mobile_2nd_v4
return size(col_appl_1st_v4) as n_appl_1st_v4, size(col_user_2nd_v4) as n_user_2nd_v4

这个语句是最开始的写法,并且会 OOM。

match (m:USER) where id(m) in [“1510615318”]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:TIME)
  where r1.first_callmark_on<=20220102 and t1.TIME.date < ‘2022-01-02’
optional match (x)-[r2:CALL_TO]-(n:USER)
  where r2.first_callmark_on<=20220102 and id(n)<>‘1510615318’
with m, collect(distinct x) as col_appl_1st_v4, collect(distinct n) as col_user_2nd_v4
with m, col_appl_1st_v4, col_user_2nd_v4, n_cbkin_State_enterprise_leader_2nd_v4,[uu in col_user_2nd_v4 WHERE “MOBILE” in labels(uu)] as col_mobile_2nd_v4
return size(col_appl_1st_v4) as n_appl_1st_v4, size(col_user_2nd_v4) as n_user_2nd_v4

这个语句是 @yee 给出的改写方案,并且被验证是不会 OOM 的。
所以,你希望用类似的方式把你们原有的更复杂的 cypher 改写成 nebula cypher 语句,你说的 “后面的不会写,要走索引,怎么绕过索引” 只是遇到了语法报错?

从这里开始照之前那种改写 就会oomunwind case when col_mobile_2nd_v4 =[] then [null] else col_mobile_2nd_v4 end as n optional match (n:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME) where replace(t2.date,'-','')<='20220602' with m,col_appl_1st_v4,col_user_2nd_v4,col_mobile_2nd_v4,n_cbkin_State_enterprise_leader_2nd_v4,n_cbkin_Engineer_2nd_v4, collect(distinct case when a2 is not null then n else null end) as col_appl_2nd_v4

上面改写后的语句是内存有明显下降吗?

我的意思是这个意思,比如我继续往后改写,后面的怎么写呢

  match (m:`USER`) where id(m) in ["1510615318"]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`)
where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
optional match (n)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(n2:`USER`)
where r3.first_cbkmark_on<=20220602 and id(n2)<>'1510615318'
optional match (n2)<-[r4:CBK_ENGINEER]-(n3:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n3)<>'1510615318'
optional match (n4:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME)
where t2.`TIME`.`date` < '2022-01-02'
with
  m,
  collect(distinct x) as col_appl_1st_v4,
  collect(distinct n) as col_user_2nd_v4,
  count(distinct n2) as n_cbkin_State_enterprise_leader_2nd_v4,
  count(distinct case when n2 is not null then n3 else null end) as n_cbkin_Engineer_2nd_v4
  count(distinct case when n4 is not null then n3 else null end) as col_appl_2nd_v4
with
  m,
  col_appl_1st_v4,
  col_user_2nd_v4,
  n_cbkin_State_enterprise_leader_2nd_v4,
  [uu in col_user_2nd_v4 WHERE "MOBILE" in labels(uu)] as col_mobile_2nd_v4,
  n_cbkin_Engineer_2nd_v4,
  col_appl_2nd_v4
return
  size(col_appl_1st_v4) as n_appl_1st_v4,
  size(col_user_2nd_v4) as n_user_2nd_v4,
  size(col_mobile_2nd_v4) as n_mobile_2nd_v4,
  n_cbkin_State_enterprise_leader_2nd_v4,
  n_cbkin_Engineer_2nd_v4,
  col_appl_2nd_v4

这个是我改写的


但是这么写就有问题了 就会oom

得看一下你之前改写的语句内存吃了多少,如果两个 match 能跑出来但是内存已经很高的话,你改后面的没什么意义。

1 个赞

我圈起来的去掉 ,是可以很快跑出来的,但是后面按照这个改 就不行

看下内存占用,能跑出来的那条语句

内存占用很低,正常的,看过了

我圈起来的这些地方 怎么改呢

可能是因为后续的 走了索引,所以内存直接爆了 我是没有创建索引的

你前面改的语句结果是对的?

是正确的


match (m:`USER`) where id(m) in ["1510615318"]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`)
where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
optional match (n)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(n2:`USER`)
where r3.first_cbkmark_on<=20220602 and id(n2)<>'1510615318'
optional match (n2)<-[r4:CBK_ENGINEER]-(n3:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n3)<>'1510615318'
with m, collect(distinct x) as col_appl_1st_v4, collect(distinct n) as col_user_2nd_v4,count(distinct n2) as n_cbkin_State_enterprise_leader_2nd_v4,count(distinct case when n2 is not null then n3 else null end) as n_cbkin_Engineer_2nd_v4
unwind col_user_2nd_v4 as n4
optional match (n4:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME)
where t2.`TIME`.`date` < '2022-01-02'
RETURN m,col_appl_1st_v4,col_user_2nd_v4,n_cbkin_State_enterprise_leader_2nd_v4,n_cbkin_Engineer_2nd_v4,count(distinct case when n4 is not null then n3 else null end) as col_appl_2nd_v4

1 个赞

match (m:`USER`) where id(m) in ["1510615318"]
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220102 and t1.`TIME`.`date` < '2022-01-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`)
where r2.first_callmark_on<=20220102 and id(n)<>'1510615318'
optional match (n)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(n2:`USER`)
where r3.first_cbkmark_on<=20220602 and id(n2)<>'1510615318'
optional match (n2)<-[r4:CBK_ENGINEER]-(n3:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n3)<>'1510615318'
with 
  m, 
  collect(distinct x) as col_appl_1st_v4, 
  collect(distinct n) as col_user_2nd_v4,
  count(distinct n2) as n_cbkin_State_enterprise_leader_2nd_v4,
  count(distinct case when n2 is not null then n3 else null end) as n_cbkin_Engineer_2nd_v4
  unwind col_user_2nd_v4 as n4
optional match (n4:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME)
where t2.`TIME`.`date` < '2022-01-02'
RETURN 
  m,
  col_appl_1st_v4,
  col_user_2nd_v4,
  n_cbkin_State_enterprise_leader_2nd_v4,
  n_cbkin_Engineer_2nd_v4,
  count(distinct case when n4 is not null then n3 else null end) as col_appl_2nd_v4


运行不了

看了原始的 query,其中有些逻辑改写起来不是很方便的,比如 case when y is not null then n else null end 此类的表达式,如果可能看看能否使用最新的开发版来做些测试,对应的 match 的相关问题已经被 fix,而且你们对应的 query 改动也比较少。

目前只能这个改写,我们我们这个里面导入的数据量很大,前期也是这个版本做的其他测试,随意改动版本很麻烦,目前导入了十多亿点 上百亿边,随意更换版本很麻烦

辛苦帮我看下,我这个在目前的版本上 该如何改写

match (m:USER {mobileId:'1bb7d145a19749a9e1b1f12dc613233115d5df3e22c741309ea12e71256d03d6'})
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(a1:APPL)-[:APPLY_ON]-(t1:TIME)
where r1.first_callmark_on<=20220602 and t1.`TIME`.`date`<='2022-06-02'
optional match (x)-[r2:CALL_TO]-(n:USER) 
where r2.first_callmark_on<=20220602 and id(m)!=id(n)
optional match (n:MOBILE)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(y1:USER) 
where r3.first_cbkmark_on<=20220602 and id(y1)!=id(n)
optional match (n:MOBILE)<-[r4:CBK_ENGINEER]-(y2:USER)
where r4.first_cbkmark_on<=20220602 and id(y2)!=id(n)
optional match (n:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:TIME) 
where t2.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a3:LOAN_BK)-[:APPLY_ON]->(t3:TIME) 
where t3.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a4:LOAN_BK)-[:M2_ON]->(t4:TIME) 
where t4.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a5:A_BLACK_LIST)-[:APPLY_ON]->(t5:TIME)
where t5.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a6:C_BLACK_LIST)-[:APPLY_ON]->(t6:TIME) 
where t6.`TIME`.`date`<='2022-06-02'
return 
    count(distinct x) as n_appl_1st_v4,
    count(distinct n) as n_user_2nd_v4,
    count(case when "MOBILE" in labels(n) then n else null end) as n_mobile_2nd_v4,
    count(distinct a2) as n_appl_2nd_v4,
    count(distinct y1) AS n_cbkin_State_enterprise_leader_2nd_v4,
    count(distinct y2) AS n_cbkin_Engineer_2nd_v4,
    count(distinct a3) as n_bk_2nd_v4,
    count(distinct a4) AS n_bad30p_2nd_v4,
    count(distinct a5) AS n_ablk_2nd_v4,
    count(distinct a6) AS n_cblk_2nd_v4,
    count(case when 'ORG' in labels(n) then n else null end) as n_org_2nd_v4,
    count(case when 'EXPRESS' in labels(n) then n else null end) as n_express_2nd_v4,
    count(case when 'AGENCY' in labels(n) then n else null end) as n_agency_2nd_v4,
    count(case when 'BNK_ORG' in labels(n) then n else null end) as n_bnk_org_2nd_v4,
    count(case when 'OTHER_ORG' in labels(n) then n else null end) as n_other_org_2nd_v4

我结合着自己的理解,改了下你最初的语句,你看看能不能满足要求!

1 个赞
match (m:`USER`) where id(m) in ["1510615318"] 
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(a1:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220602 and t1.`TIME`.`date`<='2022-06-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`) 
where r2.first_callmark_on<=20220602 and id(m)!=id(n)
return m


这样写貌似不行 and id(m)!=id(n)

我改成这样的了,是可以运行就是不知道是否等价

我改成这样了,看起来,应该是一致的,需要老师帮忙确认下那个and id(m)!=id(n)是否等价

match (m:`USER`) where id(m) in ["1510615318"] 
optional match (m)-[r1:CALL_TO]-(x:MOBILE)-[:APPLY_FOR]->(a1:APPL)-[:APPLY_ON]-(t1:`TIME`)
where r1.first_callmark_on<=20220602 and t1.`TIME`.`date`<='2022-06-02'
optional match (x)-[r2:CALL_TO]-(n:`USER`) 
where r2.first_callmark_on<=20220602 and id(n)<>'1510615318'
optional match (n:MOBILE)<-[r3:CBK_STATE_ENTERPRISE_LEADER]-(y1:`USER`) 
where r3.first_cbkmark_on<=20220602 and id(n)<>'1510615318'
optional match (n:MOBILE)<-[r4:CBK_ENGINEER]-(y2:`USER`)
where r4.first_cbkmark_on<=20220602 and id(n)<>'1510615318'
optional match (n:MOBILE)-[:APPLY_FOR]->(a2:APPL)-[:APPLY_ON]->(t2:`TIME`) 
where t2.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a3:LOAN_BK)-[:APPLY_ON]->(t3:`TIME`) 
where t3.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a4:LOAN_BK)-[:M2_ON]->(t4:`TIME`) 
where t4.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a5:A_BLACK_LIST)-[:APPLY_ON]->(t5:`TIME`)
where t5.`TIME`.`date`<='2022-06-02'
optional match (n:MOBILE)-[:APPLY_FOR]->(a6:C_BLACK_LIST)-[:APPLY_ON]->(t6:`TIME`) 
where t6.`TIME`.`date`<='2022-06-02'
return 
count(distinct x) as n_appl_1st_v4,
count(distinct n) as n_user_2nd_v4,
count(case when "MOBILE" in labels(n) then n else null end) as n_mobile_2nd_v4,
count(distinct a2) as n_appl_2nd_v4,
count(distinct y1) AS n_cbkin_State_enterprise_leader_2nd_v4,
count(distinct y2) AS n_cbkin_Engineer_2nd_v4,
count(distinct a3) as n_bk_2nd_v4,
count(distinct a4) AS n_bad30p_2nd_v4,
count(distinct a5) AS n_ablk_2nd_v4,
count(distinct a6) AS n_cblk_2nd_v4,
count(case when 'ORG' in labels(n) then n else null end) as n_org_2nd_v4,
count(case when 'EXPRESS' in labels(n) then n else null end) as n_express_2nd_v4,
count(case when 'AGENCY' in labels(n) then n else null end) as n_agency_2nd_v4,
count(case when 'BNK_ORG' in labels(n) then n else null end) as n_bnk_org_2nd_v4,
count(case when 'OTHER_ORG' in labels(n) then n else null end) as n_other_org_2nd_v4
1 个赞