Performance problem with multi-pattern MATCH

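  • nebula version: 3.4.1
  • Deployment: distributed
  • Installation method: RPM
  • In production: Y
  • Hardware
    • Disk per machine: two 3.84T SSDs in RAID 1 as the data directory, so the data directory per machine is 3.84/2 = 1.92T
    • CPU and memory per machine: 26 cores, 512 GB
    • 6 physical machines
  • Deployment details
    • Storage memory_tracker_limit_ratio=0.4, Graphd memory_tracker_limit_ratio=0.45
    • Mixed deployment: 6 machines run 6 storage services and 6 Graphd services; three of them also run the meta service.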

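Problem description

Requirement

Given a pattern of arbitrary structure and the vid of one vertex in that pattern, query all data that matches the pattern.


For the pattern in the figure above, the vid of the incoming_parts vertex is given, and we need to find all data in Nebula that matches the pattern.

Current implementation

We currently use a multi-pattern MATCH statement in Nebula to do the pattern matching, as follows: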

MATCH  (abaa9c1ec:person)-[a130d5464:trans]->(a5a5e256e:person), (a5a5e256e:person)-[a358405f5:trans]->(ad2d4c3ae:person), (a5a5e256e:person)-[a22967a8b:trans]->(aacebe3dc:person) 
where id(abaa9c1ec)=='66326d4b4*****762449ddd514'
return abaa9c1ec,ad2d4c3ae,a5a5e256e,aacebe3dc,a130d5464,a358405f5,a22967a8b

After running the statement above, Nebula stops responding, the client reports a timeout after a long wait, and Storage memory usage surges.

Relevant meta / storage / graph info logs

Graphd log

I20230417 10:16:05.864279 1055713 MemoryUtils.cpp:227] sys:55.291GiB/503.287GiB 10.99% usr:156.000MiB/226.457GiB 0.07%
I20230417 10:17:05.863883 1055713 MemoryUtils.cpp:227] sys:55.297GiB/503.287GiB 10.99% usr:156.000MiB/226.457GiB 0.07%
I20230417 10:18:05.862987 1055713 MemoryUtils.cpp:227] sys:55.304GiB/503.287GiB 10.99% usr:156.000MiB/226.457GiB 0.07%
I20230417 10:19:05.862459 1055713 MemoryUtils.cpp:227] sys:55.321GiB/503.287GiB 10.99% usr:155.000MiB/226.457GiB 0.07%
I20230417 10:20:05.862591 1055713 MemoryUtils.cpp:227] sys:55.337GiB/503.287GiB 11.00% usr:155.000MiB/226.457GiB 0.07%
I20230417 10:21:05.864045 1055713 MemoryUtils.cpp:227] sys:55.331GiB/503.287GiB 10.99% usr:155.000MiB/226.457GiB 0.07%
I20230417 10:21:27.995688 1055780 ThriftClientManager-inl.h:67] resolve "a-lsfqz-nbldn05.cgb.prd":9779 as "21.6.64.138":9779
I20230417 10:22:05.864085 1055713 MemoryUtils.cpp:227] sys:85.094GiB/503.287GiB 16.91% usr:32.995GiB/226.457GiB 14.57%
I20230417 10:23:05.862848 1055713 MemoryUtils.cpp:227] sys:97.169GiB/503.287GiB 19.31% usr:43.377GiB/226.457GiB 19.15%
I20230417 10:24:05.863934 1055713 MemoryUtils.cpp:227] sys:111.646GiB/503.287GiB 22.18% usr:58.595GiB/226.457GiB 25.87%
I20230417 10:25:05.862601 1055713 MemoryUtils.cpp:227] sys:119.508GiB/503.287GiB 23.75% usr:66.522GiB/226.457GiB 29.38%
E20230417 10:25:37.355666 1055803 StorageClientBase-inl.h:224] Request to "a-lsfqz-nbldn01.cgb.prd":9779 time out: TTransportException: Timed out
I20230417 10:26:05.864032 1055713 MemoryUtils.cpp:227] sys:119.021GiB/503.287GiB 23.65% usr:66.075GiB/226.457GiB 29.18%
E20230417 10:26:29.544015 1055702 StorageClientBase-inl.h:224] Request to "a-lsfqz-nbldn02.cgb.prd":9779 time out: TTransportException: Timed out
I20230417 10:27:05.863698 1055713 MemoryUtils.cpp:227] sys:128.705GiB/503.287GiB 25.57% usr:66.353GiB/226.457GiB 29.30%
E20230417 10:27:34.764277 1055712 StorageClientBase-inl.h:224] Request to "a-lsfqz-nbldn06.cgb.prd":9779 time out: TTransportException: Timed out
I20230417 10:28:05.863044 1055713 MemoryUtils.cpp:227] sys:148.898GiB/503.287GiB 29.59% usr:68.952GiB/226.457GiB 30.45%
I20230417 10:28:20.495940 1055730 ThriftClientManager-inl.h:67] resolve "a-lsfqz-nbldn05.cgb.prd":9779 as "21.6.64.138":9779
E20230417 10:28:31.635681 1055722 StorageClientBase-inl.h:224] Request to "a-lsfqz-nbldn03.cgb.prd":9779 time out: TTransportException: Timed out
I20230417 10:29:05.862923 1055713 MemoryUtils.cpp:227] sys:176.337GiB/503.287GiB 35.04% usr:73.661GiB/226.457GiB 32.53%
E20230417 10:29:25.273134 1055730 StorageClientBase-inl.h:224] Request to "a-lsfqz-nbldn05.cgb.prd":9779 time out: TTransportException: Timed out
I20230417 10:30:05.862658 1055713 MemoryUtils.cpp:227] sys:187.994GiB/503.287GiB 37.35% usr:62.571GiB/226.457GiB 27.63%
E20230417 10:30:23.226327 1055737 StorageClientBase-inl.h:224] Request to "a-lsfqz-nbldn04.cgb.prd":9779 time out: TTransportException: Timed out
E20230417 10:30:26.329897 1055689 StorageClientBase-inl.h:143] There some RPC errors: RPC failure in StorageClient with timeout: TTransportException: Timed out
E20230417 10:30:33.392619 1055689 StorageClientBase-inl.h:143] There some RPC errors: RPC failure in StorageClient with timeout: TTransportException: Timed out
E20230417 10:30:40.510330 1055689 StorageClientBase-inl.h:143] There some RPC errors: RPC failure in StorageClient with timeout: TTransportException: Timed out
E20230417 10:30:48.357628 1055689 StorageClientBase-inl.h:143] There some RPC errors: RPC failure in StorageClient with timeout: TTransportException: Timed out
E20230417 10:30:55.794891 1055689 StorageClientBase-inl.h:143] There some RPC errors: RPC failure in StorageClient with timeout: TTransportException: Timed out
E20230417 10:31:02.687793 1055689 StorageClientBase-inl.h:143] There some RPC errors: RPC failure in StorageClient with timeout: TTransportException: Timed out

Storage log

I20230417 10:27:03.026598 1056989 MemoryUtils.cpp:227] sys:50.997GiB/503.287GiB 10.13% usr:5.074GiB/201.295GiB 2.52%
I20230417 10:28:03.026952 1056989 MemoryUtils.cpp:227] sys:51.005GiB/503.287GiB 10.13% usr:5.074GiB/201.295GiB 2.52%
I20230417 10:29:03.027174 1056989 MemoryUtils.cpp:227] sys:51.002GiB/503.287GiB 10.13% usr:5.074GiB/201.295GiB 2.52%
I20230417 10:30:03.025533 1056989 MemoryUtils.cpp:227] sys:59.218GiB/503.287GiB 11.77% usr:12.644GiB/201.295GiB 6.28%
I20230417 10:31:03.026669 1056989 MemoryUtils.cpp:227] sys:83.341GiB/503.287GiB 16.56% usr:37.496GiB/201.295GiB 18.63%
I20230417 10:32:03.027050 1056989 MemoryUtils.cpp:227] sys:115.281GiB/503.287GiB 22.91% usr:69.346GiB/201.295GiB 34.45%
I20230417 10:33:03.027364 1056989 MemoryUtils.cpp:227] sys:132.416GiB/503.287GiB 26.31% usr:86.711GiB/201.295GiB 43.08%
I20230417 10:34:03.026966 1056989 MemoryUtils.cpp:227] sys:157.331GiB/503.287GiB 31.26% usr:111.500GiB/201.295GiB 55.39%
I20230417 10:35:03.026486 1056989 MemoryUtils.cpp:227] sys:182.334GiB/503.287GiB 36.23% usr:136.734GiB/201.295GiB 67.93%
I20230417 10:36:03.026317 1056989 MemoryUtils.cpp:227] sys:206.920GiB/503.287GiB 41.11% usr:160.910GiB/201.295GiB 79.94%
I20230417 10:37:03.026144 1056989 MemoryUtils.cpp:227] sys:230.875GiB/503.287GiB 45.87% usr:184.315GiB/201.295GiB 91.56%
I20230417 10:38:03.026619 1056989 MemoryUtils.cpp:227] sys:248.323GiB/503.287GiB 49.34% usr:201.003GiB/201.295GiB 99.85%
I20230417 10:39:03.176322 1056989 MemoryUtils.cpp:227] sys:212.410GiB/503.287GiB 42.20% usr:152.009GiB/201.295GiB 75.52%
I20230417 10:40:04.027333 1056989 MemoryUtils.cpp:227] sys:183.627GiB/503.287GiB 36.49% usr:117.554GiB/201.295GiB 58.40%
I20230417 10:41:04.324846 1056989 MemoryUtils.cpp:227] sys:165.251GiB/503.287GiB 32.83% usr:90.680GiB/201.295GiB 45.05%
I20230417 10:42:05.232738 1056989 MemoryUtils.cpp:227] sys:153.552GiB/503.287GiB 30.51% usr:78.302GiB/201.295GiB 38.90%
I20230417 10:43:05.278375 1056989 MemoryUtils.cpp:227] sys:143.394GiB/503.287GiB 28.49% usr:75.193GiB/201.295GiB 37.35%
I20230417 10:44:05.277688 1056989 MemoryUtils.cpp:227] sys:164.867GiB/503.287GiB 32.76% usr:100.171GiB/201.295GiB 49.76%
I20230417 10:45:05.277849 1056989 MemoryUtils.cpp:227] sys:186.614GiB/503.287GiB 37.08% usr:125.464GiB/201.295GiB 62.33%
I20230417 10:46:05.279211 1056989 MemoryUtils.cpp:227] sys:208.570GiB/503.287GiB 41.44% usr:150.414GiB/201.295GiB 74.72%
I20230417 10:47:05.278875 1056989 MemoryUtils.cpp:227] sys:231.707GiB/503.287GiB 46.04% usr:174.757GiB/201.295GiB 86.82%
I20230417 10:48:05.278530 1056989 MemoryUtils.cpp:227] sys:255.733GiB/503.287GiB 50.81% usr:199.036GiB/201.295GiB 98.88%
I20230417 10:49:05.278842 1056989 MemoryUtils.cpp:227] sys:230.537GiB/503.287GiB 45.81% usr:162.638GiB/201.295GiB 80.80%
I20230417 10:50:05.277948 1056989 MemoryUtils.cpp:227] sys:191.468GiB/503.287GiB 38.04% usr:115.817GiB/201.295GiB 57.54%
I20230417 10:51:05.278424 1056989 MemoryUtils.cpp:227] sys:148.417GiB/503.287GiB 29.49% usr:71.569GiB/201.295GiB 35.55%
I20230417 10:52:05.277276 1056989 MemoryUtils.cpp:227] sys:100.125GiB/503.287GiB 19.89% usr:31.286GiB/201.295GiB 15.54%
I20230417 10:53:05.278837 1056989 MemoryUtils.cpp:227] sys:52.897GiB/503.287GiB 10.51% usr:5.074GiB/201.295GiB 2.52%
I20230417 10:54:05.278858 1056989 MemoryUtils.cpp:227] sys:52.896GiB/503.287GiB 10.51% usr:5.074GiB/201.295GiB 2.52%
I20230417 10:55:05.278710 1056989 MemoryUtils.cpp:227] sys:52.899GiB/503.287GiB 10.51% usr:5.073GiB/201.295GiB 2.52%
I20230417 10:56:05.278095 1056989 MemoryUtils.cpp:227] sys:52.897GiB/503.287GiB 10.51% usr:5.073GiB/201.295GiB 2.52%

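Execution plan

To meet our requirement, how should we write the nGQL so that it performs better, or what optimizations are possible? Thanks.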


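Your query is fairly large and has very few filters, and the multiple patterns are combined through a Cartesian product. All of this consumes a lot of resources.

Either add more capacity.

Or change the query: try reducing the number of patterns and see whether they can be merged; add filters to cut down the data volume; or split it into several queries.

If what you want is pattern matching, then use MATCH. GO in nGQL has no pattern-matching capability; to use it, you would need to remodel your requirement as a walk over the graph.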
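
A toy model may help show why the comma-separated patterns get expensive. The sketch below is plain Python over made-up data; `count_rows` and `make_edges` are illustrative names, not Nebula internals. It assumes the two patterns leaving the middle vertex are matched independently with no deduplication, so a middle vertex with out-degree d contributes d × d result rows.

```python
from collections import defaultdict

def count_rows(edges, start):
    """Count result rows for (start)->(b), (b)->(c), (b)->(d).

    Toy assumption: the two patterns leaving b are matched independently,
    so every (c, d) combination produces one row.
    """
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    return sum(len(out[b]) ** 2 for b in out[start])

def make_edges(fanout):
    """Hypothetical graph: start vertex "a", 3 middle vertices, `fanout` neighbours each."""
    edges = [("a", f"b{i}") for i in range(3)]
    for i in range(3):
        edges += [(f"b{i}", f"c{i}_{j}") for j in range(fanout)]
    return edges

# Row count grows with the square of the fan-out of the middle vertices.
for fanout in (10, 100, 1000):
    print(fanout, count_rows(make_edges(fanout), "a"))
```

In this model a tenfold increase in fan-out gives a hundredfold increase in rows, which is why adding filters on the middle or end vertices matters more than the number of start vertices.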


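Thanks for the answer. I have a few follow-up questions.

Adding capacity

Does adding capacity mean scaling out the Nebula cluster? Even at the current scale of 6 physical machines, the multi-pattern MATCH statement above times out and maxes out Storage memory, so I suspect that adding resources would not solve the problem.

Changing the query

Merging to reduce the number of patterns

I tested this: after merging patterns, the equivalent statement has a different execution plan.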

MATCH  (abaa9c1ec:person)-[a130d5464:trans]->(a5a5e256e:person), (a5a5e256e:person)-[a358405f5:trans]->(ad2d4c3ae:person), (a5a5e256e:person)-[a22967a8b:trans]->(aacebe3dc:person) 
where id(abaa9c1ec)=='66326d4b4*****762449ddd514'
return abaa9c1ec,ad2d4c3ae,a5a5e256e,aacebe3dc,a130d5464,a358405f5,a22967a8b

MATCH  (a8aef3659:telephone)<-[a0f346eb5:credit_telephone]-(ada7a4f28:incoming_parts)-[ad204f424:credit_person]->(a839a0663:person), 
(a619e6f62:person)-[a10b86735:has_telephone]->(a8aef3659:telephone)
where id(a839a0663)=='35a8fb43d00f8f440c7a16db58747802' return a839a0663,ada7a4f28,a8aef3659,a619e6f62,a10b86735,a0f346eb5,ad204f424

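Merging should improve query performance.

Splitting into multiple queries

We previously used a similar approach for this requirement: split the pattern into several paths, then intersect the per-path results in our own code to guarantee a correct match. When we saw that Nebula supports multi-pattern MATCH, we switched to it. With large data volumes, however, multi-pattern MATCH has serious performance problems.
How to match a subgraph exactly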
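
The split-and-intersect approach described above could look roughly like the following. This is only a sketch: the original implementation is not shown in this thread, and `join_paths` and the sample rows are hypothetical. It assumes each single-path query returns rows as dicts keyed by pattern variable, and the client joins them on the variables they share.

```python
def join_paths(left_rows, right_rows):
    """Hash-join two path result sets on the pattern variables they share.

    If the two sides share no variable, every key is the empty tuple and
    the result degenerates to a Cartesian product.
    """
    if not left_rows or not right_rows:
        return []
    shared = sorted(set(left_rows[0]) & set(right_rows[0]))
    index = {}
    for row in right_rows:
        key = tuple(row[v] for v in shared)
        index.setdefault(key, []).append(row)
    joined = []
    for row in left_rows:
        key = tuple(row[v] for v in shared)
        for match in index.get(key, []):
            joined.append({**row, **match})
    return joined

# Hypothetical results of two single-path queries that share the variable "b".
path1 = [{"a": "v1", "b": "v2"}, {"a": "v1", "b": "v3"}]
path2 = [{"b": "v2", "c": "v4"}, {"b": "v9", "c": "v5"}]
print(join_paths(path1, path2))
```

Only rows whose shared vertex appears on both sides survive, so the cost depends on how large each per-path result is before the join.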

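Switching to GO queries

Is there a good approach to implementing this kind of custom-pattern requirement with GO syntax? Would it be GO plus pipes, querying step by step?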

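May I ask whether you are one of our Enterprise Edition customers?

Which performs better: a MATCH statement matching multiple patterns, or multiple MATCH statements? - #5, by SwimSweet. We discussed this in that thread: for Nebula, the split and merged statements are not equivalent. Two paths separated by "," are connected with a Join. Within a single path, Nebula expands with the Traverse operator. Depending on the data, the two can also differ in speed. For your problem, merging is probably the better way to exploit the strengths of a graph database. If you Join all the time, aren't you back on the road to a relational database? :joy: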
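
A toy model of the difference described in that reply, under loud assumptions: this is plain Python, not Nebula's executor, and `join_plan` and `traverse_plan` are made-up names. Both functions answer the same two-hop question from one start vertex. The join-style plan materializes every edge for the unfiltered second pattern before joining; the traverse-style plan only expands from vertices it has already reached.

```python
from collections import defaultdict

def join_plan(edges, start):
    """Pattern 1 is filtered by `start`; pattern 2 has no filter, so all edges are scanned."""
    left = [(s, d) for s, d in edges if s == start]
    right = list(edges)                       # unfiltered side: full scan
    by_src = defaultdict(list)
    for s, d in right:
        by_src[s].append(d)
    rows = [(a, b, c) for a, b in left for c in by_src[b]]
    return rows, len(right)                   # rows, edges materialized for pattern 2

def traverse_plan(edges, start):
    """Expand hop by hop from the start vertex, touching only reachable edges."""
    out = defaultdict(list)                   # stands in for the storage layer's adjacency lookup
    for s, d in edges:
        out[s].append(d)
    rows, touched = [], 0
    for b in out[start]:
        for c in out[b]:
            touched += 1
            rows.append((start, b, c))
    return rows, touched

# Hypothetical graph: a tiny neighbourhood around "a" plus 10,000 unrelated edges.
edges = [("a", "b"), ("b", "c1"), ("b", "c2")]
edges += [(f"x{i}", f"y{i}") for i in range(10_000)]

rows_j, scanned = join_plan(edges, "a")
rows_t, touched = traverse_plan(edges, "a")
print(sorted(rows_j) == sorted(rows_t), scanned, touched)
```

Both plans return the same rows here, but the join-style plan handles every edge in the graph for its second input, while the traverse-style plan handles only the edges reachable from the start vertex.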

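Sorry, I'm not. We are currently using the Community Edition.

Sorry, I did not fully understand what you meant earlier. After reading the execution plan, I found that when a path carries no filter condition, a full scan returns every path that satisfies it, and that full result is then joined with the matches of the other path, so performance is very poor. For my requirement, I plan to merge the paths that can be merged and make sure every merged path carries a filter condition, which should perform somewhat better.

Yes, that should help.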