Nebula Graphd (3.5.0) crashes

Question template:

  • nebula version: 3.5.0
  • Deployment: distributed, 3 nodes
  • Installation method: RPM
  • In production: N
  • Hardware
    • Disk: Alibaba Cloud PL0 ESSD
    • CPU / memory: 2 cores, 16 GB
  • Problem description

In our test environment, when we run a MATCH path query over bidirectional edges with a 10-hop pattern, graphd crashes after the query has been running for a while (10+ seconds).

The query executed:
MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888] RETURN DISTINCT p LIMIT 300

graphd configuration:

########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-graphd.pid
# Whether to enable optimizer
--enable_optimizer=true
# The default charset when a space is created
--default_charset=utf8
# The default collate when a space is created
--default_collate=utf8_bin
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=/alidata2/nebula-graph3/log
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4; the higher the level, the more verbose the logging
--v=3
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=graphd-stdout.log
--stderr_log_file=graphd-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2
# Whether logging file names contain a timestamp.
--timestamp_in_logfile_name=true
########## query ##########
# Whether to treat partial success as an error.
# This flag is only used for Read-only access, and Modify access always treats partial success as an error.
--accept_partial_success=false
# Maximum sentence length, unit byte
--max_allowed_query_size=4194304

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=172.20.221.57:9559,172.20.221.58:9559,172.20.221.59:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=172.20.221.58
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
# To turn on SO_REUSEPORT or not
--reuse_port=false
# Backlog of the listen socket, adjust this together with net.core.somaxconn
--listen_backlog=1024
# The number of seconds Nebula service waits before closing the idle connections
--client_idle_timeout_secs=28800
# The number of seconds before idle sessions expire
# The range should be in [1, 604800]
--session_idle_timeout_secs=28800
# The number of threads to accept incoming connections
--num_accept_threads=1
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=0
# Max active connections for all networking threads. 0 means no limit.
# Max connections for each networking thread = num_max_connections / num_netio_threads
--num_max_connections=0
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=0
# HTTP service ip
--ws_ip=172.20.221.58
# HTTP service port
--ws_http_port=19669
# storage client timeout
--storage_client_timeout_ms=60000
# slow query threshold in us
--slow_query_threshold_us=200000
# Port to listen on Meta with HTTP protocol, it corresponds to ws_http_port in metad's configuration file
--ws_meta_http_port=19559

########## authentication ##########
# Enable authorization
--enable_authorize=false
# User login authentication type, password for nebula authentication, ldap for ldap authentication, cloud for cloud authentication
--auth_type=password

########## memory ##########
# System memory high watermark ratio, cancel the memory checking when the ratio greater than 1.0
--system_memory_high_watermark_ratio=0.9

########## metrics ##########
--enable_space_level_metrics=true

########## experimental feature ##########
# if use experimental features
--enable_experimental_feature=true

# if use balance data feature, only work if enable_experimental_feature is true
--enable_data_balance=true

# enable udf, written in c++ only for now
--enable_udf=false

# set the directory where the .so files of udf are stored, when enable_udf is true
--udf_path=/alidata2/nebula-graph3/udf/

########## session ##########
# Maximum number of sessions that can be created per IP and per user
--max_sessions_per_ip_per_user=500

########## memory tracker ##########
# trackable memory ratio (trackable_memory / (total_memory - untracked_reserved_memory) )
--memory_tracker_limit_ratio=0.8
# untracked reserved memory in Mib
--memory_tracker_untracked_reserved_memory_mb=50

# enable log memory tracker stats periodically
--memory_tracker_detail_log=true
# log memory tracker stats interval in milliseconds
--memory_tracker_detail_log_interval_ms=60000

# enable memory background purge (if jemalloc is used)
--memory_purge_enabled=true
# memory background purge interval in seconds
--memory_purge_interval_seconds=10

########## performance optimization ##########
# The max job size in multi job mode
--max_job_size=1
# The min batch size for handling dataset in multi job mode, only enabled when max_job_size is greater than 1
--min_batch_size=8192
# if true, return directly without go through RPC
--optimize_appendvertices=false
# number of paths constructed by each thread
--path_batch_size=10000
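
For reference: the memory tracker comment above defines the ratio as trackable_memory / (total_memory - untracked_reserved_memory), so on this 16 GB host the tracker budget works out to roughly 0.8 × (15.25 GiB - 50 MiB) ≈ 12.16 GiB, which matches the usr limit of 12.161 GiB that MemoryUtils reports in the INFO log below.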

graphd INFO log:

I20230627 13:36:45.297482  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:36:45.297497  5080 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:36:45.298338  5080 MetaClient.cpp:2680] Metad last update time: 1687844119609
I20230627 13:36:54.377835  5120 Acceptor.cpp:476] Acceptor=0x7f6db1874528 onEmpty()
I20230627 13:36:54.377879  5120 Acceptor.cpp:476] Acceptor=0x7f6db1874528 onEmpty()
I20230627 13:36:54.378486  5120 Acceptor.cpp:476] Acceptor=0x7f6db1874528 onEmpty()
I20230627 13:36:55.308418  5129 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230627 13:36:55.308517  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:36:55.308532  5120 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:36:55.309391  5120 MetaClient.cpp:2680] Metad last update time: 1687844119609
I20230627 13:37:04.249007  5080 Acceptor.cpp:476] Acceptor=0x7f6db1874028 onEmpty()
I20230627 13:37:04.249070  5080 Acceptor.cpp:476] Acceptor=0x7f6db1874028 onEmpty()
I20230627 13:37:04.253031  5080 Acceptor.cpp:476] Acceptor=0x7f6db1874028 onEmpty()
I20230627 13:37:05.318737  5129 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230627 13:37:05.318833  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:37:05.318846  5080 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:37:05.319660  5080 MetaClient.cpp:2680] Metad last update time: 1687844119609
I20230627 13:37:15.329375  5129 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230627 13:37:15.329465  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:37:15.329480  5120 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:37:15.330266  5120 MetaClient.cpp:2680] Metad last update time: 1687844119609
I20230627 13:37:18.544281  5120 Acceptor.cpp:476] Acceptor=0x7f6db1874528 onEmpty()
I20230627 13:37:18.544342  5120 Acceptor.cpp:476] Acceptor=0x7f6db1874528 onEmpty()
I20230627 13:37:18.590086  5120 Acceptor.cpp:476] Acceptor=0x7f6db1874528 onEmpty()
I20230627 13:37:18.612643  5080 Acceptor.cpp:476] Acceptor=0x7f6db1874028 onEmpty()
I20230627 13:37:18.612695  5080 Acceptor.cpp:476] Acceptor=0x7f6db1874028 onEmpty()
I20230627 13:37:18.638947  5080 Acceptor.cpp:476] Acceptor=0x7f6db1874028 onEmpty()
I20230627 13:37:25.340356  5129 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230627 13:37:25.340485  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:37:25.340504  5080 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:37:25.341346  5080 MetaClient.cpp:2680] Metad last update time: 1687844119609
I20230627 13:37:28.028215  5120 Acceptor.cpp:476] Acceptor=0x7f6db1874528 onEmpty()
I20230627 13:37:28.028281  5120 Acceptor.cpp:476] Acceptor=0x7f6db1874528 onEmpty()
I20230627 13:37:28.055213  5069 GraphService.cpp:77] Authenticating user root from 192.168.28.30:56540
I20230627 13:37:28.055370  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:37:28.055392  5120 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:37:28.056380  5069 GraphSessionManager.cpp:139] Create session id: 1687844272165390, for user: root
I20230627 13:37:28.056427  5069 GraphService.cpp:111] Create session doFinish
I20230627 13:37:28.087116  5070 GraphSessionManager.cpp:40] Find session from cache: 1687844272165390
I20230627 13:37:28.087208  5070 ClientSession.cpp:43] Add query: USE data_asset_10022, epId: 0
I20230627 13:37:28.087226  5070 QueryInstance.cpp:80] Parsing query: USE data_asset_10022
I20230627 13:37:28.087379  5070 Symbols.cpp:48] New variable for: __Start_0
I20230627 13:37:28.087397  5070 PlanNode.cpp:27] New variable: __Start_0
I20230627 13:37:28.087441  5070 Symbols.cpp:48] New variable for: __RegisterSpaceToSession_1
I20230627 13:37:28.087450  5070 PlanNode.cpp:27] New variable: __RegisterSpaceToSession_1
I20230627 13:37:28.087466  5070 Validator.cpp:409] root: RegisterSpaceToSession tail: Start
I20230627 13:37:28.087786  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:37:28.087805  5080 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:37:28.088167  5069 SwitchSpaceExecutor.cpp:45] Graph switched to `data_asset_10022', space id: 127
I20230627 13:37:28.088194  5069 QueryInstance.cpp:128] Finish query: USE data_asset_10022
I20230627 13:37:28.088249  5069 StatsManager.cpp:159] Registered histogram query_latency_us{space=data_asset_10022} [bucketSize: 1000, min value: 0, max value: 2000]
I20230627 13:37:28.088264  5069 ClientSession.cpp:52] Delete query, epId: 0
I20230627 13:37:28.102105  5070 GraphSessionManager.cpp:40] Find session from cache: 1687844272165390
I20230627 13:37:28.102183  5070 StatsManager.cpp:107] Registered stats num_queries{space=data_asset_10022}
I20230627 13:37:28.102208  5070 StatsManager.cpp:107] Registered stats num_active_queries{space=data_asset_10022}
I20230627 13:37:28.102223  5070 ClientSession.cpp:43] Add query: MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888]  RETURN DISTINCT p  LIMIT 300, epId: 1
I20230627 13:37:28.102232  5070 QueryInstance.cpp:80] Parsing query: MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888]  RETURN DISTINCT p  LIMIT 300
I20230627 13:37:28.102499  5070 StatsManager.cpp:107] Registered stats num_sentences{space=data_asset_10022}
I20230627 13:37:28.102524  5070 Symbols.cpp:48] New variable for: __Start_0
I20230627 13:37:28.102537  5070 PlanNode.cpp:27] New variable: __Start_0
I20230627 13:37:28.102552  5070 Validator.cpp:350] Space chosen, name: data_asset_10022 id: 127
I20230627 13:37:28.102717  5070 Symbols.cpp:48] New variable for: __VAR_0
I20230627 13:37:28.102731  5070 AnonVarGenerator.h:28] Build anon var: __VAR_0
I20230627 13:37:28.102749  5070 Symbols.cpp:48] New variable for: __PassThrough_1
I20230627 13:37:28.102762  5070 PlanNode.cpp:27] New variable: __PassThrough_1
I20230627 13:37:28.102774  5070 Symbols.cpp:48] New variable for: __Dedup_2
I20230627 13:37:28.102782  5070 PlanNode.cpp:27] New variable: __Dedup_2
I20230627 13:37:28.102797  5070 MatchPathPlanner.cpp:126] Find starts: 0, Pattern has 1 edges, root: __Dedup_2, colNames: _vid
I20230627 13:37:28.102809  5070 Symbols.cpp:48] New variable for: __Start_3
I20230627 13:37:28.102815  5070 PlanNode.cpp:27] New variable: __Start_3
I20230627 13:37:28.102830  5070 Symbols.cpp:48] New variable for: __Traverse_4
I20230627 13:37:28.102838  5070 PlanNode.cpp:27] New variable: __Traverse_4
I20230627 13:37:28.102991  5070 ServerBasedSchemaManager.cpp:68] Get Edge Schema Space 127, EdgeType 150, Version -1
I20230627 13:37:28.103017  5070 Symbols.cpp:48] New variable for: __AppendVertices_5
I20230627 13:37:28.103037  5070 PlanNode.cpp:27] New variable: __AppendVertices_5
I20230627 13:37:28.103166  5070 Symbols.cpp:48] New variable for: __Project_6
I20230627 13:37:28.103178  5070 PlanNode.cpp:27] New variable: __Project_6
I20230627 13:37:28.103204  5070 Symbols.cpp:48] New variable for: __Project_7
I20230627 13:37:28.103220  5070 PlanNode.cpp:27] New variable: __Project_7
I20230627 13:37:28.103232  5070 Symbols.cpp:48] New variable for: __Dedup_8
I20230627 13:37:28.103240  5070 PlanNode.cpp:27] New variable: __Dedup_8
I20230627 13:37:28.103255  5070 Symbols.cpp:48] New variable for: __Limit_9
I20230627 13:37:28.103261  5070 PlanNode.cpp:27] New variable: __Limit_9
I20230627 13:37:28.103281  5070 ReturnClausePlanner.cpp:52] return root: __Limit_9 colNames: p
I20230627 13:37:28.103292  5070 MatchPlanner.cpp:172] root(Limit_9): __Limit_9, tail(Start_3): __Start_3
I20230627 13:37:28.103304  5070 Validator.cpp:409] root: Limit tail: Start
I20230627 13:37:28.103313  5070 Validator.cpp:409] root: Limit tail: Start
I20230627 13:37:28.103390  5070 Symbols.cpp:48] New variable for: __Project_10
I20230627 13:37:28.103402  5070 PlanNode.cpp:27] New variable: __Project_10
I20230627 13:37:28.103574  5070 StatsManager.cpp:159] Registered histogram optimizer_latency_us{space=data_asset_10022} [bucketSize: 1000, min value: 0, max value: 2000]
I20230627 13:37:28.104048  5120 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.57":9779, trying to create one
I20230627 13:37:28.104074  5120 ThriftClientManager-inl.h:74] Connecting to "172.20.221.57":9779 for 1 times
I20230627 13:37:28.108047  5080 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.59":9779, trying to create one
I20230627 13:37:28.108075  5080 ThriftClientManager-inl.h:74] Connecting to "172.20.221.59":9779 for 1 times
I20230627 13:37:28.108608  5080 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.58":9779, trying to create one
I20230627 13:37:28.108656  5080 ThriftClientManager-inl.h:74] Connecting to "172.20.221.58":9779 for 2 times
I20230627 13:37:28.109315  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230627 13:37:28.149472  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230627 13:37:28.149734  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230627 13:37:28.149968  5120 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.58":9779, trying to create one
I20230627 13:37:28.149988  5120 ThriftClientManager-inl.h:74] Connecting to "172.20.221.58":9779 for 2 times
I20230627 13:37:28.174249  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230627 13:37:28.174695  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230627 13:37:28.185066  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230627 13:37:28.208976  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230627 13:37:28.209187  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230627 13:37:28.209724  5120 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.59":9779, trying to create one
I20230627 13:37:28.209753  5120 ThriftClientManager-inl.h:74] Connecting to "172.20.221.59":9779 for 3 times
I20230627 13:37:28.220980  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230627 13:37:28.221318  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230627 13:37:28.221410  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230627 13:37:28.225270  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230627 13:37:28.225446  5080 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.57":9779, trying to create one
I20230627 13:37:28.225466  5080 ThriftClientManager-inl.h:74] Connecting to "172.20.221.57":9779 for 3 times
I20230627 13:37:28.225734  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230627 13:37:35.350646  5129 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230627 13:37:35.350788  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:37:35.350805  5080 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:37:35.351648  5080 MetaClient.cpp:2680] Metad last update time: 1687844119609
I20230627 13:37:44.538403  5132 MemoryUtils.cpp:227] sys:12.870GiB/15.250GiB 84.39% usr:11.681GiB/12.161GiB 96.05%
I20230627 13:37:44.682644  5130 GraphSessionManager.cpp:199] Try to reclaim expired sessions out of 2 ones
I20230627 13:37:44.682714  5130 GraphSessionManager.cpp:205] SessionId: 1687844272165390, idleSecs: 17
I20230627 13:37:44.682724  5130 GraphSessionManager.cpp:205] SessionId: 1687836180124875, idleSecs: 180
I20230627 13:37:44.682732  5130 GraphSessionManager.cpp:240] Add Update session id: 1687844272165390
I20230627 13:37:44.682746  5130 GraphSessionManager.cpp:240] Add Update session id: 1687836180124875
I20230627 13:37:44.682858  5120 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:37:44.682878  5120 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:37:45.360337  5129 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230627 13:37:45.360476  5080 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230627 13:37:45.360492  5080 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230627 13:37:45.361410  5080 MetaClient.cpp:2680] Metad last update time: 1687844119609

graphd ERROR log:
No relevant entries.

Dashboard: node memory watermark

Dashboard: CPU watermark

Could you first post the contents of graphd-stderr.log?

As shown in the screenshot above, the log is empty.

Tried again; no graphd-stderr.log was generated.

Is there any other way to investigate this?

Is there a corresponding core file?

Try the command egrep -i -r 'killed process' /var/log -C 5 to check the Linux logs and see whether nebula-graphd was killed by the system because the process used too much memory.
Also, the monitoring chart you posted spans too long a period to show the fluctuation. Try shortening the monitoring window to a few minutes, run the statement again, and watch the memory usage.
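In case /var/log/messages has been rotated or the matches only sit in the binary systemd journal, the same check can be run against the kernel log, for example (generic Linux commands, nothing Nebula-specific):

dmesg -T | egrep -i 'out of memory|killed process'
journalctl -k | egrep -i 'out of memory|killed process'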


I'll try it shortly and report back here.

[root@data-cdh6-test02 ~]# egrep -i -r 'killed process' /var/log -C 5
Binary file /var/log/journal/20190711105006363114529432776998/system.journal matches
Binary file /var/log/journal/20190711105006363114529432776998/system@ff28feb636ba42dbb3482a67bee1a596-000000000009684b-0005f8b9da77c44b.journal matches
--
/var/log/messages-Jun 27 11:55:47 data-cdh6-test02 kernel: [23012]  1003 23012    28886      113      14        0             0 bash
/var/log/messages-Jun 27 11:55:47 data-cdh6-test02 kernel: [23061]     0 23061    53675      272      60        0             0 sudo
/var/log/messages-Jun 27 11:55:47 data-cdh6-test02 kernel: [23062]     0 23062    28919      140      13        0             0 bash
/var/log/messages-Jun 27 11:55:47 data-cdh6-test02 kernel: [25562]     0 25562  3528242  3049375    6033        0             0 nebula-graphd
/var/log/messages-Jun 27 11:55:47 data-cdh6-test02 kernel: Out of memory: Kill process 25562 (nebula-graphd) score 741 or sacrifice child
/var/log/messages:Jun 27 11:55:47 data-cdh6-test02 kernel: Killed process 25562 (nebula-graphd) total-vm:14112968kB, anon-rss:12197500kB, file-rss:0kB, shmem-rss:0kB
/var/log/messages-Jun 27 11:56:10 data-cdh6-test02 super: root  /opt/nebula-graph-3.5.0/scripts [23062]: 2023-06-27 11:56:10 sh  nebula.service status graphd [0]
/var/log/messages-Jun 27 11:56:13 data-cdh6-test02 super: root  /opt/nebula-graph-3.5.0/scripts [23062]: 2023-06-27 11:56:10 sh  nebula.service status graphd [0]
/var/log/messages-Jun 27 11:56:13 data-cdh6-test02 super: root  /opt/nebula-graph-3.5.0/scripts [23062]: 2023-06-27 11:56:10 sh  nebula.service status graphd [0]
/var/log/messages-Jun 27 11:56:20 data-cdh6-test02 super: root  /opt/nebula-graph-3.5.0/scripts [23062]: 2023-06-27 11:56:20 LL [127]
/var/log/messages-Jun 27 11:56:20 data-cdh6-test02 super: root  /opt/nebula-graph-3.5.0/scripts [23062]: 2023-06-27 11:56:20 LL [127]
--
/var/log/messages-Jun 27 13:37:49 data-cdh6-test02 kernel: [ 4945]     0  4945   147947     3506     143        0             0 nebula-metad
/var/log/messages-Jun 27 13:37:49 data-cdh6-test02 kernel: [ 5052]     0  5052  4197170  3736721    7378        0             0 nebula-graphd
/var/log/messages-Jun 27 13:37:49 data-cdh6-test02 kernel: [ 5152]     0  5152   166619    10101     169        0             0 nebula-storaged
/var/log/messages-Jun 27 13:37:49 data-cdh6-test02 kernel: [ 5722]     0  5722    27024       33       9        0             0 tail
/var/log/messages-Jun 27 13:37:49 data-cdh6-test02 kernel: Out of memory: Kill process 5052 (nebula-graphd) score 908 or sacrifice child
/var/log/messages:Jun 27 13:37:49 data-cdh6-test02 kernel: Killed process 5052 (nebula-graphd) total-vm:16788680kB, anon-rss:14946884kB, file-rss:0kB, shmem-rss:0kB
/var/log/messages-Jun 27 13:38:12 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [3997]: 2023-06-27 13:36:55 tail -f nebula-graphd.data-cdh6-test02.root.log.INFO.20230627-133443.5052 [130]
/var/log/messages-Jun 27 13:38:13 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [3997]: 2023-06-27 13:36:55 tail -f nebula-graphd.data-cdh6-test02.root.log.INFO.20230627-133443.5052 [130]
/var/log/messages-Jun 27 13:38:13 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [3997]: 2023-06-27 13:36:55 tail -f nebula-graphd.data-cdh6-test02.root.log.INFO.20230627-133443.5052 [130]
/var/log/messages-Jun 27 13:38:14 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [3997]: 2023-06-27 13:38:14 ll [0]
/var/log/messages-Jun 27 13:38:43 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [3997]: 2023-06-27 13:38:43 /opt/nebula-graph-3.5.0/scripts/nebula.service status graphd [0]
--
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: [11334]  1003 11334    28886      123      13        0             0 bash
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: [11369]     0 11369    53675      282      59        0             0 sudo
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: [11370]     0 11370    28920      152      13        0             0 bash
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: [11504]     0 11504  4195890  3732846    7372        0             0 nebula-graphd
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: Out of memory: Kill process 11504 (nebula-graphd) score 907 or sacrifice child
/var/log/messages:Jun 27 14:36:20 data-cdh6-test02 kernel: Killed process 11504 (nebula-graphd) total-vm:16783560kB, anon-rss:14931384kB, file-rss:0kB, shmem-rss:0kB
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: AliNet invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: AliNet cpuset=/ mems_allowed=0
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: CPU: 0 PID: 23972 Comm: AliNet Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7.x86_64 #1
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e623647 04/01/2014
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: Call Trace:
--
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: [11334]  1003 11334    28886      123      13        0             0 bash
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: [11369]     0 11369    53675      282      59        0             0 sudo
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: [11370]     0 11370    28920      152      13        0             0 bash
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: [11523]     0 11504  4195890  3733220    7372        0             0 executor-pri3-1
/var/log/messages-Jun 27 14:36:20 data-cdh6-test02 kernel: Out of memory: Kill process 11523 (executor-pri3-1) score 907 or sacrifice child
/var/log/messages:Jun 27 14:36:20 data-cdh6-test02 kernel: Killed process 11523 (executor-pri3-1) total-vm:16783560kB, anon-rss:14932560kB, file-rss:320kB, shmem-rss:0kB
/var/log/messages-Jun 27 14:36:27 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:35:47 ll [0]
/var/log/messages-Jun 27 14:36:27 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:35:47 ll [0]
/var/log/messages-Jun 27 14:36:28 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:35:47 ll [0]
/var/log/messages-Jun 27 14:36:38 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:36:38 cat nebula-graphd.data-cdh6-test02.root.log.INFO.20230627-142700.11504 [0]
/var/log/messages-Jun 27 14:37:15 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:36:38 cat nebula-graphd.data-cdh6-test02.root.log.INFO.20230627-142700.11504 [0]
--
/var/log/messages-Jun 27 14:42:02 data-cdh6-test02 kernel: [11334]  1003 11334    28886      109      13        0             0 bash
/var/log/messages-Jun 27 14:42:02 data-cdh6-test02 kernel: [11369]     0 11369    53675      267      59        0             0 sudo
/var/log/messages-Jun 27 14:42:02 data-cdh6-test02 kernel: [11370]     0 11370    28920      151      13        0             0 bash
/var/log/messages-Jun 27 14:42:02 data-cdh6-test02 kernel: [13116]     0 13116  4194482  3735425    7377        0             0 nebula-graphd
/var/log/messages-Jun 27 14:42:02 data-cdh6-test02 kernel: Out of memory: Kill process 13116 (nebula-graphd) score 908 or sacrifice child
/var/log/messages:Jun 27 14:42:02 data-cdh6-test02 kernel: Killed process 13116 (nebula-graphd) total-vm:16777928kB, anon-rss:14941700kB, file-rss:0kB, shmem-rss:0kB
/var/log/messages-Jun 27 14:42:06 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:42:06 /opt/nebula-graph-3.5.0/scripts/nebula.service status graphd [0]
/var/log/messages-Jun 27 14:42:07 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:42:06 /opt/nebula-graph-3.5.0/scripts/nebula.service status graphd [0]
/var/log/messages-Jun 27 14:42:07 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:42:06 /opt/nebula-graph-3.5.0/scripts/nebula.service status graphd [0]
/var/log/messages-Jun 27 14:42:07 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:42:06 /opt/nebula-graph-3.5.0/scripts/nebula.service status graphd [0]
/var/log/messages-Jun 27 14:42:24 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:42:16 tail -f nebula-graphd.data-cdh6-test02.root.log.INFO.20230627-143941.13116 [130]
--
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11334]  1003 11334    28886      109      13        0             0 bash
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11369]     0 11369    53675      267      59        0             0 sudo
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11370]     0 11370    28920      186      13        0             0 bash
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [14148]     0 14148  4194482  3723812    7352        0             0 nebula-graphd
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: Out of memory: Kill process 14148 (nebula-graphd) score 905 or sacrifice child
/var/log/messages:Jun 27 14:45:45 data-cdh6-test02 kernel: Killed process 14148 (nebula-graphd) total-vm:16777928kB, anon-rss:14895248kB, file-rss:0kB, shmem-rss:0kB
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: nebula-storaged invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: nebula-storaged cpuset=/ mems_allowed=0
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: CPU: 1 PID: 5245 Comm: nebula-storaged Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7.x86_64 #1
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e623647 04/01/2014
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: Call Trace:
--
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11334]  1003 11334    28886      109      13        0             0 bash
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11369]     0 11369    53675      267      59        0             0 sudo
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11370]     0 11370    28920      186      13        0             0 bash
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [14152]     0 14148  4194482  3724126    7352        0             0 HTTPSrvExec0
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: Out of memory: Kill process 14152 (HTTPSrvExec0) score 905 or sacrifice child
/var/log/messages:Jun 27 14:45:45 data-cdh6-test02 kernel: Killed process 14152 (HTTPSrvExec0) total-vm:16777928kB, anon-rss:14896280kB, file-rss:224kB, shmem-rss:0kB
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: AliYunDun invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: AliYunDun cpuset=/ mems_allowed=0
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: CPU: 1 PID: 20231 Comm: AliYunDun Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7.x86_64 #1
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e623647 04/01/2014
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: Call Trace:
--
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11334]  1003 11334    28886      102      13        0             0 bash
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11369]     0 11369    53675      259      59        0             0 sudo
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [11370]     0 11370    28920      178      13        0             0 bash
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: [14166]     0 14148  4194482  3724164    7352        0             0 executor-pri3-1
/var/log/messages-Jun 27 14:45:45 data-cdh6-test02 kernel: Out of memory: Kill process 14166 (executor-pri3-1) score 905 or sacrifice child
/var/log/messages:Jun 27 14:45:45 data-cdh6-test02 kernel: Killed process 14166 (executor-pri3-1) total-vm:16777928kB, anon-rss:14896360kB, file-rss:296kB, shmem-rss:0kB
/var/log/messages-Jun 27 14:46:01 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:46:01 cat nebula-storaged.data-cdh6-test02.root.log.ERROR.20230627-143147.5152 [0]
/var/log/messages-Jun 27 14:46:25 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:46:24 /opt/nebula-graph-3.5.0/scripts/nebula.service status all [0]
/var/log/messages-Jun 27 14:46:26 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:46:24 /opt/nebula-graph-3.5.0/scripts/nebula.service status all [0]
/var/log/messages-Jun 27 14:46:26 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:46:24 /opt/nebula-graph-3.5.0/scripts/nebula.service status all [0]
/var/log/messages-Jun 27 14:46:29 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 14:46:29 ll [0]
--
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: [11369]     0 11369    53675      259      59        0             0 sudo
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: [11370]     0 11370    28920      193      13        0             0 bash
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: [15066]     0 15066  4195890  3732944    7368        0             0 nebula-graphd
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: [16027]     0 16027    27024       80      10        0             0 tail
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: Out of memory: Kill process 15066 (nebula-graphd) score 907 or sacrifice child
/var/log/messages:Jun 27 14:57:51 data-cdh6-test02 kernel: Killed process 15066 (nebula-graphd) total-vm:16783560kB, anon-rss:14931380kB, file-rss:396kB, shmem-rss:0kB
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: systemd-journal cpuset=/ mems_allowed=0
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: CPU: 1 PID: 586 Comm: systemd-journal Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7.x86_64 #1
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e623647 04/01/2014
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: Call Trace:
--
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: [11369]     0 11369    53675      259      59        0             0 sudo
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: [11370]     0 11370    28920      193      13        0             0 bash
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: [15086]     0 15066  4195890  3733292    7368        0             0 executor-pri3-1
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: [16027]     0 16027    27024       80      10        0             0 tail
/var/log/messages-Jun 27 14:57:51 data-cdh6-test02 kernel: Out of memory: Kill process 15086 (executor-pri3-1) score 907 or sacrifice child
/var/log/messages:Jun 27 14:57:51 data-cdh6-test02 kernel: Killed process 15086 (executor-pri3-1) total-vm:16783560kB, anon-rss:14932416kB, file-rss:752kB, shmem-rss:0kB
/var/log/messages-Jun 27 14:58:06 data-cdh6-test02 systemd: Started Session 140594 of user super.
/var/log/messages-Jun 27 14:58:06 data-cdh6-test02 systemd-logind: New session 140594 of user super.
/var/log/messages-Jun 27 14:58:06 data-cdh6-test02 super: super 172.27.67.23 34092 172.20.221.58 22 /home/super [16083]: 2023-06-26 18:23:08 sudo -i [0]
/var/log/messages-Jun 27 14:58:14 data-cdh6-test02 super: root  /root [16117]: 2023-06-27 13:54:45 top [0]
/var/log/messages-Jun 27 14:58:15 data-cdh6-test02 super: root  /root [16117]: 2023-06-27 13:54:45 top [0]
--
/var/log/messages-Jun 27 15:02:29 data-cdh6-test02 kernel: [16116]     0 16116    53675      270      59        0             0 sudo
/var/log/messages-Jun 27 15:02:29 data-cdh6-test02 kernel: [16117]     0 16117    28919      139      14        0             0 bash
/var/log/messages-Jun 27 15:02:29 data-cdh6-test02 kernel: [16610]     0 16610  4194484  3729287    7364        0             0 nebula-graphd
/var/log/messages-Jun 27 15:02:29 data-cdh6-test02 kernel: [16648]     0 16648    40534      175      35        0             0 top
/var/log/messages-Jun 27 15:02:29 data-cdh6-test02 kernel: Out of memory: Kill process 16610 (nebula-graphd) score 906 or sacrifice child
/var/log/messages:Jun 27 15:02:29 data-cdh6-test02 kernel: Killed process 16610 (nebula-graphd) total-vm:16777936kB, anon-rss:14917148kB, file-rss:0kB, shmem-rss:0kB
/var/log/messages-Jun 27 15:02:32 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:02:32 nebula-graphd.data-cdh6-test02.root.log.ERROR.20230627-150203.16610 [127]
/var/log/messages-Jun 27 15:02:38 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:02:35 tail -f nebula-graphd.data-cdh6-test02.root.log.ERROR.20230627-150203.16610 [130]
/var/log/messages-Jun 27 15:02:38 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:02:35 tail -f nebula-graphd.data-cdh6-test02.root.log.ERROR.20230627-150203.16610 [130]
/var/log/messages-Jun 27 15:02:38 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:02:35 tail -f nebula-graphd.data-cdh6-test02.root.log.ERROR.20230627-150203.16610 [130]
/var/log/messages-Jun 27 15:02:38 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:02:35 tail -f nebula-graphd.data-cdh6-test02.root.log.ERROR.20230627-150203.16610 [130]
--
/var/log/messages-Jun 27 15:06:11 data-cdh6-test02 kernel: [16116]     0 16116    53675      260      59        0             0 sudo
/var/log/messages-Jun 27 15:06:11 data-cdh6-test02 kernel: [16117]     0 16117    28919      130      14        0             0 bash
/var/log/messages-Jun 27 15:06:11 data-cdh6-test02 kernel: [16648]     0 16648    40534      176      35        0             0 top
/var/log/messages-Jun 27 15:06:11 data-cdh6-test02 kernel: [17332]     0 17332  4195890  3729406    7361        0             0 nebula-graphd
/var/log/messages-Jun 27 15:06:11 data-cdh6-test02 kernel: Out of memory: Kill process 17332 (nebula-graphd) score 906 or sacrifice child
/var/log/messages:Jun 27 15:06:11 data-cdh6-test02 kernel: Killed process 17332 (nebula-graphd) total-vm:16783560kB, anon-rss:14917624kB, file-rss:0kB, shmem-rss:0kB
/var/log/messages-Jun 27 15:06:12 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:05:30  /opt/nebula-graph-3.5.0/scripts/nebula.service start graphd [0]
/var/log/messages-Jun 27 15:06:12 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:05:30  /opt/nebula-graph-3.5.0/scripts/nebula.service start graphd [0]
/var/log/messages-Jun 27 15:06:13 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:06:13 ll [0]
/var/log/messages-Jun 27 15:06:27 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:06:27 tail =f nebula-graphd.data-cdh6-test02.root.log.INFO.20230627-150530.17332 [1]
/var/log/messages-Jun 27 15:07:18 data-cdh6-test02 super: root  /alidata2/nebula-graph3/log [11370]: 2023-06-27 15:06:27 tail =f nebula-graphd.data-cdh6-test02.root.log.INFO.20230627-150530.17332 [1]
--
/var/log/messages-Jun 27 16:28:39 data-cdh6-test02 kernel: [11369]     0 11369    53675      259      59        0             0 sudo

Here is the memory usage over a stretched time window. The machine is 2 cores / 16 GB and the watermark is 0.8. When the watermark is reached, is graphd really just killed outright?
Shouldn't the watermark protection reject the query instead?

Solved: --system_memory_high_watermark_ratio=0.9 was set too high. Thanks!


Hello, during performance testing last night we found that the watermark intercepts queries well when memory is already above the watermark. But if memory is still below the watermark when the request arrives, a single query that consumes huge resources can shoot straight past the watermark, exhaust all memory, and crash graphd. (Judging from the timestamps of the MemoryUtils lines below, the memory check seems to run only every few seconds, so a query that allocates several GiB between two checks can overshoot before anything stops it.)

We have adjusted the watermark parameters as follows (dashboard screenshot):

Query:
MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888] RETURN DISTINCT p LIMIT 300

graphd log:

I20230630 13:41:51.127751  8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.41% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:41:54.127897  8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.41% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:41:57.129904  8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.40% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:42:00.128515  8315 MemoryUtils.cpp:227] sys:1.437GiB/15.250GiB 9.42% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:42:00.460376  8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:00.460444  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:00.460459  8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:00.461441  8292 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:02.843515  8282 GraphService.cpp:77] Authenticating user root from 192.168.28.30:53815
I20230630 13:42:02.843633  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:02.843653  8312 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:02.844628  8282 GraphSessionManager.cpp:139] Create session id: 1688103841596307, for user: root
I20230630 13:42:02.844677  8282 GraphService.cpp:111] Create session doFinish
I20230630 13:42:02.856063  8282 GraphSessionManager.cpp:40] Find session from cache: 1688103841596307
I20230630 13:42:02.856137  8283 ClientSession.cpp:43] Add query: USE `data_asset_10022`;, epId: 0
I20230630 13:42:02.856153  8283 QueryInstance.cpp:80] Parsing query: USE `data_asset_10022`;
I20230630 13:42:02.856284  8283 Symbols.cpp:48] New variable for: __Start_0
I20230630 13:42:02.856295  8283 PlanNode.cpp:27] New variable: __Start_0
I20230630 13:42:02.856319  8283 Symbols.cpp:48] New variable for: __RegisterSpaceToSession_1
I20230630 13:42:02.856325  8283 PlanNode.cpp:27] New variable: __RegisterSpaceToSession_1
I20230630 13:42:02.856338  8283 Validator.cpp:409] root: RegisterSpaceToSession tail: Start
I20230630 13:42:02.856627  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:02.856642  8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:02.856972  8283 SwitchSpaceExecutor.cpp:45] Graph switched to `data_asset_10022', space id: 127
I20230630 13:42:02.856995  8283 QueryInstance.cpp:128] Finish query: USE `data_asset_10022`;
I20230630 13:42:02.857013  8283 ClientSession.cpp:52] Delete query, epId: 0
I20230630 13:42:02.868083  8283 GraphSessionManager.cpp:40] Find session from cache: 1688103841596307
I20230630 13:42:02.868115  8283 ClientSession.cpp:43] Add query: MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888]  RETURN DISTINCT p  LIMIT 300, epId: 1
I20230630 13:42:02.868131  8283 QueryInstance.cpp:80] Parsing query: MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888]  RETURN DISTINCT p  LIMIT 300
I20230630 13:42:02.868371  8283 Symbols.cpp:48] New variable for: __Start_0
I20230630 13:42:02.868379  8283 PlanNode.cpp:27] New variable: __Start_0
I20230630 13:42:02.868388  8283 Validator.cpp:350] Space chosen, name: data_asset_10022 id: 127
I20230630 13:42:02.868533  8283 Symbols.cpp:48] New variable for: __VAR_0
I20230630 13:42:02.868541  8283 AnonVarGenerator.h:28] Build anon var: __VAR_0
I20230630 13:42:02.868553  8283 Symbols.cpp:48] New variable for: __PassThrough_1
I20230630 13:42:02.868558  8283 PlanNode.cpp:27] New variable: __PassThrough_1
I20230630 13:42:02.868566  8283 Symbols.cpp:48] New variable for: __Dedup_2
I20230630 13:42:02.868570  8283 PlanNode.cpp:27] New variable: __Dedup_2
I20230630 13:42:02.868579  8283 MatchPathPlanner.cpp:126] Find starts: 0, Pattern has 1 edges, root: __Dedup_2, colNames: _vid
I20230630 13:42:02.868587  8283 Symbols.cpp:48] New variable for: __Start_3
I20230630 13:42:02.868590  8283 PlanNode.cpp:27] New variable: __Start_3
I20230630 13:42:02.868599  8283 Symbols.cpp:48] New variable for: __Traverse_4
I20230630 13:42:02.868604  8283 PlanNode.cpp:27] New variable: __Traverse_4
I20230630 13:42:02.868779  8283 Symbols.cpp:48] New variable for: __AppendVertices_5
I20230630 13:42:02.868788  8283 PlanNode.cpp:27] New variable: __AppendVertices_5
I20230630 13:42:02.868871  8283 Symbols.cpp:48] New variable for: __Project_6
I20230630 13:42:02.868876  8283 PlanNode.cpp:27] New variable: __Project_6
I20230630 13:42:02.868893  8283 Symbols.cpp:48] New variable for: __Project_7
I20230630 13:42:02.868897  8283 PlanNode.cpp:27] New variable: __Project_7
I20230630 13:42:02.868913  8283 Symbols.cpp:48] New variable for: __Dedup_8
I20230630 13:42:02.868917  8283 PlanNode.cpp:27] New variable: __Dedup_8
I20230630 13:42:02.868925  8283 Symbols.cpp:48] New variable for: __Limit_9
I20230630 13:42:02.868929  8283 PlanNode.cpp:27] New variable: __Limit_9
I20230630 13:42:02.868935  8283 ReturnClausePlanner.cpp:52] return root: __Limit_9 colNames: p
I20230630 13:42:02.868942  8283 MatchPlanner.cpp:172] root(Limit_9): __Limit_9, tail(Start_3): __Start_3
I20230630 13:42:02.868948  8283 Validator.cpp:409] root: Limit tail: Start
I20230630 13:42:02.868955  8283 Validator.cpp:409] root: Limit tail: Start
I20230630 13:42:02.869010  8283 Symbols.cpp:48] New variable for: __Project_10
I20230630 13:42:02.869016  8283 PlanNode.cpp:27] New variable: __Project_10
I20230630 13:42:02.869575  8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.57":9779, trying to create one
I20230630 13:42:02.869596  8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.57":9779 for 1 times
I20230630 13:42:02.873762  8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.59":9779, trying to create one
I20230630 13:42:02.873785  8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.59":9779 for 1 times
I20230630 13:42:02.874086  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.874369  8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.58":9779, trying to create one
I20230630 13:42:02.874384  8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.58":9779 for 2 times
I20230630 13:42:02.913040  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.913278  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.913506  8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.58":9779, trying to create one
I20230630 13:42:02.913524  8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.58":9779 for 2 times
I20230630 13:42:02.937062  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.937290  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.937518  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.961143  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.961340  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.961562  8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.59":9779, trying to create one
I20230630 13:42:02.961585  8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.59":9779 for 3 times
I20230630 13:42:02.975271  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.975541  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.975596  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.979934  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.980085  8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.57":9779, trying to create one
I20230630 13:42:02.980099  8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.57":9779 for 3 times
I20230630 13:42:02.980264  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:03.128459  8315 MemoryUtils.cpp:227] sys:1.539GiB/15.250GiB 10.09% usr:129.000MiB/7.125GiB 1.77%
I20230630 13:42:06.127883  8315 MemoryUtils.cpp:227] sys:3.634GiB/15.250GiB 23.83% usr:2.183GiB/7.125GiB 30.63%
I20230630 13:42:09.127878  8315 MemoryUtils.cpp:227] sys:5.809GiB/15.250GiB 38.09% usr:4.315GiB/7.125GiB 60.57%
I20230630 13:42:10.471643  8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:10.471740  8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:10.471755  8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:10.472738  8292 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:12.128513  8315 MemoryUtils.cpp:227] sys:8.024GiB/15.250GiB 52.62% usr:6.502GiB/7.125GiB 91.25%
I20230630 13:42:15.128669  8315 MemoryUtils.cpp:227] sys:10.238GiB/15.250GiB 67.13% usr:8.684GiB/7.125GiB 121.87%
I20230630 13:42:18.129509  8315 MemoryUtils.cpp:227] sys:12.467GiB/15.250GiB 81.75% usr:10.882GiB/7.125GiB 152.72%
I20230630 13:42:20.483639  8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:20.483723  8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:20.483740  8312 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:20.485126  8312 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:21.128391  8315 MemoryUtils.cpp:227] sys:14.712GiB/15.250GiB 96.47% usr:13.093GiB/7.125GiB 183.76%

You can clearly see memory jumping from about 1.4 GiB to around 14 GiB and blowing up; usr climbs to 183% of its 7.125 GiB limit without the query being stopped.

Dashboard
The dashboard only shows allocations up to about 9 GB and then stops updating.


How do I generate a core file? Currently, after the crash there is no local core file.
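(For reference, core dumps are normally enabled with generic Linux steps along these lines; this is only a sketch, not Nebula-specific guidance:

ulimit -c unlimited
echo '/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern

then restart graphd from that same shell, e.g. via /opt/nebula-graph-3.5.0/scripts/nebula.service, so the new limit applies to the process.)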

Did you change it through Dashboard? Please also share your Dashboard version.
Also, did you restart the cluster after the change?

We're using the community edition of Dashboard, which cannot modify parameters, so I edited the configuration files by hand and then restarted.

It's the latest community Dashboard, 3.7.0.

Were there any write jobs running while you were querying?

It's a test environment; the streaming data-asset reporting volume is small, but not zero. Kafka delivers roughly a dozen or so messages per hour.

Does the log contain a line like "MemoryTracker set static ratio"? Or paste a more complete GraphD INFO log; that would make it easier for us to investigate.
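For example, something along these lines should show whether it was logged (adjust the path and filename pattern to your log directory):

grep -i 'MemoryTracker' /alidata2/nebula-graph3/log/nebula-graphd*INFO* | head -n 20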