Graph service crashed

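  • Nebula version: 2.5.1
  • Deployment: 5-node distributed cluster
  • Installation method: RPM
  • Production environment: Y
  • Hardware information
    • Disk: SSD
    • CPU: 64 cores; memory: 128 GB
  • Problem description
    The graphd service suddenly went down while in use. Below is the nebula-graphd.ERROR log (timestamps in the log are 8 hours behind Beijing time):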
Log file created at: 2021/11/16 09:28:01
Running on machine: ip-10-11-53-22.ap-southeast-1.compute.internal
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1116 09:28:01.071585 50587 StorageAccessExecutor.h:42] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 18
E1116 09:28:01.071895 50585 StorageAccessExecutor.h:42] GetVerticesExecutor failed, error E_LEADER_CHANGED, part 18
E1116 09:28:01.072172 50588 StorageAccessExecutor.h:42] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 15
E1116 09:28:01.072382 50586 QueryInstance.cpp:110] Storage Error: The leader has changed. Try again later
E1116 09:41:42.444165 50575 QueryInstance.cpp:110] SyntaxError: syntax error near `jog'
E1116 10:08:15.813928 50561 GraphSessionManager.cpp:213] Update sessions failed: Session not existed!
E1116 10:08:15.814035 50566 GraphSessionManager.cpp:242] Update sessions failed: Update s
...
E1117 02:10:08.334556 50583 QueryInstance.cpp:110] Used memory(101311344KB) hits the high watermark(0.800000) of total system memory(125816216KB).
E1117 02:10:09.008476 50587 QueryInstance.cpp:110] Used memory(101902416KB) hits the high watermark(0.800000) of total system memory(125816216KB).
E1117 02:10:12.577657 50587 QueryInstance.cpp:110] Used memory(104999584KB) hits the high watermark(0.800000) of total system memory(125816216KB).
E1117 02:10:13.254998 50588 QueryInstance.cpp:110] Used memory(105598348KB) hits the high watermark(0.800000) of total system memory(125816216KB).
E1117 02:10:13.603833 50580 QueryInstance.cpp:110] Used memory(105906360KB) hits the high watermark(0.800000) of total system memory(125816216KB).
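
To read the memory errors above more easily, here is a minimal Python sketch that extracts the used and total memory from a "hits the high watermark" line, computes the used fraction, and shifts the timestamp by 8 hours to Beijing time. The function name `parse_watermark_line` and the regular expression are illustrative assumptions, not part of Nebula. The year 2021 is taken from the "Log file created at" header, and the 8-hour offset from the description above.

```python
import re
from datetime import datetime, timedelta

# Two lines copied from the log above, with the terminal line wrap removed.
LOG_LINES = [
    "E1117 02:10:08.334556 50583 QueryInstance.cpp:110] Used memory(101311344KB) "
    "hits the high watermark(0.800000) of total system memory(125816216KB).",
    "E1117 02:10:13.603833 50580 QueryInstance.cpp:110] Used memory(105906360KB) "
    "hits the high watermark(0.800000) of total system memory(125816216KB).",
]

# Hypothetical pattern written for this post; it only targets the message shown above.
PATTERN = re.compile(
    r"^E(?P<month>\d{2})(?P<day>\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ .*"
    r"Used memory\((?P<used>\d+)KB\) hits the high watermark\((?P<ratio>[\d.]+)\)"
    r" of total system memory\((?P<total>\d+)KB\)"
)


def parse_watermark_line(line, year=2021, utc_offset_hours=8):
    """Return (Beijing time, used KB, total KB, used/total), or None if no match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # glog lines carry no year, so it is supplied by the caller (assumed 2021 here).
    logged = datetime.strptime(
        f"{year}-{m['month']}-{m['day']} {m['time']}", "%Y-%m-%d %H:%M:%S"
    )
    used, total = int(m["used"]), int(m["total"])
    return logged + timedelta(hours=utc_offset_hours), used, total, used / total


for line in LOG_LINES:
    parsed = parse_watermark_line(line)
    if parsed is not None:
        local_time, used, total, fraction = parsed
        print(f"{local_time}  used={used}KB  total={total}KB  fraction={fraction:.4f}")
```

Each matched line should show a used fraction slightly above the 0.8 watermark printed in the message, which is consistent with the queries being rejected at those moments.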



Below is my graphd configuration:
[ec2-user@ip-10-11-53-22 etc]$ cat nebula-graphd.conf
########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-graphd.pid
# Whether to enable optimizer
--enable_optimizer=true
# The default charset when a space is created
--default_charset=utf8
# The default collate when a space is created
--default_collate=utf8_bin
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4; the higher the level, the more verbose the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=graphd-stdout.log
--stderr_log_file=graphd-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2

########## query ##########
# Whether to treat partial success as an error.
# This flag is only used for Read-only access, and Modify access always treats partial success as an error.
--accept_partial_success=false

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=*:9559,*:9559,*:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=*
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
# To turn on SO_REUSEPORT or not
--reuse_port=false
# Backlog of the listen socket, adjust this together with net.core.somaxconn
--listen_backlog=1024
# Seconds before the idle connections are closed, 0 for never closed
--client_idle_timeout_secs=0
# Seconds before the idle sessions are expired, 0 for no expiration
--session_idle_timeout_secs=60000
# The number of threads to accept incoming connections
--num_accept_threads=1
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=0
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=0
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19669
# HTTP2 service port
--ws_h2_port=19670
# storage client timeout
--storage_client_timeout_ms=60000
# Port to listen on Meta with HTTP protocol, it corresponds to ws_http_port in metad's configuration file
--ws_meta_http_port=19559

########## authentication ##########
# Enable authorization
--enable_authorize=true
# User login authentication type, password for nebula authentication, ldap for ldap authentication, cloud for cloud authentication
--auth_type=password

########## memory ##########
# System memory high watermark ratio
--system_memory_high_watermark_ratio=0.8
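
As a rough check of what this setting means on this machine, the following Python sketch reads the `--name=value` lines of a flag file and multiplies the watermark ratio by the total system memory reported in the error log (125816216KB). The helper `parse_flagfile` is a hypothetical name introduced for illustration, and the calculation only mirrors the wording of the log message; it is not taken from the graphd source.

```python
def parse_flagfile(text):
    """Parse `--name=value` lines into a dict, skipping comments and blank lines."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("--") or "=" not in line:
            continue
        name, value = line[2:].split("=", 1)
        flags[name] = value
    return flags


# Excerpt of the nebula-graphd.conf shown above.
CONF_EXCERPT = """\
########## memory ##########
# System memory high watermark ratio
--system_memory_high_watermark_ratio=0.8
"""

# Total system memory as printed in the nebula-graphd.ERROR log.
TOTAL_MEMORY_KB = 125816216

flags = parse_flagfile(CONF_EXCERPT)
ratio = float(flags["system_memory_high_watermark_ratio"])
threshold_kb = ratio * TOTAL_MEMORY_KB
print(f"watermark ratio={ratio}, threshold={threshold_kb:.0f}KB")
```

Under these assumptions the threshold is a little over 100 million KB, and every "Used memory" value in the log sits above it.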

Below is the memory information at that time:

See this topic: "nebualGraph2.5.0 query reports Used memory(178593436KB) hits the high watermark(0.800000) of total system memory(196268364KB)." - #32, by Caozehui
