UPDATE with YIELD fails: -1005:RPC failure in StorageClient: Channel got EOF

  • nebula version: 3.2.0
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: N
  • Hardware: disk 100G; CPU and memory: 4C8G

Problem:
When updating a vertex, using the YIELD keyword tends to make the storage service exit (status shows Exited).

  • Relevant nGQL
UPDATE  VERTEX ON `t_telephone` '18179095000'
	SET
		  `phone_no` = '18179095000' ,
		  `date_updated` = datetime('2022-10-19T10:06:17.017') ,
		  `updated_by` = 'udf' 
YIELD '18179095000' as `vid`, `phone_no`,`date_updated`,`date_created`,`updated_by`,`created_by`
  • Relevant meta log (no new entries in the ERROR log; the INFO log shows one fewer node with role = STORAGE)
Log file created at: 2022/09/07 12:23:04
Running on machine: localhost.localdomain
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20220907 12:23:04.649888 13488 FileUtils.cpp:377] Failed to read the directory "/usr/local/nebula/data/meta/nebula" (2): No such file or directory
E20220908 00:05:53.281504 13683 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type `nebula::Value`.

  • Relevant storage error log
Log file created at: 2022/10/19 10:35:48
Running on machine: localhost.localdomain
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20221019 10:35:48.687700 29240 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type `nebula::Value`.
  • Relevant graph info log
I20221019 10:50:16.629758 13589 GraphService.cpp:76] Authenticating user root from [::ffff:126.61.78.206]:50612
I20221019 10:50:16.680176 13589 SwitchSpaceExecutor.cpp:37] Graph switched to `xwqy_graph', space id: 3
I20221019 10:50:22.407053 13586 GraphService.cpp:76] Authenticating user root from [::ffff:172.16.41.144]:42522
I20221019 10:50:23.493650 13591 GraphService.cpp:76] Authenticating user root from [::ffff:172.16.41.144]:42522
I20221019 10:50:25.435294 13591 GraphService.cpp:76] Authenticating user root from [::ffff:172.16.41.144]:42522
I20221019 10:50:26.154937 13589 GraphService.cpp:76] Authenticating user root from [::ffff:172.16.41.144]:42522
I20221019 10:50:26.463335 13593 GraphService.cpp:76] Authenticating user root from [::ffff:172.16.41.144]:42522
I20221019 10:51:58.016470 13589 GraphService.cpp:76] Authenticating user root from [::ffff:172.16.41.144]:42522
E20221019 10:51:58.457526 13694 StorageClientBase-inl.h:206] Request to "172.16.41.143":9779 failed: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
W20221019 10:51:58.457641 13589 UpdateExecutor.cpp:66] Update vertices fail: RPC failure in StorageClient: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E20221019 10:51:58.457692 13591 QueryInstance.cpp:137] RPC failure in StorageClient: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
  • Relevant graph error log
E20221019 10:51:58.457526 13694 StorageClientBase-inl.h:206] Request to "172.16.41.143":9779 failed: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.
E20221019 10:51:58.457692 13591 QueryInstance.cpp:137] RPC failure in StorageClient: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes.

Do you still have the stack trace from when storage crashed?

Check /var/log/messages to see whether the process was killed by the OOM killer.
If it was, see "Storage service configuration" in the NebulaGraph Database manual.
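One way to run that check is to scan the syslog file for the phrases the Linux kernel writes when the OOM killer fires. The following is a minimal sketch, not an official NebulaGraph tool: the function names `find_oom_lines` and `scan_file` are made up for this example, and the process name `nebula-storaged` is assumed to be what the kernel would report for the storage daemon.

```python
import re
from typing import Iterable, List

# Phrases the Linux kernel logs around OOM-killer activity.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory|Killed process", re.IGNORECASE
)


def find_oom_lines(
    lines: Iterable[str], process_hint: str = "nebula-storaged"
) -> List[str]:
    """Return log lines that look like OOM-killer activity.

    Lines that mention process_hint (an assumed process name) come first;
    the original order is otherwise preserved because sorted() is stable.
    """
    hits = [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
    return sorted(hits, key=lambda line: process_hint not in line)


def scan_file(path: str = "/var/log/messages") -> List[str]:
    """Scan a syslog file; reading it usually requires root privileges."""
    with open(path, errors="replace") as log_file:
        return find_oom_lines(log_file)
```

Calling `scan_file()` on the storage host returns the matching lines. An empty list means no OOM-killer entries were found in that file, which points away from OOM and toward a crash inside the storage process itself.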

I only found this: #011gql execute fail#011{"error": "-1005:RPC failure in StorageClient: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes."}

Oct 27 14:30:01 localhost systemd: Created slice User Slice of root.
Oct 27 14:30:01 localhost systemd: Started Session 8823 of user root.
Oct 27 14:30:01 localhost systemd: Removed slice User Slice of root.
Oct 27 14:32:30 localhost server: [INFO] 2022/10/27 14:32 200 2.976212ms 126.61.78.206 POST /api-nebula/db/exec
Oct 27 14:32:30 localhost server: [INFO] 2022/10/27 14:32 200 2.230745ms 126.61.78.206 POST /api-nebula/db/exec
Oct 27 14:32:30 localhost server: [INFO] 2022/10/27 14:32 200 2.964153ms 126.61.78.206 POST /api-nebula/db/exec
Oct 27 14:32:31 localhost server: [INFO] 2022/10/27 14:32 200 2.778322ms 126.61.78.206 POST /api-nebula/db/exec
Oct 27 14:32:33 localhost server: 1.6668523535460482e+09#011warn#011controller/gateway.go:56#011gql execute fail#011{"error": "-1005:RPC failure in StorageClient: Channel got EOF. Check for server hitting connection limit, server connection idle timeout, and server crashes."}
Oct 27 14:32:33 localhost server: [INFO] 2022/10/27 14:32 200 449.812964ms 126.61.78.206 POST /api-nebula/db/exec
Oct 27 14:33:45 localhost sshd[127038]: Accepted keyboard-interactive/pam for ngraph from 126.61.78.206 port 56731 ssh2
Oct 27 14:33:45 localhost systemd: Started Session 8824 of user ngraph.
Oct 27 14:33:45 localhost systemd-logind: New session 8824 of user ngraph.
Oct 27 14:33:46 localhost systemd-logind: Removed session 8824.
Oct 27 14:34:09 localhost sshd[127223]: Accepted keyboard-interactive/pam for ngraph from 126.61.78.206 port 56777 ssh2
Oct 27 14:34:09 localhost systemd: Started Session 8825 of user ngraph.
Oct 27 14:34:09 localhost systemd-logind: New session 8825 of user ngraph.
Oct 27 14:34:10 localhost systemd-logind: Removed session 8825.
Oct 27 14:34:24 localhost sshd[127354]: Accepted keyboard-interactive/pam for ngraph from 126.61.78.206 port 56800 ssh2
Oct 27 14:34:24 localhost systemd-logind: New session 8826 of user ngraph.
Oct 27 14:34:24 localhost systemd: Started Session 8826 of user ngraph.
Oct 27 14:34:25 localhost systemd-logind: Removed session 8826.

d99ffe16-f6d3-4257-7607a885-58cfe2b4.dmp (4.2 MB)

@cangfengzhs