Nebula 1.1.0 graphd crashed

Nebula Graph 1.1.0

Disk: SSD, 1 TB; CPU: 16 cores, 1% utilization; memory: 32 GB, 60% utilization

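The graphd service crashed outright. The key error messages are below.

It looks like a connection made to the graphd port directly caused graphd to die.

Has anyone run into this? Is there a fix?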

[2021/9/8 9:07:34] E0908 08:16:14.326704 1797767 HTTP2Codec.cpp:1560] Connection error with ingress=
[2021/9/8 9:07:34] 00000000 03 00 00 2a 25 e0 00 00 00 00 00 43 6f 6f 6b 69 |…*%…Cooki|
[2021/9/8 9:07:34] 00000010 65 3a 20 6d 73 74 73 68 61 73 68 3d 6e 6d 61 70 |e: mstshash=nmap|
[2021/9/8 9:07:34] 00000020 0d 0a 01 00 08 00 03 00 00 00 |… |
[2021/9/8 9:07:34] E0908 08:16:29.338547 1797985 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier... old protocol client in strict mode? sz=1195725856
[2021/9/8 9:07:34] E0908 08:16:29.338769 1797985 GeneratedCodeHelper.cpp:73] invalid message from client in function process
[2021/9/8 9:07:34] E0908 08:16:34.342234 1797954 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier... old protocol client in strict mode? sz=1195725856
[2021/9/8 9:07:34] E0908 08:16:34.342392 1797954 GeneratedCodeHelper.cpp:73] invalid message from client in function process
[2021/9/8 9:07:34] E0908 08:16:39.345716 1797955 HeaderServerChannel.cpp:114] Received invalid request from client: N6apache6thrift9transport19TTransportExceptionE: Header transport frame is too large: 1330664521 (hex 0x4f505449, ascii 'OPTI') (transport N6apache6thrift5async12TAsyncSocketE, address 10.1.214.235, port 42938)
[2021/9/8 9:07:34] E0908 08:16:39.347924 1797953 HeaderServerChannel.cpp:114] Received invalid request from client: N6apache6thrift9transport19TTransportExceptionE: Header transport frame is too large: 2147483688 (hex 0x80000028) (transport N6apache6thrift5async12TAsyncSocketE, address 10.1.214.235, port 43008)
[2021/9/8 9:07:34] E0908 08:16:39.373239 1797770 HTTP2Codec.cpp:1560] Connection error with ingress=
[2021/9/8 9:07:34] 00000000 80 00 00 28 72 fe 1d 13 00 00 00 00 00 00 00 02 |…(r…|
[2021/9/8 9:07:34] 00000010 00 01 86 a0 00 01 97 7c 00 00 00 00 00 00 00 00 |…|…|
[2021/9/8 9:07:34] 00000020 00 00 00 00 00 00 00 00 00 00 00 00 |… |
[2021/9/8 9:07:34] E0908 08:16:39.374680 1797765 HTTP2Codec.cpp:1560] Connection error with ingress=
[2021/9/8 9:07:34] 00000000 00 1e 00 06 01 00 00 01 00 00 00 00 00 00 07 76 |…v|
[2021/9/8 9:07:34] 00000010 65 72 73 69 6f 6e 04 62 69 6e 64 00 00 10 00 03 |ersion.bind…|
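The ASCII column of the hex dumps above was mangled when the log was pasted. A minimal Python sketch for recovering the printable bytes from the hex column follows. `dump_line_to_ascii` is a made-up helper name, and the sample lines are copied from the dumps above with the timestamp prefix and ASCII column stripped.

```python
def dump_line_to_ascii(line: str) -> str:
    """Turn one 'OFFSET hh hh ...' dump line into printable ASCII."""
    # The first token is the offset; the rest are hex byte values.
    _offset, *hex_bytes = line.split()
    data = bytes.fromhex("".join(hex_bytes))
    # Show non-printable bytes as '.', the way a hex dump does.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


samples = [
    "00000000 03 00 00 2a 25 e0 00 00 00 00 00 43 6f 6f 6b 69",
    "00000010 65 3a 20 6d 73 74 73 68 61 73 68 3d 6e 6d 61 70",
    "00000010 65 72 73 69 6f 6e 04 62 69 6e 64 00 00 10 00 03",
]
for sample in samples:
    print(dump_line_to_ascii(sample))
```

The second sample decodes to `e: mstshash=nmap` and the third to `ersion.bind.....`, so the payloads do not look like Nebula client traffic.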

The latest stable v1.x release is 1.2.1. You could try upgrading, since that release fixes a number of issues.

@steam
For one thing, an upgrade can't happen that quickly. For another, what is the root cause here? Does 1.2.1 actually fix it?

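Fair point. Let me collect some details so the developers can track this down. When you connect to graphd, you are running queries, right? :thinking: Could you post the query statements from that time and the amount of data involved?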

The data set is small, only in the millions. The statements should be fine; they have been running for one to two months.

We are also analyzing this, and found that the final error is an OptionalEmptyException:

[2021/9/8 9:07:34] E0908 08:22:09.628520 1797769 HTTP2Codec.cpp:1560] Connection error with ingress=
[2021/9/8 9:07:34] 00000000 f9 be b4 d9 76 65 72 73 69 6f 6e 00 00 00 00 00 |…version…|
[2021/9/8 9:07:34] 00000010 55 00 00 00 9c 7c 00 00 01 00 00 00 00 00 00 00 |U…|…|
[2021/9/8 9:07:34] 00000020 4f 91 37 61 00 00 00 00 01 00 00 00 00 00 00 00 |O.7a…|
[2021/9/8 9:07:34] 00000030 00 00 00 00 00 00 00 00 00 00 ff ff 0a 04 a2 37 |…7|
[2021/9/8 9:07:34] 00000040 32 ca 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |2…|
[2021/9/8 9:07:34] 00000050 00 00 00 00 ff ff 7f 00 00 01 97 70 ca 51 6a 36 |…p.Qj6|
[2021/9/8 9:07:34] 00000060 c3 db 32 27 00 12 01 00 00 |…2’… |
[2021/9/8 9:07:34] terminate called after throwing an instance of 'folly::OptionalEmptyException'
[2021/9/8 9:07:34] what(): Empty Optional cannot be unwrapped
[2021/9/8 9:07:34] *** Aborted at 1631060536 (unix time) try "date -d @1631060536" if you are using GNU date ***
[2021/9/8 9:07:34] PC: @ 0x7ff4a5b9a77b gsignal
[2021/9/8 9:07:34] *** SIGABRT (@0x3e8001b6e80) received by PID 1797760 (TID 0x7ff4a19ff700) from PID 1797760; stack trace: ***
[2021/9/8 9:07:34] @ 0x7ff4a5d2c27f (unknown)
[2021/9/8 9:07:34] @ 0x7ff4a5b9a77b gsignal
[2021/9/8 9:07:34] @ 0x7ff4a5b9baa0 abort
[2021/9/8 9:07:34] @ 0x55e45cc9d740 (unknown)
[2021/9/8 9:07:34] @ 0x55e45dc0f3b5 __cxxabiv1::__terminate()
[2021/9/8 9:07:34] @ 0x55e45dcabe38 __cxa_call_terminate
[2021/9/8 9:07:34] @ 0x55e45dc0ede3 __gxx_personality_v0
[2021/9/8 9:07:34] @ 0x55e45dcb2e87 (unknown)
[2021/9/8 9:07:34] @ 0x55e45dcb3ac5 (unknown)
[2021/9/8 9:07:34] @ 0x55e45cc949eb ZN5folly6detail16throw_exception_INS_22OptionalEmptyExceptionEJEEEvDpOT0
[2021/9/8 9:07:34] @ 0x55e45cc94658 (unknown)
[2021/9/8 9:07:34] @ 0x55e45d3b18af (unknown)
[2021/9/8 9:07:34] @ 0x55e45d421798 proxygen::HTTPServerAcceptor::newHandler()
[2021/9/8 9:07:34] @ 0x55e45d445a2b proxygen::HTTPDownstreamSession::setupOnHeadersComplete()
[2021/9/8 9:07:34] @ 0x55e45d453507 proxygen::HTTPSession::onHeadersComplete()
[2021/9/8 9:07:34] @ 0x55e45d4b1720 proxygen::HTTPChecks::onHeadersComplete()
[2021/9/8 9:07:34] @ 0x55e45d48d97f proxygen::HTTP1xCodec::onHeadersComplete()
[2021/9/8 9:07:34] @ 0x55e45d48e864 proxygen::HTTP1xCodec::onHeadersCompleteCB()
[2021/9/8 9:07:34] @ 0x55e45d440166 proxygen::http_parser_execute()
[2021/9/8 9:07:34] @ 0x55e45d48e65e proxygen::HTTP1xCodec::onIngress()
[2021/9/8 9:07:34] @ 0x55e45d44a33a proxygen::HTTPSession::processReadData()
[2021/9/8 9:07:34] @ 0x55e45d44a476 proxygen::HTTPSession::readDataAvailable()
[2021/9/8 9:07:34] @ 0x55e45d79cf0e folly::AsyncSocket::handleRead()
[2021/9/8 9:07:34] @ 0x55e45d797ce0 folly::AsyncSocket::ioReady()
[2021/9/8 9:07:34] @ 0x55e45d83598e (unknown)
[2021/9/8 9:07:34] @ 0x55e45d8362ae event_base_loop
[2021/9/8 9:07:34] @ 0x55e45d7b1141 folly::EventBase::loopBody()
[2021/9/8 9:07:34] @ 0x55e45d7b2875 folly::EventBase::loopForever()
[2021/9/8 9:07:34] @ 0x55e45d6f4018 folly::IOThreadPoolExecutor::threadRun()
[2021/9/8 9:07:34] @ 0x55e45d6ffa08 folly::detail::function::FunctionTraits<>::callBig<>()
[2021/9/8 9:07:34] @ 0x55e45dc8aeef (unknown)
[2021/9/8 9:07:34] @ 0x7ff4a5d21f4a (unknown)

[END] 2021/9/8 9:07:58
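The abort line suggests running `date -d @1631060536`. The same conversion in Python is sketched below to check that the crash lines up with the last dump at 08:22:09. The UTC+8 offset is an assumption, since the thread does not state the server's time zone, and `epoch_to_local` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

ABORT_EPOCH = 1631060536  # copied from the "*** Aborted at ..." log line
ASSUMED_TZ = timezone(timedelta(hours=8))  # assumption: server logs in UTC+8


def epoch_to_local(epoch: int, tz: timezone = ASSUMED_TZ) -> datetime:
    """Convert a Unix timestamp to an aware datetime in the given zone."""
    return datetime.fromtimestamp(epoch, tz)


print(epoch_to_local(ABORT_EPOCH).isoformat())
```

Under that assumption the abort happened at 08:22:16, about seven seconds after the last `Connection error with ingress=` entry.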

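Please also post your hardware environment (disk type, disk size, memory size, number of CPU cores), and I'll ask the developers to take a look.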

Disk: SSD, 1 TB; CPU: 16 cores, 1% utilization; memory: 32 GB, 60% utilization

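Check whether the client and server versions match. Judging from the log, the protocols are incompatible, and the error was raised while parsing the received data.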
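To make the protocol-mismatch reading concrete: the huge `sz=` and frame-size values in the log are what you would see if the first four bytes of a non-Thrift request were read as a big-endian 32-bit length. Here is a minimal Python sketch that decodes the values copied from the log; `frame_size_as_ascii` is a made-up helper name, not part of any Nebula or Thrift API.

```python
import struct


def frame_size_as_ascii(size: int) -> str:
    """Reinterpret a 32-bit big-endian length field as its 4 raw bytes."""
    raw = struct.pack(">I", size)
    # Show non-printable bytes as '.', the way a hex dump does.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Values taken from the sz= and "frame is too large" log lines.
for size in (1195725856, 1330664521, 2147483688):
    print(f"{size} (0x{size:08x}) -> {frame_size_as_ascii(size)!r}")
```

The first two values decode to `GET ` and `OPTI`, which look like the start of HTTP GET and OPTIONS requests. Together with the `Cookie: mstshash=nmap` payload in the first dump, that suggests a port scanner rather than a mismatched Nebula client was talking to the port. This is an inference from the log, not something confirmed in this thread.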
