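Docker image: vesoft/nebula-graphd:v3.2.0
Nothing obvious so far; I'll file an issue for now.
Do you have any clues you could share? We can help with troubleshooting.
No, but I've symbolized the minidump. Pasting it below: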
Operating system: Linux
0.0.0 Linux 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64
CPU: amd64
family 6 model 85 stepping 7
1 CPU
GPU: UNKNOWN
Crash reason: SIGSEGV /SEGV_MAPERR
Crash address: 0x1c
Process uptime: not available
Thread 30 (crashed)
0 nebula-graphd!apache::thrift::Cpp2Connection::stop() + 0x59
rax = 0x0000000000000000 rdx = 0x0000000002e123c0
rcx = 0x0000000002cef3f8 rbx = 0x00007f39c4f60260
rsi = 0x0000000000000000 rdi = 0x00007f39bcbff600
rbp = 0x00007f39bcbf4f80 rsp = 0x00007f39bcbf4f10
r8 = 0x0000000000000000 r9 = 0x00007f39bcbf4638
r10 = 0x00007f39bcbf4630 r11 = 0x0000000000000206
r12 = 0x00007f39bbe073e8 r13 = 0x00007f39bcbf5001
r14 = 0x00007f39bbe09110 r15 = 0x00007f39bcbf4f20
rip = 0x0000000001f2f679
Found by: given as instruction pointer in context
1 nebula-graphd!apache::thrift::Cpp2Connection::channelClosed(folly::exception_wrapper&&) + 0x39
rbx = 0x00007f39c4f10901 rbp = 0x00007f39bcbf5020
rsp = 0x00007f39bcbf4f90 r12 = 0x00007f39bbe09110
r13 = 0x00007f39bcbf5030 r14 = 0x00007f39bcbf4fa0
r15 = 0x00007f39bcbf4fb0 rip = 0x0000000001f31359
Found by: call frame info
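What was the cluster doing when it went down? Is there a way to reproduce the problem reliably?
Just normal business requests. We've tried every statement and none of them reproduced it.
It's precisely because we can't reproduce it that it's so hard to pin down.
I checked the exception logs printed by nebula-go: there are only errors like i/o timeout, and nothing else unusual.
We haven't found a reliable way to reproduce it yet. It happens roughly once every few days, even though the data volume is currently very small and the service receives very few requests.
If needed, we're happy to work with you on whatever tests are necessary.
It just crashed again. The relevant part of the graphd log is below: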
E20220916 09:18:42.187779 27 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to serialize a value of type `nebula::Value`.
I20220916 09:18:42.188601 41 ThriftClientManager-inl.h:67] resolve "nebula-aio-0":9779 as "10.42.1.103":9779
I20220916 09:18:42.194344 40 ThriftClientManager-inl.h:67] resolve "nebula-aio-0":9779 as "10.42.1.103":9779
I20220916 09:18:42.695487 28 GraphService.cpp:76] Authenticating user root from 10.42.0.172:42418
I20220916 09:18:43.207806 31 GraphService.cpp:76] Authenticating user root from 10.42.0.172:42424
E20220916 09:19:30.228691 31 IndexScanRule.cpp:440] No valid index found
E20220916 09:19:39.081180 25 IndexScanRule.cpp:440] No valid index found
E20220916 09:19:51.894157 30 IndexScanRule.cpp:440] No valid index found
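Hello, could you provide some more details, for example:
- Cluster configuration: how many storage, graph, and meta nodes
- Schema and data volume
- What operations the cluster is handling (queries or writes), and at what QPS?
1. Single-machine deployment, with one storage, one graph, and one meta node.
2. For the test schema and data, I reproduced it using this file: https://docs.nebula-graph.io/2.0/basketballplayer-2.X.ngql
3. Running inserts, deletes, updates, and queries concurrently crashes graphd (reproducible every time).
The error log output is as follows: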
2022-09-20 15:41:19 file=pool/pool.go:175 level=error session.Execute by sid[7] with NGQL:[UPDATE VERTEX ON player 'player105' SET age = age + 2], err: -1005:Storage Error: More than one request trying to add/update/delete one edge/vertex at the same time.
2022-09-20 15:41:19 file=rbac-crash/main.go:48 level=error -1005:Storage Error: More than one request trying to add/update/delete one edge/vertex at the same time.
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[4] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: read tcp 10.55.23.52:63978->10.55.16.144:9669: i/o timeout
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[10] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: read tcp 10.55.23.52:63984->10.55.16.144:9669: i/o timeout
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[0] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: read tcp 10.55.23.52:63974->10.55.16.144:9669: i/o timeout
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[2] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: read tcp 10.55.23.52:63976->10.55.16.144:9669: i/o timeout
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[15] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: read tcp 10.55.23.52:63989->10.55.16.144:9669: i/o timeout
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[11] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: read tcp 10.55.23.52:63985->10.55.16.144:9669: i/o timeout
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[12] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: EOF
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[9] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: EOF
2022-09-20 15:41:19 file=pool/pool.go:175 level=error session.Execute by sid[3] with NGQL:[UPDATE VERTEX ON player 'player105' SET age = age + 2], err: EOF
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[6] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: EOF
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[13] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: EOF
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[5] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: EOF
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[1] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: EOF
2022-09-20 15:41:19 file=rbac-crash/main.go:48 level=error EOF
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[14] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: write tcp 10.55.23.52:63988->10.55.16.144:9669: wsasend: An existing connection was forcibly closed by the remote host.
2022-09-20 15:41:19 file=pool/pool.go:195 level=error session.ExecuteJson by sid[8] with NGQL:[LOOKUP ON player YIELD id(vertex) AS VertexID;], err: read tcp 10.55.23.52:63982->10.55.16.144:9669: wsarecv: An existing connection was forcibly closed by the remote host.
2022-09-20 15:41:20 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:21 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:21 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:22 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:22 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:23 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:23 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:24 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:24 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:25 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:25 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:26 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:26 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:27 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:27 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:28 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:28 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:29 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:30 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:30 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:31 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:31 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:32 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:32 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:33 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:33 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:34 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:34 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:35 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:35 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:36 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:36 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:37 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:37 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:38 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:38 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:39 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:39 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:40 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:40 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:41 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:41 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:42 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:42 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:43 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:43 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:44 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:44 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:45 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:45 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:46 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:46 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:47 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:47 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:48 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
2022-09-20 15:41:48 file=pool/pool.go:54 level=warn open graph conn failed with host: 10.55.16.144, port: 9669, dial tcp 10.55.16.144:9669: i/o timeout
Problem solved.
It was caused by the Go driver we had chosen; after I switched to nebula-go, the problem went away.
thanks