Graph database 1.2.0 crashed

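  • nebula version: 1.2.0
  • Deployment: distributed
  • Production environment: Y
  • Hardware information
    • Disk (SSD recommended)
    • CPU and memory information
  • Detailed description of the problem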
  • E0803 00:32:08.828233 111997 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier… old protocol client in strict mode?
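    Is this error caused by calling the server with a newer client version, or by something else? Is there a fix? Also, how do I turn off this strict mode? The graph database went down outright.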

Please add your client version number.

<nebula.version>1.1.0</nebula.version>

It suddenly crashed on our side and I don't know how to analyze it.

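The error means that an incompatible client tried to access the service's thrift interface. Has anyone accessed the thrift interface with a mismatched client version, or with a client that speaks another protocol, such as http (curl)? (Like this thread: 对metad进行迁移 - #7, by ThomasWang)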

Code that reports the error: fbthrift/GeneratedCodeHelper.cpp at d5d0fa3f72ee0eb4c7b955e9e04a25052678d740 · facebook/fbthrift · GitHub

In principle this error should not crash the database, so it may not be the root cause. Can the service start up?

After a restart the service does come up. If a crash happens, how do I locate its cause?

Or tell me what information you need and I can provide it.

Which processes crashed? Have you gone through all the logs? Can you rule out OOM (dmesg | grep nebula)?

Another thing: when accessing the graph database on Alibaba Cloud, this error comes up often: com.facebook.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed). The query code looks like this:

public List<Map<String, Object>> query(String nGql) {
        List<Map<String, Object>> resultsList = new ArrayList<>();
        try {
            ResultSet resultSet = client.get().executeQuery(nGql);
            // Return early when the query yields no results
            List<ResultSet.Result> results = resultSet.getResults();
            if (results == null || results.isEmpty()) {
                return resultsList;
            }
            for (ResultSet.Result result : results) {
                Map<String, Object> resultMap = new HashMap<>();
                for (String column : resultSet.getColumns()) {
                    Object value = result.get(column).getFieldValue();
                    if (value instanceof byte[]) {
                        resultMap.put(column, result.getString(column));
                    } else {
                        resultMap.put(column, value);
                    }
                }
                resultsList.add(resultMap);
            }
        } catch (Exception e) {
            logger.error("query exception. " + nGql + "  " + e.toString(), e);
        }
        return resultsList;
}

It has already been restarted. Can that information still be seen?

Information about the earlier crash, if there is any, will be in the earlier logs. As for OOM, could you check it? dmesg | grep nebula

I ran this command and got no output at all.

dmesg prints the operating system's kernel messages. If grep returns nothing on every server, there are no messages containing the string nebula.

I checked, and there is nothing on any of them.

@dingding My understanding is that the client tried to write but the server had already closed the socket. Is it normal for this to keep happening?

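In my case it happens during queries. When it happens, the query returns no data and throws an exception. Do I need to close the connection after every query?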

This means the connection was dropped. @jay which query statements trigger this error? Please share the graphd logs from the server side.
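
Until the cause of the dropped connection is known, a common client-side mitigation is to reconnect and retry the query. The following is only a minimal sketch of that pattern, not a diagnosis and not part of the Nebula Java client: `RetryOnBrokenConnection`, `Reconnector` and `callWithReconnect` are hypothetical names, and the reconnect logic itself is left to the caller. It retries on any exception for brevity; real code would probably narrow that to the transport exception and only retry read-only statements.

```java
import java.util.concurrent.Callable;

final class RetryOnBrokenConnection {

    /** Hypothetical hook: drop the stale connection and open a fresh one. */
    interface Reconnector {
        void reconnect() throws Exception;
    }

    /**
     * Runs the query, and on failure reconnects and tries again,
     * up to maxAttempts attempts in total. Rethrows the last failure.
     */
    static <T> T callWithReconnect(Callable<T> query, Reconnector reconnector, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return query.call();
            } catch (Exception e) {
                last = e;
                // Only reconnect if another attempt will follow.
                if (attempt < maxAttempts) {
                    reconnector.reconnect();
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated query: the first call fails like a broken pipe, the second succeeds.
        String result = callWithReconnect(
                () -> {
                    if (calls[0]++ == 0) {
                        throw new java.net.SocketException("Broken pipe (Write failed)");
                    }
                    return "ok";
                },
                () -> System.out.println("reconnecting"),
                3);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Note that the query method posted above catches every exception and only logs it, so a wrapper like this would have to sit inside that try block, or the method would have to let the exception propagate.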

The error log contains the following:
E0506 10:50:48.152115 110480 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0506 10:50:53.160640 110566 MetaClient.cpp:524] Send request to [10.199.127.149:45500], exceed retry limit
E0506 10:50:53.160840 110480 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0506 10:50:58.169234 110670 MetaClient.cpp:524] Send request to [10.199.127.148:45500], exceed retry limit
E0506 10:50:58.169441 110480 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0506 10:51:05.063966 110824 MetaClient.cpp:524] Send request to [10.199.127.148:45500], exceed retry limit
E0506 10:51:05.064162 111083 MetaClient.cpp:110] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E0525 14:45:56.299214 111165 ExecutionPlan.cpp:80] Execute failed: Get neighbors failed
E0527 14:11:25.263399 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:11:35.315104 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:11:35.315738 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:11:35.316291 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:11:35.316748 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:11:45.457437 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:11:45.458066 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:11:45.458472 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:11:45.458756 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:18:44.840940 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:35:18.660614 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:55:27.153434 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:55:27.161470 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:55:27.163540 111167 GraphService.cpp:132] Session not found, id[0]
E0527 14:55:27.165504 111167 GraphService.cpp:132] Session not found, id[0]
E0802 11:39:52.926661 110942 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier… old protocol client in strict mode? sz=1195725856
E0802 11:39:52.945647 110942 GeneratedCodeHelper.cpp:73] invalid message from client in function process
E0803 00:30:55.413242 111982 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier… old protocol client in strict mode? sz=1195725856
E0803 00:30:55.417990 111982 GeneratedCodeHelper.cpp:73] invalid message from client in function process
E0803 00:31:05.743783 111983 GeneratedCodeHelper.cpp:73] Method name getLeader not found in function process
E0803 00:31:07.739054 111993 HeaderServerChannel.cpp:114] Received invalid request from client: N6apache6thrift9transport19TTransportExceptionE: Could not detect client transport type: magic 0x02030a84

Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0803 00:30:55.413242 111982 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier… old protocol client in strict mode? sz=1195725856
E0803 00:30:55.417990 111982 GeneratedCodeHelper.cpp:73] invalid message from client in function process
E0803 00:31:05.743783 111983 GeneratedCodeHelper.cpp:73] Method name getLeader not found in function process
E0803 00:31:07.739054 111993 HeaderServerChannel.cpp:114] Received invalid request from client: N6apache6thrift9transport19TTransportExceptionE: Could not detect client transport type: magic 0x02030a84 (transport N6apache6thrift5async12TAsyncSocketE,)
E0803 00:31:50.734933 111995 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier… old protocol client in strict mode? sz=1195725856
E0803 00:31:50.735069 111995 GeneratedCodeHelper.cpp:73] invalid message from client in function process
E0803 00:32:08.828233 111997 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier… old protocol client in strict mode? sz=1195725856
E0803 00:32:08.828330 111997 GeneratedCodeHelper.cpp:73] invalid message from client in function process
E0803 00:32:23.900686 111998 GeneratedCodeHelper.cpp:116] received invalid message from client: No version identifier… old protocol client in strict mode? sz=1195725856
E0803 00:32:23.900830 111998 GeneratedCodeHelper.cpp:73] invalid message from client in function process
E0803 00:32:33.973858 111870 PeekingManager.h:100] Received SSL connection on non SSL port

That is all there is in the graphd error log.
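
One detail in these lines may help narrow down the offending client: the repeated sz=1195725856. Assuming sz is the first four bytes of the incoming message read as a big-endian 32-bit integer (an assumption about how the thrift code logs it, not something confirmed in this thread), those bytes can be turned back into text. The sketch below does that; `FrameSizeDecoder` and `asAscii` are names made up for this illustration.

```java
import java.nio.ByteBuffer;

final class FrameSizeDecoder {

    /**
     * Interprets a 32-bit value as four big-endian bytes and renders them
     * as ASCII, replacing non-printable bytes with '.'.
     */
    static String asAscii(int size) {
        // ByteBuffer writes integers in big-endian order by default.
        byte[] bytes = ByteBuffer.allocate(4).putInt(size).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Value copied from the "sz=" field in the graphd error log.
        System.out.println("[" + asAscii(1195725856) + "]"); // prints "[GET ]"
    }
}
```

Under that assumption the four bytes spell "GET ", the start of an HTTP request line. That would be consistent with the earlier suggestion that something other than a Nebula client, for example an HTTP client or a scanner, connected to the thrift port.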