
"Read timed out" and "not complete" exceptions

20/08/31 16:29:32 ERROR GraphClientImpl: Thrift rpc call failed: java.net.SocketTimeoutException: Read timed out
20/08/31 16:29:32 ERROR GraphClientImpl: Thrift rpc call failed: java.net.SocketTimeoutException: Read timed out
20/08/31 16:29:32 ERROR GraphClientImpl: Thrift rpc call failed: java.net.SocketTimeoutException: Read timed out
20/08/31 16:29:32 ERROR GraphClientImpl: Thrift rpc call failed: java.net.SocketTimeoutException: Read timed out
20/08/31 16:29:32 ERROR GraphClientImpl: Thrift rpc call failed: java.net.SocketTimeoutException: Read timed out

20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 0
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 0
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 0
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 0
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 0
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 0
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 0
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 0
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 33
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 33
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 66
20/08/31 17:54:57 ERROR GraphClientImpl: execute error: Insert vertex not complete, completeness: 50

When importing from Spark with the Java client, the exceptions above occur occasionally. The import uses batch inserts, 64 records per batch.

Both timeout and connect_time_out are set to 10s.

Could you post the server-side logs?

Sure. Which server-side log files do you need? I'll look for them.

Check whether all the services are up and whether a firewall is blocking the ports.

The problem is intermittent, not constant. I reran the job later and got no errors at all.
The services and the network should both be fine.

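In my view, if it does not reproduce every time, it is worth considering a performance problem. You can refer to the operating system parameter configuration guide to tune performance.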

Those system parameters have already been tuned.

Could you provide the storage and graph logs?
We would like to confirm the error codes.
Also, besides timeout and connect_timeout, there should be a SocketTimeout that can be set.

Errors from storage:
E0831 17:16:40.230320 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230334 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230347 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230361 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230376 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230388 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230401 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230415 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230443 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230459 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230502 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230535 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230551 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230564 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230577 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:16:40.230592 172911 MetaClient.cpp:1642] Space 38, Tag Index 77 not found!
E0831 17:23:09.880120 172761 RaftPart.cpp:1664] [Port: 44501, Space: 38, Part: 80] The current role is Leader. The local term is 2. The remote term is not newer: Connection timed out [110]
E0831 17:29:44.123577 172905 RaftPart.cpp:1064] [Port: 44501, Space: 38, Part: 84] Receive response about askForVote from [10.57.36.18:44501], error code is -6
E0831 17:29:44.128188 172805 Host.cpp:389] [Port: 44501, Space: 38, Part: 38] [Host: 10.57.36.19:44501] Failed to append logs to the host (Err: -9): Resource temporarily unavailable [11]
E0831 17:30:25.127233 172905 RaftPart.cpp:1064] [Port: 44501, Space: 38, Part: 43] Receive response about askForVote from [10.57.36.18:44501], error code is -6
E0831 17:30:29.985213 172907 RaftPart.cpp:1064] [Port: 44501, Space: 38, Part: 43] Receive response about askForVote from [10.57.36.18:44501], error code is -6

Errors from graph:
E0831 17:56:39.902187 172886 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 84
E0831 17:56:39.902349 172886 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 80
E0831 17:56:39.902302 172893 ExecutionPlan.cpp:76] Execute failed: Insert vertex not complete, completeness: 0
E0831 17:56:39.902326 172896 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 32
E0831 17:56:39.902236 172887 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 49
E0831 17:56:39.901319 172891 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 84
E0831 17:56:39.905769 172886 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 78
E0831 17:56:39.907526 172804 StorageClient.inl:123] Request to [10.57.36.19:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: socket closing after error (peer=10.57.36.19:44500, local=10.57.36.17:49962), type = Internal error
E0831 17:56:39.907560 172800 StorageClient.inl:123] Request to [10.57.36.19:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: socket closing after error (peer=10.57.36.19:44500, local=10.57.36.17:51512), type = Internal error
E0831 17:56:39.907563 172793 StorageClient.inl:123] Request to [10.57.36.19:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: socket closing after error (peer=10.57.36.19:44500, local=10.57.36.17:49880), type = Internal error
E0831 17:56:39.908476 172797 StorageClient.inl:123] Request to [10.57.36.19:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: socket closing after error (peer=10.57.36.19:44500, local=10.57.36.17:49930), type = Internal error
E0831 17:56:39.909534 172813 StorageClient.inl:123] Request to [10.57.36.19:44500] failed: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: socket closing after error (peer=10.57.36.19:44500, local=10.57.36.17:51518), type = Internal error
E0831 17:56:39.910308 172896 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 46
E0831 17:56:39.910408 172887 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 27
E0831 17:56:39.910729 172891 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 91
E0831 17:56:39.910795 172886 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 53
E0831 17:56:39.911072 172896 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 57
E0831 17:56:39.911321 172891 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 32
E0831 17:56:39.911383 172886 ExecutionPlan.cpp:76] Execute failed: Insert vertex not complete, completeness: 0
E0831 17:56:39.911427 172896 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 19
E0831 17:56:39.911716 172891 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 46
E0831 17:56:39.911147 172887 InsertVertexExecutor.cpp:249] Insert vertices failed, error -3, part 20

This is only part of the log from one of the three machines. The full log is quite large; if you need anything else, I can extract it.

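Judging by the error codes, the servers cannot keep up with the load.
May I ask what the machine specs are?
You could lower the concurrency and increase the batch size.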
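
A minimal sketch of what "fewer, larger batches" could look like on the client side, with a retry for batches that come back incomplete. `BatchInsertSketch`, `partition`, and `insertWithRetry` are hypothetical helpers, not part of the Nebula Java client. The `insert` callback stands in for whatever call actually sends the INSERT statement and reports success. The sketch also assumes that re-sending an INSERT for the same vertex IDs is acceptable for your data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of the Nebula Java client. */
final class BatchInsertSketch {

    /** Splits records into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    /**
     * Sends one batch through the caller-supplied insert callback, which
     * returns true on success. Failed attempts are retried with exponential
     * backoff. Returns the number of attempts used, or -1 if all attempts
     * failed or the thread was interrupted while waiting.
     */
    static <T> int insertWithRetry(List<T> batch, Predicate<List<T>> insert,
                                   int maxAttempts, long initialBackoffMs) {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (insert.test(batch)) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
                backoffMs *= 2; // give the storage servers more room on each retry
            }
        }
        return -1;
    }
}
```

In a Spark job, the concurrency is mostly determined by how many partitions write at the same time, so lowering it usually means reducing the partition count before the write stage rather than changing anything in this helper.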

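We have three machines, each with 12 cores, 64 GB of RAM, and non-SSD disks. We use batch inserts with 100 records per batch. The errors show up occasionally at a concurrency of 50.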

Later I tried changing the timeout to 10s, the batch size to 200 records, and the concurrency to 200, and the errors no longer appeared.

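In our tests, the import rate is about 130,000 vertices per second and 90,000 edges per second. The number of properties ranges from 2 at the low end to 6 at the high end.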

I don't see a SocketTimeout setting in the Java client. Where is it?

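The IOPS of an HDD is painfully low.
For system parameters, please refer to this article and apply its settings for mechanical hard disks and gigabit networks.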

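After adjusting these parameters and rerunning, the errors still occur occasionally, but less often. Vertex insertion is about 20% faster; edge insertion speed is essentially unchanged.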

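HDD is not a scenario the system was designed for, in terms of either the storage design or the default parameters. Even if you get write performance tuned, read performance probably won't be good either.
You can look through all the timeout settings in the code; the defaults all assume SSDs.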

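If errors still occur, try increasing the value of "--raft_rpc_timeout_ms" further.
You can also apply some small optimizations before inserting data, such as disabling auto compaction. For compaction, see the Storage configuration documentation.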
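
As a rough illustration of these two suggestions, and not a verified recipe: the first line below is a flag entry for the storage service's configuration file, and the value shown is only a placeholder, not a recommendation. The second is the statement form that, as far as I recall, the Nebula Graph 1.x documentation uses to turn off RocksDB auto compaction. Both are assumptions to check against the documentation for your version.

```
# storage service flag file (placeholder value; pick one suited to your disks)
--raft_rpc_timeout_ms=2000

# run from the console before a bulk import (assumed Nebula Graph 1.x syntax)
UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = true };
```

If auto compaction is disabled for the import, remember to set it back afterwards.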
