
Thrift timeout when the nebula java client executes an insert statement

To help us locate and solve the problem faster, please follow the template below ^ ^

Question template:

  • nebula version: 1.1.0

  • Deployment: distributed

  • Hardware

    • Disk: Alibaba Cloud efficiency cloud disk
    • CPU / memory: Alibaba Cloud ECS, 8C / 32 GB
  • How the problematic Space was created:
    The java client fails to execute an insert vertex statement:
    insert vertex Person ( certi_no , association_id , association_person_cnt , association_size , rule1 , rule2 , rule3 , rule4 , rule5 , rule6 , rule7 , rule8 , rule9 , rule10 , is_deleted , pt ) VALUES hash( "xxxxxx"):( "xxxxxx", "xxxxxxxxxx", 5 , 1 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , "N", "20201104000000")

    Before syncing the data, a new index was added and rebuilt:
    REBUILD TAG INDEX idx_person_certi_no OFFLINE

  • Detailed description of the problem

    The java client throws an exception:
    2021-01-05 17:31:37.460 [0-1-23-writer] ERROR GraphClientImpl - Thrift rpc call failed: java.net.SocketTimeoutException: Read timed out
    2021-01-05 17:31:37.460 [0-1-23-writer] WARN NebulaGraphWriterSlaveProxy - insert vertex tag sql execute failed with code [-3] sql [insert vertex Person ( certi_no , association_id , association_person_cnt , association_size , rule1 , rule2 , rule3 , rule4 , rule5 , rule6 , rule7 , rule8 , rule9 , rule10 , is_deleted , pt ) VALUES hash( "xxxxxx"):( "xxxxxx", "xxxxxxxxxxxxxxxxxx", 5 , 1 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , "N", "20201104000000") ]

Check whether the index rebuild has finished.
Also check what timeout is configured on your client.

Running the query SHOW TAG INDEX STATUS; shows the rebuild is still running, which may be causing this.
Also, may I ask how to set the timeout on the java client?

I tried again: even after the index was fully built, inserts still fail, yet the sql printed in the error can be executed successfully from the web console.

graphd background error log:

E0105 18:00:20.949940 30534 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:20.948489 30537 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:20.954658 30538 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:20.947525 30535 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:20.953974 30532 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:20.947494 30531 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:20.947494 30533 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:21.429841 30534 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:21.429960 30537 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:21.430045 30538 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:21.430094 30535 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:21.430140 30532 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:21.430186 30531 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:21.430253 30533 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.728170 30536 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.728392 30536 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.794762 30537 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.797005 30537 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.796710 30534 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.797904 30532 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.798274 30534 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.798426 30532 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.830786 30534 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.832015 30535 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.833041 30534 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.833091 30535 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.846554 30535 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.850445 30535 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.993129 30537 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.995376 30535 InsertVertexExecutor.cpp:249] Insert vertices failed, error -16, part 3
E0105 18:00:51.998540 30537 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0
E0105 18:00:51.998590 30535 ExecutionPlan.cpp:80] Execute failed: Insert vertex not complete, completeness: 0

What write concurrency is your client using? Try adjusting the concurrency and the batch size.
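The batching suggested above can be sketched as follows: instead of sending one row per RPC, several rows are joined into a single INSERT VERTEX statement. This is a minimal illustration only; the builder class and row strings are hypothetical, not part of the nebula-java API.

```java
import java.util.List;

public class BatchInsertBuilder {
    // Join several "vid:(values)" rows into one INSERT VERTEX statement,
    // so each RPC carries a batch of rows instead of a single row.
    static String buildInsert(String tag, String props, List<String> rows) {
        return "INSERT VERTEX " + tag + " (" + props + ") VALUES "
                + String.join(", ", rows);
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "hash(\"id1\"):(\"id1\", \"a\", 5)",
                "hash(\"id2\"):(\"id2\", \"b\", 3)");
        System.out.println(buildInsert("Person",
                "certi_no, association_id, association_person_cnt", rows));
    }
}
```

Fewer, larger statements reduce the number of thrift round trips, at the cost of a bigger unit of failure if one row in the batch is bad.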

Sorry, I was just busy with a flinksql issue. 50 concurrent writers, each building and executing one row of sql at a time. The storaged configuration is as follows:

######### Raft #########
# Raft election timeout
--raft_heartbeat_interval_secs=30
# RPC timeout for raft client (ms)
--raft_rpc_timeout_ms=10000
## recycle Raft WAL
--wal_ttl=3600

########## Disk ##########
# Root data path. Split by comma. e.g. --data_path=/disk1/path1/,/disk2/path2/
# One path per Rocksdb instance.
--data_path=/alidata4/nebula-graph/storage,/alidata3/nebula-graph/storage

############## Rocksdb Options ##############
# The default reserved bytes for one batch operation
--rocksdb_batch_size=16348

# The default block cache size used in BlockBasedTable. (MB)
# recommend: 1/3 of all memory
--rocksdb_block_cache=2048
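As an aside on the config above: its own comment recommends 1/3 of all memory for the block cache, while the value is set to 2048 MB. On the 8C/32GB ECS from the template, 1/3 of memory would be roughly five times larger. A quick check of that arithmetic (the sizing rule is only the config comment's recommendation, not a hard requirement):

```java
public class BlockCacheSizing {
    // rocksdb_block_cache is in MB; the config comment recommends 1/3 of RAM.
    static long recommendedMb(long totalRamGb) {
        return totalRamGb * 1024 / 3;
    }

    public static void main(String[] args) {
        System.out.println(recommendedMb(32));  // prints 10922
    }
}
```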

One more thing: currently only one vertex tag cannot be inserted into; the other vertex tags insert fine.

After reducing the datax concurrency, it started working after roughly 2 hours; before that it consistently failed.


From the logs, part 3 is under too much pressure, and on your side only one tag is affected, so the vertices or edges you insert probably land mostly on part 3, which makes part 3 too busy to keep up. First confirm whether you have dense (super) vertices, or whether the vertex ids cluster on part 3.
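The id-clustering check suggested above can be replayed offline by applying the partition mapping to the ids being loaded. A sketch, under the assumption that Nebula 1.x assigns partId = vid mod partNum + 1 (verify this against your version's source before relying on it); the sample vids stand in for hash("...") values:

```java
import java.util.HashMap;
import java.util.Map;

public class PartSkewCheck {
    // Assumption: Nebula 1.x maps a vertex id to a partition as
    // partId = vid mod partNum + 1. Replaying that mapping over the
    // ids being loaded shows whether they cluster on one part.
    static int partOf(long vid, int partNum) {
        return (int) Math.floorMod(vid, (long) partNum) + 1;
    }

    public static void main(String[] args) {
        int partNum = 10;
        long[] vids = {3L, 13L, 23L, 33L, 7L};  // stand-ins for hash("...") values
        Map<Integer, Integer> counts = new HashMap<>();
        for (long v : vids) {
            counts.merge(partOf(v, partNum), 1, Integer::sum);
        }
        System.out.println(counts);  // most of these sample ids land on one part
    }
}
```

If one part dominates the counts for the real ids, the writes for that tag are indeed concentrating on a single partition.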


I insert data using hash(vertexid), with INSERT statements.
How can dense vertices be avoided? Or how about HA balance of the cluster data?

How can dense vertices be avoided?

That cannot be avoided; it is determined by your business data. If this is only the initial data import, you can control the speed at which edges sharing the same source vertex are inserted, or insert them in batches.
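The staged insertion suggested above could be sketched as a simple pacing loop: split the rows into fixed-size batches and pause between them, so a dense vertex's edges do not all hit one part at once. The batch size and pause here are made-up tuning knobs, not values from this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class PacedLoader {
    // Split rows into fixed-size batches; the caller inserts one batch,
    // then pauses, so a dense vertex's edges arrive in slower waves.
    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            out.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> rows = List.of(1, 2, 3, 4, 5, 6, 7);
        for (List<Integer> batch : batches(rows, 3)) {
            System.out.println("insert batch: " + batch);
            Thread.sleep(10);  // pacing pause between batches (tune for real load)
        }
    }
}
```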

Or how about HA balance of the cluster data?

Run show parts so I can look at the part distribution, and also show hosts; please post screenshots of both results.

show hosts

show parts


One of your nodes went offline; will it stay unused from now on? Also, the leader distribution across the other two nodes is uneven; you can retry after the balance completes. It may also be that that node is under heavy pressure and busy writing to disk.

We are a small test cluster used for validation, trying to import a production-scale volume of data to test parameter settings; I have brought the node back up.
This storaged was fine before. I created an index on a tag with about 90 million vertices and rebuilt it, and then it crashed.
The rebuild still shows as running in the tag index status; how can the previous rebuild be stopped?

How can the previous rebuild be stopped?

Sorry, 1.0 has no feature to stop it.

I created an index on a tag with about 90 million vertices and rebuilt it, and then it crashed

Check whether it hit an OOM, or whether there are any logs; please paste the log output.

So can I assume this index will stay in the rebuilding state forever?

It won't. Could it be that during the rebuild you kept inserting continuously, without letting the status reach completion?

Looking at your earlier status, a service had gone offline. It is probably because the service crashed while your submitted job had not yet finished, so it remains in this state.
