Statements that address vertices/edges via hash() fail to execute

  • Nebula version: nightly (exact build unknown)

  • Deployment (distributed / standalone / Docker / DBaaS): Docker

  • How the affected space was created:
    [screenshot]

  • Statements that address a vertex or edge via hash() fail to execute, even though the vertex exists. The error message is as follows:

UPSERT VERTEX hash("PN:NzA3NzgxODQzMg==") SET phone_number.number="NzA3NzgxODQzMg==";
[ERROR (-8)]: Maybe vertex does not exist, part: 58, error code: -16!

[screenshot of the error message]

Could you post the storage logs from the logs directory?

Running on machine: aidc-india-dwdockergdb-ind-dx-ol-03
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1029 11:10:31.192888    47 MetaClient.cpp:524] Send request to [192.168.9.11:45500], exceed retry limit
E1029 11:10:31.193285     1 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1029 11:10:36.202010    48 MetaClient.cpp:524] Send request to [192.168.9.10:45500], exceed retry limit
E1029 11:10:36.202082     1 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1029 11:10:41.207799    49 MetaClient.cpp:524] Send request to [192.168.9.11:45500], exceed retry limit
E1029 11:10:41.207875     1 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1029 11:10:46.217664    50 MetaClient.cpp:524] Send request to [192.168.9.11:45500], exceed retry limit
E1029 11:10:46.217738     1 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1029 11:10:51.222728    51 MetaClient.cpp:524] Send request to [192.168.9.10:45500], exceed retry limit
E1029 11:10:51.222806     1 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1029 11:10:56.227655    52 MetaClient.cpp:524] Send request to [192.168.9.9:45500], exceed retry limit
E1029 11:10:56.227726     1 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1029 11:11:01.232903    53 MetaClient.cpp:524] Send request to [192.168.9.9:45500], exceed retry limit
E1029 11:11:01.232986     1 MetaClient.cpp:58] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
E1102 06:00:36.713052    69 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 30] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.713964    61 Host.cpp:390] [Port: 44501, Space: 8, Part: 30] [Host: 192.168.9.9:44501] Failed to append logs to the host (Err: -5)
E1102 06:00:36.714898    70 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 17] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.721755    69 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 12] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.724710    72 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 22] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.787039    69 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 3] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.813047    72 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 7] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.911742    72 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 26] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.926369    70 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 2] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.945062    71 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 15] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:36.959630    70 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 9] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:37.142277    72 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 11] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:37.267036    69 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 6] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:37.384824    69 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 23] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:37.420615    69 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 27] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:00:37.466337    71 RaftPart.cpp:1075] [Port: 44501, Space: 8, Part: 8] Receive response about askForVote from [192.168.9.9:44501], error code is -5
E1102 06:31:58.489882    18 RaftPart.cpp:365] [Port: 44501, Space: 8, Part: 18] The partition is not a leader
E1102 06:31:58.490509    14 RaftPart.cpp:635] [Port: 44501, Space: 8, Part: 4] Cannot append logs, clean the buffer
E0106 11:27:47.768195    27 RowUpdater.cpp:52] Failed to encode updated data
E0106 11:27:47.774607    27 UpdateVertexProcessor.cpp:408] Failed to encode updated data
E0107 05:18:14.241756    37 RowUpdater.cpp:52] Failed to encode updated data
E0107 05:18:14.244813    37 UpdateVertexProcessor.cpp:408] Failed to encode updated data
E0107 06:21:13.028467    40 RowUpdater.cpp:52] Failed to encode updated data
E0107 06:21:13.031082    40 UpdateVertexProcessor.cpp:408] Failed to encode updated data
E0108 02:22:04.378353    23 RowUpdater.cpp:52] Failed to encode updated data
E0108 02:22:04.381130    23 UpdateVertexProcessor.cpp:408] Failed to encode updated data
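
A log this long is easier to triage once it is grouped by the source location that emitted each line. The following is a minimal Python sketch, assuming only the layout stated in the log header (`[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`). The names `GLOG_LINE`, `summarize`, and `SAMPLE` are made up for illustration and are not part of Nebula Graph; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Layout taken from the log header:
# [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def summarize(lines):
    """Count log lines per (level, "file:line"); skip lines that do not match."""
    counts = Counter()
    for raw in lines:
        m = GLOG_LINE.match(raw.strip())
        if m is None:
            continue  # header lines such as "Running on machine: ..."
        counts[(m["level"], f'{m["file"]}:{m["line"]}')] += 1
    return counts


# A few lines copied from the storaged log in this thread.
SAMPLE = [
    "Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg",
    "E1029 11:10:31.192888    47 MetaClient.cpp:524] Send request to [192.168.9.11:45500], exceed retry limit",
    "E1102 06:31:58.489882    18 RaftPart.cpp:365] [Port: 44501, Space: 8, Part: 18] The partition is not a leader",
    "E0106 11:27:47.768195    27 RowUpdater.cpp:52] Failed to encode updated data",
    "E0107 05:18:14.241756    37 RowUpdater.cpp:52] Failed to encode updated data",
]

if __name__ == "__main__":
    for (level, location), n in summarize(SAMPLE).most_common():
        print(level, location, n)
```

Run against the full file, this separates the three distinct phases visible above (meta heartbeat failures, Raft vote errors, and "Failed to encode updated data") without reading every line.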

$^.phone_number.number?

Edges show a similar problem:
INSERT EDGE user_contact() VALUES hash("USER:59432421006770176")->hash("PN:ODgzNzY0NDg4Nw=="):()
[ERROR (-8)]: Insert edge `user_contact' not complete, completeness: 50

Also, the vertex cannot be deleted.

Hi, please get the commit id using the method described at https://docs.nebula-graph.com.cn/manual-CN/1.overview/2.quick-start/2.FAQ/#_4
and post it here.

For the failed insert, could you also post both the graph and storage logs?

Also, has the phone_number tag ever been modified, for example with ALTER?

Yes, it has been modified.

Can the replica count of a space be increased dynamically?

No.

commit id : 385400c

We currently run a cluster of three hosts:

  • 192.168.9.9
  • 192.168.9.10
  • 192.168.9.11

storaged on hosts 192.168.9.9 and 192.168.9.11 logs this error: RaftPart.cpp:1075] [Port: 44501, Space: 1, Part: 18] Receive response about askForVote from [192.168.9.10:44501], error code is -6

storaged on host 192.168.9.10 logs these errors:
E0108 08:00:01.382675 32 RaftPart.cpp:365] [Port: 44501, Space: 8, Part: 14] The partition is not a leader
E0108 08:00:01.382752 32 RaftPart.cpp:635] [Port: 44501, Space: 8, Part: 14] Cannot append logs, clean the buffer


At the same time, CPU and memory usage are also very high on these machines:

  • 192.168.9.9
  • 192.168.9.11

So there is hardly any workload right now, yet storage CPU usage is very high?

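Please post the logs for this error, from both graph and storage.
Also, could you run DESC TAG phone_number and post a screenshot of the result?
And please list every modification you made to the phone_number tag along the way, so I can try to reproduce it.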

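I'm hitting a similar problem. I created a new space with the default configuration, which means 100 partitions and 1 replica. Once a certain amount of data had been written, some partitions could no longer elect a leader, and the data in those partitions could not be read or written.
Other, earlier spaces holding data of the same scale never became unreadable or unwritable and ran steadily for several months, although I did observe several leader re-elections during that time. Under what circumstances is a leader re-election normally triggered?
My hypothesis: some fault or other condition on the current leader replica triggers an election. With a single replica there is no other replica to elect, so the whole partition becomes unavailable. With multiple replicas something similar happens, but a new leader can be elected, so the partition recovers once the old leader steps down.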
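
The arithmetic behind this hypothesis can be written down as a small Python sketch. It assumes plain majority-quorum voting as used by Raft-style protocols in general. It is not taken from Nebula Graph's code, the function names are made up for illustration, and whether it explains the symptom above is unconfirmed.

```python
def quorum(replicas: int) -> int:
    """Votes needed for a majority in a group of `replicas` members."""
    if replicas < 1:
        raise ValueError("a partition needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be down while a majority is still reachable."""
    return replicas - quorum(replicas)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"replicas={n} quorum={quorum(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this model a partition with 1 replica tolerates no failed replica, while a partition with 3 replicas can lose one and still elect a new leader, which matches the difference the hypothesis describes.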

Keep an eye on this thread: https://discuss.nebula-graph.com.cn/t/topic/1880/47. It may be the same problem as yours.