nebula graph 1.2: leader count is 0

  • Nebula version: 1.2
  • Deployment (distributed / standalone / Docker / DBaaS): distributed, Docker Swarm
  • Hardware
    • Disk (SSD recommended): SSD
  • Problem description
    After dropping a space and then running a compaction, the leader count became 0:
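
For context, "leader count is 0" here presumably refers to the Leader count / Leader distribution columns reported by SHOW HOSTS (an interpretation, not stated explicitly in the report); a minimal check from the console:

SHOW HOSTS;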

storaged logs:

E0224 03:43:03.902843    72 RaftPart.cpp:1075] [Port: 44501, Space: 247, Part: 385] Receive response about askForVote from [172.18.153.57:44501], error code is -5
E0224 03:43:03.903028    72 RaftPart.cpp:1075] [Port: 44501, Space: 247, Part: 385] Receive response about askForVote from [172.18.153.59:44501], error code is -5
E0224 03:43:03.915774    74 RaftPart.cpp:1075] [Port: 44501, Space: 247, Part: 355] Receive response about askForVote from [172.18.153.57:44501], error code is -5
E0224 03:43:03.915894    74 RaftPart.cpp:1075] [Port: 44501, Space: 247, Part: 355] Receive response about askForVote from [172.18.153.59:44501], error code is -5
E0224 03:43:03.929023    72 RaftPart.cpp:1075] [Port: 44501, Space: 247, Part: 370] Receive response about askForVote from [172.18.153.57:44501], error code is -5
E0224 03:43:03.929147    72 RaftPart.cpp:1075] [Port: 44501, Space: 247, Part: 370] Receive response about askForVote from [172.18.153.59:44501], error code is -5
E0224 03:43:03.938876    71 RaftPart.cpp:1075] [Port: 44501, Space: 247, Part: 25] Receive response about askForVote from [172.18.153.57:44501], error code is -5
E0224 03:43:03.939165    71 RaftPart.cpp:1075] [Port: 44501, Space: 247, Part: 25] Receive response about askForVote from [172.18.153.59:44501], error code is -5
E0224 03:43:10.361182    28 RaftPart.cpp:909] [Port: 44501, Space: 247, Part: 40] processAppendLogResponses failed!
E0224 03:43:10.443105    36 RaftPart.cpp:909] [Port: 44501, Space: 247, Part: 115] processAppendLogResponses failed!
E0224 05:33:45.163125    23 RaftPart.cpp:365] [Port: 44501, Space: 247, Part: 387] The partition is not a leader
E0224 05:33:45.163419    23 RaftPart.cpp:635] [Port: 44501, Space: 247, Part: 387] Cannot append logs, clean the buffer
E0224 05:34:00.024829    17 RaftPart.cpp:365] [Port: 44501, Space: 247, Part: 370] The partition is not a leader
E0224 05:34:00.025069    17 RaftPart.cpp:635] [Port: 44501, Space: 247, Part: 370] Cannot append logs, clean the buffer
E0224 05:34:17.438490    33 RaftPart.cpp:365] [Port: 44501, Space: 247, Part: 402] The partition is not a leader
E0224 05:34:17.438741    33 RaftPart.cpp:635] [Port: 44501, Space: 247, Part: 402] Cannot append logs, clean the buffer
E0224 05:34:26.223234    20 RaftPart.cpp:365] [Port: 44501, Space: 247, Part: 251] The partition is not a leader
E0224 05:34:26.223489    20 RaftPart.cpp:635] [Port: 44501, Space: 247, Part: 251] Cannot append logs, clean the buffer

Roughly how long was the interval between the compact operation and the SHOW HOSTS check? Was it run immediately afterwards?

Quite a while, it has been several hours by now.

What is the impact of the leader count being 0? So far queries and writes still seem to work normally.

@critical27 there is a storage issue here.

It should just be a display issue; the leaders actually exist. One possibility is that the heartbeat_interval_secs parameter is set inconsistently between storage and meta.

One more question: I dropped the space and then ran a compaction, but only part of the data was removed rather than all of it; after restarting one storaged, all of it was removed. Is this normal? Shouldn't a compaction after dropping the space remove all the data? The disk usage at each step is shown below, followed by a sketch of the operations.

Disk usage initially:
/dev/sdc        1.5T   17G  1.4T   2% /export/SSD/data2
/dev/sdd        1.5T   16G  1.4T   2% /export/SSD/data3
/dev/sde        1.5T   15G  1.4T   2% /export/SSD/data4
/dev/sdf        1.5T   14G  1.4T   1% /export/SSD/data5
/dev/sdg        1.5T   14G  1.4T   2% /export/SSD/data6
/dev/sdh        1.5T   15G  1.4T   2% /export/SSD/data7
/dev/sdi        1.5T   15G  1.4T   2% /export/SSD/data8
/dev/sdj        1.5T   15G  1.4T   2% /export/SSD/data9
/dev/sdk        1.5T   14G  1.4T   1% /export/SSD/data10
/dev/sdl        1.5T   15G  1.4T   2% /export/SSD/data11
/dev/sdm        1.5T   14G  1.4T   1% /export/SSD/data12


Disk usage after dropping the space and compacting:
/dev/sdc        1.5T   17G  1.4T   2% /export/SSD/data2
/dev/sdd        1.5T   15G  1.4T   2% /export/SSD/data3
/dev/sde        1.5T   14G  1.4T   1% /export/SSD/data4
/dev/sdf        1.5T   14G  1.4T   1% /export/SSD/data5
/dev/sdg        1.5T   14G  1.4T   1% /export/SSD/data6
/dev/sdh        1.5T   14G  1.4T   1% /export/SSD/data7
/dev/sdi        1.5T   14G  1.4T   1% /export/SSD/data8
/dev/sdj        1.5T   15G  1.4T   2% /export/SSD/data9
/dev/sdk        1.5T   13G  1.4T   1% /export/SSD/data10
/dev/sdl        1.5T   14G  1.4T   1% /export/SSD/data11
/dev/sdm        1.5T   13G  1.4T   1% /export/SSD/data12

Disk usage after restarting storaged:
/dev/sdc        1.5T   11G  1.4T   1% /export/SSD/data2
/dev/sdd        1.5T  9.9G  1.4T   1% /export/SSD/data3
/dev/sde        1.5T  9.3G  1.4T   1% /export/SSD/data4
/dev/sdf        1.5T  9.0G  1.4T   1% /export/SSD/data5
/dev/sdg        1.5T  9.0G  1.4T   1% /export/SSD/data6
/dev/sdh        1.5T  9.2G  1.4T   1% /export/SSD/data7
/dev/sdi        1.5T  9.0G  1.4T   1% /export/SSD/data8
/dev/sdj        1.5T  9.4G  1.4T   1% /export/SSD/data9
/dev/sdk        1.5T  8.6G  1.4T   1% /export/SSD/data10
/dev/sdl        1.5T  9.2G  1.4T   1% /export/SSD/data11
/dev/sdm        1.5T  8.7G  1.4T   1% /export/SSD/data12
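
For reference, the exact calls used are not shown in this thread; in 1.x a manual compaction is typically triggered through storaged's HTTP admin interface (port 12000 by default), so the sequence above roughly corresponds to the sketch below. The space name, IP, and swarm service name are placeholders, not values from this report:

# drop the space from the console
DROP SPACE <space_name>;

# trigger a manual compaction on a storaged instance (one call per host)
curl "http://<storaged_ip>:12000/admin?space=<space_name>&op=compact"

# restarting the storaged service is what finally reclaimed the remaining disk space
docker service update --force <storaged_service>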

For storage it is 10.


meta is not configured explicitly; the default should also be 10, right?

Was there a large amount of data imported at some point?

Yes, the data volume is large: 200 million vertices and 3.8 billion edges. But that is simply how big our graph is. We now want to do a full refresh, which means completely deleting the data and then re-importing it. How can that be done without a restart?

nebula graph 1.2: drop space does not completely delete data?

To avoid duplicate answers, please follow the replies in the post linked above for the solution to this question; it will not be answered again here.

Configure both sides to 10.
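
A minimal sketch of keeping the two sides consistent, assuming the stock gflags config files shipped with 1.x (file names and paths may differ in a Docker Swarm deployment):

# etc/nebula-metad.conf
--heartbeat_interval_secs=10

# etc/nebula-storaged.conf
--heartbeat_interval_secs=10

After changing the value, the corresponding services need to pick up the new configuration, e.g. by redeploying the swarm services.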