Nebula SST files are frequently corrupted

  • Nebula version: 3.5
  • Deployment: distributed
  • Installation method: RPM
  • In production: N
  • Hardware
    • Disk: SSD, 12 * 7 TB
    • CPU and memory: 32 cores, 256 GB
  • Problem description:

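Nebula normally works fine. We load data by generating SST files with Exchange and importing them with Ingest. This import path has been validated for one month: imports succeeded and the SST files showed no anomalies. Occasionally, however, an import runs into a problem, and the log shows the following errors: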

I20231117 15:49:43.451581 134328 RocksEngine.cpp:107] open rocksdb on /datad/nebula/storage/nebula/1/data
I20231117 15:49:43.451990 134328 NebulaStore.cpp:206] Need to open 30 parts of space 1
I20231117 15:49:43.453809 134420 NebulaStore.cpp:229] Load part 1, 28 from disk
...................
I20231117 15:49:43.465997 134418 NebulaStore.cpp:229] Load part 1, 694 from disk
I20231117 15:49:43.466133 134419 NebulaStore.cpp:229] Load part 1, 702 from disk
I20231117 15:49:43.466161 134328 NebulaStore.cpp:263] Load space 1 complete
I20231117 15:49:43.466190 134328 NebulaStore.cpp:93] Scan path "/datad/nebula/storage/nebula/47"
I20231117 15:49:43.466244 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option skip_checking_sst_file_sizes_on_db_open=true
I20231117 15:49:43.466255 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_open_files=3000000
I20231117 15:49:43.466264 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_jobs=10
I20231117 15:49:43.466274 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_subcompactions=10
I20231117 15:49:43.466281 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_flushes=4
I20231117 15:49:43.466290 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_compactions=10
I20231117 15:49:43.466429 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_write_buffer_number=5
I20231117 15:49:43.466439 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option write_buffer_size=67108864
I20231117 15:49:43.466447 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option target_file_size_base=134217000
I20231117 15:49:43.466456 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option disable_auto_compactions=false
I20231117 15:49:43.466645 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option cache_index_and_filter_blocks=true
I20231117 15:49:43.466655 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option block_size=32768
I20231117 15:49:43.506975 134328 RocksEngine.cpp:107] open rocksdb on /datad/nebula/storage/nebula/47/data
I20231117 15:49:43.507236 134328 NebulaStore.cpp:206] Need to open 25 parts of space 47
I20231117 15:49:43.509042 134418 NebulaStore.cpp:229] Load part 47, 68 from disk
..........
I20231117 15:49:43.613642 134417 NebulaStore.cpp:229] Load part 47, 598 from disk
I20231117 15:49:43.613672 134328 NebulaStore.cpp:263] Load space 47 complete
I20231117 15:49:43.613708 134328 NebulaStore.cpp:93] Scan path "/datad/nebula/storage/nebula/48"
I20231117 15:49:43.613754 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option skip_checking_sst_file_sizes_on_db_open=true
I20231117 15:49:43.613772 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_open_files=3000000
I20231117 15:49:43.613781 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_jobs=10
I20231117 15:49:43.613790 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_subcompactions=10
I20231117 15:49:43.613798 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_flushes=4
I20231117 15:49:43.613807 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_compactions=10
I20231117 15:49:43.613943 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_write_buffer_number=5
I20231117 15:49:43.613955 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option write_buffer_size=67108864
I20231117 15:49:43.613962 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option target_file_size_base=134217000
I20231117 15:49:43.613971 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option disable_auto_compactions=false
I20231117 15:49:43.614159 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option cache_index_and_filter_blocks=true
I20231117 15:49:43.614169 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option block_size=32768
I20231117 15:49:43.652187 134328 RocksEngine.cpp:107] open rocksdb on /datad/nebula/storage/nebula/48/data
I20231117 15:49:43.652431 134328 NebulaStore.cpp:206] Need to open 25 parts of space 48
I20231117 15:49:43.654210 134417 NebulaStore.cpp:229] Load part 48, 80 from disk
I20231117 15:49:43.654304 134420 NebulaStore.cpp:229] Load part 48, 68 from disk
........
I20231117 15:49:43.725680 134417 NebulaStore.cpp:229] Load part 48, 526 from disk
I20231117 15:49:43.753686 134419 NebulaStore.cpp:229] Load part 48, 326 from disk
I20231117 15:49:43.755438 134419 NebulaStore.cpp:229] Load part 48, 406 from disk
I20231117 15:49:43.757189 134419 NebulaStore.cpp:229] Load part 48, 486 from disk
I20231117 15:49:43.790289 134419 NebulaStore.cpp:229] Load part 48, 598 from disk
I20231117 15:49:43.790321 134328 NebulaStore.cpp:263] Load space 48 complete
I20231117 15:49:43.790367 134328 NebulaStore.cpp:93] Scan path "/datad/nebula/storage/nebula/52"
I20231117 15:49:43.790422 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option skip_checking_sst_file_sizes_on_db_open=true
I20231117 15:49:43.790434 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_open_files=3000000
I20231117 15:49:43.790443 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_jobs=10
I20231117 15:49:43.790452 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_subcompactions=10
I20231117 15:49:43.790468 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_flushes=4
I20231117 15:49:43.790477 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_background_compactions=10
I20231117 15:49:43.790616 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option max_write_buffer_number=5
I20231117 15:49:43.790627 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option write_buffer_size=67108864
I20231117 15:49:43.790635 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option target_file_size_base=134217000
I20231117 15:49:43.790644 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option disable_auto_compactions=false
I20231117 15:49:43.790838 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option cache_index_and_filter_blocks=true
I20231117 15:49:43.790849 134328 RocksEngineConfig.cpp:371] Emplace rocksdb option block_size=32768
I20231117 15:49:43.799925 134328 RocksEngine.cpp:107] open rocksdb on /datad/nebula/storage/nebula/52/data
I20231117 15:49:43.799995 134328 NebulaStore.cpp:206] Need to open 2 parts of space 52
I20231117 15:49:44.156276 134430 EventListener.h:35] **Rocksdb compaction completed column family: default because of PeriodicCompaction, status: Corruption: block checksum mismatch: stored = 2729495127, computed = 3376144360, type = 1  in /datac/nebula/storage/nebula/1/data/286789.sst offset 82730884 size 18238, compacted 1 files into 0, base level is 6, output level is 6**
**I20231117 15:49:44.156366 134430 EventListener.h:147] BackgroundError: because of Compaction Corruption: block checksum mismatch: stored = 2729495127, computed = 3376144360, type = 1  in /datac/nebula/storage/nebula/1/data/286789.sst offset 82730884 size 18238**
I20231117 15:49:44.880007 134416 MetaClient.cpp:3261] Load leader of "ga-nebulagraph1":9779 in 4 space
I20231117 15:49:44.880445 134416 MetaClient.cpp:3261] Load leader of "ga-nebulagraph2":9779 in 4 space
I20231117 15:49:44.880790 134416 MetaClient.cpp:3261] Load leader of "ga-nebulagraph3":9779 in 4 space
I20231117 15:49:44.881044 134416 MetaClient.cpp:3261] Load leader of "ga-nebulagraph4":9779 in 4 space
I20231117 15:49:44.881304 134416 MetaClient.cpp:3261] Load leader of "ga-nebulagraph5":9779 in 4 space
I20231117 15:49:44.881628 134416 MetaClient.cpp:3261] Load leader of "ga-nebulagraph6":9779 in 4 space
I20231117 15:49:44.881639 134416 MetaClient.cpp:3267] Load leader ok
I20231117 15:49:45.225443 134419 NebulaStore.cpp:229] Load part 52, 35 from disk
I20231117 15:49:45.238395 134418 NebulaStore.cpp:229] Load part 52, 17 from disk
I20231117 15:49:45.238440 134328 NebulaStore.cpp:263] Load space 52 complete
I20231117 15:49:45.238763 134328 NebulaStore.cpp:93] Scan path "/datae/nebula/storage/nebula/1"

====================================================
The following error is reported at startup:

Rocksdb compaction completed column family: default because of PeriodicCompaction, status: Corruption: block checksum mismatch: stored = 2729495127, computed = 3376144360, type = 1 in /datac/nebula/storage/nebula/1/data/286789.sst offset 82730884 size 18238

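This happens frequently, and each occurrence is usually on a different node. The SST file named in the error is always an older file that had been in normal use until then. I suspect a problem with the MANIFEST, but I don't know how to troubleshoot it.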
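
To compare occurrences across nodes, it may help to pull every checksum-mismatch report out of the storaged logs and list the affected SST files and block offsets. Below is a minimal Python sketch, assuming the message format is exactly the one shown in the log above. The function name `find_corrupt_blocks` is made up for illustration and is not part of Nebula or RocksDB.

```python
import re
from collections import defaultdict

# Matches the RocksDB message as it appears in the log above:
# "block checksum mismatch: stored = N, computed = M, type = T  in PATH offset O size S"
# Assumption: other versions may word this message differently.
CORRUPTION_RE = re.compile(
    r"block checksum mismatch: stored = (?P<stored>\d+), "
    r"computed = (?P<computed>\d+), type = (?P<type>\d+)\s+"
    r"in (?P<path>\S+\.sst) offset (?P<offset>\d+) size (?P<size>\d+)"
)


def find_corrupt_blocks(lines):
    """Map each SST path to the set of (offset, size) blocks reported as corrupt."""
    blocks = defaultdict(set)
    for line in lines:
        match = CORRUPTION_RE.search(line)
        if match:
            block = (int(match.group("offset")), int(match.group("size")))
            blocks[match.group("path")].add(block)
    return dict(blocks)


# One line copied from the log above, used as sample input.
SAMPLE = [
    "I20231117 15:49:44.156366 134430 EventListener.h:147] BackgroundError: "
    "because of Compaction Corruption: block checksum mismatch: "
    "stored = 2729495127, computed = 3376144360, type = 1  in "
    "/datac/nebula/storage/nebula/1/data/286789.sst offset 82730884 size 18238",
]

for path, found in sorted(find_corrupt_blocks(SAMPLE).items()):
    for offset, size in sorted(found):
        print(f"{path}\toffset={offset}\tsize={size}")
```

Passing an open log file instead of `SAMPLE` works the same way, since the function only iterates over lines. Running it against the logs from each node would show whether the same file and offset are reported repeatedly, or whether each occurrence hits a different file and disk.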