Compact fails with error: Write into rocksdb failed because of Corruption: Bad table magic number

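  • nebula version: 2.0.1
  • Deployment: distributed
  • Installation method: RPM
  • Production environment: N
  • Hardware
    • Disk (SSD recommended): 1 TB HDD × 3
    • CPU and memory: 16 cores, 128 GB
  • Problem description
    Storage nodes exited during data import. After the import finished, I ran compact and got the following errors: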
E1101 10:24:15.135341 23219 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:15.135474 23219 Part.h:124] [Port: 9780, Space: 48, Part: 19] Put failed in commit
E1101 10:24:15.135771 23219 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:15.135805 23219 Part.h:124] [Port: 9780, Space: 48, Part: 20] Put failed in commit
E1101 10:24:15.323053 23219 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:15.323207 23219 Part.cpp:350] [Port: 9780, Space: 48, Part: 20] Put failed in commit
E1101 10:24:15.323243 23219 RaftPart.cpp:1839] [Port: 9780, Space: 48, Part: 20] Bad snapshot, total rows received 0, total rows sended 78222, total size received 0, total size sended 10485825
E1101 10:24:15.327904 23199 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:15.328008 23199 Part.cpp:350] [Port: 9780, Space: 48, Part: 19] Put failed in commit
E1101 10:24:15.328109 23199 RaftPart.cpp:1839] [Port: 9780, Space: 48, Part: 19] Bad snapshot, total rows received 0, total rows sended 78355, total size received 0, total size sended 10485761
E1101 10:24:21.315282 23426 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:21.315351 23426 Part.h:124] [Port: 9780, Space: 48, Part: 18] Put failed in commit
E1101 10:24:25.011631 23201 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:25.011693 23201 Part.h:124] [Port: 9780, Space: 48, Part: 28] Put failed in commit
E1101 10:24:25.012166 23201 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:25.012217 23201 Part.h:124] [Port: 9780, Space: 48, Part: 2] Put failed in commit
E1101 10:24:25.200044 23199 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:25.200150 23199 Part.cpp:350] [Port: 9780, Space: 48, Part: 2] Put failed in commit
E1101 10:24:25.200181 23199 RaftPart.cpp:1839] [Port: 9780, Space: 48, Part: 2] Bad snapshot, total rows received 0, total rows sended 78492, total size received 0, total size sended 10485860
E1101 10:24:25.201769 23201 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:25.201833 23201 Part.cpp:350] [Port: 9780, Space: 48, Part: 28] Put failed in commit
E1101 10:24:25.201872 23201 RaftPart.cpp:1839] [Port: 9780, Space: 48, Part: 28] Bad snapshot, total rows received 0, total rows sended 78368, total size received 0, total size sended 10485840
E1101 10:24:31.329711 23423 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:31.329778 23423 Part.h:124] [Port: 9780, Space: 48, Part: 30] Put failed in commit
E1101 10:24:31.331039 23426 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:31.331108 23426 Part.h:124] [Port: 9780, Space: 48, Part: 23] Put failed in commit
E1101 10:24:35.140098 23207 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:35.140223 23207 Part.h:124] [Port: 9780, Space: 48, Part: 18] Put failed in commit
E1101 10:24:35.299697 23207 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:35.299787 23207 Part.cpp:350] [Port: 9780, Space: 48, Part: 18] Put failed in commit
E1101 10:24:35.299831 23207 RaftPart.cpp:1839] [Port: 9780, Space: 48, Part: 18] Bad snapshot, total rows received 0, total rows sended 78336, total size received 0, total size sended 10485867
E1101 10:24:45.017657 23194 RocksEngine.cpp:145] Write into rocksdb failed because of Corruption: Bad table magic number: expected 9863518390377041911, found 0 in data/storage/nebula/48/data/024213.sst
E1101 10:24:45.017727 23194 Part.h:124] [Port: 9780, Space: 48, Part: 30] Put failed in commit

Is there a way to repair this kind of file corruption?
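
To make an excerpt like the one above easier to read, here is a minimal sketch that collects which SST files and which parts the errors mention. The function name `summarize` and both regular expressions are illustrative assumptions, written only against the line formats visible in this log; other Nebula versions may log differently.

```python
import re
from collections import defaultdict

# Matches the RocksDB corruption message as it appears in the excerpt above.
CORRUPTION_RE = re.compile(
    r"Bad table magic number: expected (\d+), found (\d+) in (\S+\.sst)"
)
# Matches the partition tag, e.g. "[Port: 9780, Space: 48, Part: 19]".
PART_RE = re.compile(r"\[Port: (\d+), Space: (\d+), Part: (\d+)\]")


def summarize(log_lines):
    """Return (set of SST paths reported as corrupt, {space_id: sorted part ids})."""
    sst_files = set()
    parts = defaultdict(set)
    for line in log_lines:
        corruption = CORRUPTION_RE.search(line)
        if corruption:
            sst_files.add(corruption.group(3))
        part = PART_RE.search(line)
        if part:
            parts[int(part.group(2))].add(int(part.group(3)))
    return sst_files, {space: sorted(ids) for space, ids in parts.items()}
```

Feeding it the lines of a storage log (for example `summarize(open(path))`, where `path` is whatever log file you are inspecting) shows at a glance whether every failure traces back to the same SST file.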

What did you use for the import? Did you generate sst files with exchange and then ingest them?

I imported with the Python client, using concatenated statements.


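The cluster was originally on 2.0.1. I later replaced the bin directory to upgrade to 2.5.1,
and leader changes occurred during the import.
I then switched back to 2.0.1, and the storage service crashed during import on 2.0.1.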

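During this process I ran dd if=/dev/zero of=/dev/vdb bs=2M count=1000 oflag=direct

which damaged the vdb disk partition, as shown in the screenshot below. All nebula data is mounted under the /data directory.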

image

I think this error is related to the dd command I ran.
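
As a rough check on that suspicion, the amount of data the dd command overwrote can be worked out from its arguments. This is only a sketch: it assumes GNU dd, where the `M` suffix means 1024 × 1024 bytes, and it says nothing about whether the damaged sst file actually sat inside the zeroed region.

```python
# Arguments taken from the command in the post: bs=2M count=1000.
BLOCK_SIZE = 2 * 1024 * 1024  # GNU dd: "M" = 1024 * 1024 bytes
COUNT = 1000


def bytes_zeroed(block_size=BLOCK_SIZE, count=COUNT):
    """Total number of bytes dd writes, starting at offset 0 of the target device."""
    return block_size * count


total = bytes_zeroed()
print(f"{total} bytes (about {total / 1024**3:.2f} GiB) of zeros written to the start of the device")
```

Zeros written over file-system blocks would be consistent with the "found 0" in the error message, although the thread does not confirm this as the cause.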

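This magic number is what rocksdb uses to validate an sst file. This error usually means the sst file is corrupted.
Check whether the file named in the log still exists on disk and, if it does, whether its size has dropped to 0.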
data/storage/nebula/48/data/024213.sst
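
The check suggested above can be sketched in a few lines. The expected value is copied from the log message; the assumption that the magic number occupies the last 8 bytes of the file as a little-endian 64-bit integer reflects the usual rocksdb table footer layout and should be verified against your rocksdb version. The helper names `read_trailing_magic` and `check_sst` are hypothetical.

```python
import os
import struct

# Expected value copied from the log line ("expected 9863518390377041911").
EXPECTED_MAGIC = 9863518390377041911  # 0x88e241b785f4cff7


def read_trailing_magic(path):
    """Return the last 8 bytes as a little-endian uint64, or None if the file is too short."""
    if os.path.getsize(path) < 8:
        return None
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        return struct.unpack("<Q", f.read(8))[0]


def check_sst(path):
    """Classify an sst file as 'missing', 'too short', 'bad magic', or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    magic = read_trailing_magic(path)
    if magic is None:
        return "too short"
    return "ok" if magic == EXPECTED_MAGIC else "bad magic"
```

Calling `check_sst("data/storage/nebula/48/data/024213.sst")` from the storage working directory distinguishes a deleted file, a truncated one, and one whose footer no longer holds the expected magic number. It only inspects the footer, so an "ok" result does not prove the rest of the file is intact.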


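The file has now been deleted. When I checked last Friday, it still existed and its size was not 0. I have since dropped and recreated the space. Under normal conditions, a plain import should not cause this kind of file corruption.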

If the problem shows up again, remember to update this thread.

ok
