NebulaGraph 2.0.0: storaged upgrade fails

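Question template:

  • Nebula version: v2.0 GA
  • Deployment (distributed / standalone / Docker / DBaaS): distributed
  • Production environment: Y
  • Hardware
    • Disk (SSD recommended): Alibaba Cloud SSD cloud disk
    • CPU and memory: Alibaba Cloud, 8 cores / 32 GB, 3 nodes
  • Problem description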

While upgrading from 1.2.0 to 2.0.0, copying the metad data finished without errors. After I created the data path for the 2.0.0 storaged, migrating the data with db_upgrader failed.

The original 1.2.0 storaged data on disk was 209 GB.

I ran the upgrade command:

/opt/nebula-graph-2.0.0/bin/db_upgrader  \
--src_db_path=/alidata1/nebula-graph/data/  \
--dst_db_path=/alidata1/nebula-graph-2.0.0/data/  \
--upgrade_meta_server=172.26.18.168:45500,172.26.25.76:45500,172.26.25.77:45500 \
--upgrade_version=1

The resulting log:

[root@iZbp1bjhttaexgxq805033Z etc]# /opt/nebula-graph-2.0.0/bin/db_upgrader  \
> --src_db_path=/alidata1/nebula-graph/data/  \
> --dst_db_path=/alidata1/nebula-graph-2.0.0/data/  \
> --upgrade_meta_server=172.26.18.168:45500,172.26.25.76:45500,172.26.25.77:45500 \
> --upgrade_version=1  \
> ;
===========================PARAMS============================
meta server: 172.26.18.168:45500,172.26.25.76:45500,172.26.25.77:45500
source data path: /alidata1/nebula-graph/data/
destination data path: /alidata1/nebula-graph-2.0.0/data/
The size of the batch written: 100
upgrade data from version: 1
whether to compact all data: true
maximum number of concurrent parts allowed:10
maximum number of concurrent spaces allowed: 5
===========================PARAMS============================
I0519 17:40:05.628006 18228 DbUpgraderTool.cpp:116] Prepare phase begin
I0519 17:40:05.629070 18228 MetaClient.cpp:50] Create meta client to "172.26.25.77":45500
I0519 17:40:06.642655 18228 MetaClient.cpp:99] Register time task for heartbeat!
I0519 17:40:06.642704 18228 DbUpgraderTool.cpp:176] Prepare phase end
I0519 17:40:06.642727 18228 DbUpgraderTool.cpp:179] Upgrade phase bengin
I0519 17:40:06.642836 18232 DbUpgraderTool.cpp:190] Upgrade from path /alidata1/nebula-graph/data/ to path /alidata1/nebula-graph-2.0.0/data/ begin
I0519 17:40:06.642962 18232 DbUpgrader.cpp:1085] Upgrade from path /alidata1/nebula-graph/data/ to path /alidata1/nebula-graph-2.0.0/data/ in DbUpgrader run begin
I0519 17:40:06.643085 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option max_subcompactions=10
I0519 17:40:06.643096 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option max_background_jobs=10
I0519 17:40:06.643131 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option hard_pending_compaction_bytes_limit=274877906944
I0519 17:40:06.643137 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option level0_slowdown_writes_trigger=999999
I0519 17:40:06.643141 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option soft_pending_compaction_bytes_limit=137438953472
I0519 17:40:06.643146 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option disable_auto_compactions=true
I0519 17:40:06.643149 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option level0_stop_writes_trigger=999999
I0519 17:40:06.643153 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option write_buffer_size=134217728
I0519 17:40:06.643157 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option max_write_buffer_number=12
I0519 17:40:06.643162 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option max_bytes_for_level_base=268435456
I0519 17:40:43.437387 18232 RocksEngine.cpp:119] open rocksdb on /alidata1/nebula-graph/data//nebula/1/data
I0519 17:40:43.437530 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option max_subcompactions=10
I0519 17:40:43.437539 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option max_background_jobs=10
I0519 17:40:43.437556 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option hard_pending_compaction_bytes_limit=274877906944
I0519 17:40:43.437559 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option level0_slowdown_writes_trigger=999999
I0519 17:40:43.437563 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option soft_pending_compaction_bytes_limit=137438953472
I0519 17:40:43.437567 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option disable_auto_compactions=true
I0519 17:40:43.437572 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option level0_stop_writes_trigger=999999
I0519 17:40:43.437575 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option write_buffer_size=134217728
I0519 17:40:43.437579 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option max_write_buffer_number=12
I0519 17:40:43.437583 18232 RocksEngineConfig.cpp:254] Emplace rocksdb option max_bytes_for_level_base=268435456
I0519 17:40:43.443280 18232 RocksEngine.cpp:119] open rocksdb on /alidata1/nebula-graph-2.0.0/data//nebula/1/data
I0519 17:40:43.443537 18232 DbUpgrader.cpp:81] Src data path: /alidata1/nebula-graph/data/ space id 1 has 30 parts
I0519 17:40:43.443562 18232 DbUpgrader.cpp:125] Tag id 3 has 16 fields!
I0519 17:40:43.443567 18232 DbUpgrader.cpp:125] Tag id 2 has 16 fields!
I0519 17:40:43.443581 18232 DbUpgrader.cpp:145] Tag id 2 has 1 indexes
I0519 17:40:43.443595 18232 DbUpgrader.cpp:145] Tag id 3 has 1 indexes
I0519 17:40:43.443603 18232 DbUpgrader.cpp:167] Edgetype 11 has 4 fields!
I0519 17:40:43.443608 18232 DbUpgrader.cpp:167] Edgetype 6 has 4 fields!
I0519 17:40:43.443612 18232 DbUpgrader.cpp:167] Edgetype 4 has 4 fields!
I0519 17:40:43.443616 18232 DbUpgrader.cpp:167] Edgetype 5 has 4 fields!
I0519 17:40:43.443640 18232 DbUpgrader.cpp:1112] Max concurrenct spaces: 1
I0519 17:40:43.443815 18339 DbUpgrader.cpp:1130] Upgrade from path /alidata1/nebula-graph/data/ space id 1 to path /alidata1/nebula-graph-2.0.0/data/ begin
I0519 17:40:43.443835 18339 DbUpgrader.cpp:378] Start to handle data in space id 1
I0519 17:40:43.443838 18339 DbUpgrader.cpp:383] Max concurrenct parts: 10
I0519 18:17:19.674314 18347 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 8 finished
I0519 18:17:19.684294 18347 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 11
I0519 18:17:20.814790 18345 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 6 finished
I0519 18:17:20.814853 18345 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 12
I0519 18:17:21.596624 18340 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 1 finished
I0519 18:17:21.596678 18340 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 13
I0519 18:17:21.787756 18349 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 10 finished
I0519 18:17:21.787822 18349 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 14
I0519 18:17:21.856604 18342 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 3 finished
I0519 18:17:21.856667 18342 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 15
I0519 18:17:22.870478 18348 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 9 finished
I0519 18:17:22.870541 18348 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 16
I0519 18:17:23.340703 18346 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 7 finished
I0519 18:17:23.340764 18346 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 17
I0519 18:17:23.507486 18344 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 5 finished
I0519 18:17:23.507551 18344 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 18
I0519 18:17:23.672267 18343 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 4 finished
I0519 18:17:23.672322 18343 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 19
I0519 18:17:25.285491 18341 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 2 finished
I0519 18:17:25.285555 18341 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 20
I0519 18:54:00.664706 18347 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 11 finished
I0519 18:54:00.674669 18347 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 21
I0519 18:54:01.541780 18345 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 12 finished
I0519 18:54:01.541837 18345 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 22
I0519 18:54:02.421280 18340 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 13 finished
I0519 18:54:02.421351 18340 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 23
I0519 18:54:04.422369 18344 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 18 finished
I0519 18:54:04.422427 18344 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 24
I0519 18:54:06.524277 18346 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 17 finished
I0519 18:54:06.524348 18346 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 25
I0519 18:54:06.806751 18349 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 14 finished
I0519 18:54:06.806816 18349 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 26
I0519 18:54:06.866108 18342 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 15 finished
I0519 18:54:06.866164 18342 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 27
I0519 18:54:08.052470 18348 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 16 finished
I0519 18:54:08.052536 18348 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 28
I0519 18:54:09.693222 18341 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 20 finished
I0519 18:54:09.693279 18341 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 29
I0519 18:54:09.890674 18343 DbUpgrader.cpp:360] Handle vertex/edge/index data in space id 1 part id 19 finished
I0519 18:54:09.890744 18343 DbUpgrader.cpp:207] Start to handle vertex/edge/index data in space id 1 part id 30
I0519 19:30:53.572106 18339 DbUpgrader.cpp:398] Start to handle system data in space id 1
I0519 19:30:53.574751 18339 DbUpgrader.cpp:427] Handle system data in space id 1 success
I0519 19:30:53.574774 18339 DbUpgrader.cpp:428] Handle data in space id 1 success
I0519 19:30:53.574787 18339 DbUpgrader.cpp:1002] Copy space id 1 wal file begin
I0519 19:30:57.418138 18339 DbUpgrader.cpp:1144] Copy space id 1 wal file success
I0519 19:30:57.418171 18339 DbUpgrader.cpp:990] Path /alidata1/nebula-graph-2.0.0/data/ space id 1 compaction begin
I0519 19:30:57.880268 18235 EventListner.h:18] Rocksdb start compaction column family: default because of ManualCompaction, status: OK, compacted 6521 files into 0, base level is 0, output level is 1
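To see how long each batch of partitions took, the glog lines in this log can be parsed for their "part id N finished" timestamps. This is a minimal sketch assuming the standard glog prefix seen above (level, MMDD, time, thread id, file:line); parse_glog and part_finish_times are hypothetical helper names, not part of NebulaGraph.

```python
import re
from datetime import datetime

# glog line layout: <level><MMDD> <HH:MM:SS.micros> <thread id> <file:line>] <message>
GLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+)\s+(?P<src>[^\]]+)\]\s?(?P<msg>.*)$"
)
FINISHED_RE = re.compile(r"space id (\d+) part id (\d+) finished")


def parse_glog(line):
    """Return (timestamp, thread_id, message), or None if the line is not in glog format."""
    m = GLOG_RE.match(line.strip())
    if m is None:
        return None
    # glog does not print the year, so strptime fills in 1900; that is fine for deltas.
    ts = datetime.strptime(m["mmdd"] + " " + m["time"], "%m%d %H:%M:%S.%f")
    return ts, int(m["tid"]), m["msg"]


def part_finish_times(lines):
    """Map (space_id, part_id) -> timestamp of its '... part id N finished' line."""
    finished = {}
    for line in lines:
        parsed = parse_glog(line)
        if parsed is None:
            continue  # skip the PARAMS banner, shell prompt, etc.
        ts, _tid, msg = parsed
        m = FINISHED_RE.search(msg)
        if m:
            finished[(int(m.group(1)), int(m.group(2)))] = ts
    return finished
```

Read against the log above, parts 1-10 finish around 18:17, parts 11-20 around 18:54, and system data starts at 19:30:53, so each batch of ten parts (matching "Max concurrenct parts: 10") took roughly 36-37 minutes before the manual compaction began.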

However, at around 20:40 that evening, when the new-version data on disk had reached 455 GB, the process was KILLED.

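To avoid this, I relaunched it in the background with nohup at around 21:00, but at 23:45 the same thing happened again: it got stuck in compaction, with the new-version data on disk at 622-655 GB.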
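The sizes above are tracked by comparing the destination data with the original 209 GB. A hypothetical helper for doing the same comparison programmatically is sketched below; it sums apparent file sizes, which can differ slightly from what du or df report, and it is not part of NebulaGraph.

```python
import os


def dir_size_bytes(path):
    """Total size of regular files under path; symlinks are skipped."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except FileNotFoundError:
                # RocksDB may delete SST files during compaction while we walk.
                continue
    return total


def growth_ratio(src_path, dst_path):
    """Size of the destination (2.0.0) data relative to the source (1.2.0) data."""
    src = dir_size_bytes(src_path)
    if src == 0:
        raise ValueError(f"source path {src_path!r} is empty")
    return dir_size_bytes(dst_path) / src
```

With the figures reported here, 455 / 209 ≈ 2.2 at 20:40 and 622-655 / 209 ≈ 3.0-3.1 at 23:45.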

Checking with dmesg -T showed that the upgrader had been killed.
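The relevant kernel lines can be pulled out of dmesg -T output with a small filter. This is a minimal sketch assuming the kill was logged with the common kernel wording "Kill process <pid> (<name>)" or "Killed process <pid> (<name>)"; the exact text varies by kernel version, and the post does not quote the dmesg lines. find_kills is a hypothetical helper.

```python
import re

# Matches "Kill process <pid> (<comm>)" and "Killed process <pid> (<comm>)";
# the exact wording differs between kernel versions.
KILL_RE = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_kills(dmesg_text, comm="db_upgrader"):
    """Return [(pid, line)] for kernel kill messages that name the given process."""
    hits = []
    for line in dmesg_text.splitlines():
        m = KILL_RE.search(line)
        if m and m.group(2) == comm:
            hits.append((int(m.group(1)), line.strip()))
    return hits
```

The input can come from subprocess.run(["dmesg", "-T"], capture_output=True, text=True, check=True).stdout, which usually needs root. If the kill came from the OOM killer, the matching line usually also reports memory figures such as anon-rss, showing how large the process had grown.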

The configuration files (nebula-metad.conf and nebula-storaged.conf) are as follows:

[root@iZbp10ox6w6y6oconmf6mkZ etc]# more nebula-metad.conf
########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-metad.pid

########## logging ##########
# The directory to host logging files, which must already exist
--log_dir=/alidata1/nebula-graph-2.0.0/log
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4; the higher the level, the more verbose the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0

########## networking ##########
# Meta Server Address
--meta_server_addrs=172.26.18.168:45500,172.26.25.76:45500,172.26.25.77:45500
# Local ip
--local_ip=172.26.25.77
# Meta daemon listening port
--port=45500
# HTTP service ip
--ws_ip=172.26.25.77
# HTTP service port
--ws_http_port=11000
# HTTP2 service port
--ws_h2_port=11002

--heartbeat_interval_secs=60

########## storage ##########
# Root data path; metad should have only a single path
--data_path=/alidata1/nebula-graph-2.0.0/meta

########## other #########
# whether to support the null type
--null_type=true
[root@iZbp10ox6w6y6oconmf6mkZ etc]# more nebula-storaged.conf
########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-storaged.pid

########## logging ##########
# The directory to host logging files, which must already exist
--log_dir=/alidata1/nebula-graph-2.0.0/log
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4; the higher the level, the more verbose the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0

########## networking ##########
# Meta server address
--meta_server_addrs=172.26.18.168:45500,172.26.25.76:45500,172.26.25.77:45500
# Local ip
--local_ip=172.26.25.77
# Storage daemon listening port
--port=44500
# HTTP service ip
--ws_ip=172.26.25.77
# HTTP service port
--ws_http_port=12000
# HTTP2 service port
--ws_h2_port=12002
# heartbeat with meta service
--heartbeat_interval_secs=60

######### Raft #########
# Raft election timeout
--raft_heartbeat_interval_secs=30
# RPC timeout for raft client (ms)
--raft_rpc_timeout_ms=10000
## recycle Raft WAL
--wal_ttl=3600

########## Disk ##########
# Root data path. Split by comma. e.g. --data_path=/disk1/path1/,/disk2/path2/
# One path per Rocksdb instance.
--data_path=/alidata1/nebula-graph-2.0.0/data

############## Rocksdb Options ##############
# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096

# The default block cache size used in BlockBasedTable. (MB)
# recommend: 1/3 of all memory
--rocksdb_block_cache=10240

# Compression algorithm, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
# For the sake of binary compatibility, the default value is snappy.
# Recommended:
#   * lz4 to gain more CPU performance, with the same compression ratio as snappy
#   * zstd to occupy less disk space
#   * lz4hc for the read-heavy write-light scenario
--rocksdb_compression=snappy

# Set different compressions for different levels
# For example, if --rocksdb_compression is snappy,
# "no:no:lz4:lz4::zstd" is identical to "no:no:lz4:lz4:snappy:zstd:snappy"
# In order to disable compression for level 0/1, set it to "no:no"
--rocksdb_compression_per_level=

# Whether or not to enable rocksdb's statistics, disabled by default
--enable_rocksdb_statistics=false

# Stats level used by rocksdb to collect statistics; optional values are
#   * kExceptHistogramOrTimers, disable timer stats, and skip histogram stats
#   * kExceptTimers, Skip timer stats
#   * kExceptDetailedTimers, Collect all stats except time inside mutex lock AND time spent on compression.
#   * kExceptTimeForMutex, Collect all stats except the counters requiring to get time inside the mutex lock.
#   * kAll, Collect all stats
--rocksdb_stats_level=kExceptHistogramOrTimers

# Whether or not to enable rocksdb's prefix bloom filter, disabled by default.
--enable_rocksdb_prefix_filtering=false
# Whether or not to enable the whole key filtering.
--enable_rocksdb_whole_key_filtering=true
# The prefix length for each key to use as the filter value.
# can be 12 bytes(PartitionId + VertexID), or 16 bytes(PartitionId + VertexID + TagID/EdgeType).
--rocksdb_filtering_prefix_length=12

############## rocksdb Options ##############
# rocksdb DBOptions in json, each name and value of option is a string, given as "option_name":"option_value" separated by comma
--rocksdb_db_options={"max_subcompactions":"1","max_background_jobs":"1"}
# rocksdb ColumnFamilyOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_column_family_options={"disable_auto_compactions":"false","write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
# rocksdb BlockBasedTableOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_block_based_table_options={"block_size":"8192"}

############# edge samplings ##############
# --enable_reservoir_sampling=false
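The rocksdb options that db_upgrader logged at startup ("Emplace rocksdb option ...") are not the same as those in nebula-storaged.conf above. A minimal sketch to diff the two, assuming the gflags --name=value layout shown above and that the JSON-valued options hold string values; all function names are hypothetical.

```python
import json
import re


def parse_gflags_conf(text):
    """Parse '--name=value' lines from a gflags file, skipping comments and blanks."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("--"):
            continue
        name, _, value = line[2:].partition("=")
        flags[name] = value
    return flags


# Values in this log are integers or booleans; stopping there also copes with glued lines.
EMPLACE_RE = re.compile(r"Emplace rocksdb option (\w+)=(\d+|true|false)")


def diff_options(conf_text, log_text):
    """Return {option: (value_in_conf, value_logged_by_upgrader)} for options that differ."""
    flags = parse_gflags_conf(conf_text)
    conf_opts = {}
    for key in ("rocksdb_db_options", "rocksdb_column_family_options"):
        if flags.get(key):
            conf_opts.update(json.loads(flags[key]))  # JSON object of string values
    used = dict(EMPLACE_RE.findall(log_text))
    return {k: (conf_opts[k], v) for k, v in used.items()
            if k in conf_opts and conf_opts[k] != v}
```

On the texts in this post it should flag max_subcompactions (1 vs 10), max_background_jobs (1 vs 10), disable_auto_compactions (false vs true), write_buffer_size (67108864 vs 134217728) and max_write_buffer_number (4 vs 12); only max_bytes_for_level_base matches. That suggests db_upgrader opened RocksDB with its own settings rather than this storaged configuration, though this is an inference from the log, not something the thread confirms.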

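Try lowering max_concurrent_spaces (default 5) and max_concurrent_parts (default 10). I checked the scenario with some colleagues; I suggest first setting max_concurrent_spaces to 1 and max_concurrent_parts to 5 and trying again.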

Where do I set these?

Pass them as command-line flags, e.g. --max_concurrent_spaces=1.
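The suggestion amounts to appending two flags to the same db_upgrader command. A hypothetical helper that assembles the argument list is sketched below, using only flags that already appear in this thread; it is not a NebulaGraph tool.

```python
import shlex


def build_upgrader_cmd(binary, src, dst, meta_servers, upgrade_version=1,
                       max_concurrent_spaces=1, max_concurrent_parts=5):
    """Argument list for db_upgrader with lowered concurrency (hypothetical helper)."""
    return [
        binary,
        f"--src_db_path={src}",
        f"--dst_db_path={dst}",
        "--upgrade_meta_server=" + ",".join(meta_servers),
        f"--upgrade_version={upgrade_version}",
        f"--max_concurrent_spaces={max_concurrent_spaces}",
        f"--max_concurrent_parts={max_concurrent_parts}",
    ]


def as_shell_line(cmd):
    """Quote the argument list as one line that can be pasted into a terminal."""
    return shlex.join(cmd)
```

The list can be passed straight to subprocess.run(cmd, check=True). The PARAMS banner printed at startup ("maximum number of concurrent parts allowed", "maximum number of concurrent spaces allowed") shows whether the new values were picked up.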

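But it doesn't die while writing the WAL log; it dies during compaction. From watching the log in both runs, these parameters seem to apply only while the WAL log is being written. Do they also affect the compaction phase? I didn't see any log output showing that.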
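To tell which stage a run reached before it was killed (the per-part data stage, where "Max concurrenct parts" is logged, or the final manual compaction), the phase markers from the log above can be checked in order. A minimal sketch; the marker strings are copied from this log and may differ in other db_upgrader versions, and last_phase is a hypothetical helper.

```python
# Phase markers copied from the db_upgrader log in this post, in the order they appear.
PHASES = [
    ("prepare", "Prepare phase begin"),
    ("upgrade", "Upgrade phase bengin"),  # sic: spelled this way in the log
    ("space data", "Start to handle data in space id"),
    ("system data", "Start to handle system data in space id"),
    ("copy wal", "wal file begin"),
    ("compaction", "compaction begin"),
]


def last_phase(log_text):
    """Return the name of the latest phase whose marker appears in the log, or None."""
    reached = None
    for name, marker in PHASES:
        if marker in log_text:
            reached = name
    return reached
```

On the log in this post it returns "compaction", consistent with the report that the process was killed after the manual compaction had started.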

It still got killed; the parameters didn't help!
