zzz
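January 23, 2021, 11:10
1
Question template:
Nebula version: nebula 2.0rc1
Deployment (distributed / standalone / Docker / DBaaS): docker swarm
Hardware information
HDD
After docker swarm was restarted, storaged never comes up; it restarts every few minutes.
The storage log is below. After "Do custom minor compaction!" the process goes into stopping, and I don't know why.
Storage log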
I0123 11:03:01.069562 69 EventListner.h:18] Rocksdb start compaction column family: default because of LevelMaxLevelSize, status: OK, compacted 1 files into 0, base level is 1, output level is 2
I0123 11:03:01.102402 69 EventListner.h:30] Rocksdb compaction completed column family: default because of LevelMaxLevelSize, status: OK, compacted 1 files into 1, base level is 1, output level is 2
I0123 11:03:01.111342 69 EventListner.h:18] Rocksdb start compaction column family: default because of LevelL0FilesNum, status: OK, compacted 4143 files into 0, base level is 0, output level is 1
I0123 11:03:01.118247 69 CompactionFilter.h:62] Do custom minor compaction!
I0123 11:04:23.432250 1 StorageDaemon.cpp:142] Signal 15(Terminated) received, stopping this server
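To see how long the daemon kept running between the last compaction message and the termination signal, the glog timestamps can be diffed. This is a minimal sketch: the helper names are hypothetical, and the year is an assumption because glog lines carry only month and day.

```python
import re
from datetime import datetime

# glog prefix: <level><MMDD> <HH:MM:SS.uuuuuu> <thread id> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<source>[^\]]+)\] (?P<msg>.*)$"
)


def parse_glog_time(line, year=2021):
    """Return the timestamp of one glog line (year is supplied by the caller)."""
    m = GLOG_LINE.match(line)
    if m is None:
        raise ValueError(f"not a glog line: {line!r}")
    stamp = f"{year}-{m['month']}-{m['day']} {m['time']}"
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def seconds_between(first, second, year=2021):
    """Seconds elapsed from the first glog line to the second."""
    delta = parse_glog_time(second, year) - parse_glog_time(first, year)
    return delta.total_seconds()


compaction = "I0123 11:03:01.118247 69 CompactionFilter.h:62] Do custom minor compaction!"
sigterm = ("I0123 11:04:23.432250 1 StorageDaemon.cpp:142] "
           "Signal 15(Terminated) received, stopping this server")
gap = seconds_between(compaction, sigterm)
print(f"{gap:.3f} s")
```

For the two lines above the gap is a little over 82 seconds, so the process is not stopped immediately by the compaction itself.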
zzz
January 26, 2021, 06:24
3
This is the storage configuration:
storaged2:
  image: vesoft/nebula-storaged:v2.0.0-rc1
  env_file:
    - ./nebula.env
  command:
    - --meta_server_addrs=10.0.0.3:9559,10.0.0.3:9559,10.0.0.3:9559
    - --local_ip=10.0.0.3
    - --ws_ip=10.0.0.3
    - --port=9779
    - --data_path=/data1/storaged,/data2/storaged,/data3/storaged,/data4/storaged,/data5/storaged,/data6/storaged,/data7/storaged,/data8/storaged,/data9/storaged,/data10/storaged,/data11/storaged
    - --log_dir=/logs
    - --v=0
    - --minloglevel=0
    - --raft_heartbeat_interval_secs=60
    - --raft_rpc_timeout_ms=30000
    - --heartbeat_interval_secs=90
    - --rocksdb_block_cache=20480
    - --enable_rocksdb_prefix_filtering=true
  deploy:
    replicas: 1
    restart_policy:
      condition: on-failure
    placement:
      constraints:
        - node.hostname == node2
  depends_on:
    - metad0
    - metad1
    - metad2
  healthcheck:
    test: ["CMD", "curl", "-f", "http://10.0.0.3:19779/status"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 20s
  ports:
    - target: 19779
      published: 19779
      protocol: tcp
      mode: host
    - target: 19780
      published: 19780
      protocol: tcp
      mode: host
    - target: 9779
      published: 9779
      protocol: tcp
      mode: host
  volumes:
    - /data1/nebula/data/storaged:/data1/storaged
    - /data2/nebula/data/storaged:/data2/storaged
    - /data3/nebula/data/storaged:/data3/storaged
    - /data4/nebula/data/storaged:/data4/storaged
    - /data5/nebula/data/storaged:/data5/storaged
    - /data6/nebula/data/storaged:/data6/storaged
    - /data7/nebula/data/storaged:/data7/storaged
    - /data8/nebula/data/storaged:/data8/storaged
    - /data9/nebula/data/storaged:/data9/storaged
    - /data10/nebula/data/storaged:/data10/storaged
    - /data11/nebula/data/storaged:/data11/storaged
    - /data/nebula/logs/storaged:/logs
  networks:
    - nebula-net
Try reducing rocksdb_block_cache.
zzz
January 26, 2021, 06:47
7
docker info reports this much memory: 250.8GiB
No other memory parameters are configured for the container.
zzz
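January 26, 2021, 06:50
8
My understanding is that I have only one storaged with 11 rocksdb instances. Isn't this a rocksdb cache, and isn't it shared among the multiple rocksdb instances?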
zzz
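January 26, 2021, 07:08
11
I changed it to rocksdb_block_cache=2048. After that, storaged memory usage kept climbing to 30G+, and then it died and restarted again. As I recall, memory usage was 60G+ before I restarted.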
I0126 07:01:05.685215 1 NebulaStore.cpp:81] Scan path "/data4/storaged/1"
I0126 07:01:05.697016 1 RocksEngineConfig.cpp:244] Emplace rocksdb option max_bytes_for_level_base=268435456
I0126 07:01:05.697207 1 RocksEngineConfig.cpp:244] Emplace rocksdb option max_write_buffer_number=4
I0126 07:01:05.697407 1 RocksEngineConfig.cpp:244] Emplace rocksdb option write_buffer_size=67108864
I0126 07:01:05.697607 1 RocksEngineConfig.cpp:244] Emplace rocksdb option disable_auto_compactions=true
I0126 07:01:05.697825 1 RocksEngineConfig.cpp:244] Emplace rocksdb option block_size=8192
I0126 07:01:50.948037 1 StorageDaemon.cpp:142] Signal 15(Terminated) received, stopping this server
I0126 07:01:52.346240 1 RocksEngine.cpp:105] open rocksdb on /data4/storaged/nebula/1/data
I0126 07:01:52.358177 1 NebulaStore.cpp:81] Scan path "/data5/storaged/1"
I0126 07:01:52.358435 1 RocksEngineConfig.cpp:244] Emplace rocksdb option max_bytes_for_level_base=268435456
I0126 07:01:52.358605 1 RocksEngineConfig.cpp:244] Emplace rocksdb option max_write_buffer_number=4
I0126 07:01:52.358762 1 RocksEngineConfig.cpp:244] Emplace rocksdb option write_buffer_size=67108864
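Apologies, I need to correct a mistake in my earlier answer: the rocksdb_block_cache mentioned above is shared among the multiple kvEngines in the same process, and its unit is MB.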
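As a rough check on these numbers, the block cache and memtable budgets can be added up from the flags and the "Emplace rocksdb option" log lines. This is only a sketch under stated assumptions: one RocksDB engine per data path (11 in total), one column family per engine, and MB treated as MiB. The function name is hypothetical, and other consumers, such as index and filter blocks held outside the block cache, are not counted.

```python
MIB = 1024 * 1024


def estimate_storaged_memory_mib(block_cache_mb, write_buffer_size_bytes,
                                 max_write_buffer_number, engine_count):
    """Rough upper bound for block cache plus memtables, in MiB.

    Assumes the block cache is shared by all engines in the process, while
    each engine keeps its own set of write buffers (memtables).
    """
    memtables_mib = (write_buffer_size_bytes * max_write_buffer_number
                     * engine_count) / MIB
    return {
        "block_cache_mib": block_cache_mb,
        "memtables_mib": memtables_mib,
        "total_mib": block_cache_mb + memtables_mib,
    }


# Values from the thread: write_buffer_size=67108864, max_write_buffer_number=4,
# 11 data paths, and rocksdb_block_cache set first to 20480 and later to 2048.
before = estimate_storaged_memory_mib(20480, 67108864, 4, 11)
after = estimate_storaged_memory_mib(2048, 67108864, 4, 11)
print(before["total_mib"], after["total_mib"])
```

Under these assumptions the two settings budget roughly 22.75 GiB and 4.75 GiB, so growth to 30 GB+ with the smaller cache would have to come from memory this estimate does not cover.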
After this setting was reduced, memory reached 30GB+ and the process died again; the log shows it received the kill signal at 07:01:50. What is the system's current ulimit -n?
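The open-file limit asked about here can also be read programmatically. A minimal sketch using Python's standard resource module (Unix only); the function name is hypothetical, and it has to run inside the storaged container to reflect the limits that process actually sees.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors,
    the same values that `ulimit -n` and `ulimit -Hn` report."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


soft, hard = open_file_limits()
print(f"soft={soft} hard={hard}")
```

The limit matters in this case because the first log shows a compaction over 4143 level-0 files, and each open SST file consumes a descriptor.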
min.wu
January 27, 2021, 02:16
14
Set that partition bloomfilter parameter to true.
zzz
January 27, 2021, 09:03
16
With a docker swarm deployment, how do I get the core logs?
zzz
January 27, 2021, 10:38
19
Still the same: memory grows to 10G+, then it receives the termination signal and restarts.
zzz
January 27, 2021, 11:08
21
As shown in the screenshot, the container has no memory limit. Nothing is displayed here because I had stopped the service.
min.wu
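January 27, 2021, 11:24
22
zzz:
Memory grows to 10G+
Does memory grow to a different amount on each restart? Yesterday's post said 30GB+, and today it is 10GB+?
"Signal 15(Terminated) received, stopping this server": where is this signal sent from? Could you also post the complete INFO log?
Can it start normally if you test without data? You said you would test on SSD; have you switched?
What about your physical machine: did you watch memory usage during startup? How many storaged instances does your swarm start? 3? Does each one need that many resources?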
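On the question of where the signal comes from, one candidate worth ruling out is the healthcheck in the storaged2 service definition posted earlier (interval 30s, timeout 10s, retries 3, start_period 20s). The sketch below is a simplified model, not Docker's implementation. It assumes the first probe runs one interval after container start, each following probe runs one interval after the previous one finishes, and failures that start inside start_period are not counted. Whether swarm then replaces the unhealthy task by sending SIGTERM is also an assumption to verify, for example by checking the container's health state with docker inspect. The function name is hypothetical.

```python
def unhealthy_window(interval, timeout, retries, start_period):
    """Return (earliest, latest) seconds after container start at which
    `retries` consecutive failed probes would mark the container unhealthy.

    earliest: every probe fails instantly (e.g. connection refused).
    latest:   every probe runs until `timeout` before failing.
    """
    def simulate(check_duration):
        t = 0.0
        failures = 0
        while True:
            t += interval              # wait before the next probe starts
            started_at = t
            t += check_duration        # probe runs, then fails
            if started_at >= start_period:
                failures += 1          # only failures after start_period count
            if failures >= retries:
                return t

    return simulate(0.0), simulate(float(timeout))


# Settings from the storaged2 healthcheck in this thread.
window = unhealthy_window(interval=30, timeout=10, retries=3, start_period=20)
print(window)  # (90.0, 120.0)
```

If storaged needs longer than this window to open all 11 data paths on HDD and answer on port 19779, the probe would fail on every start, which would match a restart loop with varying memory levels. A larger start_period would be one way to test the idea.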