Nebula Graph v2: import performance degrades too quickly

  • Nebula version: v2.0 RC1
  • Deployment (distributed / standalone / Docker / DBaaS): distributed
    graphd and storaged are co-located, three machines in total; per-machine spec: 48 cores + 128 GB RAM + 2 × 2 TB SSD
    metad runs on three other machines
    Import client: 48 cores + 300 GB RAM

Import scenario 1
20 billion rows of LDBC baseline data, single replica;
imported with go-importer; the import rate dropped from about 1.7 million rows/s at the start to around 1 million rows/s, which is acceptable.
Part of the importer configuration:

clientSettings:
  retry: 3
  concurrency: 100 # number of graph clients
  channelBufferSize: 3000
  space: ldbc_snb_sf1000

files:
  - path: /sf1000/social_network/dynamic/person_0_0.csv
    failDataPath: ./err/data/person_0_0.csv
    batchSize: 3000
    type: csv

Space definition

CREATE SPACE IF NOT EXISTS ldbc_snb_sf1000(PARTITION_NUM =30 , REPLICA_FACTOR = 1, vid_type = fixed_string(20));

Import scenario 2

After the scenario 1 data finished loading, we continued importing business data with go-importer;
20 billion rows (the data is simple, each record has only 1–2 properties), three replicas;
the import started at about 600K rows/s and had dropped to about 200K rows/s by the time 2/5 of the data was loaded, with a continuing downward trend;
after a restart the rate recovers to 500K+ rows/s, but over time it falls back to around 200K rows/s.

Suspected cause: is it that at this data volume RocksDB compaction starts to take significantly longer?

Part of the importer configuration:

  retry: 3
  concurrency: 100 # number of graph clients
  channelBufferSize: 5000
  space: fake_tit
    afterPeriod: 8s    
logPath: ./err/test.log
files:
  - path: /tit_data/0/aa_make_edge.csv
    failDataPath: ./err/aa_domain_make_edge.csv
    batchSize: 5000

Space definition

CREATE SPACE IF NOT EXISTS fake_tit(PARTITION_NUM =30 , REPLICA_FACTOR = 3, vid_type = fixed_string(32));

iotop output

Total DISK READ :     190.90 M/s | Total DISK WRITE :     388.39 M/s
Actual DISK READ:     190.90 M/s | Actual DISK WRITE:     381.90 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
50472 be/4 root       11.74 M/s   27.33 M/s  0.00 % 51.21 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [rocksdb:low5]
50469 be/4 root       22.83 M/s   34.40 M/s  0.00 % 42.53 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [rocksdb:low2]
50468 be/4 root       26.64 M/s   33.49 M/s  0.00 % 40.18 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [rocksdb:low1]
50470 be/4 root       23.19 M/s   43.80 M/s  0.00 % 30.28 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [rocksdb:low3]
50445 be/4 root        6.54 M/s    0.00 B/s  0.00 % 28.91 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [IOThreadPool1]
50467 be/4 root       35.29 M/s   46.21 M/s  0.00 % 24.94 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [rocksdb:low0]
50448 be/4 root        6.27 M/s    0.00 B/s  0.00 % 24.09 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [IOThreadPool4]
50471 be/4 root        0.00 B/s   49.48 M/s  0.00 % 23.97 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [rocksdb:low4]
50473 be/4 root       44.95 M/s   49.98 M/s  0.00 % 21.88 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [rocksdb:low6]
50459 be/4 root        6.30 M/s    0.00 B/s  0.00 % 20.51 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [IOThreadPool15]
50474 be/4 root        0.00 B/s   58.08 M/s  0.00 % 19.58 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [rocksdb:low7]
50456 be/4 root     1272.08 K/s    0.00 B/s  0.00 % 11.99 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [IOThreadPool12]
 1956 be/4 root       61.12 K/s    0.00 B/s  0.00 %  9.55 % [xfsaild/sda1]
49875 be/4 root        3.82 K/s  156.62 K/s  0.00 %  6.12 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-2]
50449 be/4 root        5.85 M/s    0.00 B/s  0.00 %  3.74 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [IOThreadPool5]
 4514 be/4 root        0.00 B/s    3.82 K/s  0.00 %  0.00 % rsyslogd -n [in:imjournal]
 4515 be/4 root        0.00 B/s    3.82 K/s  0.00 %  0.00 % rsyslogd -n [rs:main Q:Reg]
49848 be/4 root        0.00 B/s    3.43 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-4]
49850 be/4 root        0.00 B/s    2.33 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-1]
49851 be/4 root        0.00 B/s    3.49 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-2]
49854 be/4 root        0.00 B/s    2.75 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-6]
49855 be/4 root        0.00 B/s    2.85 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-7]
49860 be/4 root        0.00 B/s 2013.18 K/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-1]
49863 be/4 root        0.00 B/s    3.18 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-1]
49864 be/4 root        0.00 B/s    2.98 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-1]
49871 be/4 root        0.00 B/s    2.58 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-1]
49878 be/4 root        0.00 B/s    2.74 M/s  0.00 %  0.00 % nebula-storaged --flagfile /usr/local/nebula/etc/nebula-storaged.conf [executor-pri3-2]
49880 be/4 root        0.00 B/s    2.88 M/s  0.00 %  0.00 % nebula-storaged --flagfile 

storaged configuration

######### Raft #########
# Raft election timeout
--raft_heartbeat_interval_secs=30
# RPC timeout for raft client (ms)
--raft_rpc_timeout_ms=500
## recycle Raft WAL
--wal_ttl=14400

########## Disk ##########
# Root data path. Split by comma. e.g. --data_path=/disk1/path1/,/disk2/path2/
# One path per Rocksdb instance.
--data_path=/matrix/data/nebula/storage1,/matrix/data/nebula/storage2

# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096
# The default block cache size used in BlockBasedTable.
# The unit is MB.
--rocksdb_block_cache=40960
# The type of storage engine, `rocksdb', `memory', etc.
--engine_type=rocksdb

# Compression algorithm, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
# For the sake of binary compatibility, the default value is snappy.
# Recommend to use:
#   * lz4 to gain more CPU performance, with the same compression ratio with snappy
#   * zstd to occupy less disk space
#   * lz4hc for the read-heavy write-light scenario
--rocksdb_compression=lz4

# Set different compressions for different levels
# For example, if --rocksdb_compression is snappy,
# "no:no:lz4:lz4::zstd" is identical to "no:no:lz4:lz4:snappy:zstd:snappy"
# In order to disable compression for level 0/1, set it to "no:no"
--rocksdb_compression_per_level=

# Whether or not to enable rocksdb's statistics, disabled by default
--enable_rocksdb_statistics=false

# Statslevel used by rocksdb to collection statistics, optional values are
#   * kExceptHistogramOrTimers, disable timer stats, and skip histogram stats
#   * kExceptTimers, Skip timer stats
#   * kExceptDetailedTimers, Collect all stats except time inside mutex lock AND time spent on compression.
#   * kExceptTimeForMutex, Collect all stats except the counters requiring to get time inside the mutex lock.
#   * kAll, Collect all stats
--rocksdb_stats_level=kExceptHistogramOrTimers

# Whether or not to enable rocksdb's prefix bloom filter, disabled by default.
--enable_rocksdb_prefix_filtering=false
# Whether or not to enable the whole key filtering.
--enable_rocksdb_whole_key_filtering=true
# The prefix length for each key to use as the filter value.
# can be 12 bytes(PartitionId + VertexID), or 16 bytes(PartitionId + VertexID + TagID/EdgeType).
--rocksdb_filtering_prefix_length=12

############## rocksdb Options ##############
--rocksdb_disable_wal=true
# rocksdb DBOptions in json, each name and value of option is a string, given as "option_name":"option_value" separated by comma
--rocksdb_db_options={"max_subcompactions": "10", "max_background_jobs" :"10","stats_dump_period_sec":"200", "write_thread_max_yield_usec":"600"}
# rocksdb ColumnFamilyOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_column_family_options={"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456","level0_file_num_compaction_trigger":"10","max_write_buffer_number":"4", "min_write_buffer_number_to_merge":"2", "max_write_buffer_number_to_maintain":"1"}
# rocksdb BlockBasedTableOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_block_based_table_options={"block_size":"8192"}
--enable_partitioned_index_filter=True

Turn compaction off during the import and re-enable it after the data is loaded.

OK. Two more questions:

  1. If compaction is disabled outright, will that cause a write stall?
  2. If compaction has to stay enabled in production, has the current write rate already hit its ceiling? And could the write rate drop sharply at some point?

With that much READ, it should mostly be compaction reads.

If the 20 billion rows are an initial bulk load, you can use SST files.
If your daily incremental volume is also on the order of 20 billion, SST works for that too.

Writing 20 billion rows one by one through the API will most likely suffer severe read/write amplification.
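For reference, a rough sketch of what the SST route looks like from the console once the files exist, assuming they were pre-generated with Nebula Exchange and placed on HDFS (the address and path are placeholders, and DOWNLOAD/INGEST availability should be confirmed for this particular build):

# Run inside the target space; pulls the pre-built SST files down to each storaged's data path
USE ldbc_snb_sf1000;
DOWNLOAD HDFS "hdfs://<namenode-host>:<port>/<path-to-sst-files>";
# Ingest the downloaded SST files into RocksDB, bypassing the normal write path
INGEST;

Generating the SST files themselves is a separate Exchange (Spark) job and is not covered here.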

  1. While compaction is off, there will be no write stall as long as flushes keep up.
  2. There will be some degradation; it recovers once a compaction completes. The exact magnitude has to be measured against your own workload.
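To put answer 1 in config terms: with auto compaction off, the stall risk that remains is unflushed memtables piling up faster than flushes can drain them, so the memtable knobs already present in the posted rocksdb_column_family_options are the ones that matter. A minimal flagfile sketch with illustrative values only (disable_auto_compactions and max_write_buffer_number = 8 mirror the suggestion later in this thread; the level0 triggers are standard RocksDB option names raised as a precaution, not values from the original config):

# Bulk-load oriented column-family options (illustrative, not a recommendation):
# - auto compaction off for the duration of the import
# - more memtables so flushes have headroom and writes do not stall
# - L0 slowdown/stop triggers raised so accumulating L0 files do not throttle writes
--rocksdb_column_family_options={"disable_auto_compactions":"true","write_buffer_size":"67108864","max_write_buffer_number":"8","min_write_buffer_number_to_merge":"2","level0_slowdown_writes_trigger":"999999","level0_stop_writes_trigger":"999999"}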

You said the original write speed comes back after compaction, but in my case it dropped by more than 50%; is that because compaction never finished? Is there a way to check the current compaction status directly?
At this rate of decline, once the business volume doubles the cluster will basically be unusable, unless we scale out with more machines.

Check the RocksDB log; the file is simply named LOG.
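For example, with stats_dump_period_sec=200 already set in the posted --rocksdb_db_options, RocksDB dumps a compaction summary into that LOG roughly every 200 seconds. A sketch of where to look, following the posted --data_path (the space id directory is a placeholder and the layout may differ per version):

# Each space typically has its own RocksDB instance under <data_path>/nebula/<space_id>/data
grep -A 20 "Compaction Stats" /matrix/data/nebula/storage1/nebula/<space_id>/data/LOG | tail -50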

Thanks.

If this is just during the import, pure writes shouldn't degrade that much. Do you have an index defined? Otherwise, either compaction wasn't actually turned off, or you hit a write stall.

Compaction wasn't turned off. I tried turning it off, but storaged went straight offline, and adjusting the parameters didn't fix the offline problem.

Er... try this:

  1. Stop everything, and do not set local_config in the storage conf.
  2. Start graphd and metad.
  3. Run UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = true, max_bytes_for_level_base = 268435456, max_write_buffer_number = 8, write_buffer_size = 67108864 }
  4. Start storaged.

Earlier, the config change was probably written in an invalid form, which crashed storaged.

Are you suggesting my storage config was written wrong and that's what crashed storaged? But it was online at first and only went offline after running for a while.

If the parameter passed via UPDATE CONFIGS is invalid, storaged crashes as soon as it picks it up.

The RocksDB parameters are already set fairly large:
--rocksdb_column_family_options
{"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456","level0_file_num_compaction_trigger":"10","max_write_buffer_number":"4", "min_write_buffer_number_to_merge":"2", "max_write_buffer_number_to_maintain":"1"}

--rocksdb_db_options
{"max_subcompactions": "10", "max_background_jobs" :"10","stats_dump_period_sec":"200", "write_thread_max_yield_usec":"600"}

Just add disable_auto_compactions to turn compaction off; pure writes show no degradation.

But then when compaction finally does run, it'll be brutal. It will take a very, very long time.

Pure writes may not degrade, but production runs with auto compaction enabled, and we can't generate this much data in testing, so I want to understand it up front. Per what you said earlier, most of the write speed should come back after a compaction, yet in my case the write rate keeps dropping.

I meant full compaction, not auto compaction. It may also be related to the RocksDB parameters; you can benchmark them with RocksDB's db_bench.
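For completeness, a rough sketch of the post-import step being discussed, reusing the UPDATE CONFIGS form from the steps above and assuming SUBMIT JOB COMPACT / SHOW JOBS are available in this version:

# Re-enable auto compaction once the bulk load is done
UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = false };
# Trigger a full compaction on the loaded space and track it in the job queue
USE fake_tit;
SUBMIT JOB COMPACT;
SHOW JOBS;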

OK.