Deploying a NebulaGraph Multi-Machine Cluster with Docker

1. Background

While learning how to install NebulaGraph, I noticed that the official documentation does not describe how to deploy a multi-machine cluster with Docker. On top of that, I lacked a test environment for the memory-usage analysis I planned to do later, so I tinkered around and worked out how to do a Docker-based multi-machine deployment. I hope it helps anyone who needs it.

1.1 Server Information

Host      CPU (cores)  Memory (GB)  Disk (GB)  OS            Components
ubuntu01  2            12           120        Ubuntu 22.04  ALL
ubuntu02  2            12           120        Ubuntu 22.04  ALL
ubuntu03  2            12           120        Ubuntu 22.04  ALL

Since my machine is frankly underpowered, I can only simulate a three-node cluster.

2. Installation

2.1 Base Installation

OS installation and Docker installation are skipped here.

For kernel parameter settings, refer to the official documentation.
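
Just as a pointer, the settings usually involved are the open-file limit and a few sysctl values; the numbers below are illustrative placeholders rather than the officially recommended ones, so check the official kernel-configuration page for the exact values:

# Illustrative placeholders only; see the official kernel configuration docs for recommended values
# Raise the open-file-descriptor limit for the NebulaGraph processes
ulimit -n 130000
# Do not limit core dump size (helpful when debugging crashes)
ulimit -c unlimited

# Example sysctl tuning (placeholder values)
sysctl -w net.core.somaxconn=1024
sysctl -w vm.swappiness=0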

2.2 Installation Procedure

2.2.1 Work out the structure from the multi-machine cluster section of the official documentation, and use the images from the official docker-compose file

First, decide on the installation paths. All components live under the root path /opt/NebulaGraph/; the per-component paths are as follows.

meta: /opt/NebulaGraph/meta

graph: /opt/NebulaGraph/graph

storage: /opt/NebulaGraph/storage

Each component directory in turn contains the following subdirectories:

bin: startup scripts

data: application data

conf: configuration files

logs: application logs

With this layout, everything related to each component's installation is easy to locate. Create the directories:

# Create the directories on all three machines

mkdir -p /opt/NebulaGraph/{meta,graph,storage}/{bin,data,conf,logs}/
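
A quick sanity check that the layout was created as expected:

# Confirm the directory layout on each machine
ls /opt/NebulaGraph/*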

2.2.2 Pull the images

Find the image addresses we need in the official docker-compose.yaml and pull them:

# Pull the images on all three machines
docker pull docker.io/vesoft/nebula-metad:nightly
docker pull docker.io/vesoft/nebula-graphd:nightly
docker pull docker.io/vesoft/nebula-storaged:nightly
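
Before moving on, you can confirm that all three images are available locally:

# Verify the images were pulled successfully
docker images | grep vesoft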

2.2.3 Prepare the configuration files

Templates for all of the configuration files can be obtained from the tar-package installation method.

meta

Official documentation link

Edit the configuration file. Apart from local_ip, the settings are identical on all three nodes.

vim  /opt/NebulaGraph/meta/conf/nebula-metad.conf

# File contents
########## basics ##########
# Whether to run as a daemon process
--daemonize=false
# The file to host the process id
--pid_file=pids/nebula-metad.pid

########## logging ##########
# The directory to host logging files
--log_dir=/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=metad-stdout.log
--stderr_log_file=metad-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=3
# Whether the logging files' names contain a timestamp. If logrotate is used to rotate the logging files, this should be set to true.
--timestamp_in_logfile_name=true

########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=192.168.17.129:9559,192.168.17.130:9559,192.168.17.131:9559
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.17.129
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19559
# Port to listen on Storage with HTTP protocol, it corresponds to ws_http_port in storage's configuration file
--ws_storage_http_port=19779

########## storage ##########
# Root data path, here should be only single path for metad
--data_path=/data/meta

########## Misc #########
# The default number of parts when a space is created
--default_parts_num=100
# The default replica factor when a space is created
--default_replica_factor=1

--heartbeat_interval_secs=10
--agent_heartbeat_interval_secs=60
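
Since only local_ip differs between nodes, one option is to copy the same file to every node and then patch just that flag; a minimal sketch for ubuntu02, using the example IPs from this article (the same idea applies to the graphd and storaged configuration files below):

# On ubuntu02: keep everything identical and only rewrite local_ip
sed -i 's/^--local_ip=.*/--local_ip=192.168.17.130/' /opt/NebulaGraph/meta/conf/nebula-metad.conf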

graph

Official documentation link

Edit the configuration file. Apart from local_ip, the settings are identical on all three nodes.

vim  /opt/NebulaGraph/graph/conf/nebula-graphd.conf

# File contents
########## basics ##########
# Whether to run as a daemon process
--daemonize=false
# The file to host the process id
--pid_file=pids/nebula-graphd.pid
# Whether to enable optimizer
--enable_optimizer=true
# The default charset when a space is created
--default_charset=utf8
# The default collate when a space is created
--default_collate=utf8_bin
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=graphd-stdout.log
--stderr_log_file=graphd-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=3
# Whether the logging files' names contain a timestamp.
--timestamp_in_logfile_name=true
########## query ##########
# Whether to treat partial success as an error.
# This flag is only used for Read-only access, and Modify access always treats partial success as an error.
--accept_partial_success=false
# Maximum sentence length, unit byte
--max_allowed_query_size=4194304

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=192.168.17.129:9559,192.168.17.130:9559,192.168.17.131:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.17.129
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
# To turn on SO_REUSEPORT or not
--reuse_port=false
# Backlog of the listen socket, adjust this together with net.core.somaxconn
--listen_backlog=1024
# The number of seconds Nebula service waits before closing the idle connections
--client_idle_timeout_secs=28800
# The number of seconds before idle sessions expire
# The range should be in [1, 604800]
--session_idle_timeout_secs=28800
# The number of threads to accept incoming connections
--num_accept_threads=1
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=0
# Max active connections for all networking threads. 0 means no limit.
# Max connections for each networking thread = num_max_connections / num_netio_threads
--num_max_connections=0
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=0
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19669
# storage client timeout
--storage_client_timeout_ms=60000
# slow query threshold in us
--slow_query_threshold_us=200000
# Port to listen on Meta with HTTP protocol, it corresponds to ws_http_port in metad's configuration file
--ws_meta_http_port=19559

########## authentication ##########
# Enable authorization
--enable_authorize=false
# User login authentication type, password for nebula authentication, ldap for ldap authentication, cloud for cloud authentication
--auth_type=password

########## memory ##########
# System memory high watermark ratio, cancel the memory checking when the ratio greater than 1.0
--system_memory_high_watermark_ratio=0.8

########## metrics ##########
--enable_space_level_metrics=false

########## experimental feature ##########
# if use experimental features
--enable_experimental_feature=false

# if use balance data feature, only work if enable_experimental_feature is true
--enable_data_balance=true

# enable udf, written in c++ only for now
--enable_udf=true

# set the directory where the .so files of udf are stored, when enable_udf is true
--udf_path=/home/nebula/dev/nebula/udf/

########## session ##########
# Maximum number of sessions that can be created per IP and per user
--max_sessions_per_ip_per_user=300

########## memory tracker ##########
# trackable memory ratio (trackable_memory / (total_memory - untracked_reserved_memory) )
--memory_tracker_limit_ratio=0.8
# untracked reserved memory in Mib
--memory_tracker_untracked_reserved_memory_mb=50

# enable log memory tracker stats periodically
--memory_tracker_detail_log=false
# log memory tracker stats interval in milliseconds
--memory_tracker_detail_log_interval_ms=60000

# enable memory background purge (if jemalloc is used)
--memory_purge_enabled=true
# memory background purge interval in seconds
--memory_purge_interval_seconds=10

########## performance optimization ##########
# The max job size in multi job mode
--max_job_size=1
# The min batch size for handling dataset in multi job mode, only enabled when max_job_size is greater than 1
--min_batch_size=8192
# if true, return directly without go through RPC
--optimize_appendvertices=false
# number of paths constructed by each thread
--path_batch_size=10000

storage

Official documentation link

Edit the configuration file. Apart from local_ip, the settings are identical on all three nodes.

vim  /opt/NebulaGraph/storage/conf/nebula-storaged.conf

# File contents
########## basics ##########
# Whether to run as a daemon process
--daemonize=false
# The file to host the process id
--pid_file=pids/nebula-storaged.pid
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=0
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=storaged-stdout.log
--stderr_log_file=storaged-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=3
# Whether the logging files' names contain a timestamp.
--timestamp_in_logfile_name=true

########## networking ##########
# Comma separated Meta server addresses
--meta_server_addrs=192.168.17.129:9559,192.168.17.130:9559,192.168.17.131:9559
# Local IP used to identify the nebula-storaged process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=192.168.17.129
# Storage daemon listening port
--port=9779
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19779
# heartbeat with meta service
--heartbeat_interval_secs=10

######### Raft #########
# Raft election timeout
--raft_heartbeat_interval_secs=30
# RPC timeout for raft client (ms)
--raft_rpc_timeout_ms=500
## recycle Raft WAL
--wal_ttl=14400

########## Disk ##########
# Root data path. Split by comma. e.g. --data_path=/disk1/path1/,/disk2/path2/
# One path per Rocksdb instance.
--data_path=/data/storage

# Minimum reserved bytes of each data path
--minimum_reserved_bytes=268435456

# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096
# The default block cache size used in BlockBasedTable.
# The unit is MB.
--rocksdb_block_cache=4
# The type of storage engine, `rocksdb', `memory', etc.
--engine_type=rocksdb

# Compression algorithm, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
# For the sake of binary compatibility, the default value is snappy.
# Recommend to use:
#   * lz4 to gain more CPU performance, with the same compression ratio with snappy
#   * zstd to occupy less disk space
#   * lz4hc for the read-heavy write-light scenario
--rocksdb_compression=lz4

# Set different compressions for different levels
# For example, if --rocksdb_compression is snappy,
# "no:no:lz4:lz4::zstd" is identical to "no:no:lz4:lz4:snappy:zstd:snappy"
# In order to disable compression for level 0/1, set it to "no:no"
--rocksdb_compression_per_level=

# Whether or not to enable rocksdb's statistics, disabled by default
--enable_rocksdb_statistics=false

# Statslevel used by rocksdb to collection statistics, optional values are
#   * kExceptHistogramOrTimers, disable timer stats, and skip histogram stats
#   * kExceptTimers, Skip timer stats
#   * kExceptDetailedTimers, Collect all stats except time inside mutex lock AND time spent on compression.
#   * kExceptTimeForMutex, Collect all stats except the counters requiring to get time inside the mutex lock.
#   * kAll, Collect all stats
--rocksdb_stats_level=kExceptHistogramOrTimers

# Whether or not to enable rocksdb's prefix bloom filter, enabled by default.
--enable_rocksdb_prefix_filtering=true
# Whether or not to enable rocksdb's whole key bloom filter, disabled by default.
--enable_rocksdb_whole_key_filtering=false

############## rocksdb Options ##############
# rocksdb DBOptions in json, each name and value of option is a string, given as "option_name":"option_value" separated by comma
--rocksdb_db_options={}
# rocksdb ColumnFamilyOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_column_family_options={"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
# rocksdb BlockBasedTableOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_block_based_table_options={"block_size":"8192"}

############### misc ####################
# Whether turn on query in multiple thread
--query_concurrently=true
# Whether remove outdated space data
--auto_remove_invalid_space=true
# Network IO threads number
--num_io_threads=16
# Max active connections for all networking threads. 0 means no limit.
# Max connections for each networking thread = num_max_connections / num_netio_threads
--num_max_connections=0
# Worker threads number to handle request
--num_worker_threads=32
# Maximum subtasks to run admin jobs concurrently
--max_concurrent_subtasks=10
# The rate limit in bytes when leader synchronizes snapshot data
--snapshot_part_rate_limit=10485760
# The amount of data sent in each batch when leader synchronizes snapshot data
--snapshot_batch_size=1048576
# The rate limit in bytes when leader synchronizes rebuilding index
--rebuild_index_part_rate_limit=4194304
# The amount of data sent in each batch when leader synchronizes rebuilding index
--rebuild_index_batch_size=1048576

########## memory tracker ##########
# trackable memory ratio (trackable_memory / (total_memory - untracked_reserved_memory) )
--memory_tracker_limit_ratio=0.8
# untracked reserved memory in Mib
--memory_tracker_untracked_reserved_memory_mb=50

# enable log memory tracker stats periodically
--memory_tracker_detail_log=false
# log memory tracker stats interval in milliseconds
--memory_tracker_detail_log_interval_ms=60000

# enable memory background purge (if jemalloc is used)
--memory_purge_enabled=true
# memory background purge interval in seconds
--memory_purge_interval_seconds=10

2.2.4 Startup scripts

meta

Create the startup script and make it executable:

vim  /opt/NebulaGraph/meta/bin/start.sh

# Contents. Note: change the --hostname and --name values on each node
#!/bin/bash
docker run -d --network=host --hostname=ubuntu01 \
        --name meta01 --restart=always \
        -e USER=root \
        -e TZ=Asia/Shanghai \
        --cap-add SYS_PTRACE \
        -v  /opt/NebulaGraph/meta/conf/nebula-metad.conf:/usr/local/nebula/etc/nebula-metad.conf \
        -v  /opt/NebulaGraph/meta/data:/data \
        -v  /opt/NebulaGraph/meta/logs:/logs \
        docker.io/vesoft/nebula-metad:nightly


# Make the script executable
chmod 755 /opt/NebulaGraph/meta/bin/start.sh

graph

Create the startup script and make it executable:

vim  /opt/NebulaGraph/graph/bin/start.sh

# Contents. Note: change the --hostname and --name values on each node
#!/bin/bash
docker run -d --network=host --hostname=ubuntu01 \
        --name graph01 --restart=always \
        -e USER=root \
        -e TZ=Asia/Shanghai \
        --cap-add SYS_PTRACE \
        -v  /opt/NebulaGraph/graph/conf/nebula-graphd.conf:/usr/local/nebula/etc/nebula-graphd.conf \
        -v  /opt/NebulaGraph/graph/logs:/logs \
        docker.io/vesoft/nebula-graphd:nightly

# Make the script executable
chmod 755 /opt/NebulaGraph/graph/bin/start.sh

storage

Create the startup script and make it executable:

vim /opt/NebulaGraph/storage/bin/start.sh

# Contents. Note: change the --hostname and --name values on each node
#!/bin/bash
docker run -d --network=host --hostname=ubuntu01 \
        --name storage01 --restart=always \
        -e USER=root \
        -e TZ=Asia/Shanghai \
        --cap-add SYS_PTRACE \
        -v  /opt/NebulaGraph/storage/conf/nebula-storaged.conf:/usr/local/nebula/etc/nebula-storaged.conf \
        -v  /opt/NebulaGraph/storage/data:/data \
        -v  /opt/NebulaGraph/storage/logs:/logs \
        docker.io/vesoft/nebula-storaged:nightly


# Make the script executable
chmod 755 /opt/NebulaGraph/storage/bin/start.sh
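
One thing to keep in mind: because the containers run with --restart=always, a container that is only stopped with docker stop will come back the next time the Docker daemon restarts (for example, after a reboot). To take a service down permanently, remove the container as well; an optional helper for the meta service (not part of the original setup) could look like this:

#!/bin/bash
# stop.sh (optional helper): stop and remove the meta container so it is not brought back after a daemon restart
docker stop meta01 && docker rm meta01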

2.2.5 Start the services

Pay attention to the startup order: make sure the previous service has started successfully before starting the next one, in the order meta -> storage -> graph.

# Start the services, in order, on each node

/opt/NebulaGraph/meta/bin/start.sh
/opt/NebulaGraph/storage/bin/start.sh
/opt/NebulaGraph/graph/bin/start.sh
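
Before continuing, check on each node that the three containers are up, and look at the stderr log files (written to the mounted logs directories configured earlier) if anything fails to start:

# Check container status on each node
docker ps --format '{{.Names}}\t{{.Status}}'

# Inspect the stderr logs if a service does not come up
tail -n 50 /opt/NebulaGraph/meta/logs/metad-stderr.log
tail -n 50 /opt/NebulaGraph/storage/logs/storaged-stderr.log
tail -n 50 /opt/NebulaGraph/graph/logs/graphd-stderr.log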

2.3 Verification

2.3.1 Connect to the cluster with the console and register the storage hosts (console installation is skipped; see the official documentation)

# Connect
nebula-console -addr 192.168.17.129 --port 9669 -u root -p password

# Register the storage hosts
ADD HOSTS 192.168.17.129:9779,192.168.17.130:9779,192.168.17.131:9779;

2.3.2 Connect to the cluster with the console and check the cluster status (console installation is skipped; see the official documentation)

# Connect
nebula-console -addr 192.168.17.129 --port 9669 -u root -p password

# Check the meta hosts
show hosts meta;

# Check the graph hosts
show hosts graph;

# Check the storage hosts
show hosts;

3. Summary

3.1 What is still missing

Limited by time (it already took me a dozen-plus hours from preparing the servers to finishing verification), I did not go further to set up the Agent or test backup and restore.

The overall installation process is still somewhat tedious; if I find time later, I may look into writing an installation script to simplify it.

3.2 Some deployment considerations

The official docker-compose deployment passes the configuration flags through the container command, whereas I ultimately decided to mount configuration files instead. The main consideration is later maintenance: changing a parameter only requires a container restart rather than deleting and recreating the container.
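
For example, with the mounted graphd configuration file (and local_config=true, so the file is read at startup), a flag change only needs a container restart:

# Edit the mounted configuration file, then restart the existing container
vim /opt/NebulaGraph/graph/conf/nebula-graphd.conf
docker restart graph01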

I also re-planned the whole installation path layout, which makes the setup much clearer overall.

The deployment uses Docker's host network rather than the bridge network: on one hand, a multi-machine cluster does not need to worry about port conflicts; on the other, host networking performs slightly better than bridge.

This article is an entry in the NebulaGraph community annual writing contest. Contest details: https://discuss.nebula-graph.com.cn/t/topic/13970

If you found this article helpful, remember to give it a :heart:. Thanks for the encouragement!


:+1:, this experiment of using the host network for a multi-machine compose-style setup is really cool.

That said, for multi-machine Docker you can also use Swarm: GitHub - vesoft-inc/nebula-docker-compose at docker-swarm. How about opening another post to explore that approach? Swarm is Docker's native multi-machine orchestration.