3.1.0 service errors

  • nebula version: 3.1.0
  • Deployment: distributed Docker; meta deployed behind domain names, replica=3, partition_num=90
  • Installation: RPM
  • Production release: Y
  • Problem description
  1. Errors at service startup
    graph, meta, and storage all report the following error (or a variant of it):
Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type `nebula::HostAddr`.

  2. With the problem from 1 present, show hosts, single-statement inserts, and cluster-status queries all work fine, but inserting data with the Java client (3.0.0) makes the cluster report Request to “”:9777 time out: TTransportException: Timed out

show hosts:

graphd log:

storaged log:

metad reports no errors

  3. Using the Java client (3.0.0), the error below is reported. It persists even after setting max_sessions_per_ip_per_user=1000, yet the client's maxConnSize is 100, and curl ip:19669/stats shows fewer than 100 sessions created in the last hour. Why does it still fail?

    (error screenshot)
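Question 3 counts sessions via curl ip:19669/stats. That endpoint returns plain text, one metric=value pair per line. The sketch below does the same check from Java using only the standard library; the exact session metric names vary by version, so it just filters lines that mention "session" (an assumption, not the official metric list):

```java
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StatsCheck {
    // Keep only metric lines whose name mentions sessions; the /stats
    // output is plain text, one "metric=value" per line.
    static List<String> sessionLines(String statsText) {
        List<String> out = new ArrayList<>();
        for (String line : statsText.split("\n")) {
            if (line.contains("session")) out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            // ws_http_port=19669 is the graphd HTTP port from this thread's config.
            URL url = new URL("http://127.0.0.1:19669/stats");
            String body = new String(url.openStream().readAllBytes(),
                                     StandardCharsets.UTF_8);
            sessionLines(body).forEach(System.out::println);
        } catch (Exception e) {
            System.out.println("stats endpoint not reachable: " + e.getMessage());
        }
    }
}
```

Comparing these counters against the client-side maxConnSize is a quick way to tell whether the session limit is really what is being hit.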

Could you paste your configuration, and also the Java-side service setup code?

The configs are all defaults, created via a ConfigMap: 1 graphd, 3 metad, 3 storaged.
The Java service setup code is below; we added a wrapper layer that looks up the graphd IPs by cluster name and connects to them:

Regarding the error messages: one of them is a connection timeout, so I suspect a configuration problem. Paste your configuration so we can rule out the likely causes first.

:thinking: setMinConnSize doesn't need to be 80; setting the minimum connection count to 1 is fine.

The configuration is below. All metadns names ping successfully and resolve to the corresponding meta IPs.
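Since the metas are addressed by domain name, every container needs to resolve metadns0..2 before anything else can work. A minimal stdlib sketch of that check (hostnames taken from this thread; substitute your own):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    // Returns the resolved IP, or null if the name does not resolve.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // metadns0..2 are the meta hostnames used in this thread.
        for (String host : new String[] {"metadns0", "metadns1", "metadns2"}) {
            String ip = resolve(host);
            System.out.println(host + " -> " + (ip == null ? "UNRESOLVED" : ip));
        }
    }
}
```

Running this inside each graphd/storaged container (not just on the host) rules out per-pod DNS differences.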
graph:
Startup command:

/opt/app/nebula/bin/nebula-graphd --flagfile=/opt/app/nebula/etc/nebula-graphd.conf --meta_server_addrs=metadns0:9559,metadns1:9559, metadns2:9559 --local_ip=x.x.x.x --num_netio_threads=2 --num_worker_threads=2
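--meta_server_addrs must be a comma-separated list of host:port entries with no surrounding whitespace; the stray space after the comma in the command above (before metadns2:9559) may leave that entry malformed. A small checker, as a sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class AddrCheck {
    // Returns the entries that are not of the clean "host:port" form.
    static List<String> badEntries(String addrs) {
        List<String> bad = new ArrayList<>();
        for (String entry : addrs.split(",", -1)) {
            // Flag surrounding whitespace, empty host or port,
            // and non-numeric ports.
            if (!entry.equals(entry.trim()) || !entry.matches("[^\\s:]+:\\d+")) {
                bad.add(entry);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // The address list from the startup command above.
        System.out.println(
            badEntries("metadns0:9559,metadns1:9559, metadns2:9559"));
        // prints [ metadns2:9559]
    }
}
```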

########## basics ##########
# Whether to run as a daemon process
--daemonize=false
# The file to host the process id
--pid_file=/opt/app/nebula/pids/nebula-graphd.pid
# Whether to enable optimizer
--enable_optimizer=true
# Heartbeat interval of communication between meta client and graphd service
--heartbeat_interval_secs=10
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=/opt/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=1
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=3
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=false
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=graphd-stdout.log
--stderr_log_file=graphd-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2

########## query ##########
# Whether to treat partial success as an error.
# This flag is only used for Read-only access, and Modify access always treats partial success as an error.
--accept_partial_success=false
# Maximum sentence length, unit byte
--max_allowed_query_size=4194304
--max_sessions_per_ip_per_user=2000

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=meta_server_addrs
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=local_ip
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
# To turn on SO_REUSEPORT or not
--reuse_port=false
# Backlog of the listen socket, adjust this together with net.core.somaxconn
--listen_backlog=1024
# Seconds before the idle connections are closed, 0 for never closed
--client_idle_timeout_secs=172800
# Seconds before the idle sessions are expired, 0 for no expiration
--session_idle_timeout_secs=86400
# The number of threads to accept incoming connections
--num_accept_threads=1
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=0
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=0
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19669
# HTTP2 service port
--ws_h2_port=19670
# storage client timeout
--storage_client_timeout_ms=60000
# Port to listen on Meta with HTTP protocol, it corresponds to ws_http_port in metad's configuration file
--ws_meta_http_port=19559

########## authentication ##########
# Enable authorization
--enable_authorize=true
# User login authentication type, password for nebula authentication, ldap for ldap authentication, cloud for cloud authentication
--auth_type=password

########## memory ##########
# System memory high watermark ratio
# --system_memory_high_watermark_ratio=0.8

########## experimental feature ##########
# if use experimental features
--enable_experimental_feature=true

meta:
Startup command:

/opt/app/nebula/bin/nebula-metad --flagfile=/opt/app/nebula/etc/nebula-metad.conf --meta_server_addrs=metadns0:9559,metadns1:9559,metadns2:9559 --local_ip=metadnsX --num_worker_threads=1

########## basics ##########
# Whether to run as a daemon process
--daemonize=false
# The file to host the process id
--pid_file=/opt/app/nebula/pids/nebula-metad.pid
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=/opt/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=1
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=3
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=false
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=metad-stdout.log
--stderr_log_file=metad-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2

########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=meta_server_addrs
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=local_ip
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19559
# HTTP2 service port
--ws_h2_port=19560
# Port to listen on Storage with HTTP protocol, it corresponds to ws_http_port in storage's configuration file
--ws_storage_http_port=19779

########## storage ##########
# Root data path, here should be only single path for metad
--data_path=/data/meta

########## Misc #########
# The default number of parts when a space is created
--default_parts_num=100
# The default replica factor when a space is created
--default_replica_factor=1

--heartbeat_interval_secs=10

## recycle Raft WAL
--wal_ttl=3600

############## rocksdb Options ##############
--rocksdb_wal_sync=true

storage:
Startup command:

/opt/app/nebula/bin/nebula-storaged --flagfile=/opt/app/nebula/etc/nebula-storaged.conf --meta_server_addrs=metadns0:9559,metadns1:9559,metadns2:9559 --local_ip=x.x.x.x --num_worker_threads=1

########## basics ##########
# Whether to run as a daemon process
--daemonize=false
# The file to host the process id
--pid_file=/opt/app/nebula/pids/nebula-storaged.pid
# Whether to use the configuration obtained from the configuration file
--local_config=true

########## logging ##########
# The directory to host logging files
--log_dir=/opt/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=1
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=3
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=false
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=storaged-stdout.log
--stderr_log_file=storaged-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2

########## networking ##########
# Comma separated Meta server addresses
--meta_server_addrs=meta_server_addrs
# Local IP used to identify the nebula-storaged process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=local_ip
# Storage daemon listening port
--port=9779
# HTTP service ip
--ws_ip=0.0.0.0
# HTTP service port
--ws_http_port=19779
# HTTP2 service port
--ws_h2_port=19780
# heartbeat with meta service
--heartbeat_interval_secs=10

######### Raft #########
# Raft election timeout
--raft_heartbeat_interval_secs=30
# RPC timeout for raft client (ms)
--raft_rpc_timeout_ms=500
## recycle Raft WAL
--wal_ttl=3600

########## Disk ##########
# Root data path. split by comma. e.g. --data_path=/disk1/path1/,/disk2/path2/
# One path per Rocksdb instance.
--data_path=/data/storage

# Minimum reserved bytes of each data path
--minimum_reserved_bytes=268435456

# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096
# The default block cache size used in BlockBasedTable. (MB)
# recommend: 1/3 of all memory
--rocksdb_block_cache=512

# Compression algorithm, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
# For the sake of binary compatibility, the default value is snappy.
# Recommend to use:
#   * lz4 to gain more CPU performance, with the same compression ratio with snappy
#   * zstd to occupy less disk space
#   * lz4hc for the read-heavy write-light scenario
--rocksdb_compression=lz4

# Set different compressions for different levels
# For example, if --rocksdb_compression is snappy,
# "no:no:lz4:lz4::zstd" is identical to "no:no:lz4:lz4:snappy:zstd:snappy"
# In order to disable compression for level 0/1, set it to "no:no"
--rocksdb_compression_per_level=

############## rocksdb Options ##############
# rocksdb DBOptions in json, each name and value of option is a string, given as "option_name":"option_value" separated by comma
--rocksdb_db_options={"max_subcompactions":"4","max_background_jobs":"4","max_open_files":"512","max_background_compactions":"4"}
# rocksdb ColumnFamilyOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_column_family_options={"disable_auto_compactions":"false","write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
# rocksdb BlockBasedTableOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
--rocksdb_block_based_table_options={"block_size":"8192"}

# Whether or not to enable rocksdb's statistics, disabled by default
--enable_rocksdb_statistics=false

# Statslevel used by rocksdb to collect statistics, optional values are
#   * kExceptHistogramOrTimers, disable timer stats, and skip histogram stats
#   * kExceptTimers, Skip timer stats
#   * kExceptDetailedTimers, Collect all stats except time inside mutex lock AND time spent on compression.
#   * kExceptTimeForMutex, Collect all stats except the counters requiring to get time inside the mutex lock.
#   * kAll, Collect all stats
--rocksdb_stats_level=kExceptHistogramOrTimers

# Whether or not to enable rocksdb's prefix bloom filter, enabled by default.
--enable_rocksdb_prefix_filtering=true
# Whether or not to enable rocksdb's whole key bloom filter, disabled by default.
--enable_rocksdb_whole_key_filtering=true

############## Key-Value separation ##############
# Whether or not to enable BlobDB (RocksDB key-value separation support)
--rocksdb_enable_kv_separation=false
# RocksDB key value separation threshold. Values at or above this threshold will be written to blob files during flush or compaction.
--rocksdb_kv_separation_threshold=0
# Compression algorithm for blobs, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
--rocksdb_blob_compression=lz4
# Whether to garbage collect blobs during compaction
--rocksdb_enable_blob_garbage_collection=true

############### misc ####################
--snapshot_part_rate_limit=10485760
--snapshot_batch_size=1048576
--rebuild_index_part_rate_limit=4194304
--rebuild_index_batch_size=1048576

Can storaged be reached from graphd via the domain names?


Not sure what you mean? At the moment only meta uses domain names; graph and storage both use IPs, and everything is reachable. Queries and inserts from Studio work fine, but bulk writes from the Java client hit error 2, TTransportException: Timed out. The first error, Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type nebula::HostAddr., shows up as soon as the cluster starts. What could be causing this?
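To separate "service is slow" from "port is unreachable", a plain TCP connect with an explicit timeout is a quick first check against the ports from this thread (9669 graphd, 9779 storaged, and the 9777 that appeared in the timeout message). A stdlib sketch:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // True if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports from this thread; run this from the graphd container,
        // pointing at each storaged host in turn.
        for (int port : new int[] {9669, 9779, 9777}) {
            System.out.println("127.0.0.1:" + port + " reachable: "
                    + canConnect("127.0.0.1", port, 3000));
        }
    }
}
```

If the connect succeeds but requests still time out, the problem is more likely load or thread-pool sizing than networking.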

@Aiee The Java writes time out with a TTransportException here, but the logged error is the Thrift HostAddr serialization one. What could the cause be? Is the serialization message only a follow-on symptom with the timeout as the root cause, or is it the root error itself?

namespace apache {
namespace thrift {

template <typename Reader, typename Writer>
struct Serializer {
 private:
  template <typename T>
  using is_thrift_class = folly::bool_constant<is_thrift_class_v<T>>;

  template <typename T>
  static void warn_unless(folly::tag_t<T>, const char* which, std::false_type) {
    FB_LOG_ONCE(ERROR)
        << "Thrift serialization is only defined for structs and unions, not"
        << " containers thereof. Attemping to " << which << " a value of type `"
        << folly::pretty_name<T>() << "`.";
  }
  template <typename T>
  static void warn_unless(folly::tag_t<T>, const char*, std::true_type) {}

  // ... (remainder of Serializer elided)
};

}  // namespace thrift
}  // namespace apache

Startup should not produce this error, since no client has sent a request at that point. Serialization of nebula::Value is handled by fbthrift, so in principle startup should be fine.

Right, at service startup no client has sent a request yet, but even just running show hosts reports Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type nebula::HostAddr.

Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type nebula::Value.

Even with this error, inserts and queries from Studio work fine; I just want to understand why it appears.

What are you running show hosts from? The Java client? Executing that statement is itself a request sent from a client. If it is a client, check that its version matches the server version.
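The version-matching advice can be checked mechanically. This is only a rough sketch that compares major versions (an assumption for illustration; the official client/server compatibility matrix is authoritative, and roughlyCompatible is a hypothetical helper, not a nebula API):

```java
public class VersionCheck {
    // Rough sketch: treat client and server as compatible when the major
    // version matches. Consult the compatibility matrix for the real rule.
    static boolean roughlyCompatible(String client, String server) {
        return major(client) == major(server);
    }

    private static int major(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    public static void main(String[] args) {
        // Versions from this thread: Java client 3.0.0 against server 3.1.0.
        System.out.println(roughlyCompatible("3.0.0", "3.1.0")); // same major
    }
}
```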

Did you restart after changing the configuration? Changes only take effect after a restart.

It was also executed in Studio.

Hey, did you ever solve this? How did you fix it?

Are you hitting a similar error? :thinking: You could open a new topic for it.

It was a test environment, so I didn't dig into it; rebuilding the cluster from scratch made it go away. :rofl:

Then it was most likely a configuration problem somewhere.

Yeah, that's my feeling too. Most likely the config.