Nebula fails to start after deployment with a segfault error

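  • nebula version: 3.6.0
  • Deployment: single node
  • Installation method: RPM
  • In production: Y
  • Hardware
    • Disk: 1 TB HDD
    • CPU, memory: 32 cores, 64 GB
  • OS: CentOS 7.9
  • After deploying nebula from the RPM package, the services fail to start. They turned out to have been killed by the system, and dmesg shows a segfault error.
  • Analyzing the core dump shows that the crash is in NetworkUtils::listDeviceAndIPv4s. Details below: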
[root@localhost nebula-graph]# gdb bin/nebula-metad core.19758
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/nebula-graph/bin/nebula-metad...(no debugging symbols found)...done.
[New LWP 19758]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/opt/nebula-graph/bin/nebula-metad --flagfile /opt/nebula-graph/etc/nebula-meta'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000001ab8a71 in nebula::network::NetworkUtils::listDeviceAndIPv4s[abi:cxx11]() ()
Missing separate debuginfos, use: debuginfo-install nebula-graph-3.6.0-1.x86_64

Was it metad that got killed, or some other process?
Has it never been able to start, or did it stop starting after you performed some operation?

metad, graphd, and storaged were all killed. They have never started successfully.

Also, three core files were generated, one each by metad, graphd, and storaged, and they all contain the same information:
Program terminated with signal 11, Segmentation fault.
#0  0x0000000001ab8a71 in nebula::network::NetworkUtils::listDeviceAndIPv4s[abi:cxx11]() ()
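
All three daemons crash in the same frame even though their configs differ, so one thing that may be worth ruling out is the host's network interface list rather than any single conf file. The function name suggests it enumerates devices and their IPv4 addresses. Assuming it does this through getifaddrs(3) (not confirmed anywhere in this thread), note that the man page says `ifa_addr` may be a null pointer for some entries. Below is a minimal, Linux-only sketch that walks the same list from Python via ctypes and prints any such entries. `list_interface_entries` and the struct layouts are illustrative names written for this sketch, not part of nebula.

```python
import ctypes
import ctypes.util
import os


class SockAddr(ctypes.Structure):
    # Generic sockaddr header as laid out on Linux; only sa_family is read here.
    _fields_ = [("sa_family", ctypes.c_ushort), ("sa_data", ctypes.c_char * 14)]


class IfAddrs(ctypes.Structure):
    pass


# Linux/glibc layout of struct ifaddrs (see getifaddrs(3)); an assumption of this sketch.
IfAddrs._fields_ = [
    ("ifa_next", ctypes.POINTER(IfAddrs)),
    ("ifa_name", ctypes.c_char_p),
    ("ifa_flags", ctypes.c_uint),
    ("ifa_addr", ctypes.POINTER(SockAddr)),
    ("ifa_netmask", ctypes.POINTER(SockAddr)),
    ("ifa_ifu", ctypes.POINTER(SockAddr)),
    ("ifa_data", ctypes.c_void_p),
]


def list_interface_entries():
    """Return (name, family) pairs from getifaddrs(); family is None when ifa_addr is NULL."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.getifaddrs.argtypes = [ctypes.POINTER(ctypes.POINTER(IfAddrs))]
    libc.getifaddrs.restype = ctypes.c_int
    libc.freeifaddrs.argtypes = [ctypes.POINTER(IfAddrs)]
    libc.freeifaddrs.restype = None

    head = ctypes.POINTER(IfAddrs)()
    if libc.getifaddrs(ctypes.byref(head)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    entries = []
    try:
        node = head
        while node:  # a NULL ctypes pointer is falsy
            item = node.contents
            name = item.ifa_name.decode(errors="replace")
            # Check ifa_addr before dereferencing it.
            family = item.ifa_addr.contents.sa_family if item.ifa_addr else None
            entries.append((name, family))
            node = item.ifa_next
    finally:
        libc.freeifaddrs(head)
    return entries


if __name__ == "__main__":
    for name, family in list_interface_entries():
        note = "NULL ifa_addr" if family is None else f"family={family}"
        print(f"{name}: {note}")
```

If this prints a `NULL ifa_addr` line on the affected host but not on the hosts where the same configuration works, that difference would be worth including in a bug report. It would not prove the cause by itself.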

This looks like a problem with the conf files. Can you post them?

  • nebula-metad.conf
--daemonize=true
--pid_file=/data/nebula/pids/nebula-metad.pid
--log_dir=/data/nebula/logs
--minloglevel=0
--v=0
--logbufsecs=0
--redirect_stdout=true
--stdout_log_file=metad-stdout.log
--stderr_log_file=metad-stderr.log
--stderrthreshold=3
--timestamp_in_logfile_name=true
--meta_server_addrs=172.16.239.41:9559
--local_ip=172.16.239.41
--port=9559
--ws_ip=0.0.0.0
--ws_http_port=19559
--ws_storage_http_port=19779
--data_path=/data/nebula/data/meta
--default_parts_num=2
--default_replica_factor=1
--heartbeat_interval_secs=10
  • nebula-graphd.conf
--daemonize=true
--pid_file=/data/nebula/pids/nebula-graphd.pid
--enable_optimizer=true
--default_charset=utf8
--default_collate=utf8_bin
--local_config=true
--log_dir=/data/nebula/logs
--minloglevel=0
--v=0
--logbufsecs=0
--redirect_stdout=true
--stdout_log_file=graphd-stdout.log
--stderr_log_file=graphd-stderr.log
--stderrthreshold=3
--timestamp_in_logfile_name=true
--accept_partial_success=false
--max_allowed_query_size=4194304
--meta_server_addrs=172.16.239.41:9559
--local_ip=172.16.239.41
--listen_netdev=any
--port=9669
--reuse_port=false
--listen_backlog=1024
--client_idle_timeout_secs=28800
--session_idle_timeout_secs=28800
--num_accept_threads=1
--num_netio_threads=0
--num_max_connections=0
--num_worker_threads=0
--ws_ip=0.0.0.0
--ws_http_port=19669
--storage_client_timeout_ms=60000
--slow_query_threshold_us=200000
--ws_meta_http_port=19559
--enable_authorize=false
--auth_type=password
--system_memory_high_watermark_ratio=0.8
--enable_space_level_metrics=false
--enable_experimental_feature=false
--enable_data_balance=true
--enable_udf=true
--udf_path=/home/nebula/dev/nebula/udf/
--max_sessions_per_ip_per_user=300
--memory_tracker_limit_ratio=0.8
--memory_tracker_untracked_reserved_memory_mb=50
--memory_tracker_detail_log=false
--memory_tracker_detail_log_interval_ms=60000
--memory_purge_enabled=true
--memory_purge_interval_seconds=10
--max_job_size=1
--min_batch_size=8192
--optimize_appendvertices=false
--path_batch_size=10000
  • nebula-storaged.conf
--daemonize=true
--pid_file=/data/nebula/pids/nebula-storaged.pid
--local_config=true
--log_dir=/data/nebula/logs
--minloglevel=0
--v=0
--logbufsecs=0
--redirect_stdout=true
--stdout_log_file=storaged-stdout.log
--stderr_log_file=storaged-stderr.log
--stderrthreshold=3
--timestamp_in_logfile_name=true
--meta_server_addrs=172.16.239.41:9559
--local_ip=172.16.239.41
--port=9779
--ws_ip=0.0.0.0
--ws_http_port=19779
--heartbeat_interval_secs=10
--raft_heartbeat_interval_secs=30
--raft_rpc_timeout_ms=500
--wal_ttl=14400
--data_path=/data/nebula/data/storage
--minimum_reserved_bytes=268435456
--rocksdb_batch_size=4096
--rocksdb_block_cache=4
--engine_type=rocksdb
--rocksdb_compression=lz4
--rocksdb_compression_per_level=
--enable_rocksdb_statistics=false
--rocksdb_stats_level=kExceptHistogramOrTimers
--enable_rocksdb_prefix_filtering=true
--enable_rocksdb_whole_key_filtering=false
--rocksdb_db_options={}
--rocksdb_column_family_options={"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
--rocksdb_block_based_table_options={"block_size":"8192"}
--query_concurrently=true
--auto_remove_invalid_space=true
--num_io_threads=16
--num_max_connections=0
--num_worker_threads=32
--max_concurrent_subtasks=10
--snapshot_part_rate_limit=10485760
--snapshot_batch_size=1048576
--rebuild_index_part_rate_limit=4194304
--rebuild_index_batch_size=1048576
--memory_tracker_limit_ratio=0.8
--memory_tracker_untracked_reserved_memory_mb=50
--memory_tracker_detail_log=false
--memory_tracker_detail_log_interval_ms=60000
--memory_purge_enabled=true
--memory_purge_interval_seconds=10
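
Since the open question is whether the configs are at fault, here is a rough sanity check that could be run on the affected host. It is only a sketch under stated assumptions: `parse_flagfile` and `can_bind` are hypothetical helpers written for this post, and the parser handles only the simple `--name=value` lines seen in the files above, not the full gflags flagfile syntax. It reports lines that do not start with `--` (for example a prefix mangled by copy and paste) and tries to bind a socket to `local_ip`, which normally succeeds only if that address is configured on the machine.

```python
import socket
import sys


def parse_flagfile(text):
    """Split flagfile text into ({flag: value}, [lines that do not start with '--'])."""
    flags, malformed = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if not line.startswith("--"):
            malformed.append(line)  # e.g. an en-dash pasted in place of "--"
            continue
        name, _, value = line[2:].partition("=")
        flags[name] = value
    return flags, malformed


def can_bind(ip):
    """Return True if a UDP socket can bind to ip on an ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))
            return True
        except OSError:
            return False


if __name__ == "__main__":
    # Pass the conf files to check as command-line arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            flags, malformed = parse_flagfile(fh.read())
        print(path, "malformed lines:", malformed)
        ip = flags.get("local_ip")
        if ip:
            print(path, "local_ip", ip, "bindable:", can_bind(ip))
```

A `bindable: False` result for `local_ip` would point at an address mismatch on this host, which is the kind of difference that could explain why the same files work elsewhere.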

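Could you check whether this is a configuration problem? The same configuration has never caused this problem in other environments.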

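@MuYi-方扬 This is a problem in a customer's production environment, so could you please take another look? We also tested the latest version, and it still fails to start.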

Hey, did you ever get this solved? I recently deployed 3.8.0 and ran into the same error.

I can't really tell what is wrong from this kind of error. Let's see whether someone else can.