Nebula 3.2.0: after adding nodes and running BALANCE, storaged logs "received invalid message from client: No version identifier... old protocol client in strict mode?" and "Received invalid request from client: apache::thrift::transport::TTransportException: Header transport frame is too large"

Deployment information

  • Nebula version: nebula-graph-3.2.0.el7.x86_64
  • Nebula Dashboard version: nebula-dashboard-3.1.0.x86_64
  • Deployment type: distributed
  • Installation method: RPM
  • Production environment: Y
  • Hardware
    • Disk: HDD
  • Service layout:
Machine  IP address       graphd processes  storaged processes  metad processes
A        110.110.200.52   1                 1                   1
B        110.110.200.53   1                 1                   1
C        110.110.200.54   1                 1                   1
D        122.30.240.11    1                 1                   -
E        122.30.240.12    1                 1                   -
F        122.30.240.13    1                 1                   -

Configuration files

  • Relevant meta / storage / graph settings
########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=110.110.200.52:9559,110.110.200.53:9559,110.110.200.54:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=122.30.240.11
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
  • Added 3 new physical machines

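There are 6 physical machines in total. The last 3 have only storaged and graphd installed; the first 3 have every component installed.
After restarting the services, I ran: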

ADD HOSTS 122.30.240.11:9779, 122.30.240.12:9779, 122.30.240.13:9779;
SHOW HOSTS;

All six hosts show Status ONLINE.

  • Run BALANCE
BALANCE LEADER

  • Check the logs:
tail -200f logs/nebula-storaged.INFO
E20220922 18:37:49.849853 48251 GeneratedCodeHelper.cpp:134] received invalid message from client: No version identifier... old protocol client in strict mode? sz=1195725856
E20220922 18:37:49.849874 48251 GeneratedCodeHelper.cpp:93] invalid message from client in function process
E20220922 18:37:49.849901 48251 HeaderServerChannel.cpp:100] Received invalid request from client: apache::thrift::transport::TTransportException: Header transport frame is too large: 1414811695 (hex 0x5454502f, ascii 'TTP/') (transport apache::thrift::PreReceivedDataAsyncTransportWrapper, address 110.110.200.52, port 41872)
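  • Problems
    • Problem 1: the storaged log reports the errors above, and data cannot be written to the newly added nodes.
    • Problem 2: the service status in nebula-dashboard shows graph server: 0 normal, 6 abnormal; storage server: 0 normal, 6 abnormal; meta server: 0 normal, 3 abnormal.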

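This looks like a version mismatch between the storaged instances. Could you confirm that every service in the cluster runs the same version and that none of them has been upgraded?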

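They are all the same version, V3.2.0, as shown in the screenshot. Looking more closely, these log entries appear to be storaged's responses to requests from nebula-stats-exporter, the monitoring process used by the dashboard.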
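
One way to sanity-check that reading: the two large numbers in the log (sz=1195725856 and the frame size 1414811695) can be decoded as four big-endian bytes, which is what you would see if the start of a text request were read as a 32-bit frame length. Below is a minimal Python sketch; the helper name frame_size_as_ascii is my own and is not part of Nebula or Thrift.

```python
def frame_size_as_ascii(size: int) -> str:
    """Interpret a 32-bit big-endian frame length as four ASCII characters."""
    return size.to_bytes(4, "big").decode("ascii", errors="replace")


# Values copied from the storaged log lines quoted in this thread.
print(repr(frame_size_as_ascii(1195725856)))  # 'GET '  (sz= in GeneratedCodeHelper.cpp:134)
print(repr(frame_size_as_ascii(1414811695)))  # 'TTP/'  (frame size in HeaderServerChannel.cpp:100)
```

Both values decode to fragments of an HTTP request line ('GET ' and 'TTP/', as in 'HTTP/1.1'). That is consistent with an HTTP client, such as a metrics scraper, connecting to a Thrift port instead of the service's HTTP port, rather than with a version mismatch. Which port the exporter is actually configured to use here is not shown in this thread, so treat that as something to verify.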