Writing via the Flink connector: meta connection fails with E_LEADER_CHANGED

  • Nebula version: 3.4.0
  • Deployment: distributed (3 nodes)
  • Installation method: RPM
  • In production: Y
  • Hardware
    • Disk: 1 TB (SSD)
    • CPU / memory: 32 cores, 128 GB
  • Problem description
    The Flink connector sink fails. Symptom: with all three node addresses configured as the meta address, the meta connection fails with E_LEADER_CHANGED. Writes only succeed when a single specific metad address is used.
  • Relevant meta / storage / graph logs
    The metad log prints:
    I20230322 16:07:10.301388 2400724 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName test1 error: E_LEADER_CHANGED
    I20230322 16:07:10.301405 2400724 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName test1 error: E_LEADER_CHANGED
    I20230322 16:30:19.798485 2400724 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName test1 error: E_LEADER_CHANGED
    I20230322 16:30:52.377151 2400724 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName test1 error: E_LEADER_CHANGED
    I20230322 16:31:40.159183 2400724 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName test1 error: E_LEADER_CHANGED
    I20230322 16:38:15.640709 2400724 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName test1 error: E_LEADER_CHANGED
Caused by: java.lang.RuntimeException: org.apache.flink.connector.nebula.shaded.com.facebook.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
	at org.apache.flink.connector.nebula.connection.NebulaMetaConnectionProvider.getVidType(NebulaMetaConnectionProvider.java:84)
	at org.apache.flink.connector.nebula.sink.NebulaVertexBatchOutputFormat.createNebulaBatchExecutor(NebulaVertexBatchOutputFormat.java:21)
	at org.apache.flink.connector.nebula.sink.NebulaBatchOutputFormat.open(NebulaBatchOutputFormat.java:101)
	at org.apache.flink.connector.nebula.sink.NebulaSinkFunction.open(NebulaSinkFunction.java:37)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:46)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:442)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.connector.nebula.shaded.com.facebook.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
	at org.apache.flink.connector.nebula.shaded.com.facebook.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:116)
	at org.apache.flink.connector.nebula.shaded.com.facebook.thrift.transport.TTransport.readAll(TTransport.java:75)
	at org.apache.flink.connector.nebula.shaded.com.facebook.thrift.transport.THeaderTransport.readFrame(THeaderTransport.java:281)
	at org.apache.flink.connector.nebula.shaded.com.facebook.thrift.transport.THeaderTransport._resetProtocol(THeaderTransport.java:267)
	at org.apache.flink.connector.nebula.shaded.com.facebook.thrift.protocol.THeaderProtocol.readMessageBegin(THeaderProtocol.java:217)
	at com.vesoft.nebula.meta.MetaService$Client.recv_getSpace(MetaService.java:572)
	at com.vesoft.nebula.meta.MetaService$Client.getSpace(MetaService.java:549)
	at com.vesoft.nebula.client.meta.MetaClient.getSpace(MetaClient.java:240)
	at org.apache.flink.connector.nebula.connection.NebulaMetaConnectionProvider.getVidType(NebulaMetaConnectionProvider.java:81)
	... 15 more
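For context, the sink is presumably configured with all three metad addresses along these lines (a sketch assuming the connector's `NebulaClientOptions` builder; the IPs and ports are placeholders, not taken from the thread):

```java
// Hypothetical connection options for the nebula-flink-connector sink:
// all three metad addresses are passed as one comma-separated string.
NebulaClientOptions clientOptions = new NebulaClientOptions.NebulaClientOptionsBuilder()
        .setMetaAddress("192.168.1.1:9559,192.168.1.2:9559,192.168.1.3:9559")
        .setGraphAddress("192.168.1.1:9669")
        .build();
```

With this setup the connector may open its meta connection against any of the three nodes, which is why the behavior differs depending on which node it happens to pick.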

Run `SHOW HOSTS` to check whether the services are still up. Maybe one of the nodes is down?

All of them show as online.

Could you post your config file and a screenshot of `SHOW HOSTS`?


meta.conf

Is port 9559 open on all three meta machines? You said the import succeeds when a single machine is specified; did you try each of the three individually?


The ports are open; `telnet ip 9559` succeeds for all three machines. When specified individually, only one of the three works; the other two fail.

SHOW HOSTS META

What do the meta services look like?

Did you list all three meta server addresses in your graphd config file?

(screenshot of graphd config)
All three are listed.

You can telnet to all three, but only one succeeds in your test; what exactly do you mean by that?

Please post the INFO logs of all three metads (ideally with v=1 enabled).

Found the problem: the first metad the client connects to is a follower. The server replies with E_LEADER_CHANGED plus the leader's IP, and the client then reconnects to that leader. But the returned IP is an internal address that is unreachable from the client's network, so the second connection times out.
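The failure mode above can be sketched as follows. This is a minimal, self-contained simulation (not the real Nebula client, and all addresses are made up for illustration): followers answer E_LEADER_CHANGED with the leader's internal IP, the client retries against that address, and the retry times out because the internal IP is not routable from the Flink job's network.

```java
import java.util.Map;
import java.util.Set;

/**
 * Toy model of the meta leader-redirect behavior described in this thread.
 * meta3 is the raft leader; meta1/meta2 are followers that advertise the
 * leader's *internal* address, which the client cannot reach.
 */
public class MetaRedirectDemo {
    /** What each metad answers: followers hand back the leader's internal IP. */
    static final Map<String, String> LEADER_HINT = Map.of(
            "meta1:9559", "192.168.1.5:9559",  // follower -> internal leader IP
            "meta2:9559", "192.168.1.5:9559",  // follower -> internal leader IP
            "meta3:9559", "OK");               // the leader answers directly

    /** Addresses routable from the client's network (the internal IP is not). */
    static final Set<String> REACHABLE =
            Set.of("meta1:9559", "meta2:9559", "meta3:9559");

    static String connect(String addr) {
        if (!REACHABLE.contains(addr)) {
            return "SocketTimeoutException: Read timed out";
        }
        String hint = LEADER_HINT.get(addr);
        if ("OK".equals(hint)) {
            return "OK";
        }
        // Follower replied E_LEADER_CHANGED: retry the advertised leader address.
        return "E_LEADER_CHANGED -> retry " + hint + " -> " + connect(hint);
    }

    public static void main(String[] args) {
        System.out.println("meta3 (leader):   " + connect("meta3:9559"));
        System.out.println("meta1 (follower): " + connect("meta1:9559"));
    }
}
```

This matches the symptom exactly: pointing the sink at the leader's reachable address works, while pointing it at either follower triggers a redirect to the unreachable internal IP and ends in the `Read timed out` seen in the stack trace.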


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.