Unbalanced storaged load when reading data through nebula spark connector

  • Nebula version:
    [screenshot]

  • Deployment: distributed

  • Production environment: Y

  • Hardware information

    • Disk, CPU, and memory: 24C | 128G | 44.67T HDD
  • Detailed description of the problem
    When reading data through nebula spark connector, the storaged load on 1 of the 3 machines spikes.
    storaged configuration:

--rocksdb_column_family_options={"disable_auto_compactions":true,"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
  • Relevant meta / storage / graph logs (text form preferred so that they are searchable)
    storaged log:
I20221011 21:00:54.171569 91277 TransactionManager.cpp:24] TransactionManager ctor()
I20221011 21:00:54.171905 91277 RocksEngineConfig.cpp:366] Emplace rocksdb option max_bytes_for_level_base=268435456
I20221011 21:00:54.171926 91277 RocksEngineConfig.cpp:366] Emplace rocksdb option max_write_buffer_number=4
I20221011 21:00:54.171936 91277 RocksEngineConfig.cpp:366] Emplace rocksdb option write_buffer_size=67108864
I20221011 21:00:54.171952 91277 RocksEngineConfig.cpp:366] Emplace rocksdb option disable_auto_compactions=1
I20221011 21:00:54.172138 91277 RocksEngineConfig.cpp:366] Emplace rocksdb option block_size=8192
I20221011 21:00:54.264791 91277 RocksEngine.cpp:97] open rocksdb on /data09/nebula-graph-3.2.0/data/storage/nebula/0/data
I20221011 21:00:54.264909 91277 AdminTaskManager.cpp:22] max concurrent subtasks: 10
I20221011 21:00:54.265033 91277 AdminTaskManager.cpp:40] exit AdminTaskManager::init()
I20221011 21:00:54.265130 91518 AdminTaskManager.cpp:227] waiting for incoming task
E20221011 21:01:06.371778 91451 MetaClient.cpp:192] Heartbeat failed, status:Machine not existed!
I20221011 21:01:17.398409 91451 MetaClient.cpp:3094] Load leader of "xx.xx.xx.xx":9779 in 2 space
I20221011 21:01:17.398468 91451 MetaClient.cpp:3094] Load leader of "xx.xx.xx.xx":9779 in 2 space
I20221011 21:01:17.398484 91451 MetaClient.cpp:3094] Load leader of "xx.xx.xx.xx":9779 in 1 space
I20221011 21:01:17.398494 91451 MetaClient.cpp:3100] Load leader ok
I20221011 21:01:27.419164 91451 MetaClient.cpp:3094] Load leader of "xx.xx.xx.xx":9779 in 2 space
I20221011 21:01:27.419250 91451 MetaClient.cpp:3094] Load leader of "xx.xx.xx.xx":9779 in 2 space
I20221011 21:01:27.419270 91451 MetaClient.cpp:3094] Load leader of "xx.xx.xx.xx":9779 in 1 space
I20221011 21:01:27.419281 91451 MetaClient.cpp:3100] Load leader ok

metad log:

I20221011 21:41:39.487738 60618 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:41:59.342559 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:41:59.347000 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:41:59.352273 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:41:59.357383 60618 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:42:40.137874 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:42:40.139103 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:42:40.140609 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:42:40.143322 60618 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:42:40.150576 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:42:44.750705 60618 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:42:48.729068 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:42:48.733211 60618 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:42:48.733676 60618 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:42:48.734934 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:43:48.247560 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:43:48.400211 60614 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:43:48.400792 60614 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:43:48.405181 60614 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:43:48.405347 60614 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:43:48.698253 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:43:48.700026 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
I20221011 21:43:48.704813 60618 GetSpaceProcessor.cpp:18] Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED
I20221011 21:43:49.026170 60618 ListSpacesProcessor.cpp:17] List spaces failed, error E_LEADER_CHANGED
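To see whether the `E_LEADER_CHANGED` messages cluster around particular minutes or processors, the metad log can be tallied with a small script. This is only a sketch: the regular expression is an assumption based on the line layout shown in the excerpt above, and `tally_leader_changed` is a hypothetical helper name, not part of any Nebula tooling.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
# <severity><yyyymmdd> <hh:mm:ss.micros> <thread id> <file>:<line>] <message>
LINE_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{8}) (?P<minute>\d{2}:\d{2}):\d{2}\.\d+\s+\d+ "
    r"(?P<src>[\w.]+):\d+\] (?P<msg>.*)$"
)


def tally_leader_changed(lines):
    """Count E_LEADER_CHANGED messages per (minute, source file)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines that do not fit the assumed layout are skipped silently.
        if match and "E_LEADER_CHANGED" in match.group("msg"):
            counts[(match.group("minute"), match.group("src"))] += 1
    return counts


# Three lines copied from the metad log above.
sample = [
    "I20221011 21:41:39.487738 60618 GetSpaceProcessor.cpp:18] "
    "Get space Failed, SpaceName dp_test_spar3_zsjg error: E_LEADER_CHANGED",
    "I20221011 21:41:59.342559 60618 ListSpacesProcessor.cpp:17] "
    "List spaces failed, error E_LEADER_CHANGED",
    "I20221011 21:41:59.347000 60618 ListSpacesProcessor.cpp:17] "
    "List spaces failed, error E_LEADER_CHANGED",
]

if __name__ == "__main__":
    for (minute, src), n in sorted(tally_leader_changed(sample).items()):
        print(minute, src, n)
```

Running the same function over the full metad log file (for example `tally_leader_changed(open(path))`, with `path` pointing at the log) would show whether the errors persist for the whole duration of the Spark job or only appear in bursts.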

Monitoring screenshot:



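How many replicas does your data have?
Judging from the logs, the data read had not started yet. The job was still querying the space schema, and a leader change occurred on storage.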

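Yes. The storaged log shows nothing more from then until the read finished; the leader change only appears at the very beginning. 15 partitions, 3 replicas. Data volume: 247256813 vertices and 244547905 edges. No hotspots. @nicole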
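With 3 replicas on 3 machines, every storaged presumably holds a copy of all 15 partitions, so how read load spreads may depend on where the partition leaders sit. Assuming scans are served by partition leaders (an assumption, not something confirmed in this thread), one check is to count leaders per host from the `SHOW PARTS` output. The sketch below uses hypothetical host names and made-up leader placements purely for illustration; `leader_counts` and `skew_ratio` are hypothetical helpers.

```python
def leader_counts(part_leaders, hosts):
    """Count partition leaders per host.

    part_leaders: iterable of (partition_id, leader_host) pairs,
                  e.g. copied by hand from SHOW PARTS output.
    hosts:        all storaged hosts, so hosts with zero leaders still appear.
    """
    counts = {host: 0 for host in hosts}
    for _partition_id, leader in part_leaders:
        counts[leader] = counts.get(leader, 0) + 1
    return counts


def skew_ratio(counts):
    """Largest leader count divided by the perfectly even share (1.0 = balanced)."""
    ideal = sum(counts.values()) / len(counts)
    return max(counts.values()) / ideal


# Hypothetical host names; 15 partitions as in the post above.
HOSTS = ["storage-a:9779", "storage-b:9779", "storage-c:9779"]

# Illustrative placements only, not data from this cluster.
balanced = [(pid, HOSTS[pid % 3]) for pid in range(1, 16)]
skewed = [(pid, HOSTS[0]) for pid in range(1, 16)]

if __name__ == "__main__":
    for name, placement in (("balanced", balanced), ("skewed", skewed)):
        counts = leader_counts(placement, HOSTS)
        print(name, counts, round(skew_ratio(counts), 2))
```

A ratio close to 1.0 would suggest the leaders are spread evenly and the imbalance has another cause. A ratio near 3.0 would mean one host leads almost every partition.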

Hello, we have run into this problem on several datasets and environments. Could you please check whether it can be reproduced? @nicole

I have already asked the colleagues who work on storaged to look into it.
