nebula crashes after running ALTER TAG many times (reproducible every time)

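  • nebula version: 2.6.1
  • Deployment: single machine
  • Installation method: RPM
  • Production environment: Y
  • Hardware
    • Disk: 500 GB
    • CPU: 8 cores; memory: 16 GB
  • Within a single space, running ALTER TAG T1 ADD (name_xxx string) repeatedly on a single tag crashes both nebula-graphd and nebula-storaged after roughly 300 executions.
    Steps to reproduce:
    1. Create graph space S0
    2. Create tag T0 in S0
    3. Add properties to T0 in bulk from code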
    @Test
    public void testCreateTag() {
        // One space (S0) and one tag (T0); the loop bounds can be raised to cover more.
        for (int i = 0; i < 1; i++) {
            String spaceName = "S" + i;

            for (int j = 0; j < 1; j++) {
                String tagName = "T" + j;
                // Step 2: create the tag if it does not exist yet.
                String createTagStmt = String.format("USE %s; CREATE TAG IF NOT EXISTS %s", spaceName, tagName);
                try {
                    ResultSet resultSet = nebulaSession.execute(createTagStmt);
                    if (!resultSet.isSucceeded()) {
                        log.error("execute: {}, failed: {}", createTagStmt, resultSet.getErrorMessage());
                    } else {
                        log.info("create tag succeed! space: {}, tag: {}", spaceName, tagName);
                    }
                } catch (Exception e) {
                    log.error("execute nebula sql failed!", e);
                }
                // Step 3: add 300 string properties one at a time (T0_0 ... T0_299).
                for (int k = 0; k < 300; k++) {
                    String fieldName = tagName + "_" + k;
                    String alterTagStmt = String.format("USE %s; ALTER TAG %s ADD (%s string) ;", spaceName, tagName, fieldName);
                    try {
                        ResultSet resultSet = nebulaSession.execute(alterTagStmt);
                        if (!resultSet.isSucceeded()) {
                            log.error("execute: {}, failed: {}", alterTagStmt, resultSet.getErrorMessage());
                            // Stop at the first statement the server rejects.
                            break;
                        } else {
                            log.info("alter tag succeed! space: {}, tag: {}, field: {}", spaceName, tagName, fieldName);
                        }
                    } catch (Exception e) {
                        log.error("execute nebula sql failed!", e);
                    }
                }
            }
        }
    }
  • Relevant meta / storage / graph info logs (text form preferred so they are searchable)
*** Aborted at 1644750506 (Unix time, try 'date -d @1644750506') ***
*** Signal 11 (SIGSEGV) (0x0) received by PID 17851 (pthread TID 0x7f11bf1e6700) (linux TID 17936) (code: address not mapped to object), stack trace: ***
/usr/local/nebula/bin/nebula-storaged(_ZN5folly10symbolizer17getStackTraceSafeEPmm+0x31)[0x2132491]
/usr/local/nebula/bin/nebula-storaged(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x1b)[0x212a2fb]
/usr/local/nebula/bin/nebula-storaged[0x2128647]
/lib64/libpthread.so.0(+0xf36f)[0x7f11d65ff36f]
/usr/local/nebula/bin/nebula-storaged[0x25a1515]
/usr/local/nebula/bin/nebula-storaged[0x2544b5f]
/usr/local/nebula/bin/nebula-storaged[0x10eab45]
/usr/local/nebula/bin/nebula-storaged(_ZN6nebula4meta20NebulaSchemaProviderD1Ev+0x9b)[0x10eb53b]
/usr/local/nebula/bin/nebula-storaged(_ZNSt10_HashtableIiSt4pairIKiSt6vectorISt10shared_ptrIKN6nebula4meta20NebulaSchemaProviderEESaIS8_EEESaISB_ENSt8__detail10_Select1stESt8equal_toIiESt4hashIiENSD_18_Mod_range_hashingENSD_20_Default_ranged_hashENSD_20_Prime_rehash_policyENSD_17_Hashtable_traitsILb0ELb0ELb1EEEED1Ev+0x87)[0xf7c2c7]
/usr/local/nebula/bin/nebula-storaged(_ZNSt23_Sp_counted_ptr_inplaceIN6nebula4meta14SpaceInfoCacheESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0x60)[0x1277610]
/usr/local/nebula/bin/nebula-storaged[0x10ee45b]
/usr/local/nebula/bin/nebula-storaged(_ZNSt10_HashtableIiSt4pairIKiSt10shared_ptrIN6nebula4meta14SpaceInfoCacheEEESaIS7_ENSt8__detail10_Select1stESt8equal_toIiESt4hashIiENS9_18_Mod_range_hashingENS9_20_Default_ranged_hashENS9_20_Prime_rehash_policyENS9_17_Hashtable_traitsILb0ELb0ELb1EEEED1Ev+0x23)[0x122b0a3]
/usr/local/nebula/bin/nebula-storaged(_ZN6nebula4meta10MetaClient8loadDataEv+0x1de2)[0x11b6052]
/usr/local/nebula/bin/nebula-storaged(_ZN6nebula4meta10MetaClient19heartBeatThreadFuncEv+0x227)[0x11b6b67]
/usr/local/nebula/bin/nebula-storaged(_ZZN6nebula6thread13GenericWorker12addDelayTaskIMNS_4meta10MetaClientEFvvEJPS4_EEENSt9enable_ifIXsrSt7is_voidINSt9result_ofIFT_DpT0_EE4typeEE5valueEN5folly10SemiFutureINSI_4UnitEEEE4typeEmOSB_DpOSC_ENKUlvE_clEv+0x1c)[0x12560fc]
/usr/local/nebula/bin/nebula-storaged[0x18416aa]
/usr/local/nebula/bin/nebula-storaged[0x219bd92]
/usr/local/nebula/bin/nebula-storaged(event_base_loop+0x506)[0x219c466]
/usr/local/nebula/bin/nebula-storaged(_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt8functionIFvvEEES8_St5_BindIFMN6nebula6thread13GenericWorkerEFvvEPSL_EEEEEEE6_M_runEv+0x5c)[0x1841c2c]
/usr/local/nebula/bin/nebula-storaged[0x266025f]
/lib64/libpthread.so.0(+0x7dc4)[0x7f11d65f7dc4]
/lib64/libc.so.6(clone+0x6c)[0x7f11d632676c]
(safe mode, symbolizer not available)
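
For anyone who wants to reproduce this without the test harness, the ALTER TAG statements from step 3 can also be generated as plain text and run from a console session. The sketch below is only an illustration: the class AlterTagScript and its method buildStatements are hypothetical names, not part of the original test, and it assumes space S0 and tag T0 already exist (steps 1 and 2).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the original test): builds the nGQL statements for step 3. */
final class AlterTagScript {

    // Returns "USE <space>;" followed by `count` ALTER TAG statements,
    // each adding one string property named <tag>_<k>, mirroring the test above.
    static List<String> buildStatements(String spaceName, String tagName, int count) {
        List<String> statements = new ArrayList<>();
        statements.add(String.format("USE %s;", spaceName));
        for (int k = 0; k < count; k++) {
            String fieldName = tagName + "_" + k;
            statements.add(String.format("ALTER TAG %s ADD (%s string);", tagName, fieldName));
        }
        return statements;
    }

    public static void main(String[] args) {
        // Assumption: space S0 and tag T0 were created beforehand (steps 1 and 2).
        for (String stmt : buildStatements("S0", "T0", 300)) {
            System.out.println(stmt);
        }
    }
}
```

The printed statements can be pasted into a console connected to the affected cluster; only the statement text is taken from the test above, everything else is an assumption made for the sketch.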

This is a bug. I've filed an issue, so you can follow it for updates, and I'll post progress here as well. Issue link: https://github.com/vesoft-inc/nebula/issues/3889

Thanks! Looking forward to a fix soon.

Sure, I'll let you know when there is progress.

S.HUANG fixed this problem in this PR, and the PR has been cherry-picked into v3.0, so you won't run into it on v3.0.

Excellent! Could I add you on WeChat? I have some related questions to ask.

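We don't give out personal contact details. You can join our WeChat group (fill in a short questionnaire first, via the questionnaire system) or our QQ group (fewer members), QQ group number: 564223135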

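After the crash caused by the problem above, the services can no longer be started. Clearing the entire data directory does let them restart, but it also wipes the data of every other graph space. I know which graph space is the problematic one. Is there a way to locate the folder that belongs to that graph space and delete only its data?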

Could you open a new thread? I'll ask someone from the dev team to reply there.

Thanks! The new thread is here: "Can Nebula delete a specific graph space's data from the data directory?"
