Nebula Graph fails to start after upgrading from 2.0.1 to 3.0.0

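Question template:

  • Nebula version: upgrading from 2.0.1 to 3.0.0
  • Deployment type: distributed
  • Installation method: RPM
  • Production environment: N
  • Hardware information
    • Disk (SSD recommended)
    • CPU and memory information
  • Detailed description of the problem
  • Relevant meta / storage / graph info logs (preferably as text so they are searchable)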

Following the installation documentation, startup failed at step 4:

/usr/local/nebula/scripts/nebula.service stop all
tar -xf nebula-graph-3.0.0.el7.x86_64.tar.gz
cp nebula-graph-3.0.0.el7.x86_64/bin/* /usr/local/nebula/bin/
vim /usr/local/nebula/etc/nebula-graphd.conf
/usr/local/nebula/scripts/nebula-metad.service start
/usr/local/nebula/scripts/nebula.service status all
[root@cnsz22pl0272 nebula]# /usr/local/nebula/scripts/nebula-metad.service start
[INFO] Starting nebula-metad...
[INFO] Done
[root@cnsz22pl0272 nebula]# /usr/local/nebula/scripts/nebula.service status all 
[INFO] nebula-metad: Running as 231434, Listening on 9559
[INFO] nebula-graphd: Exited
[INFO] nebula-storaged: Exited
[root@cnsz22pl0272 nebula]# cat logs/metad/nebula-metad.ERROR      
Log file created at: 2022/02/21 14:42:10
Running on machine: cnsz22pl0272
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20220221 14:42:10.628567 228796 MetaDaemon.cpp:202] Invalid timezone file `share/resources/date_time_zonespec.csv', exception: `Unable to locate or access the required datafile. Filespec: share/resources/date_time_zonespec.csv'.
E20220221 14:42:10.633497 230376 JobDescription.cpp:181] Loading Job Description FailedE_KEY_NOT_FOUND
E20220221 14:42:10.633533 230376 JobManager.cpp:129] [JobManager] load an invalid job from queue 0
logs/metad/nebula-metad.INFO 
I20220221 14:58:34.992241 231434 MetaDaemonInit.cpp:116] Leader has not been elected, sleep 1s
I20220221 14:58:35.393018 231481 RaftPart.cpp:1184] [Port: 9560, Space: 0, Part: 0] Sending out an election request (space = 0, part = 0, term = 30, lastLogId = 11713796, lastLogTerm = 29, candidateIP = 10.202.77.197, candidatePort = 9560), isPreVote = 1
I20220221 14:58:35.393393 231475 RaftPart.cpp:1150] [Port: 9560, Space: 0, Part: 0] Did not get enough votes from election of term 30, isPreVote = 1
I20220221 14:58:35.992329 231434 MetaDaemonInit.cpp:116] Leader has not been elected, sleep 1s
I20220221 14:58:36.939888 231482 RaftPart.cpp:1184] [Port: 9560, Space: 0, Part: 0] Sending out an election request (space = 0, part = 0, term = 30, lastLogId = 11713796, lastLogTerm = 29, candidateIP = 10.202.77.197, candidatePort = 9560), isPreVote = 1
I20220221 14:58:36.940265 231475 RaftPart.cpp:1150] [Port: 9560, Space: 0, Part: 0] Did not get enough votes from election of term 30, isPreVote = 1
I20220221 14:58:36.992424 231434 MetaDaemonInit.cpp:116] Leader has not been elected, sleep 1s
I20220221 14:58:37.478935 231479 RaftPart.cpp:1184] [Port: 9560, Space: 0, Part: 0] Sending out an election request (space = 0, part = 0, term = 30, lastLogId = 11713796, lastLogTerm = 29, candidateIP = 10.202.77.197, candidatePort = 9560), isPreVote = 1
I20220221 14:58:37.482247 231475 RaftPart.cpp:1150] [Port: 9560, Space: 0, Part: 0] Did not get enough votes from election of term 30, isPreVote = 1
I20220221 14:58:37.992512 231434 MetaDaemonInit.cpp:116] Leader has not been elected, sleep 1s
I20220221 14:58:38.212052 231480 RaftPart.cpp:1184] [Port: 9560, Space: 0, Part: 0] Sending out an election request (space = 0, part = 0, term = 30, lastLogId = 11713796, lastLogTerm = 29, candidateIP = 10.202.77.197, candidatePort = 9560), isPreVote = 1
I20220221 14:58:38.212435 231475 RaftPart.cpp:1150] [Port: 9560, Space: 0, Part: 0] Did not get enough votes from election of term 30, isPreVote = 1
I20220221 14:58:38.945796 231475 RaftPart.cpp:1326] [Port: 9560, Space: 0, Part: 0] Received a VOTING request: space = 0, partition = 0, candidateAddr = 10.202.77.198:9560, term = 31, lastLogId = 11713797, lastLogTerm = 30, isPreVote = 1
I20220221 14:58:38.945832 231475 RaftPart.cpp:1355] [Port: 9560, Space: 0, Part: 0] The partition currently is a Candidate, lastLogId 11713796, lastLogTerm 29, committedLogId 11713796, term 29
I20220221 14:58:38.945843 231475 RaftPart.cpp:1434] [Port: 9560, Space: 0, Part: 0] The partition will vote for the candidate "10.202.77.198":9560, isPreVote = 1
I20220221 14:58:38.946451 231475 RaftPart.cpp:1326] [Port: 9560, Space: 0, Part: 0] Received a VOTING request: space = 0, partition = 0, candidateAddr = 10.202.77.198:9560, term = 31, lastLogId = 11713797, lastLogTerm = 30, isPreVote = 0
I20220221 14:58:38.946470 231475 RaftPart.cpp:1355] [Port: 9560, Space: 0, Part: 0] The partition currently is a Follower, lastLogId 11713796, lastLogTerm 29, committedLogId 11713796, term 30
I20220221 14:58:38.946481 231475 RaftPart.cpp:1434] [Port: 9560, Space: 0, Part: 0] The partition will vote for the candidate "10.202.77.198":9560, isPreVote = 0
I20220221 14:58:38.947038 231492 RaftPart.cpp:1683] [Port: 9560, Space: 0, Part: 0] The current role is Follower. Will follow the new leader "10.202.77.198":9560 on term 31
I20220221 14:58:38.947089 231482 Part.cpp:206] [Port: 9560, Space: 0, Part: 0] Find the new leader "10.202.77.198":9560
I20220221 14:58:38.992688 231434 MetaDaemonInit.cpp:143] Get meta version is 3
I20220221 14:58:38.992707 231434 MetaDaemonInit.cpp:160] Nebula store init succeeded, clusterId 703487425170647513
I20220221 14:58:38.992727 231434 MetaDaemon.cpp:151] Start http service
I20220221 14:58:38.992952 231434 MetaDaemonInit.cpp:168] Starting Meta HTTP Service
I20220221 14:58:38.994654 247755 WebService.cpp:128] Web service started on HTTP[19559], HTTP2[19560]
I20220221 14:58:38.994704 231434 JobManager.cpp:73] Not leader, skip reading remaining jobs
I20220221 14:58:38.994771 231434 JobManager.cpp:61] JobManager initialized
I20220221 14:58:38.994776 247761 JobManager.cpp:116] JobManager::runJobBackground() enter
E20220221 14:58:38.994954 231434 MetaDaemon.cpp:202] Invalid timezone file `share/resources/date_time_zonespec.csv', exception: `Unable to locate or access the required datafile. Filespec: share/resources/date_time_zonespec.csv'.
I20220221 14:58:38.995843 231434 JobManager.cpp:104] JobManager::shutDown() begin
I20220221 14:58:38.999858 247761 JobManager.cpp:121] [JobManager] detect shutdown called, exit
E20220221 14:58:38.999903 247761 JobDescription.cpp:181] Loading Job Description FailedE_LEADER_CHANGED
E20220221 14:58:38.999938 247761 JobManager.cpp:129] [JobManager] load an invalid job from queue 0
I20220221 14:58:38.999986 231434 JobManager.cpp:112] JobManager::shutDown() end

A file is clearly missing.
Copy share/resources/date_time_zonespec.csv from the 3.0 package into the corresponding directory under your installation directory.
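
The copy step described above could be sketched as a small shell helper. This is only an illustration: the function name sync_resources is made up, and the two paths in the usage comment are taken from the commands earlier in this thread, so adjust them to your own layout.

```shell
# Hypothetical helper (not part of Nebula Graph): copy share/resources from
# the unpacked 3.0.0 package into the install directory, then confirm that
# the file metad reported as missing is present.
sync_resources() {
    pkg_dir=$1        # unpacked package, e.g. nebula-graph-3.0.0.el7.x86_64
    install_dir=$2    # install root, e.g. /usr/local/nebula
    src="$pkg_dir/share/resources"
    dst="$install_dir/share/resources"
    if [ ! -d "$src" ]; then
        echo "missing source directory: $src" >&2
        return 1
    fi
    mkdir -p "$dst"
    # "src/." copies the directory contents, including any other resource files
    cp -r "$src"/. "$dst"/
    # Succeed only if the timezone file named in the metad error now exists
    [ -f "$dst/date_time_zonespec.csv" ]
}

# Usage with the paths from this thread (uncomment to run):
# sync_resources nebula-graph-3.0.0.el7.x86_64 /usr/local/nebula
```

Copying the whole resources directory rather than the single file is a choice made here so that other files shipped in that directory are not missed.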


FYI, only versions after 2.0 GA can be upgraded to 3.0.

Huh? I have now copied the files from the share/resources/ directory over, and metad started successfully. Then I ran:
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader \
--src_db_path=/data/nebula/storaged \
--dst_db_path=/data/nebula/storaged_backup \
--upgrade_meta_server=10.202.77.197:9559,10.202.77.198:9559,10.202.77.199:9559 \
--upgrade_version=2:3
The end of the log is below. Does this mean it failed?

; creation time=0; time stamp of earliest key=0; file creation time=0; slow compression estimated data size=0; fast compression estimated data size=0; DB identity=SST Writer; DB session identity=G7I2X1NKNQCSZ7P15C02; DB host id=cnsz22pl0272; original file number=1; unique ID=6D37C31C3D1572F5-885ED2E1CA4DCBBD-9084FFBD4EA5E502; 
I20220221 15:28:40.337662 281523 DbUpgrader.cpp:1081] Copy space id 133 wal file begin
I20220221 15:28:40.548403 281523 DbUpgrader.cpp:1228] Copy space id 133 wal file success
I20220221 15:28:40.548442 281523 DbUpgrader.cpp:1072] Path /data/nebula/storaged_backup space id 133 compaction begin
E20220221 15:28:41.543736 281518 RocksEngine.cpp:426] Ingest Failed: Invalid argument: external_files[0] is empty
F20220221 15:28:41.543795 281518 DbUpgrader.cpp:996] Faild upgrade 2:3 when ingest sst file:-8000
*** Check failure stack trace: ***
*** Aborted at 1645428522 (Unix time, try 'date -d @1645428522') ***
*** Signal 6 (SIGABRT) (0x449aa) received by PID 281002 (pthread TID 0x7f8c679ff700) (linux TID 281518) (maybe from PID 281002, UID 0) (code: -6), stack trace: ***
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly10symbolizer17getStackTraceSafeEPmm+0x31)[0x24ca151]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x26)[0x24c1656]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader[0x24bf5e7]
/lib64/libpthread.so.0(+0xf5df)[0x7f8c7da795df]
/lib64/libc.so.6(gsignal+0x37)[0x7f8c7d6dc1f7]
/lib64/libc.so.6(abort+0x147)[0x7f8c7d6dd8e7]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader[0x24f2e59]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader[0x24f5f53]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader[0x24f2928]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader[0x24f6608]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN6nebula7storage13UpgraderSpace11doProcessV3Ev+0x317)[0x1101f37]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN6nebula7storage10DbUpgrader7doSpaceEv+0x191)[0x11099f1]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly18ThreadPoolExecutor7runTaskERKSt10shared_ptrINS0_6ThreadEEONS0_4TaskE+0x137)[0x24098f7]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader[0x23fc10e]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly9EventBase20FunctionLoopCallback15runLoopCallbackEv+0xa)[0x2473d6a]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly9EventBase16runLoopCallbacksERN5boost9intrusive4listINS0_12LoopCallbackEJNS2_18constant_time_sizeILb0EEEEEE+0x109)[0x246c959]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly9EventBase16runLoopCallbacksEv+0x86)[0x246c9e6]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly9EventBase8loopBodyEib+0x1d2)[0x246ec92]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly9EventBase4loopEv+0x3d)[0x246f81d]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly9EventBase11loopForeverEv+0x17)[0x2472457]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly20IOThreadPoolExecutor9threadRunESt10shared_ptrINS_18ThreadPoolExecutor6ThreadEE+0x338)[0x23fcaf8]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader(_ZN5folly6detail8function14FunctionTraitsIFvvEE7callBigISt5_BindIFMNS_18ThreadPoolExecutorEFvSt10shared_ptrINS7_6ThreadEEEPS7_SA_EEEEvRNS1_4DataE+0x46)[0x240b066]
./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader[0x2a1fe9f]
/lib64/libpthread.so.0(+0x7e24)[0x7f8c7da71e24]
/lib64/libc.so.6(clone+0x6c)[0x7f8c7d79f34c]
(safe mode, symbolizer not available)
Aborted
You have mail in /var/spool/mail/root
[root@cnsz22pl0272 nebula]# 
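
Whether a run like the one above failed can be read from the glog severity prefix: per the log header earlier in this thread, each line starts with one of I, W, E or F followed by the date, and an F (fatal) line is what precedes the abort. A minimal sketch for filtering such lines, assuming the tool's output was saved to a file; the function names and the file name upgrade.log are hypothetical.

```shell
# Print glog lines of severity E (error) or F (fatal) from a saved log file.
# Relies on the line format from the log header:
#   [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
glog_problems() {
    grep -E '^[EF][0-9]{8} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6} ' "$1"
}

# Succeed only if the log contains at least one fatal (F) line.
glog_has_fatal() {
    grep -Eq '^F[0-9]{8} ' "$1"
}

# Usage (uncomment to run); upgrade.log is an assumed capture of the output:
# ./nebula-graph-3.0.0.el7.x86_64/bin/db_upgrader ... 2>&1 | tee upgrade.log
# glog_problems upgrade.log
# glog_has_fatal upgrade.log && echo "upgrade aborted with a fatal error"
```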


Your documentation says that 2.0.0 and later 2.x versions can be upgraded to 3.0.0. I am now halfway through the upgrade, so what do I do? :joy: :joy: :joy:

He's not wrong; 2.0.1 was released after GA.

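What I mean is that the documentation should probably be revised. 2.0.1 is a version after 2.0.0, right? So according to the documentation it can be upgraded directly to 3.0.0.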

https://docs.nebula-graph.com.cn/2.5.1/4.deployment-and-installation/3.upgrade-nebula-graph/upgrade-nebula-from-200-to-250/


Could I also follow this document, upgrading 2.0.1 to 2.5.1 first and then to 3.0.0? The document says it applies to v2.0.x.

Yes, 2.0.1 can be upgraded directly to 3.0.

But the log shows that it failed...

Can you check whether one of your spaces has no tag?
I found the cause and will fix it right away.
Thanks.


v2.x by default means releases after 2.0 GA :joy:. I just went over the release order with pandasheep. I had mistaken v2.0.1 for a v2.0-alpha release, which is why I gave you wrong information.

PR: https://github.com/vesoft-inc/nebula/pull/3920


The PR has been submitted, but I am stuck halfway through the upgrade. Is rolling back my only option now?

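No need. You can work around the problem without this PR.
In the space that hits the problem, create a tag and insert any single record with that tag, and that gets you past it.
Then continue the upgrade.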
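
The workaround above might look like the following nGQL, wrapped in shell helpers so the statements can be passed to nebula-console with -e. Everything named here is an assumption for illustration: the tag upgrade_placeholder, its property note, the vertex ID placeholder_vid, and the space name in the usage comment. The INSERT also assumes the space uses a FIXED_STRING vid type long enough for that ID; a space with INT64 vids would need an integer ID instead.

```shell
# Hypothetical helpers that print the nGQL for the workaround.
# $1 is the name of the space that has no tag.
create_tag_ngql() {
    printf 'USE %s; CREATE TAG IF NOT EXISTS upgrade_placeholder(note string);\n' "$1"
}

insert_vertex_ngql() {
    # Assumed vid type: FIXED_STRING. Use an integer ID for INT64 spaces.
    printf 'USE %s; INSERT VERTEX upgrade_placeholder(note) VALUES "placeholder_vid":("x");\n' "$1"
}

# Usage (uncomment and fill in your own address, password and space name):
# nebula-console -addr <graphd-host> -port <graphd-port> -u root -p <password> \
#     -e "$(create_tag_ngql my_space)"
# Wait for the new schema to take effect before inserting, then:
# nebula-console -addr <graphd-host> -port <graphd-port> -u root -p <password> \
#     -e "$(insert_vertex_ngql my_space)"
```

The tag creation and the insert are kept as separate calls because a newly created tag is not usable immediately after CREATE TAG returns.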


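Two remaining issues:
1. After the upgrade, the root password was reset to the default password nebula.
2. Because I followed the installation documentation and upgraded by replacing the bin directory, the version reported by rpm is still 2.5.1. Will this cause any problems later?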

Regarding your two questions:

1. This is probably a bug.
2. It probably has no impact.

I have filed an issue for the first problem; see https://github.com/vesoft-inc/nebula/issues/3928

Had you changed the root password?

On my side, I upgraded Nebula from v2.0.1 to v3.0 and could not reproduce this problem.

In v2.0.1, I changed the password of the root account:

(root@nebula) [(none)]> CHANGE PASSWORD root FROM 'nebula' TO 'nebula12'
Execution succeeded (time spent 3583/4260 us)

Wed, 02 Mar 2022 15:35:50 CST

Then, on v3.0, connecting as root with the password nebula fails:

> ./nebula-console -addr=192.168.8.5  --port=1356 --u=root --p=nebula
2022/03/02 15:39:13 Fail to create a new session from connection pool, fail to authenticate, error: Invalid password
panic: Fail to create a new session from connection pool, fail to authenticate, error: Invalid password

Connecting with nebula12 as the password succeeds:

./nebula-console -addr=192.168.8.5  --port=1356 --u=root --p=nebula12

Welcome to Nebula Graph!