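NebulaGraph 2.5.1 with spark-exchange 2.5.1: importing data brings down all storaged and graphd processes, and they crash again immediately after a restart

There is a core dump.

It is too large to upload to the forum.

zhengshuai1031: add me on WeChat and I'll send it to you.

OK, :thinking: we'll take a look at the file and analyze it shortly.

Please post the graphd configuration parameters.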

nebula-graphd.conf (3.1 KB)


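This is caused by too many sessions. Change the parameters on every graphd as follows, then restart all services, including meta, wait 10 minutes, and see whether they come up.

OK.

Also, could you grab the nebula-storaged executable from the bin directory and post it?

No luck. Even after changing the parameters, it still crashes on startup.

Should I restart meta, wait 10 minutes, and then start storage?

Check whether the meta logs still show that many sessions, and wait for them to be released before starting. Also, could you send that executable?

I've sent the file to 清蒸.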

There are no session logs in meta.
meta1------------
I0318 15:49:04.346629 107605 Part.cpp:192] [Port: 9560, Space: 0, Part: 0] Find the new leader "10.130.166.243":9560
I0318 15:49:04.346645 107602 RaftPart.cpp:1609] [Port: 9560, Space: 0, Part: 0] Local is missing logs from id 105809. Need to catch up
I0318 15:49:04.726763 107560 MetaDaemon.cpp:137] Get meta version is 2
I0318 15:49:04.726794 107560 MetaDaemon.cpp:164] Nebula store init succeeded, clusterId 6992457631440687600
I0318 15:49:04.726800 107560 MetaDaemon.cpp:255] Start http service
I0318 15:49:04.727010 107560 MetaDaemon.cpp:172] Starting Meta HTTP Service
I0318 15:49:04.729251 107708 WebService.cpp:131] Web service started on HTTP[19559], HTTP2[19560]
I0318 15:49:04.729359 107560 JobManager.cpp:53] JobManager initialized
I0318 15:49:04.729507 107714 JobManager.cpp:76] JobManager::runJobBackground() enter
I0318 15:49:04.729625 107560 MetaDaemon.cpp:312] The meta deamon start on "10.130.166.241":9559

Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0318 15:49:03.616283 107606 RaftPart.cpp:1118] [Port: 9560, Space: 0, Part: 0] Receive response about askForVote from "10.130.166.242":9560, error code is E_TERM_OUT_OF_DATE
E0318 15:49:03.616497 107606 RaftPart.cpp:1118] [Port: 9560, Space: 0, Part: 0] Receive response about askForVote from "10.130.166.243":9560, error code is E_TERM_OUT_OF_DATE
E0318 15:49:04.246285 107607 RaftPart.cpp:1118] [Port: 9560, Space: 0, Part: 0] Receive response about askForVote from "10.130.166.243":9560, error code is E_TERM_OUT_OF_DATE
E0318 15:49:04.246354 107607 RaftPart.cpp:1118] [Port: 9560, Space: 0, Part: 0] Receive response about askForVote from "10.130.166.242":9560, error code is E_TERM_OUT_OF_DATE
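
For anyone sifting through these files, here is a minimal Python sketch that splits a line according to the "Log line format" header shown above. The names `GLOG_LINE`, `LogRecord`, and `parse_glog_line` are made up for illustration and are not part of Nebula; the regular expression is only an assumption based on that header.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for "[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg".
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<msg>.*)$"
)


class LogRecord(NamedTuple):
    level: str   # one of I, W, E, F
    month: int
    day: int
    time: str    # hh:mm:ss.uuuuuu, kept as text
    thread: int
    file: str
    line: int
    msg: str


def parse_glog_line(text: str) -> Optional[LogRecord]:
    """Return a LogRecord, or None if the line does not match the format."""
    m = GLOG_LINE.match(text.strip())
    if m is None:
        return None
    return LogRecord(
        level=m["level"],
        month=int(m["month"]),
        day=int(m["day"]),
        time=m["time"],
        thread=int(m["thread"]),
        file=m["file"],
        line=int(m["line"]),
        msg=m["msg"],
    )
```

Filtering on `record.level == "E"` would then pull out only the error lines, such as the `E_TERM_OUT_OF_DATE` responses above.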

meta2------------

I0318 15:48:58.616722 96958 WebService.cpp:131] Web service started on HTTP[19559], HTTP2[19560]
I0318 15:48:58.616813 96868 JobManager.cpp:53] JobManager initialized
I0318 15:48:58.616945 96964 JobManager.cpp:76] JobManager::runJobBackground() enter
I0318 15:48:58.617060 96868 MetaDaemon.cpp:312] The meta deamon start on "10.130.166.242":9559
I0318 15:49:03.616575 96910 RaftPart.cpp:1337] [Port: 9560, Space: 0, Part: 0] Recieved a VOTING request: space = 0, partition = 0, candidateAddr = 10.130.166.241:9560, term = 7, lastLogId= 105809, lastLogTerm = 6
I0318 15:49:03.616621 96910 RaftPart.cpp:1370] [Port: 9560, Space: 0, Part: 0] The partition currently is a Follower, lastLogId 105818, lastLogTerm 10, committedLogId 105817, term 10
I0318 15:49:03.616628 96910 RaftPart.cpp:1384] [Port: 9560, Space: 0, Part: 0] The partition currently is on term 10. The term proposed by the candidate is no greater, so it will be rejected
I0318 15:49:04.246637 96910 RaftPart.cpp:1337] [Port: 9560, Space: 0, Part: 0] Recieved a VOTING request: space = 0, partition = 0, candidateAddr = 10.130.166.241:9560, term = 8, lastLogId= 105809, lastLogTerm = 6
I0318 15:49:04.246672 96910 RaftPart.cpp:1370] [Port: 9560, Space: 0, Part: 0] The partition currently is a Follower, lastLogId 105818, lastLogTerm 10, committedLogId 105817, term 10
I0318 15:49:04.246680 96910 RaftPart.cpp:1384] [Port: 9560, Space: 0, Part: 0] The partition currently is on term 10. The term proposed by the candidate is no greater, so it will be rejected
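
To make the rejections above easier to read: 10.130.166.241 asks for votes with term 7 and then term 8, while this node is already on term 10. Below is a simplified Python sketch of that kind of check. It is an assumption modeled on the log messages and on the general Raft voting rule, not Nebula's actual code in RaftPart.cpp; `VoteRequest`, `LocalState`, and `should_grant_vote` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VoteRequest:
    term: int
    last_log_id: int
    last_log_term: int


@dataclass
class LocalState:
    term: int
    last_log_id: int
    last_log_term: int


def should_grant_vote(req: VoteRequest, local: LocalState) -> Tuple[bool, str]:
    """Simplified vote check (assumption, not the real implementation)."""
    # Mirrors "The term proposed by the candidate is no greater".
    if req.term <= local.term:
        return False, "candidate term is not greater than local term"
    # General Raft rule: the candidate's log must be at least as up to date.
    if (req.last_log_term, req.last_log_id) < (local.last_log_term, local.last_log_id):
        return False, "candidate log is behind local log"
    return True, "vote granted"


# Values taken from the meta2 log lines above.
local = LocalState(term=10, last_log_id=105818, last_log_term=10)
first_attempt = should_grant_vote(VoteRequest(7, 105809, 6), local)
second_attempt = should_grant_vote(VoteRequest(8, 105809, 6), local)
```

Both attempts are rejected on the term check alone, which matches the `E_TERM_OUT_OF_DATE` responses seen on meta1.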

[root@10-130-166-242.okg-db-graphd.cp01 metad]# tail -200f nebula-metad.ERROR
Log file created at: 2022/03/18 15:48:32
Running on machine: 10-130-166-242.okg-db-graphd.cp01
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0318 15:48:32.581411 94466 JobDescription.cpp:188] Loading Job Description FailedE_KEY_NOT_FOUND
E0318 15:48:32.581638 94466 JobManager.cpp:89] [JobManager] load an invalid job from queue 0

meta3---------------
I0318 15:48:57.675580 95178 Host.cpp:141] [Port: 9560, Space: 0, Part: 0] [Host: 10.130.166.241:9560] This is the first time to send the logs to this host, lastLogIdSent = 105813, lastLogTermSent = 6
I0318 15:48:58.039508 95122 MetaDaemon.cpp:137] Get meta version is 2
I0318 15:48:58.042169 95164 MetaVersionMan.cpp:66] Write meta version 2 succeeds
I0318 15:48:58.042198 95122 MetaDaemon.cpp:164] Nebula store init succeeded, clusterId 6992457631440687600
I0318 15:48:58.042214 95122 MetaDaemon.cpp:255] Start http service
I0318 15:48:58.042418 95122 MetaDaemon.cpp:172] Starting Meta HTTP Service
I0318 15:48:58.044404 95270 WebService.cpp:131] Web service started on HTTP[19559], HTTP2[19560]
I0318 15:48:58.044471 95122 JobManager.cpp:53] JobManager initialized
I0318 15:48:58.044486 95122 MetaDaemon.cpp:286] Check and init root user
I0318 15:48:58.044539 95122 RootUserMan.h:28] Root user exists
I0318 15:48:58.044562 95276 JobManager.cpp:76] JobManager::runJobBackground() enter
I0318 15:48:58.044721 95122 MetaDaemon.cpp:312] The meta deamon start on "10.130.166.243":9559
E0318 15:49:02.679538 95185 Host.cpp:375] [Port: 9560, Space: 0, Part: 0] [Host: 10.130.166.241:9560] Failed to append logs to the host (Err: E_UNKNOWN_PART)
I0318 15:49:03.616523 95164 RaftPart.cpp:1337] [Port: 9560, Space: 0, Part: 0] Recieved a VOTING request: space = 0, partition = 0, candidateAddr = 10.130.166.241:9560, term = 7, lastLogId= 105809, lastLogTerm = 6
I0318 15:49:03.616559 95164 RaftPart.cpp:1370] [Port: 9560, Space: 0, Part: 0] The partition currently is a Leader, lastLogId 105818, lastLogTerm 10, committedLogId 105818, term 10
I0318 15:49:03.616566 95164 RaftPart.cpp:1384] [Port: 9560, Space: 0, Part: 0] The partition currently is on term 10. The term proposed by the candidate is no greater, so it will be rejected
I0318 15:49:04.246374 95164 RaftPart.cpp:1337] [Port: 9560, Space: 0, Part: 0] Recieved a VOTING request: space = 0, partition = 0, candidateAddr = 10.130.166.241:9560, term = 8, lastLogId= 105809, lastLogTerm = 6
I0318 15:49:04.246424 95164 RaftPart.cpp:1370] [Port: 9560, Space: 0, Part: 0] The partition currently is a Leader, lastLogId 105818, lastLogTerm 10, committedLogId 105818, term 10
I0318 15:49:04.246438 95164 RaftPart.cpp:1384] [Port: 9560, Space: 0, Part: 0] The partition currently is on term 10. The term proposed by the candidate is no greater, so it will be rejected
I0318 15:49:04.412986 95164 SlowOpTracker.h:33] [Port: 9560, Space: 0, Part: 0] total time:67ms, Total send logs: 2

Log file created at: 2022/03/18 15:49:02
Running on machine: 10-130-166-243.okg-db-graphd.cp01
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0318 15:49:02.679538 95185 Host.cpp:375] [Port: 9560, Space: 0, Part: 0] [Host: 10.130.166.241:9560] Failed to append logs to the host (Err: E_UNKNOWN_PART)

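Still not working. I waited a long time, but storage still crashes right after it starts.

I noticed that starting only one storage does not crash, but once the second one starts, the first one crashes, and then all of them go down.

Go to the data directory, enter the storage subdirectory, rename space 0, and then start again:
e.g.
cd data/storage/nebula
mv 0 0bak
Then restart all services.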
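
If it helps, the same rename can be done with a small guard so that nothing happens when the directory is missing. This is only a sketch: `list_space_dirs` and `backup_space_dir` are hypothetical helpers, and the path `data/storage/nebula` is taken from the example above and may differ per installation.

```python
from pathlib import Path
from typing import List, Optional


def list_space_dirs(root: Path) -> List[str]:
    """Return the numeric sub-directory names under root, sorted by number."""
    names = [p.name for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(names, key=int)


def backup_space_dir(root: Path, space_id: int = 0) -> Optional[Path]:
    """Rename <root>/<space_id> to <root>/<space_id>bak.

    Returns the new path, or None if the space directory does not exist.
    """
    src = root / str(space_id)
    if not src.is_dir():
        return None
    dst = root / f"{space_id}bak"
    if dst.exists():
        raise FileExistsError(f"{dst} already exists, refusing to overwrite")
    src.rename(dst)
    return dst
```

Calling `backup_space_dir(Path("data/storage/nebula"))` while the services are stopped would perform the `mv 0 0bak` step, and `list_space_dirs` shows which space directories are actually there.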

drwxr-xr-x 4 root root 4096 Mar 18 01:01 1
drwxr-xr-x 4 root root 4096 Mar 18 01:04 10
drwxr-xr-x 4 root root 4096 Mar 18 08:42 136
drwxr-xr-x 4 root root 4096 Mar 18 01:22 73

There is no 0 under storage; meta has a 0.

Raise the log level, try starting it again and check the logs, then post the logs here.
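
One possible way to do that is to edit the glog flags `--minloglevel` and `--v` in the storaged config file. The sketch below rewrites such flags in a config text; `set_flags` is a hypothetical helper, and the values 0 and 3 are assumptions, since the reply does not say which level to use.

```python
import re
from typing import Dict

FLAG_LINE = re.compile(r"^--([A-Za-z0-9_]+)=")


def set_flags(conf_text: str, flags: Dict[str, str]) -> str:
    """Replace existing --name=value lines and append any flags not present."""
    seen = set()
    out = []
    for line in conf_text.splitlines():
        m = FLAG_LINE.match(line.strip())
        if m and m.group(1) in flags:
            name = m.group(1)
            out.append(f"--{name}={flags[name]}")
            seen.add(name)
        else:
            # Comments, blank lines, and unrelated flags are kept unchanged.
            out.append(line)
    for name, value in flags.items():
        if name not in seen:
            out.append(f"--{name}={value}")
    return "\n".join(out) + "\n"


# Assumed values: log everything from INFO up, with more verbose output.
sample = "# logging\n--minloglevel=2\n--v=0\n--port=9779\n"
updated = set_flags(sample, {"minloglevel": "0", "v": "3"})
```

The service has to be restarted after the file is changed for the new flags to take effect.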