After the server went down unexpectedly, restarting the Nebula service fails with: address is invalid, lookup graphd on 127.0.0.11:53: no such host

  • Nebula version: V2.0 alpha
  • Deployment: standalone (single machine)

Problem: the server froze while the Nebula service was running. After rebooting, starting the Nebula service fails with:

2021/01/27 09:43:54 Error: address is invalid, lookup graphd on 127.0.0.11:53: no such host

The commands I use to connect (the second one is typed at the container's shell prompt):
```
docker run --rm -ti --network nebula-docker-compose_nebula-net --entrypoint=/bin/sh vesoft/nebula-console:v2-preview-nightly
nebula-console -u root -p nebula --address=graphd --port=3699
```

What is causing this, and how can I fix it?

Did the nebula-graphd service fail to start? Did nebula-metad and nebula-storaged start successfully? Please run /usr/local/nebula/scripts/nebula.service status all

Running
docker run --rm -ti --network nebula-docker-compose_nebula-net --entrypoint=/bin/sh vesoft/nebula-console:v2-preview-nightly
works without any error and drops me into the / # prompt.
Then, when I type nebula-console -u root -p nebula --address=graphd --port=3699 at the / # prompt, I get this error.

There is no nebula folder in my /usr/local. And I couldn't find a scripts folder in these two folders either.

Your Nebula services were started with nebula-docker-compose, so go to the nebula-docker-compose folder, run sudo docker-compose ps, and post a screenshot of the output.

```
Error: address is invalid, lookup graphd on 127.0.0.11:53: no such host
```
What is the 127.0.0.11:53 in the error message? Is it a typo on your side? Have you modified the docker-compose.yml file?

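I really have no idea what it is. Everything was working fine before; I had been experimenting with it the whole time, and then it suddenly broke. I haven't touched docker-compose.yml, and I couldn't find that address in it either.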

That 127.0.0.11:53 ip:port does look strange.
There is a logs folder under the nebula-docker-compose folder. Please post these two files from it: meta0/nebula-metad.INFO and graph0/nebula-graphd.INFO.
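For context: on a user-defined Docker network such as nebula-docker-compose_nebula-net, containers resolve service names through Docker's embedded DNS server, which listens at 127.0.0.11:53. So the error most likely means that, when nebula-console ran, no running container answered to the name graphd on that network (for example, because the compose containers were not brought back up after the reboot). The Python sketch below is only an illustration of the same kind of name lookup, not part of nebula-console; the helper name resolve_service is hypothetical. Run inside a container attached to that network, it would be expected to fail for graphd until the graphd container is running again.

```python
from __future__ import annotations

import socket
import sys


def resolve_service(name: str, port: int = 3699) -> list[str]:
    """Resolve a compose service name the way a client inside the network would.

    Inside a container on a user-defined Docker network, the resolver points at
    Docker's embedded DNS server (127.0.0.11), so this lookup only succeeds while
    a container with that name or alias is running on the same network.
    """
    try:
        infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise LookupError(
            f"cannot resolve {name!r}: is that container running on this network? ({exc})"
        ) from exc
    # Each entry is (family, type, proto, canonname, sockaddr); keep unique IPs in order.
    return list(dict.fromkeys(info[4][0] for info in infos))


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "graphd"
    try:
        print(host, "->", resolve_service(host))
    except LookupError as err:
        print(err)
        sys.exit(1)
```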

nebula-graphd.INFO:
Log file created at: 2021/01/27 07:27:31
Running on machine: 2452200a1c94
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0127 07:27:31.885730     1 GraphDaemon.cpp:105] Starting Graph HTTP Service
I0127 07:27:32.988310     7 WebService.cpp:143] Web service started on HTTP[13000], HTTP2[13002]
I0127 07:27:32.988356     1 GraphDaemon.cpp:119] Number of networking IO threads: 1
I0127 07:27:32.988364     1 GraphDaemon.cpp:128] Number of worker threads: 1
I0127 07:27:32.988641     1 MetaClient.cpp:47] Create meta client to [metad1:45500]
I0127 07:27:33.030090     1 GflagsManager.cpp:138] Prepare to register 5 gflags to meta
I0127 07:27:33.033428    14 ThriftClientManager.inl:49] resolve [metad1:45500] as [172.18.0.9:45500]
I0127 07:27:35.043294    14 ThriftClientManager.inl:49] resolve [metad0:45500] as [172.18.0.10:45500]
E0127 07:27:36.045646    14 MetaClient.cpp:581] Send request to [metad0:45500], exceed retry limit
E0127 07:27:36.176874     1 MetaClient.cpp:60] Heartbeat failed, status:RPC failure in MetaClient: N6apache6thrift9transport19TTransportExceptionE: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
I0127 07:27:36.176913     1 MetaClient.cpp:85] Waiting for the metad to be ready!
I0127 07:27:41.187278    14 ThriftClientManager.inl:49] resolve [metad2:45500] as [172.18.0.6:45500]
I0127 07:27:41.198169     1 MetaClient.cpp:2773] Register gflags ok 5
I0127 07:27:41.198639     1 MetaClient.cpp:96] Register time task for heartbeat!
I0127 07:27:41.198678     1 GraphDaemon.cpp:160] Starting nebula-graphd on 0.0.0.0:3699
I0127 07:27:49.876113    22 GraphService.cpp:32] Authenticating user root from 172.18.0.11:45586
E0127 07:27:53.495205    22 QueryInstance.cpp:120] SyntaxError: syntax error near `ls'
I0127 07:28:03.494340    22 SwitchSpaceExecutor.cpp:43] Graph switched to `school', space id: 40
E0127 07:29:18.724325    22 QueryInstance.cpp:120] SyntaxError: syntax error near `" over s'
I0127 07:29:51.426307    22 MetaClient.cpp:2986] Load leader of [storaged0:44500] in 2 space
I0127 07:29:51.426339    22 MetaClient.cpp:2986] Load leader of [storaged1:44500] in 3 space
I0127 07:29:51.426398    22 MetaClient.cpp:2986] Load leader of [storaged2:44500] in 2 space
I0127 07:29:51.426407    22 MetaClient.cpp:2989] Load leader ok
I0127 07:29:51.427060    14 ThriftClientManager.inl:49] resolve [storaged1:44500] as [172.18.0.5:44500]
I0127 07:31:13.279439    14 ThriftClientManager.inl:49] resolve [storaged2:44500] as [172.18.0.2:44500]
E0127 08:03:44.103408    22 QueryInstance.cpp:120] SemanticError: student not found in space [school].
E0127 08:05:56.007315    22 QueryInstance.cpp:120] SyntaxError: syntax error near `on'
I0127 08:06:13.687487    14 ThriftClientManager.inl:49] resolve [storaged0:44500] as [172.18.0.7:44500]
E0127 08:07:55.634797    22 QueryInstance.cpp:120] SyntaxError: syntax error near `order'
E0127 08:12:34.635246    22 QueryInstance.cpp:120] SyntaxError: syntax error near `ORDER'
I0127 09:17:20.660774     1 GraphDaemon.cpp:187] Signal 15(Terminated) received, stopping this server
I0127 09:17:20.661494    14 Acceptor.cpp:453] Dropping all connections from Acceptor=0x7f44d7ff8100 in thread 0x7f44d7ff0280
I0127 09:17:20.662284     1 GraphDaemon.cpp:168] nebula-graphd on 0.0.0.0:3699 has been stopped
nebula-metad.INFO:

Log file created at: 2021/01/27 07:27:33
Running on machine: 186f333bea0c
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0127 07:27:33.579901     1 MetaDaemon.cpp:212] identify myself as [metad0:45500]
I0127 07:27:33.598197     1 NebulaStore.cpp:47] Start the raft service...
I0127 07:27:33.598701     1 RaftexService.cpp:65] Init thrift server for raft service, port: 45501
I0127 07:27:33.599489    51 RaftexService.cpp:99] Starting the Raftex Service
I0127 07:27:33.604204    51 RaftexService.cpp:87] Starting the Raftex Service on 45501
I0127 07:27:33.604231    51 RaftexService.cpp:111] Start the Raftex Service successfully
I0127 07:27:33.604351     1 NebulaStore.cpp:75] Scan the local path, and init the spaces_
I0127 07:27:33.604423     1 NebulaStore.cpp:81] Scan path "/data/meta/0"
I0127 07:27:34.900910     1 RocksEngine.cpp:105] open rocksdb on /data/meta/nebula/0/data
I0127 07:27:34.900933     1 NebulaStore.cpp:111] Load space 0 from disk
I0127 07:27:34.900996     1 NebulaStore.cpp:146] Need to open 1 parts of space 0
I0127 07:27:34.901129    69 EventListner.h:18] Rocksdb start compaction column family: default because of LevelL0FilesNum, status: OK, compacted 5 files into 0, base level is 0, output level is 1
I0127 07:27:35.226078    69 EventListner.h:30] Rocksdb compaction completed column family: default because of LevelL0FilesNum, status: OK, compacted 5 files into 1, base level is 0, output level is 1
I0127 07:27:35.708910    47 FileBasedWal.cpp:65] [Port: 45501, Space: 0, Part: 0] lastLogId in wal is 239366, lastLogTerm is 16, path is /data/meta/nebula/0/wal/0/0000000000000152823.wal
I0127 07:27:35.709025    47 RaftPart.cpp:295] [Port: 45501, Space: 0, Part: 0] There are 2 peer hosts, and total 3 copies. The quorum is 2, as learner 0, lastLogId 239366, lastLogTerm 16, committedLogId 239365, term 16
I0127 07:27:35.709038    47 RaftPart.cpp:308] [Port: 45501, Space: 0, Part: 0] Add peer [metad1:45501]
I0127 07:27:35.709055    47 RaftPart.cpp:308] [Port: 45501, Space: 0, Part: 0] Add peer [metad2:45501]
I0127 07:27:35.709132    47 NebulaStore.cpp:177] Load part 0, 0 from disk
I0127 07:27:35.709182     1 NebulaStore.cpp:192] Load space 0 complete
I0127 07:27:35.709192     1 NebulaStore.cpp:201] Init data from partManager for [metad0:45500]
I0127 07:27:35.709204     1 NebulaStore.cpp:293] Data space 0 has existed!
I0127 07:27:35.709213     1 NebulaStore.cpp:322] [Space: 0, Part: 0] has existed!
I0127 07:27:35.709228     1 NebulaStore.cpp:68] Register handler...
I0127 07:27:35.709236     1 MetaDaemon.cpp:97] Waiting for the leader elected...
I0127 07:27:35.709244     1 MetaDaemon.cpp:110] Leader has not been elected, sleep 1s
I0127 07:27:36.710095     1 MetaDaemon.cpp:110] Leader has not been elected, sleep 1s
I0127 07:27:37.710840     1 MetaDaemon.cpp:110] Leader has not been elected, sleep 1s
I0127 07:27:38.710980     1 MetaDaemon.cpp:110] Leader has not been elected, sleep 1s
I0127 07:27:38.865180    21 RaftPart.cpp:1360] [Port: 45501, Space: 0, Part: 0] Recieved a VOTING request: space = 0, partition = 0, candidateAddr = metad2:45501, term = 17, lastLogId = 239366, lastLogTerm = 16
I0127 07:27:38.865200    21 RaftPart.cpp:1393] [Port: 45501, Space: 0, Part: 0] The partition currently is a Follower, lastLogId 239366, lastLogTerm 16, committedLogId 239365, term 16
I0127 07:27:38.865211    21 RaftPart.cpp:1458] [Port: 45501, Space: 0, Part: 0] The partition will vote for the candidate
I0127 07:27:38.873078    21 RaftPart.cpp:1747] [Port: 45501, Space: 0, Part: 0] The current role is Follower. Will follow the new leader metad2:45501 [Term: 17]
I0127 07:27:38.875741    48 Part.cpp:191] [Port: 45501, Space: 0, Part: 0] Find the new leader [metad2:45501]
I0127 07:27:39.441576    20 RaftPart.cpp:1606] [Port: 45501, Space: 0, Part: 0] The local last log term is 17, which is different from the leader's prevLogTerm 16, the prevLogId is 239366. So need to rollback to last committedLogId_ 239366
I0127 07:27:40.033298    20 FileBasedWal.cpp:400] [Port: 45501, Space: 0, Part: 0] Rollback to log 239366
I0127 07:27:40.033341    20 FileBasedWal.cpp:404] [Port: 45501, Space: 0, Part: 0] Need to truncate from offset 9519630
I0127 07:27:40.033450    20 RaftPart.cpp:1616] [Port: 45501, Space: 0, Part: 0] Rollback succeeded! lastLogId is 239366, logLogTerm is 16, committedLogId is 239366, term is 17
I0127 07:27:40.033601     1 MetaDaemon.cpp:142] Nebula store init succeeded, clusterId 1922164935461597273
I0127 07:27:40.033618     1 MetaDaemon.cpp:225] Start http service
I0127 07:27:40.033859     1 MetaDaemon.cpp:150] Starting Meta HTTP Service
I0127 07:27:40.046290    93 WebService.cpp:143] Web service started on HTTP[11000], HTTP2[19560]
I0127 07:27:40.047116     1 JobManager.cpp:59] JobManager initialized
I0127 07:27:40.050393     1 StatsManager.cpp:94] registerHisto, bucketSize: 1000, min: 1, max: 1000000
I0127 07:27:40.050423     1 MetaDaemon.cpp:274] The meta deamon start on [metad0:45500]
I0127 07:27:40.050849   109 JobManager.cpp:82] JobManager::runJobBackground() enter
I0127 07:27:40.063616   116 HBProcessor.cpp:29] Receive heartbeat from [0.0.0.0:3699], role = GRAPH
E0127 07:27:40.063673   116 RaftPart.cpp:367] [Port: 45501, Space: 0, Part: 0] The partition is not a leader
E0127 07:27:40.063982   116 RaftPart.cpp:687] [Port: 45501, Space: 0, Part: 0] Cannot append logs, clean the buffer
I0127 07:27:41.009132   116 HBProcessor.cpp:29] Receive heartbeat from [storaged1:44500], role = STORAGE
I0127 07:37:35.709700    49 FileBasedWal.cpp:738] [Port: 45501, Space: 0, Part: 0] Clean wals number 1
I0127 09:17:20.695655     1 MetaDaemon.cpp:305] Signal 15(Terminated) received, stopping this server
I0127 09:17:20.695700     1 JobManager.cpp:68] JobManager::shutDown() begin
I0127 09:17:20.696671   109 JobManager.cpp:87] [JobManager] detect shutdown called, exit
E0127 09:17:20.696713   109 JobDescription.cpp:185] Loading Job Description Failed
E0127 09:17:20.696734   109 JobManager.cpp:95] [JobManager] load a invalid job from queue 0
I0127 09:17:20.696749   109 JobManager.cpp:117] [JobManager] exit
I0127 09:17:20.698361     1 JobManager.cpp:78] JobManager::shutDown() end
I0127 09:17:20.698475     1 NebulaStore.cpp:33] Cut off the relationship with meta client
I0127 09:17:20.698490     1 NebulaStore.cpp:35] Stop the raft service...
I0127 09:17:20.698501     1 RaftexService.cpp:134] Stopping the raftex service on port 45501
I0127 09:17:20.698514     1 Host.cpp:48] [Port: 45501, Space: 0, Part: 0] [Host: metad1:45501] The host has been stopped!
I0127 09:17:20.698525     1 Host.cpp:48] [Port: 45501, Space: 0, Part: 0] [Host: metad2:45501] The host has been stopped!
I0127 09:17:20.698536     1 Host.h:32] [Port: 45501, Space: 0, Part: 0] [Host: metad1:45501]  The host has been destroyed!
I0127 09:17:20.698547     1 Host.h:32] [Port: 45501, Space: 0, Part: 0] [Host: metad2:45501]  The host has been destroyed!
I0127 09:17:20.698559     1 RaftPart.cpp:352] [Port: 45501, Space: 0, Part: 0] Partition has been stopped
I0127 09:17:20.698570     1 RaftexService.cpp:141] All partitions have stopped
I0127 09:17:20.698585     1 NebulaStore.cpp:37] Waiting for the raft service stop...
I0127 09:17:20.698616    51 RaftexService.cpp:115] The Raftex Service stopped
I0127 09:17:20.706959    52 Acceptor.cpp:453] Dropping all connections from Acceptor=0x7f48d2a41400 in thread 0x7f48cbf62a00
I0127 09:17:20.707123    53 Acceptor.cpp:453] Dropping all connections from Acceptor=0x7f48d2a41700 in thread 0x7f48cbf7da00
I0127 09:17:20.710184    54 Acceptor.cpp:453] Dropping all connections from Acceptor=0x7f48d2a41a00 in thread 0x7f48cbfa8a00
I0127 09:17:20.711678    55 Acceptor.cpp:453] Dropping all connections from Acceptor=0x7f48d2a41d00 in thread 0x7f48ca24ba00
I0127 09:17:20.736415     1 RaftexService.cpp:152] Server thread has stopped. Service on port 45501 is ready to be destroyed
I0127 09:17:20.736440     1 RocksEngine.h:100] Release rocksdb on /data/meta/nebula/0
I0127 09:17:20.737762     1 NebulaStore.h:34] ~SpacePartInfo()
I0127 09:17:20.738435     1 Part.h:38] [Port: 45501, Space: 0, Part: 0] ~Part()
I0127 09:17:20.738456     1 RaftPart.cpp:254] [Port: 45501, Space: 0, Part: 0]  The part has been destroyed...
I0127 09:17:20.763502     1 FileBasedWal.cpp:83] [Port: 45501, Space: 0, Part: 0] ~FileBasedWal, dir = /data/meta/nebula/0/wal/0
I0127 09:17:20.763996     1 NebulaStore.cpp:43] ~NebulaStore()
I0127 09:17:20.774636   119 Acceptor.cpp:453] Dropping all connections from Acceptor=0x7f48c1a36100 in thread 0x7f48b3f88100
I0127 09:17:20.776213     1 MetaDaemon.cpp:287] The meta Daemon stopped
I0127 09:17:20.800695     1 JobManager.cpp:68] JobManager::shutDown() begin
I0127 09:17:20.800781     1 JobManager.cpp:71] JobManager not running, exit
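Logs like the two above are easier to read when only the warning and error lines are pulled out. A minimal Python sketch, assuming the glog-style line format stated in each log header ([IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg); the names problems, LogEntry and GLOG_LINE are made up for this example.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Matches the format from the log header:
# [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<src>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    level: str  # one of I, W, E, F
    time: str   # "mmdd hh:mm:ss.uuuuuu"
    src: str    # "file:line"
    msg: str


def problems(lines: Iterable[str], levels: str = "WEF") -> Iterator[LogEntry]:
    """Yield parsed entries whose severity letter is in `levels`.

    Header lines ("Log file created at: ...") do not match and are skipped.
    """
    for raw in lines:
        m = GLOG_LINE.match(raw.rstrip("\n"))
        if m and m["level"] in levels:
            yield LogEntry(
                m["level"],
                f'{m["mmdd"]} {m["time"]}',
                f'{m["src"]}:{m["line"]}',
                m["msg"],
            )


if __name__ == "__main__":
    # Usage: python filter_log.py <path to a .INFO file under logs/>
    with open(sys.argv[1], encoding="utf-8") as f:
        for entry in problems(f):
            print(entry.level, entry.time, entry.src, entry.msg)
```

Pointed at the graphd log above, for instance, this would surface the early "Heartbeat failed ... Connection refused" line from the period when metad was still electing a leader, plus the later SyntaxError/SemanticError lines from queries.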

There is no graph0 under my logs folder, only graph, graph1 and graph2.

I ran docker-compose start once and then checked with docker-compose ps.

But all the storaged containers are unhealthy. When I ran the command again I could get in, but does that mean everything is back to normal?

The storage service is not healthy. You can connect to the graph service, but INSERT statements or queries will still fail. Please also post nebula-storaged.INFO from the logs folder.
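To list which containers Docker currently reports as unhealthy (what docker-compose ps shows as "unhealthy"), the healthcheck status can be read with docker inspect. A rough Python sketch, assuming it is run from the nebula-docker-compose folder with docker-compose v1 and docker on the PATH; compose_health, parse_health, not_healthy and HEALTH_FMT are hypothetical names for this example.

```python
from __future__ import annotations

import subprocess

# Go template for `docker inspect`; containers without a HEALTHCHECK print "none".
HEALTH_FMT = "{{.Name}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}"


def parse_health(text: str) -> dict[str, str]:
    """Parse lines of the form '/container_name status' produced by HEALTH_FMT."""
    result: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, status = line.rsplit(" ", 1)
        result[name.lstrip("/")] = status  # docker prefixes names with "/"
    return result


def not_healthy(health: dict[str, str]) -> list[str]:
    """Containers that have a healthcheck and are not 'healthy' (e.g. 'unhealthy', 'starting')."""
    return sorted(name for name, status in health.items() if status not in ("healthy", "none"))


def compose_health(project_dir: str = ".") -> dict[str, str]:
    """Map every container of the compose project in `project_dir` to its health status."""
    ids = subprocess.run(
        ["docker-compose", "ps", "-q"], cwd=project_dir,
        capture_output=True, text=True, check=True,
    ).stdout.split()
    if not ids:
        return {}
    out = subprocess.run(
        ["docker", "inspect", "--format", HEALTH_FMT, *ids],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_health(out)


if __name__ == "__main__":
    print(not_healthy(compose_health(".")))
```

An empty list would mean every container with a healthcheck reports healthy; any storaged entries still listed are the ones whose nebula-storaged.INFO is worth checking.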