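Nebula version: NebulaGraph 2.5.1
Deployment: distributed
Installation method: RPM
Production environment: Y
Hardware
Disk: 2 TB SSD
CPU / memory: 16 cores, 32 GB
Problem description
Relevant meta / storage / graph info logs (preferably as text, so they are searchable)
Importing data with NebulaGraph 2.5.1 and spark-exchange 2.5.1 brings down every storaged and graphd process, and they crash again immediately after a restart.
Which logs should I check? The storage log shows nothing about the sudden crash.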
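- Run demsg -T|grep nebula and check whether there was an OOM.
- Look at nebula-storaged.INFO and nebula-graphd.INFO in the logs directory.
- Roughly how much data are you importing? Also, does Exchange run on the same machines as the Nebula cluster? Please post the import parameters.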
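The first two checks in the list above can be sketched as a small shell helper. This is only an illustration: the function name `oom_lines` is made up for this example, and the log path in the comments assumes the default RPM install directory, which may differ on your hosts.

```shell
# Keep only kernel log lines (read from stdin) that show OOM-killer activity.
oom_lines() {
    grep -i -E 'oom-killer|out of memory|killed process'
}

# Typical use on each storaged/graphd host (dmesg may need root):
#   dmesg -T | oom_lines
# Then look at the end of the most recent service log. The path below
# assumes the default RPM install directory and is only an example:
#   tail -n 200 /usr/local/nebula/logs/nebula-storaged.INFO
```

Filtering on the kernel's own phrases rather than on the process name also catches cases where another process triggered the OOM killer.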
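We used to import many times, every day, without problems, but the last few runs have failed.
1. There is no demsg command.
2. nebula-storaged.10-130-166-241.okg-db-graphd.cp01.root.log.INFO.20220318-103349.68208 (257.6 KB)
3. The data is 85 GB, about 1 billion records. Exchange and Nebula run on separate machines.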
${SPARK_HOME}/bin/spark-submit \
  --queue root.ipd.daily \
  --name "nebula2.5.1-import-$taskName" \
  --master yarn \
  --driver-cores 26 \
  --driver-memory 32g \
  --executor-memory 32g \
  --deploy-mode cluster \
  --num-executors 96 \
  --executor-cores 20 \
  --conf spark.port.maxRetries=3 \
  --conf spark.yarn.maxAppAttempts=3 \
  --conf spark.executor.memoryOverhead=8g \
  --conf spark.driver.memoryOverhead=8g \
  --conf spark.hadoop.fs.defaultFS="$ALG_HDFS" \
  --conf spark.default.parallelism=96 \
  --conf spark.executor.extraJavaOptions="-XX:MaxDirectMemorySize=7372m" \
  --files "$nebulaConf" \
  --class com.vesoft.nebula.exchange.Exchange \
  ${baseJarPath}/nebula-exchange-2.5.0-1.jar -c nebula-import.conf -h -d
Sorry, that was a typo. The command is dmesg: dmesg -T|grep nebula
[Wed Jun 16 05:17:35 2021] [436536] 0 436536 1938099 17046 466 0 830 nebula-metad
[Wed Jun 16 05:17:35 2021] [438309] 0 438309 18851693 6796157 27993 0 830 nebula-storaged
[Wed Jun 16 05:17:35 2021] [490332] 0 490332 4945487 24082 2419 0 830 nebula-graphd
[Wed Jun 16 05:17:35 2021] Killed process 438309 (nebula-storaged) total-vm:75406772kB, anon-rss:27184628kB, file-rss:0kB, shmem-rss:0kB
[Thu Jun 17 00:33:43 2021] [436536] 0 436536 1938099 19688 466 0 830 nebula-metad
[Thu Jun 17 00:33:43 2021] [490332] 0 490332 5273167 23901 2423 0 830 nebula-graphd
[Thu Jun 17 00:33:43 2021] [377836] 0 377836 8316256 6793720 15505 0 830 nebula-storaged
[Thu Jun 17 00:33:43 2021] Killed process 377836 (nebula-storaged) total-vm:33265024kB, anon-rss:27174880kB, file-rss:0kB, shmem-rss:0kB
[Thu Jun 24 04:40:03 2021] nebula-metad invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=830
[Thu Jun 24 04:40:03 2021] nebula-metad cpuset=a79ba9e63fe24b94a08d53bed4dbb922a4e3d6b3cc6f0d26249d83ae4024bd5b mems_allowed=0-1
[Thu Jun 24 04:40:03 2021] CPU: 63 PID: 436752 Comm: nebula-metad Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7.x86_64 #1
[Thu Jun 24 04:40:03 2021] [436536] 0 436536 2744627 24478 481 0 830 nebula-metad
[Thu Jun 24 04:40:03 2021] [490332] 0 490332 5802319 23937 2436 0 830 nebula-graphd
[Thu Jun 24 04:40:03 2021] [137404] 0 137404 14829941 7656278 19305 0 830 nebula-storaged
[Thu Jun 24 04:40:03 2021] Killed process 137404 (nebula-storaged) total-vm:59319764kB, anon-rss:30625112kB, file-rss:0kB, shmem-rss:0kB
[Thu Jun 24 04:40:04 2021] [436536] 0 436536 2744627 24478 481 0 830 nebula-metad
[Thu Jun 24 04:40:04 2021] [490332] 0 490332 5802319 23937 2436 0 830 nebula-graphd
[Thu Jun 24 04:40:04 2021] [436536] 0 436536 2744627 24478 481 0 830 nebula-metad
[Thu Jun 24 04:40:04 2021] [490332] 0 490332 5802319 23937 2436 0 830 nebula-graphd
[Thu Oct 7 19:14:20 2021] [180031] 0 180031 593701 35811 957 0 830 nebula-metad
[Thu Oct 7 19:14:20 2021] [187977] 0 187977 18193820 7188639 30103 0 830 nebula-storaged
[Thu Oct 7 19:14:20 2021] [193637] 0 193637 999814 315070 1517 0 830 nebula-graphd
[Thu Oct 7 19:14:20 2021] Killed process 187977 (nebula-storaged) total-vm:72775280kB, anon-rss:28754556kB, file-rss:0kB, shmem-rss:0kB
[Thu Oct 7 19:14:20 2021] [180031] 0 180031 593701 35811 957 0 830 nebula-metad
[Thu Oct 7 19:14:20 2021] [193637] 0 193637 999814 315080 1517 0 830 nebula-graphd
[Wed Oct 27 17:37:45 2021] [376480] 0 376480 573830 46473 976 0 830 nebula-graphd
[Wed Oct 27 17:37:45 2021] [390892] 0 390892 9605408 7075336 16538 0 830 nebula-storaged
[Wed Oct 27 17:37:45 2021] [520688] 0 520688 1592879 355875 2676 0 830 nebula-metad
[Wed Oct 27 17:37:45 2021] Killed process 390892 (nebula-storaged) total-vm:38421632kB, anon-rss:28301344kB, file-rss:0kB, shmem-rss:0kB
[Wed Oct 27 17:37:45 2021] nebula-metad invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=830
[Wed Oct 27 17:37:45 2021] nebula-metad cpuset=a79ba9e63fe24b94a08d53bed4dbb922a4e3d6b3cc6f0d26249d83ae4024bd5b mems_allowed=0-1
[Wed Oct 27 17:37:45 2021] CPU: 23 PID: 520974 Comm: nebula-metad Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7.x86_64 #1
[Wed Oct 27 17:37:45 2021] [376480] 0 376480 573830 49416 976 0 830 nebula-graphd
[Wed Oct 27 17:37:45 2021] [520688] 0 520688 1592879 354269 2676 0 830 nebula-metad
[Wed Oct 27 17:37:46 2021] [376480] 0 376480 573830 49417 976 0 830 nebula-graphd
[Wed Oct 27 17:37:46 2021] [520688] 0 520688 1592879 354432 2676 0 830 nebula-metad
[Wed Oct 27 17:37:46 2021] Killed process 520688 (nebula-metad) total-vm:6371516kB, anon-rss:1417728kB, file-rss:0kB, shmem-rss:0kB
[Mon Jan 10 15:59:29 2022] [376480] 0 376480 695430 196411 1224 0 830 nebula-graphd
[Mon Jan 10 15:59:29 2022] [604849] 0 604849 19507247 7168135 34469 0 830 nebula-storaged
[Mon Jan 10 15:59:29 2022] [587639] 0 587639 876837 136497 1396 0 830 nebula-metad
[Mon Jan 10 15:59:29 2022] Killed process 604849 (nebula-storaged) total-vm:78028988kB, anon-rss:28672440kB, file-rss:100kB, shmem-rss:0kB
[Mon Jan 10 15:59:29 2022] [376480] 0 376480 695430 196411 1224 0 830 nebula-graphd
[Mon Jan 10 15:59:29 2022] [607972] 0 604849 19507247 7171970 34469 0 830 nebula-storaged
[Mon Jan 10 15:59:29 2022] [587639] 0 587639 876837 136497 1396 0 830 nebula-metad
[Mon Jan 10 15:59:29 2022] Memory cgroup out of memory: Kill process 607972 (nebula-storaged) score 1663 or sacrifice child
[Mon Jan 10 15:59:29 2022] Killed process 607972 (nebula-storaged) total-vm:78028988kB, anon-rss:28687104kB, file-rss:776kB, shmem-rss:0kB
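To make the "Killed process" lines above easier to compare, the anon-rss values can be converted from kB to GiB. The helper below is a rough sketch; the function name `summarize_oom_kills` is hypothetical and it only relies on the line layout visible in this output.

```shell
# Summarize "Killed process" lines from `dmesg -T` output (read from stdin):
# print the process name and its anon-rss converted from kB to GiB.
summarize_oom_kills() {
    awk '/Killed process/ {
        name = ""; rss = ""
        for (i = 1; i <= NF; i++) {
            # The process name is the field wrapped in parentheses.
            if ($i ~ /^\(.*\)$/) name = substr($i, 2, length($i) - 2)
            # Strip everything but the digits from "anon-rss:NNNNkB,".
            if ($i ~ /^anon-rss:/) { rss = $i; gsub(/[^0-9]/, "", rss) }
        }
        if (name != "" && rss != "") printf "%s %.1f GiB\n", name, rss / 1048576
    }'
}

# Example: dmesg -T | summarize_oom_kills
```

Applied to the output above, the nebula-storaged kills fall at roughly 26 to 29 GiB of anonymous memory, which is close to the 32 GB listed for these hosts. Note that every entry is dated between June 2021 and January 2022, so none of them covers the current failure.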
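Is the operating system clock accurate?
The import was run yesterday and today.
Right now storage and graph cannot be started: they crash a few seconds after starting. meta is fine.
Run date and check whether the OS time is today. Also, is Nebula on just one machine, i.e. a single node?
6 storage nodes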
date
Fri Mar 18 11:41:36 CST 2022
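How many nodes went down? Are the others all fine?
All of them went down, both storage and graph. Only meta is still running.
Please post all the storage and graph logs from the crashed nodes, and check dmesg on those nodes as well.
So there are three meta nodes, and six each of storage and graph?
Yes.
All the logs: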
nebula-graph-2.5.1.log.zip (3.7 MB)
nebula-graph-2.5.1.log (5).zip (465.9 KB)
nebula-graph-2.5.1.log (4).zip (480.2 KB)
nebula-graph-2.5.1.log (3).zip (421.2 KB)
nebula-graph-2.5.1.log (2).zip (494.1 KB)
nebula-graph-2.5.1.log (1).zip (3.0 MB)
dmesg.txt (29.2 KB)
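Has anyone had a chance to look at this? What is the cause? None of the services are usable right now.
So it crashes immediately after a restart, right? Is a core file being generated in the parent directory (the level that contains scripts and logs)?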
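One way to check for core files at that level is sketched below. The function name `find_nebula_cores` is made up for this example, and /usr/local/nebula is only assumed as the usual RPM install directory; pass your own path if it differs.

```shell
# List core dump files directly under a NebulaGraph install directory,
# i.e. the level that contains scripts/ and logs/.
# The default path is an assumption (typical RPM install location).
find_nebula_cores() {
    dir="${1:-/usr/local/nebula}"
    find "$dir" -maxdepth 1 -type f -name 'core*' 2>/dev/null
}

# Example: find_nebula_cores            # default path
#          find_nebula_cores /data/nebula
```

If nothing is listed, it may simply mean core dumps are disabled: whether and where a core file is written depends on `ulimit -c` for the service and on the kernel's `kernel.core_pattern` setting.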