NebulaGraph 2.5.1 with Spark Exchange 2.5.1: importing data crashes every storaged and graphd, and they crash again immediately after restart

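nebula version: NebulaGraph 2.5.1
Deployment: distributed
Installation method: RPM
Production environment: Y
Hardware
Disk: 2 TB SSD
CPU / memory: 16 cores, 32 GB
Detailed description of the problem
Relevant meta / storage / graph info logs (plain text preferred so they are searchable)
Importing data with NebulaGraph 2.5.1 and Spark Exchange 2.5.1 brings down every storaged and graphd process, and they go down again immediately after a restart.
Which logs should I look at? The storage log shows nothing about a sudden crash.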

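  1. Run demsg -T|grep nebula to see whether there was an OOM.
  2. Check the nebula-storaged.INFO and nebula-graphd.INFO logs in the logs directory.
  3. Roughly how much data are you importing? Also, is Exchange running on the same machines as the nebula cluster? Please post the import parameters.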
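
The first two checks can be sketched as a small shell helper. This is only an illustration: the function name nebula_oom_lines is made up here, and the log path in the comments assumes the default RPM install prefix /usr/local/nebula, which may differ on your hosts.

```shell
#!/usr/bin/env bash
# Sketch only. Reads kernel log text on stdin and keeps the lines that
# mention a nebula process together with OOM-killer activity.
# nebula_oom_lines is a hypothetical helper name, not part of NebulaGraph.
nebula_oom_lines() {
  grep -i 'nebula' | grep -Ei 'oom-killer|out of memory|killed process'
}

# Typical use on a storaged/graphd host (reading the kernel log may need root):
#   dmesg -T | nebula_oom_lines
#
# Then look at the tail of the service logs. The path is an assumption
# (default RPM prefix); adjust it to your installation:
#   tail -n 200 /usr/local/nebula/logs/nebula-storaged.INFO
#   tail -n 200 /usr/local/nebula/logs/nebula-graphd.INFO
```

If the helper prints nothing, the kernel log on that host holds no OOM record for nebula, and the service logs are the next place to look.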

We have run this import many times before, daily, but the last few runs have failed.

1. demsg: there is no such command.
2. nebula-storaged.10-130-166-241.okg-db-graphd.cp01.root.log.INFO.20220318-103349.68208 (257.6 KB)
3. The data volume is 85 GB, about 1 billion records. Exchange and nebula run on separate machines.
${SPARK_HOME}/bin/spark-submit \
--queue root.ipd.daily \
--name "nebula2.5.1-import-$taskName" \
--master yarn \
--driver-cores 26 \
--driver-memory 32g \
--executor-memory 32g \
--deploy-mode cluster \
--num-executors 96 \
--executor-cores 20 \
--conf spark.port.maxRetries=3 \
--conf spark.yarn.maxAppAttempts=3 \
--conf spark.executor.memoryOverhead=8g \
--conf spark.driver.memoryOverhead=8g \
--conf spark.hadoop.fs.defaultFS="$ALG_HDFS" \
--conf spark.default.parallelism=96 \
--conf spark.executor.extraJavaOptions="-XX:MaxDirectMemorySize=7372m" \
--files "$nebulaConf" \
--class com.vesoft.nebula.exchange.Exchange \
${baseJarPath}/nebula-exchange-2.5.0-1.jar -c nebula-import.conf -h -d
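
For a rough sense of scale, the submit parameters above give an upper bound on how many Spark tasks can run at once. The arithmetic below is a back-of-the-envelope sketch: it assumes one CPU per task (Spark's default) and that YARN grants every requested executor, and it uses the six storaged nodes mentioned later in this thread. The real write concurrency also depends on the partition and batch settings in nebula-import.conf, which are not shown here.

```shell
#!/usr/bin/env bash
# Values copied from the spark-submit command in this thread.
num_executors=96
executor_cores=20
# Number of storaged nodes, as stated later in the thread.
storaged_nodes=6

# Upper bound on simultaneously running tasks (assumes spark.task.cpus=1).
slots=$((num_executors * executor_cores))

echo "max concurrent tasks:          $slots"
echo "tasks per storaged node (avg): $((slots / storaged_nodes))"
```

This works out to at most 1920 task slots, or about 320 per storaged node if the load were spread evenly.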

Sorry, that was a typo. I meant dmesg: dmesg -T|grep nebula


Here is the memory output:

[Wed Jun 16 05:17:35 2021] [436536]     0 436536  1938099    17046     466        0           830 nebula-metad
[Wed Jun 16 05:17:35 2021] [438309]     0 438309 18851693  6796157   27993        0           830 nebula-storaged
[Wed Jun 16 05:17:35 2021] [490332]     0 490332  4945487    24082    2419        0           830 nebula-graphd
[Wed Jun 16 05:17:35 2021] Killed process 438309 (nebula-storaged) total-vm:75406772kB, anon-rss:27184628kB, file-rss:0kB, shmem-rss:0kB
[Thu Jun 17 00:33:43 2021] [436536]     0 436536  1938099    19688     466        0           830 nebula-metad
[Thu Jun 17 00:33:43 2021] [490332]     0 490332  5273167    23901    2423        0           830 nebula-graphd
[Thu Jun 17 00:33:43 2021] [377836]     0 377836  8316256  6793720   15505        0           830 nebula-storaged
[Thu Jun 17 00:33:43 2021] Killed process 377836 (nebula-storaged) total-vm:33265024kB, anon-rss:27174880kB, file-rss:0kB, shmem-rss:0kB
[Thu Jun 24 04:40:03 2021] nebula-metad invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=830
[Thu Jun 24 04:40:03 2021] nebula-metad cpuset=a79ba9e63fe24b94a08d53bed4dbb922a4e3d6b3cc6f0d26249d83ae4024bd5b mems_allowed=0-1
[Thu Jun 24 04:40:03 2021] CPU: 63 PID: 436752 Comm: nebula-metad Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7.x86_64 #1
[Thu Jun 24 04:40:03 2021] [436536]     0 436536  2744627    24478     481        0           830 nebula-metad
[Thu Jun 24 04:40:03 2021] [490332]     0 490332  5802319    23937    2436        0           830 nebula-graphd
[Thu Jun 24 04:40:03 2021] [137404]     0 137404 14829941  7656278   19305        0           830 nebula-storaged
[Thu Jun 24 04:40:03 2021] Killed process 137404 (nebula-storaged) total-vm:59319764kB, anon-rss:30625112kB, file-rss:0kB, shmem-rss:0kB
[Thu Jun 24 04:40:04 2021] [436536]     0 436536  2744627    24478     481        0           830 nebula-metad
[Thu Jun 24 04:40:04 2021] [490332]     0 490332  5802319    23937    2436        0           830 nebula-graphd
[Thu Jun 24 04:40:04 2021] [436536]     0 436536  2744627    24478     481        0           830 nebula-metad
[Thu Jun 24 04:40:04 2021] [490332]     0 490332  5802319    23937    2436        0           830 nebula-graphd
[Thu Oct  7 19:14:20 2021] [180031]     0 180031   593701    35811     957        0           830 nebula-metad
[Thu Oct  7 19:14:20 2021] [187977]     0 187977 18193820  7188639   30103        0           830 nebula-storaged
[Thu Oct  7 19:14:20 2021] [193637]     0 193637   999814   315070    1517        0           830 nebula-graphd
[Thu Oct  7 19:14:20 2021] Killed process 187977 (nebula-storaged) total-vm:72775280kB, anon-rss:28754556kB, file-rss:0kB, shmem-rss:0kB
[Thu Oct  7 19:14:20 2021] [180031]     0 180031   593701    35811     957        0           830 nebula-metad
[Thu Oct  7 19:14:20 2021] [193637]     0 193637   999814   315080    1517        0           830 nebula-graphd
[Wed Oct 27 17:37:45 2021] [376480]     0 376480   573830    46473     976        0           830 nebula-graphd
[Wed Oct 27 17:37:45 2021] [390892]     0 390892  9605408  7075336   16538        0           830 nebula-storaged
[Wed Oct 27 17:37:45 2021] [520688]     0 520688  1592879   355875    2676        0           830 nebula-metad
[Wed Oct 27 17:37:45 2021] Killed process 390892 (nebula-storaged) total-vm:38421632kB, anon-rss:28301344kB, file-rss:0kB, shmem-rss:0kB
[Wed Oct 27 17:37:45 2021] nebula-metad invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=830
[Wed Oct 27 17:37:45 2021] nebula-metad cpuset=a79ba9e63fe24b94a08d53bed4dbb922a4e3d6b3cc6f0d26249d83ae4024bd5b mems_allowed=0-1
[Wed Oct 27 17:37:45 2021] CPU: 23 PID: 520974 Comm: nebula-metad Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7.x86_64 #1
[Wed Oct 27 17:37:45 2021] [376480]     0 376480   573830    49416     976        0           830 nebula-graphd
[Wed Oct 27 17:37:45 2021] [520688]     0 520688  1592879   354269    2676        0           830 nebula-metad
[Wed Oct 27 17:37:46 2021] [376480]     0 376480   573830    49417     976        0           830 nebula-graphd
[Wed Oct 27 17:37:46 2021] [520688]     0 520688  1592879   354432    2676        0           830 nebula-metad
[Wed Oct 27 17:37:46 2021] Killed process 520688 (nebula-metad) total-vm:6371516kB, anon-rss:1417728kB, file-rss:0kB, shmem-rss:0kB
[Mon Jan 10 15:59:29 2022] [376480]     0 376480   695430   196411    1224        0           830 nebula-graphd
[Mon Jan 10 15:59:29 2022] [604849]     0 604849 19507247  7168135   34469        0           830 nebula-storaged
[Mon Jan 10 15:59:29 2022] [587639]     0 587639   876837   136497    1396        0           830 nebula-metad
[Mon Jan 10 15:59:29 2022] Killed process 604849 (nebula-storaged) total-vm:78028988kB, anon-rss:28672440kB, file-rss:100kB, shmem-rss:0kB
[Mon Jan 10 15:59:29 2022] [376480]     0 376480   695430   196411    1224        0           830 nebula-graphd
[Mon Jan 10 15:59:29 2022] [607972]     0 604849 19507247  7171970   34469        0           830 nebula-storaged
[Mon Jan 10 15:59:29 2022] [587639]     0 587639   876837   136497    1396        0           830 nebula-metad
[Mon Jan 10 15:59:29 2022] Memory cgroup out of memory: Kill process 607972 (nebula-storaged) score 1663 or sacrifice child
[Mon Jan 10 15:59:29 2022] Killed process 607972 (nebula-storaged) total-vm:78028988kB, anon-rss:28687104kB, file-rss:776kB, shmem-rss:0kB
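
To make a dump like this easier to read, the "Killed process" lines can be reduced to a process name and its resident memory in GiB. The helper below is a sketch; the name killed_rss is made up for illustration, and it only understands the "Killed process PID (name) ... anon-rss:NkB" layout seen above.

```shell
#!/usr/bin/env bash
# Sketch only. Reads dmesg text on stdin and prints "<process> <anon-rss GiB>"
# for each "Killed process" line. killed_rss is a hypothetical helper name.
killed_rss() {
  awk '/Killed process/ {
    name = ""; kb = ""
    for (i = 1; i <= NF; i++) {
      # The process name is the field wrapped in parentheses.
      if ($i ~ /^\(.*\)$/) { name = substr($i, 2, length($i) - 2) }
      # anon-rss is reported in kB, e.g. "anon-rss:27184628kB,".
      if ($i ~ /^anon-rss:/) {
        kb = $i
        sub(/^anon-rss:/, "", kb)
        sub(/kB,?$/, "", kb)
      }
    }
    # 1 GiB = 1048576 kB
    if (name != "" && kb != "") printf "%s %.1f\n", name, kb / 1048576
  }'
}

# Usage:
#   dmesg -T | killed_rss
```

For the first kill in the dump (anon-rss:27184628kB) this prints "nebula-storaged 25.9", that is, roughly 26 GiB resident on a host described as having 32 GB. The rss column in the per-process rows is in pages, so 6796157 pages at 4 kB each gives the same 27184628 kB.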

Is the operating system clock accurate?

The operations were done yesterday and today.

Right now storage and graph cannot be started: they die a few seconds after starting. meta is fine.

nebula-storaged.conf (4.4 KB)
This is the storaged configuration.

Run date and check whether the operating system time is today. Also, is nebula running on just one machine, as a single node?

6 storage machines
date
Fri Mar 18 11:41:36 CST 2022

How many of them went down? Are the others all fine?

All of them went down, both storage and graph. Only meta stayed up.

Please post all the storage and graph logs from the crashed nodes, and also check dmesg on those nodes.

Three meta machines, and six each for storage and graph?

Yes

All the logs:
nebula-graph-2.5.1.log.zip (3.7 MB)
nebula-graph-2.5.1.log (5).zip (465.9 KB)
nebula-graph-2.5.1.log (4).zip (480.2 KB)
nebula-graph-2.5.1.log (3).zip (421.2 KB)
nebula-graph-2.5.1.log (2).zip (494.1 KB)
nebula-graph-2.5.1.log (1).zip (3.0 MB)
dmesg.txt (29.2 KB)

Has anyone had a chance to look into this? What is the cause? The service is completely unusable right now.

So it crashes immediately after a restart, right? Was a core file generated in the parent directory (the level that contains scripts and logs)?
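
One way to check for core files at that level is sketched below. The helper name find_cores is made up, and the default directory /usr/local/nebula is an assumption based on the usual RPM install prefix. Core files may also land elsewhere, or not be written at all, depending on the host's core_pattern and ulimit settings.

```shell
#!/usr/bin/env bash
# Sketch only. Lists regular files named core* directly under the nebula
# install directory (the level that holds scripts/ and logs/).
# find_cores is a hypothetical helper; the default path is an assumption.
find_cores() {
  dir="${1:-/usr/local/nebula}"
  find "$dir" -maxdepth 1 -type f -name 'core*'
}

# Usage:
#   find_cores                 # default install prefix
#   find_cores /opt/nebula     # custom prefix (example path)
#
# If nothing is listed, check whether core dumps are enabled at all:
#   ulimit -c                            # 0 means core dumps are disabled
#   cat /proc/sys/kernel/core_pattern    # where the kernel writes cores
```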