How do I fix "RaftPart buffer is full"?

Could you take a look with perf?


Command used: perf top -g -p nebula_storaged_pid

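Your perf output looks a bit odd: none of the storage symbols are visible.
How about running update configs storage:v=3, waiting a few seconds, then running update configs storage:v=0, and posting the storage log?
Alternatively, run sudo ps -T -p storage_pid, or dump the stacks with sudo pstack pid > stack.txt.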

sh-4.2# ps -T -p 1
  PID  SPID TTY          TIME CMD
    1     1 ?        00:00:00 nebula-storaged
    1     6 ?        00:00:00 executor-pri0-1
    1     7 ?        00:00:00 executor-pri0-2
    1     8 ?        00:00:00 executor-pri1-2
    1     9 ?        00:00:00 executor-pri1-1
    1    10 ?        00:00:00 executor-pri2-1
    1    11 ?        00:00:00 executor-pri2-2
    1    12 ?        00:42:54 executor-pri3-3
    1    13 ?        00:43:09 executor-pri3-2
    1    14 ?        00:42:55 executor-pri3-1
    1    15 ?        00:42:54 executor-pri3-4
    1    16 ?        00:04:33 executor-pri3-6
    1    17 ?        00:00:18 executor-pri3-5
    1    18 ?        00:43:05 executor-pri3-7
    1    19 ?        00:43:09 executor-pri3-8
    1    20 ?        00:42:45 executor-pri3-9
    1    21 ?        00:43:01 executor-pri3-1
    1    22 ?        00:42:52 executor-pri3-1
    1    23 ?        00:43:08 executor-pri3-1
    1    24 ?        00:42:56 executor-pri3-1
    1    25 ?        00:42:50 executor-pri3-1
    1    26 ?        00:42:57 executor-pri3-1
    1    27 ?        00:42:59 executor-pri3-1
    1    28 ?        00:42:49 executor-pri3-1
    1    29 ?        00:43:05 executor-pri3-1
    1    30 ?        00:42:45 executor-pri3-1
    1    31 ?        00:42:55 executor-pri3-2
    1    32 ?        00:43:11 executor-pri3-2
    1    33 ?        00:43:06 executor-pri3-2
    1    34 ?        00:43:12 executor-pri3-2
    1    35 ?        00:42:50 executor-pri3-2
    1    36 ?        00:42:58 executor-pri3-2
    1    37 ?        00:42:59 executor-pri3-2
    1    38 ?        00:43:05 executor-pri3-2
    1    39 ?        00:43:02 executor-pri3-2
    1    40 ?        00:42:54 executor-pri3-2
    1    41 ?        00:42:55 executor-pri3-3
    1    42 ?        00:21:35 executor-pri3-3
    1    43 ?        00:43:02 executor-pri3-3
    1    44 ?        00:00:00 executor-pri4-1
    1    45 ?        00:00:00 executor-pri4-2
    1    46 ?        00:00:01 tsc-calibrator
    1    47 ?        01:41:45 IOThreadPool0
    1    48 ?        01:43:11 IOThreadPool1
    1    49 ?        01:48:52 IOThreadPool2
    1    50 ?        01:44:47 IOThreadPool3
    1    51 ?        01:47:22 IOThreadPool4
    1    52 ?        01:48:26 IOThreadPool5
    1    53 ?        01:48:29 IOThreadPool6
    1    54 ?        01:40:46 IOThreadPool7
    1    55 ?        01:37:34 IOThreadPool8
    1    56 ?        01:44:52 IOThreadPool9
    1    57 ?        01:55:56 IOThreadPool10
    1    58 ?        01:46:15 IOThreadPool11
    1    59 ?        01:39:04 IOThreadPool12
    1    60 ?        01:46:42 IOThreadPool13
    1    61 ?        01:40:34 IOThreadPool14
    1    62 ?        01:37:19 IOThreadPool15
    1    68 ?        00:00:01 nebula-storaged
    1    69 ?        00:00:11 nebula-bgworker
    1    70 ?        00:00:11 nebula-bgworker
    1    71 ?        00:00:11 nebula-bgworker
    1    72 ?        00:00:12 nebula-bgworker
    1    73 ?        00:00:00 nebula-storaged
    1    74 ?        00:00:00 Acceptor Thread
    1    75 ?        01:22:12 rocksdb:low0
    1    76 ?        00:11:50 rocksdb:high0
    1    77 ?        00:00:00 nebula-storaged
    1    93 ?        00:00:00 rocksdb:pst_st
    1    94 ?        00:00:00 nebula-storaged
    1   110 ?        00:00:00 rocksdb:pst_st
    1   111 ?        00:00:00 nebula-storaged
    1   127 ?        00:00:00 rocksdb:pst_st
    1   128 ?        00:00:00 nebula-storaged
    1   144 ?        00:00:00 rocksdb:pst_st
    1   145 ?        00:00:00 http thread poo
    1   146 ?        00:00:00 http thread poo
    1   147 ?        00:00:00 http thread poo
    1   148 ?        00:00:00 webservice-list
    1   149 ?        00:00:00 HTTPSrvExec0
    1   150 ?        00:01:36 HTTPSrvExec1
    1   151 ?        00:00:00 HTTPSrvExec2
    1   152 ?        00:00:00 HTTPSrvExec3
    1   153 ?        00:00:00 IOThreadPool0
    1   154 ?        00:00:00 reader-pool-pri
    1   155 ?        00:00:00 reader-pool-pri
    1   156 ?        00:00:00 reader-pool-pri
    1   157 ?        00:00:00 reader-pool-pri
    1   158 ?        00:00:00 reader-pool-pri
    1   159 ?        00:00:00 reader-pool-pri
    1   160 ?        00:19:30 reader-pool-pri
    1   161 ?        00:19:42 reader-pool-pri
    1   162 ?        00:20:44 reader-pool-pri
    1   163 ?        00:19:12 reader-pool-pri
    1   164 ?        00:20:23 reader-pool-pri
    1   165 ?        00:21:57 reader-pool-pri
    1   166 ?        00:22:47 reader-pool-pri
    1   167 ?        00:18:27 reader-pool-pri
    1   168 ?        00:21:26 reader-pool-pri
    1   169 ?        00:22:35 reader-pool-pri
    1   170 ?        00:18:40 reader-pool-pri
    1   171 ?        00:23:17 reader-pool-pri
    1   172 ?        00:21:33 reader-pool-pri
    1   173 ?        00:20:53 reader-pool-pri
    1   174 ?        00:21:07 reader-pool-pri
    1   175 ?        00:19:24 reader-pool-pri
    1   176 ?        00:18:09 reader-pool-pri
    1   177 ?        00:21:50 reader-pool-pri
    1   178 ?        00:22:36 reader-pool-pri
    1   179 ?        00:23:09 reader-pool-pri
    1   180 ?        00:21:56 reader-pool-pri
    1   181 ?        00:21:45 reader-pool-pri
    1   182 ?        00:21:34 reader-pool-pri
    1   183 ?        00:19:36 reader-pool-pri
    1   184 ?        00:21:22 reader-pool-pri
    1   185 ?        00:17:50 reader-pool-pri
    1   186 ?        00:22:22 reader-pool-pri
    1   187 ?        00:21:57 reader-pool-pri
    1   188 ?        00:19:40 reader-pool-pri
    1   189 ?        00:20:09 reader-pool-pri
    1   190 ?        00:20:29 reader-pool-pri
    1   191 ?        00:20:49 reader-pool-pri
    1   192 ?        00:00:00 reader-pool-pri
    1   193 ?        00:00:00 reader-pool-pri
    1   194 ?        00:00:00 Acceptor Thread
    1 25171 ?        00:00:00 nebula-storaged
    1 25187 ?        00:00:00 rocksdb:pst_st
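In the listing above, the busiest threads are the IOThreadPool*, executor-pri3-* and reader-pool-pri threads. To compare pools at a glance, the TIME column can be totalled per thread-name prefix. This is a minimal sketch that assumes the default `ps -T -p <pid>` column layout (PID SPID TTY TIME CMD); `summarize_threads` and `parse_ps_time` are hypothetical helpers, not NebulaGraph tools. Note that ps truncates thread names to 15 characters, so executor-pri3-10 through executor-pri3-19 all show up as executor-pri3-1; grouping by prefix sidesteps that.

```python
import re
from collections import defaultdict


def parse_ps_time(t: str) -> int:
    """Convert a ps TIME value ([DD-]HH:MM:SS) to seconds."""
    days = 0
    if "-" in t:
        d, t = t.split("-", 1)
        days = int(d)
    h, m, s = (int(x) for x in t.split(":"))
    return days * 86400 + h * 3600 + m * 60 + s


def summarize_threads(ps_output: str) -> dict[str, int]:
    """Sum CPU seconds per thread-pool name from `ps -T -p <pid>` output."""
    totals: dict[str, int] = defaultdict(int)
    for line in ps_output.splitlines():
        # Columns: PID SPID TTY TIME CMD; CMD may contain spaces ("Acceptor Thread").
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # skip the header and blank lines
        # Drop a trailing "-N" or "N" so "executor-pri3-12" -> "executor-pri3"
        # and "IOThreadPool10" -> "IOThreadPool".
        pool = re.sub(r"-?\d+$", "", fields[4].strip())
        totals[pool] += parse_ps_time(fields[3])
    return dict(totals)
```

For example, `sorted(summarize_threads(text).items(), key=lambda kv: -kv[1])` lists the pools from busiest to idlest, where `text` is the saved `ps -T` output.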

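The log level change doesn't seem to take effect. I set it to 3, but no lower-level (verbose) logs are printed. pstack doesn't give any useful information either.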

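@datian9966 @valseek The parameter is max_batch_size, but changing its value won't help much. When this error appears during an import, it is usually because a vertex and a large number of its outgoing edges are being imported at the same time. You need to rate-limit and retry on the import side.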

@valseek Also, using UPDATE/UPSERT for bulk writes is not recommended.
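Rate limiting and retrying on the import side could look roughly like the following minimal Python sketch. It is not part of any NebulaGraph client: `TokenBucket`, `RetryableError` and `write_with_retry` are hypothetical names, and the sketch assumes your own write function raises `RetryableError` when a write fails with a transient error such as "RaftPart buffer is full". The rate, burst size and backoff values are placeholders to tune for your cluster.

```python
import random
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical: raise this from your own write function when the server
    reports a transient error such as "RaftPart buffer is full"."""


class TokenBucket:
    """Thread-safe token bucket: on average at most `rate` writes per second,
    with bursts of up to `capacity` writes."""

    def __init__(self, rate: float, capacity: float) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill according to elapsed time, never exceeding capacity.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)  # sleep outside the lock so other threads can proceed


def write_with_retry(write: Callable[[], T], limiter: TokenBucket,
                     max_attempts: int = 5, base_delay: float = 0.5) -> T:
    """Throttle every attempt through `limiter` and retry RetryableError
    with exponential backoff plus random jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        limiter.acquire()
        try:
            return write()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Each batch would be wrapped in a closure, e.g. `write_with_retry(lambda: run_batch(batch), limiter)`, where `run_batch` is your own (hypothetical) function that executes the INSERT statement. A single shared `TokenBucket` caps the total write rate across importer threads, and the jitter keeps retries from hitting the same partition in lockstep.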


The problem is that we've stopped writing now, but inserts into this partition still fail.

Please post screenshots of SHOW HOSTS and SHOW PARTS <partId>. If the part has a leader, grep the INFO logs for this space + part.

I think the logs I posted earlier are already quite detailed...

W0111 09:26:35.083777    26 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:26:36.715113    23 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
E0111 09:27:14.122383    18 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:27:16.187121    19 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:27:17.446768    32 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:27:17.498497    23 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:27:18.683682    23 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:27:18.716003    22 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:27:19.995110    38 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:27:20.033175    27 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:27:21.113777    28 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:27:22.357326    40 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:27:23.501991    27 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:27:23.553769    21 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
I0111 09:29:58.890431    70 RaftPart.cpp:1242] [Port: 44501, Space: 35, Part: 8] Need to send heartbeat
I0111 09:29:58.891422    70 RaftPart.cpp:1801] [Port: 44501, Space: 35, Part: 8] Send heartbeat
I0111 09:29:58.892374    70 RaftPart.cpp:1252] [Port: 44501, Space: 35, Part: 8] Schedule new task
W0111 09:30:05.918068    29 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:30:06.174005    27 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:30:06.840802    23 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:30:07.176394    22 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:30:08.028753    22 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:30:08.229166    41 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
I0111 09:30:09.230336    71 RaftPart.cpp:1242] [Port: 44501, Space: 35, Part: 8] Need to send heartbeat
I0111 09:30:09.230355    71 RaftPart.cpp:1801] [Port: 44501, Space: 35, Part: 8] Send heartbeat
I0111 09:30:09.230376    71 RaftPart.cpp:1252] [Port: 44501, Space: 35, Part: 8] Schedule new task
I0111 09:30:19.277066    72 RaftPart.cpp:1242] [Port: 44501, Space: 35, Part: 8] Need to send heartbeat
I0111 09:30:19.277096    72 RaftPart.cpp:1801] [Port: 44501, Space: 35, Part: 8] Send heartbeat
I0111 09:30:19.278081    72 RaftPart.cpp:1252] [Port: 44501, Space: 35, Part: 8] Schedule new task
I0111 09:30:29.483870    71 RaftPart.cpp:1242] [Port: 44501, Space: 35, Part: 8] Need to send heartbeat
I0111 09:30:29.484899    71 RaftPart.cpp:1801] [Port: 44501, Space: 35, Part: 8] Send heartbeat
I0111 09:30:29.485978    71 RaftPart.cpp:1252] [Port: 44501, Space: 35, Part: 8] Schedule new task
I0111 09:30:39.496356    69 RaftPart.cpp:1242] [Port: 44501, Space: 35, Part: 8] Need to send heartbeat
I0111 09:30:39.496376    69 RaftPart.cpp:1801] [Port: 44501, Space: 35, Part: 8] Send heartbeat
I0111 09:30:39.496919    69 RaftPart.cpp:1252] [Port: 44501, Space: 35, Part: 8] Schedule new task
I0111 09:30:49.498498    72 RaftPart.cpp:1242] [Port: 44501, Space: 35, Part: 8] Need to send heartbeat
I0111 09:30:49.499409    72 RaftPart.cpp:1801] [Port: 44501, Space: 35, Part: 8] Send heartbeat
I0111 09:30:49.500454    72 RaftPart.cpp:1252] [Port: 44501, Space: 35, Part: 8] Schedule new task
W0111 09:30:51.556509    15 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:30:52.043906    28 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:30:52.640764    35 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:30:53.033396    28 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:30:53.771519    38 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:30:55.692230    15 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:30:56.320917    35 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:30:56.687958    31 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:30:57.275665    21 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:30:57.575654    12 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
I0111 09:30:59.504334    72 RaftPart.cpp:1242] [Port: 44501, Space: 35, Part: 8] Need to send heartbeat
I0111 09:30:59.505008    72 RaftPart.cpp:1801] [Port: 44501, Space: 35, Part: 8] Send heartbeat
I0111 09:30:59.505654    72 RaftPart.cpp:1252] [Port: 44501, Space: 35, Part: 8] Schedule new task
W0111 09:30:59.870792    13 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:31:00.206215    35 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:31:00.669425    18 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:31:00.952483    35 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:31:02.886149    36 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:31:03.282864    20 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:31:03.950527    18 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:31:04.445878    19 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
W0111 09:31:05.023299    39 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
E0111 09:31:06.027463    38 Part.cpp:435] [Port: 44501, Space: 35, Part: 8] RaftPart buffer is full
I0111 09:31:09.506196    72 RaftPart.cpp:1242] [Port: 44501, Space: 35, Part: 8] Need to send heartbeat
I0111 09:31:09.506217    72 RaftPart.cpp:1801] [Port: 44501, Space: 35, Part: 8] Send heartbeat
I0111 09:31:09.507309    72 RaftPart.cpp:1252] [Port: 44501, Space: 35, Part: 8] Schedule new task
W0111 09:39:49.525812    71 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
W0111 09:50:59.848855    71 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
W0111 10:13:10.862673    72 RaftPart.cpp:567] [Port: 44501, Space: 35, Part: 8] The appendLog buffer is full. Please slow down the log appending rate.replicatingLogs_ :1
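To check whether a partition keeps reporting a full buffer after writes stop (in the log above, the warning still reappears at 09:39, 09:50 and 10:13), a per-partition summary of the storaged log makes the timeline easier to read. This is a hypothetical helper, not a NebulaGraph tool. It assumes the glog line layout shown above (severity letter and MMDD date, time, thread id, `file:line]`, then a `[Port: N, Space: N, Part: N]` tag and the message) and counts both the "appendLog buffer is full" warning and the "RaftPart buffer is full" error together.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed glog layout: <sev><MMDD> <HH:MM:SS.micros> <thread> <file:line>] [Port..Part] <msg>
LINE_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<tid>\d+) "
    r"(?P<src>\S+)\] \[Port: (?P<port>\d+), Space: (?P<space>\d+), Part: (?P<part>\d+)\] "
    r"(?P<msg>.*)$"
)


@dataclass
class PartStats:
    buffer_full: int = 0  # "appendLog buffer is full" + "RaftPart buffer is full"
    heartbeats: int = 0   # "Send heartbeat" lines
    first_full: str = ""  # "MMDD HH:MM:SS.micros" of the first buffer-full line
    last_full: str = ""   # ... and of the last one


def summarize(lines) -> dict[tuple[int, int], PartStats]:
    """Group storaged log lines by (space, part); non-matching lines are ignored."""
    stats: dict[tuple[int, int], PartStats] = defaultdict(PartStats)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        s = stats[(int(m["space"]), int(m["part"]))]
        ts = f'{m["date"]} {m["time"]}'
        if "buffer is full" in m["msg"]:
            s.buffer_full += 1
            s.first_full = s.first_full or ts
            s.last_full = ts
        elif m["msg"] == "Send heartbeat":
            s.heartbeats += 1
    return dict(stats)
```

For example, `summarize(open(path))` with `path` pointing at the storaged INFO log file (the file name depends on your deployment). The timestamps carry no year, so comparing them is only meaningful within one log file.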

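The error occurs because that partition's buffer is full. Normally it recovers a few seconds after writes stop. If it still can't be written to, storage should keep printing certain logs.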

There are no insert-error logs because nothing is being inserted anymore. It really does not recover.

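Could you write one record into that partition from the console, using both INSERT and UPDATE? Are you saying neither reports an error and storage logs nothing at all?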


Two different partitions on the same host: one accepts inserts, the other doesn't. I waited a minute and it still hasn't recovered. What other logs do you need?

For UPSERT, it matters whether vertex 107 already exists. If it doesn't, every field must either be specified or have a default value.

If the INSERT fails, run it on its own in the console and post a screenshot of the last part of the graph and storage INFO logs.

The logs are posted above.

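The logs you posted above are all from while the import was running. There are none from a standalone INSERT or UPSERT after the import was stopped.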

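When you run a single statement, some logs may stay in the buffer and not be flushed to the file immediately.
I've already given you every log I can get.
The logs above are the INFO-level logs from when the import was not running.
Of course, I can't reproduce the detailed logs from the very beginning (when "RaftPart buffer is full" first appeared), because at that time only ERROR-level logging was enabled.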