Linux Signal Handlers and Condition Variables

It is well known that a Linux signal handler must not use condition variables.

This post briefly explains why this simple rule holds.

Consider the following code:

```cpp
#include <iostream>
#include <condition_variable>
#include <csignal>
#include <mutex>

std::mutex lock;
std::condition_variable cond;
int val = 0;

void sigHandler(int signo) {
    std::unique_lock<std::mutex> lk(lock);  // NOT async-signal-safe: this is where it deadlocks
    val++;
    cond.notify_all();

    std::cout << "signal " << signo << " with val: " << val << std::endl;  // iostream is not async-signal-safe either
}

int main(int argc, char **argv) {
    std::signal(SIGTERM, sigHandler);

    while (true) {
        std::unique_lock<std::mutex> lk(lock);
        cond.wait(lk, [] {  // the predicate runs with `lock` held
            if (val > 0) {
                std::cout << "notified val: " << val << std::endl;
            }
            return false;  // never satisfied: wait forever, printing on each wakeup
        });
    }

    return 0;
}
```

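This is a simple piece of signal-handling code that uses a condition variable inside the signal handler. On every SIGTERM the handler increments val under the mutex and calls notify_all(), while main() waits on the condition variable forever (its predicate always returns false) and prints val each time it wakes up.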

What's wrong with it? Run it and see:

```shell
# Terminal 1: build and run
$ g++ sig_handler.cpp -o sig_handler -pthread -lpthread -g
$ ./sig_handler

# Terminal 2: send SIGTERM to it in a tight loop
$ pid=$(ps aux | grep sig_handler | grep -v grep | awk '{print $2}')
$ while true; do kill $pid; done

# Output in terminal 1:
signal 15 with val: 1
signal 15 with val: 2
signal 15 with val: 3
signal 15 with val: 4
```

In our test the program stopped making progress after 4 signals, a textbook symptom of deadlock.

To confirm, attach gdb to the hung process and dump the stack:

```
__lll_lock_wait (futex=futex@entry=0x55873517e160 <lock>, private=0) at lowlevellock.c:52
52	lowlevellock.c: No such file or directory.
(gdb) bt
#0  __lll_lock_wait (futex=futex@entry=0x55873517e160 <lock>, private=0) at lowlevellock.c:52
#1  0x00007f62c21680a3 in __GI___pthread_mutex_lock (mutex=0x55873517e160 <lock>) at ../nptl/pthread_mutex_lock.c:80
#2  0x000055873517b5c6 in __gthread_mutex_lock (__mutex=0x55873517e160 <lock>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  0x000055873517b61a in std::mutex::lock (this=0x55873517e160 <lock>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x000055873517b71f in std::unique_lock<std::mutex>::lock (this=0x7fffa5de3480) at /usr/include/c++/9/bits/unique_lock.h:141
#5  0x000055873517b68b in std::unique_lock<std::mutex>::unique_lock (this=0x7fffa5de3480, __m=...) at /usr/include/c++/9/bits/unique_lock.h:71
#6  0x000055873517b33b in sigHandler (signo=15) at sig_handler.cpp:11
#7  <signal handler called>
#8  0x00007f62c2079075 in __GI___libc_write (fd=1, buf=0x55873563deb0, nbytes=16) at ../sysdeps/unix/sysv/linux/write.c:26
#9  0x00007f62c1ff9e8d in _IO_new_file_write (f=0x7f62c21586a0 <_IO_2_1_stdout_>, data=0x55873563deb0, n=16) at fileops.c:1176
#10 0x00007f62c1ffb951 in new_do_write (to_do=16, data=0x55873563deb0 "notified val: 4\nal: 4\n", fp=0x7f62c21586a0 <_IO_2_1_stdout_>) at libioP.h:948
#11 _IO_new_do_write (to_do=16, data=0x55873563deb0 "notified val: 4\nal: 4\n", fp=0x7f62c21586a0 <_IO_2_1_stdout_>) at fileops.c:426
#12 _IO_new_do_write (fp=fp@entry=0x7f62c21586a0 <_IO_2_1_stdout_>, data=0x55873563deb0 "notified val: 4\nal: 4\n", to_do=16) at fileops.c:423
#13 0x00007f62c1ffbe93 in _IO_new_file_overflow (f=0x7f62c21586a0 <_IO_2_1_stdout_>, ch=10) at fileops.c:784
#14 0x00007f62c22ce289 in std::ostream::put(char) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#15 0x00007f62c22ce508 in std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#16 0x000055873517b449 in <lambda()>::operator()(void) const (__closure=0x7fffa5de42cf) at sig_handler.cpp:25
#17 0x000055873517b4f2 in std::condition_variable::wait<main(int, char**)::<lambda()> >(std::unique_lock<std::mutex> &, <lambda()>) (
    this=0x55873517e1a0 <cond>, __lock=..., __p=...) at /usr/include/c++/9/condition_variable:100
#18 0x000055873517b4aa in main (argc=1, argv=0x7fffa5de4428) at sig_handler.cpp:23
(gdb) f 6
#6  0x000055873517b33b in sigHandler (signo=15) at sig_handler.cpp:11
11	    std::unique_lock<std::mutex> lk(lock);
(gdb) p lock
$1 = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 2, __count = 0, __owner = 814581, __nusers = 1, __kind = 0, __spins = 0, __elision = 0,
        __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\365m\f\000\001", '\000' <repeats 26 times>,
      __align = 2}}, <No data fields>}
(gdb) thread
[Current thread is 1 (Thread 0x7f62c1e17740 (LWP 814581))]
```

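The program is stuck in the signal handler (frame #6, sig_handler.cpp:11), exactly where it tries to acquire the mutex.

The owner of that mutex (__owner = 814581) is the very thread the handler is running on (LWP 814581). Frames #8 to #18 show what this thread was doing when the signal arrived: it was inside cond.wait() in main(), evaluating the predicate lambda (which runs with the mutex held) and writing "notified val: 4" to stdout. So the thread already held the mutex before the handler interrupted it, and the handler is now waiting for a lock that only the same thread can release.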

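So the reason a signal handler must not use a condition variable is clear:

A condition variable must always be used together with a mutex. A process-directed signal (such as the SIGTERM sent by kill) can be delivered to any thread that does not block it, and the handler then runs on that thread, interrupting whatever it was doing. If that thread happens to hold the mutex at that moment, locking it again in the handler deadlocks: a std::mutex cannot be locked twice by the same thread, so the thread ends up waiting for itself. (In this example there is only one thread, so every signal interrupts main().)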

The man page for the pthread mutex functions confirms this:

The mutex functions are not async-signal safe. What this means is that they should not be called from a signal handler. In particular, calling pthread_mutex_lock or pthread_mutex_unlock from a signal handler may deadlock the calling thread.

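If mutexes and condition variables are off-limits, how should a signal handler synchronize with the rest of the program? One option is a semaphore. The sem_post(3) man page says: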

sem_post() is async-signal-safe: it may be safely called within a signal handler.

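In fact, a look at the semaphore implementation makes it easy to see why it cannot deadlock in a signal handler: in glibc, sem_post() updates the semaphore's counter with atomic instructions and, only if there are waiters, wakes them with a futex call. It never takes a lock that the interrupted thread might already be holding.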
