We encountered a situation where a remote server's Linux kernel Oops'ed with the Big Kernel Lock acquired, and we used the following program to reboot that server:
#include <sys/io.h>

int main(void)
{
	iopl(3);
	outb(0xfe, 0x64);
	return 0;
}
Alternatively:
echo -en '\xfe' | dd of=/dev/port seek=100 bs=1 count=1
Both the program and the command write the byte 0xFE to I/O port 0x64 (decimal 100, hence seek=100), which asks the PC AT keyboard controller on the motherboard to pulse the reset line, as described, for example, here. This is indeed specific to PC-compatible x86 systems. The machine we actually used this trick on was fairly modern - it had a Supermicro X7DVL-E motherboard, which uses the Intel 5000V (Blackford VS) chipset and can run up to two quad-core Xeon CPUs. There's no longer a separate 8042 chip, yet the functionality remains.
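A slightly more defensive variant of the same trick is sketched below. It checks whether iopl() succeeded and waits (with a bound) for the controller's "input buffer full" status bit to clear before sending the command. The function names kbc_ready() and kbc_reset() and the retry limit are our own choices for illustration, and this version was not the one used on the server described here.

```c
#include <stdio.h>
#include <sys/io.h>

#define KBC_PORT        0x64  /* status register (read) / command register (write) */
#define KBC_STATUS_IBF  0x02  /* status bit 1: input buffer full, controller busy */
#define KBC_CMD_RESET   0xfe  /* command: pulse the reset line */

/* Pure helper: given a status byte, may a command be written now? */
int kbc_ready(unsigned char status)
{
	return (status & KBC_STATUS_IBF) == 0;
}

/*
 * Ask the keyboard controller to reset the machine.
 * Returns -1 on failure; does not return if the reset takes effect.
 */
int kbc_reset(void)
{
	int tries;

	/* Needs root (CAP_SYS_RAWIO); without it the port I/O below would fault. */
	if (iopl(3) != 0) {
		perror("iopl");
		return -1;
	}

	/* Bounded wait (arbitrary limit), so a stuck controller cannot hang us. */
	for (tries = 0; tries < 0x10000; tries++)
		if (kbc_ready(inb(KBC_PORT)))
			break;

	/* Send the command even if the wait timed out; there is little to lose. */
	outb(KBC_CMD_RESET, KBC_PORT);
	return -1;
}
```

Calling kbc_reset() from main() in place of the two calls in the original program gives the same effect on a healthy controller, plus an error message when the program is run without sufficient privileges.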
The “official” ways to reboot would not work because the reboot(2) syscall starts by trying to acquire the Big Kernel Lock, so it would get stuck. Yet, despite the Big Kernel Lock being held, it was possible to SSH into the server, stop most processes, and remount the filesystems read-only. All of this was with an OpenVZ kernel from their “rhel5” branch, and it shows that recent kernel versions make very little use of the Big Kernel Lock (they use fine-grained locking or data structures not requiring locking instead).
Also relevant is the fact that this specific Linux system had kernel.panic_on_oops set to 0. This is the default with mainstream Linux kernels, but Red Hat's kernels (and thus the official OpenVZ kernels based on them) change the default to 1. If kernel.panic_on_oops were 1 and kernel.panic were 0 (the default), the server would get stuck and we would not be able to recover it remotely on our own. On the other hand, if kernel.panic_on_oops were set to 1 and kernel.panic to non-zero, the server would reboot on its own (after the specified number of seconds).
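The three cases above can be summarized in a few lines of C. This is only a sketch of the decision table as described in this post: the names classify_oops() and read_sysctl_int() are hypothetical helpers, not part of any kernel or library API, and the classification follows the text above rather than the kernel source.

```c
#include <stdio.h>

enum oops_outcome {
	OOPS_CONTINUES,    /* panic_on_oops == 0: the kernel tries to carry on */
	OOPS_HANGS,        /* panic with kernel.panic == 0: stuck until a manual reset */
	OOPS_AUTO_REBOOTS  /* panic, then automatic reboot after the timeout */
};

/* Map the two sysctl values to what happens after an Oops. */
enum oops_outcome classify_oops(int panic_on_oops, int panic_timeout)
{
	if (!panic_on_oops)
		return OOPS_CONTINUES;
	return panic_timeout ? OOPS_AUTO_REBOOTS : OOPS_HANGS;
}

/* Read one integer sysctl via /proc/sys; returns 0 on success, -1 on error. */
int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

To check a live system, one would call read_sysctl_int() on /proc/sys/kernel/panic_on_oops and /proc/sys/kernel/panic and pass the two values to classify_oops(). A result of OOPS_HANGS on a remote server is the combination to avoid.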