Comment 6 for bug 1760450

Alan Jenkins (aj504) wrote : Re: Xorg crashed with signal 7 in _dl_fixup()

Hi Ubuntu users! Signal 7 is SIGBUS. SIGBUS should be relatively unusual on x86 [1].

[1] https://stackoverflow.com/questions/2089167/debugging-sigbus-on-x86-linux

I'm excited to inform you that Fedora Linux users also started seeing the same root problem. It is tied to the upgrade from kernel v4.14 to v4.15.

Fedora bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Arch Linux independently identified this as caused by the kernel upgrade:
https://bbs.archlinux.org/viewtopic.php?id=235027

It can happen after resume from suspend: not every time, but maybe once every three days. We have reports of both Xwayland and Xorg getting a fatal SIGBUS in _dl_fixup(). (This is actually a secondary crash in xorg_backtrace(); we have a load of SIGBUS traces that all share the same primary trace.)

Notice the specific faulting instruction in the disassembly you captured: it is not performing a memory access!

=> 0x559c102a4060 <ErrorFSigSafe>: sub $0xd8,%rsp

Instead, notice that this is the first instruction of the function ErrorFSigSafe. This is a big common factor in our traces. (We actually have several different traces captured, with the failing function varying, often along the same call chain.)

What's happening is a fault on the instruction fetch. You should be able to confirm this by looking at the address that generated the fault (the si_addr field of struct siginfo; I don't know where the Ubuntu crash collector saves this information).
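
As a rough illustration of that check, here is a minimal Python sketch that compares a faulting address against the instruction pointer. The function name, the 4 KiB page size and the example si_addr values are assumptions for illustration only; the report itself does not include si_addr. The only value taken from the trace is the address of ErrorFSigSafe.

```python
PAGE_SIZE = 4096      # assumption: 4 KiB pages, the usual x86-64 default
MAX_INSN_LEN = 15     # architectural maximum length of an x86 instruction


def is_instruction_fetch_fault(si_addr: int, rip: int) -> bool:
    """Return True if the faulting address points at the instruction being fetched."""
    if si_addr == rip:
        return True
    # An instruction that straddles a page boundary can fault on its second
    # page; the reported address is then expected to be the start of that page.
    return si_addr % PAGE_SIZE == 0 and rip < si_addr < rip + MAX_INSN_LEN


# Address of ErrorFSigSafe from the disassembly above. Both si_addr values
# below are hypothetical, since the crash report does not show the real one.
rip = 0x559C102A4060
print(is_instruction_fetch_fault(si_addr=0x559C102A4060, rip=rip))  # True
print(is_instruction_fetch_fault(si_addr=0x7F0012345678, rip=rip))  # False
```

If si_addr matches the instruction pointer like this, the fault came from fetching the code itself rather than from any data the instruction touches.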

The kernel failed to load the page which holds the program code at this point. That's the real problem: some sort of transient I/O error during wakeup. Users sometimes see other symptoms of these I/O errors as well:

PM: resume devices took 1.017 seconds
Restarting tasks ...
Read-error on swap-device (253:1:836184)
PM: suspend exit
systemd-coredump[755]: Process 1356 (Xwayland) of user 42 dumped core.

and

PM: suspend exit
EXT4-fs error (device dm-2): ext4_find_entry:1436: inode #5514052: comm thunderbird: reading directory lblock 0
Buffer I/O error on dev dm-2, logical block 0, lost sync page write
WARNING: CPU: 1 PID: 748 at fs/buffer.c:1108 mark_buffer_dirty+0xd4/0xe0
 (and a kernel backtrace)
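
If you want to check your own logs for the same symptoms, something like the following Python sketch could help. The marker strings are copied from the excerpts above; the helper names are made up for this example, and reading the three numbers in the swap message as major:minor:sector is my assumption about the kernel's message format.

```python
import re

# Substrings taken from the log excerpts above.
IO_ERROR_MARKERS = (
    "Read-error on swap-device",
    "Buffer I/O error",
    "EXT4-fs error",
)

# Assumption: the three numbers are major:minor:sector of the swap device.
SWAP_ERROR_RE = re.compile(r"Read-error on swap-device \((\d+):(\d+):(\d+)\)")


def find_io_errors(lines):
    """Return the log lines that contain one of the I/O error markers."""
    return [line for line in lines if any(m in line for m in IO_ERROR_MARKERS)]


def parse_swap_error(line):
    """Return (major, minor, sector) from a swap read error line, or None."""
    match = SWAP_ERROR_RE.search(line)
    return tuple(int(g) for g in match.groups()) if match else None


log = [
    "PM: resume devices took 1.017 seconds",
    "Restarting tasks ...",
    "Read-error on swap-device (253:1:836184)",
    "PM: suspend exit",
]
for line in find_io_errors(log):
    print(line, parse_swap_error(line))
```

You could feed it the lines from the kernel log around a resume; any hit close to a SIGBUS crash would point to the same I/O problem.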