It can happen after resuming from suspend; not every time, but maybe once every three days. We have reports of both Xwayland and Xorg getting a fatal SIGBUS in _dl_fixup(). (This is actually a secondary crash in xorg_backtrace(), but we have a load of SIGBUS traces that all share the same primary trace.)
Notice the specific faulting instruction in the disassembly you captured: it does not perform a memory access!
=> 0x559c102a4060 <ErrorFSigSafe>: sub $0xd8,%rsp
Instead, notice that this is the first instruction of the function ErrorFSigSafe. This is a strong common factor across our traces. (We have actually captured several different traces; the failing function varies, but is often along the same call chain.)
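What's happening is a fault on the instruction fetch. You should be able to confirm this by looking at the address that generated the fault (the si_addr field of struct siginfo; I don't know where the Ubuntu crash collector saves this information). For an instruction-fetch fault, si_addr should point at the faulting instruction itself, 0x559c102a4060 here, rather than at some data address.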
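To make the si_addr check concrete, here is a minimal C sketch, assuming x86-64 Linux with glibc (it reads RIP through the `REG_RIP` index of `ucontext_t`). The helper names are hypothetical and this is not Xorg's own crash handler. An x86-64 instruction is at most 15 bytes long, so a fault raised while fetching it reports an address within 15 bytes of RIP: exactly RIP unless the instruction straddles a page boundary and only its second page is missing.

```c
#define _GNU_SOURCE /* exposes REG_RIP in <sys/ucontext.h> on glibc */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

/* x86-64 instructions are at most 15 bytes long. */
#define MAX_INSN_LEN 15

/* Hypothetical helper: a fault taken while fetching the instruction at ip
   reports an address inside that instruction's bytes, [ip, ip + 15).
   A load or store normally reports a data address elsewhere. */
static int is_instruction_fetch_fault(uintptr_t fault_addr, uintptr_t ip)
{
    return fault_addr >= ip && fault_addr - ip < MAX_INSN_LEN;
}

#if defined(__x86_64__) && defined(__linux__)
/* Print "0x...\n" using only write(2), which is async-signal-safe
   (printf is not). */
static void write_hex(uintptr_t v)
{
    static const char digits[] = "0123456789abcdef";
    char buf[2 + 2 * sizeof v + 1];
    size_t n = sizeof buf;

    buf[--n] = '\n';
    do {
        buf[--n] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[--n] = 'x';
    buf[--n] = '0';
    ssize_t rc = write(STDERR_FILENO, buf + n, sizeof buf - n);
    (void)rc;
}

/* Hypothetical SIGBUS handler: classify the fault, print si_addr, then
   fall back to the default action so a core dump is still produced. */
static void report_sigbus(int sig, siginfo_t *info, void *uctx)
{
    static const char fetch[] = "SIGBUS on instruction fetch, si_addr=";
    static const char data[] = "SIGBUS on data access, si_addr=";
    const ucontext_t *uc = uctx;
    uintptr_t ip = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
    uintptr_t addr = (uintptr_t)info->si_addr;
    ssize_t rc;

    if (is_instruction_fetch_fault(addr, ip))
        rc = write(STDERR_FILENO, fetch, sizeof fetch - 1);
    else
        rc = write(STDERR_FILENO, data, sizeof data - 1);
    (void)rc;
    write_hex(addr);

    /* SIGBUS is blocked while this handler runs, so the re-raised signal
       stays pending and is delivered with SIG_DFL once the handler returns. */
    signal(sig, SIG_DFL);
    raise(sig);
}

static int install_sigbus_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = report_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
#endif
```

Calling `install_sigbus_reporter()` early in `main()` would log which kind of SIGBUS occurred, together with si_addr, before the process dumps core as usual.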
The kernel failed to read in the page that holds the program code at this address. That's the real problem: some sort of transient I/O error during wakeup. Users sometimes see other symptoms of these I/O errors as well:
PM: resume devices took 1.017 seconds
Restarting tasks ...
Read-error on swap-device (253:1:836184)
PM: suspend exit
systemd-coredump[755]: Process 1356 (Xwayland) of user 42 dumped core.
and
PM: suspend exit
EXT4-fs error (device dm-2): ext4_find_entry:1436: inode #5514052: comm thunderbird: reading directory lblock 0
Buffer I/O error on dev dm-2, logical block 0, lost sync page write
WARNING: CPU: 1 PID: 748 at fs/buffer.c:1108 mark_buffer_dirty+0xd4/0xe0
(and a kernel backtrace)
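The "page could not be read in, so the process gets SIGBUS" path can be reproduced without a flaky disk. A minimal sketch, assuming Linux: map one page of a file, then shrink the file so the mapped page lies past end-of-file. Touching it is a fault the kernel cannot satisfy, and it reports that as SIGBUS with si_code BUS_ADRERR, which is also how it reports a failed read of a file-backed page. Truncation is only a stand-in for the real I/O error, and the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t sigbus_code;

/* Record si_code and jump back instead of re-executing the faulting load. */
static void on_sigbus(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    sigbus_code = info->si_code;
    siglongjmp(recover_point, 1);
}

/* Hypothetical demo: returns the si_code of the SIGBUS raised by touching a
   mapped page the kernel cannot fill, 0 if no SIGBUS occurred, or -1 if the
   setup failed. */
static int sigbus_code_for_unbacked_page(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;

    int fd = fileno(f);
    void *map = MAP_FAILED;
    int result = -1;

    /* Map one page of a one-page file, then shrink the file to zero bytes:
       the mapped page now lies beyond end-of-file and cannot be filled. */
    if (ftruncate(fd, (off_t)page) == 0)
        map = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (map != MAP_FAILED && ftruncate(fd, 0) == 0) {
        volatile const char *p = map;
        struct sigaction sa, old;

        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_sigbus;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, &old);

        sigbus_code = 0;
        if (sigsetjmp(recover_point, 1) == 0)
            (void)p[0]; /* page fault the kernel cannot satisfy -> SIGBUS */
        result = sigbus_code;

        sigaction(SIGBUS, &old, NULL);
    }
    if (map != MAP_FAILED)
        munmap(map, page);
    fclose(f);
    return result;
}
```

The siglongjmp is there only so the demo can report and return; a real program should not try to carry on after losing a page of its own code.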
Hi Ubuntu users! Signal 7 is SIGBUS. SIGBUS should be relatively unusual on x86 [1].
[1] https://stackoverflow.com/questions/2089167/debugging-sigbus-on-x86-linux
I'm excited to inform you that Fedora Linux users have also started seeing the same root problem. It is tied to the upgrade from kernel v4.14 to v4.15.
Fedora bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1553979
Arch Linux independently identified this as caused by the kernel upgrade: https://bbs.archlinux.org/viewtopic.php?id=235027