Ryan,
Part 1)
------
First, please try to reproduce the problem later, rather than so early in boot,
by disabling the bcache module via the kernel boot parameters, and then
loading it after the system has booted successfully.
(This should be possible, as you mentioned the boot disk isn't involved.)
1) Edit '/etc/fstab' and either comment out the mounts that depend on bcache
or add the 'noauto' option to them, so that systemd doesn't delay on boot.
For example,
$ sudo vim /etc/fstab
From: /dev/mapper/*whatadisk* /mountpoint ext4 defaults 0 0
To: /dev/mapper/*whatadisk* /mountpoint ext4 defaults,noauto 0 0
Esc, :x, Enter
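If you'd rather not edit the file by hand, the same change can be sketched as a small filter. This is only an illustration: 'add_noauto' is a hypothetical helper name (not an existing tool), and it assumes the first fstab field is written exactly as you pass it. It prints the modified file to stdout and never touches /etc/fstab itself, so you can review the result first. Note that awk rewrites a changed line with single spaces between fields.

```shell
# Hypothetical helper: print FILE with ',noauto' appended to the options
# (4th field) of every entry whose device (1st field) equals DEVICE.
# Entries that already carry 'noauto' are left unchanged.
add_noauto() {
  awk -v dev="$1" '
    $1 == dev && $4 !~ /(^|,)noauto(,|$)/ { $4 = $4 ",noauto" }
    { print }
  ' "$2"
}
```

To preview the change against the real file:
$ add_noauto /dev/mapper/*whatadisk* /etc/fstab | diff /etc/fstab -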
2) Edit '/etc/default/grub' and add the 'modprobe.blacklist=bcache' option
to GRUB_CMDLINE_LINUX_DEFAULT.
For example,
$ sudo vim /etc/default/grub
From: GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0"
To: GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0 modprobe.blacklist=bcache"
Esc, :x, Enter
Update and check grub config:
$ sudo update-grub
$ grep modprobe.blacklist=bcache /boot/grub/grub.cfg
linux /boot/vmlinuz-4.15.0-91-generic ... modprobe.blacklist=bcache
linux /boot/vmlinuz-4.15.0-88-generic ... modprobe.blacklist=bcache
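If you want a per-entry view rather than just the matching lines, a sketch like the following may help. 'report_param' is a hypothetical name; it assumes the kernel lines in grub.cfg start with the word 'linux', as in the output above. Recovery-mode entries are expected to show 'no', since GRUB_CMDLINE_LINUX_DEFAULT is only applied to the normal entries.

```shell
# Hypothetical helper: for each 'linux' line in a grub.cfg, print the
# kernel image and whether PARAM appears as an exact boot argument.
report_param() {
  awk -v p="$1" '
    $1 == "linux" {
      found = "no"
      for (i = 3; i <= NF; i++) if ($i == p) found = "yes"
      print $2, found
    }
  ' "$2"
}
```

$ report_param modprobe.blacklist=bcache /boot/grub/grub.cfg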
3) Reboot the system into 4.15.0-91; it should not fail, as bcache is not loaded.
4) Now load bcache, retrigger device events, and check if the problem reproduces.
$ sudo modprobe bcache
$ sudo udevadm trigger
This should register the bcache devices, e.g., /dev/bcache0.
If you can see /dev/bcache0 and the problem did NOT happen,
please stop here and let me know.
If the problem reproduced, please proceed after your system
rebooted (it should boot normally as it has bcache disabled.)
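To check for the registered devices without guessing at names, something like this sketch could be used. 'bcache_devices' is a hypothetical helper; it just lists /dev/bcacheN nodes and returns non-zero if there are none. The 'udevadm settle' call in the usage below is my addition: it waits for the udev event queue to drain before the check.

```shell
# Hypothetical helper: list bcache device nodes under a directory
# (default /dev); return 1 if none are present.
bcache_devices() {
  devdir=${1:-/dev}
  found=1
  for node in "$devdir"/bcache[0-9]*; do
    if [ -e "$node" ]; then
      printf '%s\n' "$node"
      found=0
    fi
  done
  return "$found"
}
```

$ sudo modprobe bcache
$ sudo udevadm trigger
$ sudo udevadm settle
$ bcache_devices || echo "no bcache devices registered"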
...
Part 2)
------
1) Install linux-crashdump:
$ sudo apt install linux-crashdump
Answer these questions:
- Should kexec-tools handle reboots (sysvinit only)? No
- Should kdump-tools be enabled by default? Yes
2) Increase the reserved memory size for the crashdump kernel:
Edit '/etc/default/grub.d/kdump-tools.cfg' and change the crashkernel size from 192M to 512M or 768M if possible:
For example,
$ sudo vim /etc/default/grub.d/kdump-tools.cfg
from: GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:192M"
to: GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:768M"
Esc, :x, Enter
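For reference, 'crashkernel=512M-:192M' reads as "on systems with at least 512M of RAM, reserve 192M for the crash kernel", so only the value after the colon needs to change. A non-interactive version of the edit can be sketched as below. 'set_crashkernel_size' is a hypothetical helper, and it assumes the single open-ended 'START-:SIZE' form shown above (it will not handle a comma-separated list of ranges). It prints the result to stdout without modifying the file.

```shell
# Hypothetical helper: print FILE with the reserved size in a
# 'crashkernel=START-:SIZE' setting replaced by NEW_SIZE.
set_crashkernel_size() {
  new_size=$1
  sed -E "s/(crashkernel=[0-9]+[KMG]?-:)[0-9]+[KMG]?/\1${new_size}/" "$2"
}
```

$ set_crashkernel_size 768M /etc/default/grub.d/kdump-tools.cfg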
4) Update grub and reboot
$ sudo update-grub
$ sudo reboot
5) Check kdump status is 'ready' and that panic_on_oops is enabled (1) by default:
$ sudo kdump-config status
current state: ready to kdump
$ cat /proc/sys/kernel/panic_on_oops
1
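The two checks in this step can be combined into one small sketch. 'kdump_ready' is a hypothetical helper that takes the status text and the panic_on_oops value as arguments; it assumes the status line contains 'ready to kdump' when things are fine and 'Not ready' otherwise.

```shell
# Hypothetical helper: succeed only if the kdump status text reports
# "ready to kdump" (and not "Not ready ...") and panic_on_oops is 1.
kdump_ready() {
  case $1 in
    *[Nn]ot\ ready*) return 1 ;;
    *ready\ to\ kdump*) ;;
    *) return 1 ;;
  esac
  [ "$2" = 1 ]
}
```

$ kdump_ready "$(sudo kdump-config status)" "$(cat /proc/sys/kernel/panic_on_oops)" && echo OK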
6) Trigger a test crashdump
$ echo 1 | sudo tee /proc/sys/kernel/sysrq
$ echo c | sudo tee /proc/sysrq-trigger
This apparently 'reboots' the system, and collects a memory dump:
[ 8.510809] kdump-tools[781]: Starting kdump-tools: * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202004081540/dump-incomplet$
...
Copying data : [100.0 %] - eta: 0s
...
[ 15.964149] kdump-tools[781]: * kdump-tools: saved vmcore in /var/crash/202004081540
...
[ 16.176388] kdump-tools[781]: * kdump-tools: saved dmesg content in /var/crash/202004081540
...
[ 17.187848] kdump-tools[781]: Rebooting.
...
7) After the system boots again, check the crashdump is stored in /var/crash/<timestamp>
$ ls -1 /var/crash/202004081540
dmesg.202004081540
dump.202004081540
If this didn't happen, please stop and let me know, so we can fix the crashdump mechanism.
If you have /var/crash/<timestamp>, the crashdump is working; let's move forward.
Feel free to remove that directory:
$ sudo rm -rf /var/crash/<timestamp>
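The check in step 7 can also be sketched as a helper that finds the newest dump directory and confirms both files are there and non-empty. 'latest_crashdump' is a hypothetical name; it assumes the directory and file naming shown above (a numeric timestamp directory holding dmesg.<timestamp> and dump.<timestamp>).

```shell
# Hypothetical helper: print the newest timestamped directory under
# CRASHDIR (default /var/crash) if it holds non-empty dmesg.<ts> and
# dump.<ts> files; return 1 otherwise.
latest_crashdump() {
  crashdir=${1:-/var/crash}
  latest=
  # Glob results are sorted, so the last match is the newest timestamp.
  for entry in "$crashdir"/[0-9]*/; do
    [ -d "$entry" ] || continue
    latest=${entry%/}
  done
  [ -n "$latest" ] || return 1
  ts=${latest##*/}
  [ -s "$latest/dmesg.$ts" ] && [ -s "$latest/dump.$ts" ] || return 1
  printf '%s\n' "$latest"
}
```

$ latest_crashdump || echo "no complete crashdump found"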
...
8) Boot again and reproduce the problem.
Again, boot in 4.15.0-91, and reproduce the problem manually as in step 4 in Part 1.
And this should generate a crashdump in /var/crash, as in the test crashdump.
Please create a tarball and attach it to Launchpad.
$ sudo tar cvf lp1867916-crashdump.tar /var/crash/<timestamp>
If there are attachment size limit issues, please let me know, or use another hosting website, if at all possible.
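If it helps with the upload, the tarball step can be wrapped so that it also reports the archive size and a checksum, which makes it easy to confirm the file arrived intact wherever it ends up hosted. 'pack_crashdump' is a hypothetical helper, and the checksum is my suggestion rather than something Launchpad requires; it assumes 'sha256sum' from coreutils is available.

```shell
# Hypothetical helper: create a tarball of DUMPDIR at OUT, then print
# the archive size and its SHA-256 checksum.
pack_crashdump() {
  dumpdir=$1
  out=$2
  tar cf "$out" "$dumpdir" || return 1
  du -h "$out"
  sha256sum "$out"
}
```

$ sudo sh -c "$(command -v tar) cf lp1867916-crashdump.tar /var/crash/<timestamp>"
or, with the helper loaded in a root shell:
# pack_crashdump /var/crash/<timestamp> lp1867916-crashdump.tar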
Thank you very much,
Mauricio