hirsute:linux 5.11.0-45.49 fails to boot

Bug #1956984 reported by Kleber Sacilotto de Souza
This bug affects 1 person
Affects           Status      Importance   Assigned to     Milestone
linux (Ubuntu)    Invalid     Undecided    Unassigned
  Hirsute         Won't Fix   High         Kamal Mostafa

Bug Description

[ Impact ]

hirsute:linux 5.11.0-45.49 is not booting on several systems/instances.

In most cases, at least, the boot failure is caused by systemd services failing to start. On an amd64 test VM the journald service times out when it's configured to use any storage type (which is the default config).

[ Fix ]
I was able to boot a VM with the following commit reverted:

327aa2137d989f278ece7e8e31c218dfb3416c35 - mm: filemap: check if THP has hwpoisoned subpage for PMD page fault

[ Additional Info ]
On an affected VM, a bit more than 1 hour after booting, I get the following soft lockup, which helped identify a suspicious commit:

[12164.210808] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [systemd-journal:478]
[12164.211616] Modules linked in: binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev bochs_drm drm_vram_helper drm_ttm_helper input_leds ttm joydev drm_kms_helper cec rc_core serio_raw fb_sys_fops syscopyarea sysfillrect sysimgblt mac_hid parport_pc parport qemu_fw_cfg sch_fq_codel msr drm ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c virtio_net net_failover psmouse failover i2c_piix4 pata_acpi floppy
[12164.211649] CPU: 1 PID: 478 Comm: systemd-journal Tainted: G L 5.11.0-45-generic #49-Ubuntu
[12164.211651] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[12164.211653] RIP: 0010:xas_load+0x33/0x80
[12164.211659] Code: 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d 00 10 00 00 77 02 5d c3 0f b6 48 fe 48 8d 70 fe 38 4f 10 77 f1 48 8b 57 08 48 d3 ea <83> e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83
[12164.211661] RSP: 0000:ffffb17440c7fb68 EFLAGS: 00000246
[12164.211662] RAX: ffff92d2823ea6ca RBX: ffff92d282e52ee8 RCX: 0000000000000000
[12164.211664] RDX: 000000000000003b RSI: ffff92d2823ea6c8 RDI: ffffb17440c7fb78
[12164.211665] RBP: ffffb17440c7fb68 R08: ffff92d2823eb47a R09: ffff92d28674b750
[12164.211666] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000003b
[12164.211667] R13: 000000000000003b R14: ffff92d282e52ee8 R15: 61c8864680b583eb
[12164.211669] FS: 00007fdf28760900(0000) GS:ffff92d2fdc80000(0000) knlGS:0000000000000000
[12164.211670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12164.211672] CR2: 00007fdf28522018 CR3: 0000000002d34000 CR4: 00000000000006e0
[12164.211675] Call Trace:
[12164.211678] find_get_entry+0x5a/0x160
[12164.211682] find_lock_entry+0x2d/0x120
[12164.211685] shmem_getpage_gfp+0xe8/0x8c0
[12164.211687] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[12164.211691] shmem_fault+0x7b/0x220
[12164.211693] ? file_update_time+0x62/0x140
[12164.211696] __do_fault+0x3c/0xe0
[12164.211699] do_shared_fault+0x19/0xb0
[12164.211701] do_fault+0x1cd/0x1f0
[12164.211704] handle_pte_fault+0x1e5/0x260
[12164.211706] __handle_mm_fault+0x59a/0x7c0
[12164.211709] ? timerqueue_add+0x68/0xa0
[12164.211711] handle_mm_fault+0xd7/0x2b0
[12164.211714] do_user_addr_fault+0x1a0/0x450
[12164.211717] exc_page_fault+0x69/0x150
[12164.211720] ? asm_exc_page_fault+0x8/0x30
[12164.211722] asm_exc_page_fault+0x1e/0x30
[12164.211723] RIP: 0033:0x7fdf28fbece1
[12164.211727] Code: 0f 11 5f 30 0f 11 64 17 f0 0f 11 6c 17 e0 0f 11 74 17 d0 0f 11 7c 17 c0 c3 0f 10 06 0f 10 4e 10 0f 10 54 16 f0 0f 10 5c 16 e0 <0f> 11 07 0f 11 4f 10 0f 11 54 17 f0 0f 11 5c 17 e0 c3 48 39 f7 0f
[12164.211728] RSP: 002b:00007fffd18bffd8 EFLAGS: 00010287
[12164.211730] RAX: 00007fdf28522018 RBX: 00007fffd18c0520 RCX: 00007fdf28521fd8
[12164.211731] RDX: 000000000000002b RSI: 00007fffd18c0520 RDI: 00007fdf28522018
[12164.211732] RBP: 00005643719d90c0 R08: 0000000000000001 R09: 000000000003afd8
[12164.211733] R10: 0000000000000002 R11: 00007fffd1947090 R12: 000000000000002b
[12164.211734] R13: 23e0650ff2b8eb94 R14: 0000000000000000 R15: 00007fffd18c0000
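
For anyone triaging similar reports, a minimal Python sketch for pulling the kernel call-trace symbols out of a soft lockup log like the one above. The function name extract_call_trace and the regular expression are illustrative assumptions, not part of any kernel or Ubuntu tooling, and the parser only handles the "[timestamp] symbol+0xoff/0xlen" frame layout seen in this report. Frames prefixed with "?" are flagged as unreliable, following the usual kernel convention.

```python
import re

# One call-trace frame, e.g. "[12164.211678] find_get_entry+0x5a/0x160".
# A leading "? " marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+$"
)

# A few lines copied from the log in this report.
SAMPLE = """
[12164.211675] Call Trace:
[12164.211678] find_get_entry+0x5a/0x160
[12164.211682] find_lock_entry+0x2d/0x120
[12164.211685] shmem_getpage_gfp+0xe8/0x8c0
[12164.211687] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[12164.211691] shmem_fault+0x7b/0x220
[12164.211723] RIP: 0033:0x7fdf28fbece1
"""


def extract_call_trace(log_text):
    """Return (symbol, reliable) pairs for the frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.match(line)
        if match is None:
            # The first non-frame line (such as the user-space RIP) ends the trace.
            break
        frames.append((match.group("symbol"), match.group("unreliable") is None))
    return frames


if __name__ == "__main__":
    for symbol, reliable in extract_call_trace(SAMPLE):
        print(symbol if reliable else symbol + " (unreliable)")
```

Run against the full log, the reliable frames lead from exc_page_fault through shmem_fault down to find_get_entry, with the CPU stuck in xas_load, which is why the page-cache lookup path looked suspicious.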

Changed in linux (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Hirsute):
status: New → Confirmed
Kamal Mostafa (kamalmostafa) wrote :

This is caused by a bogus backport of a patch from 5.14.y:

[hirsute] 327aa2137d98 mm: filemap: check if THP has hwpoisoned subpage for PMD page fault

The backport contains a fragment that was applied to the wrong function. A corrected patch is on the way.

Changed in linux (Ubuntu Hirsute):
assignee: nobody → Kamal Mostafa (kamalmostafa)
importance: Undecided → High
Kamal Mostafa (kamalmostafa) wrote :
Changed in linux (Ubuntu Hirsute):
status: Confirmed → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.11.0-47.52 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Kleber Sacilotto de Souza (kleber-souza) wrote :

Boot tests with hirsute/linux 5.11.0-47.52 were all successful. Marking as verified.
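
When repeating a boot test like this, it helps to confirm that the VM really booted the kernel from -proposed and not the failing one. A minimal Python sketch, under stated assumptions: the helper names parse_release and at_least_verified_abi are hypothetical, and the mapping from package version 5.11.0-47.52 to a release string such as "5.11.0-47-generic" is inferred from the "5.11.0-45-generic #49-Ubuntu" line in the log in the bug description. The check says nothing about ABI numbers this report does not mention.

```python
import re

# Release strings as printed in the soft lockup log, e.g. "5.11.0-45-generic".
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-([\w.-]+)$")


def parse_release(release):
    """Split a release string into ((major, minor, patch, abi), flavour)."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    numbers = tuple(int(group) for group in match.groups()[:4])
    return numbers, match.group(5)


def at_least_verified_abi(release, verified=(5, 11, 0, 47)):
    """True if release is in the same 5.11.0 series and at or above ABI 47.

    ABI 47 is assumed to correspond to the 5.11.0-47.52 kernel that
    was boot-tested in this report.
    """
    numbers, _flavour = parse_release(release)
    same_series = numbers[:3] == verified[:3]
    return same_series and numbers[3] >= verified[3]
```

On a running system the release string can be taken from platform.release() in the standard library, for example at_least_verified_abi(platform.release()).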

tags: added: verification-done-hirsute
removed: verification-needed-hirsute
Brian Murray (brian-murray) wrote :

The Hirsute Hippo has reached End of Life, so this bug will not be fixed for that release.

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Won't Fix
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oracle-5.11/5.11.0-1028.31~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Brian Murray (brian-murray) wrote : [linux-aws-5.11/focal] verification still needed

The fix for this bug has been awaiting testing feedback in the -proposed repository for focal for more than 90 days. Please test this fix and update the bug appropriately with the results. In the event that the fix for this bug is still not verified 15 days from now, the package will be removed from the -proposed repository.

tags: added: removal-candidate