Frequent (~every 24h) BUG: soft lockup - CPU#XX stuck for 23s! [kworker/17:2:11758]

Asked by Brad Krane

I have a frequent CPU lockup issue. It affects a random CPU core and happens roughly every 24 hours. I have to hard power off the system to recover. It occurs both under heavy use and at idle with Firefox and terminal processes running.
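
One way to check the "roughly every 24 hours" pattern is to compute the gaps between the `Date:` fields of the reports in /var/crash. A minimal Python sketch, assuming the `Date:` format shown in the report below and the default C locale for day and month names. The second timestamp is a hypothetical placeholder, not a real crash:

```python
from datetime import datetime

# Format of the "Date:" field in the apport report under /var/crash,
# e.g. "Mon Aug 30 21:18:57 2021" (assumes the C locale for %a/%b).
DATE_FMT = "%a %b %d %H:%M:%S %Y"


def lockup_intervals_hours(dates):
    """Return the gaps, in hours, between consecutive crash-report dates."""
    stamps = sorted(datetime.strptime(d, DATE_FMT) for d in dates)
    return [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(stamps, stamps[1:])
    ]


if __name__ == "__main__":
    # First date comes from the report below; the second is a hypothetical
    # placeholder used only to show the calculation.
    dates = ["Mon Aug 30 21:18:57 2021", "Tue Aug 31 20:05:12 2021"]
    for gap in lockup_intervals_hours(dates):
        print(f"{gap:.1f} h between lockups")
```

With these inputs it prints a gap of about 22.8 h. Feeding in the real dates from every report would show how regular the interval actually is.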

I suspect this is a BIOS issue, but I have not yet ruled out software by running a different OS to see whether the issue occurs there as well. I should get to that in the next week or two.

System specifics:
Ryzen 9 5900X (Vermeer)
X570 AORUS ELITE rev 1.0, BIOS F35 (latest firmware)
RAM and CPU combination is on the QVL

One of the reports from /var/crash:

ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Mon Aug 30 21:18:57 2021
Failure: oops
 watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:1:143408]
 Modules linked in: joydev input_leds snd_hda_codec_realtek intel_rapl_msr snd_hda_codec_generic intel_rapl_common ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep soundwire_bus snd_soc_core edac_mce_amd nouveau snd_compress ac97_bus snd_pcm_dmaengine kvm_amd nls_iso8859_1 snd_pcm kvm mxm_wmi snd_seq_midi drm_ttm_helper snd_seq_midi_event ttm crct10dif_pclmul ghash_clmulni_intel snd_rawmidi drm_kms_helper snd_seq cec snd_seq_device aesni_intel snd_timer rc_core crypto_simd fb_sys_fops snd syscopyarea cryptd sysfillrect glue_helper sysimgblt rapl video efi_pstore wmi_bmof soundcore ccp k10temp sch_fq_codel mac_hid hwmon_vid parport_pc ppdev lp parport drm ip_tables x_tables autofs4 hid_generic usbhid hid uas usb_storage crc32_pclmul igb ahci i2c_algo_bit i2c_piix4 dca xhci_pci libahci xhci_pci_renesas nvme nvme_core wmi
 CPU: 7 PID: 143408 Comm: kworker/7:1 Tainted: G D W L 5.11.0-27-generic #29~20.04.1-Ubuntu
 Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F35 07/08/2021
 Workqueue: events free_work
 RIP: 0010:smp_call_function_many_cond+0x266/0x2b0
 Code: e8 af 7b 49 00 3b 05 2d 22 80 01 89 c7 0f 83 37 fe ff ff 48 63 c7 49 8b 0f 48 03 0c c5 00 69 4a b2 8b 41 08 a8 01 74 0a f3 90 <8b> 51 08 83 e2 01 75 f6 eb c8 48 c7 c2 20 8f 95 b2 4c 89 f6 89 df
 RSP: 0018:ffffb5d411c47c90 EFLAGS: 00000202
 RAX: 0000000000000011 RBX: 0000000000000017 RCX: ffff8de67edf3540
 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000f
 RBP: ffffb5d411c47ce0 R08: 0000000000000000 R09: 000000000000000f
 R10: ffff8dc7c0394f40 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000020 R15: ffff8de67ebed2c0
 FS: 0000000000000000(0000) GS:ffff8de67ebc0000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f0c16ffbda0 CR3: 000000191d410000 CR4: 0000000000750ee0
 PKRU: 55555554
 Call Trace:

Package: linux-image-5.11.0-27-generic 5.11.0-27.29~20.04.1
SourcePackage: linux
Tags: kernel-oops
Uname: Linux 5.11.0-27-generic x86_64
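
When comparing several reports, it may help to pull the same fields out of each one: the stuck CPU, the stall duration, the task, the taint flags and the symbol in RIP. The sketch below is a hypothetical helper (`summarize_oops` is not part of apport or any existing tool). Its regular expressions are written against the 5.11-era lines shown above and may need adjusting for other kernels, and the taint table covers only a few letters from the kernel's taint documentation. For this report the flags decode as G (only GPL-compatible modules loaded), D (kernel died recently), W (a warning was issued) and L (soft lockup occurred). RIP points into `smp_call_function_many_cond`, the function that sends cross-CPU function calls and can wait for them to complete, so finding the same symbol in every report would suggest the stuck core is repeatedly waiting on another CPU.

```python
import re

# Patterns for three lines of the oops text; tailored to the 5.11-era
# layout in this report (an assumption for other kernel versions).
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<comm>[^\]]+)\]"
)
TAINT_RE = re.compile(r"Tainted: (?P<flags>[A-Z ]+?) \d")
RIP_RE = re.compile(
    r"RIP: [0-9a-f]+:(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Partial table of taint letters (see the kernel's tainted-kernels docs).
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "G": "only GPL-compatible modules loaded",
    "D": "kernel died recently (OOPS or BUG)",
    "W": "kernel issued a warning",
    "L": "soft lockup occurred",
    "O": "out-of-tree module loaded",
    "E": "unsigned module loaded",
}

# Lines copied from the report above.
SAMPLE = """\
 watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kworker/7:1:143408]
 CPU: 7 PID: 143408 Comm: kworker/7:1 Tainted: G D W L 5.11.0-27-generic #29~20.04.1-Ubuntu
 RIP: 0010:smp_call_function_many_cond+0x266/0x2b0
"""


def summarize_oops(text):
    """Extract CPU, stall time, task, taint flags and RIP symbol from oops text."""
    summary = {}
    lockup = LOCKUP_RE.search(text)
    if lockup:
        summary["cpu"] = int(lockup["cpu"])
        summary["stuck_secs"] = int(lockup["secs"])
        summary["comm"] = lockup["comm"]  # "comm:pid" as printed by the watchdog
    taint = TAINT_RE.search(text)
    if taint:
        summary["taint"] = [
            (flag, TAINT_FLAGS.get(flag, "see kernel taint docs"))
            for flag in taint["flags"].split()
        ]
    rip = RIP_RE.search(text)
    if rip:
        # (symbol, offset into symbol, symbol size), both numbers in bytes
        summary["rip"] = (rip["func"], int(rip["off"], 16), int(rip["size"], 16))
    return summary


if __name__ == "__main__":
    for key, value in summarize_oops(SAMPLE).items():
        print(f"{key}: {value}")
```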

For future troubleshooting, I'll also do the step below, but I don't have the time right now, so it will probably be in a week or two (or three):

Finally, it's critical to also make sure to test the latest development Ubuntu kernel version as well as the latest upstream mainline kernel.
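
Choosing a mainline build to compare against starts with knowing which upstream series the installed kernel is based on. Assuming Ubuntu's usual package-version layout (upstream version, ABI number, upload number, optional `~` backport suffix), the `Package:` line above, `5.11.0-27.29~20.04.1`, is a 5.11-series kernel backported to 20.04. A small parser for that layout; `parse_ubuntu_kernel_version` is a hypothetical, illustrative helper:

```python
import re

# Assumed layout of an Ubuntu kernel package version such as
# "5.11.0-27.29~20.04.1": upstream version, ABI number, upload number,
# and an optional "~" backport suffix.
UBUNTU_KVER_RE = re.compile(
    r"^(?P<upstream>\d+\.\d+\.\d+)-(?P<abi>\d+)\.(?P<upload>\d+)"
    r"(?:~(?P<backport>[\w.]+))?$"
)


def parse_ubuntu_kernel_version(version):
    """Split an Ubuntu kernel package version into its components."""
    m = UBUNTU_KVER_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized Ubuntu kernel version: {version!r}")
    major, minor, _patch = m["upstream"].split(".")
    return {
        "upstream_series": f"{major}.{minor}",  # series to compare with mainline
        "abi": int(m["abi"]),
        "upload": int(m["upload"]),
        "backport": m["backport"],  # None when there is no "~" suffix
    }


if __name__ == "__main__":
    print(parse_ubuntu_kernel_version("5.11.0-27.29~20.04.1"))
```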

What else can I do to help root-cause (RCA) this?
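
One detail that can be checked locally: the kernel's lockup-watchdog documentation defines the soft-lockup threshold as 2 * `watchdog_thresh` seconds, and `watchdog_thresh` defaults to 10. The reported 22 s and 23 s stalls are therefore consistent with the default 20 s threshold. A sketch that reads the live value from /proc/sys/kernel/watchdog_thresh; falling back to the default when the file cannot be read is an assumption of this hypothetical helper:

```python
from pathlib import Path

# Sysctl holding the watchdog threshold; per the kernel's lockup-watchdog
# docs a soft lockup is reported after 2 * watchdog_thresh seconds.
WATCHDOG_THRESH = Path("/proc/sys/kernel/watchdog_thresh")


def soft_lockup_threshold(watchdog_thresh=None):
    """Return the soft-lockup reporting threshold in seconds."""
    if watchdog_thresh is None:
        try:
            watchdog_thresh = int(WATCHDOG_THRESH.read_text())
        except (OSError, ValueError):
            watchdog_thresh = 10  # kernel default, assumed if /proc is unreadable
    return 2 * watchdog_thresh


def exceeds_threshold(stuck_secs, watchdog_thresh=None):
    """True if a reported stall is longer than the soft-lockup threshold."""
    return stuck_secs > soft_lockup_threshold(watchdog_thresh)


if __name__ == "__main__":
    limit = soft_lockup_threshold()
    for secs in (22, 23):  # durations from the reports above
        print(f"{secs}s stuck, threshold {limit}s, over: {exceeds_threshold(secs)}")
```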



This question was expired because it remained in the 'Open' state without activity for the last 15 days.