CPU hangs with kernel versions 5.4.0-{89, 90}

Asked by Julian Wendelmuth

Hello,
We are currently experiencing issues with kernel versions 5.4.0-89 and 5.4.0-90 on our hardware.
Some context: we use Ubuntu 20.04 as the basis for mobile robots that are connected to WiFi and are expected to move. With kernels 5.4.0-89 and 5.4.0-90 we lose connectivity to our hardware.
Looking into the logs, we find that our CPUs apparently hang:

(Computer name redacted)
Nov 09 16:38:16 * kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [wpa_supplicant:470]
Nov 09 16:38:16 * kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [swapper/2:0]
Nov 09 16:38:16 * kernel: Modules linked in: can_raw can xsk_diag
Nov 09 16:38:16 * kernel: Modules linked in:
Nov 09 16:38:16 * kernel: af_packet_diag netlink_diag
Nov 09 16:38:16 * kernel: can_raw
Nov 09 16:38:16 * kernel: tcp_diag udp_diag
Nov 09 16:38:16 * kernel: can
Nov 09 16:38:16 * kernel: raw_diag
Nov 09 16:38:16 * kernel: xsk_diag
Nov 09 16:38:16 * kernel: inet_diag
Nov 09 16:38:16 * kernel: af_packet_diag
Nov 09 16:38:16 * kernel: unix_diag nft_ct
Nov 09 16:38:16 * kernel: netlink_diag
Nov 09 16:38:16 * kernel: nf_tables_set
Nov 09 16:38:16 * kernel: tcp_diag
Nov 09 16:38:16 * kernel: ccm
Nov 09 16:38:16 * kernel: udp_diag
Nov 09 16:38:16 * kernel: ath9k
Nov 09 16:38:16 * kernel: raw_diag
Nov 09 16:38:16 * kernel: ath9k_common
Nov 09 16:38:16 * kernel: inet_diag
Nov 09 16:38:16 * kernel: ath9k_hw
Nov 09 16:38:16 * kernel: unix_diag
Nov 09 16:38:16 * kernel: ath
Nov 09 16:38:16 * kernel: nft_ct
Nov 09 16:38:16 * kernel: intel_rapl_msr
Nov 09 16:38:16 * kernel: nf_tables_set
Nov 09 16:38:16 * kernel: intel_rapl_common
Nov 09 16:38:16 * kernel: ccm
Nov 09 16:38:16 * kernel: x86_pkg_temp_thermal
Nov 09 16:38:16 * kernel: ath9k
Nov 09 16:38:16 * kernel: intel_powerclamp mei_hdcp
Nov 09 16:38:16 * kernel: ath9k_common
Nov 09 16:38:16 * kernel: binfmt_misc snd_hda_codec_hdmi
Nov 09 16:38:16 * kernel: ath9k_hw
Nov 09 16:38:16 * kernel: coretemp
Nov 09 16:38:16 * kernel: ath
Nov 09 16:38:16 * kernel: mac80211 snd_hda_codec_realtek
Nov 09 16:38:16 * kernel: intel_rapl_msr
Nov 09 16:38:16 * kernel: kvm_intel nls_iso8859_1
Nov 09 16:38:16 * kernel: intel_rapl_common
Nov 09 16:38:16 * kernel: snd_hda_codec_generic ledtrig_audio
Nov 09 16:38:16 * kernel: x86_pkg_temp_thermal
Nov 09 16:38:16 * kernel: kvm
Nov 09 16:38:16 * kernel: intel_powerclamp
Nov 09 16:38:16 * kernel: advsocketcan(OE) sja1000 snd_hda_intel
Nov 09 16:38:16 * kernel: mei_hdcp
Nov 09 16:38:16 * kernel: snd_intel_dspcfg snd_hda_codec
Nov 09 16:38:16 * kernel: binfmt_misc
Nov 09 16:38:16 * kernel: rapl cfg80211
Nov 09 16:38:16 * kernel: snd_hda_codec_hdmi
Nov 09 16:38:16 * kernel: intel_cstate snd_hda_core
Nov 09 16:38:16 * kernel: coretemp
Nov 09 16:38:16 * kernel: snd_hwdep
Nov 09 16:38:16 * kernel: mac80211
Nov 09 16:38:16 * kernel: snd_pcm
Nov 09 16:38:16 * kernel: snd_hda_codec_realtek kvm_intel nls_iso8859_1
Nov 09 16:38:16 * kernel: libarc4
Nov 09 16:38:16 * kernel: snd_hda_codec_generic ledtrig_audio kvm
Nov 09 16:38:16 * kernel: snd_timer
Nov 09 16:38:16 * kernel: advsocketcan(OE) sja1000 snd_hda_intel
Nov 09 16:38:16 * kernel: serio_raw
Nov 09 16:38:16 * kernel: snd_intel_dspcfg snd_hda_codec rapl
Nov 09 16:38:16 * kernel: can_dev
Nov 09 16:38:16 * kernel: cfg80211 intel_cstate snd_hda_core
Nov 09 16:38:16 * kernel: ftdi_sio
Nov 09 16:38:16 * kernel: snd_hwdep snd_pcm libarc4
Nov 09 16:38:16 * kernel: snd
Nov 09 16:38:16 * kernel: snd_timer serio_raw can_dev
Nov 09 16:38:16 * kernel: mei_me
Nov 09 16:38:16 * kernel: ftdi_sio snd mei_me
Nov 09 16:38:16 * kernel: usbserial
Nov 09 16:38:16 * kernel: usbserial mei
Nov 09 16:38:16 * kernel: mei
Nov 09 16:38:16 * kernel: soundcore intel_pch_thermal
Nov 09 16:38:16 * kernel: soundcore
Nov 09 16:38:16 * kernel: acpi_pad
Nov 09 16:38:16 * kernel: intel_pch_thermal
Nov 09 16:38:16 * kernel: mac_hid
Nov 09 16:38:16 * kernel: acpi_pad mac_hid
Nov 09 16:38:16 * kernel: nft_masq
Nov 09 16:38:16 * kernel: nft_masq nft_chain_nat
Nov 09 16:38:16 * kernel: nft_chain_nat
Nov 09 16:38:16 * kernel: nf_nat nf_conntrack nf_defrag_ipv6
Nov 09 16:38:16 * kernel: nf_nat
Nov 09 16:38:16 * kernel: nf_defrag_ipv4 nf_tables
Nov 09 16:38:16 * kernel: nf_conntrack
Nov 09 16:38:16 * kernel: nfnetlink sch_fq_codel
Nov 09 16:38:16 * kernel: nf_defrag_ipv6
Nov 09 16:38:16 * kernel: ip_tables x_tables
Nov 09 16:38:16 * kernel: nf_defrag_ipv4
Nov 09 16:38:16 * kernel: autofs4 btrfs
Nov 09 16:38:16 * kernel: nf_tables
Nov 09 16:38:16 * kernel: xor zstd_compress
Nov 09 16:38:16 * kernel: nfnetlink
Nov 09 16:38:16 * kernel: raid6_pq libcrc32c
Nov 09 16:38:16 * kernel: sch_fq_codel
Nov 09 16:38:16 * kernel: hid_generic usbhid
Nov 09 16:38:16 * kernel: ip_tables
Nov 09 16:38:16 * kernel: hid i915
Nov 09 16:38:16 * kernel: x_tables
Nov 09 16:38:16 * kernel: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
Nov 09 16:38:16 * kernel: autofs4
Nov 09 16:38:16 * kernel: aesni_intel drm_kms_helper crypto_simd
Nov 09 16:38:16 * kernel: btrfs
Nov 09 16:38:16 * kernel: syscopyarea sysfillrect igb
Nov 09 16:38:16 * kernel: xor
Nov 09 16:38:16 * kernel: sysimgblt cryptd
Nov 09 16:38:16 * kernel: zstd_compress
Nov 09 16:38:16 * kernel: fb_sys_fops glue_helper
Nov 09 16:38:16 * kernel: raid6_pq
Nov 09 16:38:16 * kernel: psmouse e1000e drm
Nov 09 16:38:16 * kernel: libcrc32c
Nov 09 16:38:16 * kernel: dca i2c_algo_bit
Nov 09 16:38:16 * kernel: hid_generic
Nov 09 16:38:16 * kernel: ahci i2c_i801
Nov 09 16:38:16 * kernel: usbhid
Nov 09 16:38:16 * kernel: libahci video
Nov 09 16:38:16 * kernel: hid
Nov 09 16:38:16 * kernel: i915 crct10dif_pclmul
Nov 09 16:38:16 * kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G OE 5.4.0-90-generic #101-Ubuntu
Nov 09 16:38:16 * kernel: crc32_pclmul ghash_clmulni_intel aesni_intel
Nov 09 16:38:16 * kernel: Hardware name: Default string Default string/SKYBAY, BIOS 5.12 03/28/2017
Nov 09 16:38:16 * kernel: drm_kms_helper crypto_simd syscopyarea sysfillrect
Nov 09 16:38:16 * kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x60/0x1d0
Nov 09 16:38:16 * kernel: igb sysimgblt cryptd fb_sys_fops glue_helper
Nov 09 16:38:16 * kernel: Code: 6e f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 48 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 >
Nov 09 16:38:16 * kernel: psmouse e1000e drm
Nov 09 16:38:16 * kernel: RSP: 0018:ffffb70ac0168de8 EFLAGS: 00000202
Nov 09 16:38:16 * kernel: dca i2c_algo_bit ahci
Nov 09 16:38:16 * kernel: ORIG_RAX: ffffffffffffff13
Nov 09 16:38:16 * kernel: i2c_i801 libahci video
Nov 09 16:38:16 * kernel: RAX: 0000000000000101 RBX: ffff8a3e163c0018 RCX: 0000000000000000
Nov 09 16:38:16 * kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8a3e10613310
Nov 09 16:38:16 * kernel: CPU: 4 PID: 470 Comm: wpa_supplicant Tainted: G OE 5.4.0-90-generic #101-Ubuntu
Nov 09 16:38:16 * kernel: Hardware name: Default string Default string/SKYBAY, BIOS 5.12 03/28/2017
Nov 09 16:38:16 * kernel: RBP: ffffb70ac0168de8 R08: 0000000000000040 R09: 0000000000000000
Nov 09 16:38:16 * kernel: RIP: 0010:ath9k_txq_has_key+0x1bd/0x200 [ath9k]
Nov 09 16:38:16 * kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffb70ac0168e30
Nov 09 16:38:16 * kernel: R13: 0000000000000003 R14: ffff8a3e10611e80 R15: ffff8a3e106132f0
Nov 09 16:38:16 * kernel: FS: 0000000000000000(0000) GS:ffff8a3e1da80000(0000) knlGS:0000000000000000
Nov 09 16:38:16 * kernel: Code: e0 04 49 8b 44 05 10 48 39 c6 74 26 0f b6 58 53 84 db 75 16 48 8b 48 20 48 85 c9 74 0d 0f b6 49 4b 41 39 c9 0f 84 6e ff ff ff <48> 8b 00 48 39 c6 75 da 41 83 c3 01 41 83 >
Nov 09 16:38:16 * kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 09 16:38:16 * kernel: CR2: 0000147ca80100b8 CR3: 000000080fb06005 CR4: 00000000003606e0
Nov 09 16:38:16 * kernel: RSP: 0018:ffffb70ac06c7628 EFLAGS: 00000297
Nov 09 16:38:16 * kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 09 16:38:16 * kernel: ORIG_RAX: ffffffffffffff13
Nov 09 16:38:16 * kernel: RAX: ffff8a3e173030d8 RBX: 0000000000000000 RCX: 00000000000000ff
Nov 09 16:38:16 * kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 09 16:38:16 * kernel: RDX: 0000000000000001 RSI: ffff8a3e10613330 RDI: ffff8a3e10613310
Nov 09 16:38:16 * kernel: Call Trace:
Nov 09 16:38:16 * kernel: RBP: ffffb70ac06c7670 R08: ffff8a3dcf98ea30 R09: 0000000000000004
Nov 09 16:38:16 * kernel: R10: 0000000000000027 R11: 0000000000000001 R12: 0000000000000003
Nov 09 16:38:16 * kernel: R13: ffff8a3e10611e80 R14: 000000000000014a R15: ffff8a3e10613300
Nov 09 16:38:16 * kernel: <IRQ>
Nov 09 16:38:16 * kernel: FS: 00007f321b1ab140(0000) GS:ffff8a3e1db00000(0000) knlGS:0000000000000000
Nov 09 16:38:16 * kernel: _raw_spin_lock_bh+0x29/0x30
Nov 09 16:38:16 * kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 09 16:38:16 * kernel: ath_tx_edma_tasklet+0x164/0x3c0 [ath9k]
Nov 09 16:38:16 * kernel: ? check_preempt_curr+0x4e/0x90
Nov 09 16:38:16 * kernel: CR2: 00005607ad38f000 CR3: 000000080fb06006 CR4: 00000000003606e0
Nov 09 16:38:16 * kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 09 16:38:16 * kernel: ? ath9k_ioread32+0x33/0x90 [ath9k]
Nov 09 16:38:16 * kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 09 16:38:16 * kernel: ath9k_tasklet+0x141/0x280 [ath9k]
Nov 09 16:38:16 * kernel: Call Trace:
Nov 09 16:38:16 * kernel: tasklet_action_common.isra.0+0x60/0x110
Nov 09 16:38:16 * kernel: tasklet_action+0x22/0x30
Nov 09 16:38:16 * kernel: ath9k_set_key+0xf5/0x290 [ath9k]
Nov 09 16:38:16 * kernel: __do_softirq+0xe1/0x2d6
Nov 09 16:38:16 * kernel: ieee80211_key_replace+0x370/0x870 [mac80211]
Nov 09 16:38:16 * kernel: irq_exit+0xae/0xb0
Nov 09 16:38:16 * kernel: ieee80211_free_sta_keys+0xb3/0xf0 [mac80211]
Nov 09 16:38:16 * kernel: do_IRQ+0x5a/0xf0
Nov 09 16:38:16 * kernel: __sta_info_destroy_part2+0x3a/0x190 [mac80211]
Nov 09 16:38:16 * kernel: __sta_info_flush+0x128/0x180 [mac80211]
Nov 09 16:38:16 * kernel: common_interrupt+0xf/0xf
Nov 09 16:38:16 * kernel: </IRQ>
Nov 09 16:38:16 * kernel: ieee80211_set_disassoc+0xc0/0x5f0 [mac80211]
Nov 09 16:38:16 * kernel: RIP: 0010:cpuidle_enter_state+0xc5/0x450
Nov 09 16:38:16 * kernel: ieee80211_mgd_auth+0x15b/0x3d0 [mac80211]
Nov 09 16:38:16 * kernel: Code: ff e8 af cf 84 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 c2 d6 8a ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b >
Nov 09 16:38:16 * kernel: ieee80211_auth+0x18/0x20 [mac80211]
Nov 09 16:38:16 * kernel: RSP: 0018:ffffb70ac00cfe38 EFLAGS: 00000246
Nov 09 16:38:16 * kernel: cfg80211_mlme_auth+0x104/0x210 [cfg80211]
Nov 09 16:38:16 * kernel: ORIG_RAX: ffffffffffffffdc
Nov 09 16:38:16 * kernel: RAX: ffff8a3e1daaae00 RBX: ffffffffbc959f60 RCX: 000000000000001f
Nov 09 16:38:16 * kernel: RDX: 0000000000000000 RSI: 000000002c13bf1e RDI: 0000000000000000
Nov 09 16:38:16 * kernel: nl80211_authenticate+0x284/0x2e0 [cfg80211]
Nov 09 16:38:16 * kernel: RBP: ffffb70ac00cfe78 R08: 000004fc6f412665 R09: 000000000000017a
Nov 09 16:38:16 * kernel: R10: ffff8a3e1daa9b00 R11: ffff8a3e1daa9ae0 R12: ffff8a3e1dab6700
Nov 09 16:38:16 * kernel: R13: 0000000000000001 R14: 0000000000000001 R15: ffff8a3e1dab6700
Nov 09 16:38:16 * kernel: genl_family_rcv_msg+0x1b9/0x470
Nov 09 16:38:16 * kernel: ? cpuidle_enter_state+0xa1/0x450
Nov 09 16:38:16 * kernel: cpuidle_enter+0x2e/0x40
Nov 09 16:38:16 * kernel: ? __netlink_sendskb+0x42/0x50
Nov 09 16:38:16 * kernel: call_cpuidle+0x23/0x40
Nov 09 16:38:16 * kernel: genl_rcv_msg+0x4c/0xa0
Nov 09 16:38:16 * kernel: ? _cond_resched+0x19/0x30
Nov 09 16:38:16 * kernel: do_idle+0x1dd/0x270
Nov 09 16:38:16 * kernel: ? genl_family_rcv_msg+0x470/0x470
Nov 09 16:38:16 * kernel: netlink_rcv_skb+0x50/0x120
Nov 09 16:38:16 * kernel: cpu_startup_entry+0x20/0x30
Nov 09 16:38:16 * kernel: start_secondary+0x167/0x1c0
Nov 09 16:38:16 * kernel: genl_rcv+0x29/0x40
Nov 09 16:38:16 * kernel: netlink_unicast+0x187/0x220
Nov 09 16:38:16 * kernel: secondary_startup_64+0xa4/0xb0
Nov 09 16:38:16 * kernel: netlink_sendmsg+0x222/0x3e0
Nov 09 16:38:16 * kernel: sock_sendmsg+0x65/0x70
Nov 09 16:38:16 * kernel: ____sys_sendmsg+0x212/0x280
Nov 09 16:38:16 * kernel: ___sys_sendmsg+0x88/0xd0
Nov 09 16:38:16 * kernel: ? sock_sendmsg+0x65/0x70
Nov 09 16:38:16 * kernel: ? sock_write_iter+0x93/0xf0
Nov 09 16:38:16 * kernel: ? new_sync_write+0x125/0x1c0
Nov 09 16:38:16 * kernel: ? __cgroup_bpf_run_filter_setsockopt+0xae/0x2d0
Nov 09 16:38:16 * kernel: ? _cond_resched+0x19/0x30
Nov 09 16:38:16 * kernel: ? aa_sk_perm+0x43/0x170
Nov 09 16:38:16 * kernel: __sys_sendmsg+0x5c/0xa0
Nov 09 16:38:16 * kernel: __x64_sys_sendmsg+0x1f/0x30
Nov 09 16:38:16 * kernel: do_syscall_64+0x57/0x190
Nov 09 16:38:16 * kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 09 16:38:16 * kernel: RIP: 0033:0x7f321b53b747
Nov 09 16:38:16 * kernel: Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 >
Nov 09 16:38:16 * kernel: RSP: 002b:00007ffeb41f6a28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
Nov 09 16:38:16 * kernel: RAX: ffffffffffffffda RBX: 0000560d7138f440 RCX: 00007f321b53b747
Nov 09 16:38:16 * kernel: RDX: 0000000000000000 RSI: 00007ffeb41f6a60 RDI: 0000000000000004
Nov 09 16:38:16 * kernel: RBP: 0000560d713974e0 R08: 0000000000000004 R09: 00007f321b603b80
Nov 09 16:38:16 * kernel: R10: 00007ffeb41f6b34 R11: 0000000000000246 R12: 0000560d7138f350
Nov 09 16:38:16 * kernel: R13: 00007ffeb41f6a60 R14: 00007ffeb41f6b34 R15: 0000560d71401270

This repeats until the computer is restarted. The problem occurs on many hosts that we updated to 5.4.0-{89, 90} but goes away once we downgrade to 5.4.0-88.
We tried to reproduce this behavior locally on test hardware but were unable to do so. Looking into the logs of affected units, we usually see that the last thing wpa_supplicant tries to do is roam between access points.
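
To compare affected hosts, the soft-lockup lines can be pulled out of the logs automatically. The following is only a minimal sketch, assuming the log lines have the same shape as the excerpt above; the function names and the sample list are hypothetical and not part of any existing tool.

```python
import re
from collections import Counter

# Matches kernel watchdog lines such as:
#   kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [wpa_supplicant:470]
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft-lockup report found in the given log lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


def count_by_task(events):
    """Count how often each task name appears in the lockup reports."""
    return Counter(event["comm"] for event in events)


# Sample input copied from the log excerpt in this question.
SAMPLE = [
    "Nov 09 16:38:16 * kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [wpa_supplicant:470]",
    "Nov 09 16:38:16 * kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [swapper/2:0]",
    "Nov 09 16:38:16 * kernel: Modules linked in: can_raw can xsk_diag",
]

if __name__ == "__main__":
    for event in find_soft_lockups(SAMPLE):
        print(event)
```

On a real host the input could come from the kernel messages of the previous boot, for example by saving the output of `journalctl -k -b -1` to a file and passing its lines to `find_soft_lockups`.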

Any pointers on how to proceed with debugging or fixing this issue would be much appreciated.

Question information

Language:
English
Status:
Answered
For:
Ubuntu
Assignee:
No assignee
actionparsnip (andrew-woodhead666) said :
#1

I suggest you report a bug
