KVM emulation failure when booting into VM crash kernel with multiple CPUs

Bug #1948862 reported by Heitor Alves de Siqueira
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
High
Heitor Alves de Siqueira
Bionic
Fix Released
High
Heitor Alves de Siqueira
Focal
Fix Released
High
Heitor Alves de Siqueira

Bug Description

[Impact]
When kexec'ing into a crash kernel with ncpus > 1, VMs can raise a KVM emulation failure. This will cause the VM to go into the "paused" state, and prevents it from being restored without a full VM restart.

This happens only when there are multiple enabled CPUs in the crash kernel command-line, regardless of whether `nr_cpus` or `maxcpus` is being used. Due to the vCPU MMU state not being cleaned up correctly, the secondary CPUs try to access virtual addresses with a faulty MMU context that will result in the emulation failure. This shows up with a similar spew as below:

$ sudo tail -n20 /var/log/libvirt/qemu/focal-vm.log
KVM internal error. Suberror: 1
emulation failure
EAX=0000de8f EBX=00000000 ECX=0000008f EDX=00000600
ESI=00000000 EDI=00000000 EBP=00000000 ESP=0000f90c
EIP=0000cdb1 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =de00 000de000 0000ffff 00009300
DS =de00 000de000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=290b8001 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 83 c4 28 66 5b 66 c3 66 56 66 53 66 52 b1 8f 88 c8 e6 70 <e4> 71 66 0f b6 f0 66 89 f2 67 88 54 24 03 88 c8 e6 70 66 31 db 88 d8 e6 71 66 56 66 68 1a

[Test Plan]
1. Boot an Ubuntu guest VM with e.g. multipass:
$ multipass launch daily:focal -c8 -m16g -n focal-vm

2. Configure guest crash kernel command-line with `nr_cpus=8`:
ubuntu@focal-vm:~$ grep CMDLINE_APPEND /etc/default/kdump-tools
# KDUMP_CMDLINE_APPEND - Additional arguments to append to the command line
KDUMP_CMDLINE_APPEND="reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=8 irqpoll nousb ata_piix.prefer_ms_hyperv=0"

3. Crash guest VM and watch for the KVM emulation failure:
ubuntu@focal-vm:~$ echo c | sudo tee /proc/sysrq-trigger

[Where problems could occur]
As we're resetting MMU context on vCPUs, potential regressions would show up in workloads relying on KVM guests. We should properly test the scenario mentioned in the bug to make sure secondary CPUs are being cleaned up properly, and that no other regressions have been introduced when rebooting or kexec'ing into different kernels.
Since we're adding an MMU reset at kvm_vcpu_reset(), the overall regression potential should be fairly low and contained to starting/resetting vCPUs (i.e. VM start and reboot).

[Other info]
This has been fixed by upstream commit:
  0aa1837533e5 KVM: x86: Properly reset MMU context at vCPU RESET/INIT

The commit above has been picked up by stable trees up until 5.11, so it's only needed in Bionic and Focal (4.15 and 5.4 kernels). There are also two follow up commits, which revert the vendor-specific resets:
  5d2d7e41e3b8 KVM: SVM: Drop explicit MMU reset at RESET/INIT
  61152cd907d5 KVM: VMX: Remove explicit MMU reset in enter_rmode()

These follow ups have not been picked up in stable trees due to the risk of
regressions. According to the original fix, they have been introduced primarily to aid bisection in case there are workflows relying on the vendor resets. As these are not required for the fix and don't conflict with the backport, we should leave them out to prevent potential regressions in the older kernels.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1948862

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
description: updated
description: updated
description: updated
Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Changed in linux (Ubuntu Focal):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in linux (Ubuntu Focal):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in linux (Ubuntu Focal):
status: New → Confirmed
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Revision history for this message
Heitor Alves de Siqueira (halves) wrote :

For future reference, this bug has been reported in Ubuntu KVM hosts outside of the specific kdump test scenario (i.e. when running actual VMs in production-like setups). The "kdump with multiple CPUs" case was discovered as an alternative that hits the same MMU context bug, without needing complex hypervisor setups or fancy configuration.

Changed in linux (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in linux (Ubuntu Focal):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.4.0-91.102 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
tags: added: verification-needed-bionic
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/4.15.0-163.171 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Matthew Ruffell (mruffell) wrote :
Download full text (4.9 KiB)

Performing verification for Bionic.

I installed the current 4.15.0-162-generic kernel from -updates to a host in segmaas, and brought up my Bionic VM I installed for the reproducer. The Bionic VM has linux-crashdump configured, with nr_cpus configured to 8:

ubuntu@bionic-vm:~$ kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.15.0-159-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.0-159-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.15.0-159-generic root=UUID=5c23b432-4d00-4a01-8ecd-964803c8e10a ro console=tty1 console=ttyS0 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=8 irqpoll nousb ata_piix.prefer_ms_hyperv=0 ignore_loglevel nmi_debug=state,regs apic_extnmi=none" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

I then triggered the crashdump in the VM:

$ sudo -s
# echo c > /proc/sysrq-trigger

On the host, we look at the tail of:

$ sudo tail -f /var/log/libvirt/qemu/bionic-vm.log
...
KVM internal error. Suberror: 1
emulation failure
EAX=0000de8f EBX=00000000 ECX=0000008f EDX=00000600
ESI=00000000 EDI=00000000 EBP=00000000 ESP=0000f90c
EIP=0000cdb1 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =de00 000de000 0000ffff 00009300
DS =de00 000de000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=32b2a000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 83 c4 28 66 5b 66 c3 66 56 66 53 66 52 b1 8f 88 c8 e6 70 <e4> 71 66 0f b6 f0 66 89 f2 67 88 54 24 03 88 c8 e6 70 66 31 db 88 d8 e6 71 66 56 66 68 1a

$ virsh list
 Id Name State
----------------------------------------------------
 1 bionic-vm paused

Okay, we can reproduce the issue.

I then enabled -proposed, and installed the 4.15.0-163-generic kernel:

$ uname -rv
4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021

I started the same VM, verified it has a crashkernel with nr_cpus=8:

$ kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.15.0-159-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.0-159-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.15.0-159-generic root=UUID=5c23b432-4d00-4a01-8ecd-964803c8e10a ro console=tty1 console=ttyS0 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=8 irqpoll nousb ata_piix.prefer_ms_hyperv=0 ignore_...

Read more...

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Heitor Alves de Siqueira (halves) wrote :

Verified for Focal according to test case from description. A standard multipass VM with 8 vCPUs configured was able to capture a kernel dump without errors.

Additionally, since the 5.4 kernel doesn't hit the MMU/reset bug directly, I've done some additional testing to ensure we didn't introduce any weird regressions. These scenarios included:

- default crash kernel config
- nested VM crash kernel with multiple CPUs
- nested VM crash kernel with default config
- basic smoke testing of VM up/down through virsh
- basic smoke testing of reboots from within the VM

All of these behaved as expected and showed no issues.

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (7.3 KiB)

This bug was fixed in the package linux - 4.15.0-163.171

---------------
linux (4.15.0-163.171) bionic; urgency=medium

  * bionic/linux: 4.15.0-163.171 -proposed tracker (LP: #1949874)

  * Packaging resync (LP: #1786013)
    - [Packaging] update Ubuntu.md
    - debian/dkms-versions -- update from kernel-versions (main/2021.11.08)

  * Unable to build net/reuseport_bpf and other tests in ubuntu_kernel_selftests
    on Bionic with make command (LP: #1949889)
    - selftests: Fix loss of test output in run_kselftests.sh
    - selftests: Makefile set KSFT_TAP_LEVEL to prevent nested TAP headers
    - selftests: fix headers_install circular dependency
    - selftests: fix bpf build/test workflow regression when KBUILD_OUTPUT is set
    - selftests: vm: Fix test build failure when built by itself

  * KVM emulation failure when booting into VM crash kernel with multiple CPUs
    (LP: #1948862)
    - KVM: x86: Properly reset MMU context at vCPU RESET/INIT

  * aufs: kernel bug with apparmor and fuseblk (LP: #1948470)
    - SAUCE: aufs: bugfix, stop omitting path->mnt

  * ebpf: bpf_redirect fails with ip6 gre interfaces (LP: #1947164)
    - net: handle ARPHRD_IP6GRE in dev_is_mac_header_xmit()

  * require CAP_NET_ADMIN to attach N_HCI ldisc (LP: #1949516)
    - Bluetooth: hci_ldisc: require CAP_NET_ADMIN to attach N_HCI ldisc

  * ACL updates on OCFS2 are not revalidated (LP: #1947161)
    - ocfs2: fix remounting needed after setfacl command

  * ppc64 BPF JIT mod by 1 will not return 0 (LP: #1948351)
    - powerpc/bpf: Fix BPF_MOD when imm == 1

  * Drop "UBUNTU: SAUCE: cachefiles: Page leaking in
    cachefiles_read_backing_file while vmscan is active" (LP: #1947709)
    - Revert "UBUNTU: SAUCE: cachefiles: Page leaking in
      cachefiles_read_backing_file while vmscan is active"
    - cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is
      active

  * Some test in ubuntu_bpf test_verifier failed on i386 Bionic kernel
    (LP: #1788578)
    - bpf: fix context access in tracing progs on 32 bit archs

  * test_bpf.sh from ubuntu_kernel_selftests.net from linux ADT test failure
    with linux/4.15.0-149.153 i386 (Segmentation fault) (LP: #1934414)
    - selftests/bpf: make test_verifier run most programs
    - bpf: add couple of test cases for div/mod by zero
    - bpf: add further test cases around div/mod and others

  * Bionic update: upstream stable patchset 2021-11-02 (LP: #1949512)
    - usb: gadget: r8a66597: fix a loop in set_feature()
    - usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
    - cifs: fix incorrect check for null pointer in header_assemble
    - xen/x86: fix PV trap handling on secondary processors
    - usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
    - USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
    - staging: greybus: uart: fix tty use after free
    - Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
    - USB: serial: mos7840: remove duplicated 0xac24 device ID
    - USB: serial: option: add Telit LN920 compositions
    - USB: serial: option: remove duplicate USB device ID
    - USB: serial: option: add device id for Foxco...

Read more...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (12.1 KiB)

This bug was fixed in the package linux - 5.4.0-91.102

---------------
linux (5.4.0-91.102) focal; urgency=medium

  * focal/linux: 5.4.0-91.102 -proposed tracker (LP: #1949840)

  * Packaging resync (LP: #1786013)
    - [Packaging] update Ubuntu.md
    - debian/dkms-versions -- update from kernel-versions (main/2021.11.08)

  * KVM emulation failure when booting into VM crash kernel with multiple CPUs
    (LP: #1948862)
    - KVM: x86: Properly reset MMU context at vCPU RESET/INIT

  * aufs: kernel bug with apparmor and fuseblk (LP: #1948470)
    - SAUCE: aufs: bugfix, stop omitting path->mnt

  * ebpf: bpf_redirect fails with ip6 gre interfaces (LP: #1947164)
    - net: handle ARPHRD_IP6GRE in dev_is_mac_header_xmit()

  * require CAP_NET_ADMIN to attach N_HCI ldisc (LP: #1949516)
    - Bluetooth: hci_ldisc: require CAP_NET_ADMIN to attach N_HCI ldisc

  * ACL updates on OCFS2 are not revalidated (LP: #1947161)
    - ocfs2: fix remounting needed after setfacl command

  * ppc64 BPF JIT mod by 1 will not return 0 (LP: #1948351)
    - powerpc/bpf: Fix BPF_MOD when imm == 1

  * Drop "UBUNTU: SAUCE: cachefiles: Page leaking in
    cachefiles_read_backing_file while vmscan is active" (LP: #1947709)
    - Revert "UBUNTU: SAUCE: cachefiles: Page leaking in
      cachefiles_read_backing_file while vmscan is active"

  * Reassign I/O Path of ConnectX-5 Port 1 before Port 2 causes NULL dereference
    (LP: #1943464)
    - s390/pci: fix leak of PCI device structure
    - s390/pci: fix use after free of zpci_dev
    - s390/pci: fix zpci_zdev_put() on reserve

  * [SRU][F] USB: serial: pl2303: add support for PL2303HXN (LP: #1948377)
    - USB: serial: pl2303: add support for PL2303HXN
    - USB: serial: pl2303: fix line-speed handling on newer chips

  * Focal update: v5.4.151 upstream stable release (LP: #1947888)
    - tty: Fix out-of-bound vmalloc access in imageblit
    - cpufreq: schedutil: Use kobject release() method to free sugov_tunables
    - cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
    - usb: cdns3: fix race condition before setting doorbell
    - fs-verity: fix signed integer overflow with i_size near S64_MAX
    - hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary
      structure field
    - hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary
      structure field
    - hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary
      structure field
    - scsi: ufs: Fix illegal offset in UPIU event trace
    - mac80211: fix use-after-free in CCMP/GCMP RX
    - x86/kvmclock: Move this_cpu_pvti into kvmclock.h
    - drm/amd/display: Pass PCI deviceid into DC
    - ipvs: check that ip_vs_conn_tab_bits is between 8 and 20
    - hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced
      from sysfs
    - mac80211: Fix ieee80211_amsdu_aggregate frag_tail bug
    - mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotap
    - mac80211: mesh: fix potentially unaligned access
    - mac80211-hwsim: fix late beacon hrtimer handling
    - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
    - hwmon: (tmp421) report /P...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.