[UBUNTU 20.04] zPCI device hot-plug during boot may result in unusable device

Bug #1893778 reported by bugproxy
This bug affects 1 person
Affects                   Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems   Fix Released  High        Skipper Bug Screeners
linux (Ubuntu)            Fix Released  Undecided   Skipper Bug Screeners
  Focal                   Fix Released  Undecided   Frank Heimes
  Groovy                  Fix Released  Undecided   Skipper Bug Screeners

Bug Description

SRU Justification:
==================

[Impact]

* If a PCI device (incl. virtio-pci) is hot-plugged during boot-up on s390x, it can be detected twice: as an entry in CLP List PCI Functions and via the hot-plug event.

* (This is basically equivalent to boot time probing on other architectures.)

* In such a case the hot-plug event will be stale, but Linux still tries to add and enable the device which leads to:

* a) a duplicate entry in zPCI internal device list

* b) an attempt to enable the device with a stale function handle

* In case b) the device will be placed in error state which makes it unusable.

[Fix]

* b76fee1bc56c31a9d2a49592810eba30cc06d61a b76fee1bc56c "s390/pci: ignore stale configuration request event"
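
The effect of ignoring the stale event can be illustrated with a small toy model. This is a sketch in Python, not kernel code: the class and method names, the handle values and the exact condition used to detect a stale event are assumptions made for illustration, since the upstream check itself is not shown in this report.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    STANDBY = "standby"
    CONFIGURED = "configured"
    ERROR = "error"


@dataclass
class Function:
    fid: int                      # function ID
    fh: int                       # function handle the OS holds for it
    state: State = State.STANDBY


@dataclass
class ZpciModel:
    """Toy model of the zPCI device list (all names are hypothetical)."""
    platform_fh: dict             # fid -> handle the platform currently accepts
    ignore_stale: bool = False    # True models the behaviour with the fix
    devices: list = field(default_factory=list)

    def enable(self, dev):
        # Enabling with an outdated handle leaves the entry in error state.
        if dev.fh == self.platform_fh[dev.fid]:
            dev.state = State.CONFIGURED
        else:
            dev.state = State.ERROR

    def boot_scan(self):
        # Boot-time listing: add and enable every function with its current handle.
        for fid, fh in self.platform_fh.items():
            dev = Function(fid, fh)
            self.devices.append(dev)
            self.enable(dev)

    def hotplug_event(self, fid, event_fh):
        known = next((d for d in self.devices if d.fid == fid), None)
        # Assumed check: the function is already known and no longer in standby,
        # so the configuration request is stale and gets ignored.
        if self.ignore_stale and known is not None and known.state is not State.STANDBY:
            return
        dev = Function(fid, event_fh)
        self.devices.append(dev)  # without the check: duplicate entry (case a)
        self.enable(dev)          # ... enabled with a stale handle (case b)


def simulate(ignore_stale):
    model = ZpciModel(platform_fh={0x1234: 0xB}, ignore_stale=ignore_stale)
    model.boot_scan()
    model.hotplug_event(fid=0x1234, event_fh=0xA)  # event carries an older handle
    return [(d.fid, d.state) for d in model.devices]
```

Calling `simulate(False)` yields two entries for the same function ID, one of them in error state, while `simulate(True)` leaves a single configured entry.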

[Test Case]

* Set up an Ubuntu Server 20.04 (focal) Linux operating system on an IBM Z or LinuxONE III LPAR.

* It's now easiest to test on KVM using virtio-pci (on s390x).

* Start a test virtual machine: sudo virsh start <test-guest>

* Attach and hotplug a virtio-pci device: sudo virsh attach-device <test-guest> hotplug_pci_block.xml

* Where hotplug_pci_block.xml looks like:
   <disk device="disk" type="file">
      <driver name="qemu" type="raw" />
      <address type="pci">
         <zpci fid="4660" uid="4660" />
      </address>
      <source file="testdisk.img" />
      <target bus="virtio" dev="vdt" />
   </disk>
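
If the device description has to be generated for several test disks, the XML above can be produced with a short helper. This is a minimal sketch using Python's standard library; the function name `build_disk_xml` and its parameters are made up for illustration, and the defaults simply repeat the values from the example above (4660 is 0x1234 in hexadecimal).

```python
import xml.etree.ElementTree as ET


def build_disk_xml(image="testdisk.img", fid=4660, uid=4660, target="vdt"):
    """Return a libvirt <disk> element with a zPCI address as a string."""
    disk = ET.Element("disk", {"device": "disk", "type": "file"})
    ET.SubElement(disk, "driver", {"name": "qemu", "type": "raw"})
    # The <zpci> child carries the function ID (fid) and user-defined ID (uid).
    address = ET.SubElement(disk, "address", {"type": "pci"})
    ET.SubElement(address, "zpci", {"fid": str(fid), "uid": str(uid)})
    ET.SubElement(disk, "source", {"file": image})
    ET.SubElement(disk, "target", {"bus": "virtio", "dev": target})
    return ET.tostring(disk, encoding="unicode")
```

Writing the returned string to hotplug_pci_block.xml gives a file that can be passed to the virsh attach-device command shown above.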

[Regression Potential]

* The regression risk is moderate, since the modification is very limited and therefore manageable (additional if statement - two lines of code) and easily testable on KVM using virtio-pci.

* The changes are in the zPCI event code, so in the worst case the event handling could be broken, which may break zPCI entirely, affecting all PCI devices incl. virtio-pci (on s390x).

* A bug in PCI 'availability' handling would also just lead to wrong states of PCI devices, which makes them unavailable and hence unusable.

* Notice that zPCI is the s390x-specific PCI implementation; modifications here do not affect any other architecture.

* And zPCI devices are less widespread than ccw devices on s390x.

* On top of that, a test kernel was built and made available for further testing; testing can easily be done with virtio-pci on KVM.

[Other]

* The fix/patch was accepted upstream with kernel v5.9-rc2.

* But it already landed in groovy's proposed kernel 5.8 (Ubuntu-5.8.0-18.19), due to 'Groovy update: v5.8.4 upstream stable release' that is handled in LP 1893048.

* Hence this fix/patch needs to be applied to focal only.

__________

When a PCI device (including virtio-pci, for which this is easiest to test)
is hot-plugged while Linux is still booting, it can be detected both as
an entry in CLP List PCI Functions (basically equivalent to boot time probing
on other architectures) and via the hot-plug event.
In this case the hot-plug event will be stale, but Linux still
tries to add and enable the device, leading to

a) a duplicate entry in the zPCI internal device list
b) an attempt to enable the device with a stale function handle

Part b) leads to the device being placed in the error state,
which makes it unusable.

This can most easily be reproduced using KVM and doing

# sudo virsh start myguest && sudo virsh attach-device myguest hotplug_pci_block.xml

Where hotplug_pci_block.xml looks like the following:

<disk device="disk" type="file">
        <driver name="qemu" type="raw" />
        <address type="pci">
                <zpci fid="4660" uid="4660" />
        </address>
        <source file="testdisk.img" />
        <target bus="virtio" dev="vdt" />
</disk>

The problem is fixed with the 3-line upstream commit

b76fee1bc56c31a9d2a49592810eba30cc06d61a s390/pci: ignore stale configuration request event

I also confirmed that as of the focal tag Ubuntu-5.4.0-46.50 this
cherry-picks cleanly.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-187974 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

The commit mentioned was accepted upstream with v5.9-rc2, but has already landed in groovy via 'Groovy update: v5.8.4 upstream stable release' (LP 1893048).
Hence only SRU to Focal is needed.

Changed in linux (Ubuntu Groovy):
status: New → Fix Released
Frank Heimes (fheimes)
Changed in linux (Ubuntu Focal):
assignee: nobody → Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
importance: Undecided → High
Frank Heimes (fheimes) wrote :

While working on this SRU, a set of patched kernel packages was created and has now been made available here for further testing:
https://people.canonical.com/~fheimes/lp1893778/

Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-September/thread.html#113190
Updating status to 'In Progress'.

Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in ubuntu-z-systems:
status: New → In Progress
description: updated
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-09-22 04:02 EDT-------
I've verified that this now works as expected on focal-proposed kernel 5.4.0-49.53.
Thanks!
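
For anyone repeating this verification, a quick check of the running kernel can be sketched as follows. It assumes that ABI 5.4.0-49, the focal-proposed version tested above, is the first focal 5.4 kernel carrying the fix; the helper name and the threshold constant are hypothetical, and the check does not apply to other kernel series.

```python
import re

# Assumption: 5.4.0-49 is the first focal 5.4 ABI that carries the fix,
# based on the focal-proposed kernel 5.4.0-49.53 verified above.
FIRST_FIXED_FOCAL_ABI = 49


def focal_kernel_has_fix(release):
    """Return True/False for a focal 5.4.0 kernel release string such as
    '5.4.0-49-generic', or None for any other kernel series."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(part) for part in match.groups())
    if (major, minor, patch) != (5, 4, 0):
        return None  # not the focal 5.4 series, no statement possible
    return abi >= FIRST_FIXED_FOCAL_ABI
```

On a test guest this could be called as `focal_kernel_has_fix(platform.release())` after `import platform`.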

Frank Heimes (fheimes) wrote :

Thx Niklas for the verification - updating tags ...

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-51.56

---------------
linux (5.4.0-51.56) focal; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

linux (5.4.0-50.55) focal; urgency=medium

  * CVE-2020-16119
    - SAUCE: dccp: avoid double free of ccid on child socket

  * CVE-2020-16120
    - Revert "UBUNTU: SAUCE: overlayfs: ensure mounter privileges when reading
      directories"
    - ovl: pass correct flags for opening real directory
    - ovl: switch to mounter creds in readdir
    - ovl: verify permissions in ovl_path_open()
    - ovl: call secutiry hook in ovl_real_ioctl()
    - ovl: check permission to open real file

linux (5.4.0-49.53) focal; urgency=medium

  * focal/linux: 5.4.0-49.53 -proposed tracker (LP: #1896007)

  * Comet Lake PCH-H RAID not support on Ubuntu20.04 (LP: #1892288)
    - ahci: Add Intel Comet Lake PCH-H PCI ID

  * Novalink (mkvterm command failure) (LP: #1892546)
    - tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup()

  * Oops and hang when starting LVM snapshots on 5.4.0-47 (LP: #1894780)
    - SAUCE: Revert "mm: memcg/slab: fix memory leak at non-root kmem_cache
      destroy"

  * Intel x710 LOMs do not work on Focal (LP: #1893956)
    - i40e: Fix LED blinking flow for X710T*L devices
    - i40e: enable X710 support

  * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
    - kvm: svm: Update svm_xsaves_supported

  * Fix non-working NVMe after S3 (LP: #1895718)
    - SAUCE: PCI: Enable ACS quirk on CML root port

  * Focal update: v5.4.65 upstream stable release (LP: #1895881)
    - ipv4: Silence suspicious RCU usage warning
    - ipv6: Fix sysctl max for fib_multipath_hash_policy
    - netlabel: fix problems with mapping removal
    - net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    - sctp: not disable bh in the whole sctp_get_port_local()
    - taprio: Fix using wrong queues in gate mask
    - tipc: fix shutdown() of connectionless socket
    - net: disable netpoll on fresh napis
    - Linux 5.4.65

  * Focal update: v5.4.64 upstream stable release (LP: #1895880)
    - HID: quirks: Always poll three more Lenovo PixArt mice
    - drm/msm/dpu: Fix scale params in plane validation
    - tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup
    - drm/msm: add shutdown support for display platform_driver
    - hwmon: (applesmc) check status earlier.
    - nvmet: Disable keep-alive timer when kato is cleared to 0h
    - drm/msm: enable vblank during atomic commits
    - habanalabs: validate FW file size
    - habanalabs: check correct vmalloc return code
    - drm/msm/a6xx: fix gmu start on newer firmware
    - ceph: don't allow setlease on cephfs
    - drm/omap: fix incorrect lock state
    - cpuidle: Fixup IRQ state
    - nbd: restore default timeout when setting it to zero
    - s390: don't trace preemption in percpu macros
    - drm/amd/display: Reject overlay plane configurations in multi-display
      scenarios
    - drivers: gpu: amd: Initialize amdgpu_dm_backlight_caps object to 0 in
      amdgpu_dm_update_backlight_caps
    - drm/amd/display: Retry AUX write when fail occurs
    - drm/amd/display: Fix memleak in amdg...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-10-20 08:40 EDT-------
IBM Bugzilla status-> closed, Fix Released with all requested Distros
