[UBUNTU 20.04] zPCI: Enabling of a reserved PCI function regression introduced by multi-function support

Bug #1891437 reported by bugproxy
This bug affects 1 person
Affects                   Status         Importance   Assigned to             Milestone
Ubuntu on IBM z Systems   Fix Released   Undecided    Unassigned
linux (Ubuntu)            Fix Released   Undecided    Skipper Bug Screeners
Focal                     Fix Released   Undecided    Unassigned
Groovy                    Fix Released   Undecided    Skipper Bug Screeners

Bug Description

SRU Justification:
==================

[Impact]

* If an NVMe drive is assigned/hotplugged to a Linux on s390x LPAR, the kernel hits a BUG in lib/list_debug.c and the device is not accessible.

* No /dev/ file is created for the device, and lspci does not report it either.

[Fix]

* 3047766bc6ec9c6bc9ece85b45a41ff401e8d988 3047766bc6ec "s390/pci: fix enabling a reserved PCI function"

[Test Case]

* Assign an NVMe drive to your LPAR (using the HMC)

* Unassign the NVMe drive from your LPAR

* Reassign it to your LPAR again

* Look at dmesg for 'kernel BUG at lib/list_debug.c'
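The last step of the test case can be scripted. The following is a minimal sketch, not part of the SRU itself: it assumes the kernel log is already available as text (for example, captured from `dmesg`), and the names `MARKER` and `find_list_debug_bug` are hypothetical.

```python
# Marker string taken from the test case above.
MARKER = "kernel BUG at lib/list_debug.c"


def find_list_debug_bug(dmesg_text):
    """Return all log lines that contain the list_debug BUG marker."""
    return [line for line in dmesg_text.splitlines() if MARKER in line]


if __name__ == "__main__":
    import subprocess

    # Assumption: dmesg is readable by the current user.
    log = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    hits = find_list_debug_bug(log)
    print("BUG hit" if hits else "no list_debug BUG found")
```

An empty result after the unassign/reassign cycle would suggest the fix is effective; a non-empty result reproduces the problem.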

[Regression Potential]

* There is some regression risk with having code changes in the zPCI sub-system.

* zPCI is the PCI implementation on s390x; modifications here do not affect any other architecture.

* It could be that PCI events do not work anymore and NVMe devices don't IPL (boot) on s390x anymore.

* However, the code changes are limited to a single file: arch/s390/pci/pci_event.c

* and IPL from NVMe is brand new in Ubuntu for s390x,

* and zPCI devices are less wide-spread compared to ccw devices on s390x.

* On top of that, a test kernel was built and made available for further testing.

[Other]

* Since the fix/patch was accepted upstream with kernel v5.8-rc5, it's already in the groovy proposed kernel 5.8, hence this SRU is for focal only.
__________

When an NVMe drive is assigned/hotplugged to a Linux LPAR, a bug is hit
in lib/list_debug.c. The device is not accessible: there is no /dev/ file
and lspci does not report it either.

[ 1681.564462] list_add double add: new=00000000eed0f808, prev=00000000eed0f808, next=000000004070a300.
[ 1681.564489] ------------[ cut here ]------------
[ 1681.564490] kernel BUG at lib/list_debug.c:31!
[ 1681.564504] monitor event: 0040 ilc:2 [#1] SMP
[ 1681.564507] Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter s390_trng ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 vfio_ccw sha1_s390 vfio_mdev mdev chsc_sch vfio_iommu_type1 eadm_sch vfio ip_tables dm_service_time nvme crc32_vx_s390 sha256_s390 sha_common nvme_core qeth_l2 zfcp qeth scsi_transport_fc qdio ccwgroup dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey zcrypt
[ 1681.564534] CPU: 6 PID: 139 Comm: kmcheck Not tainted 5.8.0-rc1+ #2
[ 1681.564535] Hardware name: IBM 8561 T01 701 (LPAR)
[ 1681.564536] Krnl PSW : 0704c00180000000 000000003ffcadb8 (__list_add_valid+0x70/0xa8)
[ 1681.564544] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 1681.564545] Krnl GPRS: 0000000000000040 0000000000000027 0000000000000058 0000000000000007
[ 1681.564546] 000000003ffcadb4 0000000000000000 0000000000000000 000003e0051a7ce0
[ 1681.564547] 000000004070a300 00000000eed0f808 00000000eed0f808 000000004070a300
[ 1681.564548] 00000000f56a2000 0000000040c2c788 000000003ffcadb4 000003e0051a7bc8
[ 1681.564583] Krnl Code: 000000003ffcada8: c02000302b09 larl %r2,00000000405d03ba
                          000000003ffcadae: c0e5ffdd30b1 brasl %r14,000000003fb70f10
                         #000000003ffcadb4: af000000 mc 0,0
                         >000000003ffcadb8: b9040054 lgr %r5,%r4
                          000000003ffcadbc: c02000302aad larl %r2,00000000405d0316
                          000000003ffcadc2: b9040041 lgr %r4,%r1
                          000000003ffcadc6: c0e5ffdd30a5 brasl %r14,000000003fb70f10
                          000000003ffcadcc: af000000 mc 0,0
[ 1681.564592] Call Trace:
[ 1681.564594] [<000000003ffcadb8>] __list_add_valid+0x70/0xa8
[ 1681.564596] ([<000000003ffcadb4>] __list_add_valid+0x6c/0xa8)
[ 1681.564599] [<000000003faf2920>] zpci_create_device+0x60/0x1b0
[ 1681.564601] [<000000003faf704a>] zpci_event_availability+0x282/0x2f0
[ 1681.564605] [<0000000040367848>] chsc_process_crw+0x2b8/0xa18
[ 1681.564607] [<000000004036f35c>] crw_collect_info+0x254/0x348
[ 1681.564610] [<000000003fb2a6ea>] kthread+0x14a/0x168
[ 1681.564613] [<00000000403a55c0>] ret_from_fork+0x24/0x2c
[ 1681.564614] Last Breaking-Event-Address:
[ 1681.564618] [<000000003fb70f62>] printk+0x52/0x58
[ 1681.564620] ---[ end trace 7ea67c348aa67e14 ]---
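The first line of the trace carries the key detail: the kernel's list debugging code prints "list_add double add" when the entry being inserted is identical to one of its future neighbours (`new == prev` or `new == next`). The following sketch only parses that message to show which of the two cases applies; the helper name `explain_double_add` is hypothetical and the regular expression assumes the message format seen in the log above.

```python
import re

# Matches the message format shown in the log above.
DOUBLE_ADD_RE = re.compile(
    r"list_add double add: new=(?P<new>[0-9a-f]+), "
    r"prev=(?P<prev>[0-9a-f]+), next=(?P<next>[0-9a-f]+)"
)


def explain_double_add(line):
    """Report which neighbour the new entry collides with, or None if no match."""
    m = DOUBLE_ADD_RE.search(line)
    if m is None:
        return None
    new, prev, nxt = (int(m.group(k), 16) for k in ("new", "prev", "next"))
    return {"new_is_prev": new == prev, "new_is_next": new == nxt}
```

For the log line above, `new` and `prev` are both 00000000eed0f808, so the entry passed in by zpci_create_device() is apparently already present in the list it is being added to.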

uname:
Linux t83lp49.lnxne.boe 5.8.0-rc1+ #2 SMP Thu Jun 18 12:38:02 CEST 2020 s390x s390x s390x GNU/Linux

How to reproduce:
1. Unassign an NVMe drive in HMC from your LPAR
2. Reassign it to your LPAR again
3. dmesg

This issue is fixed by the following upstream commit
that is also CCed to stable so might be coming in over the stable pulls
in parallel:
3047766bc6ec ("s390/pci: fix enabling a reserved PCI function")


bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-186335 severity-medium targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) wrote :

The commit was accepted upstream with v5.8-rc5 and is therefore part of groovy's proposed kernel 5.8:
user@box:~/ubuntu-groovy-master-next$ git tag --contains 3047766bc6ec
Ubuntu-5.8.0-13.14
Ubuntu-5.8.0-14.15
Ubuntu-5.8.0-15.16
Ubuntu-5.8.0-16.17
v5.8
v5.8-rc5
v5.8-rc6
v5.8-rc7
Hence updating status of groovy entry to Fix Committed.
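The same check can be scripted when several trees need to be inspected. This is only a sketch under assumptions: it relies on `git tag --contains` being run inside a local clone of the kernel tree, and the function names `tags_containing` and `split_tags` as well as the prefix-based grouping are illustrative, not an established tool.

```python
import subprocess


def tags_containing(repo_dir, commit):
    """Return the tags in the clone at repo_dir that contain the given commit."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "tag", "--contains", commit],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def split_tags(tags):
    """Separate Ubuntu release tags from upstream version tags by prefix."""
    ubuntu = [t for t in tags if t.startswith("Ubuntu-")]
    upstream = [t for t in tags if t.startswith("v")]
    return ubuntu, upstream
```

With the tag list shown above, `split_tags` would put the four `Ubuntu-5.8.0-*` tags in the first list and `v5.8`, `v5.8-rc5`, `v5.8-rc6`, `v5.8-rc7` in the second.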

Changed in linux (Ubuntu Groovy):
status: New → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-August/thread.html#112935
Updating status for Focal to 'In Progress'.

Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in ubuntu-z-systems:
status: Triaged → In Progress
description: updated
Frank Heimes (fheimes) wrote :

I made builds of the patched kernel packages available here for further testing:
https://people.canonical.com/~fheimes/lp1891437/

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-08-25 06:07 EDT-------
Thanks for providing the build Frank, there is another
Launchpad bug mirror incoming for which the patches will
touch the same area (but should apply cleanly on top of the
fix for this issue) it's titled "zPCI attach/detach issues with PF/VF linking support ".
Do you think it makes sense to postpone verifying this
until we have both parts together?

Frank Heimes (fheimes) wrote :

Hi Niklas, okay, the second one just came in today.
I've built some focal kernel packages that include both tickets (4 commits in total):
06c1c445e1a3 (HEAD -> master-next) s390/pci: fix PF/VF linking on hot plug
e90c81552200 s390/pci: re-introduce zpci_remove_device()
0f54ce9be290 s390/pci: fix zpci_bus_link_virtfn()
2103c66d09b1 s390/pci: fix enabling a reserved PCI function

They are available here:
https://people.canonical.com/~fheimes/lp1891437+lp1892849/

Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-01 04:18 EDT-------
Thanks for your quick work, I can confirm that this works as designed on
the 5.4.0-46-generic kernel from proposed.

Frank Heimes (fheimes) wrote :

Thx Niklas for the quick verification - I adjusted the tags accordingly ...

tags: added: verification-done-focal
removed: verification-needed-focal
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-04 08:39 EDT-------
*** Bug 184194 has been marked as a duplicate of this bug. ***

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-48.52

---------------
linux (5.4.0-48.52) focal; urgency=medium

  * focal/linux: 5.4.0-48.52 -proposed tracker (LP: #1894654)

  * mm/slub kernel oops on focal kernel 5.4.0-45 (LP: #1895109)
    - SAUCE: Revert "mm/slub: fix a memory leak in sysfs_slab_add()"

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for nvidia 450 and 450-server

  * [UBUNTU 20.04] zPCI attach/detach issues with PF/VF linking support
    (LP: #1892849)
    - s390/pci: fix zpci_bus_link_virtfn()
    - s390/pci: re-introduce zpci_remove_device()
    - s390/pci: fix PF/VF linking on hot plug

  * [UBUNTU 20.04] kernel: s390/cpum_cf,perf: changeDFLT_CCERROR counter name
    (LP: #1891454)
    - s390/cpum_cf, perf: change DFLT_CCERROR counter name

  * [UBUNTU 20.04] zPCI: Enabling of a reserved PCI function regression
    introduced by multi-function support (LP: #1891437)
    - s390/pci: fix enabling a reserved PCI function

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * [Hyper-V] VSS and File Copy daemons intermittently fails to start
    (LP: #1891224)
    - [Packaging] Bind hv_vss_daemon startup to hv_vss device
    - [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * alsa/hdmi: support nvidia mst hdmi/dp audio (LP: #1867704)
    - ALSA: hda - Rename snd_hda_pin_sense to snd_hda_jack_pin_sense
    - ALSA: hda - Add DP-MST jack support
    - ALSA: hda - Add DP-MST support for non-acomp codecs
    - ALSA: hda - Add DP-MST support for NVIDIA codecs
    - ALSA: hda: hdmi - fix regression in connect list handling
    - ALSA: hda: hdmi - fix kernel oops caused by invalid PCM idx
    - ALSA: hda: hdmi - preserve non-MST PCM routing for Intel platforms
    - ALSA: hda: hdmi - Keep old slot assignment behavior for Intel platforms
    - ALSA: hda - Fix DP-MST support for NVIDIA codecs

  * Focal update: v5.4.60 upstream stable release (LP: #1892899)
    - smb3: warn on confusing error scenario with sec=krb5
    - genirq/affinity: Make affinity setting if activated opt-in
    - genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()
    - PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()
    - PCI: Add device even if driver attach failed
    - PCI: qcom: Define some PARF params needed for ipq8064 SoC
    - PCI: qcom: Add support for tx term offset for rev 2.1.0
    - btrfs: allow use of global block reserve for balance item deletion
    - btrfs: free anon block device right after subvolume deletion
    - btrfs: don't allocate anonymous block device for user invisible roots
    - btrfs: ref-verify: fix memory leak in add_block_entry
    - btrfs: stop incremening log_batch for the log root tree when syncing log
    - btrfs: remove no longer needed use of log_writers for the log root tree
    - btrfs: don't traverse into the seed devices in show_devname
    - btrfs: open device...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Frank Heimes (fheimes) wrote :

I just verified whether the patch/commit also landed in groovy, and it did.
Hence updating the groovy entry to Fix Released and with that the entire case.

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-21 09:07 EDT-------
IBM bugzilla status->closed, Fix Released for focal and groovy
