[Ubuntu] kernel: zcrypt: reinit ap queue state machine

Bug #1805414 reported by bugproxy
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
High
Canonical Kernel Team
linux (Ubuntu)
Fix Released
High
Joseph Salisbury
Bionic
Fix Released
High
Joseph Salisbury
Cosmic
Fix Released
High
Joseph Salisbury
Disco
Fix Released
High
Joseph Salisbury

Bug Description

== SRU Justification ==
The vfio device driver when receiving an ap queue device does
              additional resets thereby removing the registration for
              interrupts for the ap device done by the ap bus core
              code. So when later the vfio driver releases the device and
              one of the default zcrypt drivers takes care of the device
              the interrupt registration needs to get renewed. The current
              code does no renew and result is that requests send into such
              a queue will never see a reply processed - the application
              hangs.

This commit has also been cc'd to upstream stable.

== Fix ==
104f708fd ("s390/zcrypt: reinit ap queue state machine during device probe")

== Regression Potential ==
Low. Limited to s390.

== Original Bug Description ==
Description: kernel: zcrypt: reinit ap queue state machine

Symptom: Zcrypt ap queue device not operational at host level after a
              kvm guest used it.

Problem: The vfio device driver when receiving an ap queue device does
              additional resets thereby removing the registration for
              interrupts for the ap device done by the ap bus core
              code. So when later the vfio driver releases the device and
              one of the default zcrypt drivers takes care of the device
              the interrupt registration needs to get renewed. The current
              code does no renew and result is that requests send into such
              a queue will never see a reply processed - the application
              hangs.

Solution: This patch adds a function which resets the aq queue state
              machine for the ap queue device and triggers the walk through
              the initial states (which are reset and registration for
              interrupts). This function is now called before the driver's
              probe function is invoked.
              When the association between driver and device is released,
              the driver's remove function is called. The current
              implementation calls a ap queue function
              ap_queue_remove(). This invokation has been moved to the ap
              bus function to make the probe / remove pair for ap bus and
              drivers more symmetric.

Reproduction: Set up an kvm guest to use one or more ap queues in
              pass-through mode. Start the guest. Stop the guest. Reassign
              the ap resources back to the host system. Run an application
              which uses exactly this ap resources. Without the fix, the
              application hangs; with the fix the application should run
              fine.

Upstream commit(s):
104f708fd1241b22f808bdf066ab67dc5a051de5
Available on kernel.org

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-173361 severity-high targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: kernel-key
Revision history for this message
Frank Heimes (fheimes) wrote :

"s390/zcrypt: reinit ap queue state machine during device probe"
104f708fd1241b22f808bdf066ab67dc5a051de5

https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/patch/?id=104f708fd1241b22f808bdf066ab67dc5a051de5

Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → High
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

It looks like the following commit is a prereq:
f1b0a4343c418 ("s390/zcrypt: Integrate ap_asm.h into include/asm/ap.h")

This would define ap_zapq, which is used in commit 104f708fd1241b. Can you confirm this?

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-11-28 03:47 EDT-------
YEs this is a prereq. already listed within https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1787405. which has to applied ahead of this integration.

Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
Changed in linux (Ubuntu Disco):
status: Triaged → In Progress
Changed in linux (Ubuntu Cosmic):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Disco):
assignee: Skipper Bug Screeners (skipper-screen-team) → Joseph Salisbury (jsalisbury)
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built Bionic, Cosmic and Disco test kernels with commit 104f708fd1241b22f808bdf066ab67dc5a051de5. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1805414

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15(Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-12-04 04:36 EDT-------
The test kernel fixes the issue.... Thx

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Seth Forshee (sforshee)
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
tags: added: verification-needed-bionic
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-01-17 07:55 EDT-------
Fix verified upfront via upstream integration, as verification statement from IBM

Revision history for this message
Frank Heimes (fheimes) wrote :

adjusting tags according to #11

tags: added: verification-done-bionic verification-done-cosmic
removed: verification-needed-bionic verification-needed-cosmic
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (47.0 KiB)

This bug was fixed in the package linux - 4.15.0-44.47

---------------
linux (4.15.0-44.47) bionic; urgency=medium

  * linux: 4.15.0-44.47 -proposed tracker (LP: #1811419)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: pass in enum wbt_flags to get_rq_wait()
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: sdhci: Disable 1.8v modes (HS200/HS400/UHS) if controller can't support
      1.8v
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb: Use MMC_CAP2_NO_SDIO
    - mmc: rtsx_usb: Enable MMC_CAP_ERASE to allow erase/discard/trim requests
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
    - perf: Export perf_event_update_userpage
    - Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
    - drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
    - [Config] New config CONFIG_THUNDERX2_PMU=m

  * Update hisilicon SoC-specific drivers (LP: #1810457)
    - SAUCE: Revert "net: hns3: Updates RX packet info fetch in case of multi BD"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: separate roce from nic when
      resetting"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Use roce handle when calling roce
      callback function"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Add calling roce callback
      function when link status change"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: optimize the process of notifying
      roce client"
    - Revert "UBUNTU: S...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (56.3 KiB)

This bug was fixed in the package linux - 4.18.0-14.15

---------------
linux (4.18.0-14.15) cosmic; urgency=medium

  * linux: 4.18.0-14.15 -proposed tracker (LP: #1811406)

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * [Regression] crashkernel fails on HiSilicon D05 (LP: #1806766)
    - efi: honour memory reservations passed via a linux specific config table
    - efi/arm: libstub: add a root memreserve config table
    - efi: add API to reserve memory persistently across kexec reboot
    - irqchip/gic-v3-its: Change initialization ordering for LPIs
    - irqchip/gic-v3-its: Simplify LPI_PENDBASE_SZ usage
    - irqchip/gic-v3-its: Split property table clearing from allocation
    - irqchip/gic-v3-its: Move pending table allocation to init time
    - irqchip/gic-v3-its: Keep track of property table's PA and VA
    - irqchip/gic-v3-its: Allow use of pre-programmed LPI tables
    - irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump
      kernels
    - irqchip/gic-v3-its: Check that all RDs have the same property table
    - irqchip/gic-v3-its: Register LPI tables with EFI config table
    - irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
    - arm64: memblock: don't permit memblock resizing until linear mapping is up
    - efi/arm: Defer persistent reservations until after paging_init()
    - efi: Permit calling efi_mem_reserve_persistent() from atomic context
    - efi: Prevent GICv3 WARN() by mapping the memreserve table before first use

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: c...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (14.1 KiB)

This bug was fixed in the package linux - 4.19.0-12.13

---------------
linux (4.19.0-12.13) disco; urgency=medium

  * linux: 4.19.0-12.13 -proposed tracker (LP: #1813664)

  * kernel oops in bcache module (LP: #1793901)
    - SAUCE: bcache: never writeback a discard operation

  * Disco update: 4.19.18 upstream stable release (LP: #1813611)
    - ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
    - mlxsw: spectrum: Disable lag port TX before removing it
    - mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
    - net: dsa: mv88x6xxx: mv88e6390 errata
    - net, skbuff: do not prefer skb allocation fails early
    - qmi_wwan: add MTU default to qmap network interface
    - ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
    - net: clear skb->tstamp in bridge forwarding path
    - netfilter: ipset: Allow matching on destination MAC address for mac and
      ipmac sets
    - gpio: pl061: Move irq_chip definition inside struct pl061
    - drm/amd/display: Guard against null stream_state in set_crc_source
    - drm/amdkfd: fix interrupt spin lock
    - ixgbe: allow IPsec Tx offload in VEPA mode
    - platform/x86: asus-wmi: Tell the EC the OS will handle the display off
      hotkey
    - e1000e: allow non-monotonic SYSTIM readings
    - usb: typec: tcpm: Do not disconnect link for self powered devices
    - selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
    - of: overlay: add missing of_node_put() after add new node to changeset
    - writeback: don't decrement wb->refcnt if !wb->bdi
    - serial: set suppress_bind_attrs flag only if builtin
    - bpf: Allow narrow loads with offset > 0
    - ALSA: oxfw: add support for APOGEE duet FireWire
    - x86/mce: Fix -Wmissing-prototypes warnings
    - MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
    - crypto: ecc - regularize scalar for scalar multiplication
    - arm64: perf: set suppress_bind_attrs flag to true
    - drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
    - clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
    - samples: bpf: fix: error handling regarding kprobe_events
    - usb: gadget: udc: renesas_usb3: add a safety connection way for
      forced_b_device
    - fpga: altera-cvp: fix probing for multiple FPGAs on the bus
    - selinux: always allow mounting submounts
    - ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
    - scsi: qedi: Check for session online before getting iSCSI TLV data.
    - drm/amdgpu: Reorder uvd ring init before uvd resume
    - rxe: IB_WR_REG_MR does not capture MR's iova field
    - efi/libstub: Disable some warnings for x86{,_64}
    - jffs2: Fix use of uninitialized delayed_work, lockdep breakage
    - clk: imx: make mux parent strings const
    - pstore/ram: Do not treat empty buffers as valid
    - media: uvcvideo: Refactor teardown of uvc on USB disconnect
    - powerpc/xmon: Fix invocation inside lock region
    - powerpc/pseries/cpuidle: Fix preempt warning
    - media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
    - ASoC: use dma_ops of parent device for acp_audio_dma
    - media: ve...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-02-05 02:50 EDT-------
IBM Bugzilla status -> closed, Fix Released for all Distros

Brad Figg (brad-figg)
tags: added: cscc
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.