s390: dbginfo.sh triggers kernel panic, reading from /sys/kernel/mm/page_idle/bitmap

Bug #1904884 reported by bugproxy
This bug affects 1 person
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  High        Skipper Bug Screeners
linux (Ubuntu)           Fix Released  Undecided   Unassigned
  Bionic                 Fix Released  Undecided   Unassigned
  Focal                  Fix Released  Medium      Unassigned
  Groovy                 Fix Released  Undecided   Unassigned
  Hirsute                Fix Released  Undecided   Unassigned

Bug Description

SRU Justification:
==================

[Impact]

* While executing dbginfo.sh (a script to collect runtime, configuration, and trace information on s390x) the system hangs.

* This is because 'idle page tracking' users can pass an arbitrary pfn, which might map to an offline page, and attempts to access offline pages lead to the hang.

* Access to such pages must be avoided.

* The upstream commit modifies 'page_idle_get_page()' to use 'pfn_to_online_page()' instead of a
'pfn_valid()' and 'pfn_to_page()' combination, so that the pfn mapped to an offline page is skipped.
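
The difference between the two lookups can be illustrated with a small userspace model. This is only a sketch and not kernel code: every `toy_*` name, the section size, and the three-section memory layout are assumptions made up for illustration. It models the idea described above: the old check only asks whether a pfn has a memmap entry, while the new lookup returns nothing unless the containing range is online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code: all toy_* names are hypothetical. */
#define TOY_PAGES_PER_SECTION 4UL
#define TOY_NR_SECTIONS       3UL

struct toy_page { int flags; };

struct toy_section {
	bool present; /* a memmap exists for this range (the pfn_valid() idea) */
	bool online;  /* the range is online, its struct pages are initialized */
};

static const struct toy_section toy_sections[TOY_NR_SECTIONS] = {
	{ true,  true  }, /* pfns 0..3:  online */
	{ true,  false }, /* pfns 4..7:  valid, but offline */
	{ false, false }, /* pfns 8..11: no memmap at all */
};

static struct toy_page toy_memmap[TOY_NR_SECTIONS * TOY_PAGES_PER_SECTION];

/* Old-style check: true whenever a memmap entry exists, online or not. */
static bool toy_pfn_valid(unsigned long pfn)
{
	unsigned long nr = pfn / TOY_PAGES_PER_SECTION;

	return nr < TOY_NR_SECTIONS && toy_sections[nr].present;
}

/* New-style lookup: NULL unless the containing section is online. */
static struct toy_page *toy_pfn_to_online_page(unsigned long pfn)
{
	unsigned long nr = pfn / TOY_PAGES_PER_SECTION;

	if (nr < TOY_NR_SECTIONS && toy_sections[nr].online)
		return &toy_memmap[pfn];
	return NULL;
}

/* Lookup before the fix: validity check followed by a plain conversion. */
static struct toy_page *toy_get_page_old(unsigned long pfn)
{
	if (!toy_pfn_valid(pfn))
		return NULL;
	return &toy_memmap[pfn]; /* may hand back an offline page */
}

/* Lookup after the fix: offline pfns are skipped. */
static struct toy_page *toy_get_page_new(unsigned long pfn)
{
	return toy_pfn_to_online_page(pfn);
}

void toy_demo(void)
{
	assert(toy_get_page_old(5) != NULL);           /* offline page leaks through */
	assert(toy_get_page_new(5) == NULL);           /* offline page is skipped */
	assert(toy_get_page_new(1) == &toy_memmap[1]); /* online page still found */
	assert(toy_get_page_old(9) == NULL);           /* no memmap: both reject */
}
```

In this model pfn 5 is the interesting case: it passes the validity check but belongs to an offline range, so only the new lookup rejects it.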

[Fix]

* 92fb1db26eef "mm/page_idle.c: skip offline pages"

[Test Case]

* IBM Z or LinuxONE hardware with Ubuntu Server 18.04 (GA kernel, 4.15) installed.

* Execute a test application that tries to access offline pages.

* Or execute dbginfo.sh while the system has some offline (idle) pages.
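
A minimal sketch of a reader for /sys/kernel/mm/page_idle/bitmap follows. It is not the test application referred to above, and the helper names (`idle_bitmap_offset`, `idle_bitmap_bit`, `scan_idle_bitmap`) are hypothetical. It assumes the layout described in the kernel's idle page tracking documentation: an array of 8-byte words in native byte order, where the page at PFN i maps to bit i % 64 of word i / 64. Reading the file sequentially makes the kernel look up every pfn in turn.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define IDLE_BITMAP_PATH "/sys/kernel/mm/page_idle/bitmap"

/* Byte offset of the 8-byte word that holds the bit for this pfn. */
static off_t idle_bitmap_offset(uint64_t pfn)
{
	return (off_t)(pfn / 64 * 8);
}

/* Bit position of this pfn inside its 8-byte word. */
static unsigned int idle_bitmap_bit(uint64_t pfn)
{
	return (unsigned int)(pfn % 64);
}

/*
 * Read the bitmap from start to end in 8-byte aligned chunks.
 * Returns the number of 8-byte words read, or -1 on error.
 */
long scan_idle_bitmap(const char *path)
{
	uint64_t words[512];
	long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, words, sizeof(words))) > 0)
		total += (long)(n / (ssize_t)sizeof(words[0]));
	close(fd);
	return n < 0 ? -1 : total;
}

/* Sanity checks for the pfn-to-offset arithmetic. */
void idle_bitmap_selfcheck(void)
{
	assert(idle_bitmap_offset(0) == 0);
	assert(idle_bitmap_offset(64) == 8);
	assert(idle_bitmap_offset(130) == 16);
	assert(idle_bitmap_bit(130) == 2);
}
```

Calling `scan_idle_bitmap(IDLE_BITMAP_PATH)` typically requires root and a kernel built with idle page tracking. Run it only on a test system, since this kind of read is what the report describes as triggering the hang on an unfixed kernel.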

[Regression Potential]

* There is a certain regression risk, especially for bionic, since the code structure in kernel 4.15 differs slightly from kernel 5.4 (and newer).

* However, for newer kernels the modification is pretty safe, since it has been accepted upstream since kernel 5.8 and is therefore already included in hirsute and groovy.

* And the patch is fine (and cherry picks cleanly) for focal as well.

* For bionic there is a slightly conflicting context, since the struct 'zone' was replaced by 'pg_data_t *pgdat' (by another commit: 92fb1db26eef), but that change (or any change to the struct zone) would not be necessary to fix the uninitialized struct page access.

* Hence the upstream commit/patch needs to be adjusted/backported for bionic 4.15, largely by replacing the context line 'pg_data_t *pgdat;' with 'struct zone *zone;' (that is, by keeping the line bionic already has).

* This must be considered carefully, however: mishandling idle pages could be harmful and might ultimately make things worse by breaking more than it fixes.

[Other]

* The patch was accepted upstream with kernel v5.8, hence it is already in groovy and hirsute.

* The upstream commit cherry picks cleanly to focal, but for bionic a backport is needed.

* Hence this kernel SRU request is for focal (cherry-pick) and bionic (backport).
__________

System hangs on dbginfo.sh script execution.

Solution:
Commit 92fb1db26eef ("mm/page_idle.c: skip offline pages")

Included upstream since kernel v5.8, so it is already included in Ubuntu 20.10, but not in 20.04 and earlier.

Commit 92fb1db26eef ("mm/page_idle.c: skip offline pages") applies cleanly on ubuntu-focal, but not on ubuntu-bionic.

Adjustment / backport for bionic should be trivial, but it is not IBM code and therefore the backport will be requested here by Canonical.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-189321 severity-critical targetmilestone-inin20041
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → High
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-11-19 10:17 EDT-------
Reduced importance from "ship issue" to "high", not a real ship issue, but is mandatory to be fixed within the service stream

tags: added: severity-high
removed: severity-critical
Frank Heimes (fheimes) wrote : Re: [UBUNUT 20.04] s390: dbginfo.sh triggers kernel panic, reading from /sys/kernel/mm/page_idle/bitmap

The commit 92fb1db26eef "mm/page_idle.c: skip offline pages" is upstream with 5.8 (rc-1) (also tagged with next-20200820).
Hence it's in groovy (since Ubuntu-5.8.0-13.14) and in hirsute (Ubuntu-5.8.0-30.32+21.04.1).

It's not in focal (also not via upstream stable updates).

Hence updating only G and H to Fix Released.

Changed in linux (Ubuntu Hirsute):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
status: New → Fix Released
Changed in linux (Ubuntu Groovy):
status: New → Fix Released
Changed in ubuntu-z-systems:
status: New → Triaged
summary: - [UBUNUT 20.04] s390: dbginfo.sh triggers kernel panic, reading from
+ s390: dbginfo.sh triggers kernel panic, reading from
/sys/kernel/mm/page_idle/bitmap
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-11-20 09:50 EDT-------
(In reply to comment #14)
> suggested bionic backport for /mm/page_idle.c

Hmm, not sure if I read this diff correctly, but it seems to remove struct zone and the spinlock, which would not be right and would introduce a new bug.

The reason why the upstream commit does not apply directly to bionic, is because of conflicting context. The "struct zone" was replaced with "pg_data_t *pgdat" by (another) commit 92fb1db26eef, but that change (or any change to the struct zone) would not be necessary to fix the uninitialized struct page access.

So, I would suggest simply replacing the line
pg_data_t *pgdat;
from the upstream commit context, with this line to fix the context for bionic
struct zone *zone;

Removing struct zone and the spinlock would certainly introduce a new bug, because the spinlock is necessary.

Frank Heimes (fheimes) wrote :

Well, there should be no patch attached, yet.
There was just an attempt to cherry-pick but it wasn't added to this LP.
(And at least on LP there is no patch attached.)
If you accidentally received one via the BZ bridge, please ignore and remove.

Anyway, appreciate your comment ...

bugproxy (bugproxy) wrote : mm_page_idle.c_skip_offline_pages-backport.patch

Default Comment by Bridge

Frank Heimes (fheimes) wrote :

quick update on focal:
Cherry pick worked cleanly, build succeeded w/o issues and a test install didn't show any regressions so far (after some hours and doing another build on top).
The kernel test build is available here for further evaluation: https://people.canonical.com/~fheimes/lp1904884/focal/

bugproxy (bugproxy) wrote :

Default Comment by Bridge

Frank Heimes (fheimes) wrote :

Hmm, that's interesting, the BZ bridge seems to have re-attached the patch :-/ (see LP comment #6)
I removed it (however, not sure if the bridge will attach it again ...)

Frank Heimes (fheimes) wrote :

Looks like it's fixed now - the BZ bridge stopped re-attaching the attachment ...

Frank Heimes (fheimes) wrote :

'draft' backport for bionic - tbd

Frank Heimes (fheimes)
description: updated
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted for bionic and focal:
https://lists.ubuntu.com/archives/kernel-team/2020-November/thread.html#115045
changing status to 'In Progress' for bionic and focal.

Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Frank Heimes (fheimes) wrote :

applied to B and F trees

Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-12-01 09:14 EDT-------
Verified

Frank Heimes (fheimes) wrote :

Thx Gerald, adjusting tags accordingly ...

tags: added: verification-done-focal
removed: verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-12-02 08:27 EDT-------
Verified

Frank Heimes (fheimes) wrote :

thank you for the verification, adjusting tag accordingly ...

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-59.65

---------------
linux (5.4.0-59.65) focal; urgency=medium

  * focal/linux: 5.4.0-59.65 -proposed tracker (LP: #1907604)

  * focal: selftests/bpf build broken: test_map_init.skel.h: No such file or
    directory (LP: #1906866)
    - SAUCE: Revert selftests/ "bpf: Zero-fill re-used per-cpu map element"

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * memory is leaked when tasks are moved to net_prio (LP: #1886859)
    - netprio_cgroup: Fix unlimited memory leak of v2 cgroups

  * Focal update: v5.4.78 upstream stable release (LP: #1905618)
    - drm/i915/gem: Flush coherency domains on first set-domain-ioctl
    - time: Prevent undefined behaviour in timespec64_to_ns()
    - nbd: don't update block size after device is started
    - KVM: arm64: Force PTE mapping on fault resulting in a device mapping
    - PCI: qcom: Make sure PCIe is reset before init for rev 2.1.0
    - usb: dwc3: gadget: Continue to process pending requests
    - usb: dwc3: gadget: Reclaim extra TRBs after request completion
    - btrfs: tracepoints: output proper root owner for trace_find_free_extent()
    - btrfs: sysfs: init devices outside of the chunk_mutex
    - btrfs: reschedule when cloning lots of extents
    - ASoC: Intel: kbl_rt5663_max98927: Fix kabylake_ssp_fixup function
    - genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
    - hv_balloon: disable warning when floor reached
    - net: xfrm: fix a race condition during allocing spi
    - ASoC: codecs: wcd9335: Set digital gain range correctly
    - xfs: set xefi_discard when creating a deferred agfl free log intent item
    - netfilter: use actual socket sk rather than skb sk when routing harder
    - netfilter: nf_tables: missing validation from the abort path
    - netfilter: ipset: Update byte and packet counters regardless of whether they
      match
    - powerpc/eeh_cache: Fix a possible debugfs deadlock
    - perf trace: Fix segfault when trying to trace events by cgroup
    - perf tools: Add missing swap for ino_generation
    - ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
    - iommu/vt-d: Fix a bug for PDP check in prq_event_thread
    - afs: Fix warning due to unadvanced marshalling pointer
    - can: rx-offload: don't call kfree_skb() from IRQ context
    - can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ
      context
    - can: dev: __can_get_echo_skb(): fix real payload length return value for RTR
      frames
    - can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
    - can: j1939: swap addr and pgn in the send example
    - can: j1939: j1939_sk_bind(): return failure if netdev is down
    - can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error
      path
    - can: xilinx_can: handle failure cases of pm_runtime_get_sync
    - can: peak_usb: add range checking in decode operations
    - can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
    - can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is
      on
    - can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
    - c...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-129.132

---------------
linux (4.15.0-129.132) bionic; urgency=medium

  * bionic/linux: 4.15.0-129.132 -proposed tracker (LP: #1907635)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Ubuntu 18.04- call trace in kernel buffer when unloading ib_ipoib module
    (LP: #1904848)
    - SAUCE: net/mlx5e: IPoIB, initialize update_stat_work for ipoib devices

  * memory is leaked when tasks are moved to net_prio (LP: #1886859)
    - netprio_cgroup: Fix unlimited memory leak of v2 cgroups

  * s390: dbginfo.sh triggers kernel panic, reading from
    /sys/kernel/mm/page_idle/bitmap (LP: #1904884)
    - mm/page_idle.c: skip offline pages

  * Bionic update: upstream stable patchset 2020-11-23 (LP: #1905333)
    - drm/i915: Break up error capture compression loops with cond_resched()
    - tipc: fix use-after-free in tipc_bcast_get_mode
    - gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
    - gianfar: Account for Tx PTP timestamp in the skb headroom
    - net: usb: qmi_wwan: add Telit LE910Cx 0x1230 composition
    - sctp: Fix COMM_LOST/CANT_STR_ASSOC err reporting on big-endian platforms
    - sfp: Fix error handing in sfp_probe()
    - Blktrace: bail out early if block debugfs is not configured
    - i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
    - Fonts: Replace discarded const qualifier
    - ALSA: usb-audio: Add implicit feedback quirk for Qu-16
    - lib/crc32test: remove extra local_irq_disable/enable
    - kthread_worker: prevent queuing delayed work from timer_fn when it is being
      canceled
    - mm: always have io_remap_pfn_range() set pgprot_decrypted()
    - gfs2: Wake up when sd_glock_disposal becomes zero
    - ftrace: Fix recursion check for NMI test
    - ftrace: Handle tracing when switching between context
    - tracing: Fix out of bounds write in get_trace_buf
    - futex: Handle transient "ownerless" rtmutex state correctly
    - ARM: dts: sun4i-a10: fix cpu_alert temperature
    - x86/kexec: Use up-to-dated screen_info copy to fill boot params
    - of: Fix reserved-memory overlap detection
    - blk-cgroup: Fix memleak on error path
    - blk-cgroup: Pre-allocate tree node on blkg_conf_prep
    - scsi: core: Don't start concurrent async scan on same host
    - vsock: use ns_capable_noaudit() on socket create
    - drm/vc4: drv: Add error handding for bind
    - ACPI: NFIT: Fix comparison to '-ENXIO'
    - vt: Disable KD_FONT_OP_COPY
    - fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent
    - serial: 8250_mtk: Fix uart_get_baud_rate warning
    - serial: txx9: add missing platform_driver_unregister() on error in
      serial_txx9_init
    - USB: serial: cyberjack: fix write-URB completion race
    - USB: serial: option: add Quectel EC200T module support
    - USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231
    - USB: serial: option: add Telit FN980 composition 0x1055
    - USB: Add NO_LPM quirk for Kingston flash drive
    - usb: mtu3: fix panic in mtu3_gadget_stop()
    - ARC: stack unwinding: avoid indefinite looping
    - Revert "ARC: entry: fix potential EFA c...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released