eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 / P9

Bug #1882503 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
ubuntu-kernel-tests  Fix Released  Undecided   Po-Hsu Lin
linux (Ubuntu)       Fix Released  Undecided   Unassigned
  Focal              Fix Released  Undecided   Unassigned
  Groovy             Fix Released  Undecided   Unassigned
  Hirsute            Fix Released  Undecided   Unassigned

Bug Description

[Impact]
The breakable devices test is hardware-dependent. In our test pool
it takes about:
* 30 seconds to run on a Power8 system with 5 breakable devices,
* 60 seconds to run on a Power9 system with 4 breakable devices.

The default 45-second kselftest framework timeout is not long enough
for this test to finish on some nodes, so the test fails with a
TIMEOUT error.
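
As a rough illustration of why 45 seconds can be too short: the test log prints a per-device wait counter out of 60, so an upper bound on the run time is the device count multiplied by 60 seconds. The sketch below assumes that 60-second window applies to every device and ignores any other overhead; `worst_case_seconds` is a hypothetical helper, not part of the selftest.

```shell
#!/bin/sh
# Hypothetical helper: upper bound on run time if every breakable device
# needs the full per-device wait window before it recovers.
worst_case_seconds() {
    devices=$1
    max_wait=$2
    echo $((devices * max_wait))
}

KSELFTEST_TIMEOUT=45   # default framework timeout quoted in this report

# 4 breakable devices (the Power9 node), 60-second wait window per device.
bound=$(worst_case_seconds 4 60)
if [ "$bound" -gt "$KSELFTEST_TIMEOUT" ]; then
    echo "worst case ${bound}s exceeds the ${KSELFTEST_TIMEOUT}s timeout"
fi
```

The observed run times are well below this bound, but even a single slow device can use most of the 45-second budget.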

[Fix]
* f5eca0b279117f ("selftests/powerpc/eeh: disable kselftest timeout
setting for eeh-basic")

This test case has been present since Focal, and the patch can be
cherry-picked into all affected releases.
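
A minimal sketch of how a per-test timeout override might be resolved, assuming the kselftest runner reads a `settings` file next to the test and takes a `timeout=` line from it, with 0 meaning no limit. Those details are assumptions and are not stated in this report; `effective_timeout` is a hypothetical helper, not code from the kernel tree.

```shell
#!/bin/sh
# Hypothetical helper: print the timeout a kselftest-style runner would use.
# Falls back to the 45-second default when no settings file is readable.
effective_timeout() {
    settings_file=$1
    timeout=45
    if [ -r "$settings_file" ]; then
        # Each line is expected to look like key=value.
        while IFS='=' read -r key value; do
            if [ "$key" = "timeout" ]; then
                timeout=$value
            fi
        done < "$settings_file"
    fi
    printf '%s\n' "$timeout"
}

# Demonstration in a scratch directory (not the kernel tree).
demo_dir=$(mktemp -d)
printf 'timeout=0\n' > "$demo_dir/settings"
effective_timeout "$demo_dir/settings"   # override present
effective_timeout "$demo_dir/missing"    # no settings file, default applies
rm -rf "$demo_dir"
```

Keeping the override in a file beside the test means only this one test loses the framework timeout, while every other selftest keeps the default.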

[Test case]
Run this test on P9 node baltar, where the timeout can be reproduced
100% of the time. With this patch applied, the test finishes without
being terminated by the default timeout.

[Where problems could occur]
This will make the test take longer to finish, but it is still
bounded by the timeout mechanisms in both the test case and the
kselftest framework, so it is unlikely to make the test hang forever.

== Original Bug Report ==
Issue found on 5.4.0-34.38, Focal P9 "baltar"

 # selftests: powerpc/eeh: eeh-basic.sh
 # 0000:00:00.0, Skipped: bridge
 # 0001:00:00.0, Skipped: bridge
 # 0002:00:00.0, Skipped: bridge
 # 0002:01:00.0, Added
 # 0003:00:00.0, Skipped: bridge
 # 0003:01:00.0, Added
 # 0004:00:00.0, Skipped: bridge
 # 0004:01:00.0, Skipped: bridge
 # 0004:02:00.0, Added
 # 0005:00:00.0, Skipped: bridge
 # 0005:01:00.0, Added
 # 0030:00:00.0, Skipped: bridge
 # 0031:00:00.0, Skipped: bridge
 # 0032:00:00.0, Skipped: bridge
 # 0033:00:00.0, Skipped: bridge
 # Found 4 breakable devices...
 # Breaking 0002:01:00.0...
 # 0002:01:00.0, waited 0/60
 # 0002:01:00.0, waited 1/60
 # 0002:01:00.0, waited 2/60
 # 0002:01:00.0, waited 3/60
 # 0002:01:00.0, waited 4/60
 # 0002:01:00.0, waited 5/60
 # 0002:01:00.0, waited 6/60
 # 0002:01:00.0, Recovered after 7 seconds
 # Breaking 0003:01:00.0...
 # 0003:01:00.0, waited 0/60
 # 0003:01:00.0, waited 1/60
 # 0003:01:00.0, waited 2/60
 # 0003:01:00.0, waited 3/60
 # 0003:01:00.0, waited 4/60
 # 0003:01:00.0, waited 5/60
 # 0003:01:00.0, waited 6/60
 # 0003:01:00.0, waited 7/60
 # 0003:01:00.0, waited 8/60
 # 0003:01:00.0, waited 9/60
 # 0003:01:00.0, waited 10/60
 # 0003:01:00.0, waited 11/60
 # 0003:01:00.0, waited 12/60
 # 0003:01:00.0, waited 13/60
 # 0003:01:00.0, waited 14/60
 # 0003:01:00.0, waited 15/60
 # 0003:01:00.0, waited 16/60
 # 0003:01:00.0, waited 17/60
 # 0003:01:00.0, waited 18/60
 # 0003:01:00.0, waited 19/60
 # 0003:01:00.0, waited 20/60
 # 0003:01:00.0, waited 21/60
 # 0003:01:00.0, waited 22/60
 # 0003:01:00.0, waited 23/60
 # 0003:01:00.0, waited 24/60
 # 0003:01:00.0, waited 25/60
 # 0003:01:00.0, waited 26/60
 # 0003:01:00.0, waited 27/60
 # 0003:01:00.0, waited 28/60
 # 0003:01:00.0, waited 29/60
 # 0003:01:00.0, waited 30/60
 # 0003:01:00.0, waited 31/60
 # 0003:01:00.0, waited 32/60
 # 0003:01:00.0, waited 33/60
 # 0003:01:00.0, waited 34/60
 # 0003:01:00.0, waited 35/60
 # 0003:01:00.0, Recovered after 36 seconds
 # Breaking 0004:02:00.0...
 # 0004:02:00.0, Recovered after 0 seconds
 # Breaking 0005:01:00.0...
 # 0005:01:00.0, waited 0/60
 # 0005:01:00.0, waited 1/60
 #
 not ok 1 selftests: powerpc/eeh: eeh-basic.sh # TIMEOUT
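
The log above can be checked against the 45-second limit by adding up the reported recovery times. The sketch below sums the "Recovered after N seconds" values; `sum_recovery_seconds` is a hypothetical helper, and the total ignores any time spent outside the wait loops.

```shell
#!/bin/sh
# Hypothetical helper: sum the "Recovered after N seconds" values found in
# eeh-basic.sh output read from stdin. Prints 0 when nothing matches.
sum_recovery_seconds() {
    awk '/Recovered after [0-9]+ seconds/ { total += $(NF - 1) }
         END { print total + 0 }'
}

# Recovery lines copied from the Focal P9 log above.
sum_recovery_seconds <<'EOF'
 # 0002:01:00.0, Recovered after 7 seconds
 # 0003:01:00.0, Recovered after 36 seconds
 # 0004:02:00.0, Recovered after 0 seconds
EOF
```

The first three devices account for 7 + 36 + 0 = 43 seconds, which leaves about 2 seconds for the fourth device. That is consistent with the log ending at "waited 1/60" for 0005:01:00.0.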

Po-Hsu Lin (cypressyew)
tags: added: 5.4 focal ppc64el sru-20200518 ubuntu-kernel-selftests
Po-Hsu Lin (cypressyew)
tags: added: kqa-blocker
Po-Hsu Lin (cypressyew)
tags: added: sru-20200608
Po-Hsu Lin (cypressyew) wrote :

This issue can be found on Bionic 5.4.0-42.46~18.04.1 P8 as well, but with additional "No such file" errors in the log:

 # selftests: powerpc/eeh: eeh-basic.sh
 # 0000:00:00.0, Skipped: bridge
 # 0001:00:00.0, Skipped: bridge
 # 0001:01:00.0, Skipped: bridge
 # 0001:02:01.0, Skipped: bridge
 # 0001:02:08.0, Skipped: bridge
 # 0001:02:09.0, Skipped: bridge
 # 0001:08:00.0, Added
 # 0001:09:00.0, Added
 # 0004:00:00.0, Skipped: bridge
 # 0005:00:00.0, Skipped: bridge
 # 0005:01:00.0, Skipped: bridge
 # 0005:02:01.0, Skipped: bridge
 # 0005:02:08.0, Skipped: bridge
 # 0005:02:09.0, Skipped: bridge
 # 0005:02:10.0, Skipped: bridge
 # 0005:02:11.0, Skipped: bridge
 # 0005:03:00.0, Added
 # 0005:04:00.0, Added
 # 0005:05:00.0, Added
 # 0040:00:00.0, Skipped: bridge
 # 0044:00:00.0, Skipped: bridge
 # 0045:00:00.0, Skipped: bridge
 # Found 5 breakable devices...
 # Breaking 0001:08:00.0...
 # 0001:08:00.0, waited 0/60
 # 0001:08:00.0, waited 1/60
 # 0001:08:00.0, waited 2/60
 # 0001:08:00.0, Recovered after 3 seconds
 # Breaking 0001:09:00.0...
 # cut: -: No such device
 # ./eeh-basic.sh: 13: ./eeh-basic.sh: cannot open /sys/bus/pci/devices/0001:09:00.0/eeh_pe_state: No such file
 # 0001:09:00.0, waited 0/60
 # 0001:09:00.0, waited 1/60
 # 0001:09:00.0, waited 2/60
 # 0001:09:00.0, waited 3/60
 # 0001:09:00.0, waited 4/60
 # 0001:09:00.0, waited 5/60
 # 0001:09:00.0, waited 6/60
 # 0001:09:00.0, waited 7/60
 # 0001:09:00.0, Recovered after 8 seconds
 # Breaking 0005:03:00.0...
 # cut: -: No such device
 # ./eeh-basic.sh: 13: ./eeh-basic.sh: cannot open /sys/bus/pci/devices/0005:03:00.0/eeh_pe_state: No such file
 # 0005:03:00.0, waited 0/60
 # 0005:03:00.0, waited 1/60
 # 0005:03:00.0, waited 2/60
 # 0005:03:00.0, waited 3/60
 # 0005:03:00.0, waited 4/60
 # 0005:03:00.0, waited 5/60
 # 0005:03:00.0, waited 6/60
 # 0005:03:00.0, waited 7/60
 # 0005:03:00.0, Recovered after 8 seconds
 # Breaking 0005:04:00.0...
 # 0005:04:00.0, waited 0/60
 # 0005:04:00.0, waited 1/60
 # 0005:04:00.0, waited 2/60
 # 0005:04:00.0, Recovered after 3 seconds
 # Breaking 0005:05:00.0...
 # 0005:05:00.0, waited 0/60
 # 0005:05:00.0, waited 1/60
 # 0005:05:00.0, waited 2/60
 # 0005:05:00.0, Recovered after 3 seconds
 # 0 devices failed to recover (5 tested)
 #
 not ok 1 selftests: powerpc/eeh: eeh-basic.sh # TIMEOUT

tags: added: bionic sru-20200629
summary: - eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with Focal P9
+ eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 / P9
Po-Hsu Lin (cypressyew)
tags: added: sru-20200810
Po-Hsu Lin (cypressyew) wrote :
Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
status: New → In Progress
Kelsey Steele (kelsey-steele) wrote :

Found on Groovy/linux 5.8.0-31.33

tags: added: 5.8 groovy sru-20201109
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in linux (Ubuntu Groovy):
status: New → In Progress
Po-Hsu Lin (cypressyew)
description: updated
description: updated
Po-Hsu Lin (cypressyew) wrote :
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.10.0-14.15

---------------
linux (5.10.0-14.15) hirsute; urgency=medium

  * hirsute/linux: 5.10.0-14.15 -proposed tracker (LP: #1913724)

  * Restore palm ejection on multi-input devices (LP: #1913520)
    - HID: multitouch: Apply MT_QUIRK_CONFIDENCE quirk for multi-input devices

  * intel-hid is not loaded on new Intel platform (LP: #1907160)
    - platform/x86: intel-hid: add Rocket Lake ACPI device ID

  * Hirsute update: v5.10.11 upstream stable release (LP: #1913430)
    - scsi: target: tcmu: Fix use-after-free of se_cmd->priv
    - mtd: rawnand: gpmi: fix dst bit offset when extracting raw payload
    - mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine
    - i2c: tegra: Wait for config load atomically while in ISR
    - i2c: bpmp-tegra: Ignore unknown I2C_M flags
    - platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634
    - ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
    - ALSA: hda/realtek - Limit int mic boost on Acer Aspire E5-575T
    - ALSA: hda/via: Add minimum mute flag
    - crypto: xor - Fix divide error in do_xor_speed()
    - dm crypt: fix copy and paste bug in crypt_alloc_req_aead
    - ACPI: scan: Make acpi_bus_get_device() clear return pointer on error
    - btrfs: don't get an EINTR during drop_snapshot for reloc
    - btrfs: do not double free backref nodes on error
    - btrfs: fix lockdep splat in btrfs_recover_relocation
    - btrfs: don't clear ret in btrfs_start_dirty_block_groups
    - btrfs: send: fix invalid clone operations when cloning from the same file
      and root
    - fs: fix lazytime expiration handling in __writeback_single_inode()
    - pinctrl: ingenic: Fix JZ4760 support
    - mmc: core: don't initialize block size from ext_csd if not present
    - mmc: sdhci-of-dwcmshc: fix rpmb access
    - mmc: sdhci-xenon: fix 1.8v regulator stabilization
    - mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend
    - dm: avoid filesystem lookup in dm_get_dev_t()
    - dm integrity: fix a crash if "recalculate" used without "internal_hash"
    - dm integrity: conditionally disable "recalculate" feature
    - drm/atomic: put state on error path
    - drm/syncobj: Fix use-after-free
    - drm/amdgpu: remove gpu info firmware of green sardine
    - drm/amd/display: DCN2X Find Secondary Pipe properly in MPO + ODM Case
    - drm/i915/gt: Prevent use of engine->wa_ctx after error
    - drm/i915: Check for rq->hwsp validity after acquiring RCU lock
    - ASoC: Intel: haswell: Add missing pm_ops
    - ASoC: rt711: mutex between calibration and power state changes
    - SUNRPC: Handle TCP socket sends with kernel_sendpage() again
    - HID: sony: select CONFIG_CRC32
    - dm integrity: select CRYPTO_SKCIPHER
    - x86/hyperv: Fix kexec panic/hang issues
    - scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
    - scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
    - scsi: qedi: Correct max length of CHAP secret
    - scsi: scsi_debug: Fix memleak in scsi_debug_init()
    - scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
    - riscv: ...

Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Released
Po-Hsu Lin (cypressyew)
tags: added: verification-done-focal verification-done-groovy
removed: verification-needed-focal verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-66.74

---------------
linux (5.4.0-66.74) focal; urgency=medium

  * focal/linux: 5.4.0-66.74 -proposed tracker (LP: #1913152)

  * Add support for selective build of special drivers (LP: #1912789)
    - [Packaging] Add support for ODM drivers
    - [Packaging] Turn on ODM support for amd64

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 460-server series and update the 460 series
    (LP: #1913200)
    - [Config] dkms-versions -- drop NVIDIA 435 455 and 440-server
    - [Config] dkms-versions -- add the 460-server nvidia driver

  * Enable mute and micmute LED on HP EliteBook 850 G7 (LP: #1910102)
    - ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7

  * SYNA30B4:00 06CB:CE09 Mouse on HP EliteBook 850 G7 not working at all
    (LP: #1908992)
    - HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device

  * HD Audio Device PCI ID for the Intel Cometlake-R platform (LP: #1912427)
    - SAUCE: ALSA: hda: Add Cometlake-R PCI ID

  * switch to an autogenerated nvidia series based core via dkms-versions
    (LP: #1912803)
    - [Packaging] nvidia -- use dkms-versions to define versions built
    - [Packaging] update-version-dkms -- maintain flags fields
    - [Config] dkms-versions -- add transitional/skip information for nvidia
      packages

  * udpgro.sh in net from ubuntu_kernel_selftests seems not reflecting sub-test
    result (LP: #1908499)
    - selftests: fix the return value for UDP GRO test

  * qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP
    tx csum offload (LP: #1909062)
    - qede: fix offload for IPIP tunnel packets

  * Use DCPD to control HP DreamColor panel (LP: #1911001)
    - SAUCE: drm/dp: Another HP DreamColor panel brigntness fix

  * kvm: Windows 2k19 with Hyper-v role gets stuck on pending hypervisor
    requests on cascadelake based kvm hosts (LP: #1911848)
    - KVM: x86: Set KVM_REQ_EVENT if run is canceled with req_immediate_exit set

  * Ubuntu 20.10 four needed fixes to 'Add driver for Mellanox Connect-IB
    adapters' (LP: #1905574)
    - net/mlx5: Fix a race when moving command interface to polling mode

  * Fix right sounds and mute/micmute LEDs for HP ZBook Fury 15/17 G7 Mobile
    Workstation (LP: #1910561)
    - ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines

  * Ubuntu 20.04 - multicast counter is not increased in ip -s (LP: #1901842)
    - net/mlx5e: Fix multicast counter not up-to-date in "ip -s"

  * eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 /
    P9 (LP: #1882503)
    - selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic

  * DMI entry syntax fix for Pegatron / ByteSpeed C15B (LP: #1910639)
    - Input: i8042 - unbreak Pegatron C15B

  * CVE-2020-29372
    - mm: check that mm is still valid in madvise()

  * update ENA driver, incl. new ethtool stats (LP: #1910291)
    - net: ena: Change WARN_ON expression in ena_del_napi_in_range()
    - net: ena: ethtool: convert stat_offset to 64 bit resolution
    - net: ena: eth...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-44.50

---------------
linux (5.8.0-44.50) groovy; urgency=medium

  * groovy/linux: 5.8.0-44.50 -proposed tracker (LP: #1914805)

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 460-server series and update the 460 series
    (LP: #1913200)
    - [Config] dkms-versions -- drop NVIDIA 435 455 and 440-server
    - [Config] dkms-versions -- add the 460-server nvidia driver

  * [SRU][G/H/U/OEM-5.10] re-enable s0ix of e1000e (LP: #1910541)
    - Revert "UBUNTU: SAUCE: e1000e: bump up timeout to wait when ME un-configure
      ULP mode"
    - e1000e: Only run S0ix flows if shutdown succeeded
    - Revert "e1000e: disable s0ix entry and exit flows for ME systems"
    - e1000e: Export S0ix flags to ethtool

  * suspend only works once on ThinkPad X1 Carbon gen 7 (LP: #1865570) //
    [SRU][G/H/U/OEM-5.10] re-enable s0ix of e1000e (LP: #1910541)
    - e1000e: bump up timeout to wait when ME un-configures ULP mode

  * Cannot probe sata disk on sata controller behind VMD: ata1.00: failed to
    IDENTIFY (I/O error, err_mask=0x4) (LP: #1894778)
    - PCI: vmd: Offset Client VMD MSI-X vectors

  * Enable mute and micmute LED on HP EliteBook 850 G7 (LP: #1910102)
    - ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7

  * SYNA30B4:00 06CB:CE09 Mouse on HP EliteBook 850 G7 not working at all
    (LP: #1908992)
    - HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device

  * HD Audio Device PCI ID for the Intel Cometlake-R platform (LP: #1912427)
    - SAUCE: ALSA: hda: Add Cometlake-R PCI ID

  * switch to an autogenerated nvidia series based core via dkms-versions
    (LP: #1912803)
    - [Packaging] nvidia -- use dkms-versions to define versions built
    - [Packaging] update-version-dkms -- maintain flags fields
    - [Config] dkms-versions -- add transitional/skip information for nvidia
      packages

  * udpgro.sh in net from ubuntu_kernel_selftests seems not reflecting sub-test
    result (LP: #1908499)
    - selftests: fix the return value for UDP GRO test

  * [UBUNTU 21.04] vfio: pass DMA availability information to userspace
    (LP: #1907421)
    - vfio/type1: Refactor vfio_iommu_type1_ioctl()
    - vfio iommu: Add dma available capability

  * qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP
    tx csum offload (LP: #1909062)
    - qede: fix offload for IPIP tunnel packets

  * Use DCPD to control HP DreamColor panel (LP: #1911001)
    - SAUCE: drm/dp: Another HP DreamColor panel brigntness fix

  * Fix right sounds and mute/micmute LEDs for HP ZBook Fury 15/17 G7 Mobile
    Workstation (LP: #1910561)
    - ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines

  * Ubuntu 20.04 - multicast counter is not increased in ip -s (LP: #1901842)
    - net/mlx5e: Fix multicast counter not up-to-date in "ip -s"

  * eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 /
    P9 (LP: #1882503)
    - selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic

  * DMI entry syntax fix for Pegatron /...

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released