REGRESSION: shiftfs lets sendfile fail with EINVAL

Bug #1939301 reported by Simon Fels
This bug affects 1 person
Affects                        Status        Importance  Assigned to  Milestone
linux (Ubuntu)                 Fix Released  Undecided   Unassigned
  Focal                        Invalid       Undecided   Unassigned
  Hirsute                      Fix Released  High        Unassigned
  Impish                       Fix Released  Undecided   Unassigned
linux-hwe-5.11 (Ubuntu)        Invalid       Undecided   Unassigned
  Focal                        Fix Released  Undecided   Unassigned
  Hirsute                      Invalid       Undecided   Unassigned
  Impish                       Invalid       Undecided   Unassigned
linux-meta-hwe-5.11 (Ubuntu)
  Focal                        Invalid       Undecided   Unassigned
  Hirsute                      Invalid       Undecided   Unassigned
  Impish                       Invalid       Undecided   Unassigned

Bug Description

With the 5.11 HWE kernel landing for Ubuntu 20.04, we noticed that the LXC tools we use in bionic containers as part of Anbox Cloud fail when run on the 5.11 kernel.

A simple reproducer looks like this:

1. Run Ubuntu 20.04 with HWE kernel (linux-generic-hwe-20.04, 5.11.0-25-generic #27~20.04.1-Ubuntu)
2. Install LXD and enable shiftfs
$ snap install lxd
$ snap set lxd shiftfs.enable true
$ snap restart --reload lxd
3. Launch bionic container and run `lxc-info`
$ lxc launch ubuntu:b c0
$ lxc shell c0
c0$ apt update
c0$ apt install -y lxc-utils
c0$ apt show lxc-utils | grep Version
Version: 3.0.3-0ubuntu1~18.04.1
c0$ mkdir -p containers/test
c0$ touch containers/test/config
c0$ lxc-info -P containers -n test
Failed to load config for test
Failure to retrieve information on containers:test

Looking into the failing `lxc-info` call with strace reveals:

...
memfd_create(".lxc_config_file", MFD_CLOEXEC) = 4
openat(AT_FDCWD, "containers/test/config", O_RDONLY|O_CLOEXEC) = 5
sendfile(4, 5, NULL, 2147479552) = -1 EINVAL (Invalid argument)
...

LXC >= 4.0.0 no longer uses sendfile and is therefore not affected. Any other tool that uses sendfile on top of shiftfs is affected and will fail. Bionic is affected because the LXC 3.0.3 it ships still uses sendfile.

Disabling shiftfs makes things work again and can serve as a workaround to a certain degree, but it may not be applicable in all cases.
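
Where shiftfs cannot be disabled, a tool that relies on sendfile could in principle guard itself by falling back to plain read/write when the call is rejected. The sketch below shows that idea in Python; it is an illustration under that assumption, not what LXC 4.0.0 does, and the helper name `copy_fd` and the chunk size are hypothetical.

```python
import errno
import os


def copy_fd(dst_fd, src_fd, chunk=1 << 16):
    """Copy src_fd to dst_fd until EOF; return the number of bytes copied.

    sendfile(2) is tried first. If the kernel rejects it with EINVAL or
    ENOSYS, the rest of the copy uses read(2)/write(2) instead.
    """
    total = 0
    use_sendfile = True
    while True:
        if use_sendfile:
            try:
                n = os.sendfile(dst_fd, src_fd, None, chunk)
            except OSError as exc:
                if exc.errno not in (errno.EINVAL, errno.ENOSYS):
                    raise
                # Nothing was consumed by the failed call; switch paths.
                use_sendfile = False
                continue
        else:
            buf = os.read(src_fd, chunk)
            n = len(buf)
            view = memoryview(buf)
            while view:
                # write(2) may be partial, so loop until the chunk is out.
                written = os.write(dst_fd, view)
                view = view[written:]
        if n == 0:  # EOF on the source
            return total
        total += n
```

The fallback costs an extra copy through user space, but it keeps working on filesystems that do not support the in-kernel path.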

Further analysis with Christian (cbrauner) from the LXD team this morning showed that shiftfs is missing an implementation of the now-required splice_read handler in the file_operations structure. So whenever shiftfs is in use, all calls to sendfile fail because of the missing implementation. The generic handler for this was removed in the following upstream change: https://<email address hidden>/
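
To make the mechanism concrete, here is a userspace toy model in Python of the dispatch described above. It is not kernel code and not the patch: `FileOperations`, `do_sendfile` and the `generic_fallback` switch are hypothetical names, and only the read-side splice handler (`splice_read` in `file_operations`) corresponds to something in the analysis.

```python
import errno
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileOperations:
    """Toy stand-in for a filesystem's file_operations (read side only)."""
    read: Callable[[int], bytes]
    splice_read: Optional[Callable[[int], bytes]] = None


def do_sendfile(fops, count, generic_fallback):
    """Return the data a sendfile-style call would move, or raise EINVAL."""
    if fops.splice_read is not None:
        # The filesystem provides its own handler: use it.
        return fops.splice_read(count)
    if generic_fallback:
        # Older behaviour: a generic handler covers filesystems without one.
        return fops.read(count)
    # Newer behaviour: no handler and no fallback, so the call is rejected.
    raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))


# A filesystem without a splice_read handler, as shiftfs was before the fix.
without_handler = FileOperations(read=lambda n: b"config"[:n])
# The same filesystem once a handler is wired up.
with_handler = FileOperations(read=without_handler.read,
                              splice_read=without_handler.read)
```

In this model, `do_sendfile(without_handler, 6, generic_fallback=False)` raises `OSError` with `EINVAL`, while the same call on `with_handler` succeeds, which is the shape of the regression and of its fix.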

Christian implemented a quick fix: https://paste.ubuntu.com/p/TPsjfCpnD5/

As of today I don't know of any Anbox Cloud customer who is affected by this, as most of them run one of our cloud kernels. However, as soon as 5.11 rolls out to the cloud kernels, this will hit production systems and cause them to fail.

Changed in linux-meta-hwe-5.11 (Ubuntu):
status: New → Confirmed
Changed in linux-meta-hwe-5.11 (Ubuntu Focal):
status: New → Confirmed
Changed in linux (Ubuntu Hirsute):
status: New → Confirmed
Changed in linux-meta-hwe-5.11 (Ubuntu Hirsute):
status: New → Invalid
Changed in linux-meta-hwe-5.11 (Ubuntu Impish):
status: Confirmed → Invalid
Changed in linux (Ubuntu Focal):
status: New → Invalid
Stefan Bader (smb)
Changed in linux (Ubuntu Hirsute):
importance: Undecided → High
status: Confirmed → In Progress
Simon Fels (morphis) wrote :

I've verified on my end that with the patch from https://paste.ubuntu.com/p/TPsjfCpnD5/ the failing sendfile syscall on top of shiftfs is gone.

Changed in linux (Ubuntu Impish):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Changed in linux-hwe-5.11 (Ubuntu Hirsute):
status: New → Invalid
Changed in linux-hwe-5.11 (Ubuntu Impish):
status: New → Invalid
Changed in linux-hwe-5.11 (Ubuntu Focal):
status: New → In Progress
no longer affects: linux-meta-hwe-5.11 (Ubuntu)
Changed in linux-meta-hwe-5.11 (Ubuntu Focal):
status: Confirmed → Invalid
Changed in linux-hwe-5.11 (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Simon Fels (morphis) wrote :

I can confirm that running the HWE kernel 5.11.0-27-generic #29~20.04.1-Ubuntu from proposed on our Ubuntu 20.04 test system resolves the problem. Thanks!

Stefan Bader (smb)
tags: added: verification-done-hirsute
removed: verification-needed-hirsute
tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-hwe-5.11 - 5.11.0-27.29~20.04.1

---------------
linux-hwe-5.11 (5.11.0-27.29~20.04.1) focal; urgency=medium

  * focal/linux-hwe-5.11: 5.11.0-27.29~20.04.1 -proposed tracker (LP: #1939554)

  * Update SmartPQI driver (LP: #1933518)
    - scsi: smartpqi: Add support for new product ids
    - scsi: smartpqi: Refactor aio submission code
    - scsi: smartpqi: Refactor scatterlist code
    - scsi: smartpqi: Add support for RAID5 and RAID6 writes
    - scsi: smartpqi: Add support for RAID1 writes
    - scsi: smartpqi: Add support for BMIC sense feature cmd and feature bits
    - scsi: smartpqi: Add support for long firmware version
    - scsi: smartpqi: Align code with oob driver
    - scsi: smartpqi: Add stream detection
    - scsi: smartpqi: Add host level stream detection enable
    - scsi: smartpqi: Disable WRITE SAME for HBA NVMe disks
    - scsi: smartpqi: Remove timeouts from internal cmds
    - scsi: smartpqi: Add support for wwid
    - scsi: smartpqi: Update event handler
    - scsi: smartpqi: Update soft reset management for OFA
    - scsi: smartpqi: Synchronize device resets with mutex
    - scsi: smartpqi: Update suspend/resume and shutdown
    - scsi: smartpqi: Update RAID bypass handling
    - scsi: smartpqi: Update OFA management
    - scsi: smartpqi: Update device scan operations
    - scsi: smartpqi: Fix driver synchronization issues
    - scsi: smartpqi: Convert snprintf() to scnprintf()
    - scsi: smartpqi: Add phy ID support for the physical drives
    - scsi: smartpqi: Update SAS initiator_port_protocols and
      target_port_protocols
    - scsi: smartpqi: Add additional logging for LUN resets
    - scsi: smartpqi: Update enclosure identifier in sysfs
    - scsi: smartpqi: Correct system hangs when resuming from hibernation
    - scsi: smartpqi: Update version to 2.1.8-045
    - scsi: smartpqi: Fix blocks_per_row static checker issue
    - scsi: smartpqi: Fix device pointer variable reference static checker issue
    - scsi: smartpqi: Remove unused functions

  * Hirsute update: upstream stable patchset 2021-06-14 (LP: #1931896) // HWE
    kernels: NFSv4.1 NULL pointer dereference (LP: #1939157)
    - NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()

  * REGRESSION: shiftfs lets sendfile fail with EINVAL (LP: #1939301)
    - SAUCE: shiftfs: fix sendfile() invocations

 -- Kleber Sacilotto de Souza <email address hidden> Wed, 11 Aug 2021 16:53:07 +0200

Changed in linux-hwe-5.11 (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-31.33

---------------
linux (5.11.0-31.33) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-31.33 -proposed tracker (LP: #1939553)

  * REGRESSION: shiftfs lets sendfile fail with EINVAL (LP: #1939301)
    - SAUCE: shiftfs: fix sendfile() invocations

linux (5.11.0-26.28) hirsute; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * large_dir in ext4 broken (LP: #1933074)
    - SAUCE: ext4: fix directory index node split corruption

  * Add l2tp.sh in net from ubuntu_kernel_selftests back (LP: #1934293)
    - Revert "UBUNTU: SAUCE: selftests/net -- disable l2tp.sh test"

  * icmp_redirect.sh in net from ubuntu_kernel_selftests failed on F-OEM-5.6 /
    F-OEM-5.10 / F-OEM-5.13 / F / G / H (LP: #1880645)
    - selftests: icmp_redirect: support expected failures

  * Mute/mic LEDs no function on some HP platforms (LP: #1934878)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8

  * [SRU][OEM-5.10/H] Fix HDMI output issue on Intel TGL GPU (LP: #1934864)
    - drm/i915: Fix HAS_LSPCON macro for platforms between GEN9 and GEN10

  * mute/micmute LEDs no function on HP EliteBook 830 G8 Notebook PC
    (LP: #1934239)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC

  * ubuntu-host driver lacks lseek ops (LP: #1934110)
    - ubuntu-host: add generic lseek op

  * ubuntu_kernel_selftests ftrace fails on arm64 F / aws-5.8 / amd64 F
    azure-5.8 (LP: #1927749)
    - selftests/ftrace: fix event-no-pid on 1-core machine

  * Hirsute update: upstream stable patchset 2021-06-29 (LP: #1934012)
    - proc: Track /proc/$pid/attr/ opener mm_struct
    - ASoC: max98088: fix ni clock divider calculation
    - ASoC: amd: fix for pcm_read() error
    - spi: Fix spi device unregister flow
    - spi: spi-zynq-qspi: Fix stack violation bug
    - bpf: Forbid trampoline attach for functions with variable arguments
    - net/nfc/rawsock.c: fix a permission check bug
    - usb: cdns3: Fix runtime PM imbalance on error
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
    - vfio-ccw: Reset FSM state to IDLE inside FSM
    - vfio-ccw: Serialize FSM IDLE state with I/O completion
    - ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
    - spi: sprd: Add missing MODULE_DEVICE_TABLE
    - usb: chipidea: udc: assign interrupt number to USB gadget structure
    - isdn: mISDN: netjet: Fix crash in nj_probe:
    - bonding: init notify_work earlier to avoid uninitialized use
    - netlink: disable IRQs for netlink_lock_table()
    - net: mdiobus: get rid of a BUG_ON()
    - cgroup: disable controllers at parse time
    - wq: handle VM suspension in stall detection
    - net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
    - RDS tcp loopback connection can hang
    - net:sfc: fix non-freed irq in legacy irq mode
    - scsi: bnx2fc: Return failure if io_req is already in ABTS processing
    - scsi:...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-16.16

---------------
linux (5.13.0-16.16) impish; urgency=medium

  * impish/linux: 5.13.0-16.16 -proposed tracker (LP: #1942611)

  * Miscellaneous Ubuntu changes
    - [Config] update toolchain in configs

  * Miscellaneous upstream changes
    - Revert "UBUNTU: [Config] Enable CONFIG_UBSAN_BOUNDS"

 -- Andrea Righi <email address hidden> Fri, 03 Sep 2021 16:21:14 +0200

Changed in linux (Ubuntu Impish):
status: In Progress → Fix Released