5.4.0-11 crash on cryptsetup open

Bug #1860231 reported by Claudio Matsuoka
This bug affects 1 person
Affects           Status        Importance  Assigned to   Milestone
linux (Ubuntu)    Fix Released  High        Stefan Bader
  Xenial          Fix Released  High        Unassigned
  Bionic          Fix Released  High        Unassigned
  Disco           Won't Fix     High        Unassigned
  Eoan            Fix Released  High        Unassigned

Bug Description

[Impact]

Running cryptsetup open on a newly created LUKS partition on Ubuntu Core 20 crashes the kernel. This happens on every attempt in the snapd Core 20 installation test. On an image created specifically to reproduce this bug, however, it happens only when certain parameters are passed to cryptsetup. Both images are built the same way, so the reason for this discrepancy is unknown. The kernel was installed from pc-kernel_374.snap.

[Test Case]

$ dir=$(mktemp -d /tmp/lp1860231.XXXXX)
$ dmsetup create lp1860231 --notable
$ mount -t ext4 \
  "/dev/dm-$(dmsetup info -c -o minor --noheadings lp1860231)" "$dir"

Now check the logs for a backtrace.
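The last step can be scripted by filtering the kernel log. The helper below is a hypothetical sketch and its name is illustrative. The match string assumes the oops uses the usual x86 "BUG: kernel NULL pointer dereference" header; it is not quoted from this bug's backtrace. If the machine does not survive the oops, reading the previous boot's log with journalctl (this needs a persistent journal) can be more practical than dmesg.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative). It reads kernel log text on
# stdin and prints every NULL pointer dereference oops header together with
# the 30 lines that follow it. Those lines normally include the call trace.
# Assumption: the usual x86 oops header is used; this string is not copied
# from this bug's trace. Exit status is 0 only if a match was found.
lp1860231_show_oops() {
    grep -A 30 'BUG: kernel NULL pointer dereference'
}

# Example usage (reading the log may require root):
#   dmesg | lp1860231_show_oops
#   journalctl -k -b -1 | lp1860231_show_oops   # previous boot
```

The value 30 passed to `-A` is arbitrary. Raise it if the trace gets cut off.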

[Regression Potential]

The currently proposed fix is not expected to cause stability regressions. It may cause a very small performance regression, because an additional pointer comparison is performed on each block-layer request, but this is unlikely to be noticeable.

[Original Report]

Linux version 5.4.0-11-generic (buildd@lgw01-amd64-021) (gcc version 9.2.1 20200104 (Ubuntu 9.2.1-22ubuntu2)) #14-Ubuntu SMP Thu Jan 9 16:14:26 UTC 2020

Version signature: Ubuntu 5.4.0-11.14-generic 5.4.8

How to reproduce the crash in 3 "easy" steps:

1. Build a Core 20 image using the attached model file:
   1.1. Install ubuntu-image from latest/edge
        $ sudo snap install --channel latest/edge ubuntu-image
   1.2. Build the image
        $ sudo ubuntu-image --image-size=4G ubuntu-core-20-amd64.model

2. Boot the image in kvm
   2.1. Install ovmf version 0~20190606.20d2e5a1-2ubuntu1 or newer (the
        stock ovmf from bionic may not work)
   2.2. Boot the image
        $ sudo kvm -snapshot -m 2048 -smp 4 \
          -netdev user,id=mynet0,hostfwd=tcp::8022-:22,hostfwd=tcp::8090-:80 \
          -device virtio-net-pci,netdev=mynet0 \
          -drive file=pc.img,if=virtio \
          -bios /usr/share/OVMF/OVMF_CODE.ms.fd
   2.3. In the grub menu, edit the default option to include parameter
        "systemd.debug-shell=1" in the kernel command line
   2.4. Boot the kernel

3. Crash the kernel
   3.1. When the system boots to the "Press enter to configure"
        message, press ALT-F9 to enter the debug shell.
   3.2. The system should have two partitions in /dev/vda. Create a
        third one with fdisk.
   3.3. Create a LUKS encrypted partition:
        # echo 123|cryptsetup luksFormat -q --type luks2 --key-file - --pbkdf argon2i --iter-time 1 /dev/vda3
        (the system will complain about a missing locking directory,
        just ignore it.)
   3.4. Open the encrypted device:
        # echo 123|cryptsetup open --key-file - /dev/vda3 name

        The Core 20 images contain the following udev rule, which causes
        the new block device to be mounted automatically. This mount is
        what triggers the BUG:
        ACTION=="add", SUBSYSTEM=="block", KERNEL!="loop*", KERNEL!="ram*" \
        RUN+="/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/%k"
   3.5. Read the crash message

The attached screenshots show these steps being executed.

A few notes:

- The backtrace looks very similar to the one reported in bug #1835279. That problem, however, was possibly caused by a race between partition creation and LUKS formatting. That doesn't seem to be the case here, because adding delays between commands doesn't help.
- In the test case above, large KDF --iter-time values may prevent the crash. I successfully opened the device on kernel 5.4.0-9 with --iter-time larger than 100, but 5.4.0-11 seems to require values closer to 1000. Regardless of the --iter-time value used, the crash always happens when the test runs in the spread-driven automated environment (same kernel, image built the same way), so some other variable seems to be disturbing the system.
- All necessary modules are loaded before the LUKS partition creation (i.e. it doesn't seem to be caused by a race between dm-crypt loading and cryptsetup luksFormat for example).


Claudio Matsuoka (cmatsuoka) wrote :

Snap versions used to build the image:

pc-kernel_374.snap
pc_83.snap
snapd_6113.snap
core20_322.snap

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1860231

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Claudio Matsuoka (cmatsuoka) wrote :

This model file is used to build both the spread test image and the manually built image. The two generated images, however, seem to behave differently under the conditions that lead to the crash. The crash always happens in the spread test. In the image created with the steps in the bug description, higher KDF iteration time values allow the encrypted device to be opened correctly.

Michael Vogt also reports that the crash doesn't happen in a classic system running the same kernel version.

Andrea Righi (arighi)
Changed in linux (Ubuntu):
assignee: nobody → Andrea Righi (arighi)
Tyler Hicks (tyhicks)
description: updated
Andrea Righi (arighi) wrote :

After a first look at the kernel bug trace, it seems that the q->make_request_fn function pointer is NULL when it is called as q->make_request_fn(q, bio) at block/blk-core.c:1064.

The reason might be a race with a block device not yet properly initialized when some I/O requests were submitted (or a block device de-registered too early while some I/O was still in progress), but, considering it was triggered upon a mount, I would say the former scenario is more likely to be the case.

I haven't noticed any potentially related fix in DM or in the core block layer. I'll keep investigating.

Michael Vogt (mvo) wrote :

I was able to reproduce this with the attached snapd, which is essentially PR#7999 plus the following patch:
"""
diff --git a/cmd/snap/cmd_auto_import.go b/cmd/snap/cmd_auto_import.go
index 7408371e11..f6b8f1d0d0 100644
--- a/cmd/snap/cmd_auto_import.go
+++ b/cmd/snap/cmd_auto_import.go
@@ -38,7 +38,6 @@ import (
        "github.com/snapcore/snapd/i18n"
        "github.com/snapcore/snapd/logger"
        "github.com/snapcore/snapd/osutil"
-       "github.com/snapcore/snapd/release"
 )

 const autoImportsName = "auto-import.assert"
@@ -264,11 +263,6 @@ func (x *cmdAutoImport) Execute(args []string) error {
                return ErrExtraArgs
        }

-       if release.OnClassic && !x.ForceClassic {
-               fmt.Fprintf(Stderr, "auto-import is disabled on classic\n")
-               return nil
-       }
-
        for _, path := range x.Mount {
                // udev adds new /dev/loopX devices on the fly when a
                // loop mount happens and there is no loop device left.
"""

and then running the following spread test as shell code:
"""
    echo "Setup the image as a block device"
    # without -P this test will not work, because /dev/loop1p? will be missing
    losetup -fP fake.img
    losetup -a |grep fake.img|cut -f1 -d: > loop.txt
    LOOP="$(cat loop.txt)"

    echo "Create an empty partition header"
    echo "label: gpt" | sfdisk "$LOOP"

    echo "Get the UC20 gadget"
    snap download --channel=20/edge pc

    unsquashfs -d gadget-dir pc_*.snap
    LOOP="$(cat loop.txt)"
    echo "Run the snap-bootstrap tool"
    /usr/lib/snapd/snap-bootstrap create-partitions --encrypt --mount --key-file keyfile ./gadget-dir "$LOOP"
"""

Michael Vogt (mvo) wrote :

I reproduced it successfully with the following spread command line, using PR#7999 plus the patch in the previous comment:

$ spread -debug qemu:ubuntu-20.04-64:tests/main/uc20-snap-recovery-encrypt

Stefan Bader (smb) wrote :

With the additional data, this is basically a bug in the mount syscall, in generic_make_request_checks(), or in dm.c. Device-mapper devices are set up in two stages: the initial device creation and the table load. Somewhere around v4.1, things were changed to defer setting the make-request function of the device's queue until the mapping table is loaded.

One can create such an intermediate setup using "dmsetup create -n <name>". A subsequent "mount /dev/dm-?" then triggers the bug. This is because generic_make_request_checks() checks for device->queue == NULL but not for device->queue->make_request_fn == NULL.

Interestingly, neither blkid nor dd triggers this, likely because they first check the device size, which is still 0 at that point. Only mount seems to go ahead and try to read the superblock regardless.
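This observation points to a simple guard. A tool that mounts new block devices automatically, such as the auto-import udev rule quoted in the description, could skip any device whose size is still 0. The sketch below is a hypothetical workaround, not the upstream fix. The function name and the SYSFS_ROOT override are illustrative. It is also not race-free, because a device's state can change between the check and the mount.

```shell
#!/bin/sh
# Hypothetical pre-mount guard (name and SYSFS_ROOT override are
# illustrative). It succeeds only if the block device with kernel name $1
# (for example "dm-0") reports a non-zero size in sysfs. A dm device
# created with "dmsetup create -n" has no table yet and a size of 0.
# SYSFS_ROOT defaults to /sys and exists only to make the function testable.
blockdev_has_size() {
    size_file="${SYSFS_ROOT:-/sys}/class/block/$1/size"
    [ -r "$size_file" ] || return 1
    # The sysfs "size" attribute is the device size in 512-byte sectors.
    read -r sectors < "$size_file" || return 1
    case $sectors in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric content
    esac
    [ "$sectors" -gt 0 ]
}

# Example: only mount a dm device once its table has been loaded.
#   blockdev_has_size dm-0 && mount /dev/dm-0 /mnt
```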

Stefan Bader (smb)
Changed in linux (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → High
Tyler Hicks (tyhicks)
description: updated
description: updated
Tyler Hicks (tyhicks) wrote :
Changed in linux (Ubuntu):
assignee: Andrea Righi (arighi) → Stéphane Graber (stgraber)
assignee: Stéphane Graber (stgraber) → Stefan Bader (smb)
Tyler Hicks (tyhicks) wrote :

Upstream submission:

  https://<email address hidden>/T/#t

tags: added: verification-needed-focal
Stefan Bader (smb) wrote :

Upstream fixed this in device-mapper with:

Author: Mike Snitzer <email address hidden>
  dm: fix potential for q->make_request_fn NULL pointer

This is to be included in:

Xenial: Ubuntu-4.4.0-177.207 (committed)
Bionic: Ubuntu-4.15.0-92.93 (committed, not prepared yet)
Eoan: Ubuntu-5.3.0-43.35 (committed)
Focal: Ubuntu-5.4.0-15.18 (released, revert of SAUCE committed)
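To check whether a given kernel already carries the fix, the per-series versions can be turned into a quick test. This is a hypothetical sketch with these assumptions:

- The thresholds are the first versions this report lists as released with the fix. Xenial 4.4.0-177.207, Bionic 4.15.0-96.97 and Eoan 5.3.0-46.38 come from the Launchpad Janitor messages further down. Focal 5.4.0-15.18 comes from the list above.
- Every later upload in the same series is assumed to keep the fix.
- Uploads between the "committed" and "released" versions are conservatively reported as affected.
- Derivative kernels, which typically use ABI numbers of 1000 and above, are reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper: classify an Ubuntu kernel release string such as
# "4.15.0-96-generic" (the format printed by "uname -r") as "fixed",
# "affected" or "unknown" with respect to this bug.
# Assumption: every upload at or above the first released fixed ABI in a
# series contains the fix.
lp1860231_status() {
    base=${1%%-*}        # upstream version, e.g. 4.15.0
    rest=${1#*-}         # e.g. 96-generic
    abi=${rest%%-*}      # ABI number, e.g. 96
    case $abi in
        ''|*[!0-9]*) echo "unknown"; return 2 ;;
    esac
    # Derivative kernels typically use 1000+ ABIs; they are not handled here.
    if [ "$abi" -ge 1000 ]; then echo "unknown"; return 2; fi
    case $base in
        4.4.0)  min=177 ;;   # Xenial: 4.4.0-177.207
        4.15.0) min=96  ;;   # Bionic: 4.15.0-96.97
        5.3.0)  min=46  ;;   # Eoan:   5.3.0-46.38
        5.4.0)  min=15  ;;   # Focal:  5.4.0-15.18
        *) echo "unknown"; return 2 ;;
    esac
    if [ "$abi" -ge "$min" ]; then echo "fixed"; else echo "affected"; fi
}

# Example: lp1860231_status "$(uname -r)"
```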

Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Changed in linux (Ubuntu Bionic):
status: New → Fix Committed
Changed in linux (Ubuntu Disco):
status: New → Fix Committed
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Changed in linux (Ubuntu Disco):
importance: Undecided → High
Changed in linux (Ubuntu Eoan):
status: New → Fix Committed
Changed in linux (Ubuntu):
status: Triaged → Fix Released
Changed in linux (Ubuntu Eoan):
importance: Undecided → High
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Khaled El Mously (kmously) wrote :

@Claudio: Does this bug need to be verified? Would you be able to verify this bug on any of the -proposed kernels?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-46.38

---------------
linux (5.3.0-46.38) eoan; urgency=medium

  * eoan/linux: 5.3.0-43.36 -proposed tracker (LP: #1867301)

  * Fix AMD Stoney Ridge screen flickering under 4K resolution (LP: #1864005)
    - iommu/amd: Disable IOMMU on Stoney Ridge systems

  * Allow BPF tracing under lockdown (LP: #1868626)
    - Revert "UBUNTU: SAUCE: (efi-lockdown) Lock down kprobes"
    - Revert "bpf: Restrict bpf when kernel lockdown is in confidentiality mode"

  * Missing wireless network interface after kernel 5.3.0-43 upgrade with eoan
    (LP: #1868442)
    - iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts

  * iSCSI-target: Deleting a LUN hangs in the kernel (LP: #1862682)
    - scsi: Revert "target/core: Inline transport_lun_remove_cmd()"

  * Stop using get_scalar_status command in Dell AIO uart backlight driver
    (LP: #1865402)
    - SAUCE: platform/x86: dell-uart-backlight: add get_display_mode command

  * Eoan update: upstream stable patchset 2020-03-11 (LP: #1867051)
    - Revert "drm/sun4i: dsi: Change the start delay calculation"
    - ovl: fix lseek overflow on 32bit
    - kernel/module: Fix memleak in module_add_modinfo_attrs()
    - media: iguanair: fix endpoint sanity check
    - ocfs2: fix oops when writing cloned file
    - x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
    - udf: Allow writing to 'Rewritable' partitions
    - printk: fix exclusive_console replaying
    - iwlwifi: mvm: fix NVM check for 3168 devices
    - sparc32: fix struct ipc64_perm type definition
    - cls_rsvp: fix rsvp_policy
    - gtp: use __GFP_NOWARN to avoid memalloc warning
    - l2tp: Allow duplicate session creation with UDP
    - net: hsr: fix possible NULL deref in hsr_handle_frame()
    - net_sched: fix an OOB access in cls_tcindex
    - net: stmmac: Delete txtimer in suspend()
    - bnxt_en: Fix TC queue mapping.
    - tcp: clear tp->total_retrans in tcp_disconnect()
    - tcp: clear tp->delivered in tcp_disconnect()
    - tcp: clear tp->data_segs{in|out} in tcp_disconnect()
    - tcp: clear tp->segs_{in|out} in tcp_disconnect()
    - rxrpc: Fix use-after-free in rxrpc_put_local()
    - rxrpc: Fix insufficient receive notification generation
    - rxrpc: Fix missing active use pinning of rxrpc_local object
    - rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect
    - media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
    - mfd: dln2: More sanity checking for endpoints
    - ipc/msg.c: consolidate all xxxctl_down() functions
    - tracing: Fix sched switch start/stop refcount racy updates
    - rcu: Avoid data-race in rcu_gp_fqs_check_wake()
    - brcmfmac: Fix memory leak in brcmf_usbdev_qinit
    - usb: typec: tcpci: mask event interrupts when remove driver
    - usb: gadget: legacy: set max_speed to super-speed
    - usb: gadget: f_ncm: Use atomic_t to track in-flight request
    - usb: gadget: f_ecm: Use atomic_t to track in-flight request
    - ALSA: usb-audio: Fix endianess in descriptor validatio...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-177.207

---------------
linux (4.4.0-177.207) xenial; urgency=medium

  * xenial/linux: 4.4.0-177.207 -proposed tracker (LP: #1867243)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts

  * Xenial update: 4.4.214 upstream stable release (LP: #1864775)
    - media: iguanair: fix endpoint sanity check
    - x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
    - sparc32: fix struct ipc64_perm type definition
    - ASoC: qcom: Fix of-node refcount unbalance to link->codec_of_node
    - cls_rsvp: fix rsvp_policy
    - net: hsr: fix possible NULL deref in hsr_handle_frame()
    - net_sched: fix an OOB access in cls_tcindex
    - tcp: clear tp->total_retrans in tcp_disconnect()
    - tcp: clear tp->segs_{in|out} in tcp_disconnect()
    - media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
    - mfd: dln2: More sanity checking for endpoints
    - brcmfmac: Fix memory leak in brcmf_usbdev_qinit
    - usb: gadget: legacy: set max_speed to super-speed
    - usb: gadget: f_ncm: Use atomic_t to track in-flight request
    - usb: gadget: f_ecm: Use atomic_t to track in-flight request
    - ALSA: dummy: Fix PCM format loop in proc output
    - lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()
    - powerpc/pseries: Advance pfn if section is not present in lmb_is_removable()
    - mmc: spi: Toggle SPI polarity, do not hardcode it
    - PCI: keystone: Fix link training retries initiation
    - crypto: api - Check spawn->alg under lock in crypto_drop_spawn
    - scsi: qla2xxx: Fix mtcp dump collection failure
    - power: supply: ltc2941-battery-gauge: fix use-after-free
    - of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
    - dm space map common: fix to ensure new block isn't already in use
    - crypto: pcrypt - Do not clear MAY_SLEEP flag in original request
    - crypto: api - Fix race condition in crypto_spawn_alg
    - crypto: picoxcell - adjust the position of tasklet_init and fix missed
      tasklet_kill
    - btrfs: set trans->drity in btrfs_commit_transaction
    - ARM: tegra: Enable PLLP bypass during Tegra124 LP1
    - mwifiex: fix unbalanced locking in mwifiex_process_country_ie()
    - sunrpc: expiry_time should be seconds not timeval
    - KVM: x86: Refactor prefix decoding to prevent Spectre-v1/L1TF attacks
    - KVM: x86: Protect DR-based index computations from Spectre-v1/L1TF attacks
    - KVM: x86: Protect kvm_hv_msr_[get|set]_crash_data() from Spectre-v1/L1TF
      attacks
    - KVM: x86: Protect ioapic_write_indirect() from Spectre-v1/L1TF attacks
    - KVM: x86: Protect MSR-based index computations in pmu.h from Spectre-v1/L1TF
      attacks
    - KVM: x86: Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks
    - KVM: x86: Protect MSR-based index computations from Spectre-v1/L1TF attacks
      in x86.c
    - KVM: x86: Protect x86_decode_insn from Spectre-v1/L1TF attacks
    - KVM: x86: Protect MSR-based index computations in fixed_msr_to_seg_unit()
      from Spectre-v1/L1TF attacks
    - KVM: PPC: Book3S HV: Uninit vCPU if vcore creation fails
    - KVM:...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-96.97

---------------
linux (4.15.0-96.97) bionic; urgency=medium

  * CVE-2020-8834
    - KVM: PPC: Book3S HV: Factor fake-suspend handling out of
      kvmppc_save/restore_tm
    - KVM: PPC: Book3S PR: Move kvmppc_save_tm/kvmppc_restore_tm to separate file
    - KVM: PPC: Book3S PR: Add guest MSR parameter for
      kvmppc_save_tm()/kvmppc_restore_tm()

linux (4.15.0-94.95) bionic; urgency=medium

  * bionic/linux: 4.15.0-94.95 -proposed tracker (LP: #1868984)

  * Missing wireless network interface after kernel 5.3.0-43 upgrade with eoan
    (LP: #1868442)
    - iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices

linux (4.15.0-93.94) bionic; urgency=medium

  * bionic/linux: 4.15.0-93.94 -proposed tracker (LP: #1868764)

  * quotactl04 from ubuntu_ltp_syscalls failed with B (LP: #1868665)
    - ext4: fix mount failure with quota configured as module

linux (4.15.0-92.93) bionic; urgency=medium

  * bionic/linux: 4.15.0-92.93 -proposed tracker (LP: #1867272)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts

  * Introduce the new NVIDIA 440 series, and add 5.4 Linux compatibility to the
    340 and 390 series (LP: #1854485)
    - [Packaging] NVIDIA -- add support for the 435 and the 440 series

  * Stop using get_scalar_status command in Dell AIO uart backlight driver
    (LP: #1865402)
    - SAUCE: platform/x86: dell-uart-backlight: add get_display_mode command

  * Bionic update: upstream stable patchset 2020-03-12 (LP: #1867194)
    - RDMA/core: Fix locking in ib_uverbs_event_read
    - gpio: zynq: Report gpio direction at boot
    - arm64: ptrace: nofpsimd: Fail FP/SIMD regset operations
    - KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests
    - KVM: arm: Make inject_abt32() inject an external abort instead
    - mtd: onenand_base: Adjust indentation in onenand_read_ops_nolock
    - mtd: sharpslpart: Fix unsigned comparison to zero
    - padata: fix null pointer deref of pd->pinst
    - Input: synaptics - switch T470s to RMI4 by default
    - Input: synaptics - enable SMBus on ThinkPad L470
    - Input: synaptics - remove the LEN0049 dmi id from topbuttonpad list
    - ALSA: hda/realtek - Fix silent output on MSI-GL73
    - ALSA: usb-audio: Apply sample rate quirk for Audioengine D1
    - arm64: cpufeature: Set the FP/SIMD compat HWCAP bits properly
    - ALSA: usb-audio: sound: usb: usb true/false for bool return type
    - ext4: don't assume that mmp_nodename/bdevname have NUL
    - ext4: fix support for inode sizes > 1024 bytes
    - ext4: fix checksum errors with indexed dirs
    - ext4: add cond_resched() to ext4_protect_reserved_inode
    - ext4: improve explanation of a mount failure caused by a misconfigured
      kernel
    - Btrfs: fix race between using extent maps and merging them
    - btrfs: ref-verify: fix memory leaks
    - btrfs: print message when tree-log replay starts
    - btrfs: log message when rw remount is attempted with unclean tree-log
    - arm64: ssbs: Fix context-switch when SSBS is present on all CPUs
    - perf/x86/amd: Add missing L2 misses event spec to AMD Family 17h's ev...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Steve Langasek (vorlon)
Changed in linux (Ubuntu Disco):
status: Fix Committed → Won't Fix