Cryptswap periodically fails to mount at boot due to missing a udev notification

Bug #1838329 reported by Michael Aaron Murphy
This bug affects 2 people
Affects               Status         Importance   Assigned to     Milestone
systemd               New            Unknown
systemd (Ubuntu)      Fix Released   High         Unassigned
  Bionic              Confirmed      Undecided    Unassigned
  Focal               Fix Released   High         Dan Streetman
  Groovy              Fix Released   High         Unassigned

Bug Description

[impact]

Systems using cryptsetup-based encrypted swap may hang during boot because udevd misses the notification that swap has been set up on the newly created swap device.

[test case]

See the original description; reproduction is intermittent because it depends on timing.

[regression potential]

Any regression would likely occur during, or after, boot when creating an encrypted swap device and/or while waiting to activate the new swap device. Regressions may cause a failure to correctly enable swap and/or a hung boot waiting for the swap device.

[scope]

This was (potentially) fixed upstream with PR 15836, which is not yet included in any upstream release, so the fix is needed in all releases, including Groovy.

Also note that while the upstream bug is closed, and code review suggests this *should* fix this specific issue, some comments in the upstream bug indicate it may not completely solve the problem; there has been no further debugging of those new reports.

[original description]

On some systems, cryptsetup-based encrypted swap partitions cause systemd to get stuck at boot. This is a timing-sensitive Heisenbug, so the rate of occurrence varies from one system to another. Some hardware will not experience the issue at all, others will only occasionally experience the issue, and then there are the unlucky who are unable to boot at all, no matter how many times they restart.

The workaround is for the cryptsetup-generator to generate cryptswap service entries that call `udevadm trigger` after `mkswap`. This ensures that the udev event is triggered, so that systemd is notified that the encrypted swap partition is ready to activate. This patch has already been submitted upstream to systemd, but it was not accepted because it is a workaround for the side effect of systemd not seeing the udev event upon creating the swap partition.
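As a rough illustration of the workaround, the C below mimics the generator's fprintf style and emits a udevadm trigger line after the mkswap line. It is a sketch, not the attached patch: the helper name emit_swap_exec_lines, the /bin/udevadm path, and udevadm trigger accepting a device node argument are all assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the ExecStartPost= lines for an encrypted swap
 * unit. name_escaped is assumed to be already escaped for a unit file.
 * The udevadm path and argument form are assumptions, not the real patch. */
static int emit_swap_exec_lines(FILE *f, const char *name_escaped) {
        assert(f);
        assert(name_escaped);

        /* Guard added for the sketch: a quote would break the quoted argument. */
        if (strchr(name_escaped, '\''))
                return -1;

        /* Format the device as swap, as the unpatched generator does. */
        if (fprintf(f, "ExecStartPost=/sbin/mkswap '/dev/mapper/%s'\n", name_escaped) < 0)
                return -1;

        /* Workaround: ask udev to process the device again, so the event
         * that systemd waits on is emitted after mkswap has finished. */
        if (fprintf(f, "ExecStartPost=/bin/udevadm trigger '/dev/mapper/%s'\n", name_escaped) < 0)
                return -1;

        return fflush(f) == 0 ? 0 : -1;
}
```

The quote check is an extra guard for the sketch only, since a quote in the name would break the single-quoted argument.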

Michael Aaron Murphy (mmstick76) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "The workaround for this issue" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Dan Streetman (ddstreet) wrote :

> This patch has already been submitted upstream to systemd

What is the upstream systemd issue number?

Dan Streetman (ddstreet) wrote :

Looks like this still needs to be worked out upstream.

Michael Aaron Murphy (mmstick76) wrote :

However, it's unknown when the issue is going to be fixed, so for now I'm carrying the patch in Pop!_OS 18.04, 19.04, and 19.10.

Changed in systemd:
status: Unknown → New
Dan Streetman (ddstreet)
tags: added: ddstreet
Sebastien Bacher (seb128) wrote :

Dan, the workaround seems safe and fixes a real issue that some users are hitting; maybe it would make sense to include it as a distro patch?

Changed in systemd (Ubuntu):
status: New → Triaged
importance: Undecided → High
tags: added: rls-ff-incoming
Dan Streetman (ddstreet) wrote :

@mmstick76, while I agree with the grumblings upstream that udev should be better about race conditions like this, if we're working around it I'd prefer to hold a flock while running mkswap instead of retriggering udev. Can you test with that change?

The patch is below, and I have test builds for bionic and focal here:
https://launchpad.net/~ddstreet/+archive/ubuntu/lp1838329

--- a/src/cryptsetup/cryptsetup-generator.c
+++ b/src/cryptsetup/cryptsetup-generator.c
@@ -202,8 +202,8 @@ static int create_disk(

         if (swap)
                 fprintf(f,
-                        "ExecStartPost=/sbin/mkswap '/dev/mapper/%s'\n",
-                        name_escaped);
+                        "ExecStartPost=/usr/bin/flock -F '/dev/mapper/%s' /sbin/mkswap '/dev/mapper/%s'\n",
+                        name_escaped, name_escaped);

         r = fflush_and_check(f);
         if (r < 0)

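What the flock -F wrapper adds can be sketched in C. With -F (--no-fork), flock(1) takes the lock and then execs the command, so mkswap itself holds an exclusive BSD lock on the device node for as long as it runs. The sketch below reproduces that in a forked child. The function name run_with_exclusive_lock is hypothetical, and the sketch assumes, as the proposal implies, that udevd avoids probing a device that is exclusively locked.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of "flock -F LOCKFILE COMMAND...": the child opens
 * lock_path, takes LOCK_EX, then execs argv, so the command inherits the
 * locked descriptor and holds the lock until it exits.
 * Returns the command's exit status, or -1 on error. */
static int run_with_exclusive_lock(const char *lock_path, char *const argv[]) {
        int status;
        pid_t pid;

        assert(lock_path);
        assert(argv && argv[0]);

        pid = fork();
        if (pid < 0)
                return -1;

        if (pid == 0) {
                /* No O_CLOEXEC: the fd (and so the lock) must survive exec. */
                int fd = open(lock_path, O_RDONLY);
                if (fd < 0 || flock(fd, LOCK_EX) < 0)
                        _exit(127);
                execvp(argv[0], argv);
                _exit(127); /* exec failed */
        }

        /* Parent: wait for the locked command to finish. */
        if (waitpid(pid, &status, 0) < 0)
                return -1;

        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the swap case the call would be run_with_exclusive_lock("/dev/mapper/<name>", argv) with argv holding the mkswap command line. The lock descriptor is deliberately opened without O_CLOEXEC, because closing it on exec would drop the lock before the command starts.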
Changed in systemd (Ubuntu Focal):
status: New → Triaged
importance: Undecided → High
milestone: none → ubuntu-20.04.1
tags: removed: rls-ff-incoming
tags: added: id-5eb44cf735b12c4b9b721452
Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Focal):
assignee: nobody → Dan Streetman (ddstreet)
status: Triaged → In Progress
Changed in systemd (Ubuntu):
status: Triaged → Fix Released
Dan Streetman (ddstreet) wrote :

This was (possibly) fixed upstream in a similar way to comment 7:
https://github.com/systemd/systemd/pull/15836

Essentially, instead of calling mkswap inside flock, it calls systemd-makefs swap, which itself flocks the block device.
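The upstream approach changes what the generator emits rather than wrapping mkswap in flock(1). A minimal sketch, assuming systemd-makefs is installed as /lib/systemd/systemd-makefs (the real generator likely builds the path from the build configuration, so it may differ) and using a hypothetical helper name emit_makefs_swap_line:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed install location of systemd-makefs; may differ per distribution. */
#define MAKEFS_PATH "/lib/systemd/systemd-makefs"

/* Hypothetical helper: hand the mapped device to "systemd-makefs swap",
 * which takes the lock on the block device itself, instead of wrapping
 * mkswap in flock(1). name_escaped is assumed to be unit-file escaped. */
static int emit_makefs_swap_line(FILE *f, const char *name_escaped) {
        assert(f);
        assert(name_escaped);

        /* A quote in the name would break the single-quoted argument. */
        if (strchr(name_escaped, '\''))
                return -1;

        if (fprintf(f, "ExecStartPost=" MAKEFS_PATH " swap '/dev/mapper/%s'\n",
                    name_escaped) < 0)
                return -1;

        return fflush(f) == 0 ? 0 : -1;
}
```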

Dan Streetman (ddstreet)
description: updated
tags: removed: bionic ddstreet disco eoan
Changed in systemd (Ubuntu Groovy):
status: Fix Released → In Progress
status: In Progress → New
Robie Basak (racb) wrote :

@ddstreet

Does the Groovy bug task status need fixing?

I see that in your Focal upload you have a series of seven patches picked from upstream to fix this properly. But if your trivial patch in comment 7 works, wouldn't that be better for an SRU? Reference this paragraph of SRU policy:

"In line with this, the requirements for stable updates are not necessarily the same as those in the development release. When preparing future releases, one of our goals is to construct the most elegant and maintainable system possible, and this often involves fundamental improvements to the system's architecture, rearranging packages to avoid bundled copies of other software so that we only have to maintain it in one place, and so on. However, once we have completed a release, the priority is normally to minimise risk caused by changes not explicitly required to fix qualifying bugs, and this tends to be well-correlated with minimising the size of those changes. As such, the same bug may need to be fixed in different ways in stable and development releases."

How do you think this applies to this case?

Dan Streetman (ddstreet) wrote :

> Does the Groovy bug task status need fixing?

The groovy MR is open and linked in this bug; I prefer to leave that up to @rbalint for the devel release, as I'm not sure whether he wants to take the patches or merge a newer systemd later.

> But if your trivial patch in comment 7 works, wouldn't that be better for an SRU?

I'm not convinced it would work (it needs to lock the parent device if the target is a partition), and it wouldn't help with cryptsetup other than swap. Hence the proper, complete, upstream patch series is required.
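The point about partitions can be illustrated by resolving which device should be locked. The sketch below is hypothetical and is not the upstream helper. It assumes the usual sysfs layout, where a partition's directory contains a partition attribute and sits inside its parent disk's directory, and where each directory has a dev attribute holding MAJ:MIN. It does not handle stacked devices such as dm on top of a partition.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Read the first line of a small sysfs attribute into buf, without the
 * trailing newline. Returns 0 on success, -1 on error. */
static int read_attr(const char *path, char *buf, size_t size) {
        FILE *f = fopen(path, "r");
        if (!f)
                return -1;
        if (!fgets(buf, (int) size, f)) {
                fclose(f);
                return -1;
        }
        fclose(f);
        buf[strcspn(buf, "\n")] = '\0';
        return 0;
}

/* Hypothetical helper: given a block device's sysfs directory, store the
 * MAJ:MIN of the device to lock: the parent disk for a partition,
 * otherwise the device itself. */
static int whole_disk_devnum(const char *sysfs_dir, char *out, size_t size) {
        char path[4096];
        struct stat st;

        assert(sysfs_dir && out);

        /* Partitions have a "partition" attribute; whole disks do not. */
        snprintf(path, sizeof path, "%s/partition", sysfs_dir);
        if (stat(path, &st) == 0)
                snprintf(path, sizeof path, "%s/../dev", sysfs_dir);
        else
                snprintf(path, sizeof path, "%s/dev", sysfs_dir);

        return read_attr(path, out, size);
}
```

On a real system sysfs_dir might be something like /sys/dev/block/8:1. That entry is a symlink, but the kernel resolves the .. step physically, so it reaches the disk's directory.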

Robie Basak (racb) wrote : Please test proposed package

Hello Michael, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.2)

All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.2) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

apt/unknown (armhf)
indicator-session/17.3.20+19.10.20190921-0ubuntu1 (arm64)
dovecot/1:2.3.7.2-1ubuntu3.1 (armhf)
postgresql-12/unknown (armhf)
mir/unknown (armhf)
systemd/245.4-4ubuntu3.2 (amd64)
umockdev/unknown (armhf)
policykit-1/unknown (armhf)
asterisk/unknown (armhf)
anbox/unknown (armhf)
php7.4/unknown (armhf)
ksystemlog/unknown (armhf)
polkit-qt-1/unknown (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.6-3ubuntu3

---------------
systemd (245.6-3ubuntu3) groovy; urgency=medium

  * Rebuild against libselinux 3.0

systemd (245.6-3ubuntu2) groovy; urgency=medium

  * basic/cap-list: Print unknown capabilities in hexadecimal.
    This fixes autopkgtest running on 5.8 kernels
    (when systemd was built on an earlier one) (LP: #1885755)
    File: debian/patches/basic-cap-list-parse-print-numerical-capabilities.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=ef46ec8289df815d42c9a3fdbf9fb347226d6be4

systemd (245.6-3ubuntu1) groovy; urgency=medium

  * Merge to Ubuntu from Debian unstable
    - Dropped changes:
      * Enable EFI/bootctl on armhf.

systemd (245.6-3) unstable; urgency=medium

  [ Dan Streetman ]
  * d/t/upstream: capture new merged 'system.journal' from tests.
    https://github.com/systemd/systemd/pull/15281
  * d/t/upstream: use --directory or --file param for journalctl.
    Properly tell journalctl if the journal to parse is a dir or file.
  * d/t/storage: check for ext2 or ext4 fs when using crypttab 'tmp' option.
    https://github.com/systemd/systemd/pull/15853

  [ Martin Pitt ]
  * debian/tests/localed-locale: Fix for environments without en_US.UTF-8.
    Unconditionally back up/restore locale configuration files and generate
    en_US.UTF-8. Previously the test failed in environments which have some
    locale other than en_US.UTF-8 in /etc/default/locale.
    Also fix the assertion of /etc/locale.conf not being present after
    localectl. This only applies to Debian/Ubuntu tests, not upstream ones.

  [ Dimitri John Ledkov ]
  * Enable EFI/bootctl on armhf.

systemd (245.6-2ubuntu2) groovy; urgency=medium

  [ Balint Reczey ]
  * debian/tests/tests-in-lxd: Work around snapd.seeded.service hanging
    File: debian/tests/tests-in-lxd
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=04a3342ff533b234ccb1a1020f6d854ab0acd053

  [ Dimitri John Ledkov ]
  * ubuntu: enable CET on amd64.
    File: debian/rules
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=cc42a377e7e8c372124bcf43d9f4fb9c169f4292

  [ Dan Streetman ]
  * Lock swap blockdevice while calling mkswap (LP: #1838329)
    Files:
    - debian/patches/lp1838329/0001-blockdev-propagate-one-more-unexpected-error.patch
    - debian/patches/lp1838329/0003-dissect-use-log_debug_errno-where-appropriate.patch
    - debian/patches/lp1838329/0004-blockdev-add-helper-for-locking-whole-block-device.patch
    - debian/patches/lp1838329/0005-makefs-lock-device-while-we-operate.patch
    - debian/patches/lp1838329/0006-makefs-normalize-logging-a-bit.patch
    - debian/patches/lp1838329/0007-cryptsetup-generator-use-systemd-makefs-for-implemen.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=652a148cf1d3ecfa93cfee288c152c90caee3ac6

systemd (245.6-2ubuntu1) groovy; urgency=medium

  * Merge to Ubuntu from Debian unstable
    - Dropped changes:
      * dhclient-exit-hooks.d/timesyncd: Act only when systemd-timesyncd is enabled
  * hwdb: Mask rfkill event from intel-hid on HP platforms (LP: #1883846...


Changed in systemd (Ubuntu Groovy):
status: New → Fix Released
Dan Streetman (ddstreet) wrote :

I wasn't able to reproduce this myself, because the failure depends on timing, but I set up the reproducer from the upstream bug and rebooted several times with the proposed package, and had no problems or regressions. Marking this verified.

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.4-4ubuntu3.2

---------------
systemd (245.4-4ubuntu3.2) focal; urgency=medium

   [ Dan Streetman ]
   * Hotadd only offline memory and CPUs (LP: #1876018)
     File: debian/extra/rules-ubuntu/40-vm-hotadd.rules
     https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=72d815471596056b7727be5b10f87513ff1d5757
   * Lock swap blockdevice while calling mkswap (LP: #1838329)
     Files:
     - d/p/lp1838329/0001-blockdev-propagate-one-more-unexpected-error.patch
     - d/p/lp1838329/0002-makefs-log-about-OOM-condition.patch
     - d/p/lp1838329/0003-dissect-use-log_debug_errno-where-appropriate.patch
     - d/p/lp1838329/0004-blockdev-add-helper-for-locking-whole-block-device.patch
     - d/p/lp1838329/0005-makefs-lock-device-while-we-operate.patch
     - d/p/lp1838329/0006-makefs-normalize-logging-a-bit.patch
     - d/p/lp1838329/0007-cryptsetup-generator-use-systemd-makefs-for-implemen.patch
     https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=c81b75c4297cbb04554488b070b6f79996b8cceb

   [ Balint Reczey ]
   * debian/udev.postinst: Allow kvm to be an already present non-system group
     (LP: #1880541)
     File: debian/udev.postinst
     https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=8b5c31828d4323ddb719326b1316c179b7cdbdef
   * d/p/hwdb-Mask-rfkill-event-from-intel-hid-on-HP-platforms.patch:
     hwdb: Mask rfkill event from intel-hid on HP platforms
     (LP: #1883846)
     https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=164c016b466210c7d6d05963fd753eccf4679844
   * journald: stream pid change newline fix (LP: #1875708)
     Files:
     - debian/patches/lp1875708/journald-Increase-stdout-buffer-size-sooner-when-almost-f.patch
     - debian/patches/lp1875708/journald-rework-end-of-line-marker-handling-to-use-a-fiel.patch
     - debian/patches/lp1875708/journald-rework-pid-change-handling.patch
     - debian/patches/lp1875708/journald-use-log_warning_errno-where-appropriate.patch
     - debian/patches/lp1875708/journald-use-the-fact-that-client_context_release-returns.patch
     - debian/patches/lp1875708/man-document-the-new-_LINE_BREAK-type.patch
     - debian/patches/lp1875708/socket-util-introduce-type-safe-dereferencing-wrapper-CMS.patch
     - debian/patches/lp1875708/test-Add-a-test-case-for-15654.patch
     https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=2dc19f7ae4aad7277e9d89849182453ff1d046dc

 -- Dan Streetman <email address hidden> Mon, 06 Jul 2020 17:38:31 -0400

Changed in systemd (Ubuntu Focal):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu Bionic):
status: New → Confirmed