fully support package installation in systemd

Bug #1576692 reported by Scott Moser
This bug affects 10 people
Affects                        Status         Importance   Assigned to   Milestone
cloud-init                     Fix Released   Critical     Unassigned
cloud-init (Ubuntu)            Fix Released   Critical     Unassigned
  Xenial                       Fix Released   High         Unassigned
init-system-helpers (Ubuntu)   Fix Released   High         Martin Pitt
  Xenial                       Fix Released   High         Unassigned

Bug Description

In cloud-init, users can install packages via cloud-config:
#cloud-config
packages: [apache2]

Due to some intricacies of systemd and service installation, that doesn't work all that well.
Under bug 1575572 we fixed the issue for simple services that do not have any dependencies on other services, or that at least don't check those dependencies strictly.

We'd like to have a way to fully support this in cloud-init.

Related bugs:
 * bug 1575572: apache2 fails to start if installed via cloud config (on Xenial)
 * bug 1611973: postgresql@9.5-main service not started if postgres installed via cloud-init
 * bug 1621336: snapd.boot-ok.service hangs eternally on cloud image upgrades (snapd packaging bug, but this cloud-init fix will work around it)
 * bug 1620780: dev-sda2.device job running and times out
 * bug 1623570: Azure: cannot start walinux agent (Transaction order is cyclic.)
 * bug 1623868: cloud-final.service does not run due to dependency cycle
 * bug 1627436: [gce] Startup scripts do not run on 1604 images
 * bug 1624596: [azure] ephemeral-disk-warning.service causes ordering cycle on multi-user.target

SRU INFORMATION
===============
FIX for init-system-helpers: https://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=1460d6a02

REGRESSION POTENTIAL for init-system-helpers: This changes invoke-rc.d and service, two very central pieces of packaging infrastructure. Errors in it will break installation/upgrades of packages or /etc/network/if-up.d/ hooks and the like. This changes the condition when systemd units get started without their dependencies, and the condition gets weakened. This means that behaviour in a booted system is unchanged, but during boot this could change the behaviour of if-up.d/ hooks (although they have never been defined well during boot anyway). However, I tested this change extensively in cloud images and desktop installations (particularly I recreated https://bugs.debian.org/777113 and confirmed that this approach also fixes it) and could not find any regression.

TEST CASE (for both packages):
Run
   lxc launch ubuntu-daily:x --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1

This will install all three packages, but "systemctl status postgresql@9.5-main" will show that the service is not running.

Now prepare a new image with the proposed cloud-init and init-system-helpers:

   lxc launch ubuntu-daily:x xprep
   lxc exec xprep bash
   # enable -proposed and dist-upgrade, then poweroff
   lxc publish xprep x-proposed

Now run the initial lxc launch again, but against that new x-proposed image instead of the standard daily:

  lxc launch x-proposed --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1

You should now have "systemctl status postgresql@9.5-main" running. Directly after rebooting the instance, check that there are no hanging jobs (systemctl list-jobs), particularly networking.service, to ensure that https://bugs.debian.org/777113 did not come back.
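
The check for hanging jobs can be scripted. A minimal sketch, assuming the job rows printed by "systemctl list-jobs" have four whitespace-separated fields (JOB, UNIT, TYPE, STATE); pending_job_units is a hypothetical helper name, not part of systemd or cloud-init:

```shell
# pending_job_units: hypothetical helper (not part of systemd or cloud-init).
# Reads "systemctl list-jobs" output on stdin and prints the unit name of
# each queued job, one per line. Job rows are assumed to have exactly four
# fields with a numeric job ID first; the header line and the trailing
# summary line do not match that shape and are skipped.
pending_job_units() {
    awk 'NF == 4 && $1 ~ /^[0-9]+$/ { print $2 }'
}
```

For example, "systemctl list-jobs | pending_job_units | grep -x networking.service" would succeed only if a networking.service job is still queued after the reboot.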

Also test interactively installing a package that ships a service, like "apache2", and verify that it starts properly after installation.

Verify that journalctl shows no dependency cycles and that all cloud init services and the target are active:

$ systemctl list-units --no-legend --all 'cloud*'
cloud-config.service loaded active exited Apply the settings specified in cloud-config
cloud-final.service loaded active exited Execute cloud user/final scripts
cloud-init-local.service loaded active exited Initial cloud-init job (pre-networking)
cloud-init.service loaded active exited Initial cloud-init job (metadata service crawler)
cloud-config.target loaded active active Cloud-config availability
cloud-init.target loaded active active Cloud-init target
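
The same verification can be done mechanically. A minimal sketch, assuming the column order shown above (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION); inactive_units is a hypothetical helper name:

```shell
# inactive_units: hypothetical helper (not part of systemd or cloud-init).
# Reads "systemctl list-units --no-legend --plain --all" output on stdin and
# prints every unit whose ACTIVE column (third field) is not "active".
# Rows with fewer than four fields (for example blank lines) are ignored.
inactive_units() {
    awk 'NF >= 4 && $3 != "active" { print $1 }'
}
```

Running "systemctl list-units --no-legend --plain --all 'cloud*' | inactive_units" should then print nothing on a healthy instance. The --plain option is added here so that status bullets in front of failed units do not shift the columns.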


Scott Moser (smoser)
Changed in cloud-init (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Martin Pitt (pitti) wrote :

Would it be possible to move package installation into a cloud-init-*.service that is "Type=idle", i.e. runs after booting is complete? This would avoid all these corner cases and breakage when trying to install/start things while booting is not complete yet.

description: updated
Revision history for this message
Neil Wilson (neil-aldur) wrote :

Martin,

Probably worth noting that this impacts upon the configuration systems as well. I'm using the PostgreSQL puppet configuration system, and that will sit in a loop waiting for PostgreSQL to come up before moving on to the next stage of the configuration.

So if you are using puppet within cloud-init, and cloud events delay the start event until the boot is complete, then the configurator that expects things to happen in sequence will break.

It looks to me that large chunks of cloud-init need to be moved so it runs after 'multi-user.target' has been reached, not just package installation.

Revision history for this message
Scott Moser (smoser) wrote :

An update to this, I think for the moment the plan is to move many of the config modules that run in 'config_modules' to 'final_modules' and to move final_modules to run as idle.

I don't love it, but it seems like the only actual path to getting package installation to work.

Jon Grimm (jgrimm)
Changed in cloud-init (Ubuntu):
importance: Medium → Critical
Changed in cloud-init:
importance: Medium → Critical
Dave Chiluk (chiluk)
tags: added: sts
Scott Moser (smoser)
description: updated
Revision history for this message
Ryan Harper (raharper) wrote :

It's worth mentioning the scope of the package upgrade/install issue related to systemd.

For packages like apache2 which do not use dependent systemd service files, those service packages install and start properly.

For packages with dependent service files, like postgresql (it has both a postgresql.service (a dummy) and a systemd generator service which creates a postgresql@<version>-<dbname> service), the services currently do not start automatically because they are installed during the execution of cloud-init's cloud-config.service unit.

W.r.t package upgrades, the issue is scoped to service packages in the image which have an update that's not in the current image and also require an update to systemd services.

Revision history for this message
Scott Moser (smoser) wrote :

I have a branch at https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bug/1576692 which:
a.) moves cloud-init-final to be Type=idle
b.) moves config modules such as package-update-upgrade-install to run in the final_modules

I've patched a lxc container with that, and then launched several instances. My experience is that out of 5 containers 2 or 3 of them will have a running postgres at the end (per systemctl status postgresql@9.5-main).

The user-data I'm providing is just:
 #cloud-config
 packages: [postgresql]
 runcmd:
   - "systemctl status postgresql@9.5-main > /run/my-status || true"

Then you can just look at /run/my-status.

To start a patched image what you can do is:
 n=y1
 lxc init ubuntu-daily:yakkety $n "--config=user.user-data=$(cat my.user-data)"
 lxc-chroot $n -- sh -ec 'd=/tmp/my.deb; trap "rm -f $d" EXIT; cat > $d && dpkg -i $d' < "$deb"
 lxc start $n

Revision history for this message
Martin Pitt (pitti) wrote :

I cannot see failed containers in your cloud instance, nor reproduce the failure by starting new ones.

I have also created and run http://people.canonical.com/~pitti/tmp/psql-idle.sh on my laptop and your cloud instance for 40 iterations, and I couldn't reproduce a failure. This takes cloud-init out of the equation and just tests running apt install postgresql in a Type=idle unit (plus some glue around it to wait for booting and iterate). So I'm fairly sure that this approach works in principle -- but of course with more moving parts there's more that can go wrong.

Revision history for this message
Martin Pitt (pitti) wrote :

The bit that I have doubts about in https://git.launchpad.net/~smoser/cloud-init/commit/?h=bug/1576692&id=6a249689a179f is why "runcmd" still runs in cloud_config_modules -- it's arbitrary code which might (and often does) run package installs, so it should really live in cloud_final_modules as well?

Revision history for this message
Martin Pitt (pitti) wrote :

Scott and I debugged this further, and the best hint so far is bug 1620780. In Scott's local instances he gets "systemctl is-system-running" == "starting" with

JOB UNIT TYPE STATE
  2 dev-sda2.device start running

postgresql-9.5.postinst calls "invoke-rc.d postgresql start", and since the system is not booted yet (according to is-system-running) it starts the postgresql.service wrapper job with --job-mode=ignore-dependencies, and thus it never starts the @9.5-main instance.

I suggest handling this bit (which is LXD specific) in bug 1620780, and keeping this bug for the cloud-init change.

Revision history for this message
Martin Pitt (pitti) wrote :

Type=idle not waiting for running *.device jobs is related, but not identical to bug 1620780. I filed that as bug 1621846. Those are both on the systemd side, and for the cases where bug 1620780 does not hit (and thus bug 1621846 does not happen), it has been demonstrated that moving these units after the boot process works in principle.

Revision history for this message
Martin Pitt (pitti) wrote :

Pre-weekend braindump: I've had success with modifying a xenial image like this:

/usr/sbin/invoke-rc.d:

               if ! systemctl --quiet is-active default.target; then
                    sctl_args="--job-mode=ignore-dependencies"
               fi

Add multi-user.target to cloud-{config,final}.service

Then this works:

  lxc launch ubuntu-xenial-mod --config=user.user-data="$(printf "#cloud-config\npackages: [postgresql, samba, postfix]")" x1

Services start:

# systemctl list-units --no-legend postg* samb* postfix*
postfix.service loaded active running LSB: Postfix Mail Transport Agent
postgresql.service loaded active exited PostgreSQL RDBMS
postgresql@9.5-main.service loaded active running PostgreSQL Cluster 9.5-main
samba-ad-dc.service loaded active exited LSB: start Samba daemons for the AD DC

and reboot works fine.

Revision history for this message
Martin Pitt (pitti) wrote :

> Add multi-user.target to cloud-{config,final}.service

Sorry, I meant to add After=multi-user.target
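
One way to try this ordering without editing the shipped unit files is a systemd drop-in. A minimal sketch, assuming drop-ins under /etc/systemd/system/<unit>.d/ are honored on the image; the helper name and the file name after-multi-user.conf are made up for illustration, and this mirrors the experiment described above rather than the change that was eventually released:

```shell
# write_after_multi_user_dropin: hypothetical helper for illustration only.
# $1: directory holding unit drop-ins (e.g. /etc/systemd/system)
# $2: unit name (e.g. cloud-final.service)
# Creates <dir>/<unit>.d/after-multi-user.conf, which orders the unit after
# multi-user.target. The file name is an arbitrary choice.
write_after_multi_user_dropin() {
    dropin_dir="$1/$2.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Unit]\nAfter=multi-user.target\n' > "$dropin_dir/after-multi-user.conf"
}
```

On a test image this would be called once for cloud-config.service and once for cloud-final.service with /etc/systemd/system as the directory, followed by "systemctl daemon-reload".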

Revision history for this message
Martin Pitt (pitti) wrote :

Change invoke-rc.d to check is-active multi-user.target instead of is-system-running, to match After=multi-user.target.
Also, always use --no-block for reload as an additional line of defence for if-up.d/ scripts, as reload has never been synchronous.
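
The revised condition can be sketched as follows. This is an illustration under stated assumptions, not the actual invoke-rc.d code: systemctl_extra_args is a hypothetical helper, and it takes the target state as a string so that the logic can be exercised without a running systemd.

```shell
# systemctl_extra_args: hypothetical helper, not taken from invoke-rc.d.
# $1: requested action (start, stop, reload, ...)
# $2: state printed by "systemctl is-active multi-user.target"
# Prints the extra systemctl options for that action on one line.
systemctl_extra_args() {
    extra=""
    # Before multi-user.target is active, ignore unit dependencies.
    if [ "$2" != "active" ]; then
        extra="--job-mode=ignore-dependencies"
    fi
    # Reload is never waited for, regardless of boot state.
    if [ "$1" = "reload" ]; then
        extra="${extra:+$extra }--no-block"
    fi
    printf '%s\n' "$extra"
}
```

A caller would pass the live state, for example: systemctl_extra_args reload "$(systemctl is-active multi-user.target)".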

Changed in cloud-init (Ubuntu):
status: Confirmed → In Progress
Changed in init-system-helpers (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → High
status: New → In Progress
Revision history for this message
Martin Pitt (pitti) wrote :

For invoke-rc.d: original commit https://anonscm.debian.org/cgit/collab-maint/sysvinit.git/commit/?id=38e2b9fca

Try to reproduce the hangs in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777113 when completely removing the hack, then ensure that they go away again with is-active m-u.target

Changed in init-system-helpers (Ubuntu):
milestone: none → ubuntu-16.09
Scott Moser (smoser)
Changed in init-system-helpers (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
Changed in cloud-init (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.7-28-g34a26f7-0ubuntu1

---------------
cloud-init (0.7.7-28-g34a26f7-0ubuntu1) yakkety; urgency=medium

  * New upstream snapshot.
    - systemd: Better support package and upgrade.
      (LP: #1576692, #1621336)
    - tests: cleanup tempdirs in apt_source tests

 -- Scott Moser <email address hidden> Fri, 09 Sep 2016 16:01:13 -0400

Changed in cloud-init (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

Just a comment / status here.
cloud-init is now running the modules that do package installation after multi-user.target. The plan is to change init-system-helpers as pitti described in comment 10. Until that is fixed, the problem isn't really fixed.

Revision history for this message
Martin Pitt (pitti) wrote :

@Scott: Oh, does that actually work with adding the After= to just cloud-final.service? I thought the thing that *actually* does the package installs is cloud-config.service, and this needed that After= as well?

Martin Pitt (pitti)
description: updated
Revision history for this message
Martin Pitt (pitti) wrote :
Changed in init-system-helpers (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote :

i-s-h SRU uploaded.

description: updated
Changed in init-system-helpers (Ubuntu Xenial):
status: Confirmed → In Progress
Revision history for this message
Scott Moser (smoser) wrote :

Martin,
I moved all things that do package installation into final. The user could still manage to have some things run at 'config' point in boot that would install packages, but anything that does it in cloud-init directly is now part of final.

Revision history for this message
Scott Moser (smoser) wrote :

fixed in 0.7.8.

Changed in cloud-init:
status: Confirmed → Fix Released
Scott Moser (smoser)
Changed in cloud-init (Ubuntu Xenial):
status: Confirmed → In Progress
Revision history for this message
Dave Chiluk (chiluk) wrote :

@smoser

Did you commit your changes to the xenial cloud-init as well? I'm not sure where xenial images grab cloud init for themselves, but I assume out of the xenial archives. Am I missing something here?

Revision history for this message
Patricia Gaughen (gaughen) wrote :

We use what's in the archive for what we include in cloud images. So once cloud-init lands, it will make its way to an image. I would expect that Scott is working his way through the SRU process.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package init-system-helpers - 1.44

---------------
init-system-helpers (1.44) unstable; urgency=medium

  * invoke-rc.d, service: Check for multi-user.target instead of default.target.
    There is a curious bug which sometimes causes "systemctl is-active
    default.target" to say inactive until "show" or "status" gets called on
    the unit. This needs to be investigated. Until then, check for
    multi-user.target which by and large does the same job, but seems to work
    reliably.

 -- Martin Pitt <email address hidden> Mon, 12 Sep 2016 22:52:23 +0200

Changed in init-system-helpers (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Scott, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.7-31-g65ace7b-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cloud-init (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Neil Wilson (neil-aldur) wrote :

Have we back ported the init-system-helpers changes to Xenial?

I'm only seeing 1.29ubuntu2 this morning.

Revision history for this message
Martin Pitt (pitti) wrote :

init-system-helpers is still sitting in the SRU queue and needs to be reviewed/accepted.

Revision history for this message
Andy Whitcroft (apw) wrote :

Hello Scott, or anyone else affected,

Accepted init-system-helpers into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/init-system-helpers/1.29ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in init-system-helpers (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Neil Wilson (neil-aldur) wrote :

Added both cloud-init and init-system-helpers from proposed to the standard Xenial cloud image (com.ubuntu.cloud:released:download/com.ubuntu.cloud:server:16.04:amd64/20160907.1/disk1.img) on a suitably sized server.

Reset the cloud-init state with rm -rf /var/lib/cloud/instances/*, shut down the server and snapshotted the image.

Rebuilt a new server from the snapshotted image using the previously failing postgresql user data and all is well. The new packages correct my problem - bug 1611973

Revision history for this message
Scott Moser (smoser) wrote :

Thank you Neil!

I've been going through my testing here, and found:
* bug 1623570: Azure: cannot start walinux agent (Transaction order is cyclic.)

That will require us to get that fix in and through proposed or we will break Azure boot. It's fallout from the systemd ordering.

description: updated
description: updated
Scott Moser (smoser)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Martin Pitt (pitti) wrote :

I just filed bug 1623868 which is fallout from this change, so blocking this SRU for now.

tags: added: verification-failed
removed: verification-done
Martin Pitt (pitti)
description: updated
Revision history for this message
Martin Pitt (pitti) wrote :

Hello Scott, or anyone else affected,

Accepted cloud-init into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cloud-init/0.7.8-1-g3705bb5-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: removed: verification-failed
tags: added: verification-needed
Scott Moser (smoser)
description: updated
Revision history for this message
Scott Moser (smoser) wrote :

verified with:
printf "#cloud-config\npackages: [postgresql, samba, postfix]\n" > user-data
n=x1
lxc launch ubuntu-daily:xenial $n
sleep 10
lxc exec $n -- sh -c '
    p=/etc/apt/sources.list.d/proposed.list
    echo deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc)-proposed main > "$p" &&
    apt-get update -q && apt-get -qy install cloud-init'

lxc file push - $n/etc/cloud/cloud.cfg.d/update.cfg < user-data

## clean it out so next is first boot.
lxc exec $n -- sh -c '
  cd /var/lib/cloud && for d in *; do [ "$d" = "seed" ] || rm -Rf "$d"; done
  rm -Rf /var/log/cloud-init*'

lxc exec $n reboot
lxc exec $n -- tail -f /var/log/cloud-init-output.log

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.8-1-g3705bb5-0ubuntu1~16.04.1

---------------
cloud-init (0.7.8-1-g3705bb5-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * New upstream release 0.7.8.
  * New upstream snapshot.
    - systemd: put cloud-init.target After multi-user.target (LP: #1623868)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.2) xenial-proposed; urgency=medium

  * debian/control: add Breaks of older versions of walinuxagent (LP: #1623570)

cloud-init (0.7.7-31-g65ace7b-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/control: fix missing dependency on python3-serial,
    and make SmartOS datasource work.
  * debian/cloud-init.templates fix capitalisation in template so
    dpkg-reconfigure works to select OpenStack. (LP: #1575727)
  * d/README.source, d/control, d/new-upstream-snapshot, d/rules: sync
    with yakkety for changes due to move to git.
  * d/rules: change PYVER=python3 to PYVER=3 to adjust to upstream change.
  * debian/rules, debian/cloud-init.install: remove install file
    to ensure expected files are collected into cloud-init deb.
    (LP: #1615745)
  * debian/dirs: remove obsolete / unused file.
  * upstream move from bzr to git.
  * New upstream snapshot.
    - Allow link type of null in network_data.json [Jon Grimm] (LP: #1621968)
    - DataSourceOVF: fix user-data as base64 with python3 (LP: #1619394)
    - remove obsolete .bzrignore
    - systemd: Better support package and upgrade. (LP: #1576692, #1621336)
    - tests: cleanup tempdirs in apt_source tests
    - apt config conversion: treat empty string as not provided. (LP: #1621180)
    - Fix typo in default keys for phone_home [Roland Sommer] (LP: #1607810)
    - salt minion: update default pki directory for newer salt minion.
      (LP: #1609899)
    - bddeb: add --release flag to specify the release in changelog.
    - apt-config: allow both old and new format to be present.
      [Christian Ehrhardt] (LP: #1616831)
    - python2.6: fix dict comprehension usage in _lsb_release. [Joshua Harlow]
    - Add a module that can configure spacewalk. [Joshua Harlow]
    - add install option for openrc [Matthew Thode]
    - Generate a dummy bond name for OpenStack (LP: #1605749)
    - network: fix get_interface_mac for bond slave, read_sys_net for ENOTDIR
    - azure dhclient-hook cleanups
    - Minor cleanups to atomic_helper and add unit tests.
    - Fix Gentoo net config generation [Matthew Thode]
    - distros: fix get_primary_arch method use of os.uname [Andrew Jorgensen]
    - Apt: add new apt configuration format [Christian Ehrhardt]
    - Get Azure endpoint server from DHCP client [Brent Baude]
    - DigitalOcean: use the v1.json endpoint [Ben Howard]
    - MAAS: add vendor-data support (LP: #1612313)
    - Upgrade to a configobj package new enough to work [Joshua Harlow]
    - ConfigDrive: recognize 'tap' as a link type. (LP: #1610784)
    - NoCloud: fix bug providing network-interfaces via meta-data.
      (LP: 1577982)
    - Add distro tags on config modules that should have it [Joshua Harlow]
    - ChangeLog: update changelog for previous commit.
    - add ntp config module [Ryan Harper]
    - SmartOS: more improvement...


Changed in cloud-init (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for cloud-init has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Scott Moser (smoser)
description: updated
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package init-system-helpers - 1.29ubuntu3

---------------
init-system-helpers (1.29ubuntu3) xenial-proposed; urgency=medium

  * invoke-rc.d, service: Only ignore systemd unit dependencies before
    multi-user.target. "systemctl is-system-running" might still be false in
    case of running jobs for device/mount/hotplug/dynamic actions units. But
    in those cases we already do want to respect unit dependencies, as the
    system is booted up sufficiently to avoid dependency loops. Thus weaken
    the condition to "multi-user.target is active".

    This does not change the behaviour for single-user: is-system-running has
    always been false there, so dependencies continue to be ignored.

    Fixes installation of packages like PostgreSQL under cloud-init or when
    manually installing packages right after booting.

    LP: #1576692

 -- Martin Pitt <email address hidden> Mon, 12 Sep 2016 10:57:57 +0200

Changed in init-system-helpers (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
James Falcon (falcojr) wrote :