maas install fails inside of a 16.04 lxd container due to avahi problems

Bug #1661869 reported by Dustin Kirkland 
This bug affects 7 people
Affects           Status        Importance  Assigned to  Milestone
MAAS              Invalid       Undecided   Unassigned
avahi (Ubuntu)    Fix Released  High        Unassigned
  Trusty          Fix Released  Medium      Trent Lloyd
  Xenial          Fix Released  Medium      Trent Lloyd
  Artful          Fix Released  Medium      Trent Lloyd
lxd (Ubuntu)      Invalid       Undecided   Unassigned
  Trusty          Invalid       Undecided   Unassigned
  Xenial          Invalid       Undecided   Unassigned
  Artful          Invalid       Undecided   Unassigned

Bug Description

[Original Description]
The bug, and workaround, are clearly described in this mailing list thread:

https://lists.linuxcontainers.org/pipermail/lxc-users/2016-January/010791.html

I'm trying to install MAAS in a LXD container, but that's failing due to avahi package install problems. I'm tagging all packages here.

[Issue]
Avahi sets a number of rlimits on startup, including the maximum number of processes (nproc=2) and limits on memory usage. These limits are hit in a number of cases. Specifically, the maximum process limit is hit if you run lxd containers in 'privileged' mode, such that avahi has the same uid in multiple containers, and large networks can trigger the memory limit.

The fix is to remove these default rlimits completely from the configuration file.
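
On a system where the packaged fix is not yet installed, or where avahi-daemon.conf has been modified locally, a similar effect can be approximated by commenting out the rlimit-* settings by hand. The following is a minimal sketch, not the packaged patch. The function name disable_rlimits is hypothetical, and it prints the modified file to standard output rather than editing it in place:

```shell
#!/bin/sh
# Hypothetical helper (not part of avahi or the Ubuntu patch):
# print a copy of an avahi-daemon.conf with every active rlimit-*
# setting commented out. The input file itself is not modified.
disable_rlimits() {
    sed -E 's/^[[:space:]]*(rlimit-[a-z]+=.*)$/#\1/' "$1"
}
```

For example, `disable_rlimits /etc/avahi/avahi-daemon.conf > /tmp/avahi-daemon.conf.new` writes a candidate file that can be reviewed before it replaces the original and avahi-daemon is restarted.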

[Impact]

 * Avahi is unable to start inside of containers without UID namespace isolation because an rlimit on the maximum number of processes is set by default to 2. When a container launches Avahi, the total number of processes running under that UID across all containers exceeds this limit and Avahi is killed. Because the service fails to start, the failure appears at package install time rather than only at runtime.
 * Some users also have issues with the maximum memory allocation causing Avahi to exit on networks with a large number of services, as the memory limit was quite small (4MB). Refer to LP #1638345.
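
Because the process limit is accounted per real user ID, and privileged containers share the host's UID range, one quick way to see the effect is to count the processes that belong to a given UID as seen from the host. This is a rough sketch, assuming a Linux /proc filesystem. The function name count_procs_for_uid is hypothetical, and it counts processes only, not individual threads:

```shell
#!/bin/sh
# Hypothetical helper: count processes whose real UID equals $1.
# The first value on the "Uid:" line of /proc/<pid>/status is the real UID.
# Reading through cat tolerates processes that exit while we scan.
count_procs_for_uid() {
    cat /proc/[0-9]*/status 2>/dev/null |
        awk -v uid="$1" '/^Uid:/ && $2 == uid { n++ } END { print n + 0 }'
}
```

On a host that runs avahi-daemon, `count_procs_for_uid "$(id -u avahi)"` would show how many processes already count against that user before another privileged container starts its own copy.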

[Test Case]

 * setup lxd (apt install lxd, lxd init, get working networking)
 * lxc launch ubuntu:16.04 avahi-test --config security.privileged=true
 * lxc exec avahi-test sudo apt install avahi-daemon

This will fail if the parent host has avahi-daemon installed. If it does not, you can set up a second container (avahi-test2) and install avahi there; that install should then fail, as the issue requires 2 copies of avahi-daemon in the same uid namespace.

[Regression Potential]

 * The fix removes all rlimits configured by avahi on startup. Setting them is an extra step avahi takes that most programs do not (limiting memory usage, running process count, etc.). It's possible that an unknown bug, previously hidden by Avahi crashing when it hit a limit, now consumes significant system resources because the limit is no longer in place. However, I believe this risk is significantly reduced, as this change has been shipping upstream for many months and I have not seen any reports of new problems; it has, however, fixed a number of existing crashes/problems.

 * The main case where this may not fix the issue is when users have modified their avahi-daemon.conf file. It will fix new installs and most existing installs, as most users don't modify the file, and users may be prompted on upgrade to replace the file.

[Other Info]

 * This change already exists upstream in 0.7 which is in bionic. SRU required to artful, xenial, trusty.

tags: added: maas-at-home
Revision history for this message
Stéphane Graber (stgraber) wrote :

Avahi is setting some rather strict rlimits which affect everything which uses that kernel uid, crossing container boundaries and so breaking containers.

Unfortunately MAAS requires a privileged container right now, so you can't resort to uid mapping to avoid this problem. At the LXD level, all we can do to avoid this problem is to allow you to have one distinct id map per container, which we already support. But that's only going to work for unprivileged containers.

One fix could be to tweak our avahi to relax those rlimits or, if they are not that useful, remove them entirely, as this is a rather frequent pain point and I'm not sure of the benefit of those rlimits in the first place.

Another fix would be to not have MAAS depend on avahi and let you install and run it without avahi, which is effectively what Brian's instructions do (as they disable avahi-daemon in the container).

Marking the LXD task Invalid, as we're already doing all we can in this regard by supporting non-overlapping id maps for unprivileged containers.

Changed in lxd (Ubuntu):
status: New → Invalid
Revision history for this message
Stéphane Graber (stgraber) wrote :

(but keeping ~ubuntu-lxc subscribed to this bug)

Revision history for this message
Trent Lloyd (lathiat) wrote :

Avahi starts fine in a 16.04 container for me. Can you share what errors you are actually seeing Dustin?

lxc launch ubuntu:16.04 xenial
ssh ubuntu@<ip>
sudo apt install avahi-daemon
sudo systemctl status avahi-daemon

The post you linked is from January 2016 and on 15.10 (wily). Avahi does in fact not launch correctly on wily, but it does fine on xenial.

On wily, setting rlimit-nproc=4 seems to fix it. For some reason rlimit-nproc=3 fails on wily, though the same setting works on xenial.

Revision history for this message
Trent Lloyd (lathiat) wrote :

Oh right, I see now.. too early to comment as usual :(

The problem is that you are setting up a "privileged" container for MAAS which does not use UID mapping, hence the issue shows up in the MAAS workflow but not with a normal container deployment.

The rlimit-nproc is simply set in /etc/avahi/avahi-daemon.conf, so can easily be tweaked in the package. I believe the idea behind it originally is basically to ensure that avahi cannot be used to execute something else, despite all the chrooting, etc - even if there was a way. Essentially blocking further forking. For that reason, probably makes most sense to simply remove the limit rather than increase it by any given number.
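
To check what a given system currently has configured, the value can be read straight from the file. A small sketch, assuming the usual key=value layout of avahi-daemon.conf; the function name get_avahi_setting is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first uncommented
# "key=value" line for the given key, or nothing if the key is not set.
#   $1 = key (for example rlimit-nproc)
#   $2 = path to the configuration file
get_avahi_setting() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}
```

For example, `get_avahi_setting rlimit-nproc /etc/avahi/avahi-daemon.conf` prints the configured limit, and prints nothing once the setting has been removed or commented out.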

Revision history for this message
Trent Lloyd (lathiat) wrote :

There was previously a patch to skip setting this (because it would fail). It was removed for a couple of reasons, including an upstream change not to abort if setting RLIMIT_NPROC failed.

I've committed a change upstream to simply remove the default setting of this option, and will prepare a debdiff to patch this change into Xenial:
https://github.com/lathiat/avahi/commit/537371c786479f44882ece3d905a0e5ccda4f0a2

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Since [the released version of] MAAS declares avahi-utils as a "Recommends", you can use `apt-get --no-install-recommends` as an alternate workaround. But then you'll lose zeroconf hostname discoveries in MAAS.

I understand that this has been changed to a hard dependency in MAAS 2.2, so I'm glad this is getting some attention upstream.

Changed in maas:
status: New → Invalid
Trent Lloyd (lathiat)
Changed in avahi (Ubuntu):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Trent Lloyd (lathiat)
Revision history for this message
Ryan Beisner (1chb1n) wrote :

FWIW - I'm using MAAS 2.1.3+bzr5573-0ubuntu1 (16.04.1) in a LXD container successfully in production, albeit privileged, per https://docs.ubuntu.com/maas/2.1/en/installconfig-lxd-install.

Revision history for this message
Trent Lloyd (lathiat) wrote :

Attached debdiff to remove all rlimits from the default avahi-daemon.conf. These commits have been made upstream. This also solves #1638345 because it is effectively the same fix, so I plan to solve it with the same upload. Will handle the process here.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "avahi-rlimits-artful.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

This is great, Trent!

Andres, could you get someone to look at this and merge it upstream?

Thanks!
Dustin

Changed in avahi (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
TJ (tj) wrote :

In IRC support we've been getting reports about this issue for 17.10; Can we get the SRU pushed out?

Revision history for this message
Trent Lloyd (lathiat) wrote : Re: [Bug 1661869] Re: maas install fails inside of a 16.04 lxd container due to avahi problems

I’ll follow up on this tomorrow and see what I need to get it pushed
through.

On Sun, 4 Mar 2018 at 9:50 pm, TJ <email address hidden> wrote:

> In IRC support we've been getting reports about this issue for 17.10;
> Can we get the SRU pushed out?

Trent Lloyd (lathiat)
description: updated
Revision history for this message
Trent Lloyd (lathiat) wrote :

Trusty is technically not directly affected by the container proc issue, as it carries an Ubuntu patch (dropped in xenial) that skips setting rlimit-nproc when /run/container_type=lxc.

The issue could still occur if that file doesn't exist though, and the memory issue can still occur, so I still recommend the upload.
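
Whether that marker is present can be checked from inside the container. This is only an illustration of the check described above, not the code of the Ubuntu patch; the function name container_type is hypothetical, and the optional argument exists so a different path can be passed:

```shell
#!/bin/sh
# Hypothetical helper: print the content of the container marker file,
# or "none" when the file is missing or unreadable.
#   $1 = marker file (defaults to /run/container_type)
container_type() {
    marker="${1:-/run/container_type}"
    if [ -r "$marker" ]; then
        cat "$marker"
    else
        echo none
    fi
}
```

A result of `lxc` corresponds to the case where the patched trusty package skips rlimit-nproc; `none` corresponds to the case where the limit would still be applied.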

description: updated
Eric Desrochers (slashd)
description: updated
Changed in avahi (Ubuntu Trusty):
assignee: nobody → Trent Lloyd (lathiat)
Changed in avahi (Ubuntu Xenial):
assignee: nobody → Trent Lloyd (lathiat)
Changed in avahi (Ubuntu Artful):
assignee: nobody → Trent Lloyd (lathiat)
Changed in avahi (Ubuntu Trusty):
importance: Undecided → Medium
Changed in avahi (Ubuntu Xenial):
importance: Undecided → Medium
Changed in avahi (Ubuntu Artful):
importance: Undecided → Medium
Revision history for this message
Eric Desrochers (slashd) wrote :

Hi Trent,

Thanks for your patch.

I'll gladly sponsor your patch in Stable Release, but before I proceed can you confirm the rlimit removal already in place in avahi (devel release) fixes the above situation for Bionic (even though it has been shipped upstream for many months already)

Did you also test the patches in affected stable releases ? If yes what was the result ?

If you can confirm this, I'll then proceed with the sponsorship.

Thanks
Eric

Revision history for this message
Trent Lloyd (lathiat) wrote :

Yeah the exact same change in the 0.7 release (rlimit section removal) that is shipping in Bionic fixes the issue there, and the issue isn't present in Bionic.

I also individually tested each of the trusty/xenial/artful packages built from the supplied debdiffs to ensure the issue goes away after upgrading to the rebuilt package, and it did. The exception is trusty, where the process-count (rlimit-nproc) issue doesn't occur because of an Ubuntu patch that does not set that rlimit in LXC containers (where /run/container_type=lxc). However, trusty still needs the rlimit-data removal, so I applied the exact same changes there.

Revision history for this message
Eric Desrochers (slashd) wrote :

Ok I'll sponsor the debdiff this week.

Revision history for this message
Eric Desrochers (slashd) wrote :

Sponsored for T/X/A

Thanks

Eric

Eric Desrochers (slashd)
Changed in lxd (Ubuntu Trusty):
status: New → Invalid
Changed in lxd (Ubuntu Xenial):
status: New → Invalid
Changed in lxd (Ubuntu Artful):
status: New → Invalid
Changed in avahi (Ubuntu Trusty):
status: New → In Progress
Changed in avahi (Ubuntu Xenial):
status: New → In Progress
Changed in avahi (Ubuntu Artful):
status: New → In Progress
Eric Desrochers (slashd)
Changed in avahi (Ubuntu):
status: In Progress → Fix Released
assignee: Trent Lloyd (lathiat) → nobody
Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello Dustin, or anyone else affected,

Accepted avahi into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/avahi/0.6.31-4ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in avahi (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-trusty
Revision history for this message
Robie Basak (racb) wrote :

Unsubscribed ~ubuntu-sponsors as there is nothing left to sponsor.

Changed in avahi (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Revision history for this message
Robie Basak (racb) wrote :

Hello Dustin, or anyone else affected,

Accepted avahi into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/avahi/0.6.32~rc+dfsg-1ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in avahi (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed-artful
Revision history for this message
Robie Basak (racb) wrote :

Hello Dustin, or anyone else affected,

Accepted avahi into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/avahi/0.6.32-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Trent Lloyd (lathiat)
tags: added: verification-done-artful verification-done-trusty verification-done-xenial
removed: verification-needed-artful verification-needed-trusty verification-needed-xenial
Revision history for this message
Trent Lloyd (lathiat) wrote :

Verification completed on trusty, xenial and artful

Revision history for this message
Brian Murray (brian-murray) wrote :

Could you please add some details about the verification steps you took rather than just saying you verified it?

Revision history for this message
Trent Lloyd (lathiat) wrote :

Sure thing!

I conducted two tests based on the reproduction steps in the SRU template

 * setup lxd (apt install lxd, lxd init, get working networking)
 * lxc launch ubuntu:16.04 avahi-test --config security.privileged=true
 * lxc exec avahi-test sudo apt install avahi-daemon

For xenial, artful versions I installed a container, installed the current package and then verified that it failed to install/start as expected. I then removed that container, created a fresh container, enabled -proposed and tested the install again to ensure it succeeded with the new version. I then further installed avahi-utils and executed "avahi-browse -a" to ensure services from the network were appearing and that the /etc/avahi/avahi-daemon.conf file had changed as expected based on the patch (which was the only change, there are no code changes).

For trusty I conducted the same tests. However, the initial package install does not fail under LXD, due to a patch within the trusty version of avahi that skips the nproc rlimit when inside containers, for reasons that no longer apply to modern lxd versions. I did still ensure the avahi-daemon.conf file was updated as expected. The patch is still required on trusty because a host that has containers on it will still have the problem with the avahi instance on the host itself, which still has the rlimit applied (even though the containers themselves don't see the issue).

Lastly for each version I also installed the broken version and tested that an upgrade also went as expected rather than fresh install for completeness.
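
The configuration-file part of that verification can be scripted. This is a sketch of one way to do it, not the exact check that was run; the function name has_active_rlimits is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the given file still
# contains at least one uncommented rlimit-* setting, fail otherwise.
has_active_rlimits() {
    grep -Eq '^[[:space:]]*rlimit-[a-z]+=' "$1"
}
```

After upgrading to the package from -proposed, `has_active_rlimits /etc/avahi/avahi-daemon.conf` would be expected to fail on an unmodified install, since the patch removes those defaults.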

Hope that helps.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package avahi - 0.6.32-1ubuntu1.1

---------------
avahi (0.6.32-1ubuntu1.1) artful; urgency=medium

  * d/p/0002-Remove-default-rlimit-nproc-3.patch,
  * d/p/0003-Remove-default-rlimits-from-avahi-daemon.conf.patch:
    - Remove all overly restrictive default rlimit restrictions in
    avahi-daemon.conf which can cause avahi to fail to start due to
    too many running process or crash out of memory. (LP: #1661869)

 -- Trent Lloyd <email address hidden> Thu, 15 Mar 2018 10:05:18 +0800

Changed in avahi (Ubuntu Artful):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for avahi has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package avahi - 0.6.32~rc+dfsg-1ubuntu2.1

---------------
avahi (0.6.32~rc+dfsg-1ubuntu2.1) xenial; urgency=medium

  * d/p/0001-Remove-default-rlimit-nproc-3.patch,
  * d/p/0002-Remove-default-rlimits-from-avahi-daemon.conf.patch:
    - Remove all overly restrictive default rlimit restrictions in
    avahi-daemon.conf which can cause avahi to fail to start due to
    too many running process or crash out of memory. (LP: #1661869)

 -- Trent Lloyd <email address hidden> Thu, 15 Mar 2018 10:16:57 +0800

Changed in avahi (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package avahi - 0.6.31-4ubuntu1.2

---------------
avahi (0.6.31-4ubuntu1.2) trusty; urgency=medium

  * d/p/Remove-default-rlimit-nproc-3.patch,
  * d/p/Remove-default-rlimits-from-avahi-daemon.conf.patch:
    - Remove all overly restrictive default rlimit restrictions in
    avahi-daemon.conf which can cause avahi to fail to start due to
    too many running process or crash out of memory. (LP: #1661869)

 -- Trent Lloyd <email address hidden> Thu, 15 Mar 2018 10:20:53 +0800

Changed in avahi (Ubuntu Trusty):
status: Fix Committed → Fix Released