user lxc containers fail to start under systemd: login name=systemd cgroup is not owned by user

Bug #1413927 reported by Martin Pitt
This bug affects 5 people
Affects: systemd (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Martin Pitt

Bug Description

When a user logs in, systemd-logind should create cgroups for the user, with the directory (i.e. /user.slice/user-1000.slice/session-c2.scope) owned by the user. This is no longer happening for the name=systemd cgroup, which prevents containers from starting. (If lxc were to simply not create/use that controller, it would prevent systemd in the container from using it.)

I wanted to test the new lxc with lxcfs. A system container (with upstart or systemd) works perfectly well now (great!), but user containers regressed:

$ lxc-create -n v1 -t download -- -d ubuntu -r vivid -a amd64
$ lxc-start -n v1 -F
lxc-start: cgmanager.c: lxc_cgmanager_enter: 694 call to cgmanager_move_pid_sync failed: invalid request
lxc-start: start.c: __lxc_start: 1099 failed to spawn 'v1'
lxc-start: lxc_start.c: main: 345 The container failed to start.

My host is running systemd, but cgmanager is running (i. e. it's not bug 1400394, I enabled cgmanager.service).

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: lxc 1.1.0~rc1-0ubuntu1
ProcVersionSignature: Ubuntu 3.18.0-9.10-generic 3.18.2
Uname: Linux 3.18.0-9-generic x86_64
ApportVersion: 2.15.1-0ubuntu2
Architecture: amd64
CurrentDesktop: Unity
Date: Fri Jan 23 10:35:55 2015
EcryptfsInUse: Yes
InstallationDate: Installed on 2014-11-20 (63 days ago)
InstallationMedia: Ubuntu 15.04 "Vivid Vervet" - Alpha amd64 (20141119)
SourcePackage: lxc
UpgradeStatus: No upgrade log present (probably fresh install)
defaults.conf:
 lxc.network.type = veth
 lxc.network.link = lxcbr0
 lxc.network.flags = up
 lxc.network.hwaddr = 00:16:3e:xx:xx:xx
lxc.conf: lxc.lxcpath = /srv/lxc

Revision history for this message
Martin Pitt (pitti) wrote :

I suppose the user container runs upstart (from the template), as that's still the ubuntu vivid default. But I have a feeling it's not even getting that far; when I start with --logfile /dev/stdout --logpriority debug it all just seems to be early setup:

$ lxc-start -n v1 --logfile /dev/stdout --logpriority debug -F
      lxc-start 1422005786.443 INFO lxc_start_ui - lxc_start.c:main:265 - using rcfile /home/martin/.local/share/lxc/v1/config
      lxc-start 1422005786.444 WARN lxc_confile - confile.c:config_pivotdir:1776 - lxc.pivotdir is ignored. It will soon become an error.
      lxc-start 1422005786.445 INFO lxc_confile - confile.c:config_idmap:1384 - read uid map: type u nsid 0 hostid 100000 range 65536
      lxc-start 1422005786.445 INFO lxc_confile - confile.c:config_idmap:1384 - read uid map: type g nsid 0 hostid 100000 range 65536
      lxc-start 1422005786.445 WARN lxc_log - log.c:lxc_log_init:316 - lxc_log_init called with log already initialized
      lxc-start 1422005786.446 WARN lxc_cgmanager - cgmanager.c:cgm_get:963 - do_cgm_get exited with error
      lxc-start 1422005786.447 INFO lxc_lsm - lsm/lsm.c:lsm_init:48 - LSM security driver AppArmor
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:298 - processing: .reject_force_umount # comment this to allow umount -f; not recommended.
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:371 - Adding non-compat rule for reject_force_umount action 0
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:do_resolve_add_rule:192 - Setting seccomp rule to reject force umounts

      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:382 - Adding compat rule for reject_force_umount action 0
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:390 - Adding non-compat rule bc nr1 == nr2 (-1, -1)
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:do_resolve_add_rule:192 - Setting seccomp rule to reject force umounts

      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:298 - processing: .[all].
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:298 - processing: .kexec_load errno 1.
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:371 - Adding non-compat rule for kexec_load action 327681
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:382 - Adding compat rule for kexec_load action 327681
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:395 - Really adding compat rule bc nr1 == nr2 (283, 246)
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:298 - processing: .open_by_handle_at errno 1.
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:371 - Adding non-compat rule for open_by_handle_at action 327681
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:382 - Adding compat rule for open_by_handle_at action 327681
      lxc-start 1422005786.447 INFO lxc_seccomp - seccomp.c:parse_config_v2:395 - Re...


Revision history for this message
Martin Pitt (pitti) wrote :

The lxcsyslog.txt attachment might be worth a look though, there are several cgmanager errors there. "vivid-systemd" is the name of my system container (standard vivid plus apt install systemd-sysv).

Revision history for this message
Stéphane Graber (stgraber) wrote :

Can you paste your /proc/self/cgroup and /var/log/upstart/cgmanager.log?

Revision history for this message
Martin Pitt (pitti) wrote :

$ cat /proc/self/cgroup
10:blkio:/user.slice/user-1000.slice/session-c2.scope
9:memory:/user.slice/user-1000.slice/session-c2.scope
8:freezer:/user.slice/user-1000.slice/session-c2.scope
7:cpu,cpuacct:/user.slice/user-1000.slice/session-c2.scope
6:net_cls,net_prio:/user.slice/user-1000.slice/session-c2.scope
5:cpuset:/user.slice/user-1000.slice/session-c2.scope
4:perf_event:/user.slice/user-1000.slice/session-c2.scope
3:devices:/user.slice/user-1000.slice/session-c2.scope
2:hugetlb:/user.slice/user-1000.slice/session-c2.scope
1:name=systemd:/user.slice/user-1000.slice/session-c2.scope
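
For readers unfamiliar with the format above: each line of /proc/self/cgroup is hierarchy-id:controllers:path, with co-mounted controllers separated by commas. A minimal Python sketch for splitting it up follows. The function name parse_proc_cgroup is made up for illustration and is not part of lxc, cgmanager or systemd.

```python
def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Each line has the form "hierarchy-id:controllers:path"; co-mounted
    controllers are comma-separated (for example "cpu,cpuacct").
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice so that a ':' inside the path is preserved.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            result[controller] = path
    return result
```

Feeding it the contents of /proc/self/cgroup gives one entry per controller, so the name=systemd path can be compared directly with the others.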

/var/log/upstart/cgmanager.log doesn't exist under systemd, but here is the journal:

-- Logs begin at Fr 2015-01-23 13:10:32 CET, end at Fr 2015-01-23 17:06:26 CET. --
Jan 23 17:05:24 donald cgmanager[758]: cgmanager:do_create_main: pid 14681 (uid 1000 gid 1000) may not create under /run/cgmanager/fs/none,name=systemd/user.slice/user-1000.slice/session-c2.scope
Jan 23 17:05:24 donald cgmanager[758]: cgmanager: Invalid path /run/cgmanager/fs/none,name=systemd/user.slice/user-1000.slice/session-c2.scope/v1
Jan 23 17:05:24 donald cgmanager[758]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/none,name=systemd/user.slice/user-1000.slice/session-c2.scope/v1

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Right, so the bug here is that your session-c2.scope was created without giving you ownership of the directory and the tasks and cgroup.procs files. Manually changing those permissions fixes it for me.
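
A rough way to inspect the ownership in question, as a sketch only: the function name cgroup_ownership is hypothetical, and the directory to pass in depends on where the name=systemd hierarchy is mounted on the host (the cgmanager journal above shows it under /run/cgmanager/fs/none,name=systemd).

```python
import os


def cgroup_ownership(cgroup_dir, uid):
    """Report whether uid owns the cgroup directory and its two task files.

    Returns a dict mapping "." (the directory itself), "tasks" and
    "cgroup.procs" to True/False, or to None if the entry does not exist.
    """
    report = {}
    for name in (".", "tasks", "cgroup.procs"):
        path = os.path.join(cgroup_dir, name)
        try:
            report[name] = os.stat(path).st_uid == uid
        except FileNotFoundError:
            report[name] = None
    return report
```

Running this for the session scope under each controller would show whether name=systemd is the only hierarchy where the directory is not owned by the user.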

So this may actually be a regression in systemd itself.

affects: lxc (Ubuntu) → systemd (Ubuntu)
Changed in systemd (Ubuntu):
importance: Undecided → High
status: New → Confirmed
summary: - lxc_cgmanager_enter: 694 call to cgmanager_move_pid_sync failed: invalid
- requestUser container fails to start:
+ login name=systemd cgroup is not owned by user
description: updated
Revision history for this message
Stéphane Graber (stgraber) wrote : Re: login name=systemd cgroup is not owned by user

Might be worth checking that the same is done for all controllers.

Revision history for this message
Martin Pitt (pitti) wrote :

> Right, so the bug here is that your session-c2.scope was created without giving you ownership of the directory

Indeed this hasn't previously been done for the "systemd" controller; it didn't seem necessary with previous LXC versions, but apparently is now. Chowning the

> and the tasks and cgroup.procs files.

No, I am not going to chown those to the user. This would be a (small) privilege escalation bug, as the user could then move processes from a less privileged session (like from ssh) to a more privileged one (like a local desktop session). It also doesn't seem to be necessary for either upstart or systemd containers.

Changed in systemd (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
description: updated
Changed in systemd (Ubuntu):
status: Confirmed → In Progress
Martin Pitt (pitti)
summary: - login name=systemd cgroup is not owned by user
+ user lxc containers fail to start: login name=systemd cgroup is not
+ owned by user
summary: - user lxc containers fail to start: login name=systemd cgroup is not
- owned by user
+ user lxc containers fail to start under systemd: login name=systemd
+ cgroup is not owned by user
Revision history for this message
Stéphane Graber (stgraber) wrote :

How are we supposed to run a systemd container on such a system then?

systemd in a container will need to create sub-entries in the name=systemd controller. If the user doesn't own its cgroup, LXC will not be able to create the entry for the container and the container will not be able to write to it, leading to systemd crashing.
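
To illustrate why the ownership matters for creating sub-entries: creating a child cgroup is a mkdir in the parent directory, which under the classic Unix permission model needs write and search permission on that directory. The Python sketch below is a simplified model only (it ignores capabilities, ACLs and LSMs). The name may_create_child is hypothetical, and the root-owned mode 0755 directory in the usage note is an assumption, not something taken from this report.

```python
import stat


def may_create_child(dir_stat, uid, gids):
    """Simplified Unix check: may uid create an entry in this directory?

    dir_stat is any object with st_mode, st_uid and st_gid, such as the
    result of os.stat(). gids is the set of group IDs of the caller.
    """
    if uid == 0:
        return True  # root bypasses the mode bits in this simple model
    if dir_stat.st_uid == uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif dir_stat.st_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    # Both write and search (execute) permission are required.
    return (dir_stat.st_mode & need) == need
```

For a directory owned by root (uid 0, gid 0) with mode 0755, may_create_child(st, 1000, {1000}) is False; once st_uid is 1000 it becomes True.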

Revision history for this message
Stéphane Graber (stgraber) wrote :

Oh, and the same goes for containers that aren't running systemd but are running logind: logind also expects to be able to create sub-entries in the name=systemd controller, which it won't be able to do with the current cgroup ownership.

Revision history for this message
Martin Pitt (pitti) wrote : Re: [Bug 1413927] Re: user lxc containers fail to start under systemd: login name=systemd cgroup is not owned by user

Stéphane Graber [2015-01-25 17:15 -0000]:
> How are we supposed to run a systemd container on such a system then?
>
> systemd in a container will need to create sub-entries in the
> name=systemd controller.

Yes, that works fine, as the cgroup *directories* are owned by the
user. I just don't want to make the cgroup.procs and task files owned
by the user, as that would allow the user to modify that "session
root" cgroup and move PIDs between host sessions. What user containers
do in sub-groups of the host's "session-XX.cgroup" is up to them, and
of course the user on the host can meddle with them from the outside.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Hmm, so here the user didn't own the directories, which I suspect was a good part of the issue.

Martin Pitt (pitti)
Changed in systemd (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Quoting Martin Pitt (<email address hidden>):
> Stéphane Graber [2015-01-25 17:15 -0000]:
> > How are we supposed to run a systemd container on such a system then?
> >
> > systemd in a container will need to create sub-entries in the
> > name=systemd controller.
>
> Yes, that works fine, as the cgroup *directories* are owned by the
> user. I just don't want to make the cgroup.procs and task files owned
> by the user, as that would allow the user to modify that "session
> root" cgroup and move PIDs between host sessions. What user containers
> do in sub-groups of the host's "session-XX.cgroup" is up to them, and
> of course the user on the host can meddle with them from the outside.

If that's all you're objecting to, we can make do with that. The
important things are that (a) the directory be owned by the user
and (b) all files other than tasks and cgroup.procs files NOT be
owned by the user :) Having the tasks file owned by the user is
only a nicety.
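
Points (a) and (b) can be written down as a small check. This is a Python sketch under the assumption that the cgroup directory is reachable as a normal filesystem path; policy_violations and OPTIONAL are names invented here.

```python
import os

# Ownership of these two files is left open: user-owned or not, both pass.
OPTIONAL = {"tasks", "cgroup.procs"}


def policy_violations(cgroup_dir, uid):
    """List entries whose ownership contradicts rules (a) and (b)."""
    problems = []
    # (a) the directory itself must be owned by the user
    if os.stat(cgroup_dir).st_uid != uid:
        problems.append("directory is not owned by the user")
    # (b) every other regular file must NOT be owned by the user
    for entry in sorted(os.listdir(cgroup_dir)):
        path = os.path.join(cgroup_dir, entry)
        if entry in OPTIONAL or not os.path.isfile(path):
            continue
        if os.stat(path).st_uid == uid:
            problems.append(entry + " is owned by the user")
    return problems
```

An empty list means the session scope matches both rules; sub-directories (child cgroups) are skipped on purpose, since what happens below the session scope is up to the user.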

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 218-6ubuntu1

---------------
systemd (218-6ubuntu1) vivid; urgency=medium

  * Merge with Debian unstable. Remaining Ubuntu changes:
    - Hack to support system-image read-only /etc, and modify files in
      /etc/writable/ instead.
    - Keep our much simpler udev maintainer scripts (all platforms must
      support udev, no debconf).
    - initramfs init-top: Drop $ROOTDELAY, we do that in a more sensible way
      with wait-for-root. Will get applicable to Debian once Debian gets
      wait-for-root in initramfs-tools.
    - initramfs init-bottom: If LVM is installed, settle udev,
      otherwise we get missing LV symlinks. Workaround for LP #1185394.
    - Add debian/udev.lvm2.init: Dummy SysV init script to satisfy insserv
      dependencies to "lvm2" which is handled with udev rules in Ubuntu.
    - Provide shutdown fallback for upstart. (LP: #1370329)
    - debian/extra/ifup@.service: Additionally run for "auto" class. We don't
      really support "allow-hotplug" in Ubuntu at the moment, so we need to
      deal with "auto" devices appearing after "/etc/init.d/networking start"
      already ran. (LP: #1374521)
    - Add Get-RTC-is-in-local-time-setting-from-etc-default-rc.patch: In
      Ubuntu we currently keep the setting whether the RTC is in local or UTC
      time in /etc/default/rcS "UTC=yes|no", instead of /etc/adjtime.
      (LP: #1377258)
    - Put session scopes into all cgroup controllers. This makes unprivileged
      user LXC containers work under systemd. (LP: #1346734)
    - Lower Breaks: to plymouth version which has the udev inotify fix in
      Ubuntu.
    - Lower libappamor1 dep to the Ubuntu version where it moved to /lib.
    - Make failure of boot-and-services NSpawn.test_boot non-fatal for now.
      This currently fails when being triggered by Jenkins, but is totally
      unreproducible when running this manually on the exact same machine.

    Upgrade fixes, keep until 16.04 LTS release:
    - systemd Conflicts/Replaces/Provides systemd-services.
    - Remove obsolete systemd-logind upstart job.
    - Clean up obsolete /etc/udev/rules.d/README.

  * Make the "systemd" controller session scope cgroup directory owned by the
    user as well. This fixes user containers with latest LXC, and with systemd
    in the container. (LP: #1413927)
  * ifup@.service: Drop dependency on networking.service (i. e.
    /etc/init.d/networking), and merely ensure that /run/network exists. This
    avoids unnecessary dependencies/waiting during boot and dependency cycles
    if hooks wait for other interfaces to come up (like ifenslave with bonding
    interfaces). (LP: #1414544)

systemd (218-6) experimental; urgency=medium

  [ Martin Pitt ]
  * initramfs hook: Install 61-persistant-storage-android.rules if it exists.
  * Generate POT file during package build, for translators.
  * Pull latest keymaps from upstream git.
  * Order ifup@.service and networking.service after network-pre.target.
    (Closes: #766938)
  * Tone down "Network interface NamePolicy= disabled on kernel commandline,
    ignoring" info message to debug, as we expect this while we disable
    net.ifnames by de...


Changed in systemd (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Daniel (hackie) wrote :

Is this bug really fixed? I have very similar errors with the version from the vivid live cd (systemd 219-7ubuntu3) when trying to start user-space LXC containers.

Revision history for this message
Martin Pitt (pitti) wrote :

Daniel [2015-05-20 16:19 -0000]:
> Is this bug really fixed? I have very similar errors with the version
> from the vivid live cd (systemd 219-7ubuntu3) when trying to start user-
> space LXC containers

I haven't tried creating/running containers on a live system, as that
environment is quite a bit different. It works fine for me and other
people on an installed system. Please file a new bug with
details/program outputs/logs if you still see it somewhere. Thanks!

Martin

Revision history for this message
god (humper) wrote :

I've made separate Bug #1467611 as requested.
