dhclient does not remain running on boot

Bug #1466790 reported by Stefan Bader
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
systemd (Ubuntu)
Fix Released
High
Martin Pitt

Bug Description

Not sure whether this is a systemd or ifupdown issue. I just happened to notice today for a Wily VM (set up from server iso) that I failed to get a DNS resolution for a VM I started earlier the day. DNS updates in my case are related to DHCP updates. And on the DHCP/DNS server I noticed that the IP address had been removed at some point. Looking at the VM, I find no dhclient being started. When I manually run ifdown and ifup on the interface this does happen. But not whenever I boot.
Curious about this I also checked a Vivid VM (also server install) and this looks the same. However for Vivid this seems fixed in the latest versions.

Network config in /etc/network/interfaces:
auto eth0
iface eth0 inet dhcp

Expected result:
dhclient running for eth0

Actual result after boot:
No dhclient process is running.

Revision history for this message
Martin Pitt (pitti) wrote :

Can you please attach the "sudo journalctl" output after booting?

Changed in ubuntu:
status: New → Incomplete
Revision history for this message
Stefan Bader (smb) wrote :
Revision history for this message
Stefan Bader (smb) wrote :
Changed in ubuntu:
status: Incomplete → New
Revision history for this message
Martin Pitt (pitti) wrote :

The journal clearly shows that ifup did get started:

Jun 19 13:49:08 lam-wily6401 systemd[1]: Starting ifup for eth0...
[...]
Jun 19 13:49:08 lam-wily6401 ifup[456]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3 (xid=0x15a1af31)
Jun 19 13:49:08 lam-wily6401 dhclient[488]: DHCPREQUEST of 192.168.2.189 on eth0 to 255.255.255.255 port 67 (xid=0x31afa115)
Jun 19 13:49:08 lam-wily6401 dhclient[488]: DHCPOFFER of 192.168.2.189 from 192.168.2.1
Jun 19 13:49:08 lam-wily6401 ifup[456]: DHCPREQUEST of 192.168.2.189 on eth0 to 255.255.255.255 port 67 (xid=0x31afa115)
Jun 19 13:49:08 lam-wily6401 ifup[456]: DHCPOFFER of 192.168.2.189 from 192.168.2.1
[...]
Jun 19 13:49:09 lam-wily6401 dhclient[488]: DHCPACK of 192.168.2.189 from 192.168.2.1
Jun 19 13:49:09 lam-wily6401 ifup[456]: DHCPACK of 192.168.2.189 from 192.168.2.1
Jun 19 13:49:09 lam-wily6401 dhclient[488]: bound to 192.168.2.189 -- renewal in 297 seconds.
Jun 19 13:49:09 lam-wily6401 ifup[456]: bound to 192.168.2.189 -- renewal in 297 seconds.
Jun 19 13:49:09 lam-wily6401 ifquery[748]: eth0=eth0
Jun 19 13:49:09 lam-wily6401 systemd[1]: Started ifup for eth0.

So I'm a bit confused what's wrong now. Can you please copy&paste the output of "sudo systemctl status -s ifup@eth0"?

Revision history for this message
Stefan Bader (smb) wrote :

Hm -s did not work. So I used -l instead. Clearly it gets started. It just feels like it gets ended when ifup exits. As if not damonized enough...

<email address hidden> - ifup for eth0
   Loaded: loaded (/lib/systemd/system/ifup@.service; static; vendor preset: enabled)
   Active: active (exited) since Fr 2015-06-19 13:49:09 CEST; 1h 29min ago
  Process: 748 ExecStartPost=/sbin/ifquery --state %I (code=exited, status=0/SUCCESS)
  Process: 456 ExecStartPost=/sbin/ifup --allow=auto %I (code=exited, status=0/SUCCESS)
  Process: 455 ExecStart=/sbin/ifup --allow=hotplug %I (code=exited, status=0/SUCCESS)
 Main PID: 455 (code=exited, status=0/SUCCESS)
   CGroup: /<email address hidden>

Jun 19 13:49:08 lam-wily6401 dhclient[488]: DHCPREQUEST of 192.168.2.189 on eth0 to 255.255.255.255 port 67 (xid=0x31afa115)
Jun 19 13:49:08 lam-wily6401 dhclient[488]: DHCPOFFER of 192.168.2.189 from 192.168.2.1
Jun 19 13:49:08 lam-wily6401 ifup[456]: DHCPREQUEST of 192.168.2.189 on eth0 to 255.255.255.255 port 67 (xid=0x31afa115)
Jun 19 13:49:08 lam-wily6401 ifup[456]: DHCPOFFER of 192.168.2.189 from 192.168.2.1
Jun 19 13:49:09 lam-wily6401 dhclient[488]: DHCPACK of 192.168.2.189 from 192.168.2.1
Jun 19 13:49:09 lam-wily6401 ifup[456]: DHCPACK of 192.168.2.189 from 192.168.2.1
Jun 19 13:49:09 lam-wily6401 dhclient[488]: bound to 192.168.2.189 -- renewal in 297 seconds.
Jun 19 13:49:09 lam-wily6401 ifup[456]: bound to 192.168.2.189 -- renewal in 297 seconds.
Jun 19 13:49:09 lam-wily6401 ifquery[748]: eth0=eth0
Jun 19 13:49:09 lam-wily6401 systemd[1]: Started ifup for eth0.

Revision history for this message
Stefan Bader (smb) wrote :

Seems on Vivid this was fixed in some way (at least systemd was very outdated in the VM I had quickly booted up). Before the update systemd was at version 219-4*, now updated to 219-7ubuntu3.
The wily VM was up-to-date and runs systemd 220-7ubuntu1. Could it be that something about keeping daemons started during ifup running be accidentally dropped?

summary: - dhclient not started on boot
+ dhclient does not remain running on boot
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
Revision history for this message
Stefan Bader (smb) wrote :

One thing that I noticed on the Wily VM but did not happen on a server of someone else I asked: somehow ntpdate seems to get disrupted in some way that leaves the lockfile in /var/lock around. Which then causes the ominous "exceeded maximum number of lock attempts".

Revision history for this message
Stefan Bader (smb) wrote :

Addition to above statement. The ntpdate lock error happens when I manually down and up eth to get dhclient running.

Revision history for this message
Martin Pitt (pitti) wrote :

> Could it be that something about keeping daemons started during ifup running be accidentally dropped?

Not that I know of, and in my wily VMs dhclient runs happily.

I don't know whether ifupdown has some useful debugging facilities. I tried adding /usr/bin/strace -fvvs1024 -o /run/ifup.trace to /etc/init.d/networking, but it doesn't give much information (in particular, it doesn't end up calling dhclient). Do you get anything useful if you set VERBOSE=yes in /etc/default/networking and reboot? (Please attach journal again)

affects: ubuntu → ifupdown (Ubuntu)
Changed in ifupdown (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Stefan Bader (smb) wrote :

Seems on bare-metal servers I have, I do not see this. The VM as well as well as bare-metal server of the team mate I asked are booting rather quickly. That as side note. I upgraded the VM today and now run on systemd/udev 221-1ubuntu2.

Still the same. One observation that looks odd: There is a pid file in /run: /run/dhclient.eth0.pid which contains PID 684. However in the journal as well as in "systemctl status ifup@eth0" there is a different PID for dhclient. The pid file is re-created each boot (/run is tmpfs and I also deliberately deleted it before reboot). From the number it seems the file is from a later start of dhclient. But that seems to have left no tracks in the log at all.

● <email address hidden> - ifup for eth0
   Loaded: loaded (/lib/systemd/system/ifup@.service; static; vendor preset: enabled)
   Active: active (exited) since Di 2015-06-30 10:14:31 CEST; 10min ago
  Process: 729 ExecStartPost=/sbin/ifquery --state %I (code=exited, status=0/SUCCESS)
  Process: 450 ExecStartPost=/sbin/ifup --allow=auto %I (code=exited, status=0/SUCCESS)
  Process: 448 ExecStart=/sbin/ifup --allow=hotplug %I (code=exited, status=0/SUCCESS)
 Main PID: 448 (code=exited, status=0/SUCCESS)
   CGroup: /<email address hidden>

Jun 30 10:14:31 lam-wily6401 dhclient[477]: DHCPREQUEST of 192.168.2.189 on ...)
Jun 30 10:14:31 lam-wily6401 dhclient[477]: DHCPOFFER of 192.168.2.189 from ...1
Jun 30 10:14:31 lam-wily6401 ifup[450]: DHCPREQUEST of 192.168.2.189 on eth0...)
Jun 30 10:14:31 lam-wily6401 ifup[450]: DHCPOFFER of 192.168.2.189 from 192....1
Jun 30 10:14:31 lam-wily6401 dhclient[477]: DHCPACK of 192.168.2.189 from 19...1
Jun 30 10:14:31 lam-wily6401 ifup[450]: DHCPACK of 192.168.2.189 from 192.16...1
Jun 30 10:14:31 lam-wily6401 dhclient[477]: bound to 192.168.2.189 -- renewa....
Jun 30 10:14:31 lam-wily6401 ifup[450]: bound to 192.168.2.189 -- renewal in....
Jun 30 10:14:31 lam-wily6401 ifquery[729]: eth0=eth0
Jun 30 10:14:31 lam-wily6401 systemd[1]: Started ifup for eth0.

Stefan Bader (smb)
Changed in ifupdown (Ubuntu):
status: Incomplete → New
Revision history for this message
Stefan Bader (smb) wrote :

Changed the disk from virtio to IDE without change. Added debug to the kernel options and got a bit more info in journal but it looks odd. There is a dhclient with PID 496 early on. Which does not seem to exit. But then later on there is a 699 getting killed which matches the pid file but never seems to be started.

Could be a gap from pivoting root but I cannot really say how to tell for sure.

Revision history for this message
Stefan Bader (smb) wrote :

Comparing to output of a vivid VM, the sysctl status differs in one important detail. The initial dhclient again is gone but under the system-ifup.slice there is the new pid...

● <email address hidden> - ifup for eth0
   Loaded: loaded (/lib/systemd/system/ifup@.service; static; vendor preset: enabled)
   Active: active (exited) since Tue 2015-06-30 15:08:44 CEST; 20s ago
  Process: 403 ExecStart=/bin/sh -ec ifup --allow=hotplug %I; ifup --allow=auto %I; if ifquery %I >/dev/null; then ifquery --state %I >/dev/null; fi (code=exited, status=0/SUCCESS)
 Main PID: 403 (code=exited, status=0/SUCCESS)
   CGroup: /<email address hidden>
           └─736 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/...

Jun 30 15:08:45 lam-vivid6401 sh[403]: Sending on Socket/fallback
Jun 30 15:08:45 lam-vivid6401 sh[403]: DHCPDISCOVER on eth0 to 255.255.255....7)
Jun 30 15:08:46 lam-vivid6401 dhclient[430]: DHCPREQUEST of 192.168.2.159 on...)
Jun 30 15:08:46 lam-vivid6401 dhclient[430]: DHCPOFFER of 192.168.2.159 from...1
Jun 30 15:08:46 lam-vivid6401 sh[403]: DHCPREQUEST of 192.168.2.159 on eth0...7)
Jun 30 15:08:46 lam-vivid6401 sh[403]: DHCPOFFER of 192.168.2.159 from 192.....1
Jun 30 15:08:47 lam-vivid6401 dhclient[430]: DHCPACK of 192.168.2.159 from 1...1
Jun 30 15:08:47 lam-vivid6401 sh[403]: DHCPACK of 192.168.2.159 from 192.168.2.1
Jun 30 15:08:47 lam-vivid6401 dhclient[430]: bound to 192.168.2.159 -- renew....
Jun 30 15:08:47 lam-vivid6401 sh[403]: bound to 192.168.2.159 -- renewal in...s.
Hint: Some lines were ellipsized, use -l to show in full.
ubuntu@lam-vivid6401:~$ cat /run/dhclient.eth0.pid
736

Revision history for this message
Stefan Bader (smb) wrote :

I think I may have a clue now what the problem is. Incidentally on my bare-metal servers the network interface seems to be brought up by something else than the ifup@.service. The status there reports it already up when the service runs. The same seems to be happening to MAAS instances after initial deployment. On first login dhclient is not running but after reboot it is, though then again the ifup@.service was not starting the interface.

The problem looks to be that systemd acts differently on ExecStart and ExecStart[Pre|Post] elements. In Vivid we still had only a ExecStart element which was a shell command running two ifup and some conditional ifquery command. In Wily this changed into a ExecStart which is ifup for hotplug and two ExecStartPost which are a second ifup (but this time for auto class) and an ifquery. The interface is brought up by the second ifup (as it is set to auto), but it seems systemd does not check/handle any processes started up in the background of a ExecStartPost.
So one way that fixed the dhclient issue was to exchange the hotplug and auto ifup calls, so auto was done in the ExecStart call. But that would only lead to problems if there was ever an interface in the hotplug class.

What seems to work for me (not sure this is all valid) is to make the service oneshot and have multiple ExecStart elements, like this:

[Service]
Type=oneshot
ExecStart=/sbin/ifup --allow=hotplug %I
ExecStart=/sbin/ifup --allow=auto %I
ExecStart=/sbin/ifquery --state %I
ExecStop=/sbin/ifdown %I
RemainAfterExit=true

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Martin Pitt (pitti)
affects: ifupdown (Ubuntu) → systemd (Ubuntu)
Changed in systemd (Ubuntu):
status: New → In Progress
Changed in systemd (Ubuntu):
status: New → Confirmed
Martin Pitt (pitti)
Changed in systemd (Ubuntu):
importance: Undecided → High
assignee: nobody → Martin Pitt (pitti)
Martin Pitt (pitti)
Changed in systemd (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (4.5 KiB)

This bug was fixed in the package systemd - 222-2ubuntu1

---------------
systemd (222-2ubuntu1) wily; urgency=medium

  * Merge with Debian unstable. Remaining Ubuntu changes:
    - Hack to support system-image read-only /etc, and modify files in
      /etc/writable/ instead.
    - Keep our much simpler udev maintainer scripts (all platforms must
      support udev, no debconf).
    - initramfs init-bottom: If LVM is installed, settle udev,
      otherwise we get missing LV symlinks. Workaround for LP #1185394.
    - Add debian/udev.lvm2.init: Dummy SysV init script to satisfy insserv
      dependencies to "lvm2" which is handled with udev rules in Ubuntu.
    - Add debian/udev.lvm2.service to avoid running the dummy lvm2 init
      script.
    - Provide shutdown fallback for upstart. (LP: #1370329)
    - debian/extra/ifup@.service: Additionally run for "auto" class. We don't
      really support "allow-hotplug" in Ubuntu at the moment, so we need to
      deal with "auto" devices appearing after "/etc/init.d/networking start"
      already ran. (LP: #1374521) Also run ifup in the background during boot,
      to avoid blocking network.target. (LP: #1425376)
    - ifup@.service: Drop dependency on networking.service (i. e.
      /etc/init.d/networking), and merely ensure that /run/network exists.
      This avoids unnecessary dependencies/waiting during boot and dependency
      cycles if hooks wait for other interfaces to come up (like ifenslave
      with bonding interfaces). (LP: #1414544)
    - Add Get-RTC-is-in-local-time-setting-from-etc-default-rc.patch: In
      Ubuntu we currently keep the setting whether the RTC is in local or UTC
      time in /etc/default/rcS "UTC=yes|no", instead of /etc/adjtime.
      (LP: #1377258)
    - Put session scopes into all cgroup controllers. This makes unprivileged
      user LXC containers work under systemd. (LP: #1346734)
    - systemctl: Don't forward telinit u to upstart. This works around
      upstart's Restart() always reexec'ing /sbin/init on Restart(), even if
      that changes to point to systemd during the upgrade. This avoids running
      systemd during a dist-upgrade. (LP: #1430479)
    - Drop hwdb-update dependency from udev-trigger.service, which got
      introduced in v219-stable. This causes udev and plymouth to start too
      late and isn't really needed in Ubuntu yet as we don't support stateless
      systems yet and handle hwdb.bin updates through dpkg triggers. This can
      be dropped again with initramfs-tools 0.117.
    - Lower Breaks: to plymouth version which has the udev inotify fix in
      Ubuntu.
    - Lower apparmor Breaks: to the Ubuntu version that dropped $remote_fs.
    - Change systemd-sysv's conflicts to upstart-sysv. (LP: #1422681)

    Upgrade fixes, keep until 16.04 LTS release:
    - systemd Conflicts/Replaces/Provides systemd-services.
    - Remove obsolete systemd-logind upstart job.
    - Clean up obsolete /etc/udev/rules.d/README.
    - systemd.postinst: Migrate mountall specific fstab options to standard
      util-linux "nofail" option.

  * ifup@.service: Run ifup for "auto" class in ExecStart instead of
    ExecStartPost, as it spawns a long-run...

Read more...

Changed in systemd (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.