STC850:Brazos:Br16:Br16p05: Network ethernet port name changed under Ubuntu 16.04 with added adapters (ibmveth)

Bug #1561096 reported by bugproxy
This bug affects 1 person
Affects            Status         Importance   Assigned to    Milestone
systemd (Ubuntu)   Fix Released   High         Unassigned
  Xenial           Fix Released   High         Martin Pitt

Bug Description

Problem Description: I had done a fresh install of Ubuntu 16.04 from an ISO image and set up the partition as directed in the wiki. I had set up networking as directed in the wiki, was able to ping "iofnim", and was able to perform updates and upgrades to the Ubuntu installation without issue.

I had checked to make sure (by adding them to an existing AIX partition) that the Mason and Travis_EN adapters to be added were at the latest microcode levels. I assigned them to the partition to be used (br16p05) and then shut down the partition and restarted it to allow the partition to activate the adapters.

After the partition restarted, I attempted to build the network using build_net and during its run noticed that the output for the adapters was returning "Network is unreachable". After it completed, I attempted to mount iofnim using the command:

"mount iofnim.aus.stglabs.ibm.com:/nim/build_net /root/test -o nolock"

which returned a "Failed" error message.

I then ran "ifconfig -a" to check on the state of the network, which had been working until I rebooted the partition after adding the adapters.

The output from "ifconfig -a" contained the unconventional names for both the Mason and the Travis_EN adapters. However, "eth0", which I had originally used to set up network access for the partition during setup, was no longer listed. Instead, "eth1" was now listed, and none of the networking data reported by "ifconfig -a", including IPs, was set.

I consulted with Thiru and he asked I write it up and include a tar file created from /var/log which I have attached to the defect.

As an additional note, I was able to go back into /etc/network/interfaces and change the settings for "eth0" to "eth1"; after bringing the port down and back up, I was again able to ping out and access the network.

Please advise.

== Comment: #7 - Kevin W. Rudd - 2016-02-11 12:32:49 ==
Thank you for the additional info. This is not quite the same as the bug I referenced earlier. It is actually a match for bug 122308.

Canonical:

This is the same basic issue as originally worked in LP Bug 1437375.

The ibmveth-based devices are not associated with PCI bus locations, and still rely on the legacy eth? naming. The problem here is that the 75-persistent-net-generator.rules file that used to set up the 70-persistent-net.rules file for these devices was removed in 16.04.

This creates a name-slip problem for these ibmveth devices depending on the timing of other devices (where another NIC will temporarily be assigned a name like eth0 before it is renamed by udev).

Please add back persistent-net-generator support for non-PCI-based devices like this.

bugproxy (bugproxy) wrote : log.tgz

Default Comment by Bridge

tags: added: architecture-ppc64 bugnameltc-136930 severity-high targetmilestone-inin1604
bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Kevin W. Rudd (kevinr)
affects: ubuntu → systemd (Ubuntu)
Steve Langasek (vorlon)
Changed in systemd (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Martin Pitt (pitti)
summary: STC850:Brazos:Br16:Br16p05: Network ethernet port name changed under
- Ubuntu 16.04 with added adapters
+ Ubuntu 16.04 with added adapters (ibmveth)
Steve Langasek (vorlon)
Changed in systemd (Ubuntu):
importance: Undecided → High
status: New → Triaged
milestone: none → ubuntu-16.04
Martin Pitt (pitti) wrote :

> This creates a name-slip problem for these ibmveth devices depending on the timing of other devices (where another NIC will temporarily be assigned a name like eth0 before it is renamed by udev).

Can you please explain this further? The kernel should always pick a new eth* name and avoid name clashes with existing eth* devices, and udev (since Ubuntu 15.04) will never assign a name like "eth*" which potentially collides with the kernel default names. At least, that's the current assumption; if it is invalid and you actually see a device being *re*named to "eth*", that is what we need to fix.

From your logs I see that the kernel apparently detected an eth0 and eth2 and udev renamed them to slot-based names:

Feb 10 11:41:39 br16p05 kernel: [ 1.129225] qlge 0001:a0:00.1 enP1p160s0f1: renamed from eth2
Feb 10 11:41:39 br16p05 kernel: [ 1.151033] qlge 0001:a0:00.0 enP1p160s0f0: renamed from eth0
Feb 10 11:41:39 br16p05 kernel: [ 3.844707] mlx4_core 0000:01:00.0 enp1s0: renamed from eth0
Feb 10 11:41:39 br16p05 kernel: [ 3.866910] mlx4_core 0000:01:00.0 enp1s0d1: renamed from eth2

I'm a bit confused why there are two different device drivers claiming the same device, though.

After booting, what does "ip a" show, i. e. which ethernet devices do you actually have? It would also be helpful to attach the output of "udevadm info --export-db" to show me which information udev collected about the ethernet devices.

Did you manually put "eth0" into /etc/network/interfaces, or was that done by the installer? The latter would mean that the installer environment names devices differently than the installed OS (that'd be a major bug indeed).

> Please add back persistent-net-generator support for non-PCI-based devices like this.

This isn't going to happen, I'm afraid. That generator was conceptually broken for virtual hardware (and thus had a large blacklist) and had an unfixable race condition as it renamed devices to names that are also being used by the kernel. Let's rather fix this properly with ifnames.

Changed in systemd (Ubuntu Xenial):
status: Triaged → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-04-05 10:26 EDT-------
(In reply to comment #16)
> > This creates a name-slip problem for these ibmveth devices depending on the timing of other devices (where another NIC will temporarily be assigned a name like eth0 before it is renamed by udev).
>
> Can you please explain this further? The kernel should always pick a new
> eth* name and avoid name clashes with existing eth* devices, and udev (since
> Ubuntu 15.04) will never assign a name like "eth*" which potentially
> collides with the kernel default names. At least that's the current
> assumption, if that is invalid and you actually see a device being *re*named
> to "eth*", this is what we need to fix.

At the time of the install, only one network device existed (the ibmveth device). It was assigned the name eth0, and this is what the installer recorded in the interfaces file. After the install, additional physical NICs were added to the system. Upon boot, the driver for one of the new NICs loaded first and temporarily grabbed the eth0 name. When the ibmveth driver loaded, it grabbed eth1 (since eth0 was taken at the time). That eth0 was subsequently renamed (as you noticed in the logs), but the ibmveth device name had already slipped, and there was no longer an eth0 device to go with the configuration in the interfaces file.
> I'm a bit confused why there are two different device drivers claiming the
> same device, though.

As noted, the network devices were added after the install.

> After booting, what does "ip a" show, i. e. which ethernet devices do you
> actually have? It would also be helpful to attach the output of "udevadm
> info --export-db" to show me which information udev collected about the
> ethernet devices.

This information should be contained in the attached sosreport.
> Did you manually put "eth0" into /etc/network/interfaces, or was that done
> by the installer? The latter would mean that the installer environment names
> devices differently than the installed OS (that'd be a major bug indeed).

Done by the installer due to the installation being done while there was only the ibmveth device.
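The name slip described in the reply above can be reproduced with a small Python model. This is a simplified sketch, not kernel or udev code: it assumes the kernel hands each newly probed device the lowest free ethN name, and that udev renames the PCI NIC to a slot-based name only after the ibmveth device has already been probed. The labels "ibmveth" and "pci_nic" are illustrative; "enp1s0" is one of the names from the logs.

```python
# Toy model of the ethN "name slip". Assumptions (not taken from kernel or
# udev source): the kernel hands out the lowest free ethN at probe time, and
# udev renames PCI NICs to slot-based names afterwards.

def lowest_free_eth(in_use):
    """Return the lowest ethN name that is not currently in use."""
    n = 0
    while f"eth{n}" in in_use:
        n += 1
    return f"eth{n}"


def simulate(events):
    """Replay probe/rename events; return a mapping of device -> final name."""
    names = {}  # device -> current interface name
    for event in events:
        if event[0] == "probe":
            _, dev = event
            names[dev] = lowest_free_eth(set(names.values()))
        elif event[0] == "rename":
            _, dev, new_name = event
            names[dev] = new_name  # frees the old ethN for later probes
    return names


# At install time only the ibmveth device exists, so it becomes eth0 and the
# installer writes "eth0" into /etc/network/interfaces.
install = simulate([("probe", "ibmveth")])

# After adding a PCI NIC, its driver may probe first and grab eth0; the
# ibmveth device then gets eth1, and udev renames the PCI NIC only later.
after_adding_nic = simulate([
    ("probe", "pci_nic"),
    ("probe", "ibmveth"),
    ("rename", "pci_nic", "enp1s0"),
])

print(install["ibmveth"])                   # eth0
print(after_adding_nic["ibmveth"])          # eth1
print("eth0" in after_adding_nic.values())  # False
```

Because the rename happens only after the ibmveth device has been probed, nothing hands eth0 back to it, so the installer-written "eth0" stanza no longer matches any device.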

Martin Pitt (pitti) wrote :

Ah, I found it in the sosreport tarball:

P: /devices/vio/30000002/net/eth1
E: DEVPATH=/devices/vio/30000002/net/eth1
E: ID_NET_DRIVER=ibmveth
E: ID_NET_LINK_FILE=/lib/systemd/network/99-default.link
E: ID_NET_NAME_MAC=enx46dee7232602
E: IFINDEX=3
E: INTERFACE=eth1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth1
E: TAGS=:systemd:
E: USEC_INITIALIZED=829495

The device is not included in the sosreport's sys/, but I suppose other than the MAC address we indeed don't have any physical information (slot or PCI number) about these cards.

One solution would be to call these enx46dee7232602 (by MAC address), but while this is stable it is also rather long and ugly. Another would be to expose any helpful extra information in the kernel ibmveth driver to assign a stable name or number to those from outside.

If neither of those is practical, we need to reimplement a reduced/customized version of persistent-net-generator that names those "ibmvethN" with ascending N; this avoids the renaming race, but the device names are still not predictable if there is ever more than one device (i. e. before installation you don't know what the names will be, and you can't simply clone an installation and use it on another machine).
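For reference, the MAC-based name in the udev data above (ID_NET_NAME_MAC=enx46dee7232602) is "enx" followed by the MAC address in lowercase hex without separators; the same address appears as the HWaddr of the ibmveth device in later ifconfig output. The sketch below only reproduces that string format (the helper name is hypothetical); it does not model when udev decides to generate such a name.

```python
def mac_based_name(mac):
    """Build an "enx<hex>" interface name from a colon-separated MAC."""
    return "enx" + mac.replace(":", "").lower()


name = mac_based_name("46:de:e7:23:26:02")
print(name)       # enx46dee7232602
print(len(name))  # 15
```

At 15 characters, such a name is as long as Linux allows for an interface name (IFNAMSIZ is 16 bytes including the terminating NUL), which is part of why it is described as "long and ugly".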

Martin Pitt (pitti) wrote :

@IBM: Do you have a preference/opinion here, or should we decide this on our side? (This is also a bit related to https://lists.ubuntu.com/archives/ubuntu-devel/2016-April/039302.html).

I. e. use MAC-based names which are predictable (i. e. will be the same across reinstalls, and you know what the name is going to be when you add the device/before install) and don't need any state in /etc, but are very long. Or use a reduced form of the old persistent-net-generator whose names are *not* predictable, need to write state into /etc, but have shorter names.
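The trade-off between the two options can be illustrated with a hypothetical, heavily simplified model of a reduced persistent-net-generator. The assign_names helper and the dict standing in for a rules file under /etc are assumptions for illustration, not the old generator's actual code, and the second MAC address is made up.

```python
def assign_names(discovered_macs, state):
    """Give each MAC its stored name, or the next ibmvethN if it is new.

    `state` stands in for a generated rules file under /etc; it is updated
    in place and returned. Numbering uses the entry count, so this assumes
    entries are never deleted.
    """
    for mac in discovered_macs:
        if mac not in state:
            state[mac] = f"ibmveth{len(state)}"
    return state


a = "46:de:e7:23:26:02"  # MAC from the udev data in this report
b = "46:de:e7:23:26:03"  # hypothetical second ibmveth adapter

first_install = assign_names([a, b], {})
reboot = assign_names([b, a], dict(first_install))  # state kept
reinstall = assign_names([b, a], {})                 # state lost

print(first_install[a], reboot[a], reinstall[a])  # ibmveth0 ibmveth0 ibmveth1
```

While the stored state survives, names stay stable across reboots; after a reinstall (empty state) they depend on discovery order, which is the lack of predictability described above.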

Martin Pitt (pitti) wrote :

> Or use a reduced form of the old persistent-net-generator

Just for the record: This is *very* intrusive and quite some work, as writing dynamic rules needs to be done in the installer, from initramfs, copied to the final system, etc. The brittleness of this was also one of the reasons why we got rid of this some time ago. But still technically possible of course, it's just the much more expensive option.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-08 11:53 EDT-------
I hate to cause extra work, but a return to the old persistent-net-generator behavior for the ibmveth devices would be the ideal solution, due to its consistency with the behavior of older releases. It is what users are used to, and what we were planning to document as a workaround for this problem (lock down the install device name with a manually configured udev rule in /etc) if we couldn't get it officially resolved.

I would hope that a stripped-down version of the old persistent-net-generator behavior that applies only to the ibmveth devices would not be that brittle (since the target device is fairly consistent).

Pradeep,

What are your thoughts in this area?

Martin Pitt (pitti) wrote :

> the ideal solution due to the consistency of behavior with the older releases

Just a note: it will behave completely differently to any other network interface, though, so it's still not consistent at all. ibmveth is a bit of a special child here as it neither has useful hardware information nor has pre-defined names like "normal" linux veths.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-08 14:24 EDT-------
(In reply to comment #23)
> > the ideal solution due to the consistency of behavior with the older releases
>
> Just a note: it will behave completely differently to any other network
> interface, though, so it's still not consistent at all.

Can you expand on what you mean by it will not be consistent at all? Yes, it will not have names like enP1p160s0f, since it is not a PCI device, but can't it have a name like ethX? This would be in line with expectations from earlier releases.

The problem is not really that we are assigning these devices new names, but the device names change with hot plug or DLPAR operations.

>ibmveth is a bit of a special child here as it neither has useful hardware
>information nor has pre-defined names like "normal" linux veths.

Indeed, the ibmveth device is a special child. Can we use the "DRIVER" match key in a udev rule to name these ibmveth devices as ethX? This would not interfere with the PCI device naming scheme; they would consistently be called ethX and meet prior expectations. Hot plug or DLPAR operations would not change these device names, and this is probably only applicable to the POWER platform.

Martin Pitt (pitti) wrote :

> can't it have a name like ethX?

No, we can't *re*name devices to names which are also (potentially) being used by the kernel. That was a major race condition, and one of the main reasons to get rid of the old persistent-net-generator.

It *can* get a name like "ibmvethX", though, as that does not collide.

> Can we use the "DRIVER" match key in a udev rule to name these ibmveth devices as ethX

Yes, if we go back to the persistent-net approach, we'll match on DRIVERS=="ibmveth".

Martin Pitt (pitti) wrote :

> Can you expand on what you mean by it will not be consistent at all?

Sorry, forgot that bit: I meant that every other network device now has a "location"-based name (BIOS-assigned slot, PCI enumeration number, etc.), which is stable/predictable across (re)installations and doesn't need persistent state in /etc.

Martin Pitt (pitti) wrote :

Timing wise, we'll enter final freeze this Thursday, so I'd like to upload either solution tomorrow. Any final thoughts about which approach to take? Thanks!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-11 16:27 EDT-------
It really seems like the right fix here might be to enhance the code generating the location name to better handle ibmveth. If we look at the information below:

P: /devices/vio/30000002/net/eth1
E: DEVPATH=/devices/vio/30000002/net/eth1
E: ID_NET_DRIVER=ibmveth
E: ID_NET_LINK_FILE=/lib/systemd/network/99-default.link
E: ID_NET_NAME_MAC=enx46dee7232602
E: IFINDEX=3
E: INTERFACE=eth1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth1
E: TAGS=:systemd:
E: USEC_INITIALIZED=829495

The 30000002 is likely what we'd want to bind to. The first digit should always be 3. It's the rightmost digits we are interested in, as they are associated with the slot ID selected in the HMC; in this case, 2. So I think what we want to do is figure out how best to reflect this physical location so that the existing code just works...

Martin Pitt (pitti) wrote :

> The 30000002 is likely what we'd want to bind to.

Thanks for pointing this out! This sounds like a very good solution, much better than either of the above. If this is stable across reboots and indeed a property of the (virtual) "hardware", then let's use this.

Thus a udev rule like this ought to work, e. g. in /lib/udev/rules.d/76-ibmveth-naming.rules:

SUBSYSTEM=="net", NAME=="", DRIVERS=="ibmveth", PROGRAM="/bin/sh -ec 'D=$${DEVPATH#*/vio/}; D=$${D#3}; echo $${D%%%%/*} | sed s/^0*//'", NAME="ibmveth$result"

This will sort after any existing 70-persistent-net.rules, but before 80-net-setup-link.rules.

So for your example the interface would be called "ibmveth2", and for a device which breaks the "starts with 3" assumption and has e. g. /devices/vio/5000003, it would then be named "ibmveth5000003". If this (not starting with '3') "Should Not Happen™", that's fine, but I'd rather be cautious.
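To make the shell pipeline in the PROGRAM key easier to follow, here is a step-by-step Python equivalent. It mirrors the rule as proposed in this comment, not necessarily the rule eventually shipped in 73-special-net-names.rules (the 229-4ubuntu4 changelog mentions further refinements), and the helper name ibmveth_name is hypothetical.

```python
def ibmveth_name(devpath):
    """Return the interface name the proposed udev rule would assign."""
    # D=${DEVPATH#*/vio/}: drop everything up to the first "/vio/".
    _, sep, rest = devpath.partition("/vio/")
    d = rest if sep else devpath
    # D=${D#3}: drop a single leading "3" if present.
    if d.startswith("3"):
        d = d[1:]
    # ${D%%/*}: keep only the first path component (the unit address).
    d = d.split("/", 1)[0]
    # sed s/^0*//: strip leading zeros.
    d = d.lstrip("0")
    # NAME="ibmveth$result"
    return "ibmveth" + d


for path in ("/devices/vio/30000002/net/eth1",
             "/devices/vio/5000003/net/eth0"):
    print(path, "->", ibmveth_name(path))
# /devices/vio/30000002/net/eth1 -> ibmveth2
# /devices/vio/5000003/net/eth0 -> ibmveth5000003
```

One edge case falls out of the rule as written: if every digit after the leading 3 were zero, sed would strip them all and the result would be a bare "ibmveth".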

bugproxy (bugproxy)
tags: removed: bugnameltc-136930 severity-high
Martin Pitt (pitti)
Changed in systemd (Ubuntu Xenial):
status: Incomplete → In Progress
Martin Pitt (pitti) wrote :

Fixed:
  http://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?id=4bd57cb
  http://anonscm.debian.org/cgit/pkg-systemd/systemd.git/commit/?id=e9b0a

I'll upload a new package tomorrow. I tested this with some fake DEVPATH and QEMU devices (DRIVER=="virtio_net"), but a full install/boot test on a real ppc64el machine with ibmveth would be highly appreciated once this lands, to make double-sure everything is correct.

Thanks! I'm happy about this solution: it yields simple, predictable and stable names and avoids all the overhead and brittleness of the old generator.

Changed in systemd (Ubuntu Xenial):
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-11 18:44 EDT-------
*** Bug 136973 has been marked as a duplicate of this bug. ***

tags: added: bugnameltc-136930 severity-high
Martin Pitt (pitti) wrote :

As it's quite horrible to take apart the devpath number (like 71000102) in shell, and it's not entirely clear how that number is built up (i. e. how many digits are the bus number, and how many are the slot ID), would it be possible to fix the ibmveth kernel driver to export those two numbers separately as sysfs attributes? This would provide a much safer and more efficient matching and naming in userspace.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 229-4ubuntu4

---------------
systemd (229-4ubuntu4) xenial; urgency=medium

  * 73-special-net-names.rules: Further refine ibmveth naming.

 -- Martin Pitt <email address hidden> Tue, 12 Apr 2016 12:06:30 +0200

Changed in systemd (Ubuntu Xenial):
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-04-18 10:09 EDT-------
FYI: This change was indirectly validated via IBM bug 140408 (we forgot to warn some of the other test teams). ;)

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-05-09 19:14 EDT-------
==== State: Working by: furrer on 09 May 2016 17:50:45 ====

#=#=# 2016-05-09 17:50:43 (CDT) #=#=#
Action = [rejectfix]

I have just done a fresh install of Ubuntu 16.04 on br16p05.aus.stglabs.ibm.com and followed the setup from the WIKI to use "eth0" for its network access. I completed the install and it was working as desired.

I then added two adapters to the partition's configuration using the HMC GUI under I/O, adding them "As Desired" to the partition: a Mason adapter at location FHZ0145-P2-C6 and a Travis_EN adapter at location CSS00HL-P1-C5. I then rebooted the partition to activate both adapters in preparation for an attempt at two TER completions.

I attempted to run "build_net help" to set up the network but when I ran the command, it appeared to hang. I exited the instance and checked the state of the network access by running the command "mount iofnim.aus.stglabs.ibm.com:/nim/build_net /root/test -o nolock" to make sure the test/tools directory was mounted. The command returned:

(0) root @ br16p05: /root
# mount iofnim.aus.stglabs.ibm.com:/nim/build_net /root/test -o nolock
mount.nfs: Failed to resolve server iofnim.aus.stglabs.ibm.com: Temporary failure in name resolution

I then checked the /etc/network/interfaces file, which I had modified for network access after installing Ubuntu 16.04 from the .ISO according to the WIKI, and it was still set up as it was when I originally did the fresh install. Thinking the interface might have an issue, I ran "ifdown eth0" and it returned:

(0) root @ br16p05: /root
# ifdown eth0
ifdown: interface eth0 not configured

To double-check whether there was more at play here, I also ran "ifup eth0" just to see what it returned:

(0) root @ br16p05: /root
# ifup eth0
Cannot find device "eth0"
Failed to bring up eth0.

As the setup for eth0 DOES exist in /etc/network/interfaces, I ran "ifconfig -a" to view the existing interfaces and it returned this:

(0) root @ br16p05: /root
# ifconfig -a
enP1p160s0f0 Link encap:Ethernet HWaddr 00:c0:dd:10:1d:c4
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
          Interrupt:249

enP1p160s0f1 Link encap:Ethernet HWaddr 00:c0:dd:10:1d:c6
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
          Interrupt:250

enp1s0 Link encap:Ethernet HWaddr e4:1d:2d:54:70:82
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

enp1s0d1 Link encap:Ethernet HWaddr e4:1d:2d:54:70:83
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

ibmveth2 Link encap:Ethernet HWaddr 46:de:e7:23:26:02
          BROADCAST MULTICAST MTU:1500 Metric...


Anton Blanchard (anton-samba) wrote :

Unfortunately, this fix creates a race when booting the Ubuntu cloud images. The networking scripts race with udev renaming, and every now and then networking fails to come up because cloud-init thinks the device is eth0 while it has been renamed to ibmveth0 in parallel.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-02-13 19:37 EDT-------

wangyang (wangyang1)
Changed in systemd (Ubuntu):
assignee: Martin Pitt (pitti) → wangyang (wangyang1)
assignee: wangyang (wangyang1) → nobody