Comment 34 for bug 1615021

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-09-15 17:13 EDT-------
> On Tue, Sep 13, 2016 at 06:20:49PM -0000, bugproxy wrote:
[...]
> Based on the feedback from <email address hidden>, it does not appear that the
> buggy udev rule is blocking progress on this bug.
>
[...]
> > I should also amend my previous comment by saying, if Canonical has some
> > suggestions of how to gather more information in order to help debug this,
> > they should let us know and we can make test runs for them.
>
> My previous suggestion to gpiccoli on IRC was to modify the initrd to dump
> the state of the udev database at a point after the hang. I haven't seen
> such output attached here; does that mean it's not possible to produce such
> results because the kernel hard locks? Currently the only debugging
> information I've seen is that the /lib/debian-installer/start-udev script
> never returns, but that does not mean the kernel has locked up - it only
> shows that udev believes it has not finished processing. I would still like
> to see a dump of the udev database at the point of the hang, not just a udev
> debug log showing processing up to that point.
>
> Is this problem only reproducible with the X710 ethernet adapter? Is this a
> removable ethernet adapter, and have you tested what happens if it's
> removed? If it's not removable, have you tested what happens if you
> blacklist the i40e driver? The ethernet driver may be a complete red
> herring, and the problem may be with something that normally happens after
> ethernet driver initialization rather than with the ethernet driver itself.
>
> I would also have asked whether this could be an issue with the console
> output being redirected to some different device, but since Guilherme
> indicated that the problem appeared to be racy, with boot to the installer
> sometimes succeeding, that seems unlikely to be the problem.
>
> If you can reproduce this problem with the cloud image from
> <http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-ppc64el-disk1.img>,
> that would present additional debugging opportunities since that uses a
> standard Ubuntu initramfs instead of the installer initramfs and will
> support various 'break=' options to interrupt the boot and introspect the
> system state.

Vorlon, thanks very much for your assistance. Your ideas were useful and we tried many of them. And finally we seem to have figured out what's going on hehehe

First, our failed attempts:

i) "udevadm info -e" was impossible to run in a bad boot, because even if I try to run it as one of the first things in init, the system still seems to hang.

ii) Adding a modprobe blacklist for any driver makes things work. In fact, I added "vorlon" to the command line and it worked too hehehe

iii) I wasn't able to test this cloud image. I have never installed it before; is it a complete, functional image? I wondered if it needs to be written directly to the disk, perhaps...

Anyway, after all the analysis we finally noticed something important: by passing any command-line parameter we ended up overwriting the default cmdline, and that was the reason why _any_ parameter worked.
Now, the default cmdline was: "console=hvc0 console=tty1", so I guess the installer was booting normally, but with the output redirected to tty1!
It was that simple mistake that made things look hung. We were reading that default command line from Petitboot, the ppc64 bootloader.
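
To make the check concrete, here is a minimal sketch in Python of how one could list the "console=" entries of a command line in order. The function names are made up for illustration, and the assumption that the last entry is the one receiving the installer output is only what our observation suggests, not something I have confirmed.

```python
def console_entries(cmdline):
    """Return the console= values in the order they appear on the command line."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("console=")
    ]


def last_console(cmdline):
    """Return the last console= value, or None if the cmdline has none.

    Assumption: the last entry is the one where the installer output ends up.
    """
    entries = console_entries(cmdline)
    return entries[-1] if entries else None


if __name__ == "__main__":
    # Default cmdline we got from Petitboot.
    default_cmdline = "console=hvc0 console=tty1"
    print(console_entries(default_cmdline))
    print(last_console(default_cmdline))
    # Any parameter typed by hand replaced the default, so no console= was left.
    print(last_console("vorlon"))
```

On the running system the same functions could be fed with the contents of /proc/cmdline instead of a hard-coded string.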

By the way, vorlon, you already suggested that it might be a console output redirection; unfortunately I was unable to figure it out until today.
Thanks very much for the help!

Now, let me ask: is it expected that Ubuntu prefers to send output to tty1, or to the last "console" given in the cmdline?

Cheers,

Guilherme