Comment 27 for bug 1796292

Ryan Harper (raharper) wrote :

After looking at the logs, I believe curtin is doing all that it can to shut down the bcache device.

Waiting 1200 seconds is more than reasonable, so I believe something else is going wrong with the bcache device in the kernel at this point. I would like to either open a task against the kernel or start a new bug with these details.
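
For readers unfamiliar with the pattern under discussion, here is a rough sketch of "ask the device to stop, then wait up to a timeout". This is not curtin's actual code: the function names, the polling interval, and the `sysfs_dir` argument are hypothetical, and it assumes the device's sysfs directory disappears once the device has stopped. Only the 1200 second figure comes from this bug; writing to a `stop` attribute follows the kernel's bcache sysfs interface.

```python
import os
import time


def wait_for_path_removal(path, timeout=1200.0, interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll until `path` no longer exists.

    Returns True if the path vanished before the timeout, False otherwise.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if not os.path.exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def stop_and_wait(sysfs_dir, timeout=1200.0):
    """Write '1' to <sysfs_dir>/stop, then wait for sysfs_dir to go away.

    `sysfs_dir` is a hypothetical argument, e.g. a bcache directory
    under /sys; the caller decides which one.
    """
    with open(os.path.join(sysfs_dir, "stop"), "w") as fh:
        fh.write("1")
    return wait_for_path_removal(sysfs_dir, timeout=timeout)
```

A False return after the full timeout is the situation described here: the stop request was issued, but the kernel never tore the device down.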

Some conversation from IRC for context:

<rharper> that's a lot of seconds to wait
<rharper> I wonder if there's a kernel bug in there; is it reasonable to wait 2400 seconds? 3600 seconds?
<jhobbs> no
<rharper> jhobbs: do you know if we capture dmesg from the host?
<jhobbs> that's all ridiculous
<jhobbs> for nvme...
<rharper> right; so it smells like a bug, or deadlock in bcache itself; I found *many* of those while working on a more reliable way to stop them
<rharper> jhobbs: so I'm generally happy with the curtin code; I think we're doing a reliable job of shutting them down and waiting a more than reasonable amount of time for the device to stop at this point. We may need to open up a different/new issue against the kernel to see if we can get to the bottom of why it isn't shutting down;
<jhobbs> rharper: that log is from the node logging syslog to maas, we don't have any hooks in there to get dmesg during install
<rharper> jhobbs: ok