Comment 3 for bug 1849682

dann frazier (dannf) wrote :

OK - this is a messy one. It is due to the backport of this:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

Reverting that is probably not the right answer, because the point of it is to avoid corruption. But this is a pretty serious usability issue. It is not at all clear from the message that a user needs to do *something*, and what that *something* is is even less clear.

Here's the message, buried in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524

So if you understand from that that you need to pass a kernel parameter, you're more intuitive than I am. And if you understand from that *why*, and *which value to pick* - well, you probably wrote the patch. And even then, you probably didn't realize that the parameter name printed in the message is actually incorrect: it should be raid0.default_layout (HINT: we should backport this as well: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571).

IMO, the error message should include a URL to a page with clear steps on how to proceed, which I think is something along the lines of "Use mdadm to figure out when your array was created, figure out what kernel you were running back then (ideally with a mapping to Ubuntu release), and then pick the matching setting to fix it."
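
For the first of those steps, here is a minimal sketch of pulling the creation date out of `mdadm --detail` output. It assumes the output contains a "Creation Time : ..." line in the usual C-locale format. The function name and the sample text are mine, for illustration only, not captured from the affected machine. Mapping that date to a kernel version or Ubuntu release is deliberately left out.

```python
import re
from datetime import datetime

# Illustrative sample only; not captured from the affected system.
SAMPLE_DETAIL = """\
/dev/md0:
           Version : 1.2
     Creation Time : Tue Oct 22 18:05:07 2019
        Raid Level : raid0
"""

def parse_creation_time(detail_text):
    """Return the array creation time as a datetime, or None if absent.

    Assumes the "Creation Time : <weekday> <month> <day> <time> <year>"
    layout and an English (C) locale for the day and month names.
    """
    match = re.search(r"^\s*Creation Time\s*:\s*(.+?)\s*$",
                      detail_text, re.MULTILINE)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")

if __name__ == "__main__":
    print(parse_creation_time(SAMPLE_DETAIL))
```

On a real system the text would come from something like `subprocess.run(["mdadm", "--detail", "/dev/md0"], capture_output=True, text=True).stdout`, run as root.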

That said, it isn't clear to me why we saw this issue on this specific machine. This issue is supposedly restricted to multi-zone RAID0 configs, which should only happen if not all members are the same size. But I happen to know that all members on this system *are* the same size! I've tried to reproduce it but, after redeploying the system with MAAS, it upgrades and reboots w/o error :(
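
To make the "same size" reasoning concrete, here is a rough sketch of how I understand the zone count to be derived: each member's size is rounded down to a multiple of the chunk size, and every distinct rounded size adds a zone. This is my reading of the raid0 driver, not a copy of it, and the function name and the numbers below are hypothetical.

```python
def count_raid0_zones(member_sectors, chunk_sectors):
    """Estimate the number of RAID0 zones for the given member sizes.

    Assumption: sizes are rounded down to a whole number of chunks, and
    the zone count equals the number of distinct rounded sizes.
    """
    if chunk_sectors <= 0:
        raise ValueError("chunk_sectors must be positive")
    rounded = {(size // chunk_sectors) * chunk_sectors
               for size in member_sectors}
    return len(rounded)

if __name__ == "__main__":
    # Hypothetical sizes in 512-byte sectors, 512 KiB chunks (1024 sectors).
    same = [1_000_000_000] * 4
    mixed = [1_000_000_000, 1_000_000_000, 2_000_000_000]
    print(count_raid0_zones(same, 1024))   # equal members
    print(count_raid0_zones(mixed, 1024))  # one larger member
```

If that reading is right, equal-sized members give a single zone and the check should never fire, which is why the failure on this machine is surprising.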