Comment 4 for bug 1772675

Dan Streetman (ddstreet) wrote : Re: Intel i40e PF reset due to incorrect MDD detection (continues...again...)

@terryh-orcas,

If you can reproduce the problem relatively quickly and easily, I suggest testing different kernel versions, up to the latest upstream, to see whether and where a newer i40e kernel driver fixes it. You can get upstream kernel debs here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D

If you can narrow the kernel down to a short range (i.e. kernel X definitely fails, kernel Y never fails), I can review the upstream i40e driver changes in that range for specific fixes to backport.
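The narrowing itself can be done as a binary search over the installed kernels rather than testing every version in order. The following is only a rough sketch under stated assumptions: the list is ordered oldest to newest, the oldest kernel fails, the newest does not, and there is a single point where the behaviour changes. The function name, the `fails` callback, and the kernel labels are hypothetical; `fails` stands in for booting that kernel and running the workload long enough to be confident of the result.

```python
from typing import Callable, Sequence, Tuple


def narrow_kernel_range(
    versions: Sequence[str],
    fails: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (last failing kernel, first non-failing kernel).

    Assumes `versions` is ordered oldest to newest and that the failure
    disappears at exactly one point in the list. `fails` is a hypothetical
    callback, e.g. a manual boot-and-test step.
    """
    if len(versions) < 2:
        raise ValueError("need at least two kernel versions")

    lo, hi = 0, len(versions) - 1
    if not fails(versions[lo]):
        raise ValueError("oldest kernel does not reproduce the problem")
    if fails(versions[hi]):
        raise ValueError("newest kernel still reproduces the problem")

    # Invariant: versions[lo] fails, versions[hi] does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[lo], versions[hi]
```

With six kernels this needs at most five test runs, two for the endpoints and up to three for the search. A "never fails" result is only as reliable as the time spent trying to reproduce, so an intermittent problem needs a generous test window per kernel.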

If you can't reproduce it easily or quickly, there is another debugging method that involves modifying undocumented i40e registers; see bug 1723127 comment 10 for details. If you try that method, use the latest kernel with which you can still reproduce the problem. As I don't have the chipset specifications, if you do reproduce it this way and can isolate the problem to a specific register/bit, I'll have to take that info back to Intel and ask them for clarification. Also note that there are 2 registers, and you have to test each bit of each one individually, so this method can take a very long time if the problem is slow to reproduce.
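To get a feel for how many runs the per-bit method involves, the test plan can be enumerated up front. This is a minimal sketch, not the procedure from bug 1723127: the register names are placeholders, the 32-bit width is an assumption, and how each mask is actually applied to the hardware is left to the instructions in that bug.

```python
from typing import Iterator, Sequence, Tuple


def single_bit_plan(
    registers: Sequence[str],
    width: int = 32,  # assumed register width, not taken from a datasheet
) -> Iterator[Tuple[str, int, int]]:
    """Yield (register, bit index, single-bit mask) for every bit to test."""
    for reg in registers:
        for bit in range(width):
            yield reg, bit, 1 << bit


# Placeholder names; the real registers are described in bug 1723127.
plan = list(single_bit_plan(["REG_A", "REG_B"]))
for reg, bit, mask in plan[:3]:
    print(f"{reg} bit {bit:2d} mask 0x{mask:08x}")
print(f"total runs: {len(plan)}")
```

Under those assumptions that is 64 separate reproduction attempts, so the time per attempt dominates how practical this approach is.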

Unfortunately, as has been mentioned in this and past bugs, the MDD event is generated by the i40e firmware, and there is no documented way to tell what the i40e kernel driver did that the firmware didn't like (assuming it was something the driver did, and not an external or firmware issue). Intel does regularly update their upstream i40e driver with fixes for MDD firmware/driver bugs, so this will likely only be fixed by a patch from Intel upstream, which we would then need to backport to our older stable Ubuntu kernel(s).

Sorry I can't help more.