bionic/linux: 4.15.0-172.181 snap-debs snap:pc-kernel

Bug #1964213 reported by Kleber Sacilotto de Souza
Affects                     Status        Importance  Assigned to                       Milestone
Kernel SRU Workflow         Fix Released  Medium      Unassigned                        -
Snap-certification-testing  Incomplete    Medium      Canonical Hardware Certification  -
Snap-prepare                Fix Released  Medium      Canonical Kernel Team             -
Snap-release-to-beta        Fix Released  Medium      Canonical Kernel Team             -
Snap-release-to-candidate   New           Medium      Canonical Kernel Team             -
Snap-release-to-edge        Fix Released  Medium      Canonical Kernel Team             -
Snap-release-to-stable      New           Medium      Canonical Kernel Team             -

Bug Description

This bug will contain status and test results related to a kernel source (or snap) as stated in the title.

For an explanation of the tasks and the associated workflow see:
  https://wiki.ubuntu.com/Kernel/kernel-sru-workflow

-- swm properties --
issue: KSRU-1303
kernel-stable-master-bug: 1964240
phase: Certification Testing
phase-changed: Thursday, 10. March 2022 19:56 UTC
reason:
  snap-certification-testing: Stalled -s testing FAILED
snap-name: pc-kernel
variant: snap-debs
versions-clamp:
  parent: 4.15.0-172.181
  self: 4.15.0-172.181

tags: added: kernel-release-tracking-bug-live
description: updated
tags: added: kernel-sru-cycle-2022.02.21-3
description: updated
description: updated
tags: added: kernel-sru-derivative-of-1964240
Changed in kernel-sru-workflow:
status: New → Confirmed
importance: Undecided → Medium
Changed in kernel-sru-workflow:
status: Confirmed → Triaged
summary: - bionic/linux: <version to be filled> snap-debs
+ bionic/linux: <version to be filled> snap-debs snap:pc-kernel
description: updated
Changed in kernel-sru-workflow:
status: Triaged → In Progress
tags: added: kernel-jira-issue-ksru-1303
description: updated
summary: - bionic/linux: <version to be filled> snap-debs snap:pc-kernel
+ bionic/linux: 4.15.0-172.181 snap-debs snap:pc-kernel
description: updated
description: updated
description: updated
description: updated
description: updated
Revision history for this message
Paul Larson (pwlars) wrote :

I'm seeing a regression with this kernel on the dawson-i NUC (NUC7i3DNHE) in the clocktest test in checkbox. This test checks for CPU clock jitter; I retried it multiple times and it is very reproducible. I also reverted to the kernel in stable and failed to reproduce it there. You can find the code for this test at https://git.launchpad.net/plainbox-provider-checkbox/tree/src/clocktest.c, or you can install the checkbox snap (uc18/stable) and run "checkbox.checkbox-cli run '.*clocktest.*'" to run it from the snap.
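For illustration, below is a minimal sketch of the kind of cross-CPU jitter check the test performs. This is not the actual clocktest.c linked above: the offset-based sampling, the helper name, and the single-pass loop are simplifications of mine; only the 0.2 second threshold is taken from the failure output further down.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define MAX_JITTER 0.2  /* seconds, same threshold as in the failures below */

/* Pin the calling thread to one CPU, then sample CLOCK_REALTIME against
 * CLOCK_MONOTONIC; taking the difference cancels the time spent migrating
 * between CPUs, so the spread of these offsets across CPUs is the skew. */
static double clock_offset_on_cpu(int cpu)
{
    cpu_set_t set;
    struct timespec rt, mono;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
    clock_gettime(CLOCK_REALTIME, &rt);
    clock_gettime(CLOCK_MONOTONIC, &mono);
    return (rt.tv_sec - mono.tv_sec) + (rt.tv_nsec - mono.tv_nsec) / 1e9;
}

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN); /* assumes CPUs 0..n-1 online */
    double off, min, max;
    int cpu, slowest = 0, fastest = 0;

    printf("Testing for clock jitter on %ld cpus\n", ncpus);
    min = max = clock_offset_on_cpu(0);
    for (cpu = 1; cpu < ncpus; cpu++) {
        off = clock_offset_on_cpu(cpu);
        if (off < min) { min = off; slowest = cpu; }
        if (off > max) { max = off; fastest = cpu; }
    }
    if (max - min < MAX_JITTER) {
        printf("PASSED: largest jitter seen was %f\n", max - min);
        return EXIT_SUCCESS;
    }
    printf("ERROR: jitter = %f Jitter must be < %.1f to pass\n",
           max - min, MAX_JITTER);
    printf("ERROR: Slowest CPU: %d Fastest CPU: %d\n", slowest, fastest);
    return EXIT_FAILURE;
}

On a healthy system the per-CPU offsets should agree closely, keeping the reported jitter in the sub-millisecond range seen in the passing run below.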

When the test passed for me (on the previous kernel) the output looked like this:
Testing for clock jitter on 4 cpus
PASSED: largest jitter seen was 0.000167

Testing clock direction for 5 minutes...
PASSED: Iteration 0 delta: 0.000229
PASSED: Iteration 1 delta: 0.000259
PASSED: Iteration 2 delta: 0.000242
PASSED: Iteration 3 delta: 0.000235
PASSED: Iteration 4 delta: 0.000247
clock direction test: sleeptime 60 sec per iteration, failed iterations: 0
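
The direction phase shown above can be sketched in the same spirit (again an illustration rather than the checkbox source; the 0.5 second tolerance is an assumed value, not taken from the real test): sleep for a fixed interval and verify the clock moved forward by roughly that amount.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define SLEEPTIME  60   /* seconds per iteration, as in the output above */
#define ITERATIONS 5    /* 5 x 60s gives the "5 minutes" above */
#define MAX_DELTA  0.5  /* tolerance in seconds; an assumed value */

int main(void)
{
    struct timespec before, after;
    int i, failed = 0;

    printf("Testing clock direction for %d minutes...\n",
           ITERATIONS * SLEEPTIME / 60);
    for (i = 0; i < ITERATIONS; i++) {
        clock_gettime(CLOCK_REALTIME, &before);
        sleep(SLEEPTIME);
        clock_gettime(CLOCK_REALTIME, &after);
        /* delta is how far the measured elapsed time overshoots the
         * requested sleep; a negative delta would mean the clock ran
         * backwards, which is what this phase guards against. */
        double elapsed = (after.tv_sec - before.tv_sec)
                       + (after.tv_nsec - before.tv_nsec) / 1e9;
        double delta = elapsed - SLEEPTIME;
        if (delta >= 0 && delta < MAX_DELTA) {
            printf("PASSED: Iteration %d delta: %f\n", i, delta);
        } else {
            printf("FAILED: Iteration %d delta: %f\n", i, delta);
            failed++;
        }
    }
    printf("clock direction test: sleeptime %d sec per iteration, "
           "failed iterations: %d\n", SLEEPTIME, failed);
    return failed ? EXIT_FAILURE : EXIT_SUCCESS;
}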

When it failed on this kernel, it took much longer to run. This was the output:
Testing for clock jitter on 4 cpus
ERROR: jitter = 0.254654 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 50, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236886 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 148, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236831 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 181, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.339615 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 197, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236866 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 215, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236817 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 522, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.254743 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 549, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.254881 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 617, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.254600 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 619, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236812 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 736, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236946 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 760, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.339465 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 821, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.254704 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 934, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236925 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 937, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.254870 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 978, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236836 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 1000, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.236932 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 1057, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.254800 Jitter must be < 0.2 to pass
ERROR: Failed Iteration = 1112, Slowest CPU: 0 Fastest CPU: 3
ERROR: jitter = 0.23...

tags: added: certification-testing-failed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Certification Testing FAILURE

The bug was tagged as certification-testing-failed

description: updated
description: updated
description: updated
Revision history for this message
Luke Nowakowski-Krijger (lukenow) wrote :

Do you know if the previous kernel that was tested was the CRD kernel recently released to updates, i.e. 4.15.0-171.180? I ask because we are confused about why this test would be failing now: this spin is a merge of the last two kernels and doesn't introduce anything new except for a security fix, which shouldn't affect anything to do with timing.

Revision history for this message
Paul Larson (pwlars) wrote :

Yes, it appears the one in stable is 4.15.0-171.180, which is what I installed to confirm that it was only reproducible in the newer kernel:
  18/stable: 4.15.0-171.180 2022-03-09 (925) 227MB -

Revision history for this message
Luke Nowakowski-Krijger (lukenow) wrote :

Okay, thank you for confirming that. After trying to reproduce this issue both locally in a VM and on dawson-i, I could not reproduce it. On dawson-i I could get the jitter a few orders of magnitude higher by putting the system under stress, but I still couldn't get the jitter that you reported.

Just to get more of an idea, do you know if there was anything else running on the system, or whether you got the same failures after a reboot or something like that? The nature of this test seems to make it very susceptible to differences in CPU performance or scheduling that could be causing the jitter. But what worries me is that you seemed to get pretty consistent results and ran it multiple times.

I admittedly have only tested the regular kernel from proposed, and will test the snap kernel to see if there is a difference for whatever reason.

Revision history for this message
Luke Nowakowski-Krijger (lukenow) wrote :

I reran the test on the NUC7i3DNHE with the snap kernel and still couldn't reproduce it, even under heavy load. I'm not entirely sure what is wrong, and I would hate to just label it as some transient timing/hardware problem, but I am not really sure what else to test or where to look.

Revision history for this message
Zachary Tahenakos (ztahenakos) wrote :

Hey Paul,

Would it be possible to get a copy of the dmesg output from when the issue occurs? Depending on how long it takes to reproduce, and on how noisy the NUC is in the logging department, the kernel log buffer size may need to be increased. Along the same lines, do you have a feel for how often the issue occurs? (I also wonder if we should open a dedicated Launchpad bug for this issue at this point and continue the conversation there.)

Thanks,
Zack

Revision history for this message
Jonathan Cave (jocave) wrote :

I've added a dmesg capture to the failing test and the result is attached. Note that I curtailed the output because it was completely full of the repeated Bluetooth warnings visible there; I've not encountered warnings like this on our devices before. Could they be impacting the clocktest result?

Andy Whitcroft (apw)
tags: removed: kernel-release-tracking-bug-live
Changed in kernel-sru-workflow:
status: In Progress → Fix Released