Comment 1 for bug 1951289

dann frazier (dannf) wrote :

I can reproduce on scobee w/ latest LTP from git. This issue is also reproducible on the previous hirsute kernel (5.11.0-40.44), so it does not appear to be a regression. scobee has 4 x 32-core NUMA nodes, totaling 128 CPUs. Interestingly, I could not reproduce on a system w/ the same SoC but only 3 NUMA nodes (96 CPUs).

It isn't clear to me what exactly the test thinks is wrong, but I did find something interesting. I captured /proc/schedstat at the time of the failure and I noticed that half of the cpus (0-31, 64-95) have 4 sched domains (domain0-domain3), while the other half (32-63, 96-127) only include the first 3 sched domains. I'll attach the full file, but here's a snippet contrasting the entries for cpu31 and cpu32:

----------------
cpu31 0 0 0 0 0 0 21656590030 600326550 26576
domain0 00000000,00000000,00000000,ffffffff 0 0 0 [...]
domain1 00000000,00000000,ffffffff,ffffffff 0 0 0 [...]
domain2 00000000,ffffffff,ffffffff,ffffffff 0 0 0 [...]
domain3 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 [...]
cpu32 0 0 0 0 0 0 5351733990 838031470 39918
domain0 00000000,00000000,ffffffff,00000000 0 0 0 [...]
domain1 00000000,00000000,ffffffff,ffffffff 0 0 0 [...]
domain2 00000000,ffffffff,ffffffff,ffffffff 0 0 0 [...]
----------------
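For anyone who wants to check their own capture, here is a rough sketch of how the per-CPU domain count could be pulled out of /proc/schedstat. It assumes the layout shown in the snippet above, where each "cpuN" line is followed by its "domainN" lines and the second field of a domain line is the hex cpumask; other schedstat versions may lay the fields out differently. The function names are my own, not from LTP or the kernel.

```python
import re

# Sample copied from the snippet above; "[...]" marks fields elided there.
SAMPLE = """\
cpu31 0 0 0 0 0 0 21656590030 600326550 26576
domain0 00000000,00000000,00000000,ffffffff 0 0 0 [...]
domain1 00000000,00000000,ffffffff,ffffffff 0 0 0 [...]
domain2 00000000,ffffffff,ffffffff,ffffffff 0 0 0 [...]
domain3 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 [...]
cpu32 0 0 0 0 0 0 5351733990 838031470 39918
domain0 00000000,00000000,ffffffff,00000000 0 0 0 [...]
domain1 00000000,00000000,ffffffff,ffffffff 0 0 0 [...]
domain2 00000000,ffffffff,ffffffff,ffffffff 0 0 0 [...]
"""


def mask_to_cpus(mask):
    """Decode a comma-separated hex cpumask (most significant word first)."""
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def domains_per_cpu(text):
    """Map each 'cpuN' name to the list of domain cpumasks that follow it."""
    domains = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if re.fullmatch(r"cpu\d+", fields[0]):
            current = fields[0]
            domains[current] = []
        elif re.fullmatch(r"domain\d+", fields[0]) and current is not None:
            # Assumption: the cpumask is the second field of a domain line.
            domains[current].append(fields[1])
    return domains


for cpu, masks in domains_per_cpu(SAMPLE).items():
    if not masks:
        print(f"{cpu}: no domains")
        continue
    widest = mask_to_cpus(masks[-1])
    # Only the lowest and highest CPU of the last domain are printed.
    print(f"{cpu}: {len(masks)} domains, widest spans CPUs {widest[0]}-{widest[-1]}")
```

On the two entries above this prints 4 domains spanning CPUs 0-127 for cpu31 and 3 domains spanning CPUs 0-95 for cpu32. To run it against a live system, pass the contents of /proc/schedstat to domains_per_cpu() instead of SAMPLE; the header lines that do not start with "cpuN" or "domainN" are skipped.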

Note that domain3 is the domain that comprises all CPUs. If that is what the test is looking for, then it would make sense that the test would begin to fail at CPU32. I turned on sched-domain debugging (/sys/kernel/debug/sched_debug), and verified that the output seems to match what I see in /proc/schedstat. Specifically, CPU32 is not assigned a domain-3:

[18683.213478] CPU31 attaching sched-domain(s):
[18683.213480] domain-0: span=0-31 level=MC
[18683.213485] groups: 31:{ span=31 }, 0:{ span=0 }, 1:{ span=1 }, 2:{ span=2 }, 3:{ span=3 }, 4:{ span=4 }, 5:{ span=5 }, 6:{ span=6 }, 7:{ span=7 }, 8:{ span=8 }, 9:{ span=9 }, 10:{ span=10 }, 11:{ span=11 }, 12:{ span=12 }, 13:{ span=13 }, 14:{ span=14 }, 15:{ span=15 }, 16:{ span=16 }, 17:{ span=17 }, 18:{ span=18 }, 19:{ span=19 }, 20:{ span=20 }, 21:{ span=21 }, 22:{ span=22 }, 23:{ span=23 }, 24:{ span=24 }, 25:{ span=25 }, 26:{ span=26 }, 27:{ span=27 }, 28:{ span=28 }, 29:{ span=29 }, 30:{ span=30 }
[18683.213582] domain-1: span=0-63 level=NUMA
[18683.213586] groups: 0:{ span=0-31 cap=32768 }, 32:{ span=32-63 cap=32768 }
[18683.213599] domain-2: span=0-95 level=NUMA
[18683.213604] groups: 0:{ span=0-63 cap=65536 }, 64:{ span=64-95 cap=32768 }
[18683.213615] domain-3: span=0-127 level=NUMA
[18683.213622] groups: 0:{ span=0-95 mask=0-31 cap=98304 }, 96:{ span=64-127 mask=96-127 cap=65536 }
[18683.213655] CPU32 attaching sched-domain(s):
[18683.213658] domain-0: span=32-63 level=MC
[18683.213662] groups: 32:{ span=32 }, 33:{ span=33 }, 34:{ span=34 }, 35:{ span=35 }, 36:{ span=36 }, 37:{ span=37 }, 38:{ span=38 }, 39:{ span=39 }, 40:{ span=40 }, 41:{ span=41 }, 42:{ span=42 }, 43:{ span=43 }, 44:{ span=44 }, 45:{ span=45 }, 46:{ span=46 }, 47:{ span=47 }, 48:{ span=48 }, 49:{ span=49 }, 50:{ span=50 }, 51:{ span=51 }, 52:{ span=52 }, 53:{ span=53 }, 54:{ span=54 }, 55:{ span=55 }, 56:{ span=56 }, 57:{ span=57 }, 58:{ span=58 }, 59:{ span=59 }, 60:{ span=60 }, 61:{ span=61 }, 62:{ span=62 }, 63:{ span=63 }
[18683.213762] domain-1: span=0-63 level=NUMA
[18683.213767] groups: 32:{ span=32-63 cap=32768 }, 0:{ span=0-31 cap=32768 }
[18683.213779] domain-2: span=0-95 level=NUMA
[18683.213784] groups: 32:{ span=0-63 mask=32-63 cap=65536 }, 64:{ span=64-95 cap=32768 }
[18683.213855] CPU33 attaching sched-domain(s):
[18683.213856] domain-0: span=32-63 level=MC