Kernel panic in find_busiest_group

Asked by M Janssen

We are experiencing a kernel panic in the find_busiest_group function (find_busiest_group+0x3f1/0xbb0); the issue seems to be caused by a division by zero.

The issue seems to be related to these bug reports:
• Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/824304
• Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1024309
• RHEL5: https://bugzilla.redhat.com/show_bug.cgi?id=644903

We are running a large number of servers using the 2.6.35-30-server kernel on HP blade and HP rack servers (physical servers, no VMware, etc.). The servers had been stable for >200 days, but some weeks ago they started locking up (at least 7 servers so far).

The reports linked above look very similar to our issue.
- Are we experiencing the same issue(s)?
- Where can we find a released patch for this issue?
- Or, which kernel version includes such a patch by default?

Kind regards

Michiel

What we see on screen after the kernel panic (no dumps, system is completely unresponsive):
--------------------------
[1502604.203815] [<ffffffff8159a67c>] start_secondary+0x100/0x102
[1502604.203827] Code: 90 90 Bb 48 08 48 86 60 48 01 ca 48 39 e6 75 fZ 41 89 51 08 48 86 95 60 fe ff fl 48 Elb 75 98 86 4a 08 413 89 10 31 d2 48 cl cO Od <48> 17 ft 48 86 4d a0 48 89 45 90 31 GO 48 85 c9 74 Oc 48 86 45
[1502604.203870] RIP [<ffffffff81050c71>] find_busiest_group+0x3f1/0x660
[1502604.203877] RSP <ffff88052e223660>
[1502604.203902] ---[ end trace 9e41a1d3ca5e1318 ]---
[1502604.203913] Kernel panic - not syncing: Fatal exception in interrupt
[1502604.203927] Pid: 0, comm: swapper Tainted: G D 2.6.35-30-server #61-Ubuntu
[1502604.203930] Call Trace:
[1502604.203933] <IRQ> [<ffffffff8159ff65>] panic+0x90/0x113
[1502604.203972] [<ffffffff815a421a>] oops_end+0xea/0xf0
[1502604.203981] [<ffffffff8100de01>] die+0x56/0x90
[1502604.203993] [<ffffffff815a3a84>] do_trap+0xc4/0x170
[1502604.204001] [<ffffffff810066ff>] do_divide_error+0x0f/0x60
[1502604.204029] [<ffffffff81050c71>] ? find_busiest_group+0x3f1/0x660
[1502604.204037] [<ffffffff81058607>] ? try_to_wake_up+0x267/0x400
[1502604.204050] [<ffffffff8100acfb>] divide_error+0x16/0x20
[1502604.204071] [<ffffffff81050c71>] ? find_busiest_group+0x3f1/0x660
[1502604.204097] [<ffffffff811076f6>] ? free_one_page+0x1a6/0x3f0
[1502604.204106] [<ffffffff81057560>] load_balance+0xd0/0x520
[1502604.204125] [<ffffffff81011609>] ? sched_clock+0x9/0x10
[1502604.204136] [<ffffffff810118d3>] ? native_sched_clock+0x13/0x60
[1502604.204147] [<ffffffff81057a99>] rebalance_domains+0x99/0x180
[1502604.204162] [<ffffffff810571.9>] run_rebalance_domains+0x49/0xf0
[1502604.204187] [<ffffffff8108a633>] ? ktime_get+0x63/0xe0
[1502604.204202] [<ffffffff8106862d>] __do_softirq+0xbd/0x200
[1502604.204221] [<ffffffff8108f9ca>] ? tick_program_event+0x2a/0x30
[1502604.204229] [<ffffffff8100afdc>] call_softirq+0x1c/0x30
[1502604.204252] [<ffffffff8100ca65>] do_softirq+0x65/0xa0
[1502604.204260] [<ffffffff810684e5>] irq_exit+0x85/0x90
[1502604.204287] [<ffffffff815aa730>] smp_apic_timer_interrupt+0x70/0x96
[1502604.204306] [<ffffffff8100aa93>] apic_timer_interrupt+0x13/0x20
[1502604.204312] <EOI> [<ffffffff8130a664>] ? intel_idle+0xe4/0x180
[1502604.204335] [<ffffffff8130a647>] ? intel_idle+0xc7/0x180
[1502604.204375] [<ffffffff814876c2>] cpuidle_idle_call+0x92/0x140
[1502604.204425] [<ffffffff81008d93>] cpu_idle+0x63/0x110
[1502604.204436] [<ffffffff8159a67c>] start_secondary+0x100/0x102
[1502605.273609] panic occurred, switching back to text console
--------------------------
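To compare a trace like this against the ones in the linked reports, it can help to reduce each frame to symbol, offset and function size. The following is only a minimal sketch: the regular expression, the `parse_frames` helper and the sample lines are illustrative assumptions, and it expects cleanly transcribed lines in the usual `[<address>] symbol+0xoffset/0xsize` form. A leading `?` marks an address found on the stack that is not necessarily part of the actual call chain.

```python
import re

# Hypothetical pattern for one call-trace frame, e.g.
# [1502604.204106] [<ffffffff81057560>] load_balance+0xd0/0x520
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<uncertain>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per recognisable frame; other lines are skipped."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            # Frames printed with "?" are not guaranteed to be in the call chain.
            "reliable": m.group("uncertain") is None,
        })
    return frames


# Illustrative sample lines in the cleaned-up format.
sample = """\
[1502604.204029] [<ffffffff81050c71>] ? find_busiest_group+0x3f1/0x660
[1502604.204106] [<ffffffff81057560>] load_balance+0xd0/0x520
[1502604.204147] [<ffffffff81057a99>] rebalance_domains+0x99/0x180
"""

frames = parse_frames(sample)
for frame in frames:
    print(frame)
```

Matching on symbol and offset rather than on raw addresses makes it easier to tell whether two reports point at the same instruction in the same kernel build.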

actionparsnip (andrew-woodhead666) said :
#1

2.6.35-30-server is a Maverick kernel. Maverick is EOL and no longer supported in ANY way you can name. I recommend a clean install of Precise, which is supported until April 2017.

M Janssen (mjj4791) said :
#2

We are aware of the status; upgrading to Precise is planned.

However, we would like confirmation that the underlying issue behind these panics is addressed in Precise (or whether additional patches are available that solve it).

Can you (or anyone else) confirm this?

actionparsnip (andrew-woodhead666) said :
#3

Maverick is dead, so the question is pretty much moot. Sorry.
