How to diagnose 100% CPU spike when system is not responsive

Asked by Karl

I am having an intermittent issue where my Ubuntu 8.04 system spikes to 100% CPU and stops responding to console commands. Cron jobs are delayed and running processes hang until the load drops.

I tried running top in batch mode every 60 seconds, but the output shows low CPU usage in the minute before the issue, then no entries at all for 5 minutes, and then very low CPU usage again once output resumes. The only signs of a problem are the lag and a load average above 11.

I cannot find anything in dmesg, syslog, messages, etc. that indicates an error or problem.

How can I identify what is causing my CPU spikes? I have already restarted top with its default 3-second interval, in the hope of catching a process as it ramps up CPU load.

Any suggestions would be appreciated.

Question information

Language: English
Status: Answered
For: Ubuntu

Tom (tom6) said :
#1

Hi :)

Ok, this is going to seem dumb & unrelated but please could you reformat your Swap partition and give us the output of

free -m

to check that RAM & swap are at good ratios to each other. It might be a good idea to run a memory test (memtest) from the boot menu or a LiveCD, but it takes ages, so perhaps leave that for when you are off at lunch or overnight?

Good luck and regards from
Tom :)

Karl (kputz) said :
#2

Thanks Tom.

Output from free:
root@eti001:~# free -m
             total       used       free     shared    buffers     cached
Mem:          3948        830       3118          0        151        442
-/+ buffers/cache:        236       3712
Swap:         3129          0       3129

I'll run the mem test as soon as possible.

If this has any bearing, this is a Dell PowerEdge R610, and initially I did notice some dmesg and syslog entries about ACPI and the clock. I tried noacpi, which seemed to resolve the CPU spikes, but this system is running Asterisk, which needs a solid timer. So we are currently running with clocksource=hpet. I have not seen any log entries indicating a timer issue since switching to hpet.

Karl
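
Since the kernel is booted with clocksource=hpet, it may be worth confirming which clocksource is actually active. The following is only a sketch: it assumes the standard sysfs location for clocksource information, and the function name show_clocksource and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# Print the active and the available kernel clocksources.
# Assumption: the information lives in the standard sysfs directory below.
# The optional argument (a directory) is hypothetical and only there so the
# function can be pointed at a different location.
show_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    for name in current_clocksource available_clocksource; do
        if [ -r "$dir/$name" ]; then
            echo "$name: $(cat "$dir/$name")"
        else
            echo "$name: (not readable)"
        fi
    done
}

show_clocksource
```

If current_clocksource reports something other than hpet, the boot parameter may not have taken effect.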

Tom (tom6) said :
#3

Hi :)

Well, you appear to have plenty of RAM & swap for this sort of thing. To be able to use hibernate mode the swap should be larger than RAM, but anything over 2 GB of RAM is plenty. Just avoid closing down to hibernate or sleep modes. A full shutdown is much faster and safer anyway.

The fact that your RAM is shown as just under 4 GB makes me wonder whether you are using the 64-bit version of Ubuntu. I often have trouble with the 64-bit version and find the 32-bit version works much more smoothly. Do you have a separate /home partition, or plenty of room to try the 32-bit version to see if it gives you the same problems? Or do you prefer the 64-bit version apart from this particular issue?

Regards from
Tom :)

Thomas Krüger (thkrueger) said :
#4

You should try to run top with highest priority:
sudo nice -n -20 top -b -d 30 > top.out

If that doesn't work you might have a hardware problem. Check your RAM (memtest) and your HDD (smartctl) first. Then remove all hardware that is not absolutely required, try another graphics card, and so on...
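
One more thing that may be worth logging alongside top: on Linux the load average counts processes in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones (state R), so a load above 11 with low CPU usage could point at blocked processes rather than a CPU hog. Below is a rough sketch, not a tested diagnostic tool, that prints the load average and every process currently in state R or D by reading /proc directly. The function name snapshot and the log file name loadwatch.log are made up for illustration.

```shell
#!/bin/sh
# Print one snapshot: timestamp, load average, and every process that is
# currently runnable (R) or in uninterruptible sleep (D).
# Linux only: reads /proc directly, so it does not depend on ps or top.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    # /proc/loadavg: 1, 5 and 15 minute load, runnable/total tasks, last PID
    echo "load: $(cat /proc/loadavg)"
    for f in /proc/[0-9]*/stat; do
        # A process may exit between the glob and the read; skip it quietly.
        { read -r line < "$f"; } 2>/dev/null || continue
        pid=${line%% *}            # first field is the PID
        rest=${line##*') '}        # everything after the "(name) " field
        state=${rest%% *}          # single-letter process state
        name=${line#*'('}          # strip up to the opening parenthesis
        name=${name%')'*}          # strip from the last closing parenthesis
        case "$state" in
            R|D) echo "$state $pid $name" ;;
        esac
    done
}

snapshot

# Hypothetical logging loop (not run here): one snapshot every 5 seconds.
# while true; do snapshot >> loadwatch.log; sleep 5; done
```

The shell running the snapshot always lists itself as R, so that entry can be ignored. As with top, running the loop under sudo nice -n -20 gives it a better chance of still being scheduled during a spike.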
