Frequent kernel panics on Ubuntu 10.10

Asked by nazar

I have installed Ubuntu 10.10 on my laptop (CPU: Intel dual core 2.2 GHz, RAM: 2 GB). But after a few days it began to freeze completely for no apparent reason. The Caps Lock indicator blinks after that. I heard that this may mean a "kernel panic", but I don't know what causes it or how to solve it. It can freeze when I'm using the browser, the office suite, or even when it is idle...

Question information

Language: English
Status: Solved
For: Ubuntu linux
Assignee: No assignee
Solved by: nazar

mycae (mycae) said :
#1

Yes, a blinking Caps Lock indicator means that there has been a kernel panic. These are difficult to trace without a good working knowledge of the terminal and a bit of programming.

As a general test, try the following:
* Run the memtest utility from the Ubuntu boot disk to check that your RAM is good (this can take a few hours).
* Disable any modules that you are not using (e.g. wireless).
* Ensure your laptop has no excessive internal dust buildup, which can cause overheating. Disassembly is one option for laptops, but a dry burst of compressed air (if you have it) can help too (turn the laptop off first, and gently does it!).

Also, you might want to post the output of these commands from a terminal (Ctrl+Alt+T or Applications->Accessories->Terminal):

sudo lspci
sudo lsusb
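
If it is easier to attach one file than to paste two outputs, the commands above can be wrapped in a small script. This is only a sketch: the function name collect_hw_info and the file name hw-info.txt are made up for illustration, and it assumes lspci and lsusb are installed.

```shell
#!/bin/sh
# Sketch: save the output of several commands into one report file.
# collect_hw_info and hw-info.txt are illustrative names, not standard tools.
collect_hw_info() {
    out="$1"; shift
    : > "$out"                       # start with an empty report
    for cmd in "$@"; do
        echo "### $cmd" >> "$out"
        # $cmd is left unquoted on purpose so "lspci -nn" splits into
        # a command and its argument.
        if ! $cmd >> "$out" 2>&1; then
            echo "(command failed or is not installed)" >> "$out"
        fi
        echo >> "$out"
    done
}

# Usage (run on the affected machine):
# collect_hw_info hw-info.txt lspci lsusb
```

The report then contains one "###" header per command followed by its output, which keeps the two listings from being mixed up when posting.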

nazar (jura-n) said :
#2

The temperature of my CPU doesn't rise above 55°C on either core. I didn't run memtest, but Ubuntu 10.04 and 10.04.1 (and other OSes) never caused this problem. I saw in the forums that this problem happens to other users too, but I couldn't find a solution. If it can't be helped now, I'll try to install 10.10 again and look at the logs after a "kernel panic" occurs. Could you tell me which logs I should look at?

Here is the output of sudo lspci:

00:00.0 Host bridge: nVidia Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: nVidia Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: nVidia Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: nVidia Corporation MCP79 Co-processor (rev b1)
00:04.0 USB Controller: nVidia Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB Controller: nVidia Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: nVidia Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: nVidia Corporation MCP79 PCI Bridge (rev b1)
00:0b.0 SATA controller: nVidia Corporation MCP79 AHCI Controller (rev b1)
00:0c.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
00:15.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
00:16.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
02:00.0 VGA compatible controller: nVidia Corporation G96 [GeForce 9600M GT] (rev a1)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
04:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)

And sudo lsusb:

00:00.0 Host bridge: nVidia Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: nVidia Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: nVidia Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: nVidia Corporation MCP79 Co-processor (rev b1)
00:04.0 USB Controller: nVidia Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB Controller: nVidia Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: nVidia Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: nVidia Corporation MCP79 PCI Bridge (rev b1)
00:0b.0 SATA controller: nVidia Corporation MCP79 AHCI Controller (rev b1)
00:0c.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
00:15.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
00:16.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
02:00.0 VGA compatible controller: nVidia Corporation G96 [GeForce 9600M GT] (rev a1)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
04:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)

mycae (mycae) said :
#3

One more idea:

* Re-seat your RAM -- sometimes after transport or motion, the RAM connection can be loose at the micron scale, causing contact problems. Make sure you remove most of the dust a minute or so before you re-seat the RAM, so freshly falling dust does not get into the connectors.

 --
Unfortunately, the crash will probably be preceded by an error on the microsecond scale, and you can't ask the computer to look at files for you, because it has crashed. Similarly, the kernel can't write to disk, as it has no idea whether it is safe to do so, because it is in a bad state. So looking at logs probably won't help.

You can save the output of the kernel's "ring buffer" (its debug output memory area) with the command

dmesg > dmesg-log.txt

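and you can even put this in a loop every (say) 10 seconds so the file always holds a recent copy of the buffer. Each pass overwrites the file with the current buffer contents, and the sync asks the kernel to flush the data to disk straight away.

while true; do dmesg > dmesg-log.txt; sync; sleep 10; done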

But there is no guarantee that the file will contain any error messages when you reboot, as the detected problem will likely be followed immediately by the crash. It might spit out something useful, but it probably won't.
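
A slightly more defensive way to take such snapshots, as a sketch only: write each snapshot to a temporary file, rename it into place, and call sync so the data has a better chance of reaching the disk before a freeze. The function name snapshot_dmesg and the DMESG_CMD variable are illustrative assumptions, not part of any standard tool; on some systems reading the ring buffer needs sudo.

```shell
#!/bin/sh
# Sketch: save the kernel ring buffer to a file and flush it to disk.
# DMESG_CMD is a hypothetical override, e.g. DMESG_CMD="sudo dmesg".
DMESG_CMD="${DMESG_CMD:-dmesg}"

snapshot_dmesg() {
    log="$1"
    # Write to a temporary file first so that a failed read does not
    # replace the previous good snapshot with an empty file.
    if $DMESG_CMD > "$log.tmp" 2>/dev/null; then
        mv "$log.tmp" "$log"
        sync                 # ask the kernel to flush buffered writes now
    else
        rm -f "$log.tmp"
        return 1
    fi
}

# Usage: one snapshot every 10 seconds until interrupted with Ctrl+C.
# while true; do snapshot_dmesg dmesg-log.txt; sleep 10; done
```

Even with sync, a message printed in the last few seconds before the panic can still be missing from the file.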

There is the kernel "oops" crash file, but debugging this requires a good knowledge of the workings of your CPU and your kernel, so it's pretty much unhelpful for all but kernel programmers. The best way to debug this would be to try things like:

* Do memory-intensive computations versus just leaving the computer idle.
* Disable kernel "modules" one at a time with rmmod to see if the problem magically goes away. You have to know what you can reasonably take out, or your computer will become useless until the next reboot if you get it wrong (e.g. accidentally taking out your USB drivers -- look ma, no keyboard!).

A list of modules can be obtained with "lsmod". Modules can be removed with "sudo rmmod NAMEOFMODULE".
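
To narrow down which modules rmmod would let you remove at all, one option is to filter the lsmod listing for modules whose "Used by" count is 0. A minimal sketch, assuming the usual three-column lsmod layout (Module, Size, Used by); the function name unused_modules is made up for illustration. A count of 0 only means nothing else currently depends on the module, not that removing it is harmless.

```shell
#!/bin/sh
# Sketch: print the names of modules that no other module or process
# is currently using, based on the third column of lsmod output.
unused_modules() {
    # NR > 1 skips the "Module  Size  Used by" header line.
    awk 'NR > 1 && NF >= 3 && $3 == "0" { print $1 }'
}

# Usage:
# lsmod | unused_modules
```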

Finally, the CPU temperature, whilst sorta useful, is not the be-all and end-all -- other components can dissipate power too (e.g. power supplies/batteries), so doing a cleanout is still occasionally helpful.

mycae (mycae) said :
#4

Last thought:

If you are running the open-source graphics drivers, install the Nvidia ones, and vice versa (i.e., switch your graphics drivers between your two options, open-source and proprietary). These modules are probably the buggiest in my experience.
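
Before switching, it helps to know which of the two is loaded right now. A minimal sketch that looks for the two kernel module names in the lsmod listing, assuming the proprietary driver loads as "nvidia" and the open-source one as "nouveau"; the function name gpu_driver is made up for illustration.

```shell
#!/bin/sh
# Sketch: report which Nvidia kernel module appears in lsmod output.
# Prints "nvidia" (proprietary), "nouveau" (open-source), or nothing
# if neither module is loaded.
gpu_driver() {
    awk 'NR > 1 && ($1 == "nvidia" || $1 == "nouveau") { print $1 }'
}

# Usage:
# lsmod | gpu_driver
```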

Others might still have ideas.

* Also, don't forget memtest :)

nazar (jura-n) said :
#5

Thanks a lot. I ran memtest, but there were no errors. And as I said, other OSes, including Ubuntu 10.04, work properly; a kernel panic never occurs. Anyway, I don't want to open my notebook because I could lose my warranty...
I like the idea of disabling kernel "modules". I use the Nvidia proprietary driver on both versions of Ubuntu. I'll try the open-source driver. Unfortunately, it will take a lot of time, so I can post the results in a few days at the earliest...

* This is not the only problem on 10.10, but it is the main one for me. I think maybe the best solution is to stay on 10.04, but I have one problem there, described in my other question:
https://answers.launchpad.net/ubuntu/+question/137940
This problem is not present in 10.10.

nazar (jura-n) said :
#6

I have finally solved my only problem with Ubuntu 10.04, so I will keep using it, because 10.10 has more than one problem...
Also, testing what causes the kernel panic on 10.10 would take a lot of time, which I don't have right now...