How to troubleshoot a crash?

Asked by Adrian Mariano

After a while (a few minutes? an hour?) my newly installed Ubuntu 15.04 system freezes. I get no mouse cursor, and the keyboard is unresponsive, even to Ctrl-Alt-F# for switching to a different virtual terminal. The mouse is still powered (its red light is on). I tried replugging devices, with no effect.

How can I troubleshoot this failure?

The machine is still up: I can ssh in from another host. I did this and started killing processes. In fact, I killed all user processes (except my ssh) but I still don't regain control of the console. I tried to kill X and it is listed as defunct but I still don't get the console back.

Any suggestions on how to regain control of my console and/or figure out what is going on and how to stop this crash from happening?

Question information

Language: English
Status: Expired
For: Ubuntu
Assignee: No assignee

actionparsnip (andrew-woodhead666) said :
#1

I'd test the RAM using Memtest86 from the GRUB menu as a good starting point. Errors shown in red mean some or all of the RAM is bad.

Adrian Mariano (adrign) said :
#2

I tested the RAM and it passed with no errors. Is there an analogous way to test the video card? When the crash happens the machine is still running, since I can connect remotely. Only (?) the display is frozen. Note also that the machine worked, apparently without any problems, under Windows.

actionparsnip (andrew-woodhead666) said :
#3

If you install openssh-server, can you connect from another PC via SSH?

The fact that it worked under Windows is of little value. You could wipe Windows, install Linux, and then have the HDD fail. That doesn't mean Linux is to blame.

Adrian Mariano (adrign) said :
#4

Yes, I can connect from another computer via ssh. So what? As I already noted, when the console is frozen I can still ssh in from other machines without trouble. And again as I previously noted, I tried to kill processes to regain control of the console and could not find a process whose death gave me the console back. Even after all the processes associated with the console login are gone, and I send Xorg a kill signal, the console remains frozen.

While it is *possible* that the machine worked just fine for years and then experienced catastrophic hardware failure the moment after I installed Linux, it is exceedingly unlikely. Therefore, the fact that the machine worked under Windows is information that may help in understanding the problem. Most types of hardware failure are going to cause trouble for both Windows and Linux.

actionparsnip (andrew-woodhead666) said :
#5

If you can ssh in, run:

dmesg

Post the output, along with the last few lines of /var/log/Xorg.0.log
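
Gathering that output from the ssh session might look roughly like the sketch below. The helper name xorg_problems is made up for illustration. It relies on the usual Xorg log convention of marking errors with (EE) and warnings with (WW), and it defaults to the log path given above.

```shell
# Hypothetical helper: show only error (EE) and warning (WW) lines from an Xorg log.
# Defaults to /var/log/Xorg.0.log; pass another file as the first argument.
xorg_problems() {
    grep -E '\((EE|WW)\)' "${1:-/var/log/Xorg.0.log}"
}

# Typical use from the ssh session (commented out so this block only defines the helper):
#   dmesg | tail -n 50                # most recent kernel messages
#   tail -n 50 /var/log/Xorg.0.log    # last few lines of the X server log
#   xorg_problems                     # errors and warnings only
```

The marker legend near the top of the log also mentions (WW) and (EE), so it will normally show up in the filtered output as well.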

actionparsnip (andrew-woodhead666) said :
#6

It's not exceedingly unlikely at all. It's a perfectly reasonable suggestion and it does happen. In fact, installing a new operating system puts great wear on the drive compared to regular use, so a failure is more likely. Think about it.

Adrian Mariano (adrign) said :
#7

It's not at all obvious to me that installing an OS will "put great wear" on the drive compared to regular use. It depends on how drives fail. Years ago, the thing that my systems administrator feared most was powering drives off, because apparently they would frequently fail when powered back on. I guess this was due to wear in the bearings, which happens independently of writing.

In any case, I ran 'e2fsck -c -c' on the disk. It did give the message "updating bad block inode" and then later "FILE SYSTEM WAS MODIFIED". But it didn't give information about how many bad blocks it found.
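
One possible way to get a count after such a run is dumpe2fs -b, which prints the blocks recorded as bad in the filesystem, one per line. The following is only a rough sketch: the helper name count_bad_blocks and the device path /dev/sdXN are placeholders, and it has to run as root to read the device.

```shell
# Hypothetical helper: count the entries in a filesystem's bad-block list.
# Must be run as root; the device path is passed as the first argument.
count_bad_blocks() {
    # dumpe2fs -b prints one bad block number per line on stdout.
    # Only lines starting with a digit are counted, so any banner text is ignored.
    dumpe2fs -b "$1" | grep -c '^[0-9]'
}

# Example (commented out; /dev/sdXN is a placeholder, not a real device name):
#   count_bad_blocks /dev/sdXN
```

A result of 0 would suggest that the scan did not record any bad blocks, even though the bad block inode was rewritten.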

It took quite a while for the system to crash again. (The previous time it crashed within a few minutes of being booted.) I did see in syslog the following:

Jul 2 08:07:11 babbage whoopsie[621]: [08:07:11] Parsing /var/crash/linux-image-3.19.0-21-generic.218508.crash.
Jul 2 08:07:11 babbage whoopsie[621]: [08:07:11] Uploading /var/crash/linux-image-3.19.0-21-generic.218508.crash.
Jul 2 08:07:11 babbage whoopsie[621]: [08:07:11] Sent; server replied with: Couldn't connect to server
Jul 2 08:07:11 babbage whoopsie[621]: [08:07:11] Response code: 0

It appears, actually, that it's making periodic unsuccessful attempts to upload this file. I wonder if it is a proxy configuration issue.
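
A quick way to see how often these attempts occur is to pull just the upload and failure lines out of syslog and compare their timestamps. This is a minimal sketch; the helper name whoopsie_attempts is made up, and /var/log/syslog is assumed to be the log location.

```shell
# Hypothetical helper: list whoopsie upload attempts and connection failures.
# Defaults to /var/log/syslog; pass another file as the first argument.
whoopsie_attempts() {
    grep -E "whoopsie.*(Uploading|Couldn't connect to server)" "${1:-/var/log/syslog}"
}

# Example (commented out): count the matching lines instead of listing them.
#   whoopsie_attempts | wc -l
```

If every "Uploading" line is followed by a "Couldn't connect to server" line, then none of the attempts is getting through.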

When it did finally crash I looked at Xorg.0.log and it showed no messages of note. I didn't see anything in syslog either. (The last message appears to be something about restarting the CUPS service.)

Launchpad Janitor (janitor) said :
#8

This question was expired because it remained in the 'Open' state without activity for the last 15 days.