Diskless system hangs during boot (Karmic)

Asked by Cavendish McKay

I have a cluster that I am trying to upgrade to Karmic. The master node works perfectly, but the (diskless) compute nodes fail to boot. The failure appears to happen while the system is trying to mount its filesystems. The root filesystem (/) is a RAM disk, while /usr and /home are mounted via NFS.

/etc/fstab contains:
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/ram0 / ext2 noatime 0 1
proc /proc proc defaults 0 0
192.168.0.100:/usr /usr nfs hard,intr,ro 0 0
192.168.0.100:/home /home nfs hard,intr,rw 0 0

and the console output just before the hang is:
[ 12.270005] IP-Config: Got DHCP answer from 192.168.0.100, my address is 192.168.0.15
[ 12.444064] IP-Config: Complete:
[ 12.462325] device=eth0, addr=192.168.0.15, mask=255.255.255.0, gw=255.255.255.255,
[ 12.510429] host=node15, domain=nahuatl.marietta.edu, nis-domain=(none),
[ 12.553176] bootserver=192.168.0.100, rootserver=192.168.0.100, rootpath=
[ 12.598183] RAMDISK: gzip image found at block 0
[ 13.112185] VFS: Mounted root (ext2 filesystem) on device 1:0.
[ 13.147137] Freeing unused kernel memory: 524k freed
[ 13.177025] Write protecting the kernel read-only data: 4968k
init: procps main process (110) terminated with status 255
One or more of the mounts listed in /etc/fstab cannot yet be mounted:
(ESC for recovery shell)
/usr: waiting for 192.168.0.100:/usr
/home: waiting for 192.168.0.100:/home

I'm hoping that the problem is a simple configuration error on my part. I have read a bit about race conditions between statd and portmap, but since I don't get any error messages of that kind, I'm not convinced that is what is happening here. The "init: procps main process (110) terminated with status 255" line looks like a likely marker of the problem to me, but I can't figure out what would be causing init/upstart to die.

I'm kind of at a loss as to how to go about debugging this further. I've been fiddling with the scripts in /etc/init on the ram disk image, in the hopes that the problem is upstart related and I can fix it there, but I don't feel like I'm going about it in a particularly productive or systematic way. The upstart scripts in /etc/init are clearly getting run (or at least started), because I can put echo statements in them which then appear on the console.

Any pointers on how to better diagnose the problem (or even better, to fix it) would be greatly appreciated.

Question information

Language:
English
Status:
Answered
For:
Ubuntu util-linux
Assignee:
No assignee
Philip Muškovac (yofel) said:
#1

There is a known issue with mountall hanging on mounts that cannot be mounted (you should get a prompt on the splash screen offering to skip the mount, drop to a shell, or continue waiting). Can you see if adding '_netdev' (the leading '_' is intentional) to the NFS mount options helps? That should make the system wait until the network is set up before trying to mount the NFS filesystems.
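For the two NFS lines in the fstab above, a hand edit is the simplest way to try this. The sketch below just makes the suggested change concrete: it appends `_netdev` to the options field of every `nfs`/`nfs4` entry and leaves comments and other entries untouched. The function name `add_netdev` is hypothetical, and whether `_netdev` actually cures the hang on these nodes is the suggestion above, not something confirmed here. Note that the file to change is the /etc/fstab inside the RAM disk image, since that is the root filesystem the nodes boot from.

```python
def add_netdev(fstab_text: str) -> str:
    """Return fstab_text with '_netdev' added to the options of NFS entries.

    Hypothetical helper: comment lines, blank lines and non-NFS entries are
    passed through unchanged. Edited NFS lines are re-joined with single
    spaces, so any column alignment on those lines is lost.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields: <file system> <mount point> <type> <options> <dump> <pass>
        is_comment = line.lstrip().startswith("#")
        if is_comment or len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            out.append(line)
            continue
        opts = fields[3].split(",")
        if "_netdev" not in opts:  # keep the edit idempotent
            opts.append("_netdev")
        fields[3] = ",".join(opts)
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"


# The fstab from the question.
FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/ram0 / ext2 noatime 0 1
proc /proc proc defaults 0 0
192.168.0.100:/usr /usr nfs hard,intr,ro 0 0
192.168.0.100:/home /home nfs hard,intr,rw 0 0
"""

print(add_netdev(FSTAB), end="")
```

Run against the fstab from the question, only the two NFS lines change, becoming `192.168.0.100:/usr /usr nfs hard,intr,ro,_netdev 0 0` and `192.168.0.100:/home /home nfs hard,intr,rw,_netdev 0 0`. Running it a second time makes no further changes.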
