Bug #71212 “System hangs when copying to NFS mounts” : Bugs : linux-source-2.6.17 package : Ubuntu

Ben Collins (ben-collins) on 2006-11-13

Changed in linux-source-2.6.17:
importance:	Undecided → High
status:	Unconfirmed → Confirmed

Matt Thrailkill (matt-modestolan) wrote on 2006-11-22:

#1

This happens to me also. Nothing shows up in any logs. It's not a full freeze; I can ssh in and do things, but something is messed up. A lot of things don't work. This needs to be fixed.

SixDays (oscar-dix) wrote on 2006-11-22:

#2

Same here.
I have tried ruling out the hardware by changing NICs and checking the hard drives in every way possible.
Tried copying using a Dapper live CD, and it works perfectly, which rules out a hardware malfunction.

Tried switching between nfs-kernel-server and nfs-user-server; same result. This bug really pisses me off, mostly since it renders the computer unusable when I copy files over NFS. Moving 10 gigs of data takes quite a while, so I think this is a rather urgent bug to fix.

I have not yet discovered whether this is specific to my P4 machine with an Intel motherboard or whether it is cross-hardware "compliant". I will test, and if the same error occurs on my other machines, I will post the details after this message.

Hardware: P4 @ 2.40 GHz, 768 MB DDR 2700, Xubuntu 6.10

SixDays (oscar-dix) wrote on 2006-11-22:

#3

Tried another machine running Xubuntu 6.10, but with an AMD Athlon XP 3000+ (Barton) CPU and 512 MB RAM.

It seems the same thing happens, and the receiving box described in my previous comment goes to 100% system load.

Using Midnight Commander, I tried to copy a DVD ISO image from the AMD machine to the Intel box.

After transferring 53%, I got an error message stating "File size exceeded." I accidentally lost the original message, so that is what I recall from memory.
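
The exact wording was lost, but "File size limit exceeded" is the standard description of the SIGXFSZ signal, which a process receives when it writes past its RLIMIT_FSIZE limit (the one set with `ulimit -f`). Whether that is what happened here is an assumption, not something established in this thread, but it is cheap to rule out. A minimal check in Python, run from the same shell that launches the copy so it inherits the same limits:

```python
import resource


def describe(limit):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


def file_size_limits():
    """Return (soft, hard) RLIMIT_FSIZE for this process as readable strings.

    A finite soft limit would make large writes fail with SIGXFSZ/EFBIG no
    matter which filesystem is involved; 'unlimited' rules this cause out.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return describe(soft), describe(hard)


if __name__ == "__main__":
    soft, hard = file_size_limits()
    print(f"RLIMIT_FSIZE soft={soft} hard={hard}")
```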

Surfraz Ahmed (surfraz) wrote on 2006-11-25:

#4

After upgrading one of my workstations to edgy I had major problems with NFS (/home drive mounted over NFS). Dapper workstations have not had any problems.

I was only able to resolve the problem by compiling the 2.6.18.2 kernel from kernel.org with the config from the edgy kernel.

If you have a terminal window open when the problem occurs, you may see some messages like those mentioned in bug #65827 (or type dmesg when the problem occurs). If so, can you mark this bug as a duplicate of #65827?

Thanks

Vanessa Dannenberg (vanessadannenberg) wrote on 2006-12-10:

#5

I am running Edgy as well and can confirm that this bug still exists while using the 2.6.19 kernel (compiled from kernel.org using a .config that has worked for me for months). I tried 2.6.18.3 and 2.6.18.1 also, but did not get the chance to confirm for certain whether the bug exists with those kernels. However, NFS did seem to perform more consistently there.

I can also confirm that scp'ing to the same server that's NFS-mounted does not induce hangs of any sort, and copies at a faster and more consistent speed as well. Using NFS: 4.5 MB/sec with a sawtooth-like variance in transfer speed (watching my network meter). Using SCP: 6.6+ MB/sec with a fairly steady transfer rate.
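
For a sense of scale, here is the arithmetic for the 10 GB copy mentioned in #2 at the two rates above. This is a sketch that assumes each rate stays constant for the whole copy and treats "MB" as MiB and "GB" as GiB:

```python
def transfer_minutes(size_gib, rate_mib_per_s):
    """Minutes to move size_gib GiB at a constant rate_mib_per_s MiB/s."""
    return size_gib * 1024 / rate_mib_per_s / 60


# Rates quoted above: NFS about 37.9 min, SCP about 25.9 min for 10 GiB.
for label, rate in [("NFS", 4.5), ("SCP", 6.6)]:
    print(f"{label}: {transfer_minutes(10, rate):.1f} min for 10 GiB at {rate} MiB/s")
```

That is a difference of roughly 12 minutes per 10 GiB, before counting any time lost to the hangs themselves.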

Please someone fix this bug soon!

Vanessa Dannenberg (vanessadannenberg) wrote on 2006-12-10:

#6

.config file for 'rainbird' using kernel 2.6.18.2 (34.4 KiB, text/plain)

Following up on my last post: I tried out 2.6.19-git15, which includes a patch that seems to have been intended to fix this bug, but that was no good. In fact, my machine locked up solid when the bug struck; not even Alt-SysRq would work, just the "magic" reset button.

I tried rolling back to 2.6.18.2 as per Surfraz's suggestion, but that was also a no-go. The bug exists for me in that version also.

I've attached my current .config for 2.6.18.2 in case there's some obscure driver in there that's interfering with NFS and causing the bug.

Vanessa Dannenberg (vanessadannenberg) wrote on 2006-12-10:

#7

More tests on my end: 2.6.18 (first release) on my box also exhibits this bug. 2.6.19 on my husband's box (nearly identical hardware and same distro/version) also has the bug. Going on something I found on the web, I tried mounting the filesystem via TCP (my server is configured to support it), but that does not help.

I have not been lucky enough to see the kernel messages mentioned in bug #65827 either in dmesg, my logs, the console, or the terminal doing the copy, and no other messages are generated either.

Vanessa Dannenberg (vanessadannenberg) wrote on 2006-12-10:

#8

Figured I'd continue testing and now it's getting interesting!

As with the last tests, I'm using fairly large (over 2GB) files and copying to normal NFS mounts.

There are three machines involved here. Rainbird (my box, Edgy, 2.6.19), Swan (husband, Edgy, 2.6.19), and Stork (our server, Breezy, 2.6.17).

As already mentioned, if I tell Rainbird to push a file to Stork, it hangs and the transfer rate is slow. Cancelling the copy is very difficult, taking anywhere from several seconds to a minute.

The same thing happens if I tell Rainbird to push a file to Swan, or if I tell Swan to push to Stork. Something interesting here however - watching Swan's network meter (wmnet) during those periods when Rainbird is hung, I can see that there is still data transfer - more in fact than when Rainbird is responding normally.

HOWEVER, if I log into Stork and instruct it to pull from Rainbird, it works fine - no hangs. The transfer averages 8.7 MB/sec and stays fairly smooth. The same thing happens if I instruct Swan to pull a file from Rainbird - no problems whatsoever.

So basically, it doesn't seem to matter which two machines are involved, as long as the data is being *pulled* from the source machine by the destination machine, rather than pushed from the source as would be the norm.

Vorik (launchpad-gerapeldoorn) wrote on 2006-12-12:

#9

Same issues here. Really annoying.

Surfraz Ahmed (surfraz) wrote on 2006-12-13:

#10

If it helps: when I was playing around with this, I found that switching to nfs-user-server on the server side stopped GNOME from locking up. To do this, make sure /etc/exports does not contain any options that nfs-user-server does not support, then run 'aptitude install nfs-user-server' and reboot. This is not a fix, just a workaround, but it solved the problem for me. All this makes me want to switch to Samba/CIFS... if only I could get it to automount on client logon...

Vanessa Dannenberg (vanessadannenberg) wrote on 2007-01-09:

#11

My two client machines, Swan and Rainbird, have since been updated to the full 2.6.19 release, while Stork still sits at 2.6.17. The problem still exists; however, there is something notable:

If I also update Stork (server) to 2.6.19, something new happens - the hangs seem to go away but the overall transfer rate is maybe 1.3MB/sec, no matter which way the copy goes or which machine does it (thus invalidating my previous "push" and "pull" tests).

Someone PLEASE FIX THIS BUG!

Paul Natsuo Kishimoto (khaeru) wrote on 2007-01-19:

#12

Test commands (510 bytes, text/plain)

I'm also experiencing this bug, between a server running edgy ubuntu-server with nfs-kernel-server installed, and a desktop running edgy ubuntu-desktop with nfs-common installed.

Whether I transfer the files using Nautilus or a terminal, transfer TO the NFS share hangs my desktop (completely but temporarily: Ctrl-Alt-Backspace will not work, but the desktop is entirely usable when the transfer finishes); transfer FROM the share proceeds at nearly the same speed but doesn't cause any noticeable slowdown of GNOME.

I've attached the results of some tests I found in the official NFS HOW-TO: http://nfs.sourceforge.net/nfs-howto/ar01s05.html. Again, the former freezes GNOME; the latter does not. I can't find any useful information in dmesg. The speeds are comparable to using SCP.
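
The HOW-TO tests referred to here time a large sequential write to the mount and a sequential read back from it (the HOW-TO uses dd). The attached commands are not reproduced in this thread, so the following is only a rough Python equivalent under stated assumptions: the target path is hypothetical, the 16 KiB block size and 16384-block count (256 MiB) are illustrative, and `os.fsync` is called so the write timing includes flushing data to the server. For a meaningful read figure, drop the client cache first (for example by unmounting and remounting); otherwise the read will likely be served from local memory.

```python
import os
import sys
import time

MIB = 1024 * 1024


def timed_write(path, block_size=16 * 1024, count=16384):
    """Write `count` zero-filled blocks to `path`, fsync, and return MiB/s."""
    block = bytes(block_size)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time to push dirty pages to the server
    elapsed = max(time.monotonic() - start, 1e-9)  # guard against a zero interval
    return block_size * count / MIB / elapsed


def timed_read(path, block_size=16 * 1024):
    """Read `path` sequentially in `block_size` chunks and return MiB/s."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = max(time.monotonic() - start, 1e-9)
    return total / MIB / elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # a file on the NFS mount (hypothetical path)
    print(f"write: {timed_write(target):.1f} MiB/s")
    # A real test would drop the client cache (e.g. remount) at this point.
    print(f"read:  {timed_read(target):.1f} MiB/s")
```

Usage would be something like `python3 nfs_speed.py /mnt/nfs/testfile`, where both the script name and the mount path are placeholders.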

Vanessa Dannenberg (vanessadannenberg) wrote on 2007-01-19:

#13

I've updated my server and one of my client machines to 2.6.20-rc5 just as a test, and there are some changes here. First of all, most of the hangs are gone, but not entirely. Don't be surprised to see the entire machine suddenly stop responding for a while; switching to a text console and back to X may cause the screen to go black for a while (in my case, about two minutes).

Here's the beginning of a "push" transfer (rainbird copying a file to NFS), where a nice large spike in network activity can be seen:

11:42:26.603523 IP (tos 0x0, ttl  64, id 1492, offset 0, flags [DF], proto: TCP (6), length: 192) 10.1.1.3.921034261 > 10.1.1.1.2049: 140 getattr [|nfs]
11:42:26.607668 IP (tos 0x0, ttl  64, id 23326, offset 0, flags [DF], proto: TCP (6), length: 168) 10.1.1.1.2049 > 10.1.1.3.921034261: reply ok 116 getattr [|nfs]
11:42:26.607694 IP (tos 0x0, ttl  64, id 1493, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.3.705 > 10.1.1.1.2049: ., cksum 0xd74c (correct), ack 12804 win 11470 <nop,nop,timestamp 9139730 757630>
11:42:26.608222 IP (tos 0x0, ttl  64, id 1494, offset 0, flags [DF], proto: TCP (6), length: 196) 10.1.1.3.937811477 > 10.1.1.1.2049: 144 access [|nfs]
11:42:26.613817 IP (tos 0x0, ttl  64, id 23327, offset 0, flags [DF], proto: TCP (6), length: 176) 10.1.1.1.2049 > 10.1.1.3.937811477: reply ok 124 access [|nfs]
11:42:26.614237 IP (tos 0x0, ttl  64, id 1495, offset 0, flags [DF], proto: TCP (6), length: 208) 10.1.1.3.954588693 > 10.1.1.1.2049: 156 getattr [|nfs]
11:42:26.615780 IP (tos 0x0, ttl  64, id 23328, offset 0, flags [DF], proto: TCP (6), length: 168) 10.1.1.1.2049 > 10.1.1.3.954588693: reply ok 116 getattr [|nfs]
11:42:26.616106 IP (tos 0x0, ttl  64, id 1496, offset 0, flags [DF], proto: TCP (6), length: 208) 10.1.1.3.971365909 > 10.1.1.1.2049: 156 getattr [|nfs]
11:42:26.619674 IP (tos 0x0, ttl  64, id 23329, offset 0, flags [DF], proto: TCP (6), length: 168) 10.1.1.1.2049 > 10.1.1.3.971365909: reply ok 116 getattr [|nfs]
11:42:26.619963 IP (tos 0x0, ttl  64, id 1497, offset 0, flags [DF], proto: TCP (6), length: 212) 10.1.1.3.988143125 > 10.1.1.1.2049: 160 access [|nfs]
11:42:26.623855 IP (tos 0x0, ttl  64, id 23330, offset 0, flags [DF], proto: TCP (6), length: 176) 10.1.1.1.2049 > 10.1.1.3.988143125: reply ok 124 access [|nfs]
11:42:26.624145 IP (tos 0x0, ttl  64, id 1498, offset 0, flags [DF], proto: TCP (6), length: 244) 10.1.1.3.1004920341 > 10.1.1.1.2049: 192 setattr [|nfs]
11:42:26.670359 IP (tos 0x0, ttl  64, id 23331, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.1.2049 > 10.1.1.3.705: ., cksum 0xc069 (correct), ack 892761 win 16022 <nop,nop,timestamp 757645 9139734>
11:42:26.809860 IP (tos 0x0, ttl  64, id 23332, offset 0, flags [DF], proto: TCP (6), length: 200) 10.1.1.1.2049 > 10.1.1.3.1004920341: reply ok 148 setattr [|nfs]

After a couple of minutes, here's something odd that comes up - normally during these tests I was getting occasional large spikes to 10+ MB/sec among an otherwise constant ~700 kB/sec stream, but this time around I got a fairly constant 2.5MB/sec or so:

11:43:33.378615 IP (tos 0x0, ttl  64, id 45505, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.0 > 10.1.1.1.2049: 1448 null
11:43:33.378621 IP (tos 0x0, ttl  64, id 45506, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.0 > 10.1.1.1.2049: 1448 null
11:43:33.378631 IP (tos 0x0, ttl  64, id 20927, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.1.2049 > 10.1.1.3.705: ., cksum 0x2e4a (correct), ack 158604625 win 16022 <nop,nop,timestamp 774315 9156416>
11:43:33.378634 IP (tos 0x0, ttl  64, id 45507, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.0 > 10.1.1.1.2049: 1448 null
11:43:33.378640 IP (tos 0x0, ttl  64, id 45508, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.0 > 10.1.1.1.2049: 1448 null
11:43:33.378648 IP (tos 0x0, ttl  64, id 20928, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.1.2049 > 10.1.1.3.705: ., cksum 0x22fa (correct), ack 158607521 win 16022 <nop,nop,timestamp 774315 9156416>
11:43:33.378657 IP (tos 0x0, ttl  64, id 20929, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.1.2049 > 10.1.1.3.705: ., cksum 0x17aa (correct), ack 158610417 win 16022 <nop,nop,timestamp 774315 9156416>
11:43:33.378755 IP (tos 0x0, ttl  64, id 45509, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.0 > 10.1.1.1.2049: 1448 null

(what's with all the NULL responses?)
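
A likely explanation for the NULL entries, offered as an assumption rather than something confirmed in this thread: for NFS over TCP, tcpdump prints the RPC XID where the client port would normally go (hence `10.1.1.3.0`), and it decodes each TCP segment as though it began with an RPC record mark and call header. A large WRITE spans many 1448-byte segments, and only the first carries a real header; in the rest, whatever file data sits at those offsets is read as the XID and procedure number. Zero bytes decode as XID 0 and procedure 0, which is NFS NULL. The Python sketch below illustrates this with RPC record marking as defined in RFC 5531: a 4-byte mark whose high bit flags the last fragment and whose low 31 bits give the fragment length. The truncated header and the zero-filled payload are illustrative simplifications.

```python
import struct

NFS_PROGRAM = 100003   # ONC RPC program number assigned to NFS
RPC_CALL = 0           # msg_type for a call (RFC 5531)
NFSPROC3_WRITE = 7     # NFSv3 procedure number for WRITE (NULL is 0)
SEGMENT = 1448         # TCP payload per 1500-byte packet in the traces above


def build_call_record(xid, proc, payload):
    """Return one record-marked RPC call: 4-byte mark + header + payload.

    The header stops after the procedure number (credentials, verifier and
    NFS arguments are omitted), a simplification for illustration only.
    """
    body = struct.pack(">6I", xid, RPC_CALL, 2, NFS_PROGRAM, 3, proc) + payload
    mark = 0x80000000 | len(body)  # high bit: last fragment; low 31 bits: length
    return struct.pack(">I", mark) + body


def naive_decode(segment):
    """Decode a segment as if it began a new record: (xid, msg_type, proc)."""
    _mark, xid, msg_type, _rpcvers, _prog, _vers, proc = struct.unpack(">7I", segment[:28])
    return xid, msg_type, proc


def reassemble(stream):
    """Follow record marks across segment boundaries and return whole records."""
    records, current, pos = [], b"", 0
    while pos + 4 <= len(stream):
        (mark,) = struct.unpack(">I", stream[pos:pos + 4])
        length = mark & 0x7FFFFFFF
        current += stream[pos + 4:pos + 4 + length]
        pos += 4 + length
        if mark & 0x80000000:  # last fragment of this record
            records.append(current)
            current = b""
    return records


# A WRITE whose data happens to be zeros, cut into TCP-sized segments.
record = build_call_record(xid=12345, proc=NFSPROC3_WRITE, payload=bytes(4000))
segments = [record[i:i + SEGMENT] for i in range(0, len(record), SEGMENT)]

print(naive_decode(segments[0]))              # (12345, 0, 7): real header, WRITE
print(naive_decode(segments[1]))              # (0, 0, 0): mid-record zeros look like NULL
print(len(reassemble(b"".join(segments))))   # 1: a single record once reassembled
```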

The tcpdump output stayed mostly like that for this particular test, and here's how the copy ended (it took a good couple of minutes for my attempt to cancel it to actually work):

11:45:14.015252 IP (tos 0x0, ttl  64, id 19770, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.3.705 > 10.1.1.1.2049: ., cksum 0xe68f (correct), ack 869296 win 12918 <nop,nop,timestamp 9181580 799472>
11:45:14.019288 IP (tos 0x0, ttl  64, id 54539, offset 0, flags [DF], proto: TCP (6), length: 184) 10.1.1.1.2049 > 10.1.1.3.2219564821: reply ok 132 commit [|nfs]
11:45:14.019309 IP (tos 0x0, ttl  64, id 54540, offset 0, flags [DF], proto: TCP (6), length: 184) 10.1.1.1.2049 > 10.1.1.3.2236342037: reply ok 132 commit [|nfs]
11:45:14.019316 IP (tos 0x0, ttl  64, id 19771, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.3.705 > 10.1.1.1.2049: ., cksum 0xe585 (correct), ack 869560 win 12918 <nop,nop,timestamp 9181581 799473>
11:45:14.099364 IP (tos 0x0, ttl  64, id 54541, offset 0, flags [DF], proto: TCP (6), length: 192) 10.1.1.1.2049 > 10.1.1.3.776724245: reply ok 140
11:45:14.099465 IP (tos 0x0, ttl  64, id 54542, offset 0, flags [DF], proto: TCP (6), length: 192) 10.1.1.1.2049 > 10.1.1.3.2102124309: reply ok 140 write [|nfs]
11:45:14.099474 IP (tos 0x0, ttl  64, id 19772, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.3.705 > 10.1.1.1.2049: ., cksum 0xe442 (correct), ack 869840 win 12918 <nop,nop,timestamp 9181601 799496>
11:45:14.100076 IP (tos 0x0, ttl  64, id 19773, offset 0, flags [DF], proto: TCP (6), length: 220) 10.1.1.3.2253119253 > 10.1.1.1.2049: 168 commit [|nfs]
11:45:14.103363 IP (tos 0x0, ttl  64, id 54543, offset 0, flags [DF], proto: TCP (6), length: 184) 10.1.1.1.2049 > 10.1.1.3.2253119253: reply ok 132 commit [|nfs]
11:45:14.104357 IP (tos 0x0, ttl  64, id 19774, offset 0, flags [DF], proto: TCP (6), length: 208) 10.1.1.3.2269896469 > 10.1.1.1.2049: 156 getattr [|nfs]
11:45:14.107253 IP (tos 0x0, ttl  64, id 54544, offset 0, flags [DF], proto: TCP (6), length: 168) 10.1.1.1.2049 > 10.1.1.3.2269896469: reply ok 116 getattr [|nfs]
11:45:14.149906 IP (tos 0x0, ttl  64, id 19775, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.3.705 > 10.1.1.1.2049: ., cksum 0xe1f8 (correct), ack 870088 win 12918 <nop,nop,timestamp 9181613 799498>

A "pull" request on my server (pulling the same file from NFS to a local disk), showed a continuous stream of messages like this (tcpdump running on the client as before):

11:48:48.310652 IP (tos 0x0, ttl  64, id 18314, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448
11:48:48.310774 IP (tos 0x0, ttl  64, id 18315, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448
11:48:48.310897 IP (tos 0x0, ttl  64, id 18316, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448
11:48:48.311020 IP (tos 0x0, ttl  64, id 18317, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448
11:48:48.314499 IP (tos 0x0, ttl  64, id 60418, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.1.705 > 10.1.1.3.2049: ., cksum 0x1d76 (correct), ack 30907177 win 24576 <nop,nop,timestamp 853033 9235142>
11:48:48.314507 IP (tos 0x0, ttl  64, id 18318, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448
11:48:48.314512 IP (tos 0x0, ttl  64, id 18319, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448
11:48:48.314525 IP (tos 0x0, ttl  64, id 60419, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.1.705 > 10.1.1.3.2049: ., cksum 0x1226 (correct), ack 30910073 win 24576 <nop,nop,timestamp 853033 9235142>
11:48:48.314529 IP (tos 0x0, ttl  64, id 18320, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448
11:48:48.314535 IP (tos 0x0, ttl  64, id 18321, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448
11:48:48.314544 IP (tos 0x0, ttl  64, id 60420, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.1.705 > 10.1.1.3.2049: ., cksum 0x06d4 (correct), ack 30912969 win 24576 <nop,nop,timestamp 853034 9235143>
11:48:48.314552 IP (tos 0x0, ttl  64, id 60421, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.1.1.705 > 10.1.1.3.2049: ., cksum 0xfb83 (correct), ack 30915865 win 24576 <nop,nop,timestamp 853034 9235143>
11:48:48.314648 IP (tos 0x0, ttl  64, id 18322, offset 0, flags [DF], proto: TCP (6), length: 1500) 10.1.1.3.2049 > 10.1.1.1.0: reply ok 1448

I might point out that this time the transfer was moving along at a fairly slow pace, about 2.8 MB/sec, which is odd: earlier in the tests I was doing with this kernel, I was able to max out my server's disk bandwidth (about 10 MB/sec) doing "pull" requests. Anyway, cancelling the copy worked immediately, as it should, and only a few seconds went by before tcpdump stopped spewing (guess it was flushing a cache?).
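
The rates quoted in this thread come from a network meter. As a rough cross-check (a sketch, not a tool anyone here used), the pure-ACK lines in a `tcpdump -v` capture can be differenced: the change in the ACK number over the elapsed time approximates how fast the receiver took in data. This assumes the line format shown above, ACKs from one direction only, no sequence-number wraparound, and a capture that does not cross midnight. The few lines quoted here span only microseconds, so a rate computed from them alone would be meaningless; a capture of several seconds is needed.

```python
import re
import sys
from datetime import datetime

# Timestamp and ACK number of a pure-ACK line in `tcpdump -v` output.
ACK_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) IP .*, ack (\d+) win ")


def acked_bytes(lines, direction=None):
    """Return (bytes acknowledged, elapsed seconds) between the first and last ACK.

    `direction` is an optional substring such as "10.1.1.1.705 > 10.1.1.3.2049"
    that keeps ACKs from one side of the connection only. Returns None if
    fewer than two matching lines are found.
    """
    samples = []
    for line in lines:
        if direction and direction not in line:
            continue
        m = ACK_LINE.match(line)
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            samples.append((t, int(m.group(2))))
    if len(samples) < 2:
        return None
    (t0, a0), (t1, a1) = samples[0], samples[-1]
    return a1 - a0, (t1 - t0).total_seconds()


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as capture:
        result = acked_bytes(capture, sys.argv[2] if len(sys.argv) > 2 else None)
    if result and result[1] > 0:
        acked, seconds = result
        print(f"{acked / seconds / 1024 / 1024:.2f} MiB/s over {seconds:.3f} s")
```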

DavidM (dmccullo) wrote on 2007-02-07:

#14

Confirming the bug. As a workaround, I installed Feisty kernel:

2.6.20-6-generic #2 SMP Wed Jan 31 20:53:39 UTC 2007 i686 GNU/Linux

With the caveat that I haven't done much testing, this kernel seems to fix the problem. The machine is a MythTV backend/frontend, so had to rebuild nvidia modules; ivtv still works; can't get lirc rebuilt...

BTW, I encountered the bug because of an NFS-mounted share that stored a music collection. When ripping CDs from an NFS client, the mount would hang after copying approximately 4GB. The NFS server would then lose its network connection.

Many thanks to all who contributed to this bug report. This was/is very frustrating and I could not have determined the problem without you folks.

sammiam (sammh) wrote on 2007-02-20:

#15

I'm experiencing the same thing, using the Feisty Fawn Live CD. I NFS-mount from my backup machine and start doing a "cp -dpR" over to my main system. Sometimes it works just fine; other times it freezes the machine. No symptoms or messages in the log. I'm using the kernel: Linux version 2.6.20-8-generic (root@vernadsky) (gcc version 4.1.2 20070129 (prerelease) (Ubuntu 4.1.1-31ubuntu2)) #2 SMP Tue Feb 13 05:18:42 UTC 2007

sammiam (sammh) wrote on 2007-02-20:

#16

As a follow-up, it looks like my problem is solved. I was using a DFE-530TX+ adapter, which uses the 8139 driver. I switched to the internal Intel Ethernet adapter that comes on my motherboard, and all looks well. What tipped me off was that I retried the operations I was doing, but instead of using NFS, I sftp'd from my source machine over to my machine. The machine froze in the middle of a huge file. Searching for 8139, I found that machine lock-ups with this driver are a known problem.
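
This traces the hangs to the NIC driver rather than to NFS itself. To see which kernel driver each interface is actually using (for instance, whether an RTL8139-based card is bound to 8139too), one option is to read the `driver` symlink that sysfs exposes under `/sys/class/net/<iface>/device/`. A minimal sketch, assuming sysfs is mounted at `/sys`; `ethtool -i <iface>` and `lspci -k` report the same information.

```python
import os

SYS_NET = "/sys/class/net"


def driver_for(iface):
    """Return the kernel driver bound to `iface`, or None if it has no device link."""
    link = os.path.join(SYS_NET, iface, "device", "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:  # virtual interfaces (lo, bridges) or unknown names
        return None


def all_drivers():
    """Map every interface listed in sysfs to its driver (or None)."""
    if not os.path.isdir(SYS_NET):
        return {}
    return {iface: driver_for(iface) for iface in sorted(os.listdir(SYS_NET))}


if __name__ == "__main__":
    for iface, driver in all_drivers().items():
        print(f"{iface}: {driver or '(no device)'}")
```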

Brian Murray (brian-murray) wrote on 2007-12-12:

#17

I am assigning this bug to the 'ubuntu-kernel-team' per their bug policy. For future reference you can learn more about their bug policy at https://wiki.ubuntu.com/KernelTeamBugPolicies .

Changed in linux-source-2.6.17:
assignee:	nobody → ubuntu-kernel-team
milestone:	edgy-updates → none

Nick Fishman (bsdlogical) wrote on 2008-01-05:

#18

I encountered the exact same problem, and just like sammiam wrote, it was the DFE-530TX+ adapter that caused it. When I switched to using an onboard network interface on a server, NFS worked like a charm.

I was using Gutsy with the 2.6.22-14-generic kernel, by the way, so the problem with the 8139 driver is very much still alive.

John Nilsson (john-milsson) wrote on 2008-02-24:

#19

I'm also experiencing this problem.

Server: A "Popcorn Hour A-100" using firmware 01-15-080123-14-POP-402 with apps 00-15-080116-14-POP-402

Client:
Ubuntu Gutsy Gibbon
with the following mount options:
192.168.0.11:/share /net/pha100 nfs rw,rsize=4096,wsize=4096,hard,intr,user,noauto 0 0
(changed from negotiated 32768 to 4096 to see if that would improve multitasking somewhat)

Client NIC:
04:06.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
with 3c59x driver from 2.6.22-14-generic

(My system is configured to use this NIC for both the public IP (eth0) and the local IP (eth0:1), so all traffic between the LAN and the Internet both comes in and goes out through the same NIC, with this machine acting as firewall/NAT.)
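
Lowering rsize/wsize from the negotiated 32768 to 4096 means each WRITE call carries at most 4 KiB, so the client has to issue many more calls for the same amount of data. A rough count in Python, assuming every WRITE is filled to wsize (so the result is a lower bound) and using a hypothetical 1 GiB file; the option parsing handles only the simple comma-separated `key=value` form used in the fstab line above.

```python
FSTAB_LINE = ("192.168.0.11:/share /net/pha100 nfs "
              "rw,rsize=4096,wsize=4096,hard,intr,user,noauto 0 0")


def mount_options(fstab_line):
    """Parse the fourth fstab field into a dict; bare flags map to True."""
    options = {}
    for item in fstab_line.split()[3].split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def min_write_calls(file_size, wsize):
    """Smallest number of WRITE calls for file_size bytes at wsize bytes each."""
    return -(-file_size // wsize)  # ceiling division on integers


opts = mount_options(FSTAB_LINE)
one_gib = 1024 ** 3  # hypothetical file size for comparison
print(min_write_calls(one_gib, int(opts["wsize"])))  # 262144 with wsize=4096
print(min_write_calls(one_gib, 32768))               # 32768 with the negotiated size
```

Eight times as many round trips for the same data may cost throughput; whether it actually helps interactivity on a given system is exactly what the experiment above was testing.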

Leann Ogasawara (leannogasawara) wrote on 2008-03-05:

#20

The Hardy Heron Alpha series was recently released which contains an updated version of the kernel. You can download and try the new Hardy Heron Alpha release from http://cdimage.ubuntu.com/releases/hardy/ . You should be able to then test the new kernel via the LiveCD. If you can, please verify if this bug still exists or not and report back your results. General information regarding the release can also be found here: http://www.ubuntu.com/testing/ .

Also note we'll keep this report open against the actively developed kernel but against 2.6.17 this will be closed. Thanks.

Changed in linux:
status:	New → Incomplete
Changed in linux-source-2.6.17:
status:	Confirmed → Won't Fix

John Nilsson (john-milsson) wrote on 2008-04-28:

#21

I am now running Hardy Heron. Since my last post I've also bought a router, so my Ubuntu box no longer acts as router/firewall; it's purely a client with one internal IP now.

The symptom is no longer that the entire system becomes unresponsive; now it's only Nautilus that stops responding. It doesn't redraw the desktop icons, and it's not possible to open new Nautilus windows.

Leann Ogasawara (leannogasawara) wrote on 2008-08-28:

#22

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Vanessa Dannenberg (vanessadannenberg) wrote on 2008-08-30:

#23

This seems to work fine for me under Hardy with the 2.6.24-19 kernel - no hangs of any kind.

Markus Korn (thekorn) wrote on 2008-10-07:

#24

Marking as 'Fix Released' based on the last comment.
If this is still an issue with this most recent release please feel free to reopen this report. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New".

Thanks,
Markus

Changed in linux:
status:	Incomplete → Fix Released

Richp (rich-parkin-home) wrote on 2008-10-31:

#25

I have just done a default install of Ibex and have had the same issue. When copying a 4.1 GB file to an NFS drive, the desktop froze. I couldn't open Nautilus, but apps like System Monitor worked fine. I have to wait until the copy completes, and then the desktop is fine. Here is a section of my logs while the copy was taking place:

Oct 31 11:54:35 rich-desktop kernel: [14220.292919] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x2)
Oct 31 11:54:35 rich-desktop kernel: [14220.292952] ata1: soft resetting link
Oct 31 11:54:35 rich-desktop kernel: [14220.502299] ata1.00: configured for UDMA/100
Oct 31 11:54:35 rich-desktop kernel: [14220.502320] ata1: EH complete
Oct 31 11:54:35 rich-desktop kernel: [14220.516137] sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 31 11:54:35 rich-desktop kernel: [14220.516155] sd 0:0:0:0: [sda] Write Protect is off
Oct 31 11:54:35 rich-desktop kernel: [14220.516184] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Regards
Richard

Launchpad Janitor (janitor) wrote on 2008-12-23: Kernel team bugs

#26

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

bananenkasper (bananenkasper) wrote on 2015-10-31:

#27

Still the same problem, after all these years.

DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=17.2
DISTRIB_CODENAME=rafaela
DISTRIB_DESCRIPTION="Linux Mint 17.2 Rafaela"

Linux 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Affects		Status	Importance	Assigned to	Milestone
	linux (Ubuntu)	Fix Released	Undecided	Unassigned
	linux-source-2.6.17 (Ubuntu)	Won't Fix	High	Unassigned

Ubuntu
linux-source-2.6.17 package

System hangs when copying to NFS mounts
