release a socket stuck in CLOSE_WAIT?

Asked by Bogdan Butnaru

Hi! A bit of a technical issue: I'm running Azureus, and a weird thing happened. Azureus crashed -- not unusual, but this time it was (I think) because I was messing with the network at the moment, and I was running Azureus through X over SSH. Probably not very important for my question.

Anyway, the problem is different: Azureus left a lot of connections occupied (from localhost:6880 to localhost:[various 5-digit ports], over tcp6 -- not sure why, I use IPv4 for internet). The connections are in the CLOSE_WAIT state, which is weird because the process itself doesn't exist anymore. As far as I know, only the program can close a connection in that state, or a _very_ large timeout (hours or days, I think).

I'm looking for a way to close them without restarting the computer. (I even tried "networking stop", they're still there when the network restarts.)

It's annoying because Azureus apparently tries to open those ports, notices they're in use, assumes it's still running, and closes with a "passing startup args to already-running Azureus java process listening on [127.0.0.1: 6880]" --- which of course does nothing, as the process doesn't exist anymore.

Any pointers?

Question information

Language:
English
Status:
Solved
For:
Ubuntu
Assignee:
No assignee
Solved by:
Steve Dodd
Best Steve Dodd (anarchetic) said :
#1

Have you tried the -p option to netstat (e.g. netstat -tp)? There might be a process left running, forked from the original (or perhaps a thread.) fuser(1) might help too, though from my experiments here it looks a bit buggy. If there's really no process associated with a socket in CLOSE_WAIT, I think that's a kernel bug - sockets are supposed to be closed on process termination, even unclean termination.
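For anyone who wants to cross-check what netstat -tp reports, here is a minimal sketch (Linux only) that reads /proc/net/tcp and /proc/net/tcp6 directly, picks out sockets in CLOSE_WAIT (state code 08 in those files), and looks for a process holding each socket's inode under /proc/<pid>/fd. The function names parse_proc_net and pids_holding are made up for this example; run it as root if the owner might be another user's process.

```python
import glob
import os

CLOSE_WAIT = "08"  # hex state code for CLOSE_WAIT in /proc/net/tcp and tcp6


def parse_proc_net(text, state=CLOSE_WAIT):
    """Return (local_port, remote_port, inode) for each socket in `state`."""
    found = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10 or fields[3] != state:
            continue
        # Addresses look like HEXADDR:HEXPORT; only the port is decoded here.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        found.append((local_port, remote_port, fields[9]))
    return found


def pids_holding(inode):
    """Return PIDs that have an open fd pointing at socket:[inode]."""
    target = "socket:[%s]" % inode
    pids = set()
    for fd in glob.glob("/proc/[0-9]*/fd/*"):
        try:
            if os.readlink(fd) == target:
                pids.add(int(fd.split("/")[2]))
        except OSError:
            continue  # process exited, or we lack permission to look
    return sorted(pids)


if __name__ == "__main__":
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue
        for lport, rport, inode in parse_proc_net(text):
            owners = pids_holding(inode) or "no process found"
            print(path, lport, rport, "inode", inode, owners)
```

If this prints "no process found" for a CLOSE_WAIT socket even when run as root, that matches the "no process attached" case discussed here.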

Bogdan Butnaru (bogdanb) said :
#2

Yes, of course. It didn't show anything in the process column. This is weird in itself: I still get connections with no process attached that are in various _WAIT states. (Right now, TIME_WAIT on :6010, might be Azureus again.)

Steve Dodd (anarchetic) said :
#3

Very odd. Have you tried running it as root (sudo -s netstat -tp), in case the process is running as a different user?

TIME_WAIT is different, and not a problem. Because a connection is uniquely identified by a particular (host1addr,host1port,host2addr,host2port) tuple, and that tuple could in theory be reused, TIME_WAIT is used to stop reuse of that tuple for a period of time (twice the maximum segment lifetime: 4 minutes going by the RFC's figures, 60 seconds on Linux), to give any duplicate packets still out on the internet time to find their way home and be gracefully discarded, rather than confusing a new connection.
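To make the difference between the two states concrete, here is a small sketch, assuming nothing beyond Python's standard socket module and a working loopback interface. It opens a connection to itself, has one end close first, and notes in comments which TCP state each end should be in at each step. The function name demo is made up for the example; while it is paused at any step, netstat -tn in another terminal should show the states.

```python
import socket


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    server_side, _ = listener.accept()

    # The (addr, port, addr, port) tuple that identifies this connection.
    conn_id = client.getsockname() + client.getpeername()

    # Active close: the client sends FIN and moves through FIN_WAIT.
    client.close()

    # An empty read means the FIN arrived. server_side is now in CLOSE_WAIT
    # and stays there until this process closes the socket or exits.
    data = server_side.recv(1)

    # Passive close: our FIN goes out, and the end that closed first
    # (the client's) sits in TIME_WAIT for a while afterwards.
    server_side.close()
    listener.close()
    return conn_id, data


if __name__ == "__main__":
    conn_id, data = demo()
    print("connection tuple:", conn_id)
    print("read after peer close:", data)
```

The point of the sketch is that CLOSE_WAIT belongs to the end that has not yet called close(), while TIME_WAIT belongs to the end that closed first and is only waiting out a timer.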

Alternative explanation:

http://www.port80software.com/200ok/archive/2004/12/07/205.aspx

If you're interested in TCP guts, the Wikipedia article looks good, and also links to the original RFC:

http://en.wikipedia.org/wiki/Transmission_Control_Protocol

Still not sure about your CLOSE_WAIT issue, though. If you really can't find a process associated with it, I'd be tempted to raise it on the linux-kernel mailing list.

BTW, sorry I closed your request last time - hit the wrong button!

Bogdan Butnaru (bogdanb) said :
#4

I didn't think to run it with sudo the first time. I'll try to reproduce the crash, but I doubt I'll be able to.

Still, I started Azureus from the same account I did the check from. And I checked with ps -Af, I'm sure there was no Azureus or Java process active at the time.

Bogdan Butnaru (bogdanb) said :
#5

Thanks Steve Dodd, that solved my question.

Bogdan Butnaru (bogdanb) said :
#6

I can't reproduce it. I tried various kinds of network failures, but it seems it won't happen again. I _did_ get a CLOSE_WAIT, but it disappeared much quicker, and it was on a different port. It was sshd, probably a child process. It's likely it was ssh the first time, too, as I was running Azureus over remote X. Thanks anyway!