Strange openjdk hang in FUTEX_WAIT

Bug #309407 reported by Pantelis Koukousoulas
This bug affects 1 person
Affects: openjdk-6 (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Pantelis Koukousoulas

Bug Description

Best way to reproduce:

1) Go to my PPA: http://launchpad.net/~pktoss/+archive

2) Copy the eclipse 3.4.1-0~pkt2 package to your own PPA, or download it to an intrepid machine

3) Start a build: either amd64 or i386 will reproduce it :(
   - or, on a local machine: cd eclipse-3.4.1 && debuild

4) Wait 40-50 minutes (hey, this is eclipse we are talking about :)

5) Observe it hang at "Generate X", where X is between 1 and 5

In reality, for every value of X it is hanging inside a Java application that generates metadata (a different
app for each value of X). It is hanging in futex(..., FUTEX_WAIT, ...), as an strace will convince you.

A potentially interesting fact is that the "val" in the above call is always PID+1, i.e., if the PID of the
hung java process is 5000, the call will look like futex(<an_addr>, FUTEX_WAIT, 5001, NULL, ...)
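
To check the PID+1 observation against captured strace output without eyeballing it, something like the following might help. It is only a sketch: the function names and the sample address are made up for illustration, and the regular expression assumes strace prints the call in the form quoted above.

```python
import re

# Matches the start of a futex() call as strace prints it, e.g.
#   futex(0x7f3c2a1b59d0, FUTEX_WAIT, 5001, NULL
# (the address here is a made-up example, not taken from the report).
FUTEX_WAIT_RE = re.compile(
    r"futex\((0x[0-9a-fA-F]+),\s*FUTEX_WAIT(?:_PRIVATE)?,\s*(\d+),"
)


def futex_wait_val(strace_line):
    """Return the 'val' argument of a FUTEX_WAIT call, or None if no match."""
    match = FUTEX_WAIT_RE.search(strace_line)
    return int(match.group(2)) if match else None


def val_is_pid_plus_one(strace_line, pid):
    """True if the line is a FUTEX_WAIT call whose val equals pid + 1."""
    val = futex_wait_val(strace_line)
    return val is not None and val == pid + 1
```

For the example in the description, val_is_pid_plus_one("futex(0x7f3c2a1b59d0, FUTEX_WAIT, 5001, NULL", 5000) would be true.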

Unfortunately you won't be able to (at least I couldn't) reproduce it by running just the app,
or even by running just the install.sh script in debian/scripts that contains this command.

You have to run the full "debuild" for the bug to appear :(

Another perhaps useful fact is that the package build completes fine in Debian sid, which has
a slightly older openjdk (b11 instead of b12 in intrepid).

The bug has been reproduced in the following kernel/arch configurations:

       * 2.6.27/amd64 (latest intrepid kernel) inside a KVM VM
       * 2.6.28-rc8/amd64, slightly customized (small, trivial one-liner network card bugfixes), on a physical machine
       * 2.6.21/i386 (an EC2 node)
       * Whatever the autobuilders run, on both i386 and amd64

In a Debian sid chroot on the customized 2.6.28-rc8/amd64 machine, "debuild" has succeeded every time so
far.

The problem is of course that debian sid also has a different libc ;-)

Unfortunately, I don't have the time to completely debug this (e.g., one might want to know which files/streams
the hung process has open, etc.) and I also have no familiarity with the openjdk internals.
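
For anyone who does want to see which files/streams the hung process has open, one possible approach on Linux is to read the symlinks under /proc/<pid>/fd. The helper below is a hypothetical sketch (the name open_files is not from any existing tool), and it needs permission to read the target process's /proc entry.

```python
import os


def open_files(pid):
    """Map each open file descriptor of `pid` to the path it points at.

    Linux-only: relies on the /proc/<pid>/fd directory of symlinks.
    """
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result
```

Calling open_files(5000) for a hung java process with PID 5000 would return something like a dict of descriptor numbers to file paths, pipes and sockets.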

So I'm filing this in case anyone is interested in taking a look, and I will try to "hack around" it in the build
until there is a "proper" solution.

Thanks

Pantelis Koukousoulas (pktoss) wrote:

I wonder if this is related to the various reported "similar" hangs associated with ATK being enabled.

Pantelis Koukousoulas (pktoss) wrote:

Nah, it seems I didn't mention the difference that actually matters. It seems that running under fakeroot is the real problem.

Still not sure if eclipse or java is to blame though
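
A quick way to tell whether a given process environment looks like it is running under fakeroot is sketched below. It assumes that fakeroot exports a FAKEROOTKEY variable and preloads a libfakeroot shared object through LD_PRELOAD; both are assumptions about how fakeroot is implemented, so verify them against the installed version. The function name is hypothetical.

```python
import os


def looks_like_fakeroot(env=None):
    """Heuristic: does this environment look like a fakeroot session?

    Assumes fakeroot sets FAKEROOTKEY and adds a libfakeroot library
    to LD_PRELOAD; treat a False result as "probably not", not proof.
    """
    env = os.environ if env is None else env
    if env.get("FAKEROOTKEY"):
        return True
    return "libfakeroot" in env.get("LD_PRELOAD", "")
```

Called with no arguments it inspects the current process; passing a dict allows checking an environment read from elsewhere, for example one parsed from /proc/<pid>/environ of the hung process.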

Pantelis Koukousoulas (pktoss) wrote:

Updating fakeroot to the latest Debian sid version (it is in my PPA) seems to fix the problem. So the bug should probably be closed as invalid, since openjdk is not actually at fault (?)

Changed in openjdk-6:
assignee: nobody → pktoss
status: New → Invalid

Marcus Better (mbetter) wrote:

I get the same problem with a FUTEX_WAIT hang when *starting* Eclipse 3.5 on my Debian amd64 squeeze/sid system (kernel 2.6.31.4, libc6 2.10.1-5, openjdk-6-jre 6b16-1.6.1-2). No fakeroot involved.
