malloc hangs when ltp mallocstress is run repeatedly

Bug #1081734 reported by bhs
This bug affects 1 person
Affects            Status         Importance   Assigned to   Milestone
eglibc (Ubuntu)    Fix Released   Undecided    Adam Conrad
Precise            Fix Released   Undecided    Adam Conrad
Quantal            Fix Released   Undecided    Adam Conrad

Bug Description

[Impact / Justification]
There's a malloc deadlock in glibc, easily reproduced by the mallocstress part of ltp, but otherwise tripped in regular usage here and there. The upstream patch applied in this SRU resolves that.

[Test Case]
Run mallocstress repeatedly before and after upgrade and watch the deadlocks be less deadlocky.
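
Since the deadlock is intermittent, a single run proves little. A minimal sketch of a repeat-run harness follows, assuming a run that exceeds a chosen timeout can be treated as a hang; the function name `count_hangs`, the run count, and the timeout are illustrative assumptions, not part of ltp.

```python
import subprocess


def count_hangs(cmd, runs, timeout_s):
    """Run cmd `runs` times and return how many runs exceeded timeout_s seconds.

    A timed-out run is only presumed to be deadlocked; a slow machine can
    also exceed the timeout, so choose timeout_s generously.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before re-raising the timeout.
            hangs += 1
    return hangs
```

For example, `count_hangs(["/usr/lib/ltp/testcases/bin/mallocstress", "-l", "10000"], runs=50, timeout_s=600)` uses the binary path and `-l` value quoted in a later comment on this bug; a non-zero result before the upgrade and zero after would be consistent with the fix working.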

[Regression Potential]
This has been tested quite extensively upstream and in 2.16 in raring, and seems to be a marked improvement with no one reporting any adverse effects, so should be fine.

[Original Report]
- malloc sleeps continuously when the mallocstress application, which is part of the ltp package, is executed in a loop.
- From the gdb backtrace, it appears to be waiting for a futex to be released in libc code. This is a deadlock in glibc.
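
A backtrace like the one described above can be collected from a hung process roughly as follows. This is a sketch only: the helper names are hypothetical, and it assumes gdb is installed and the caller has permission to attach to the process.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to pid, prints a backtrace
    for every thread, and exits without entering interactive mode."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid):
    """Run gdb against pid and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid), capture_output=True, text=True
    )
    return result.stdout
```

If every thread in the output is parked in a futex wait inside libc and none is making progress, that matches the symptom reported here.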

Including this fix in the next eglibc release is under discussion upstream.
Relevant emails:
http://sourceware.org/ml/libc-alpha/2012-06/msg00648.html
http://permalink.gmane.org/gmane.comp.lib.glibc.alpha/23397
http://sourceware.org/ml/libc-alpha/2012-07/msg00027.html
http://sourceware.org/ml/libc-alpha/2012-08/msg00163.html
http://permalink.gmane.org/gmane.linux.redhat.fedora.extras.cvs/832985

discussion in http://sourceware-org.1504.n7.nabble.com/BZ-13939-malloc-deadlock-td13648.html#none

Revision history for this message
bhs (bharath-vegito) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "patch for malloc_deadlock from http://sourceware-org.1504.n7.nabble.com/BZ-13939-malloc-deadlock-td13648.html#none" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of the ubuntu-reviewers team please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Adam Conrad (adconrad)
Changed in eglibc (Ubuntu):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.16-0ubuntu7

---------------
eglibc (2.16-0ubuntu7) raring; urgency=low

  * Merge with 2.16-0experimental1 from Debian, bringing in my
    upstream version of the C++ header autodetection patch, some
    packaging and upgrade fixes, and reducing our delta further.
  * Fix debian/tests/control syntax for autopkgtest (LP: #1081500)
  * Add patch ubuntu/local-disable-nscd-netgroup-caching.diff to
    disable netgroup caching in the default config (LP: #1068889)
  * Backport any/cvs-malloc-deadlock.diff from upstream to prevent
    glibc deadlocking in malloc arena retry paths (LP: #1081734)
 -- Adam Conrad <email address hidden> Sun, 25 Nov 2012 19:00:46 -0700

Changed in eglibc (Ubuntu):
status: Fix Committed → Fix Released
bhs (bharath-vegito) wrote :

Hi Adam,
Is there any chance this will make it into the eglibc precise branch?

bhs (bharath-vegito) wrote :

Adam,
This bug reproduces on a 12.04.1 system.
This patch applies to eglibc-2.15, which is in precise and quantal.

I am looking for this fix in the Ubuntu repository for precise. I would like to do the required work by assigning the bug to myself and porting the changes to the eglibc-2.15 branch. Please let me know how I can get started on pushing the patch to Launchpad.

If the community is OK with the patch, can it be pushed to the 12.04.2 release after the release-based tests are completed?

Changed in eglibc (Ubuntu):
assignee: nobody → bhs (bharath-vegito)
assignee: bhs (bharath-vegito) → nobody
Changed in eglibc (Ubuntu Precise):
assignee: nobody → bhs (bharath-vegito)
bhs (bharath-vegito)
Changed in eglibc (Ubuntu Precise):
status: New → In Progress
bhs (bharath-vegito)
description: updated
Changed in eglibc (Ubuntu):
assignee: nobody → bhs (bharath-vegito)
bhs (bharath-vegito)
Changed in eglibc (Ubuntu):
assignee: bhs (bharath-vegito) → nobody
Changed in eglibc (Ubuntu Precise):
assignee: bhs (bharath-vegito) → nobody
Adam Conrad (adconrad)
Changed in eglibc (Ubuntu Quantal):
status: New → In Progress
Adam Conrad (adconrad)
description: updated
Adam Conrad (adconrad)
Changed in eglibc (Ubuntu):
assignee: nobody → Adam Conrad (adconrad)
Changed in eglibc (Ubuntu Precise):
assignee: nobody → Adam Conrad (adconrad)
Changed in eglibc (Ubuntu Quantal):
assignee: nobody → Adam Conrad (adconrad)
Colin Watson (cjwatson) wrote : Please test proposed package

Hello bhs, or anyone else affected,

Accepted eglibc into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/eglibc/2.15-0ubuntu10.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in eglibc (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in eglibc (Ubuntu Quantal):
status: In Progress → Fix Committed
Colin Watson (cjwatson) wrote :

Hello bhs, or anyone else affected,

Accepted eglibc into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/eglibc/2.15-0ubuntu20.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Mark Rutland (mark-rutland) wrote :

I had an issue with mallocstress deadlocking, but this was solved by the patch listed in https://bugs.launchpad.net/ubuntu/precise/+source/eglibc/+bug/1091186

I don't know enough about glibc to say if this is related or not.

Mark Rutland (mark-rutland) wrote :

Further to my earlier comment, I've since done some further testing with LTP:

https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/1091186/comments/11

While I can't say which change fixes the issue, with the updated packages installed, I no longer encounter a deadlock when running mallocstress under an ARM fast model.

Adam Conrad (adconrad) wrote :

No, the ARM-specific futex issue and this one aren't related. The fact that you can trigger both from the same testsuite does muddy the waters a bit, though.

Testing this one on x86 is likely the best way to isolate it from the lowlevellock fix, since that fix only applied to arm, parisc, and sparc, but didn't touch x86 (while the fix for this bug hit all arches).

Alex Chiang (achiang) wrote :

Adam,

I tried reproducing on my 12.04 machine, simply doing:

apt-get install ltp
$ /usr/lib/ltp/testcases/bin/mallocstress -l 10000
<top output indicates a peak of ~81GB VM use>
<wait a while>
<debug output>
main(): test passed.

Using the following old libc:
ii libc6 2.15-0ubuntu10.3 Embedded GNU C Library: Shared libraries

Advice?

Adam Conrad (adconrad) wrote :

After significant abuse on several x86 machines, I was finally able to reproduce this with the old libc6 version, but continue to be unable to reproduce with the versions in precise/quantal-proposed. Given that no one's found any regressions with this patch, and the few testers who had deadlocks on the old version couldn't reproduce with the new, I'm marking this verification-done.

tags: added: verification-done
removed: verification-needed
Colin Watson (cjwatson) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.15-0ubuntu10.4

---------------
eglibc (2.15-0ubuntu10.4) precise; urgency=low

  * Add patch ubuntu/local-disable-nscd-netgroup-caching.diff to
    disable netgroup caching in the default config (LP: #1068889)
  * Backport any/cvs-malloc-deadlock.diff from upstream to prevent
    glibc deadlocking in malloc arena retry paths (LP: #1081734)
  * Fix futex issue (BZ #13844), backport from 2.16 (LP: #1091186)
  * Drop patch any/local-disable-nscd-host-caching.diff, as this
    bug was apparently resolved upstream a while ago (LP: #613662)
  * Add patch any/cvs-ld-self-load.diff to restore ld.so's ability
    to load itself, a behaviour accidentally removed (LP: #1088677)
  * Drop dangling libnss_db.so symlink in libc6-dev (LP: #1088773)
 -- Adam Conrad <email address hidden> Sun, 27 Jan 2013 16:46:30 -0700

Changed in eglibc (Ubuntu Precise):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eglibc - 2.15-0ubuntu20.1

---------------
eglibc (2.15-0ubuntu20.1) quantal; urgency=low

  * Add patch ubuntu/local-disable-nscd-netgroup-caching.diff to
    disable netgroup caching in the default config (LP: #1068889)
  * Backport any/cvs-malloc-deadlock.diff from upstream to prevent
    glibc deadlocking in malloc arena retry paths (LP: #1081734)
  * Fix futex issue (BZ #13844), backport from 2.16 (LP: #1091186)
  * Drop patch any/local-disable-nscd-host-caching.diff, as this
    bug was apparently resolved upstream a while ago (LP: #613662)
  * Add patch any/cvs-ld-self-load.diff to restore ld.so's ability
    to load itself, a behaviour accidentally removed (LP: #1088677)
  * Drop dangling libnss_db.so symlink in libc6-dev (LP: #1088773)
 -- Adam Conrad <email address hidden> Sun, 27 Jan 2013 16:46:30 -0700

Changed in eglibc (Ubuntu Quantal):
status: Fix Committed → Fix Released