Runtime deadlock: pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing

Bug #1899800 reported by Michael Bacarella
This bug affects 7 people
Affects          Status        Importance  Assigned to  Milestone
glibc (Ubuntu)   Fix Released  Medium      Unassigned
Bionic           Confirmed     Medium      Unassigned
Focal            Fix Released  Medium      Unassigned
Groovy           Won't Fix     Medium      Unassigned

Bug Description

[Impact]

* Various multi-threaded applications using pthread_cond hang.

[Test Case]

* Run the reproducer attached to the upstream bug report (I used a qemu-emulated 8 core machine on a 4 core one):

  wget 'https://sourceware.org/bugzilla/attachment.cgi?id=12480' -O repro-lp1899800.c
  gcc -pthread repro-lp1899800.c
  ./a.out 16

Total Threads Count; 16
RefereeThread - (null) started
LoopCriticalSectionThread - 1 started
...
LoopCriticalSectionThread - 16 started
Monitor - g_counter 411380000, loop_round 3024, threads_finished 13
...
Monitor - g_counter 1920301632, loop_round 266764, threads_finished 0
Monitor - g_counter -1851097664, loop_round 270614, threads_finished 13
Monitor - g_counter -1227241664, loop_round 275201, threads_finished 14
Monitor - g_counter -337385664, loop_round 281744, threads_finished 0
Monitor - g_counter 519822336, loop_round 288047, threads_finished 16
Monitor - g_counter 1401918336, loop_round 294533, threads_finished 0
Monitor - g_counter -1993136960, loop_round 301150, threads_finished 16
Monitor - g_counter -1140185466, loop_round 307422, threads_finished 12
Monitor - g_counter -1063307960, loop_round 307987, threads_finished 15
Monitor - g_counter -1063307960, loop_round 307987, threads_finished 15
Monitor - g_counter -1063307960, loop_round 307987, threads_finished 15
Monitor - g_counter -1063307960, loop_round 307987, threads_finished 15
Monitor - g_counter -1063307960, loop_round 307987, threads_finished 15
...

   The lockup shows up as the same Monitor line repeating, as in the last five lines above.

* With the unfixed libc6 the threads hang within a few minutes; with the fixed one they run for hours without hanging.
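
The check for a lockup (the same Monitor line repeating) can be automated. The following is a minimal sketch in C, assuming the log format shown in the sample output; `struct monitor_sample`, `parse_monitor_line` and `trailing_identical` are hypothetical helper names and are not part of the reproducer.

```c
#include <stdio.h>
#include <string.h>

/* One parsed "Monitor - ..." line (field names follow the log output). */
struct monitor_sample {
    long g_counter;
    long loop_round;
    int  threads_finished;
};

/* Parse one Monitor line. Anything before "Monitor - " (for example a
   timestamp added by ts) is skipped. Returns 1 on success, 0 otherwise. */
int parse_monitor_line(const char *line, struct monitor_sample *out)
{
    const char *p = strstr(line, "Monitor - ");
    if (p == NULL)
        return 0;
    return sscanf(p,
                  "Monitor - g_counter %ld, loop_round %ld, threads_finished %d",
                  &out->g_counter, &out->loop_round,
                  &out->threads_finished) == 3;
}

/* Return the length of the run of identical Monitor samples at the end of
   lines[0..n-1]. Lines that are not Monitor lines are ignored. */
int trailing_identical(const char *const lines[], int n)
{
    struct monitor_sample prev = {0, 0, 0}, cur;
    int run = 0;

    for (int i = 0; i < n; i++) {
        if (!parse_monitor_line(lines[i], &cur))
            continue;
        if (run > 0 &&
            cur.g_counter == prev.g_counter &&
            cur.loop_round == prev.loop_round &&
            cur.threads_finished == prev.threads_finished)
            run++;          /* same sample again: no progress was made */
        else
            run = 1;        /* progress: start a new run */
        prev = cur;
    }
    return run;
}
```

A run longer than a handful of samples suggests that the reproducer has stopped making progress. The threshold is a judgment call, since a heavily loaded machine could in principle repeat a line once without being deadlocked.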

[Where problems could occur]

* The fix, which in its one-line form is really a workaround, wakes up all threads whenever there is a chance of hitting the deadlock. This adds a small overhead in rare cases, but the exact amount of the overhead is unknown.
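
Waking every waiter is safe for correctly written programs because each waiter re-checks its predicate after waking up; the cost is only the extra wakeups. The sketch below shows that pattern at the application level. It is not the glibc patch itself, and `struct gate`, `waiter` and `release_all` are hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WAITERS 16

struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int open;    /* predicate, protected by lock */
    int passed;  /* number of waiters that observed open != 0 */
};

static void *waiter(void *arg)
{
    struct gate *g = arg;

    pthread_mutex_lock(&g->lock);
    while (!g->open)                       /* re-check after every wakeup */
        pthread_cond_wait(&g->cond, &g->lock);
    g->passed++;
    pthread_mutex_unlock(&g->lock);
    return NULL;
}

/* Start n waiters, open the gate and wake all of them with a broadcast.
   Returns the number of waiters that got through, or -1 on error. */
int release_all(int n)
{
    struct gate g;
    pthread_t tid[MAX_WAITERS];
    int started = 0;

    if (n < 0 || n > MAX_WAITERS)
        return -1;

    pthread_mutex_init(&g.lock, NULL);
    pthread_cond_init(&g.cond, NULL);
    g.open = 0;
    g.passed = 0;

    for (int i = 0; i < n; i++) {
        if (pthread_create(&tid[i], NULL, waiter, &g) != 0)
            break;
        started++;
    }

    pthread_mutex_lock(&g.lock);
    g.open = 1;                            /* change the predicate first */
    pthread_cond_broadcast(&g.cond);       /* then wake every waiter */
    pthread_mutex_unlock(&g.lock);

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_cond_destroy(&g.cond);
    pthread_mutex_destroy(&g.lock);
    return started == n ? g.passed : -1;
}
```

Build with `gcc -pthread`. A waiter that is woken while `open` is still 0 simply goes back to sleep, which is why an over-eager wakeup costs time but does not break correctness.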

[Original Bug Text]

This bug was submitted by Qin Li to glibc bugzilla earlier this year, with a one-line patch, though it hasn't been merged into glibc yet:

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

This bug in pthread condition variables can deadlock the OCaml runtime, as well as the Python and .NET runtimes.

The bug was introduced in glibc 2.27, so it affects Ubuntu 18.04 onwards. I confirm that both my OCaml app and the repro from the bugzilla deadlock on Ubuntu 20.04 and Ubuntu 18.04. To further strengthen the case that this is because of a bug in glibc, my app and the repro do not deadlock on Ubuntu 16.04.
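
As a rough aid for checking whether a machine falls in the affected range, the comparison against 2.27 can be sketched in C. `glibc_at_least` is a hypothetical helper, and it only compares the upstream "major.minor" version; it cannot tell whether a distribution package already carries the fix.

```c
#include <stdio.h>

/* Compare a "major.minor" glibc version string against (major, minor).
   Returns 1 if version >= major.minor, 0 if it is lower,
   and -1 if the string cannot be parsed. */
int glibc_at_least(const char *version, int major, int minor)
{
    int maj, min;

    if (sscanf(version, "%d.%d", &maj, &min) != 2)
        return -1;
    return maj > major || (maj == major && min >= minor);
}
```

On a glibc system the version string of the running library is available from `gnu_get_libc_version()` in `<gnu/libc-version.h>`, so `glibc_at_least(gnu_get_libc_version(), 2, 27)` returning 1 means the library is new enough to contain the bug unless it has been patched.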

To rule out kernel issues, I further confirm that no deadlock happens when I copy Ubuntu 16.04's libc to 18.04 and redirect the dynamic linker so my app loads the earlier libc.

I confirm that the one-line patch (available at the above bugzilla) applies cleanly on top of:

* glibc-2.31-0ubuntu9.1 (Ubuntu 20.04 latest)
* glibc-2.28-10 (Debian Buster/10 latest)
* glibc-2.27-3ubuntu1.2 (Ubuntu 18.04 latest)

I confirm that the one-line patch to glibc cures the deadlock issue in my OCaml apps.

On Ubuntu 20.04, I have not been able to get the repro to deadlock in 5 days. My OCaml apps have not deadlocked in 5 days.

On Debian Buster/10, the repro has not deadlocked in about 5 days. This is my desktop box, and I can otherwise use normal applications as usual like the GNOME environment, etc.

On Ubuntu 18.04, the repro takes about 24-48 hours before it triggers a deadlock. Prior to patching glibc, it would take only a few hours. I have not seen my OCaml apps deadlock since applying this patch, however.

On Ubuntu 16.04 I have never been able to get the repro to deadlock. My OCaml apps never deadlocked on this platform. This is expected, since this platform runs glibc 2.23, which predates the bug (the bugzilla report says it was introduced in 2.27).

As for why 18.04 still deadlocks, I suspect another, unrelated pthread bug was introduced in glibc 2.27 and fixed by 2.28. When applied to glibc 2.27, the one-line patch appears to reduce the deadlock frequency by roughly an order of magnitude.

Please kindly consider merging the patch into Ubuntu glibc.

More background about this bug, for the sake of future internet searchers:
* https://discuss.ocaml.org/t/is-there-a-known-recent-linux-locking-bug-that-affects-the-ocaml-runtime

Michael Bacarella (mbacarella) wrote :

This part is ambiguous
> On Ubuntu 20.04, I have not been able to get the repro to deadlock in 5 days. My OCaml apps have not deadlocked in 5 days.

To be clear, that is on Ubuntu 20.04 **with the glibc patch**. The latest unpatched Ubuntu 20.04 is very much vulnerable to this pthread condition bug.

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "one-line fix from the glibc bugzilla" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Balint Reczey (rbalint) wrote :

Thank you for the very detailed explanation and the patch. I see the discussion resuming upstream and I hope it ends with this or an alternative patch being accepted there.

Once it is accepted, I'll consider including it in the next SRUs to the affected stable releases.

Changed in glibc (Ubuntu):
importance: Undecided → Medium
Changed in glibc (Ubuntu Bionic):
importance: Undecided → Medium
Changed in glibc (Ubuntu Focal):
importance: Undecided → Medium
Changed in glibc (Ubuntu Groovy):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package glibc - 2.32-0ubuntu5

---------------
glibc (2.32-0ubuntu5) hirsute; urgency=medium

  * debian/gbp.conf: Set debian-tag and debian-tag-msg to follow Ubuntu format
  * Don't build libc6-prof in stage1 and stage2
  * Ship libc6-prof on riscv64, too.
    This fixes FTBFS on riscv64 due to the flavour being built but not
    shipped in a package.
  * Detect debconf consistently in libc6.preinst and do not crash if it is not used
    (LP: #1902955)
  * Prevent rare deadlock in pthread_cond_signal (LP: #1899800)
  * debian/patches/git-updates.diff: update from upstream stable branch

glibc (2.32-0ubuntu4) hirsute; urgency=medium

  * tests: XFAIL time/tst-cpuclock1 on armel, too. (LP: #1895687)
    The armhf build builds for armel, too, thus this fixes the armhf
    autopkgtest.
  * debian/control: Only recommend libnss-nis and libnss-nisplus.
    They pull in a sizable amount of extra dependencies while they are rarely
    needed.
  * Make libc6 provide libc6-lse on arm64.
    Libc6 is now compiled with -moutline-atomics thus the separate binary
    package is dropped.
  * Ship libc variant compiled for profiling in libc6-prof
  * debian/patches/git-updates.diff: update from upstream stable branch
  * Drop obsoleted local-cudacc-float128.diff which breaks new icc
    (LP: #1895358)
  * XFAIL tst-sysvshm-linux on i386 and x32
  * Merge 2.31-4 from Debian unstable

 -- Balint Reczey <email address hidden> Fri, 13 Nov 2020 18:54:38 +0100

Changed in glibc (Ubuntu):
status: New → Fix Released
Balint Reczey (rbalint)
description: updated
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Michael, or anyone else affected,

Accepted glibc into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/glibc/2.32-0ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in glibc (Ubuntu Groovy):
status: New → Fix Committed
tags: added: verification-needed verification-needed-groovy
Michael Bacarella (mbacarella) wrote :

Thanks for this. Tested for about 20 hours on groovy-proposed. Neither the repro nor my OCaml application has deadlocked.

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (glibc/2.32-0ubuntu3.1)

All autopkgtests for the newly accepted glibc (2.32-0ubuntu3.1) for groovy have finished running.
The following regressions have been reported in tests triggered by the package:

snapd-glib/1.58-0ubuntu0.20.10.0 (armhf)
libcompress-raw-bzip2-perl/unknown (armhf)
node-ws/7.3.0+~cs24.0.3-1build1 (s390x, armhf, ppc64el, amd64, arm64)
austin/unknown (armhf)
r-bioc-rhdf5lib/1.10.1+dfsg-1 (armhf)
tup/0.7.8-3 (arm64)
r-cran-seurat/unknown (armhf)
python-pomegranate/unknown (armhf)
ruby-mysql2/0.5.2-1ubuntu3 (armhf)
libdevel-cover-perl/1.36-1build1 (armhf)
undbx/unknown (armhf)
libtime-warp-perl/0.54-1build1 (ppc64el)
statsprocessor/unknown (armhf)
pycurl/7.43.0.2-7 (armhf)
libhash-fieldhash-perl/unknown (armhf)
reprotest/0.7.15 (s390x)
r-cran-httpuv/1.5.4+dfsg-1 (armhf)
libosinfo/unknown (armhf)
r-cran-batchtools/unknown (armhf)
chiark-tcl/1.3.4ubuntu3 (armhf)
libdbd-mariadb-perl/1.11-3ubuntu2 (s390x, ppc64el, amd64, arm64)
firefox/84.0+build3-0ubuntu0.20.10.1 (arm64)
hilive/2.0a-3build2 (amd64)
etcd/3.2.26+dfsg-8 (amd64)
libdbd-mariadb-perl/unknown (armhf)
prometheus/unknown (armhf)
libticonv/unknown (armhf)
notary/0.6.1~ds2-6 (armhf)
firebird3.0/unknown (armhf)
burrow/unknown (armhf)
ruby-stackprof/0.2.15-2 (arm64)
gnutls28/3.6.15-4ubuntu2 (i386, amd64, s390x, arm64, ppc64el, armhf)
postgresql-plproxy/2.9-2 (armhf)
librg-blast-parser-perl/0.03-6build2 (armhf)
binutils/2.35.1-1ubuntu1 (armhf)
clutter-1.0/1.26.4+dfsg-1 (arm64)
transtermhp/2.09-5 (armhf)
genext2fs/1.5.0-1 (s390x)
python-cmarkgfm/unknown (armhf)
libpff/20180714-2 (s390x, armhf, ppc64el, amd64, arm64)
debsig-verify/unknown (armhf)
bosh/unknown (armhf)
casync/2+20190213-1 (amd64)
golang-github-spf13-cobra/unknown (armhf)
tpm2-tools/unknown (armhf)
umockdev/0.14.3-1 (armhf)
crrcsim/unknown (armhf)
postgis/3.0.2+dfsg-2ubuntu2 (amd64)
r-cran-proc/unknown (armhf)
mercurial/5.5.1-1 (armhf)
r-cran-dqrng/unknown (armhf)
frobby/unknown (armhf)
libcrypt-cast5-perl/unknown (armhf)
drumkv1/0.9.17-1 (ppc64el)
nsf/unknown (armhf)
python3-lxc/1:3.0.4-1ubuntu6 (s390x)
hyphy/2.5.1+dfsg-3build1 (amd64)
fatrace/0.16-1 (arm64)
bio-rainbow/unknown (armhf)
python-fabio/0.10.2+dfsg-2 (armhf)
hhsuite/3.2.0-3 (amd64)
libperlio-layers-perl/0.012-1 (armhf)
postgresql-common/unknown (armhf)
menhir/20200624-1 (s390x)
notify-osd/0.9.35+20.04.20191129-0ubuntu1 (ppc64el)
r-cran-xfun/unknown (armhf)
gifsicle/1.92-2 (armhf)
libunix-processors-perl/unknown (armhf)
freebayes/unknown (armhf)
alertmanager-irc-relay/0.1.0-3 (arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/groovy/update_excuses.html#glibc

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Balint Reczey (rbalint)
tags: added: verification-done verification-done-groovy
removed: verification-needed verification-needed-groovy
Balint Reczey (rbalint) wrote :

@mbacarella (and everyone else) I would highly appreciate feedback about the performance loss in some interesting cases. I plan to run general performance tests, but obviously they won't cover all real-life cases.

Michael Bacarella (mbacarella) wrote :

@rbalint sadly, the task of mine that was deadlocking is I/O bound and uses relatively little CPU (about 10% of one core), so I can't use it as a benchmark to detect a performance regression. That is, if there is a performance regression, it's not severe enough to affect the task noticeably.

My Debian laptop has been running the patch system-wide for a few months now and nothing feels noticeably different.

That's all I know.

tags: added: block-proposed-groovy
Slav Ivanyuk (slavivanyuk) wrote :

We've run into this bug with the .NET Core garbage collector. It could lock up without warning. With the patch we were not able to reproduce the issue. Furthermore, we ran a stress test (to increase the chance of the issue happening) to confirm the issue is fixed, and we didn't notice any slowdowns.

In our application under stress test, I/O eventually becomes the bottleneck (when the OS runs out of memory and begins to dump data into the mmap file; we use LMDB), but CPU consumption is constantly 50%+ and initially 70%+. When the system gets low on memory, GC threads wake up more frequently too.

At the same time we tested on Ubuntu 16.04 (whose glibc predates the bug). We didn't see any difference in speed between the app running on 16.04 and on 20.10 with the patch. We just measured the speed of data processing, which constantly fluctuates a little for a number of reasons.

This is of course not a stress test of the lock itself, but at least it's an example of a real-world application, and we did not see any easily noticeable effect on speed in our case.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in glibc (Ubuntu Bionic):
status: New → Confirmed
Changed in glibc (Ubuntu Focal):
status: New → Confirmed
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Michael, or anyone else affected,

Accepted glibc into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/glibc/2.32-0ubuntu3.2 in a few hours, and then in the -proposed repository.

tags: added: verification-needed verification-needed-groovy
removed: verification-done verification-done-groovy
Łukasz Zemczak (sil2100) wrote :

Hello Michael, or anyone else affected,

Accepted glibc into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/glibc/2.31-0ubuntu9.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in glibc (Ubuntu Focal):
status: Confirmed → Fix Committed
tags: added: verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (glibc/2.31-0ubuntu9.3)

All autopkgtests for the newly accepted glibc (2.31-0ubuntu9.3) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

libgdata/0.17.12-1 (armhf)
ruby-grib/0.4.0-4build1 (s390x)
libclang-perl/0.09-4build8 (arm64)
dealer/20190529+ds-1 (ppc64el)
t-coffee/12.00.7fb08c2-4build1 (s390x)
xpra/3.0.6+dfsg1-1build1 (amd64)
makedumpfile/1:1.6.7-1ubuntu2.1 (amd64)
libepubgen/0.1.1-1ubuntu2 (arm64)
apt/2.0.5 (armhf)
pg-qualstats/1.0.9-1 (s390x)
booth/1.0-174-gce9f821-1 (amd64)
gnustep-base/1.26.0-7 (s390x)
fclib/3.0.0+dfsg-2build1 (arm64)
asterisk/1:16.2.1~dfsg-2ubuntu1 (amd64)
golang-github-grpc-ecosystem-grpc-gateway/1.6.4-2 (ppc64el, s390x)
puma/3.12.4-1ubuntu2 (s390x)
cmark-gfm/0.29.0.gfm.0-4 (amd64)
lua-nvim/0.2.1-1-1 (armhf)
saods9/8.1+repack-1 (amd64)
ruby-gpgme/2.0.19-3build1 (s390x)
ruby-bootsnap/1.4.6-1 (arm64)
staden-io-lib/1.14.11-6 (ppc64el)
s3ql/3.3.2+dfsg-1ubuntu1 (armhf)
mercurial/5.3.1-1ubuntu1 (amd64)
libterm-readline-gnu-perl/1.36-2build1 (ppc64el)
casper/1.445.1 (amd64)
datefudge/1.23ubuntu1 (armhf, ppc64el)
source-extractor/2.25.0+ds-2 (s390x)
smcroute/2.4.2-4 (arm64)
disulfinder/1.2.11-8build1 (arm64)
autofs/5.1.6-2ubuntu0.1 (ppc64el)
debconf-kde/1.0.3-3 (ppc64el)
libfastahack/1.0.0+dfsg-5build1 (ppc64el)
gemma/0.98.1+dfsg-1 (arm64)
missidentify/1.0-10 (ppc64el)
r-cran-systemfonts/0.1.1-1build1 (s390x)
bcalm/2.2.1-2build1 (ppc64el)
ipgrab/0.9.10-4 (ppc64el)
ruby-libxml/3.1.0-2 (s390x)
r-bioc-delayedarray/0.12.2+dfsg-1 (armhf)
netplan.io/0.101-0ubuntu3~20.04.2 (arm64)
6tunnel/1:0.13-1 (armhf)
taptempo/1.4.4-1 (s390x, armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#glibc

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (glibc/2.32-0ubuntu3.2)

All autopkgtests for the newly accepted glibc (2.32-0ubuntu3.2) for groovy have finished running.
The following regressions have been reported in tests triggered by the package:

cyrus-imapd/3.2.3-2ubuntu1 (armhf)
pyqt5/5.15.0+dfsg-1 (s390x)
libsass/3.6.4-3 (s390x)
network-manager/1.26.2-1ubuntu1 (arm64)
reprotest/0.7.15 (s390x)
flatpak/1.8.2-1ubuntu0.1 (arm64)
puma/3.12.4-1ubuntu2 (arm64, s390x)
libtext-charwidth-perl/0.04-10 (ppc64el)
systemd/246.6-1ubuntu1.3 (arm64)
taptempo/1.4.5-1 (arm64)
uftrace/0.9.3-1ubuntu1 (arm64)
libgdata/0.17.12-1 (armhf)
cysignals/1.10.2+ds-4 (amd64)
hyphy/2.5.1+dfsg-3build1 (amd64)
glibc/2.32-0ubuntu3.2 (amd64)
libdate-simple-perl/3.0300-3 (arm64)
netplan.io/0.101-0ubuntu3~20.10.1 (amd64)
libastro-fits-cfitsio-perl/1.14-1 (ppc64el)
fwlogwatch/1.4-2 (arm64)
pandas/1.0.5+dfsg-3 (ppc64el, armhf, s390x, amd64, arm64)
rhonabwy/0.9.12-2build1 (s390x)
jailkit/2.21-2 (ppc64el)
openjdk-lts/11.0.10+9-0ubuntu1~20.10 (s390x)
dbus/1.12.20-1ubuntu1 (arm64)
pymca/5.5.5+dfsg-2build2 (arm64)
euslisp/9.27+dfsg-6 (armhf, amd64)
libflame/5.2.0-2 (amd64)
samtools/1.10-4 (arm64)
syncthing/1.10.0~ds1-1 (amd64, s390x)
endlessh/1.1-4 (armhf)
r-cran-amore/0.2-16-1build1 (ppc64el)
crrcsim/0.9.13-3.2build1 (ppc64el)
samplv1/0.9.17-1 (ppc64el)
hugo/0.74.3-1 (armhf)
libcsfml/2.5-1build1 (ppc64el)
firefox/87.0+build3-0ubuntu0.20.10.1 (armhf)
etcd/3.2.26+dfsg-8 (amd64)
libnxml/0.18.3-8 (s390x)
google-osconfig-agent/20210219.00-0ubuntu1~20.10.0 (armhf)
combblas/1.6.2-5build1 (arm64)
libbio-db-hts-perl/3.01-3 (amd64)
libterm-readkey-perl/2.38-1build1 (s390x)
ruby2.7/2.7.1-3ubuntu1.2 (armhf)
hkl/5.0.0.2620-1build1 (ppc64el)
liblinux-inotify2-perl/1:2.2-2 (ppc64el)
datefudge/1.24 (ppc64el)
libpgplot-perl/1:2.24-1build1 (s390x)
healpy/1.14.0-1 (arm64)
udisks2/2.9.1-2ubuntu1 (amd64)
kopanocore/8.7.0-7ubuntu4 (arm64)
gyoto/1.4.4-3build1 (s390x)
cpdb-libs/1.2.0-0ubuntu8 (armhf)
ruby-concurrent/1.1.6+dfsg-3 (amd64)
postgis/3.0.2+dfsg-2ubuntu2 (armhf)

Slav Ivanyuk (slavivanyuk) wrote :

We've tested the patch on focal: whereas we previously saw frequent deadlocks, we were not able to reproduce the deadlock with the fixed glibc. The application is a .NET Core one, with the deadlocks happening in the GC.

Balint Reczey (rbalint) wrote :

@slavivanyuk Thank you for testing the fix.

I've also verified the fix in 2.31-0ubuntu9.3 on Focal with repro-lp1899800.c in a 4-core VM running on a 4-core machine.

The reproducer hung after about half an hour with the unfixed glibc:

$ gcc -pthread repro-lp1899800.c
$ unbuffer ./a.out | ts | tee -a run1.log
Apr 22 20:52:04 Total Threads Count; 12
Apr 22 20:52:04 RefereeThread - (null) started
Apr 22 20:52:04 LoopCriticalSectionThread - 1 started
Apr 22 20:52:04 LoopCriticalSectionThread - 2 started
Apr 22 20:52:04 LoopCriticalSectionThread - 3 started
Apr 22 20:52:04 LoopCriticalSectionThread - 4 started
Apr 22 20:52:04 LoopCriticalSectionThread - 6 started
Apr 22 20:52:04 LoopCriticalSectionThread - 7 started
Apr 22 20:52:04 LoopCriticalSectionThread - 8 started
Apr 22 20:52:04 LoopCriticalSectionThread - 9 started
Apr 22 20:52:04 LoopCriticalSectionThread - 10 started
Apr 22 20:52:04 LoopCriticalSectionThread - 12 started
Apr 22 20:52:04 LoopCriticalSectionThread - 5 started
Apr 22 20:52:04 LoopCriticalSectionThread - 11 started
Apr 22 20:52:06 Monitor - g_counter 974610000, loop_round 12494, threads_finished 12
Apr 22 20:52:08 Monitor - g_counter 1852531000, loop_round 23750, threads_finished 7
Apr 22 20:52:10 Monitor - g_counter -1586651296, loop_round 34721, threads_finished 12
Apr 22 20:52:12 Monitor - g_counter -757148296, loop_round 45356, threads_finished 8
...
Apr 22 21:20:59 Monitor - g_counter 1484067600, loop_round 8278578, threads_finished 0
Apr 22 21:21:01 Monitor - g_counter -2055625696, loop_round 8288261, threads_finished 12
Apr 22 21:21:03 Monitor - g_counter -1700741696, loop_round 8292811, threads_finished 8
Apr 22 21:21:05 Monitor - g_counter -1700741696, loop_round 8292811, threads_finished 8
Apr 22 21:21:07 Monitor - g_counter -1700741696, loop_round 8292811, threads_finished 8
Apr 22 21:21:09 Monitor - g_counter -1700741696, loop_round 8292811, threads_finished 8
$

With the fixed version I have not observed the hang in a day:

$ head run2.log
Apr 22 21:23:36 Total Threads Count; 12
Apr 22 21:23:36 RefereeThread - (null) started
Apr 22 21:23:36 LoopCriticalSectionThread - 3 started
...
ubuntu@ff-glibc-hang:~$ tail run2.log
Apr 23 22:50:07 Monitor - g_counter 2116331568, loop_round 435966312, threads_finished 12
Apr 23 22:50:09 Monitor - g_counter -1409321728, loop_round 435976175, threads_finished 12
Apr 23 22:50:11 Monitor - g_counter -656621728, loop_round 435985825, threads_finished 0
Apr 23 22:50:13 Monitor - g_counter 80942716, loop_round 435995281, threads_finished 11

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package glibc - 2.31-0ubuntu9.3

---------------
glibc (2.31-0ubuntu9.3) focal; urgency=medium

  [ Aurelien Jarno ]
  * debian/patches/any/git-surplus-tls-accounting.diff: backport TLS surplus
    accounting from upstream. (Closes: #964141) (LP: #1914044)

  [ Balint Reczey ]
  * Update debian/patches/ubuntu/local-disable-ld_audit.diff
  * Prevent rare deadlock in pthread_cond_signal (LP: #1899800)
  * Cherry-pick upstream patch to fix building with -moutline-atomics
  * Make libc6 provide libc6-lse on arm64.
    Libc6 is now compiled with -moutline-atomics thus the separate binary
    package is dropped. (LP: #1912652)
  * debian/control: Libc6 should Conflict and Replace libc6-lse

 -- Balint Reczey <email address hidden> Mon, 29 Mar 2021 22:11:32 +0200

Changed in glibc (Ubuntu Focal):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for glibc has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Balint Reczey (rbalint) wrote :

The fixed version has been moved back to focal-proposed due to regressions caused by LP: #1914044.

Changed in glibc (Ubuntu Focal):
status: Fix Released → Fix Committed
Balint Reczey (rbalint) wrote :

I've copied 2.31-0ubuntu9.3 to https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4547 until the new update can be released including this fix.

Brian Murray (brian-murray) wrote :

The Groovy Gorilla has reached end of life, so this bug will not be fixed for that release.

Changed in glibc (Ubuntu Groovy):
status: Fix Committed → Won't Fix
Evgeny Morozov (evgeny0) wrote :

Could someone clarify the status of this, please? Our .NET Core application freezes from time to time and I believe it's because of this issue (see https://github.com/dotnet/runtime/issues/47700). The linked glibc bug [https://sourceware.org/bugzilla/show_bug.cgi?id=25847] is now "UNCONFIRMED". So is there a glibc fix that has not made it into Ubuntu or is there not even a glibc fix (that doesn't break something else)? Is this expected to be fixed in 18.04 or 20.04 at all? 21.10? 22.04?

Isaac Gremmer (isaacg13) wrote :

Evgeny Morozov (evgeny0), just letting you know that we experienced a similar garbage collector hang with .NET Core 3.1. Our web API would hang for anywhere between 20s and 400s. As a temporary solution we switched to the Workstation garbage collection mode, which does not show the same hanging issue for us. It's worth a try if you need a temporary solution.

Evgeny Morozov (evgeny0) wrote :

Thank you, but workstation GC is not a realistic option for us, unfortunately. This is still affecting us. Is there some other workaround?

Michael Hudson-Doyle (mwhudson) wrote :

I think the status here is that a fix was in focal for a while but an unrelated change caused it to be pulled. We should put the fixes that are safe to do so back in place.

Evgeny Morozov (evgeny0) wrote :

Yes, that would be very nice. I'm not holding out much hope of this being fixed for 18.04, but even if we have to upgrade to 20.04 to get the fix we might prioritise that.

tags: added: verification-needed-focal
removed: block-proposed-groovy patch verification-done-focal verification-needed-groovy
Michael Hudson-Doyle (mwhudson) wrote :

I've been trying the test case attached to this report with -0ubuntu9.2 in a 2 core vm for a good few minutes and it has not hung yet. Are there tips for getting it to hang more quickly? I'm guessing more cores, I'll try that tomorrow...

Brian Murray (brian-murray) wrote : Please test proposed package

Hello Michael, or anyone else affected,

Accepted glibc into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/glibc/2.31-0ubuntu9.8 in a few hours, and then in the -proposed repository.

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (glibc/2.31-0ubuntu9.8)

All autopkgtests for the newly accepted glibc (2.31-0ubuntu9.8) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

ruby-stackprof/0.2.15-2 (arm64)
sphinxbase/0.8+5prealpha+1-8 (armhf)
r-cran-ps/1.3.2-2 (s390x)
linux-hwe-5.13/5.13.0-37.42~20.04.1 (armhf)
mercurial/5.3.1-1ubuntu1 (armhf, ppc64el)
linux-hwe-5.11/5.11.0-61.61 (armhf)
mbedtls/2.16.4-1ubuntu2 (s390x)
libreoffice/1:6.4.7-0ubuntu0.20.04.4 (armhf)
ruby-ferret/0.11.8.7-2 (amd64)
cross-toolchain-base/43ubuntu3.1 (ppc64el)

Steve Langasek (vorlon) wrote : Please test proposed package

Hello Michael, or anyone else affected,

Accepted glibc into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/glibc/2.31-0ubuntu9.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (glibc/2.31-0ubuntu9.9)

All autopkgtests for the newly accepted glibc (2.31-0ubuntu9.9) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

hilive/2.0a-3build2 (arm64)
tomb/2.7+dfsg2-1 (arm64)
linux-hwe-5.13/5.13.0-40.45~20.04.1 (armhf)
bali-phy/3.4.1+dfsg-2build1 (s390x, arm64)
smalt/0.7.6-9 (ppc64el)
mariadb-10.3/1:10.3.34-0ubuntu0.20.04.1 (armhf)
feersum/1.407-2 (s390x)
kopanocore/8.7.0-7ubuntu1 (amd64)
r-cran-ps/1.3.2-2 (s390x, ppc64el)
libreoffice/1:6.4.7-0ubuntu0.20.04.4 (amd64)
imagemagick/8:6.9.10.23+dfsg-2.1ubuntu11.4 (armhf)
ruby-stackprof/0.2.15-2 (amd64)
gnome-photos/3.34.1-1 (ppc64el)
linux-azure-5.11/5.11.0-1029.32~20.04.2 (amd64)
linux-intel-5.13/5.13.0-1010.10 (amd64)
php-luasandbox/3.0.3-2build2 (armhf, arm64)
mbedtls/2.16.4-1ubuntu2 (amd64, ppc64el)
cross-toolchain-base/43ubuntu3.1 (ppc64el)
rtags/2.37-1 (amd64)
gvfs/1.44.1-1ubuntu1 (arm64, ppc64el)
linux-oem-5.14/5.14.0-1033.36 (amd64)
linux-azure-cvm/5.4.0-1076.79+cvm1 (amd64)
mercurial/5.3.1-1ubuntu1 (armhf)
r-cran-satellite/1.0.2-1build1 (armhf)
s3ql/3.3.2+dfsg-1ubuntu1 (armhf)
snapd/2.54.3+20.04.1ubuntu0.2 (s390x, arm64, amd64, ppc64el)
sphinxbase/0.8+5prealpha+1-8 (armhf)
gemma/0.98.1+dfsg-1 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#glibc

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Michael Hudson-Doyle (mwhudson) wrote :

I've verified this fix: the test case hung fairly quickly (~5 mins) in a 4-core VM, and after installing the libc packages from -proposed it has been running for so long that I no longer have access to the scrollback to paste here.

tags: added: verification-done-focal
removed: verification-needed verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package glibc - 2.31-0ubuntu9.9

---------------
glibc (2.31-0ubuntu9.9) focal; urgency=medium

  * Disable testsuite on riscv64. It is failing maths tests intermittently in
    ways that cannot be a glibc regression and is disabled in later series
    anyway.

glibc (2.31-0ubuntu9.8) focal; urgency=medium

  * Update for 20.04. (LP: #1951033)

  [ Balint Reczey ]
  * Cherry-pick upstream patch to fix building with -moutline-atomics
  * Prevent rare deadlock in pthread_cond_signal (LP: #1899800)

  [ Matthias Klose ]
  * Revert: Use DH_COMPAT=8 for dh_strip to fix debug sections for valgrind.
    Enables debugging ld.so related issues. (LP: #1918035)
  * Don't strip ld.so on armhf. (LP: #1927192)

  [ Gunnar Hjalmarsson ]
  * d/local/usr_sbin/update-locale: improve sanity checks. (LP: #1892825)

  [ Heitor Alves de Siqueira ]
  * d/p/u/git-lp1928508-reversing-calculation-of-__x86_shared_non_temporal.patch:
    - Fix memcpy() performance regression on x86 AMD systems (LP: #1928508)

  [ Aurelien Jarno ]
  * debian/debhelper.in/libc.preinst: drop the check for kernel release
    > 255 now that glibc and preinstall script are fixed. (LP: #1962225)

  [ Michael Hudson-Doyle ]
  * libc6 on arm64 is now built with -moutline-atomics so libc6-lse can now be
    an empty package that is safe to remove. (LP: #1912652)
  * d/patches/u/aarch64-memcpy-improvements.patch: Backport memcpy
    improvements. (LP: #1951032)
  * Add test-float64x-yn to xfails on riscv64.

 -- Michael Hudson-Doyle <email address hidden> Thu, 07 Apr 2022 13:24:41 +1200

Changed in glibc (Ubuntu Focal):
status: Fix Committed → Fix Released