systemd-resolve segfault

Bug #1934221 reported by Denys Fedoryshchenko
This bug affects 3 people
Affects           Status        Importance  Assigned to  Milestone
systemd (Ubuntu)  Fix Released  Undecided   Unassigned
  Focal           Fix Released  Undecided   Unassigned
  Hirsute         Fix Released  Undecided   Unassigned
  Impish          Fix Released  Undecided   Unassigned

Bug Description

[Impact]

 * systemd-resolved stops replying to clients on the local LAN.
 * The crashes are logged as segfaults in dmesg:
[836786.046514] systemd-resolve[872009]: segfault at 39900000000 ip 0000039900000000 sp 00007ffd7959a6d8 error 14 in systemd-resolved[556398695000+9000]
[836786.046524] Code: Bad RIP value.
[840887.303994] traps: systemd-resolve[877019] general protection fault ip:55ba402e2594 sp:7ffe8cb6bbb0 error:0 in systemd-resolved[55ba402b5000+40000]
[844395.313421] systemd-resolve[878503]: segfault at 208 ip 00005557a249f5fa sp 00007ffe686f5a90 error 6 in systemd-resolved[5557a2472000+40000]
[844395.313431] Code: 48 85 c0 74 0e 48 8b 8d 00 01 00 00 48 89 88 00 01 00 00 48 8b 85 00 01 00 00 48 85 c0 0f 84 1d 01 00 00 48 8b 95 f8 00 00 00 <48> 89 90 f8 00 00 00 48 c7 85 00 01 00 00 00 00 00 00 48 c7 85 f8

 * The upload backports the upstream fix (https://github.com/systemd/systemd/pull/18832) to Focal & Hirsute.
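
As a reading aid for the dmesg lines above, here is a minimal sketch that checks whether the faulting instruction pointer lies inside the mapped binary and, if so, reports its offset. It assumes the usual x86 kernel line format `segfault at ADDR ip IP sp SP error N in NAME[BASE+SIZE]`; the function name `parse_segfault` is made up for illustration and is not part of any tool mentioned in this bug.

```python
import re

# Matches kernel lines such as:
#   systemd-resolve[878503]: segfault at 208 ip 00005557a249f5fa sp ... error 6 in systemd-resolved[5557a2472000+40000]
# The trailing " in NAME[BASE+SIZE]" part is optional because some lines lack it.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    info = {"comm": m["comm"], "pid": int(m["pid"]), "ip": ip, "offset": None}
    if m["base"] is not None:
        base, size = int(m["base"], 16), int(m["size"], 16)
        # Only report an offset when ip really falls inside the mapping.
        if base <= ip < base + size:
            info["offset"] = ip - base
    return info


line = ("[844395.313421] systemd-resolve[878503]: segfault at 208 "
        "ip 00005557a249f5fa sp 00007ffe686f5a90 error 6 "
        "in systemd-resolved[5557a2472000+40000]")
print(hex(parse_segfault(line)["offset"]))  # 0x2d5fa
```

For the first dmesg line (ip 0000039900000000) the same function returns no offset, because the instruction pointer is outside the systemd-resolved mapping. That would be consistent with the "Bad RIP value" message and with a call through a stale pointer, although the log alone does not prove it.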

[Test Plan]

 * Setup /etc/systemd/resolved.conf:
[Resolve]
DNS=46.182.19.48#dns2.digitalcourage.de 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net
DNSSEC=yes
DNSOverTLS=opportunistic
MulticastDNS=no
LLMNR=no
Cache=yes
DNSStubListener=yes
Domains=~.

 * wait for ~24-48 hours and observe if any crash happens
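
To double-check the test configuration before starting the soak test, the following sketch reads a resolved.conf-style text and splits the DNS= value into address and server-name pairs. It is only an approximation: Python's `configparser` is not the systemd parser (for example, it rejects repeated keys in its default strict mode), and the helper names `load_resolve_section` and `split_dns_servers` are hypothetical. The part after `#` is treated here as the server name used for DNS-over-TLS, which is how the entries in the test plan appear to be intended.

```python
import configparser

RESOLVED_CONF = """\
[Resolve]
DNS=46.182.19.48#dns2.digitalcourage.de 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net
DNSSEC=yes
DNSOverTLS=opportunistic
Domains=~.
"""


def load_resolve_section(text):
    """Parse resolved.conf-style text and return the [Resolve] section as a dict."""
    # Inline comments are off by default in configparser, so the '#' inside
    # the DNS= value is kept; interpolation=None keeps any '%' literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (DNSOverTLS, not dnsovertls)
    parser.read_string(text)
    return dict(parser["Resolve"])


def split_dns_servers(value):
    """Split a DNS= value into (address, server_name) pairs."""
    servers = []
    for entry in value.split():
        address, _, name = entry.partition("#")
        servers.append((address, name or None))
    return servers


conf = load_resolve_section(RESOLVED_CONF)
for address, name in split_dns_servers(conf["DNS"]):
    print(address, name)
```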

[Where problems could occur]

 * Any regression would likely cause crashes in systemd-resolved, making it unresponsive to DNS name requests from local applications.

[Other Info]

 * Reported upstream: https://github.com/systemd/systemd/issues/18427
 * Fixed upstream in v248: https://github.com/systemd/systemd/pull/18832

=== Original description ===

systemd-resolve keeps crashing, which is very annoying because it sometimes severely interrupts normal DNS resolution.

Last uploaded report is 2d9e7378-d89b-11eb-9e14-fa163ee63de6
Typical error in dmesg:
systemd-resolve[1792202]: segfault at 564ff982f3e0 ip 0000564ff982f3e0 sp 00007ffe2fd0b758 error 15

apport hints that the problem is related to mDNS:

#3 0x00007f3e903c2f11 in sd_event_dispatch () from /lib/systemd/libsystemd-shared-245.so

It may (or may not) be relevant that some hosts with mDNS on my network have IPv6 enabled.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: systemd 245.4-4ubuntu3.7
ProcVersionSignature: Ubuntu 5.4.0-75.84-generic 5.4.119
Uname: Linux 5.4.0-75-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.18
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Thu Jul 1 08:22:51 2021
InstallationDate: Installed on 2018-12-05 (938 days ago)
InstallationMedia: Ubuntu 18.10 "Cosmic Cuttlefish" - Release amd64 (20181017.3)
MachineType: System manufacturer System Product Name
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-75-generic root=UUID=54b80d5c-3d61-4919-873f-0d308083e3b9 ro quiet splash vt.handoff=7
SourcePackage: systemd
UpgradeStatus: Upgraded to focal on 2020-05-22 (405 days ago)
dmi.bios.date: 05/12/2020
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1205
dmi.board.asset.tag: Default string
dmi.board.name: ROG STRIX X399-E GAMING
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1205:bd05/12/2020:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX399-EGAMING:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

Here is the backtrace, as I am unable to attach the crash report (Launchpad gives an error):
apport-retrace -g _lib_systemd_systemd-resolved.102.crash
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Reading symbols from /lib/systemd/systemd-resolved...
(No debugging symbols found in /lib/systemd/systemd-resolved)
warning: core file may not match specified executable file.
[New LWP 1792202]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/lib/systemd/systemd-resolved'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000564ff982f3e0 in ?? ()
(gdb) bt full
#0 0x0000564ff982f3e0 in ?? ()
No symbol table info available.
#1 0x0000564ff772a97f in ?? ()
No symbol table info available.
#2 0x00007f3e903c2b96 in ?? () from /lib/systemd/libsystemd-shared-245.so
No symbol table info available.
#3 0x00007f3e903c2f11 in sd_event_dispatch () from /lib/systemd/libsystemd-shared-245.so
No symbol table info available.
#4 0x00007f3e903c4948 in sd_event_run () from /lib/systemd/libsystemd-shared-245.so
No symbol table info available.
#5 0x00007f3e903c4b6f in sd_event_loop () from /lib/systemd/libsystemd-shared-245.so
No symbol table info available.
#6 0x0000564ff770522a in ?? ()
No symbol table info available.
#7 0x00007f3e905cc0b3 in __libc_start_main (main=0x564ff7703e40, argc=1, argv=0x7ffe2fd0ba18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe2fd0ba08)
    at ../csu/libc-start.c:308
        self = <optimized out>
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {94901453980416, -7807279351804040094, 94901453740880, 140729700629008, 0, 0, 7807736978364591202, 7916158757757021282}, mask_was_saved = 0}}, priv = {pad = {
              0x0, 0x0, 0x1, 0x7ffe2fd0ba18}, data = {prev = 0x0, cleanup = 0x0, canceltype = 1}}}
        not_first_call = <optimized out>
#8 0x0000564ff7705b7e in ?? ()
No symbol table info available.

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

I ran it with valgrind and here is what I got
(I am not sure yet how to add debug symbols; I will try that too):

root@threadpc:~# valgrind -v /lib/systemd/systemd-resolved
==1798854== Memcheck, a memory error detector
==1798854== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1798854== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun with -h for copyright info
==1798854== Command: /lib/systemd/systemd-resolved
==1798854==
--1798854-- Valgrind options:
--1798854-- -v
--1798854-- Contents of /proc/version:
--1798854-- Linux version 5.4.0-75-generic (buildd@lgw01-amd64-034) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #84-Ubuntu SMP Fri May 28 16:28:37 UTC 2021
--1798854--
--1798854-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--1798854-- Page sizes: currently 4096, max supported 4096
--1798854-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind
--1798854-- Reading syms from /lib/systemd/systemd-resolved
--1798854-- object doesn't have a symbol table
--1798854-- Reading syms from /lib/x86_64-linux-gnu/ld-2.31.so
--1798854-- Considering /lib/x86_64-linux-gnu/ld-2.31.so ..
--1798854-- .. CRC mismatch (computed b1e31cec wanted 7bd1f8ba)
--1798854-- Considering /lib/x86_64-linux-gnu/ld-2.31.so ..
--1798854-- .. CRC mismatch (computed b1e31cec wanted 7bd1f8ba)
--1798854-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so ..
--1798854-- .. CRC is valid
--1798854-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux
--1798854-- object doesn't have a symbol table
--1798854-- object doesn't have a dynamic symbol table
--1798854-- Scheduler: using generic scheduler lock implementation.
--1798854-- Reading suppressions file: /usr/lib/x86_64-linux-gnu/valgrind/default.supp
==1798854== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-1798854-by-root-on-???
==1798854== embedded gdbserver: writing to /tmp/vgdb-pipe-to-vgdb-from-1798854-by-root-on-???
==1798854== embedded gdbserver: shared mem /tmp/vgdb-pipe-shared-mem-vgdb-1798854-by-root-on-???
==1798854==
==1798854== TO CONTROL THIS PROCESS USING vgdb (which you probably
==1798854== don't want to do, unless you know exactly what you're doing,
==1798854== or are doing some strange experiment):
==1798854== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1798854 ...command...
==1798854==
==1798854== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==1798854== /path/to/gdb /lib/systemd/systemd-resolved
==1798854== and then give GDB the following command
==1798854== target remote | /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1798854
==1798854== --pid is optional if only one valgrind process is running
==1798854==
--1798854-- REDIR: 0x4022f60 (ld-linux-x86-64.so.2:strlen) redirected to 0x580c9ce2 (???)
--1798854-- REDIR: 0x4022d30 (ld-linux-x86-64.so.2:index) redirected to 0x580c9cfc (???)
--1798854-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so
--1798854-- object doesn't have a symbol table
--1798854-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memchec...

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

Here is a more meaningful result from valgrind with dbgsym present.

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

Most likely I have found the patch that fixes this problem:
https://github.com/systemd/systemd/commit/97935302283729c9206b84f5e00b1aff0f78ad19

The associated issue is strikingly similar:
https://github.com/systemd/systemd/issues/18427

I will try to test today whether it fixes my issue as well.

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

I am not sure whether I backported the patch correctly, as some functions are missing in 245.4.
The build is OK and I am testing it. No crashes yet, but I will have conclusive results in ~24h.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Backported bugfix" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

As far as I can see, there are no crashes anymore.

tags: added: rls-ff-incoming
tags: added: fr-1490
tags: removed: rls-ff-incoming
Revision history for this message
Lukas Märdian (slyon) wrote :

Thank you for providing a patch! The patch looks straightforward and (almost) matches the upstream change, apart from the parts that do not yet exist in systemd v245.

The only other thing I found missing is an upstream change in src/resolve/resolved-dns-transaction.c:

```
 static void dns_transaction_stop_timeout(DnsTransaction *t) {
         assert(t);

-        t->timeout_event_source = sd_event_source_unref(t->timeout_event_source);
+        t->timeout_event_source = sd_event_source_disable_unref(t->timeout_event_source);
 }

 DnsTransaction* dns_transaction_free(DnsTransaction *t) {
```

I've created a new patch for Focal that includes this change, and I have also adapted the patch for Hirsute.
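
To illustrate why disabling an event source before dropping the reference can matter, here is a toy model in Python. It is not the sd-event implementation: the classes `ToyEventSource` and `ToyLoop` are invented for this sketch, and the assumption that something else still holds a reference to the source when the transaction goes away is taken from the patch title ("disable event sources before unreffing them"), not from an analysis of the crash.

```python
class ToyEventSource:
    """Toy stand-in for an event source: refcounted, with an enabled flag."""

    def __init__(self, callback, userdata):
        self.callback = callback
        self.userdata = userdata
        self.refs = 1
        self.enabled = True

    def ref(self):
        self.refs += 1
        return self

    def unref(self):
        """Drop one reference; returns None so callers can clear their pointer."""
        self.refs -= 1
        return None

    def disable_unref(self):
        """Disable first, then drop the reference."""
        self.enabled = False
        return self.unref()


class ToyLoop:
    def __init__(self):
        self.sources = []

    def add(self, source):
        self.sources.append(source)

    def dispatch(self):
        """Run callbacks of sources that are still alive and enabled."""
        return [s.callback(s.userdata) for s in self.sources
                if s.refs > 0 and s.enabled]


def on_timeout(transaction):
    # In C this would dereference freed memory; here we only report it.
    return "use-after-free" if transaction["freed"] else "ok"


def run(scenario):
    loop = ToyLoop()
    transaction = {"freed": False}
    source = ToyEventSource(on_timeout, transaction)
    loop.add(source)
    source.ref()  # assumption: someone else still holds a reference
    if scenario == "unref":
        source.unref()
    else:
        source.disable_unref()
    transaction["freed"] = True  # the owner of the userdata goes away
    return loop.dispatch()


print(run("unref"))          # ['use-after-free']
print(run("disable_unref"))  # []
```

In this model, a plain unref leaves the source alive and enabled, so its callback can still run with userdata that is already gone. Disabling first prevents the callback from firing regardless of who else holds a reference.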

I have a few open questions:
1/ Did you leave out that resolved-dns-transaction.c change on purpose?
2/ What is the best way to reproduce this issue? Can it somehow be triggered?
3/ How to test/confirm that the issue is indeed fixed? Is there any way other than observing for 24 hours?

Revision history for this message
Lukas Märdian (slyon) wrote :
Changed in systemd (Ubuntu Impish):
status: New → Fix Released
Revision history for this message
Lukas Märdian (slyon) wrote :

systemd v248, as shipped in Impish, already contains the upstream fix.

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

1. No, my skills were just not good enough to spot it.
I noticed a warning in valgrind (only once, and it was not a crash), but I was not able to trace it back to its cause. Very likely your patch fixes that.

2. There are some mDNS devices on my network, and probably some combination of them and their connectivity (power failures, packet loss) triggers this bug from time to time.
I am now running valgrind again with timestamps and will record all port 53 traffic; maybe I can catch a pattern.
If you want, I can apply your patch and do the same.

To find the cause, I simply disabled the systemd-resolved service and manually ran valgrind -v /lib/systemd/systemd-resolved.
Perhaps other users have complained about resolved segfault messages in dmesg and see them more often?

3. I was not able to find a way to reproduce the bug reliably.
And now it might take about a week to spot this remaining bug.

Revision history for this message
Lukas Märdian (slyon) wrote :
Lukas Märdian (slyon)
description: updated
Revision history for this message
Lukas Märdian (slyon) wrote :

Denys, what kind of mDNS traffic is flowing in your network? I'm not really able to reproduce the problem locally. Maybe I'm missing these special network packets?

Have you already been able to find out what triggers the issue for you?

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

I recorded a pcap and captured the moment just before the crash. It doesn't look like there was any mDNS traffic at all at that moment; the only suspicious thing is a large number of weird DNS requests (spinesystems.solutions is my local domain set on the PC). I am not sure what generated them.

A small sample of these:

02:16:36.658522 IP 127.0.0.1.39766 > 127.0.0.1.53: 17063+ A? roxquromlpczqgh.spinesystems.solutions. (56)
02:16:36.658626 IP 127.0.0.1.31846 > 127.0.0.53.53: 54480+ A? roxquromlpczqgh.spinesystems.solutions. (56)
02:16:36.659106 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)
02:16:36.659603 IP 127.0.0.53.53 > 127.0.0.1.31846: 54480 NXDomain 0/0/0 (56)
02:16:36.659644 IP 127.0.0.1.53 > 127.0.0.1.54315: 17063 NXDomain 0/0/0 (56)
02:16:36.659660 IP 127.0.0.1.53 > 127.0.0.1.39766: 17063 NXDomain 0/0/0 (56)
02:16:36.660099 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)
02:16:36.660603 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)
02:16:36.661082 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)
02:16:36.661513 IP 127.0.0.53.53 > 127.0.0.1.25474: 60255 NXDomain 0/0/1 (67)
02:16:36.661898 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)
02:16:36.662282 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)
02:16:36.662778 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)
02:16:36.663188 IP 127.0.0.1.60869 > 127.0.0.1.53: 47647+ AAAA? xjstkkuhaqclygt.spinesystems.solutions. (56)
02:16:36.663259 IP 127.0.0.1.32600 > 127.0.0.53.53: 10939+ AAAA? xjstkkuhaqclygt.spinesystems.solutions. (56)
02:16:36.663449 IP 127.0.0.1.36798 > 127.0.0.1.53: 30847+ AAAA? roxquromlpczqgh.spinesystems.solutions. (56)
02:16:36.663481 IP 127.0.0.1.15565 > 127.0.0.53.53: 54137+ AAAA? roxquromlpczqgh.spinesystems.solutions. (56)
02:16:36.664474 IP 127.0.0.1.47179 > 127.0.0.1.53: 3316+ A? roxquromlpczqgh.spinesystems.solutions. (56)
02:16:36.664563 IP 127.0.0.1.40056 > 127.0.0.53.53: 55684+ A? roxquromlpczqgh.spinesystems.solutions. (56)
02:16:36.664969 IP 10.255.255.225.55743 > 10.255.255.1.53: 60614+ [1au] A? roxquromlpczqgh.spinesystems.solutions. (67)
02:16:36.665503 IP 10.255.255.1.53 > 10.255.255.225.55743: 60614 NXDomain 0/1/1 (140)
02:16:36.666994 IP 10.255.255.225.55743 > 10.255.255.1.53: 60614+ A? roxquromlpczqgh.spinesystems.solutions. (56)
02:16:36.667494 IP 10.255.255.1.53 > 10.255.255.225.55743: 60614 NXDomain 0/1/0 (129)

Maybe the massive number of timeouts in queries for a nonexistent domain triggers this bug, as in the original systemd issue ( https://github.com/systemd/systemd/issues/18427 )?
Also, my ISP force-redirects all port 53 requests to its own DNS servers, so they might sometimes time out instead of returning NXDomain.
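
To look for patterns in a capture like the one above, a rough sketch such as the following can tally queries per name and NXDomain replies per destination and query ID. It assumes the short one-line tcpdump text format shown in the excerpt and ignores everything else; the function name `tally` is made up for this example.

```python
import re
from collections import Counter

# Matches the short tcpdump form seen above, for example:
#   02:16:36.659106 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\S+) > (?P<dst>\S+): (?P<id>\d+)(?P<flags>[+*%-]*) "
    r"(?:\[[^\]]*\] )?(?P<rest>.*)$"
)


def tally(lines):
    """Count queries per (type, name) and NXDomain replies per (dst, query id)."""
    queries, nxdomain = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a DNS line in the expected format
        rest = m["rest"].split()
        if len(rest) >= 2 and rest[0].endswith("?"):
            # Query: "<TYPE>? <name> (<len>)"
            queries[(rest[0].rstrip("?"), rest[1].rstrip("."))] += 1
        elif rest and rest[0] == "NXDomain":
            nxdomain[(m["dst"], int(m["id"]))] += 1
    return queries, nxdomain


sample = [
    "02:16:36.658522 IP 127.0.0.1.39766 > 127.0.0.1.53: 17063+ A? roxquromlpczqgh.spinesystems.solutions. (56)",
    "02:16:36.659106 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)",
    "02:16:36.660099 IP 127.0.0.53.53 > 127.0.0.1.6126: 42840 NXDomain 0/0/0 (56)",
    "02:16:36.664969 IP 10.255.255.225.55743 > 10.255.255.1.53: 60614+ [1au] A? roxquromlpczqgh.spinesystems.solutions. (67)",
]
queries, nxdomain = tally(sample)
for key, count in nxdomain.most_common():
    print(key, count)
```

In the excerpt above, the NXDomain reply with ID 42840 to 127.0.0.1.6126 appears seven times without a matching query in the visible lines, which is the kind of repetition such a tally makes easy to spot.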

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu Focal):
status: New → Confirmed
Changed in systemd (Ubuntu Hirsute):
status: New → Confirmed
Revision history for this message
Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Denys, or anyone else affected,

Accepted systemd into hirsute-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/247.3-3ubuntu3.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-hirsute to verification-done-hirsute. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-hirsute. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Hirsute):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-hirsute
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/247.3-3ubuntu3.5)

All autopkgtests for the newly accepted systemd (247.3-3ubuntu3.5) for hirsute have finished running.
The following regressions have been reported in tests triggered by the package:

systemd/247.3-3ubuntu3.5 (armhf)
munin/2.0.57-1ubuntu1 (amd64)
udisks2/2.9.2-1ubuntu1 (arm64)
initramfs-tools/0.139ubuntu3 (amd64)
swupdate/2020.11-2 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/hirsute/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Denys, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.12 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Focal):
status: Confirmed → Fix Committed
tags: added: verification-needed-focal
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.12)

All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.12) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

flatpak/1.6.5-0ubuntu0.3 (amd64)
gvfs/1.44.1-1ubuntu1 (amd64, arm64)
munin/2.0.56-1ubuntu1 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Lukas Märdian (slyon) wrote :

Hi Denys,

could you please help with testing this new systemd release in focal-proposed (and hirsute-proposed) and confirm that it is working in your context without crashes?

https://wiki.ubuntu.com/Testing/EnableProposed

IIRC you're running a Focal machine, which should make testing the version from focal-proposed easy. Would you also be able to set up a VM, connected to the same network, and confirm that the version in Ubuntu Hirsute (hirsute-proposed) does not crash either?

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Denys, or anyone else affected,

Accepted systemd into hirsute-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/247.3-3ubuntu3.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-hirsute to verification-done-hirsute. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-hirsute. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Denys, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.13 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/247.3-3ubuntu3.6)

All autopkgtests for the newly accepted systemd (247.3-3ubuntu3.6) for hirsute have finished running.
The following regressions have been reported in tests triggered by the package:

udisks2/2.9.2-1ubuntu1 (arm64)
apt/2.2.4ubuntu0.1 (amd64)
systemd/247.3-3ubuntu3.6 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/hirsute/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

I have started testing on my PC.
Ubuntu Hirsute will be a bit difficult to run, but I will try.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.13)

All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.13) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

gvfs/1.44.1-1ubuntu1 (amd64, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Lukas Märdian (slyon) wrote :

I've tested systemd 247.3-3ubuntu3.6 from hirsute-proposed in a VM and can verify it is working as expected.

I used the configuration provided and kept the VM running for several days without any crash:
```
root@hh-vm:~# cat /etc/systemd/resolved.conf
[Resolve]
DNS=46.182.19.48#dns2.digitalcourage.de 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net
DNSSEC=yes
DNSOverTLS=true
MulticastDNS=no
LLMNR=no
Cache=no
DNSStubListener=yes
Domains=~.
root@hh-vm:~# dmesg | grep segfault
root@hh-vm:~# dmesg | grep "systemd-resolve"
root@hh-vm:~# resolvectl query google.com
google.com: 142.250.185.238 -- link: enp5s0
            2a00:1450:4001:813::200e -- link: enp5s0

-- Information acquired via protocol DNS in 121.0ms.
-- Data is authenticated: no
```

DNS requests are working as expected and I could not provoke a crash by disconnecting the network interface midway through a DNS request, forcing a timeout.

tags: added: verification-done-hirsute
removed: verification-needed-hirsute
Revision history for this message
Lukas Märdian (slyon) wrote :

I've tested systemd 245.4-4ubuntu3.13 from focal-proposed in a VM and can verify it is working as expected.

I used the configuration provided and kept the VM running for several days without any crash:
```
root@ff-vm:~# cat /etc/systemd/resolved.conf
[Resolve]
DNS=46.182.19.48#dns2.digitalcourage.de 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net
DNSSEC=yes
DNSOverTLS=opportunistic
MulticastDNS=no
LLMNR=no
Cache=yes
DNSStubListener=yes
Domains=~.
root@ff-vm:~# dmesg | grep segfault
root@ff-vm:~# dmesg | grep "systemd-resolve"
root@ff-vm:~# resolvectl query google.com
google.com: 2a00:1450:4001:809::200e -- link: enp5s0
            172.217.18.110 -- link: enp5s0

-- Information acquired via protocol DNS in 177.4ms.
-- Data is authenticated: no
```

DNS requests are working as expected and I could not provoke a crash by disconnecting the network interface midway through a DNS request, forcing a timeout.

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.4-4ubuntu3.13

---------------
systemd (245.4-4ubuntu3.13) focal; urgency=medium

  * d/p/dell-clamshell-accel-location-base-with-sku.patch:
    Revert incorrect patch (LP: #1942899)

systemd (245.4-4ubuntu3.12) focal; urgency=medium

  [ Yao Wei ]
  * d/p/dell-clamshell-accel-location-base.patch:
    Add ACCEL_LOCATION=base property for Dell clamshell models (LP: #1938259)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=5c1be33900edee94da0dc9a4ade8edcd079b4c85

  [ Lukas Märdian ]
  * Add d/p/lp1934221-resolved-disable-event-sources-before-unreffing-them.patch
    - Fix segfault in systemd-resolve (LP: #1934221)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=6c401900c70962052f56c7108fdc02fe7f84c9bf

  [ Simon Chopin ]
  * d/p/lp1914740-network-enable-DHCP-broadcast-flag-if-required-by-in.patch:
    - Apply upstream patch to fix Hipersocket DHCP mode (LP: #1914740)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=326ae43b7966d9e7c5f7124027185a79a07fa276

  [ Dan Streetman ]
  * d/p/lp1934981-correct-suspend-then-sleep-string.patch:
    Fix sleep verb used by logind during suspend-then-hibernate
    (LP: #1934981)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=997f3a7da3d5db22e3c63626c3f7dc3dff0830b0
  * d/p/lp1937238-util-return-the-correct-correct-wd-from-inotify-help.patch:
    Fix watch for time sync (LP: #1937238)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=dbabff8a03eb232c19174eff1335cd7cb7d7860c
  * d/extra/dhclient-enter-resolved-hook:
    Reset start limit counter for systemd-resolved in dhclient hook
    (LP: #1939255)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=9d3a91a0b70a4b2bcc166f366cd0a880fd494812
  * d/p/lp1935051-shared-unit-file-make-sure-the-old-hashmaps-and-sets.patch:
    Fix memory leak in path cache (LP: #1935051)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=12d6bdeb35f309158fe8d4242c6dd9be4d067604
  * d/p/lp1934147/0001-cgroup-do-catchup-for-unit-cgroup-inotify-watch-file.patch,
    d/p/lp1934147/0002-core-Make-sure-cgroup_oom_queue-is-flushed-on-manage.patch:
    Catchup cgroup inotify watch after reexec/reload (LP: #1934147)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=63eabc88b8e0005eb40b15b543538ce35377bdbd

 -- Dan Streetman <email address hidden> Tue, 07 Sep 2021 14:37:22 -0400

Changed in systemd (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 247.3-3ubuntu3.6

---------------
systemd (247.3-3ubuntu3.6) hirsute; urgency=medium

  * d/p/dell-clamshell-accel-location-base-with-sku.patch:
    Revert incorrect patch (LP: #1942899)

systemd (247.3-3ubuntu3.5) hirsute; urgency=medium

  [ Yao Wei ]
  * d/p/dell-clamshell-accel-location-base-with-sku.patch:
    Use SKU to identify Dell clamshell models for accelerometer properties
    (LP: #1938259)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=a21edd743408b5603b0177e9c230c6d6b919e589

  [ Lukas Märdian ]
  * Add d/p/lp1934221-resolved-disable-event-sources-before-unreffing-them.patch
    - Fix segfault in systemd-resolve (LP: #1934221)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=55906c32bdfd862e454c0fce80c4e023de6c3b19

  [ Simon Chopin ]
  * d/p/lp1914740-network-enable-DHCP-broadcast-flag-if-required-by-in.patch:
    - Apply upstream patch to fix Hipersocket DHCP mode (LP: #1914740)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=c7559785d7d4efaaa899009bceeb9498e53342e5

  [ Dan Streetman ]
  * d/p/lp1934981-correct-suspend-then-sleep-string.patch:
    Fix sleep verb used by logind during suspend-then-hibernate
    (LP: #1934981)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=cf75bbb01a6e7e2516e2bbf541c8e84d0548359c
  * d/p/lp1934147/0001-cgroup-do-catchup-for-unit-cgroup-inotify-watch-file.patch,
    d/p/lp1934147/0002-core-Make-sure-cgroup_oom_queue-is-flushed-on-manage.patch:
    Catchup cgroup inotify watch after reexec/reload (LP: #1934147)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d34d104339065665fa64ccda72d07ba8e2b7e10f

 -- Dan Streetman <email address hidden> Tue, 07 Sep 2021 14:34:22 -0400

Changed in systemd (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Revision history for this message
Lukas Märdian (slyon) wrote :

Unfortunately, this patch does not seem to be sufficient.

It produces segfaults in on_query_timeout (https://errors.ubuntu.com/problem/c4e5be3f1c7af9483993c7e6007b9325ab5b78cd) that are very similar to the segfault we observed before (https://errors.ubuntu.com/problem/bb0ce4ff8e6ef86041cfb5647b792823a20b44f7).

We want to revert the patch in Focal (systemd v245) for now (LP: #1943982), while we'll try to fix the root cause in Hirsute (systemd v247), adding more relevant patches that were added to systemd-stable v247 (but not systemd-stable v245):
https://github.com/systemd/systemd-stable/commits/v247-stable/src/resolve/resolved-dns-query.c
https://github.com/systemd/systemd-stable/commit/64317106aed94a6fb758ab6b08ba490873fc5227
https://github.com/systemd/systemd-stable/commit/91ba2eac4b6b463026b3a93e5a139923e8f2cfe4
https://github.com/systemd/systemd-stable/commit/ab9f7e1a51005f12d3bac83b86716d9d33048eb7
https://github.com/systemd/systemd-stable/commit/78a43c33c8847ebbc2d3cf530ebe304924c58b32
https://github.com/systemd/systemd-stable/commit/c8d7fab2286384b19ff6328cece107c4c02d7bb8

If root-causing is successful, we might back-port those patches to Focal, too.
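
For readers unfamiliar with the pattern named in the first patch's title ("disable event sources before unreffing them"), here is a toy model in Python. It is not systemd code and does not use the sd-event API; the classes `EventSource` and `Query` and the function `simulate` are hypothetical names made up for illustration. It assumes the hazard is that a ref-counted timer source can still fire its callback after the owning query has been freed, if something else holds another reference to the source.

```python
class EventSource:
    """Toy ref-counted timer source (hypothetical, not the sd-event API)."""

    def __init__(self, callback, userdata):
        self.callback = callback
        self.userdata = userdata
        self.refs = 1
        self.enabled = True

    def ref(self):
        self.refs += 1
        return self

    def unref(self):
        # Dropping one reference does not stop the source if others remain.
        self.refs -= 1

    def disable_unref(self):
        # Disable first, so the callback can never run again, then unref.
        self.enabled = False
        self.unref()

    def fire(self):
        # The event loop only dispatches live, enabled sources.
        if self.refs > 0 and self.enabled:
            self.callback(self.userdata)


class Query:
    """Stand-in for an object whose memory is released on free."""

    def __init__(self):
        self.freed = False


def on_timeout(query):
    if query.freed:
        raise RuntimeError("callback ran on a freed query")


def simulate(disable_first):
    query = Query()
    source = EventSource(on_timeout, query)
    extra = source.ref()  # assumption: someone else still holds a reference
    if disable_first:
        source.disable_unref()
    else:
        source.unref()
    query.freed = True  # the owner frees the query
    try:
        extra.fire()  # the timer elapses later
        return "ok"
    except RuntimeError:
        return "use-after-free"
```

In this model, `simulate(False)` reaches the callback with a freed query, while `simulate(True)` does not. Whether this is the complete root cause in systemd-resolved is exactly what is still in question above.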

Changed in systemd (Ubuntu Focal):
status: Fix Released → Confirmed
Changed in systemd (Ubuntu Hirsute):
status: Fix Released → Confirmed
Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

Yes, it seems I am still seeing crashes too. They are rarer, but they still occur.

Revision history for this message
Lukas Märdian (slyon) wrote :

Thanks for confirming, Denys. That's unfortunate...
Would you mind trying the version from this PPA (for Hirsute): https://launchpad.net/~slyon/+archive/ubuntu/lp1934221 ?

And maybe also for Focal once the build is ready?

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

Sorry for the very long delay; there have been many bad events in my life. I will give it a try now.

Revision history for this message
Denys Fedoryshchenko (nuclearcat) wrote :

My system is Focal. I can't try Hirsute yet, as I don't have any Hirsute systems where this error appears.

It doesn't work at all for me; it fails to resolve anything:

Link 2 (enp3s0)
      Current Scopes: DNS
DefaultRoute setting: yes
       LLMNR setting: yes
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
  Current DNS Server: 2a0d:e40:0:4000::1
         DNS Servers: 2a0d:e40:0:4000::1
                      fd2c:bf12:1194::1
                      10.255.255.1
                      8.8.8.8
          DNS Domain: ~.
                      spinesystems.solutions

udp 0 0 127.0.0.53:53 0.0.0.0:* 4106635/systemd-res
resolv.conf nameserver: nameserver 127.0.0.53

dig @127.0.0.53

; <<>> DiG 9.16.1-Ubuntu <<>> @127.0.0.53
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4404
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;. IN NS

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Mon Oct 04 12:34:50 EEST 2021
;; MSG SIZE rcvd: 28

dig +short @2a0d:e40:0:4000::1 www.google.com
142.250.200.196

# systemd-resolve -4 www.google.com
www.google.com: resolve call failed: No appropriate name servers or networks for name found

However, I have not rebooted yet since installing the new systemd.
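
A check like the dig run above can be automated. The following is a minimal sketch in Python that extracts the response status from dig's header line, assuming the output format shown in this comment. The function names `parse_dig_status` and `stub_looks_healthy` are hypothetical helpers, not part of any existing tool.

```python
import re

# Matches dig's header line, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4404"
HEADER_RE = re.compile(r"->>HEADER<<- opcode: (\w+), status: (\w+), id: (\d+)")


def parse_dig_status(output):
    """Return the status field (e.g. 'NOERROR', 'SERVFAIL'), or None if absent."""
    match = HEADER_RE.search(output)
    if match is None:
        return None
    return match.group(2)


def stub_looks_healthy(output):
    """Treat anything other than NOERROR as a failed lookup."""
    return parse_dig_status(output) == "NOERROR"
```

The text passed in would be the captured output of a command such as `dig @127.0.0.53`. A missing header (for example, on a timeout) is reported as `None` rather than guessed at.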

Revision history for this message
Lukas Märdian (slyon) wrote :

Thank you for testing. The Focal backport might need some more work, as that version of systemd is even older and the patches did not apply cleanly.

For Hirsute it was a straightforward cherry-pick of patches from systemd-stable v247 (MergeProposal attached to this bug report). So I wonder whether we can somehow confirm that those patches fix the root cause in Hirsute.
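
One way to gather evidence during a long soak test is to count systemd-resolve crash lines in the kernel log. The sketch below is in Python and assumes the dmesg line format quoted in the bug description ("segfault at ..." and "traps: ... general protection fault"). The pattern `CRASH_RE` and the function `find_resolved_crashes` are hypothetical names, and an absence of matches only shows that no crash was logged in the sampled period, not that the root cause is fixed.

```python
import re

# Matches kernel log lines such as:
# "[836786.046514] systemd-resolve[872009]: segfault at 39900000000 ip ..."
# "[840887.303994] traps: systemd-resolve[877019] general protection fault ip:..."
CRASH_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?:traps:\s+)?"
    r"systemd-resolve\[(?P<pid>\d+)\]:?\s+"
    r"(?P<kind>segfault|general protection fault)"
)


def find_resolved_crashes(dmesg_text):
    """Return (timestamp, pid, kind) tuples for each crash line found."""
    crashes = []
    for line in dmesg_text.splitlines():
        match = CRASH_RE.match(line)
        if match:
            crashes.append(
                (float(match.group("ts")), int(match.group("pid")), match.group("kind"))
            )
    return crashes
```

The input would be the saved text output of `dmesg`; comparing the number of matches before and after installing a candidate package gives a rough crash rate.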

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 247.3-3ubuntu3.7

---------------
systemd (247.3-3ubuntu3.7) hirsute-security; urgency=medium

  * SECURITY UPDATE: systemd-tmpfiles could be made to crash.
    - d/p/rm-rf-refactor-rm-rf-children-split-out-body-of-directory.patch:
      Backport upstream patch from PR#20173
    - d/p/rm-rf-optionally-fsync-after-removing-directory-tree.patch:
      Backport upstream patch required for CVE-2021-3997 patches
    - d/p/CVE-2021-3997-1.patch: Backport upstream patch to refactor
      rm_rf_children_inner()
    - d/p/CVE-2021-3997-2.patch: Backport upstream patch to refactor
      rm_rf()
    - d/p/CVE-2021-3997-3.patch: Backport upstream patch to loop over
      nested directories instead of using recursion
    - CVE-2021-3997

 -- Alex Murray <email address hidden> Mon, 10 Jan 2022 14:56:34 +1030

Changed in systemd (Ubuntu Hirsute):
status: Confirmed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.4-4ubuntu3.15

---------------
systemd (245.4-4ubuntu3.15) focal-security; urgency=medium

  * SECURITY UPDATE: systemd-tmpfiles could be made to crash.
    - d/p/rm-rf-refactor-rm-rf-children-split-out-body-of-directory.patch:
      Backport upstream patch from PR#20173
    - d/p/rm-rf-optionally-fsync-after-removing-directory-tree.patch:
      Backport upstream patch required for CVE-2021-3997 patches
    - d/p/CVE-2021-3997-1.patch: Backport upstream patch to refactor
      rm_rf_children_inner()
    - d/p/CVE-2021-3997-2.patch: Backport upstream patch to refactor
      rm_rf()
    - d/p/CVE-2021-3997-3.patch: Backport upstream patch to loop over
      nested directories instead of using recursion
    - CVE-2021-3997

 -- Alex Murray <email address hidden> Mon, 10 Jan 2022 15:26:38 +1030

Changed in systemd (Ubuntu Focal):
status: Confirmed → Fix Released