systemd-oomd is counting cached as used and triggering more easily than it should

Bug #1966381 reported by Jason Haar
This bug affects 3 people
Affects             Status        Importance  Assigned to    Milestone
systemd (Ubuntu)    Fix Released  High        Nick Rosbrook
  Jammy             Fix Released  High        Nick Rosbrook

Bug Description

Just now I was watching some video in Firefox. I popped over to another virtual workspace for a few minutes, and when I popped back to Firefox it had gone. The same thing had been happening all week with Chrome, Firefox and Thunderbird (I did a fresh install of Ubuntu 22.04 last week).

This time, instead of shrugging it off, I looked in the logs and found this:

Mar 25 19:45:40 ubuntu systemd-oomd[960]: Killed /user.slice/user-1001.slice/user@1001.service/app.slice/app-gnome-firefox-6607.scope due to memory used (15940579328) / total (16153944064) and swap used (925564928) / total (1023406080) being more than 90.00%
Mar 22 08:11:29 ubuntu systemd[5029]: app-gnome-google\x2dchrome-5412.scope: systemd-oomd killed 298 process(es) in this unit.
Mar 23 11:09:28 ubuntu systemd-oomd[1055]: Killed /user.slice/user-1001.slice/user@1001.service/app.slice/app-gnome-thunderbird-5418.scope due to memory used (15591993344) / total (16149745664) and swap used (927760384) / total (1023406080) being more than 90.00%
Mar 23 11:09:28 ubuntu systemd[5029]: app-gnome-thunderbird-5418.scope: systemd-oomd killed 173 process(es) in this unit.

I know it's saying that those three entirely unrelated applications had suddenly decided to swallow all the RAM+swap on this laptop of mine, but the very same apps didn't act like that last week under Ubuntu 20.04, so I suspect something else is going on.

I can't say they hadn't swallowed all the RAM, but there is ZERO sign of a system on the verge of collapsing: everything has been screaming along just nicely, with no sign of the "staggering" you normally get when the OS is heavily into swap.

However, now that I look, I see my 16G laptop only has 1G of swap??? I just let the Ubuntu installer use its defaults, but it used to auto-choose 1x or 2x RAM, so what's with this 1G swap? Could that be related?
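The kill messages quoted above carry enough data to check how far over the limit each kill was. The following is a minimal Python sketch that turns the byte counts into percentages; the regular expression is written against the message wording seen in these logs only, which is an assumption rather than a documented format.

```python
import re

# Matches the "memory used (X) / total (Y) and swap used (A) / total (B)"
# part of the systemd-oomd kill messages quoted in this report.
# The wording is taken from these logs, not from a documented format.
KILL_RE = re.compile(
    r"memory used \((\d+)\) / total \((\d+)\) and "
    r"swap used \((\d+)\) / total \((\d+)\) being more than ([\d.]+)%"
)


def usage_percentages(line):
    """Return (memory %, swap %, limit %) parsed from one kill message."""
    m = KILL_RE.search(line)
    if m is None:
        raise ValueError("not a systemd-oomd kill message")
    mem_used, mem_total, swap_used, swap_total = (int(g) for g in m.groups()[:4])
    limit = float(m.group(5))
    return 100 * mem_used / mem_total, 100 * swap_used / swap_total, limit


line = (
    "Mar 25 19:45:40 ubuntu systemd-oomd[960]: Killed /user.slice/user-1001.slice/"
    "user@1001.service/app.slice/app-gnome-firefox-6607.scope due to memory used "
    "(15940579328) / total (16153944064) and swap used (925564928) / total "
    "(1023406080) being more than 90.00%"
)
mem_pct, swap_pct, limit = usage_percentages(line)
print(f"memory {mem_pct:.2f}%, swap {swap_pct:.2f}%, limit {limit:.2f}%")
```

For the Firefox kill this gives about 98.68% memory and 90.44% swap, so both figures are over the 90.00% limit printed in the same message.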

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: systemd-oomd 249.11-0ubuntu1
ProcVersionSignature: Ubuntu 5.15.0-23.23-generic 5.15.27
Uname: Linux 5.15.0-23-generic x86_64
ApportVersion: 2.20.11-0ubuntu79
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Fri Mar 25 19:47:44 2022
InstallationDate: Installed on 2022-03-13 (11 days ago)
InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - Alpha amd64 (20220313)
SourcePackage: systemd
UpgradeStatus: No upgrade log present (probably fresh install)


Jason Haar (jhaar-launchpad) wrote :
Sebastien Bacher (seb128) wrote :

Thank you for the report. Tagging as rls incoming, because it sounds like we should check whether the systemd-oomd behaviour is as expected or whether it is kicking in earlier than it should.

tags: added: rls-jj-incoming
Changed in systemd (Ubuntu):
importance: Undecided → High
Jason Haar (jhaar-launchpad) wrote :

FYI: about 6 hours ago I saw that a new release of systemd-oomd (249.11-0ubuntu2) was published. I've upgraded the entire system and rebooted, so I'll report back if there's any change. I was getting these random OOM kills about every couple of days, so within a week I should know whether that changed anything.

Jason Haar (jhaar-launchpad) wrote :

Oh well, that latest systemd-oomd didn't help. Chrome just crashed again, while I wasn't even using the computer. Here are all the syslog entries from the time it crashed; there is nothing but the OOM kill:

Mar 28 19:30:56 ubuntu systemd-oomd[1121]: Killed /user.slice/user-1001.slice/user@1001.service/app.slice/app-gnome-google\x2dchrome-5640.scope due to memory used (15992975360) / total (16153948160) and swap used (921436160) / total (1023406080) being more than 90.00%
Mar 28 19:30:56 ubuntu systemd[5083]: app-gnome-google\x2dchrome-5640.scope: systemd-oomd killed 311 process(es) in this unit.
Mar 28 19:30:56 ubuntu systemd[5083]: app-gnome-google\x2dchrome-5640.scope: Consumed 7h 10min 56.132s CPU time.

tags: added: fr-2157
tags: removed: rls-jj-incoming
Dan Streetman (ddstreet) wrote :

Wow, looking at the systemd code (even upstream), oomd is counting pagecache as 'used' memory. That is massively unfair, as the kernel, not userspace, is responsible for pagecache use, and it's not even accurate from an OOM perspective, since the kernel will drop pagecache as memory pressure increases.

I think some discussion and patching definitely needs to happen upstream in systemd.
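To illustrate the difference, here is a minimal Python sketch that computes 'used' memory from /proc/meminfo-style text in two ways: MemTotal minus MemFree, which counts reclaimable page cache as used, and MemTotal minus MemAvailable, which relies on the kernel's estimate of memory available to new workloads and therefore largely treats reclaimable cache as free. The meminfo figures are made-up illustrative values, not data from any reporter's machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines into a dict of byte counts."""
    values = {}
    for raw in text.strip().splitlines():
        key, rest = raw.split(":", 1)
        fields = rest.split()
        amount = int(fields[0])
        # /proc/meminfo reports most fields in "kB", which are really KiB.
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[key] = amount
    return values


# Made-up snapshot of a 16 GiB machine holding lots of page cache.
SAMPLE = """
MemTotal:       16777216 kB
MemFree:          524288 kB
MemAvailable:    9437184 kB
Cached:          8912896 kB
"""

info = parse_meminfo(SAMPLE)
total = info["MemTotal"]
used_from_free = total - info["MemFree"]            # page cache counted as used
used_from_available = total - info["MemAvailable"]  # reclaimable cache treated as free

for label, used in (("MemTotal - MemFree", used_from_free),
                    ("MemTotal - MemAvailable", used_from_available)):
    print(f"{label}: {100 * used / total:.1f}% used")
```

Against a 90% threshold, the first calculation would call this machine nearly out of memory (about 96.9% used), while the second would not (about 43.8% used). The MemAvailable-based calculation is the one the eventual fix in this report adopts.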

Nick Rosbrook (enr0n) wrote :

In the near term, we could consider tweaking the systemd-oomd defaults in Ubuntu. According to the commit that added systemd-oomd in Jammy [1], the current config is based on Fedora's. This includes using the default value of SwapUsedLimit=90% [2]. However, Fedora has more swap space by default: a fresh install of Fedora 35 in a VM with 4GB of memory has 4GB of swap, whereas a fresh install of Jammy in a VM with 4GB of memory has 968MB of swap. So, maybe the default SwapUsedLimit is not appropriate for Ubuntu.

[1] https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?h=ubuntu-jammy&id=771fee9e73316c92e065e93946ec64c578b43706
[2] https://www.freedesktop.org/software/systemd/man/oomd.conf.html#SwapUsedLimit=
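To put numbers on this: SwapUsedLimit is a percentage, so the absolute amount of swap left when the limit trips scales with the swap size. A minimal Python sketch using the two swap sizes quoted above (4GB on Fedora 35 and 968MB on Jammy, both in a 4GB VM), assuming for simplicity that "GB" and "MB" mean binary units:

```python
LIMIT = 0.90  # default SwapUsedLimit=90%, as quoted above

MIB = 1024 ** 2
swap_sizes = {
    "Fedora 35 (4GB VM)": 4096 * MIB,
    "Jammy (4GB VM)": 968 * MIB,
}

for name, total in swap_sizes.items():
    threshold = LIMIT * total      # swap usage at which the limit is reached
    headroom = total - threshold   # swap still free when the limit trips
    print(f"{name}: limit reached at {threshold / MIB:.0f} MiB used, "
          f"{headroom / MIB:.0f} MiB still free")
```

That works out to roughly 97 MiB of free swap remaining when the limit trips on a default Jammy install, versus roughly 410 MiB on Fedora.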

Dan Streetman (ddstreet) wrote :

> So, maybe the default SwapUsedLimit is not appropriate for Ubuntu

I don't think tweaking that will help much, if at all; it's going to be hard to get around the core issue of oomd counting pagecache as memory pressure.

Dan Streetman (ddstreet) wrote :

> it's going to be hard to get around the core issue of oomd counting pagecache as memory pressure

Assuming my quick 10-minute assessment of this bug is correct, of course... maybe I'm totally wrong about what the problem is ;-)

Jason Haar (jhaar-launchpad) wrote :

Well, funny you should say that... When I installed 22.04 on my new Dell laptop with 16G of RAM, Jammy still only allocated 976MB of swap. I think you have a problem there too.

So after reporting this issue and continuing to have OOM crashes, I created a 20G swapfile, and ever since then this problem has disappeared...

Maybe running with such a starved swapspace triggers systemd-oom to do weird things?

Dan Streetman (ddstreet) wrote :

> I think you have a problem there too.

Oh, I'm certainly not claiming a 1G default swap is appropriate; that does seem far, far too small to me and will likely cause widespread issues beyond just this. I was only saying that tweaking the systemd-oomd swap-%-full setting would IMHO not be likely to fix this very well. As you point out, increasing the swap to a more reasonable size almost certainly will help (and might be why upstream hadn't noticed this before), regardless of the systemd-oomd swap-%-used default, because swap would be far less likely to fill up to that percentage.
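This argument can be illustrated with the figures from the original report. The sketch below takes the swap usage from the Firefox kill message (925564928 bytes) and compares it against the original ~1G swap alone and against that swap plus a 20G swapfile like the one Jason added. It assumes "20G" means 20 GiB and that swap usage stays the same, which is a simplification; real usage would likely differ.

```python
SWAP_USED = 925_564_928         # swap used, from the Firefox kill message
SWAP_PARTITION = 1_023_406_080  # swap total, from the same message
SWAPFILE = 20 * 1024 ** 3       # assumed size of the added 20G swapfile
LIMIT_PCT = 90.0                # limit printed in the kill message

for label, total in (("~1G swap only", SWAP_PARTITION),
                     ("~1G swap + 20G swapfile", SWAP_PARTITION + SWAPFILE)):
    pct = 100 * SWAP_USED / total
    status = "over" if pct > LIMIT_PCT else "under"
    print(f"{label}: {pct:.2f}% swap used ({status} the {LIMIT_PCT:.0f}% limit)")
```

The same absolute usage is about 90.44% of the original swap but only about 4.11% once the swapfile is added, far below the percentage that systemd-oomd checks.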

> Maybe running with such a starved swapspace triggers systemd-oom to do weird things?

From my quick read of the code, it doesn't seem to be doing anything weird at all; I think it's doing exactly what it's programmed to do. I just don't think the code is correct.

summary: - applications crash that never crashed under Ubuntu-20.04
+ systemd-oomd is counting cached as used and triggering more easily than it should
Changed in systemd (Ubuntu Jammy):
milestone: none → ubuntu-22.04
Nick Rosbrook (enr0n)
Changed in systemd (Ubuntu Jammy):
assignee: nobody → Nick Rosbrook (enr0n)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Sebastien Bacher (seb128) wrote :

https://github.com/systemd/systemd/pull/22965 was proposed upstream to change the logic to use the available memory figure rather than the free one.

Sebastien Bacher (seb128) wrote :

The change got merged upstream; can we get it cherry-picked into Ubuntu now?

Pirouette Cacahuète (lissyx) wrote :

Was this released? I received an update of that package this morning:

> $ dpkg --list systemd-oomd
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name           Version         Architecture Description
> +++-==============-===============-============-====================================
> ii  systemd-oomd   249.11-0ubuntu3 amd64        Userspace out-of-memory (OOM) killer

But even with that version, which fixes the issue according to its changelog:

> systemd (249.11-0ubuntu3) jammy; urgency=medium
>
> * oomd: calculate 'used' memory with MemAvailable instead of MemFree (LP: #1966381)
> File: debian/patches/lp1966381-oomd-calculate-used-memory-with-MemAvailable-instead-of-M.patch
> https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=15fc4c53d726e1dcae7296a9306cfd453fd1a046
> * hwdb: remove the tablet pad entry for the UC-Logic 1060N (LP: #1926860)
> File: debian/patches/lp1926860-hwdb-remove-the-tablet-pad-entry-for-the-UC-Logic-1060N.patch
> https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=7bf31946a52e55f9f6ea4ecfa30e311685b20997
>
> -- Nick Rosbrook <email address hidden> Thu, 07 Apr 2022 15:28:15 -0400

I still hit the problem after a few minutes of building a debug Firefox on my laptop (ThinkPad P14s, 32GB RAM):

> -- Boot c45108609edc41368bc948d9ffda1f4d --
> Apr 08 12:30:44 portable-alex systemd[1]: Starting Userspace Out-Of-Memory (OOM) Killer...
> Apr 08 12:30:44 portable-alex systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
> Apr 08 12:42:07 portable-alex systemd-oomd[1168]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-Alacritty-6631.scope due to memory used (26327048192) / total (29247873024) and swap used (1023406080) / total (1023406080) being more than 90.00%
> -- Boot 2d12cd6e16d4432b94f957fc1cd77124 --

This is what I can see in the logs; as you can see, it's killing my terminal (Alacritty).

FTR, all other things being equal, I have been able to run for days with the same usage and systemd-oomd masked and disabled.

Pirouette Cacahuète (lissyx) wrote :

And FTR, I also have 1GB of swap; this install did not start out as Jammy (it was rather 21.04 or 21.10):

$ LC_ALL=C swapon -s
Filename      Type        Size     Used     Priority
/dev/dm-2     partition   999420   631852   -2
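These swapon figures line up with the kill message quoted earlier: swapon -s reports sizes in KiB, and 999420 KiB is exactly the 1023406080-byte swap total that systemd-oomd logged. A small Python check of that, together with the percentages from the Alacritty kill message:

```python
KIB = 1024

# From `swapon -s`: Size and Used columns, in KiB.
swap_size_kib, swap_used_kib = 999_420, 631_852

# From the Alacritty kill message logged by systemd-oomd, in bytes.
mem_used, mem_total = 26_327_048_192, 29_247_873_024
swap_used, swap_total = 1_023_406_080, 1_023_406_080

# Both sources describe the same swap device size.
assert swap_size_kib * KIB == swap_total

print(f"at kill time: memory {100 * mem_used / mem_total:.2f}%, "
      f"swap {100 * swap_used / swap_total:.2f}%")
print(f"when swapon ran: swap {100 * swap_used_kib / swap_size_kib:.2f}%")
```

By these numbers the kill happened with swap 100% full and memory at about 90.01%, just over the limit. Per the reporter, this was already with the MemAvailable change from 249.11-0ubuntu3 installed.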

Nick Rosbrook (enr0n) wrote :

Hi Pirouette,

Can you show the output of `oomctl` and `free -h` on your system?

FWIW, at the time of writing, systemd 249.11-0ubuntu3 is still in jammy-proposed, so it is only available on systems with the -proposed archive enabled.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 249.11-0ubuntu3

---------------
systemd (249.11-0ubuntu3) jammy; urgency=medium

  * oomd: calculate 'used' memory with MemAvailable instead of MemFree (LP: #1966381)
    File: debian/patches/lp1966381-oomd-calculate-used-memory-with-MemAvailable-instead-of-M.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=15fc4c53d726e1dcae7296a9306cfd453fd1a046
  * hwdb: remove the tablet pad entry for the UC-Logic 1060N (LP: #1926860)
    File: debian/patches/lp1926860-hwdb-remove-the-tablet-pad-entry-for-the-UC-Logic-1060N.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=7bf31946a52e55f9f6ea4ecfa30e311685b20997

 -- Nick Rosbrook <email address hidden> Thu, 07 Apr 2022 15:28:15 -0400

Changed in systemd (Ubuntu Jammy):
status: Confirmed → Fix Released
Pirouette Cacahuète (lissyx) wrote :

Sorry, I did not receive any email for your comment:
 - Yes, I was on -proposed, so, as I said, I had the version with the fix.
 - I will try to capture those, but I have since disabled systemd-oomd and have not had any issues (at all).
