KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

Bug #1837810 reported by Pooja Ghumre
This bug affects 4 people
Affects          Status         Importance   Assigned to       Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
  Bionic         Fix Released   Medium       Matthew Ruffell
  Focal          Fix Released   Medium       Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/1837810

[Impact]

We are seeing a problem on OpenStack compute nodes and KVM hosts, where a kernel oops is generated and all running KVM machines are placed into the paused state.

This is caused by the kernel's reserved zero_page reference counter overflowing from a positive number to a negative number, and hitting a (WARN_ON_ONCE(page_ref_count(page) <= 0)) condition in try_get_page().

This only happens if the machine has Kernel Samepage Merging (KSM) enabled, with "use_zero_pages" turned on. Each time a new VM starts and the kernel does a KSM merge run during an EPT violation, the reference counter for the zero_page is incremented in try_async_pf() and never decremented. Eventually, the reference counter will overflow, causing the KVM subsystem to fail.
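
As a quick way to see whether a host matches the configuration described above, a minimal sketch along these lines reads the two sysfs knobs. The function name ksm_zero_page_config and its optional directory argument are illustrative and not part of any existing tool; only the file names under /sys/kernel/mm/ksm come from this report. A missing file is treated as 0.

```shell
#!/bin/sh
# Report whether KSM is running with use_zero_pages turned on.
# ksm_zero_page_config is a hypothetical helper name; the directory
# argument only exists so the function can be pointed at a test directory.
ksm_zero_page_config() (
    dir="${1:-/sys/kernel/mm/ksm}"
    # Treat a missing or unreadable file as 0 (feature off or not built in).
    run=$(cat "$dir/run" 2>/dev/null || echo 0)
    zero=$(cat "$dir/use_zero_pages" 2>/dev/null || echo 0)
    if [ "$run" = "1" ] && [ "$zero" = "1" ]; then
        echo "match: run=$run use_zero_pages=$zero"
    else
        echo "no match: run=$run use_zero_pages=$zero"
    fi
)

ksm_zero_page_config
```

A "no match" result only says that this particular configuration is not active; it does not rule out other ways of hitting the same warning.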

Syslog:
error : qemuMonitorJSONCheckError:392 : internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required

QEMU Logs:
error: kvm run failed Bad address
EAX=000afe00 EBX=0000000b ECX=00000080 EDX=00000cfe
ESI=0003fe00 EDI=000afe00 EBP=00000007 ESP=00006d74
EIP=000ee344 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 000f7040 00000037
IDT= 000f707e 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=c3 57 56 b8 00 fe 0a 00 be 00 fe 03 00 b9 80 00 00 00 89 c7 <f3> a5 a1 00 80 03 00 8b 15 04 80 03 00 a3 00 80 0a 00 89 15 04 80 0a 00 b8 ae e2 00 00 31

Kernel Oops:

[ 167.695986] WARNING: CPU: 1 PID: 3016 at /build/linux-hwe-FEhT7y/linux-hwe-4.15.0/include/linux/mm.h:852 follow_page_pte+0x6f4/0x710
[ 167.696023] CPU: 1 PID: 3016 Comm: CPU 0/KVM Tainted: G OE 4.15.0-106-generic #107~16.04.1-Ubuntu
[ 167.696023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 167.696025] RIP: 0010:follow_page_pte+0x6f4/0x710
[ 167.696026] RSP: 0018:ffffa81802023908 EFLAGS: 00010286
[ 167.696027] RAX: ffffed8786e33a80 RBX: ffffed878c6d21b0 RCX: 0000000080000000
[ 167.696027] RDX: 0000000000000000 RSI: 00003ffffffff000 RDI: 80000001b8cea225
[ 167.696028] RBP: ffffa81802023970 R08: 80000001b8cea225 R09: ffff90c4d55fa340
[ 167.696028] R10: 0000000000000000 R11: 0000000000000000 R12: ffffed8786e33a80
[ 167.696029] R13: 0000000000000326 R14: ffff90c4db94fc50 R15: ffff90c4d55fa340
[ 167.696030] FS: 00007f6a7798c700(0000) GS:ffff90c4edc80000(0000) knlGS:0000000000000000
[ 167.696030] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 167.696031] CR2: 0000000000000000 CR3: 0000000315580002 CR4: 0000000000162ee0
[ 167.696033] Call Trace:
[ 167.696047] follow_pmd_mask+0x273/0x630
[ 167.696049] follow_page_mask+0x178/0x230
[ 167.696051] __get_user_pages+0xb8/0x740
[ 167.696052] get_user_pages+0x42/0x50
[ 167.696068] __gfn_to_pfn_memslot+0x18b/0x3b0 [kvm]
[ 167.696079] ? mmu_set_spte+0x1dd/0x3a0 [kvm]
[ 167.696090] try_async_pf+0x66/0x220 [kvm]
[ 167.696101] tdp_page_fault+0x14b/0x2b0 [kvm]
[ 167.696104] ? vmexit_fill_RSB+0x10/0x40 [kvm_intel]
[ 167.696114] kvm_mmu_page_fault+0x62/0x180 [kvm]
[ 167.696117] handle_ept_violation+0xbc/0x160 [kvm_intel]
[ 167.696119] vmx_handle_exit+0xa5/0x580 [kvm_intel]
[ 167.696129] vcpu_enter_guest+0x414/0x1260 [kvm]
[ 167.696138] ? kvm_arch_vcpu_load+0x4d/0x280 [kvm]
[ 167.696148] kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
[ 167.696157] ? kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
[ 167.696165] kvm_vcpu_ioctl+0x33a/0x610 [kvm]
[ 167.696166] ? do_futex+0x129/0x590
[ 167.696171] ? __switch_to+0x34c/0x4e0
[ 167.696174] ? __switch_to_asm+0x35/0x70
[ 167.696176] do_vfs_ioctl+0xa4/0x600
[ 167.696177] SyS_ioctl+0x79/0x90
[ 167.696180] ? exit_to_usermode_loop+0xa5/0xd0
[ 167.696181] do_syscall_64+0x73/0x130
[ 167.696182] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 167.696184] RIP: 0033:0x7f6a80482007
[ 167.696184] RSP: 002b:00007f6a7798b8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 167.696185] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f6a80482007
[ 167.696185] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000016
[ 167.696186] RBP: 000055fe135f3240 R08: 000055fe118be530 R09: 0000000000000001
[ 167.696186] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 167.696187] R13: 00007f6a85852000 R14: 0000000000000000 R15: 000055fe135f3240
[ 167.696188] Code: 4d 63 e6 e9 f2 fc ff ff 4c 89 45 d0 48 8b 47 10 e8 22 f0 9e 00 4c 8b 45 d0 e9 89 fc ff ff 4c 89 e7 e8 81 3f fd ff e9 aa fc ff ff <0f> 0b 49 c7 c4 f4 ff ff ff e9 c1 fc ff ff 0f 1f 40 00 66 2e 0f
[ 167.696200] ---[ end trace 7573f6868ea8f069 ]---

[Fix]

This was fixed in 5.6-rc1 with the following commit:

commit 7df003c85218b5f5b10a7f6418208f31e813f38f
Author: Zhuang Yanying <email address hidden>
Date: Sat Oct 12 11:37:31 2019 +0800
Subject: KVM: fix overflow of zero page refcount with ksm running
Link: https://github.com/torvalds/linux/commit/7df003c85218b5f5b10a7f6418208f31e813f38f

The fix adds a check to see if the Page Frame Number (pfn) refers to the zero page, and if it does, KVM no longer treats the pfn as reserved. This has the effect that put_page() is now called on the zero_page when KVM releases the pfn, which balances the reference taken in try_async_pf(), so the reference counter no longer grows.

This is a clean cherry pick to Bionic and Focal kernels.

[Testcase]

Create a new KVM host, and make sure it has plenty of RAM. 16 GB should be okay.

Install KVM packages:

$ sudo apt install -y qemu-kvm libvirt-bin qemu-utils genisoimage virtinst

Enable Kernel Samepage Merging, and use_zero_pages:

$ echo 10000 | sudo tee /sys/kernel/mm/ksm/pages_to_scan
$ echo 1 | sudo tee /sys/kernel/mm/ksm/run
$ echo 1 | sudo tee /sys/kernel/mm/ksm/use_zero_pages

I wrote a script which creates and destroys xenial KVM VMs in an infinite loop:
https://paste.ubuntu.com/p/CvRTsDkdC7/

Save the script to disk, and execute it:

$ chmod +x ksm_refcnt_overflow.sh
$ ./ksm_refcnt_overflow.sh

Each time a VM is created and destroyed the reference counter will increase.

I wrote a kernel module which exposes a /proc interface, which we can use to look at the value of the zero_page reference counter. It works by taking the memory allocated for the zero page, empty_zero_page, which is declared in arch/x86/include/asm/pgtable.h, and running virt_to_page() on it to get the struct page, from which we can read _refcount.

https://paste.ubuntu.com/p/MJMN8jMVds/

Save the module to disk, create its Makefile from the included documentation, and build it:

$ make
$ sudo insmod zero_page_refcount.ko

From there, we can examine the reference counter with:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x687 or 1671
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x846 or 2118
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x9f8 or 2552
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0xcb2 or 3250
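
To turn samples like the ones above into a rough time-to-overflow figure, a small sketch such as the following could be used. The helper names refcount_value and seconds_to_overflow are made up for illustration, the line format is the one printed by the module in this report, and the sampling interval is something you have to supply yourself, since the transcript carries no timestamps.

```shell
#!/bin/sh
# Extract the decimal value from a line such as
# "Zero Page Refcount: 0x687 or 1671" (last whitespace-separated field).
refcount_value() {
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# Estimate the seconds left until a signed 32-bit counter passes 2147483647,
# given two samples taken interval_s seconds apart. Assumes linear growth
# and a shell with 64-bit arithmetic.
seconds_to_overflow() (
    first=$1
    second=$2
    interval_s=$3
    delta=$((second - first))
    if [ "$delta" -le 0 ]; then
        echo "never"
    else
        echo $(( (2147483647 - second) * interval_s / delta ))
    fi
)

# Parse one of the lines shown above.
refcount_value "Zero Page Refcount: 0x687 or 1671"
```

Calling seconds_to_overflow with two parsed values and the number of seconds between the two reads gives an order-of-magnitude estimate only, because the growth rate depends on how often VMs are created.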

We see it steadily increase. Instead of waiting months for it to overflow, I implemented a /proc entry to set it to near overflow. You can use it with:

$ cat /proc/zero_page_refcount_set
Zero Page Refcount set to 0x1FFFFFFFFF000

After that, wait a few seconds and the reference counter will overflow:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff16 or 2147483414
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x80000000 or -2147483648
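
The jump from a large positive value to -2147483648 is ordinary two's complement wraparound of a 32-bit signed counter. As a minimal sketch (to_int32 is a hypothetical helper, and a shell with 64-bit arithmetic is assumed), the reinterpretation can be reproduced like this:

```shell
#!/bin/sh
# Interpret a number as a signed 32-bit integer (two's complement), which is
# how an int-sized reference counter reads once it passes 0x7fffffff.
to_int32() (
    v=$(( $1 & 0xffffffff ))
    # 0x80000000 and above have the sign bit set, so they are negative.
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
)

# One increment past the largest positive value.
to_int32 $(( 0x7fffffff + 1 ))
```

Once the value is negative, the page_ref_count(page) <= 0 check quoted in the impact section is true on every call.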

All VMs will become paused:

$ virsh list
Id Name State
----------------------------------------------------
1 instance-0 paused
2 instance-1 paused

QEMU will error out, and the kernel will oops with the messages in the impact section.

I built a test kernel, which is available here:

https://launchpad.net/~mruffell/+archive/ubuntu/sf290373-test

If you install the test kernel and try to reproduce, you will notice the reference counter is never incremented past 1:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

This resolves the problem.

[Regression Potential]

While the change itself seems simple, it changes how the kernel treats the zero_page. The zero_page is important, since it is just a page full of zeros. Each time memory is allocated which is all zeros, the kernel sets it to use the zero_page to save memory. When an application writes to the buffer, an EPT violation happens, and the kernel does a COW to new pages to hold the data.

The change is limited to how the KVM subsystem handles the zero_page. This will not break the entire kernel if a regression occurs, only KVM.

If a regression were to occur, users could turn off KSM and disable KSM use_zero_pages until a fix is ready, as this particular use of zero_pages is limited to KSM.
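
The mitigation mentioned above could be sketched as follows. The function name disable_ksm_zero_pages and its optional directory argument are illustrative; the sysfs files are the same ones used in the test case. It must run as root, and it does not reset a reference counter that has already grown.

```shell
#!/bin/sh
# Stop KSM from mapping pages to the zero page, then stop the KSM scanner.
# disable_ksm_zero_pages is a hypothetical helper; the directory argument
# exists only so the function can be tried against a scratch directory.
disable_ksm_zero_pages() (
    dir="${1:-/sys/kernel/mm/ksm}"
    echo 0 > "$dir/use_zero_pages"
    # Writing 0 to run stops the scanner but keeps already merged pages.
    echo 0 > "$dir/run"
)

# Usage (as root): disable_ksm_zero_pages
```

Writing 2 instead of 0 to the run file would additionally unmerge all currently merged pages, at the cost of extra memory use.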

The fix landed in upstream 5.6, and has not been backported to stable kernels.

I have read a bit of the paging code, especially around where the zero_page is used, and where its reference counters were being incorrectly incremented.

I think the fix is correct, and I believe it won't cause any regressions.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Pooja Ghumre (pooja-9) wrote :

kvm14:/usr/share/doc/qemu-system-x86$ zless changelog.Debian.gz | head
qemu (1:2.11+dfsg-1ubuntu7.10~cloud0) xenial-queens; urgency=medium

  * New update for the Ubuntu Cloud Archive.

 -- Openstack Ubuntu Testing Bot <email address hidden> Tue, 26 Feb 2019 04:25:15 +0000

tags: added: xenial
Kaustubh Phatak (kphatak-pf9) wrote :

Kernel version on the environment

```Linux kvm14.snn1.pf9.io 4.15.0-46-generic #49~16.04.1-Ubuntu SMP Tue Feb 12 17:45:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux```

Matthew Ruffell (mruffell) wrote :

Hello @pooja-9, @kphatak-pf9 and @vlee,

You wouldn't happen to have Kernel Samepage Merging (KSM) enabled on your compute nodes would you?

You can check by looking at the value of:

$ cat /sys/kernel/mm/ksm/run

If it is 1, your nodes have it enabled, and if it is 0 or "missing", you don't have it on.

We have just hit the problem, and I think I have found a fix for it. I will fix the 4.15 kernel once I have analysed the problem a bit more.

Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in linux (Ubuntu Focal):
assignee: nobody → Matthew Ruffell (mruffell)
summary: - qemu instance gets paused with error: kvm run failed Bad address
+ KVM: Fix zero_page reference counter overflow when using KSM on KVM
+ compute host
description: updated
tags: added: bionic focal sts
removed: xenial
Pooja Ghumre (pooja-9) wrote :

Thanks for fixing it @mruffell!

Yes, we did have KSM enabled on the hypervisor where we hit this issue.

Matthew Ruffell (mruffell) wrote :

Attached is a script to create and destroy VMs in a loop, to try and increment the zero_page reference counter.

Matthew Ruffell (mruffell) wrote :

Attached is a kernel module which lets you see the contents of the zero_page reference counter, and to set it to near overflow.

Ian May (ian-may)
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ian May (ian-may)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Matthew Ruffell (mruffell) wrote :

Verification steps for Bionic:

First, I made sure I could reproduce the problem on 4.15.0-115-generic.

I made a fresh Bionic VM, and copied over the ksm_refcnt_overflow.sh and zero_page_refcount.c files.

I built the kernel module, and inserted it into the kernel.

From there, I checked the zero_page reference counter.

$ sudo insmod zero_page_refcount.ko
[sudo] password for ubuntu:
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

From there, in another terminal, I ran the script ksm_refcnt_overflow.sh, and
checked to see VMs were running:

$ virsh list
 Id Name State
----------------------------------------------------
 1 instance-0 running
 2 instance-1 running
 3 instance-2 running
 4 instance-3 running
 5 instance-4 running

From there, we can see the reference counter increment:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1158 or 4440
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1622 or 5666
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x163a or 5690

I issued the set command, to get it ready to overflow:

$ cat /proc/zero_page_refcount_set
Zero Page Refcount set to 0x1FFFFFFFFF000

I then checked and saw it overflow:

ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff27 or 2147483431
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff92 or 2147483538
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x80000000 or -2147483648

Instances became paused, and virtualisation broke:

$ virsh list
 Id Name State
----------------------------------------------------
 5 instance-4 paused
 6 instance-5 paused
 7 instance-6 paused
 8 instance-7 paused
 9 instance-0 paused
 10 instance-1 paused
 11 instance-2 paused
 12 instance-3 paused

From there, we see the usual call trace in dmesg:

https://paste.ubuntu.com/p/wpJkGCH3fJ/

I rebooted, and enabled -proposed. I then installed the 4.15.0-116-generic kernel, and rebooted again.

I rebuilt the zero_page_refcount kernel module with the new headers, and inserted it into the running kernel.

$ uname -rv
4.15.0-116-generic #117-Ubuntu SMP Fri Aug 28 16:04:22 UTC 2020
$ sudo insmod zero_page_refcount.ko
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

From there, I started the script ksm_refcnt_overflow.sh in another terminal.

We can see that VMs are running:

$ virsh list
 Id Name State
----------------------------------------------------
 1 instance-1 running
 2 instance-2 running
 3 instance-3 running
 4 instance-4 running

Checking the value of the zero_page reference counter:

$ cat /proc/zero_pa...


tags: added: verification-done-bionic
removed: verification-needed-bionic
Matthew Ruffell (mruffell) wrote :

Verification steps for focal:

Again, I made sure I can reproduce on the existing 5.4.0-42-generic kernel.

I copied ksm_refcnt_overflow.sh and zero_page_refcount.c to the VM, and built the kernel module, and inserted it into the kernel:

$ sudo insmod zero_page_refcount.ko
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

From there, I started running ksm_refcnt_overflow.sh in another terminal. I checked to ensure VMs were running:

$ virsh list
 Id Name State
----------------------------
 1 instance-0 running
 2 instance-1 running
 3 instance-2 running

From there, we can see the reference counter increment:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1bd9 or 7129
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1f9e or 8094
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1fb0 or 8112

From there, I set the reference counter in an attempt to make it overflow:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff15 or 2147483413
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x80000000 or -2147483648

From there, all VMs became paused:

$ virsh list
 Id Name State
----------------------------
 137 instance-0 paused
 138 instance-1 paused
 139 instance-2 paused

We see the following oops in dmesg:

https://paste.ubuntu.com/p/3Dc73k9VYy/

I then rebooted the machine, enabled -proposed and installed 5.4.0-46-generic.

$ uname -rv
5.4.0-46-generic #50-Ubuntu SMP Fri Aug 28 15:33:36 UTC 2020

I rebooted, and built a new kernel module with the new headers, and inserted it into the running kernel:

$ sudo insmod zero_page_refcount.ko
[sudo] password for ubuntu:
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

Again, I started the ksm_refcnt_overflow.sh script in another terminal,
and checked to see that VMs were being created:

$ virsh list
 Id Name State
----------------------------
 1 instance-0 running
 2 instance-1 running

When we check the value of the reference counter, it is still 1 and not incrementing:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

When I attempt to trigger overflow:

$ cat /proc/zero_page_refcount_set
Zero Page Refcount set to 0x1FFFFFFFFF000

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392

We never overflow. The problem is fixed. Marking the bug as verified for focal.

tags: added: verification-done-focal
removed: verification-needed-focal
Matthew Ruffell (mruffell) wrote :

As requested by the kernel team (in https://lists.ubuntu.com/archives/kernel-team/2020-August/112775.html), I will do some additional testing for this SRU to really make sure it won't cause any regressions.

I provisioned a lab machine on segmaas, running Bionic. I installed the 4.15.0-116-generic kernel from -proposed on it.

I built the zero_page_refcount.c kernel module, and inserted it into the running kernel.

I then got ksm_refcnt_overflow.sh running in a screen session, creating and destroying virtual machines in an infinite loop.

This way we will know the code path has been exercised a fair amount.

I will leave this running creating and destroying virtual machines for a week or so, and I will report back with the results.

Matthew Ruffell (mruffell) wrote :

As promised, I have an update on the lab machine I left running ksm_refcnt_overflow.sh for a week straight.

The machine was running 4.15.0-116-generic from -proposed:

$ uname -rv
4.15.0-116-generic #117-Ubuntu SMP Fri Aug 28 16:04:22 UTC 2020
$ uptime
 04:36:14 up 7 days, 1 min, 1 user, load average: 3.47, 3.14, 2.97

In that time it has created and destroyed 32,950 virtual machines:

$ virsh list
 Id Name State
----------------------------------------------------
 32945 instance-0 running
 32946 instance-1 running
 32947 instance-2 running
 32948 instance-3 running
 32949 instance-4 running

If we look at the current value of the reference counter, it is still set to 1:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

I checked /var/log/kern.log, /var/log/syslog and journalctl. There are no oops messages, and the KVM subsystem is stable.
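
A log check of this kind could be scripted roughly as below. The helper name count_zero_page_warnings is made up for illustration, and the pattern is simply the WARNING line from the oops in the bug description; it would also match a try_get_page() warning in follow_page_pte for a page other than the zero page.

```shell
#!/bin/sh
# Count lines in a log file that carry the warning signature from this report.
# Prints 0 when nothing matches and prints nothing if the file cannot be read.
count_zero_page_warnings() {
    grep -c 'WARNING: .*follow_page_pte' "$1" 2>/dev/null || true
}

count_zero_page_warnings /var/log/kern.log
```

A count of 0 after a long soak test is consistent with the fix working, though it is not proof on its own.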

I am shutting the lab machine down now, as I am convinced the patch is stable. This SRU is still verified.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-118.119

---------------
linux (4.15.0-118.119) bionic; urgency=medium

  * bionic/linux: 4.15.0-118.119 -proposed tracker (LP: #1894697)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for nvidia 450 and 450-server

  * cgroup refcount is bogus when cgroup_sk_alloc is disabled (LP: #1886860)
    - cgroup: add missing skcd->no_refcnt check in cgroup_sk_clone()

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * [Hyper-V] VSS and File Copy daemons intermittently fails to start
    (LP: #1891224)
    - [Packaging] Bind hv_vss_daemon startup to hv_vss device
    - [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * KVM: Fix zero_page reference counter overflow when using KSM on KVM compute
    host (LP: #1837810)
    - KVM: fix overflow of zero page refcount with ksm running

  * Fix false-negative return value for rtnetlink.sh in kselftests/net
    (LP: #1890136)
    - selftests: rtnetlink: correct the final return value for the test
    - selftests: rtnetlink: make kci_test_encap() return sub-test result

  * Bionic update: upstream stable patchset 2020-08-18 (LP: #1892091)
    - USB: serial: qcserial: add EM7305 QDL product ID
    - USB: iowarrior: fix up report size handling for some devices
    - usb: xhci: define IDs for various ASMedia host controllers
    - usb: xhci: Fix ASMedia ASM1142 DMA addressing
    - Revert "ALSA: hda: call runtime_allow() for all hda controllers"
    - ALSA: seq: oss: Serialize ioctls
    - staging: android: ashmem: Fix lockdep warning for write operation
    - Bluetooth: Fix slab-out-of-bounds read in hci_extended_inquiry_result_evt()
    - Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_evt()
    - Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_with_rssi_evt()
    - omapfb: dss: Fix max fclk divider for omap36xx
    - binder: Prevent context manager from incrementing ref 0
    - vgacon: Fix for missing check in scrollback handling
    - mtd: properly check all write ioctls for permissions
    - leds: wm831x-status: fix use-after-free on unbind
    - leds: da903x: fix use-after-free on unbind
    - leds: lm3533: fix use-after-free on unbind
    - leds: 88pm860x: fix use-after-free on unbind
    - net/9p: validate fds in p9_fd_open
    - drm/nouveau/fbcon: fix module unload when fbcon init has failed for some
      reason
    - drm/nouveau/fbcon: zero-initialise the mode_cmd2 structure
    - i2c: slave: improve sanity check when registering
    - i2c: slave: add sanity check when unregistering
    - usb: hso: check for return value in hso_serial_common_create()
    - firmware: Fix a reference count leak.
    - cfg80211: check vendor command doit pointer before use
    - igb: reinit_locked() should be called with rtnl_lock
    - atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
    - tools lib traceevent: Fix memory leak in process_dynamic...


Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-48.52

---------------
linux (5.4.0-48.52) focal; urgency=medium

  * focal/linux: 5.4.0-48.52 -proposed tracker (LP: #1894654)

  * mm/slub kernel oops on focal kernel 5.4.0-45 (LP: #1895109)
    - SAUCE: Revert "mm/slub: fix a memory leak in sysfs_slab_add()"

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for nvidia 450 and 450-server

  * [UBUNTU 20.04] zPCI attach/detach issues with PF/VF linking support
    (LP: #1892849)
    - s390/pci: fix zpci_bus_link_virtfn()
    - s390/pci: re-introduce zpci_remove_device()
    - s390/pci: fix PF/VF linking on hot plug

  * [UBUNTU 20.04] kernel: s390/cpum_cf,perf: changeDFLT_CCERROR counter name
    (LP: #1891454)
    - s390/cpum_cf, perf: change DFLT_CCERROR counter name

  * [UBUNTU 20.04] zPCI: Enabling of a reserved PCI function regression
    introduced by multi-function support (LP: #1891437)
    - s390/pci: fix enabling a reserved PCI function

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * [Hyper-V] VSS and File Copy daemons intermittently fails to start
    (LP: #1891224)
    - [Packaging] Bind hv_vss_daemon startup to hv_vss device
    - [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * alsa/hdmi: support nvidia mst hdmi/dp audio (LP: #1867704)
    - ALSA: hda - Rename snd_hda_pin_sense to snd_hda_jack_pin_sense
    - ALSA: hda - Add DP-MST jack support
    - ALSA: hda - Add DP-MST support for non-acomp codecs
    - ALSA: hda - Add DP-MST support for NVIDIA codecs
    - ALSA: hda: hdmi - fix regression in connect list handling
    - ALSA: hda: hdmi - fix kernel oops caused by invalid PCM idx
    - ALSA: hda: hdmi - preserve non-MST PCM routing for Intel platforms
    - ALSA: hda: hdmi - Keep old slot assignment behavior for Intel platforms
    - ALSA: hda - Fix DP-MST support for NVIDIA codecs

  * Focal update: v5.4.60 upstream stable release (LP: #1892899)
    - smb3: warn on confusing error scenario with sec=krb5
    - genirq/affinity: Make affinity setting if activated opt-in
    - genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()
    - PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()
    - PCI: Add device even if driver attach failed
    - PCI: qcom: Define some PARF params needed for ipq8064 SoC
    - PCI: qcom: Add support for tx term offset for rev 2.1.0
    - btrfs: allow use of global block reserve for balance item deletion
    - btrfs: free anon block device right after subvolume deletion
    - btrfs: don't allocate anonymous block device for user invisible roots
    - btrfs: ref-verify: fix memory leak in add_block_entry
    - btrfs: stop incremening log_batch for the log root tree when syncing log
    - btrfs: remove no longer needed use of log_writers for the log root tree
    - btrfs: don't traverse into the seed devices in show_devname
    - btrfs: open device...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
norman shen (jshen28) wrote :

Interestingly, I hit this warning without KSM enabled:

```console
# cat /sys/kernel/mm/ksm/run
0
# uname -a
Linux compute12 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
```

The log is:

[Sat May 15 11:28:32 2021] WARNING: CPU: 31 PID: 3196546 at /build/linux-E6MDAa/linux-4.15.0/include/linux/mm.h:857 follow_page_pte+0x663/0x6d0
[Sat May 15 11:28:32 2021] Modules linked in: nls_iso8859_1 act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb ip6table_raw xt_CT xt_mac vhost_net vhost tap ebtable_filter ebtables ip6table_filter devlink vxlan ip6_udp_tunnel udp_tunnel ip_gre gre xt_multiport xt_set iptable_raw iptable_mangle ip_set_hash_net ip_set_hash_ip ip_set ipip tunnel4 ip_tunnel veth xt_statistic xt_physdev xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_addrtype ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6table_nat ip6_tables xt_comment xt_mark iptable_filter xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat aufs rbd libceph overlay openvswitch nsh nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat bonding dm_service_time dm_multipath
[Sat May 15 11:28:32 2021] scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl skx_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass intel_cstate intel_rapl_perf ipmi_ssif ioatdma joydev input_leds acpi_power_meter mei_me mei shpchp mac_hid ipmi_si ipmi_devintf ipmi_msghandler lpc_ich sch_fq_codel nf_conntrack ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi br_netfilter bridge stp llc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas hid_generic crct10dif_pclmul crc32_pclmul usbhid ghash_clmulni_intel hid pcbc lpfc aesni_intel aes_x86_64 nvmet_fc crypto_simd ast glue_helper nvmet cryptd nvme_fc ttm nvme_fabrics
[Sat May 15 11:28:32 2021] igb nvme_core drm_kms_helper dca scsi_transport_fc syscopyarea i2c_algo_bit sysfillrect sysimgblt i40e aacraid fb_sys_fops drm ptp pps_core ahci libahci wmi
[Sat May 15 11:28:32 2021] CPU: 31 PID: 3196546 Comm: CPU 2/KVM Not tainted 4.15.0-72-generic #81-Ubuntu
[Sat May 15 11:28:32 2021] Hardware name: Inspur NF5280M5/YZMB-00882-104, BIOS 4.0.8 10/17/2018
[Sat May 15 11:28:32 2021] RIP: 0010:follow_page_pte+0x663/0x6d0
[Sat May 15 11:28:32 2021] RSP: 0018:ffffb1eff4e5b8f8 EFLAGS: 00010286
[Sat May 15 11:28:32 2021] RAX: ffffe041b58cba40 RBX: ffffe043fed90cf0 RCX: 0000000080000000
[Sat May 15 11:28:32 2021] RDX: ffffe041b58cba40 RSI: 00007f7306766000 RDI: 8000000d632e9225
[Sat May 15 11:28:32 2021] RBP: ffffb1eff4e5b960 R08: 8000000d632e9225 R09: ffffa0249cceb1e0
[Sat May 15 11:28:32 2021] R10: 0000000000000000 R11: ffffb1eff4e5ba8c R12: ffffe041b58cba40
[Sat May 15 11:28:32 2021] R13: 00003ffffffff000 R14: 0000000000000326 R15: ffffa076af75a198
[Sat May 15 11:28:32 2021]...


Matthew Ruffell (mruffell) wrote :

Hi Jiatong,

Thanks for emailing me, happy to answer questions anytime.

> 1. why linux-hwe-4.15.0 source code is used?

If you look closely at the oops in the description, the customer I was working with was running:

4.15.0-106-generic #107~16.04.1-Ubuntu

This is the Xenial (16.04) HWE kernel. I was using the linux-hwe-4.15.0 source code so that the source matched the debug symbol package exactly.

In your case:

4.15.0-72-generic #81-Ubuntu

you are running the 4.15 kernel on normal Bionic (18.04), so we can use the normal linux-4.15.0 source code.

> 2. we are using linux-4.15.0-unsigned and by skimming through the source code, looks like try_get_page is not defined at that time?

Yes! You are correct, the original mainline 4.15 kernel did not have try_get_page() defined at:

https://elixir.bootlin.com/linux/v4.15/source/mm/gup.c#L156

But if you look closely at the actual kernel sources for 4.15.0-72-generic:

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/mm/gup.c?h=Ubuntu-4.15.0-72.81#n156

We see that try_get_page() is there. That is because we backported:

commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64
Author: Linus Torvalds <email address hidden>
Date: Thu Apr 11 10:49:19 2019 -0700
Subject: mm: prevent get_user_pages() from overflowing page refcount
Link: https://github.com/torvalds/linux/commit/8fde12ca79aff9b5ba951fce1a2641901b8d8e64

Ubuntu 4.15 backport link: https://paste.ubuntu.com/p/2bF5WWQy2r/

That commit first turned up in 4.15.0-59-generic, via upstream-stable.

Anyway, let's have a look at your stack trace:

4.15.0-72-generic #81-Ubuntu
RIP: 0010:follow_page_pte+0x663/0x6d0

I downloaded the debug symbols:

http://ddebs.ubuntu.com/ubuntu/pool/main/l/linux/linux-image-unsigned-4.15.0-72-generic-dbgsym_4.15.0-72.81_amd64.ddeb

Extracted them:

dpkg -x linux-image-unsigned-4.15.0-72-generic-dbgsym_4.15.0-72.81_amd64.ddeb debug

and looked up:

$ eu-addr2line -e ./vmlinux-4.15.0-72-generic -f follow_page_pte+0x663
try_get_page inlined at /build/linux-E6MDAa/linux-4.15.0/mm/gup.c:156 in follow_page_pte
/build/linux-E6MDAa/linux-4.15.0/mm/gup.c:138

We see that you hit try_get_page() in mm/gup.c:156

 155 if (flags & FOLL_GET) {
 156 if (unlikely(!try_get_page(page))) {
 157 page = ERR_PTR(-ENOMEM);
 158 goto out;
 159 }

Looking at try_get_page() in include/linux/mm.h:

 854 static inline __must_check bool try_get_page(struct page *page)
 855 {
 856 page = compound_head(page);
 857 if (WARN_ON_ONCE(page_ref_count(page) <= 0))
 858 return false;
 859 page_ref_inc(page);
 860 return true;
 861 }

We see that you hit the exact same WARN_ON_ONCE for page_ref_count(page) <= 0.

So, whatever page you are trying to access has its reference counter in the negatives, which suggests that it has either wrapped around, or has been decremented too many times.

Looking at your error log, I can't tell for sure if it is the zero_page, but it's quite likely to be. The zero_page is a frequently used page in the system, and it is used outside of ksm, it's just that ksm is a heavy user of the zero_page...


norman shen (jshen28) wrote :

Thank you very much for the reply. Another question: the try_get_page() failure path returns -ENOMEM, but the KVM error is "Bad address", which should be EFAULT. Why does the QEMU error log say "Bad address"?
