raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

Bug #1896578 reported by Matthew Ruffell
This bug affects 6 people
Affects          Status        Importance  Assigned to      Milestone
linux (Ubuntu)   Fix Released  Medium      Matthew Ruffell
Bionic           Fix Released  Medium      Matthew Ruffell
Focal            Fix Released  Medium      Matthew Ruffell
Groovy           Fix Released  Medium      Matthew Ruffell
Hirsute          Fix Released  Medium      Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/1896578

[Impact]

Block discard is very slow on Raid10, which causes common use cases that invoke block discard, such as mkfs and fstrim operations, to take a very long time.

For example, on an i3.8xlarge instance on AWS, which has 4x 1.9TB NVMe devices that support block discard, a mkfs.xfs operation on Raid10 takes between 8 and 11 minutes, whereas the same mkfs.xfs operation on Raid0 takes 4 seconds.

The bigger the devices, the longer it takes.

The cause is that Raid10 currently uses a 512k chunk size, and uses this for the discard_max_bytes value. If we need to discard 1.9TB, the kernel splits the request into millions of 512k bio requests, even if the underlying device supports larger requests.
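
As a sanity check on the scale of the problem, here is the arithmetic with the sizes hard-coded from the example above (illustrative only, not a measurement):

```shell
# Splitting a ~1.9TB discard at a 512KiB discard_max_bytes granularity.
total_bytes=1900000000000        # ~1.9TB to be discarded
chunk_bytes=$((512 * 1024))      # 512KiB Raid10 chunk size / discard_max_bytes
echo "$((total_bytes / chunk_bytes)) bio requests"   # roughly 3.6 million bios
```

Each of those millions of bios has to pass through the raid10 write path individually, which is where the time goes.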

For example, the NVMe devices on an i3.8xlarge support discards of up to 2.2TB at once:

$ cat /sys/block/nvme0n1/queue/discard_max_bytes
2199023255040
$ cat /sys/block/nvme0n1/queue/discard_max_hw_bytes
2199023255040

The Raid10 md device, by contrast, only supports 512k:

$ cat /sys/block/md0/queue/discard_max_bytes
524288
$ cat /sys/block/md0/queue/discard_max_hw_bytes
524288
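
Dividing the two limits (values hard-coded from the sysfs output above, so the snippet is self-contained) shows just how large the gap is:

```shell
nvme_limit=2199023255040   # discard_max_bytes of one NVMe device
md_limit=524288            # discard_max_bytes of the md0 array
echo $((nvme_limit / md_limit))   # the device limit is ~4.2 million times larger
```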

If we perform a mkfs.xfs operation on the /dev/md0 array, it takes over 11 minutes, and if we examine the stack of the mkfs process, it is stuck in blkdev_issue_discard():

$ sudo cat /proc/1626/stack
[<0>] wait_barrier+0x14c/0x230 [raid10]
[<0>] regular_request_wait+0x39/0x150 [raid10]
[<0>] raid10_write_request+0x11e/0x850 [raid10]
[<0>] raid10_make_request+0xd7/0x150 [raid10]
[<0>] md_handle_request+0x123/0x1a0
[<0>] md_submit_bio+0xda/0x120
[<0>] __submit_bio_noacct+0xde/0x320
[<0>] submit_bio_noacct+0x4d/0x90
[<0>] submit_bio+0x4f/0x1b0
[<0>] __blkdev_issue_discard+0x154/0x290
[<0>] blkdev_issue_discard+0x5d/0xc0
[<0>] blk_ioctl_discard+0xc4/0x110
[<0>] blkdev_common_ioctl+0x56c/0x840
[<0>] blkdev_ioctl+0xeb/0x270
[<0>] block_ioctl+0x3d/0x50
[<0>] __x64_sys_ioctl+0x91/0xc0
[<0>] do_syscall_64+0x38/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[Fix]

Xiao Ni has developed a patchset which resolves the block discard performance problems. These commits have now landed in 5.13-rc1.

commit cf78408f937a67f59f5e90ee8e6cadeed7c128a8
Author: Xiao Ni <email address hidden>
Date: Thu Feb 4 15:50:43 2021 +0800
Subject: md: add md_submit_discard_bio() for submitting discard bio
Link: https://github.com/torvalds/linux/commit/cf78408f937a67f59f5e90ee8e6cadeed7c128a8

commit c2968285925adb97b9aa4ede94c1f1ab61ce0925
Author: Xiao Ni <email address hidden>
Date: Thu Feb 4 15:50:44 2021 +0800
Subject: md/raid10: extend r10bio devs to raid disks
Link: https://github.com/torvalds/linux/commit/c2968285925adb97b9aa4ede94c1f1ab61ce0925

commit f2e7e269a7525317752d472bb48a549780e87d22
Author: Xiao Ni <email address hidden>
Date: Thu Feb 4 15:50:45 2021 +0800
Subject: md/raid10: pull the code that wait for blocked dev into one function
Link: https://github.com/torvalds/linux/commit/f2e7e269a7525317752d472bb48a549780e87d22

commit d30588b2731fb01e1616cf16c3fe79a1443e29aa
Author: Xiao Ni <email address hidden>
Date: Thu Feb 4 15:50:46 2021 +0800
Subject: md/raid10: improve raid10 discard request
Link: https://github.com/torvalds/linux/commit/d30588b2731fb01e1616cf16c3fe79a1443e29aa

commit 254c271da0712ea8914f187588e0f81f7678ee2f
Author: Xiao Ni <email address hidden>
Date: Thu Feb 4 15:50:47 2021 +0800
Subject: md/raid10: improve discard request for far layout
Link: https://github.com/torvalds/linux/commit/254c271da0712ea8914f187588e0f81f7678ee2f

There is also one additional commit which is required, merged after "md/raid10: improve raid10 discard request". It enables Raid10 to use large discards, instead of splitting them into many bios, since the technical hurdles have now been removed.

commit ca4a4e9a55beeb138bb06e3867f5e486da896d44
Author: Mike Snitzer <email address hidden>
Date: Fri Apr 30 14:38:37 2021 -0400
Subject: dm raid: remove unnecessary discard limits for raid0 and raid10
Link: https://github.com/torvalds/linux/commit/ca4a4e9a55beeb138bb06e3867f5e486da896d44

The commits cherry-pick more or less cleanly onto the 5.11, 5.8, 5.4 and 4.15 kernels, with the following minor backport adjustments:

1) submit_bio_noacct() needed to be renamed to generic_make_request() since it was recently changed in:

commit ed00aabd5eb9fb44d6aff1173234a2e911b9fead
Author: Christoph Hellwig <email address hidden>
Date: Wed Jul 1 10:59:44 2020 +0200
Subject: block: rename generic_make_request to submit_bio_noacct
Link: https://github.com/torvalds/linux/commit/ed00aabd5eb9fb44d6aff1173234a2e911b9fead
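
A minimal sketch of what backport adjustment 1 amounts to (the source fragment and temp file here are hypothetical; the real backport edits the raid10 discard paths directly):

```shell
# Rename the symbol back to its pre-5.9 name, as the older-kernel backports require.
frag=$(mktemp)
printf 'submit_bio_noacct(split);\n' > "$frag"
sed -i 's/submit_bio_noacct/generic_make_request/' "$frag"
result=$(cat "$frag")
rm -f "$frag"
echo "$result"   # generic_make_request(split);
```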

2) In the 4.15, 5.4 and 5.8 kernels, trace_block_bio_remap() needs to have its request_queue argument put back in place. It was recently removed in:

commit 1c02fca620f7273b597591065d366e2cca948d8f
Author: Christoph Hellwig <email address hidden>
Date: Thu Dec 3 17:21:38 2020 +0100
Subject: block: remove the request_queue argument to the block_bio_remap tracepoint
Link: https://github.com/torvalds/linux/commit/1c02fca620f7273b597591065d366e2cca948d8f

3) For the 4.15 kernel, bio_split(), mempool_alloc() and bio_clone_fast() all needed the "address of" operator '&' removed from one of their arguments, due to changes made in:

commit afeee514ce7f4cab605beedd03be71ebaf0c5fc8
Author: Kent Overstreet <email address hidden>
Date: Sun May 20 18:25:52 2018 -0400
Subject: md: convert to bioset_init()/mempool_init()
Link: https://github.com/torvalds/linux/commit/afeee514ce7f4cab605beedd03be71ebaf0c5fc8

4) The 4.15 kernel does not need "dm raid: remove unnecessary discard limits for raid0 and raid10" due to not having the following commit, which was merged in 5.1-rc1:

commit 61697a6abd24acba941359c6268a94f4afe4a53d
Author: Mike Snitzer <email address hidden>
Date: Fri Jan 18 14:19:26 2019 -0500
Subject: dm: eliminate 'split_discard_bios' flag from DM target interface
Link: https://github.com/torvalds/linux/commit/61697a6abd24acba941359c6268a94f4afe4a53d

5) The 4.15 kernel needed bio_clone_blkg_association() to be renamed to bio_clone_blkcg_association() due to it changing in:

commit db6638d7d177a8bc74c9e539e2e0d7d061c767b1
Author: Dennis Zhou <email address hidden>
Date: Wed Dec 5 12:10:35 2018 -0500
Subject: blkcg: remove bio->bi_css and instead use bio->bi_blkg
Link: https://github.com/torvalds/linux/commit/db6638d7d177a8bc74c9e539e2e0d7d061c767b1

[Testcase]

You will need a machine with at least 4x NVMe drives which support block discard. I use an i3.8xlarge instance on AWS, since it meets these requirements.

$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
nvme0n1 259:2 0 1.7T 0 disk
nvme1n1 259:0 0 1.7T 0 disk
nvme2n1 259:1 0 1.7T 0 disk
nvme3n1 259:3 0 1.7T 0 disk

Create a Raid10 array:

$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

Format the array with XFS:

$ time sudo mkfs.xfs /dev/md0
real 11m14.734s

$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk

Optionally, do an fstrim:

$ time sudo fstrim /mnt/disk

real 11m37.643s

There are test kernels for 5.8, 5.4 and 4.15 available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1896578-test

If you install a test kernel, you can see that performance improves dramatically:

$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

$ time sudo mkfs.xfs /dev/md0
real 0m4.226s
user 0m0.020s
sys 0m0.148s

$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real 0m1.991s
user 0m0.020s
sys 0m0.000s

The patches bring mkfs.xfs from 11 minutes down to 4 seconds, and fstrim
from 11 minutes to 2 seconds.

Performance Matrix (AWS i3.8xlarge):

Kernel | mkfs.xfs | fstrim
---------------------------------
4.15 | 7m23.449s | 7m20.678s
5.4 | 8m23.219s | 8m23.927s
5.8 | 2m54.990s | 8m22.010s
4.15-test | 0m4.286s | 0m1.657s
5.4-test | 0m6.075s | 0m3.150s
5.8-test | 0m2.753s | 0m2.999s
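
For perspective, the rough speedup on the 4.15 kernel works out as follows (times hard-coded from the matrix above, truncated to whole seconds):

```shell
mkfs_before=$((7 * 60 + 23))   # 443s on stock 4.15
mkfs_after=4                   # 4.286s on 4.15-test, truncated
echo "mkfs.xfs speedup: ~$((mkfs_before / mkfs_after))x"   # ~110x
```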

The test kernel also raises discard_max_bytes to the underlying hardware limit:

$ cat /sys/block/md0/queue/discard_max_bytes
2199023255040
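
A quick way to compare these limits across all block devices on a system is to walk sysfs (guarded so the loop is harmless on machines without any such entries):

```shell
# Print discard_max_bytes for every block device that exposes it.
for q in /sys/block/*/queue/discard_max_bytes; do
  [ -r "$q" ] || continue
  printf '%s: %s bytes\n' "${q#/sys/block/}" "$(cat "$q")"
done
```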

[Where problems can occur]

A problem has occurred once before, with the previous revision of this patchset, documented in bug 1907262: it caused a worst-case scenario of data loss for some users, in this particular case on the second and onward disks. This was due to two faults: the first was an incorrectly calculated start offset for block discard on the second and subsequent disks; the second was an incorrect stripe size for far layouts.

The kernel team was forced to make an emergency revert of the patches; the faulty kernel was removed from the archive, and community users were urged to avoid it.

These bugs, and a few other minor issues, have now been corrected, and we have been testing the new patches since mid-February. The patches have been tested against the testcase in bug 1907262 and do not cause the disks to become corrupted.

The regression potential is still the same for this patchset though. If a regression were to occur, it could lead to data loss on Raid10 arrays backed by NVMe or SSD disks that support block discard.

If a regression happens, users need to disable the fstrim systemd service as soon as possible, plan an emergency maintenance window, and downgrade the kernel to a previous release, or upgrade to a corrected kernel.
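
A sketch of that emergency mitigation, assuming a systemd-based install where the periodic discard comes from fstrim.timer (DRY_RUN defaults to 1 here, so the snippet only prints the commands it would run):

```shell
# Print-or-execute wrapper: set DRY_RUN=0 to actually run the commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}
run systemctl disable --now fstrim.timer   # stop the weekly fstrim job immediately
run systemctl mask fstrim.service          # prevent manually triggered runs too
```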

Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Groovy):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux (Ubuntu Groovy):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in linux (Ubuntu Focal):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in linux (Ubuntu Groovy):
assignee: nobody → Matthew Ruffell (mruffell)
tags: added: sts
description: updated
Ian May (ian-may)
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
Ian May (ian-may)
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ian May (ian-may)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
tags: added: verification-needed-focal
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Groovy.

I enabled -proposed and installed 5.8.0-30-generic on an i3.8xlarge AWS instance.

From there, I followed the testcase steps:

$ uname -rv
5.8.0-30-generic #32-Ubuntu SMP Mon Nov 9 21:03:15 UTC 2020
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
nvme0n1 259:0 0 1.7T 0 disk
nvme1n1 259:1 0 1.7T 0 disk
nvme3n1 259:2 0 1.7T 0 disk
nvme2n1 259:3 0 1.7T 0 disk
$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
$ time sudo mkfs.xfs /dev/md0
log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md0 isize=512 agcount=32, agsize=28989568 blks
         = sectsz=512 attr=2, projid32bit=1
         = crc=1 finobt=1, sparse=1, rmapbt=0
         = reflink=1
data = bsize=4096 blocks=927666176, imaxpct=5
         = sunit=128 swidth=256 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=452968, version=2
         = sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Discarding blocks...Done.

real 0m4.413s
user 0m0.022s
sys 0m0.245s
$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real 0m1.973s
user 0m0.000s
sys 0m0.037s

We can see that mkfs.xfs took 4.4 seconds, and fstrim only 2 seconds. This is a significant improvement over the current 11 minutes.

I started up a c5.large instance, and attached 4x EBS drives, which do not support block discard, and went through the testcase steps. Everything worked fine, and the changes have not caused any regressions to disks which do not support block discard.

I also started another i3.8xlarge instance and tested raid0, to check for regressions around the refactoring. raid0 deployed fine, and was as performant as usual.

The 5.8.0-30-generic kernel in -proposed fixes the issue, and I am happy to mark as verified.

tags: added: verification-done-groovy
removed: verification-needed-groovy
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Focal.

I enabled -proposed and installed 5.4.0-55-generic on an i3.8xlarge AWS instance.

From there, I followed the testcase steps:

$ uname -rv
5.4.0-55-generic #61-Ubuntu SMP Mon Nov 9 20:49:56 UTC 2020
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
nvme0n1 259:0 0 1.7T 0 disk
nvme1n1 259:1 0 1.7T 0 disk
nvme3n1 259:2 0 1.7T 0 disk
nvme2n1 259:3 0 1.7T 0 disk
$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
$ time sudo mkfs.xfs /dev/md0
log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md0 isize=512 agcount=32, agsize=28989568 blks
         = sectsz=512 attr=2, projid32bit=1
         = crc=1 finobt=1, sparse=1, rmapbt=0
         = reflink=1
data = bsize=4096 blocks=927666176, imaxpct=5
         = sunit=128 swidth=256 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=452968, version=2
         = sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0

real 0m5.350s
user 0m0.022s
sys 0m0.179s
$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real 0m2.944s
user 0m0.006s
sys 0m0.013s

We can see that mkfs.xfs took 5.3 seconds, and fstrim only 3 seconds. This is a significant improvement over the current 11 minutes.

I started up a c5.large instance, and attached 4x EBS drives, which do not support block discard, and went through the testcase steps. Everything worked fine, and the changes have not caused any regressions to disks which do not support block discard.

I also started another i3.8xlarge instance and tested raid0, to check for regressions around the refactoring. raid0 deployed fine, and was as performant as usual.

The 5.4.0-55-generic kernel in -proposed fixes the issue, and I am happy to mark as verified.

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Bionic.

I enabled -proposed and installed 4.15.0-125-generic on an i3.8xlarge AWS instance.

From there, I followed the testcase steps:

$ uname -rv
4.15.0-125-generic #128-Ubuntu SMP Mon Nov 9 20:51:00 UTC 2020
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
nvme0n1 259:0 0 1.7T 0 disk
nvme1n1 259:1 0 1.7T 0 disk
nvme2n1 259:2 0 1.7T 0 disk
nvme3n1 259:3 0 1.7T 0 disk
$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 1855336448K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
$ time sudo mkfs.xfs /dev/md0
meta-data=/dev/md0 isize=512 agcount=32, agsize=28989568 blks
         = sectsz=512 attr=2, projid32bit=1
         = crc=1 finobt=1, sparse=0, rmapbt=0, reflink=0
data = bsize=4096 blocks=927666176, imaxpct=5
         = sunit=128 swidth=256 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=452968, version=2
         = sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0

real 0m3.615s
user 0m0.002s
sys 0m0.179s
$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ time sudo fstrim /mnt/disk

real 0m1.898s
user 0m0.002s
sys 0m0.015s

We can see that mkfs.xfs took 3.6 seconds, and fstrim only 2 seconds. This is a significant improvement over the current 11 minutes.

I started up a c5.large instance, and attached 4x EBS drives, which do not support block discard, and went through the testcase steps. Everything worked fine, and the changes have not caused any regressions to disks which do not support block discard.

I also started another i3.8xlarge instance and tested raid0, to check for regressions around the refactoring. raid0 deployed fine, and was as performant as usual.

The 4.15.0-125-generic kernel in -proposed fixes the issue, and I am happy to mark as verified.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-56.62

---------------
linux (5.4.0-56.62) focal; urgency=medium

  * focal/linux: 5.4.0-56.62 -proposed tracker (LP: #1905300)

  * CVE-2020-4788
    - selftests/powerpc: rfi_flush: disable entry flush if present
    - powerpc/64s: flush L1D on kernel entry
    - powerpc/64s: flush L1D after user accesses
    - selftests/powerpc: entry flush test

linux (5.4.0-55.61) focal; urgency=medium

  * focal/linux: 5.4.0-55.61 -proposed tracker (LP: #1903175)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #1899993)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * CVE-2020-14351
    - perf/core: Fix race in the perf_mmap_close() function

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: fix discard limits for raid1 and raid10
    - dm raid: remove unnecessary discard limits for raid10

  * Bionic: btrfs: kernel BUG at /build/linux-
    eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: drop unnecessary offset_in_page in extent buffer helpers
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks

  * Ethernet no link lights after reboot (Intel i225-v 2.5G) (LP: #1902578)
    - igc: Add PHY power management control

  * Undetected Data corruption in MPI workloads that use VSX for reductions on
    POWER9 DD2.1 systems (LP: #1902694)
    - powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
    - selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load
      workaround

  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
    - s390: nvme ipl
    - s390: nvme reipl
    - s390/ipl: support NVMe IPL kernel parameters

  * uvcvideo: add mapping for HEVC payloads (LP: #1895803)
    - media: uvcvideo: Add mapping for HEVC payloads

  * Focal update: v5.4.73 upstream stable release (LP: #1902115)
    - ibmveth: Switch order of ibmveth_helper calls.
    - ibmveth: Identify ingress large send packets.
    - ipv4: Restore flowi4_oif update before call to xfrm_lookup_route
    - mlx4: handle non-napi callers to napi_poll
    - net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()
    - net: fec: Fix PHY init after phy_reset_after_clk_enable()
    - net: fix pos incrementment in ipv6_route_seq_next
    - net/smc: fix valid DMBE buffer sizes
    - net...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-31.33

---------------
linux (5.8.0-31.33) groovy; urgency=medium

  * groovy/linux: 5.8.0-31.33 -proposed tracker (LP: #1905299)

  * Groovy 5.8 kernel hangs on boot on CPUs with eLLC (LP: #1903397)
    - drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup
      during fbdev init

  * CVE-2020-4788
    - selftests/powerpc: rfi_flush: disable entry flush if present
    - powerpc/64s: flush L1D on kernel entry
    - powerpc/64s: flush L1D after user accesses
    - selftests/powerpc: entry flush test

linux (5.8.0-30.32) groovy; urgency=medium

  * groovy/linux: 5.8.0-30.32 -proposed tracker (LP: #1903194)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #1899993)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: fix discard limits for raid1 and raid10
    - dm raid: remove unnecessary discard limits for raid10

  * Bionic: btrfs: kernel BUG at /build/linux-
    eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks

  * Tiger Lake PMC core driver fixes (LP: #1899883)
    - platform/x86: intel_pmc_core: update TGL's LPM0 reg bit map name
    - platform/x86: intel_pmc_core: fix bound check in pmc_core_mphy_pg_show()
    - platform/x86: pmc_core: Use descriptive names for LPM registers
    - platform/x86: intel_pmc_core: Fix TigerLake power gating status map
    - platform/x86: intel_pmc_core: Fix the slp_s0 counter displayed value

  * drm/i915/dp_mst - System would hang during the boot up. (LP: #1902469)
    - Revert "UBUNTU: SAUCE: drm/i915/display: Fix null deref in
      intel_psr_atomic_check()"
    - drm/i915: Fix encoder lookup during PSR atomic check

  * Undetected Data corruption in MPI workloads that use VSX for reductions on
    POWER9 DD2.1 systems (LP: #1902694)
    - powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
    - selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load
      workaround

  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
    - s390/ipl: support NVMe IPL kernel parameters

  * uvcvideo: add mapping for HEVC payloads (LP: #1895803)
    - media: uvcvideo: Add mapping for HEVC payloads

  * risc-v 5.8 ...

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-126.129

---------------
linux (4.15.0-126.129) bionic; urgency=medium

  * bionic/linux: 4.15.0-126.129 -proposed tracker (LP: #1905305)

  * CVE-2020-4788
    - SAUCE: powerpc/64s: Define MASKABLE_RELON_EXCEPTION_PSERIES_OOL
    - SAUCE: powerpc/64s: move some exception handlers out of line
    - powerpc/64s: flush L1D on kernel entry
    - SAUCE: powerpc: Add a framework for user access tracking
    - powerpc: Implement user_access_begin and friends
    - powerpc: Fix __clear_user() with KUAP enabled
    - powerpc/uaccess: Evaluate macro arguments once, before user access is
      allowed
    - powerpc/64s: flush L1D after user accesses

linux (4.15.0-125.128) bionic; urgency=medium

  * bionic/linux: 4.15.0-125.128 -proposed tracker (LP: #1903137)

  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX

  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline

  * EFI: Fails when BootCurrent entry does not exist (LP: #1899993)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.

  * CVE-2020-14351
    - perf/core: Fix race in the perf_mmap_close() function

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout

  * Bionic: btrfs: kernel BUG at /build/linux-
    eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: use offset_in_page instead of open-coding it
    - btrfs: use BUG() instead of BUG_ON(1)
    - btrfs: drop unnecessary offset_in_page in extent buffer helpers
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks

  * Bionic update: upstream stable patchset 2020-11-04 (LP: #1902943)
    - USB: gadget: f_ncm: Fix NDP16 datagram validation
    - gpio: tc35894: fix up tc35894 interrupt configuration
    - vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
    - vsock/virtio: stop workers during the .remove()
    - vsock/virtio: add transport parameter to the
      virtio_transport_reset_no_sock()
    - net: virtio_vsock: Enhance connection semantics
    - Input: i8042 - add nopnp quirk for Acer Aspire 5 A515
    - ftrace: Move RCU is watching check after recursion check
    - drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
    - drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
    - drm/sun4i: mixer: Extend regmap max_register
    - net: dec: de2104x: Increase receive ring size for Tulip
    - rndis_host: increase sleep time in the query-response loop
    - nvme-core: get/put ctrl ...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Markus Schade (lp-markusschade) wrote :

On focal (5.4.0-56-generic) we are starting to see massive file system corruptions on systems updated to this kernel version.
These systems are using LVM with discards and thin provisioning on 6 or 8 NVMe drives in a RAID10 near configuration. We are currently downgrading all systems back to 5.4.0-54-generic and hope we can provide a simple reproducer

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Markus,

I am deeply sorry for causing the regression. We are aware, and tracking the issue in bug 1907262.

The kernel team have started an emergency revert and you can expect fixed kernels to be released in the next day or so.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-36.40+21.04.1

---------------
linux (5.8.0-36.40+21.04.1) hirsute; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  [ Ubuntu: 5.8.0-36.40 ]

  * debian/scripts/file-downloader does not handle positive failures correctly
    (LP: #1878897)
    - [Packaging] file-downloader not handling positive failures correctly

  [ Ubuntu: 5.8.0-35.39 ]

  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2021-1052 // CVE-2021-1053
    - [Packaging] NVIDIA -- Add the NVIDIA 460 driver

 -- Kleber Sacilotto de Souza <email address hidden> Thu, 07 Jan 2021 11:57:30 +0100

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux (Ubuntu):
status: Fix Released → In Progress
Changed in linux (Ubuntu Bionic):
status: Fix Released → In Progress
Changed in linux (Ubuntu Focal):
status: Fix Released → In Progress
Changed in linux (Ubuntu Groovy):
status: Fix Released → In Progress
description: updated
Revision history for this message
Matthew Ruffell (mruffell) wrote :
Download full text (3.9 KiB)

Hi everyone,

The original patch author, Xiao Ni, has sent a V2 patchset to the linux-raid mailing list for feedback. This new patchset fixes the problems the previous version had: it properly calculates the discard offset for the second and subsequent disks, and correctly calculates the stripe size in far layouts.

The patches are:

https://www.spinics.net/lists/raid/msg67208.html
https://www.spinics.net/lists/raid/msg67212.html
https://www.spinics.net/lists/raid/msg67213.html
https://www.spinics.net/lists/raid/msg67209.html
https://www.spinics.net/lists/raid/msg67210.html
https://www.spinics.net/lists/raid/msg67211.html

We now need to thoroughly test and provide feedback to Xiao and the Raid subsystem maintainer before these patches can get merged into mainline again. We really need to make sure that these patches don't cause any data corruption.

I have backported the patchset to the 4.15, 5.4 and 5.8 kernels.

Backports for 5.4 and 5.8 kernels:

https://paste.ubuntu.com/p/vPFFPMjhbv/
https://paste.ubuntu.com/p/MCGH8v7Rqk/
https://paste.ubuntu.com/p/rppy39Qgkz/
https://paste.ubuntu.com/p/Dsqy4PQNzJ/
https://paste.ubuntu.com/p/mZ9VDBD8d5/
https://paste.ubuntu.com/p/vJNYZyGTWH/
https://paste.ubuntu.com/p/M4sMwhgWTj/

Backports for the 4.15 kernel:

https://paste.ubuntu.com/p/X9rRHT59qf/
https://paste.ubuntu.com/p/VWwW9JbBHy/
https://paste.ubuntu.com/p/pFY3YbBW6t/
https://paste.ubuntu.com/p/JKg4KcHwPB/
https://paste.ubuntu.com/p/C4sf2r9jS4/

I have built test kernels for bionic, bionic HWE, focal and groovy.

Performance testing confirms that the testcase of formatting a Raid10 array on NVMe disks drops from 8.5 minutes to about 6 seconds on an AWS i3.8xlarge instance, due to the speedup in block discard.

https://paste.ubuntu.com/p/NNGqP3xdsc/
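The magnitude of that speedup is consistent with the request splitting described in the bug: the block layer issues roughly ceil(total / discard_max_bytes) discard bios. A rough back-of-envelope sketch, using the sizes from the bug description (the 1.9TB figure is approximate):

```shell
# Approximate number of discard bios the block layer must issue:
# ceil(total_bytes / discard_max_bytes), per underlying device.
total=1900000000000        # ~1.9 TB per NVMe device (approximate)
old=524288                 # md0's former 512 KiB discard_max_bytes
new=2199023255040          # the device's discard_max_hw_bytes

echo $(( (total + old - 1) / old ))   # ~3.6 million bios at 512 KiB
echo $(( (total + new - 1) / new ))   # a single bio at the device limit
```

With the old 512 KiB limit, every one of those millions of bios has to pass through the raid10 write path, which is why the stack trace in the bug description shows mkfs stuck in wait_barrier().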

I have also run through the data corruption regression reproducer from bug 1907262, and throughout the process, the value of /sys/block/md0/md/mismatch_cnt was always 0, and all deep fsck checks on the individual disks came back clean.

https://paste.ubuntu.com/p/5DK57TzdFH/
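For anyone wanting to repeat those integrity checks, the procedure can be sketched roughly as follows. This is a hypothetical outline rather than the exact script from the paste: the device names (/dev/md0, the $dev members) and the mount point are examples, the commands are destructive, and they should only ever be run against a dedicated test array whose data you can afford to lose.

```shell
# DESTRUCTIVE sketch: only run against a throwaway test array.
# 1) Trigger a scrub and confirm the mirror halves agree.
echo check | sudo tee /sys/block/md0/md/sync_action
# ...wait for the check to finish (watch /proc/mdstat)...
cat /sys/block/md0/md/mismatch_cnt          # expect 0

# 2) Exercise the discard path, then re-check.
touch /mnt/test/file && rm /mnt/test/file
sudo fstrim /mnt/test
echo check | sudo tee /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt          # expect 0 again

# 3) Stop the array, bring it up degraded on one member at a time,
#    and deep-fsck each copy (assumes a 2-disk near-2 layout where
#    each member holds a full copy of the data).
sudo umount /mnt/test
sudo mdadm --stop /dev/md0
for dev in /dev/nvme0n1 /dev/nvme0n2; do
    sudo mdadm --assemble /dev/md0 "$dev" --run   # degraded assembly
    sudo fsck.ext4 -n -f /dev/md0                 # read-only deep check
    sudo mdadm --stop /dev/md0
done
```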

I am happy with these results, and it's time to get some wider testing on these patches.

If you are interested in helping to test, please use dedicated test servers, and not production systems. These patches have caused data corruption before, so only place data on the Raid10 array that you have copies of elsewhere, and assume that total data loss could happen anytime.

Please note, these test kernels are NOT SUPPORTED by Canonical, and are for TEST PURPOSES ONLY. ONLY install in a dedicated test environment.

Instructions to Install (on a Bionic or Focal or Groovy system):
1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For Bionic:
3) sudo apt install linux-image-unsigned-4.15.0-136-generic linux-modules-4.15.0-136-generic linux-modules-extra-4.15.0-136-generic linux-headers-4.15.0-136-generic

For Bionic HWE 5.4 or Focal:
3) sudo apt install linux-image-unsigned-5.4.0-66-generic linux-modules-5.4.0-66-generic linux-modules-extra-5.4.0-66-generic linux-headers-5.4.0-66-generic

For Groovy:
3) sudo apt install linux-image-unsigned-5.8.0-44-generic linux-modules-5.8.0-44-generic linux-modules-extra-5.8.0-44-generic linux-hea...

Read more...

description: updated
tags: removed: verification-done-bionic verification-done-focal verification-done-groovy
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
description: updated
Revision history for this message
Matthew Ruffell (mruffell) wrote :

If anyone is interested in testing, there are new re-spins of the test kernels
available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1896578-test

The patches used are the ones I will be submitting for SRU, and are more or less identical to the patches in the previous test kernels I supplied in February.

Please go ahead and do some testing, and let me know if you find any problems.

Please note this package is NOT SUPPORTED by Canonical, and is for TESTING PURPOSES ONLY. ONLY Install in a dedicated test environment.

Instructions to install:
1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-unsigned-5.11.0-16-generic linux-modules-5.11.0-16-generic \
linux-modules-extra-5.11.0-16-generic linux-headers-5.11.0-16-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-unsigned-5.8.0-50-generic linux-modules-5.8.0-50-generic \
linux-modules-extra-5.8.0-50-generic linux-headers-5.8.0-50-generic

For 20.04 / Focal:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For 18.04 / Bionic:
For the 5.4 Bionic HWE kernel:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For the 4.15 Bionic GA kernel:

3) sudo apt install linux-image-unsigned-4.15.0-142-generic linux-modules-4.15.0-142-generic \
linux-modules-extra-4.15.0-142-generic linux-headers-4.15.0-142-generic

4) sudo reboot
5) uname -rv
Make sure the string "+TEST1896578v20210504b1" is present in the output of uname -rv.

You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/

Revision history for this message
Matthew Ruffell (mruffell) wrote :

I have completed most of my regression testing, and things are still looking
good. The performance of the block discard is there, and I haven't seen any
data corruption.

In particular, I have been testing against the testcase for the regression that
occurred with the previous revision of the patches, back in December. The
testcase is covered in bug 1907262 [1].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

For each of the 5.11, 5.8, 5.4 and 4.15 kernels, the problem does not reproduce:
the value of /sys/block/md0/md/mismatch_cnt is always 0, and mounting each
disk individually and performing a full deep fsck shows no data corruption.

Test results for each kernel are below:

5.11.0-16-generic #17+TEST1896578v20210503b1-Ubuntu
https://paste.ubuntu.com/p/Dp3sR9mNdY/

5.8.0-50-generic #56+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/tXmtmd5Jys/

5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/VzX2mXcKbF/

4.15.0-142-generic #146+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/HpMcX3N9fD/

I think I will look into some longer running tests as well, more info on that
later.

Revision history for this message
Evan Hoffman (ehoffman24) wrote :

Is there any ETA on a supported kernel with this patch?

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Evan,

The patches have been submitted for SRU to the Ubuntu kernel mailing list, for the 4.15, 5.4, 5.8 and 5.11 kernels:

[0] https://lists.ubuntu.com/archives/kernel-team/2021-May/119935.html
[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/119936.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/119937.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/119938.html
[4] https://lists.ubuntu.com/archives/kernel-team/2021-May/119939.html
[5] https://lists.ubuntu.com/archives/kernel-team/2021-May/119941.html
[6] https://lists.ubuntu.com/archives/kernel-team/2021-May/119940.html
[7] https://lists.ubuntu.com/archives/kernel-team/2021-May/119942.html
[8] https://lists.ubuntu.com/archives/kernel-team/2021-May/119943.html
[9] https://lists.ubuntu.com/archives/kernel-team/2021-May/119944.html
[10] https://lists.ubuntu.com/archives/kernel-team/2021-May/119945.html
[11] https://lists.ubuntu.com/archives/kernel-team/2021-May/119946.html

The kernel team have reviewed the patches, but they are in no hurry to ACK the patchset [12], and they also haven't outright rejected it.

[12] https://lists.ubuntu.com/archives/kernel-team/2021-May/120051.html

The current status is that the kernel team have requested more testing to be performed, and that the patches will not make the current SRU cycle. They will instead be submitted for consideration in the next SRU cycle.

You can look at https://kernel.ubuntu.com/ for dates of various SRU cycles. If the patches are accepted for the 2021.05.31 SRU cycle, then you could expect a supported kernel to be available in late June.

If you want to help, then please consider installing the test kernels in comment #14 and helping test Raid10 on SSDs / NVMe drives that support block discard.

I am currently using a cloud instance with 4x NVMe disks in Raid10 as my /home directory, and things seem okay.

I'll keep you updated on the progress of this patchset via this bug.

Thanks,
Matthew

Revision history for this message
Evan Hoffman (ehoffman24) wrote :

I have it running on two machines now that needed big RAID 10s:

# uname -rv
5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu SMP Tue May 4 00:30:36 UTC 202
# df -h /opt/raid
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0         30T  208G   29T   1% /opt/raid
# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 nvme7n1[7] nvme6n1[6] nvme5n1[5] nvme4n1[4] nvme3n1[3] nvme2n1[2] nvme1n1[1] nvme0n1[0]
      31255576576 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
      [>....................] resync = 0.1% (48777600/31255576576) finish=2514.8min speed=206813K/sec
      bitmap: 233/233 pages [932KB], 65536KB chunk

FWIW the mkfs.xfs took ~1 minute across 8x8TB NVMe disks with this patched kernel.

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Evan,

As I mentioned in my previous message, I submitted the patches to the Ubuntu kernel mailing list for SRU.

These patches have now gotten 2 acks [1][2] from senior kernel team members, and the patches have now been applied [3] to the 4.15, 5.4, 5.8 and 5.11 kernels.

[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/120475.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/120799.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/120800.html

This is what is going to happen next. Next week, between the 31st of May and 4th of June, the kernel team will build the next kernel update, and place it in -proposed for testing.

As soon as these kernels enter -proposed, we need to install and test Raid10 in these new kernels as much as possible. The testing and verification window is between the 7th and 18th of June.

If all goes well, we can mark the launchpad bug as verified, and we will see a release to -updates around the 21st of June, give or take a few days if any CVEs turn up.

The schedule is on https://kernel.ubuntu.com/ if anything were to change.

I will write back once the next kernel update is in -proposed, likely early to mid next week. I would really, really appreciate it if you could help test the kernels when they arrive in -proposed, as I really don't want to introduce any more regressions.

Thanks,
Matthew

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
tags: added: verification-needed-focal
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Evan,

The kernel team have built all of the kernels for this SRU cycle, and have placed them into -proposed for verification.

We now need to do some thorough testing to make sure that Raid10 arrays function with good performance, ensure data integrity, and make sure we won't be introducing any regressions when these kernels are released in two weeks' time.

I would really appreciate it if you could help test and verify these kernels function as intended.

Instructions to Install:

1) cat << EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed main universe
EOF
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-5.11.0-20-generic linux-modules-5.11.0-20-generic \
linux-modules-extra-5.11.0-20-generic linux-headers-5.11.0-20-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-5.8.0-56-generic linux-modules-5.8.0-56-generic \
linux-modules-extra-5.8.0-56-generic linux-headers-5.8.0-56-generic

For 20.04 / Focal:

3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For 18.04 / Bionic:
For the 5.4 Bionic HWE Kernel:

3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For the 4.15 Bionic GA Kernel:

3) sudo apt install linux-image-4.15.0-145-generic linux-modules-4.15.0-145-generic \
linux-modules-extra-4.15.0-145-generic linux-headers-4.15.0-145-generic

4) sudo reboot
5) uname -rv

You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/

I am running the -proposed kernel on my cloud instance with my /home directory on a Raid10 array made up of 4x NVMe devices, and things are looking okay.
I will be performing my detailed regression testing against these kernels tomorrow, and I will write back with the results then.

Please help test these kernels in -proposed, and let me know how they go.

Thanks,
Matthew

Revision history for this message
Evan Hoffman (ehoffman24) wrote (last edit):

Thanks, Matt. I have it installed on one machine so far and it looks good (in the past 10 minutes). fstrim of a ~30 TB RAID 10 took 73 seconds instead of multiple hours.

# uname -a
Linux xxx 5.4.0-75-generic #84-Ubuntu SMP Fri May 28 16:28:37 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# df -h /dev/md0
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0         30T  212G   29T   1% /opt/raid
# cat /proc/mdstat
Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
md0 : active raid10 nvme7n1[7] nvme2n1[2] nvme6n1[6] nvme4n1[4] nvme1n1[1] nvme3n1[3] nvme0n1[0] nvme5n1[5]
      31255576576 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
      bitmap: 11/233 pages [44KB], 65536KB chunk

unused devices: <none>

# time fstrim /opt/raid

real 1m13.162s
user 0m0.004s
sys 0m0.351s

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Hirsute.

I'm going to do three rounds of verification.

The first is the testcase from this bug, showing block discard performance.

The second is running through the regression reproducer from bug 1907262.

The third will be results from my testing with my /home directory on a cloud instance with Raid10 backed disks, 3x customer testing and 2x community user testing. This will be in a separate comment closer to the release date once I have collected results.

Starting with the testcase for this bug.

I started an i3.8xlarge instance on AWS, enabled -proposed and installed 5.11.0-20-generic. From there, I ran through the testcase of creating a Raid10 array and formatting it with xfs, and block discard performance was excellent:

https://paste.ubuntu.com/p/X5sdCGT78Y/

It took 4.6 seconds to format the array with xfs, and 2.8 seconds for a fstrim,
as opposed to the 20 minutes it took beforehand.

Performance with block discard is excellent.
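The performance side of that testcase can be sketched roughly as below. This is a hedged outline, not the exact script from the paste: the device names are examples for an i3.8xlarge's four NVMe instance-store disks, and the commands destroy any existing data on them.

```shell
# DESTRUCTIVE sketch of the performance testcase: builds a Raid10
# array from four NVMe devices and times mkfs.xfs and fstrim.
# Device names are examples; adjust for your instance.
sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

time sudo mkfs.xfs /dev/md0     # dominated by the initial discard
sudo mount /dev/md0 /mnt
time sudo fstrim /mnt           # discards all free space

# With the fix applied, md0 should advertise a much larger limit:
cat /sys/block/md0/queue/discard_max_bytes
```

On an unpatched kernel the same mkfs.xfs invocation is where the multi-minute stall appears, since md0 caps discard_max_bytes at 524288.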

Moving onto the second testcase, the regression reproducer from bug 1907262.

I started an n1-standard-2 VM on Google Cloud and attached 2x NVMe scratch disks. I enabled -proposed and installed 5.11.0-20-generic. I ran through the testcase: creating a Raid10 array, running consistency checks, ensuring that the mismatch count is 0, creating a file, deleting it, performing a fstrim, running more consistency checks, then taking the raid array down, bringing up one disk at a time, and performing a fsck.ext4. All disks came back clean:

https://paste.ubuntu.com/p/Xy6CPCQXZN/

Since the block discard performance is there, and there is no apparent data corruption going on after a fstrim, I will mark this verified for Hirsute.

tags: added: verification-done-hirsute
removed: verification-needed-hirsute
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Groovy.

I'm going to do three rounds of verification.

The first is the testcase from this bug, showing block discard performance.

The second is running through the regression reproducer from bug 1907262.

The third will be results from my testing with my /home directory on a cloud instance with Raid10 backed disks, 3x customer testing and 2x community user testing. This will be in a separate comment closer to the release date once I have collected results.

Starting with the testcase for this bug.

I started an i3.8xlarge instance on AWS, enabled -proposed and installed 5.8.0-56-generic. From there, I ran through the testcase of creating a Raid10 array and formatting it with xfs, and block discard performance was excellent:

https://paste.ubuntu.com/p/GGXfjCHfDR/

It took 5.7 seconds to format the array with xfs, and 2.3 seconds for a fstrim,
as opposed to the 20 minutes it took beforehand.

Performance with block discard is excellent.

Moving onto the second testcase, the regression reproducer from bug 1907262.

I started an n1-standard-2 VM on Google Cloud and attached 2x NVMe scratch disks. I enabled -proposed and installed 5.8.0-56-generic. I ran through the testcase: creating a Raid10 array, running consistency checks, ensuring that the mismatch count is 0, creating a file, deleting it, performing a fstrim, running more consistency checks, then taking the raid array down, bringing up one disk at a time, and performing a fsck.ext4. All disks came back clean:

https://paste.ubuntu.com/p/75xWd4Z3NZ/

Since the block discard performance is there, and there is no apparent data corruption going on after a fstrim, I will mark this verified for Groovy.

tags: added: verification-done-groovy
removed: verification-needed-groovy
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Focal.

I'm going to do three rounds of verification.

The first is the testcase from this bug, showing block discard performance.

The second is running through the regression reproducer from bug 1907262.

The third will be results from my testing with my /home directory on a cloud instance with Raid10 backed disks, 3x customer testing and 2x community user testing. This will be in a separate comment closer to the release date once I have collected results.

Starting with the testcase for this bug.

I started an i3.8xlarge instance on AWS, enabled -proposed and installed 5.4.0-75-generic. From there, I ran through the testcase of creating a Raid10 array and formatting it with xfs, and block discard performance was excellent:

https://paste.ubuntu.com/p/mdQ6Wjr4yK/

It took 6.6 seconds to format the array with xfs, and 3.8 seconds for a fstrim, as opposed to the 20 minutes it took beforehand.

Performance with block discard is excellent.

Moving onto the second testcase, the regression reproducer from bug 1907262.

I started an n1-standard-2 VM on Google Cloud and attached 2x NVMe scratch disks. I enabled -proposed and installed 5.4.0-75-generic. I ran through the testcase: creating a Raid10 array, running consistency checks, ensuring that the mismatch count is 0, creating a file, deleting it, performing a fstrim, running more consistency checks, then taking the raid array down, bringing up one disk at a time, and performing a fsck.ext4. All disks came back clean:

https://paste.ubuntu.com/p/jFHW26kcCK/

Since the block discard performance is there, and there is no apparent data corruption going on after a fstrim, I will mark this verified for Focal.

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Bionic.

I'm going to do three rounds of verification.

The first is the testcase from this bug, showing block discard performance.

The second is running through the regression reproducer from bug 1907262.

The third will be results from my testing with my /home directory on a cloud instance with Raid10 backed disks, 3x customer testing and 2x community user testing. This will be in a separate comment closer to the release date once I have collected results.

Starting with the testcase for this bug.

I started an i3.8xlarge instance on AWS, enabled -proposed and installed 4.15.0-145-generic. From there, I ran through the testcase of creating a Raid10 array and formatting it with xfs, and block discard performance was excellent:

https://paste.ubuntu.com/p/Sr8tR9yhRd/

It took 3.2 seconds to format the array with xfs, and 1.0 seconds for a fstrim, as opposed to the 20 minutes it took beforehand.

Performance with block discard is excellent.

Moving onto the second testcase, the regression reproducer from bug 1907262.

I started an n1-standard-2 VM on Google Cloud and attached 2x NVMe scratch disks. I enabled -proposed and installed 4.15.0-145-generic. I ran through the testcase: creating a Raid10 array, running consistency checks, ensuring that the mismatch count is 0, creating a file, deleting it, performing a fstrim, running more consistency checks, then taking the raid array down, bringing up one disk at a time, and performing a fsck.ext4. All disks came back clean:

https://paste.ubuntu.com/p/h8gTd4JQ8Y/

Since the block discard performance is there, and there is no apparent data corruption going on after a fstrim, I will mark this verified for Bionic.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Evan,

Great to hear things are looking good for you and that the block discard performance is there. If possible, keep running the kernel from -proposed for a bit longer, just to make sure nothing comes up on longer runs.

I spent some time today performing verification on all the kernels in -proposed, testing block discard performance [1], and also running through the regression testcase from LP #1907262 [2].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

All kernels performed as expected, with block discard on 4x 1.9TB NVMe disks on an i3.8xlarge AWS instance taking 3-4 seconds, and the consistency checks performed returned clean disks, with no filesystem or data corruption.

I have documented my tests in my verification messages:

Hirsute:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/26

Groovy:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/27

Focal:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/28

Bionic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/29

I have marked the launchpad bug as verified for all releases.

I'm still running my own testing, with my /home directory being on a Raid10 array on a Google Cloud instance, and it has no issues.

If things keep going well, we should see a release to -updates around the 21st of June, give or take a few days if any CVEs turn up.

Thanks,
Matthew

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Evan,

Just checking in. Are you still running 5.4.0-75-generic on your server?

Is everything nice and stable? Is your data fully intact, and no signs of corruption at all?

My server has been running for two weeks now, and it does a fstrim every 30 minutes, and everything appears to be stable, and I don't have any corruption when I fsck my disks.

If things keep looking good, the SRU cycle will complete early next week, and the kernel will be released to -updates around the 21st of June, give or take a few days if any CVEs turn up.

Let me know how things are going.

Thanks,
Matthew

Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (65.7 KiB)

This bug was fixed in the package linux - 5.11.0-20.21+21.10.1

---------------
linux (5.11.0-20.21+21.10.1) impish; urgency=medium

  * impish/linux: 5.11.0-20.21+21.10.1 -proposed tracker (LP: #1930056)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  [ Ubuntu: 5.11.0-20.21 ]

  * hirsute/linux: 5.11.0-20.21 -proposed tracker (LP: #1930854)
  * ath11k WIFI not working in proposed kernel 5.11.0-19-generic (LP: #1930637)
    - bus: mhi: core: Download AMSS image from appropriate function

  [ Ubuntu: 5.11.0-19.20 ]

  * hirsute/linux: 5.11.0-19.20 -proposed tracker (LP: #1930075)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates
  * AX201 BT will cause system could not enter S0i3 (LP: #1928047)
    - SAUCE: drm/i915: Tweaked Wa_14010685332 for all PCHs
  * CVE-2021-3490
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with
      bitwise ops"
    - gpf: Fix alu32 const subreg bound tracking on bitwise operations
  * CVE-2021-3489
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of read-
      only ringbuf pages"
    - bpf: Prevent writable memory-mapping of read-only ringbuf pages
  * Select correct boot VGA when BIOS doesn't do it properly (LP: #1929217)
    - vgaarb: Use ACPI HID name to find integrated GPU
  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
    (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend
  * CVE-2020-26145
    - ath10k: drop fragments with multicast DA for SDIO
    - ath10k: add CCMP PN replay protection for fragmented frames for PCIe
    - ath10k: drop fragments with multicast DA for PCIe
  * CVE-2020-26141
    - ath10k: Fix TKIP Michael MIC verification for PCIe
  * CVE-2020-24587
    - ath11k: Clear the fragment cache during key install
  * CVE-2020-24588
    - mac80211: properly handle A-MSDUs that start with an RFC 1042 header
    - cfg80211: mitigate A-MSDU aggregation attacks
    - mac80211: drop A-MSDUs on old ciphers
    - ath10k: drop MPDU which has discard flag set by firmware for SDIO
  * CVE-2020-26139
    - mac80211: do not accept/forward invalid EAPOL frames
  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587 for such cases.
    - mac80211: extend protection against mixed key and fragment cache attacks
  * CVE-2020-24586 // CVE-2020-24587
    - mac80211: prevent mixed key and fragment cache attacks
    - mac80211: add fragment cache to sta_info
    - mac80211: check defrag PN against current frame
    - mac80211: prevent attacks on TKIP/WEP as well
  * CVE-2020-26147
    - mac80211: assure all fragments are encrypted
  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one function
    - md/raid10: improve ra...

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (65.9 KiB)

This bug was fixed in the package linux - 5.11.0-22.23

---------------
linux (5.11.0-22.23) hirsute; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
    - SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
    - SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.11.0-20.21) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-20.21 -proposed tracker (LP: #1930854)

  * ath11k WIFI not working in proposed kernel 5.11.0-19-generic (LP: #1930637)
    - bus: mhi: core: Download AMSS image from appropriate function

linux (5.11.0-19.20) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-19.20 -proposed tracker (LP: #1930075)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates

  * AX201 BT will cause system could not enter S0i3 (LP: #1928047)
    - SAUCE: drm/i915: Tweaked Wa_14010685332 for all PCHs

  * CVE-2021-3490
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with
      bitwise ops"
    - bpf: Fix alu32 const subreg bound tracking on bitwise operations

  * CVE-2021-3489
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of read-
      only ringbuf pages"
    - bpf: Prevent writable memory-mapping of read-only ringbuf pages

  * Select correct boot VGA when BIOS doesn't do it properly (LP: #1929217)
    - vgaarb: Use ACPI HID name to find integrated GPU

  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
    (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend

  * CVE-2020-26145
    - ath10k: drop fragments with multicast DA for SDIO
    - ath10k: add CCMP PN replay protection for fragmented frames for PCIe
    - ath10k: drop fragments with multicast DA for PCIe

  * CVE-2020-26141
    - ath10k: Fix TKIP Michael MIC verification for PCIe

  * CVE-2020-24587
    - ath11k: Clear the fragment cache during key install

  * CVE-2020-24588
    - mac80211: properly handle A-MSDUs that start with an RFC 1042 header
    - cfg80211: mitigate A-MSDU aggregation attacks
    - mac80211: drop A-MSDUs on old ciphers
    - ath10k: drop MPDU which has discard flag set by firmware for SDIO

  * CVE-2020-26139
    - mac80211: do not accept/forward invalid EAPOL frames

  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587
    - mac80211: extend protection against mixed key and fragment cache attacks

  * CVE-2020-24586 // CVE-2020-24587
    - mac80211: prevent mixed key and fragment cache attacks
    - mac80211: add fragment cache to sta_info
    - mac80211: check defrag PN against current frame
    - mac80211: prevent attacks on TKIP/WEP as well

  * CVE-2020-26147
    - mac80211: assure all fragments are encrypted

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - ...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-59.66

---------------
linux (5.8.0-59.66) groovy; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
    - SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
    - SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.8.0-57.64) groovy; urgency=medium

  * groovy/linux: 5.8.0-57.64 -proposed tracker (LP: #1932047)

  * pmtu.sh from selftests.net in linux ADT test failure with linux/5.8.0-56.63
    (LP: #1931731)
    - net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb

linux (5.8.0-56.63) groovy; urgency=medium

  * groovy/linux: 5.8.0-56.63 -proposed tracker (LP: #1930052)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * scsi: storvsc: Parameterize number hardware queues (LP: #1930626)
    - scsi: storvsc: Parameterize number hardware queues

  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates

  * CVE-2021-3490
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: verifier: fix ALU32 bounds tracking with
      bitwise ops"
    - bpf: Fix alu32 const subreg bound tracking on bitwise operations

  * CVE-2021-3489
    - SAUCE: Revert "UBUNTU: SAUCE: bpf: prevent writable memory-mapping of read-
      only ringbuf pages"
    - bpf: Prevent writable memory-mapping of read-only ringbuf pages

  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
    (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend

  * CVE-2020-26145
    - ath10k: drop fragments with multicast DA for SDIO
    - ath10k: add CCMP PN replay protection for fragmented frames for PCIe
    - ath10k: drop fragments with multicast DA for PCIe

  * CVE-2020-26141
    - ath10k: Fix TKIP Michael MIC verification for PCIe

  * CVE-2020-24587
    - ath11k: Clear the fragment cache during key install

  * CVE-2020-24588
    - mac80211: properly handle A-MSDUs that start with an RFC 1042 header
    - cfg80211: mitigate A-MSDU aggregation attacks
    - mac80211: drop A-MSDUs on old ciphers
    - ath10k: drop MPDU which has discard flag set by firmware for SDIO

  * CVE-2020-26139
    - mac80211: do not accept/forward invalid EAPOL frames

  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587
    - mac80211: extend protection against mixed key and fragment cache attacks

  * CVE-2020-24586 // CVE-2020-24587
    - mac80211: prevent mixed key and fragment cache attacks
    - mac80211: add fragment cache to sta_info
    - mac80211: check defrag PN against current frame
    - mac80211: prevent attacks on TKIP/WEP as well

  * CVE-2020-26147
    - mac80211: assure all fragments are encrypted

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one...

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-77.86

---------------
linux (5.4.0-77.86) focal; urgency=medium

  * UAF on CAN J1939 j1939_can_recv (LP: #1932209)
    - SAUCE: can: j1939: delay release of j1939_priv after synchronize_rcu

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
    - SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (5.4.0-76.85) focal; urgency=medium

  * focal/linux: 5.4.0-76.85 -proposed tracker (LP: #1932123)

  * Upstream v5.9 introduced 'module' patches that removed exported symbols
    (LP: #1932065)
    - SAUCE: Revert "modules: inherit TAINT_PROPRIETARY_MODULE"
    - SAUCE: Revert "modules: return licensing information from find_symbol"
    - SAUCE: Revert "modules: rename the licence field in struct symsearch to
      license"
    - SAUCE: Revert "modules: unexport __module_address"
    - SAUCE: Revert "modules: unexport __module_text_address"
    - SAUCE: Revert "modules: mark each_symbol_section static"
    - SAUCE: Revert "modules: mark find_symbol static"
    - SAUCE: Revert "modules: mark ref_module static"

linux (5.4.0-75.84) focal; urgency=medium

  * focal/linux: 5.4.0-75.84 -proposed tracker (LP: #1930032)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CVE-2021-33200
    - bpf: Wrap aux data inside bpf_sanitize_info container
    - bpf: Fix mask direction swap upon off reg sign change
    - bpf: No need to simulate speculative domain for immediates

  * Realtek USB hubs in Dell WD19SC/DC/TB fail to work after exiting s2idle
    (LP: #1928242)
    - USB: Verify the port status when timeout happens during port suspend

  * CVE-2020-26145
    - ath10k: drop fragments with multicast DA for SDIO
    - ath10k: add CCMP PN replay protection for fragmented frames for PCIe
    - ath10k: drop fragments with multicast DA for PCIe

  * CVE-2020-26141
    - ath10k: Fix TKIP Michael MIC verification for PCIe

  * CVE-2020-24588
    - mac80211: properly handle A-MSDUs that start with an RFC 1042 header
    - cfg80211: mitigate A-MSDU aggregation attacks
    - mac80211: drop A-MSDUs on old ciphers
    - ath10k: drop MPDU which has discard flag set by firmware for SDIO

  * CVE-2020-26139
    - mac80211: do not accept/forward invalid EAPOL frames

  * CVE-2020-24586 // CVE-2020-24587 // CVE-2020-24587
    - mac80211: extend protection against mixed key and fragment cache attacks

  * CVE-2020-24586 // CVE-2020-24587
    - mac80211: prevent mixed key and fragment cache attacks
    - mac80211: add fragment cache to sta_info
    - mac80211: check defrag PN against current frame
    - mac80211: prevent attacks on TKIP/WEP as well

  * CVE-2020-26147
    - mac80211: assure all fragments are encrypted

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: remove unnecessary discard limi...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-147.151

---------------
linux (4.15.0-147.151) bionic; urgency=medium

  * CVE-2021-3444
    - bpf: Fix truncation handling for mod32 dst reg wrt zero

  * CVE-2021-3600
    - SAUCE: bpf: Do not use ax register in interpreter on div/mod
    - bpf: fix subprog verifier bypass by div/mod by 0 exception
    - SAUCE: bpf: Fix 32-bit register truncation on div/mod instruction

linux (4.15.0-146.150) bionic; urgency=medium

  * UAF on CAN BCM bcm_rx_handler (LP: #1931855)
    - SAUCE: can: bcm: delay release of struct bcm_op after synchronize_rcu

linux (4.15.0-145.149) bionic; urgency=medium

  * bionic/linux: 4.15.0-145.149 -proposed tracker (LP: #1929967)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * raid10: Block discard is very slow, causing severe delays for mkfs and
    fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull the code that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout

  * CVE-2021-23133
    - sctp: delay auto_asconf init until binding the first addr

  * Bionic update: upstream stable patchset 2021-05-25 (LP: #1929603)
    - Input: nspire-keypad - enable interrupts only when opened
    - dmaengine: dw: Make it dependent to HAS_IOMEM
    - ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
    - arc: kernel: Return -EFAULT if copy_to_user() fails
    - neighbour: Disregard DEAD dst in neigh_update
    - ARM: keystone: fix integer overflow warning
    - ASoC: fsl_esai: Fix TDM slot setup for I2S mode
    - scsi: scsi_transport_srp: Don't block target in SRP_PORT_LOST state
    - net: ieee802154: stop dump llsec keys for monitors
    - net: ieee802154: stop dump llsec devs for monitors
    - net: ieee802154: forbid monitor for add llsec dev
    - net: ieee802154: stop dump llsec devkeys for monitors
    - net: ieee802154: forbid monitor for add llsec devkey
    - net: ieee802154: stop dump llsec seclevels for monitors
    - net: ieee802154: forbid monitor for add llsec seclevel
    - pcnet32: Use pci_resource_len to validate PCI resource
    - mac80211: clear sta->fast_rx when STA removed from 4-addr VLAN
    - Input: i8042 - fix Pegatron C15B ID entry
    - HID: wacom: set EV_KEY and EV_ABS only for non-HID_GENERIC type of devices
    - readdir: make sure to verify directory entry for legacy interfaces too
    - arm64: fix inline asm in load_unaligned_zeropad()
    - arm64: alternatives: Move length validation in alternative_{insn, endif}
    - scsi: libsas: Reset num_scatter if libata marks qc as NODATA
    - netfilter: conntrack: do not print icmpv6 as unknown via /proc
    - netfilter: nft_limit: avoid possible divide error in nft_limit_init
    - net: davicom: Fix regulator not turned off on failed probe
    - net: sit: Unregister catch-all devices
    - i40e: fix the panic when running bpf in xdpdrv mode
    - ibmvnic: avoid calling napi_disable() twice
    - ibmvnic: remove duplicate napi_schedule call in do_reset function
  ...


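The patch series quoted in these changelogs raises the md device's discard limit so a whole-device discard is no longer split into 512 KiB bios. Whether a running kernel exposes the larger limit can be checked through the same sysfs attributes quoted in the bug description. A minimal sketch (the `print_discard_limits` helper name is made up for illustration; the optional base-path argument exists only so the function can be exercised against a fixture instead of a live `/sys`):

```shell
# print_discard_limits DEV [SYSFS_BASE]
# Read the queue discard limits for a block device, as in the bug report.
# With the fix applied, an md raid10 device should report a value far
# larger than the old 524288-byte chunk size.
print_discard_limits() {
    dev="$1"
    base="${2:-/sys/block}"
    for attr in discard_max_bytes discard_max_hw_bytes; do
        f="$base/$dev/queue/$attr"
        if [ -r "$f" ]; then
            printf '%s %s: %s\n' "$dev" "$attr" "$(cat "$f")"
        else
            printf '%s %s: unavailable\n' "$dev" "$attr"
        fi
    done
}

# Example: inspect the md array from the bug report, if present.
print_discard_limits md0
```
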
Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Matthew Ruffell (mruffell) wrote :

Hi Evan,

The SRU cycle has completed, and all kernels containing the Raid10 block discard performance patches have now been released to -updates.

Note that the versions differ from the kernels in -proposed: the kernel team needed to do a last-minute respin to fix two sets of CVEs, one for Broadcom wifi chipsets and the other for bpf, which is why the kernels were released a day later than usual.

The released kernels are:

Hirsute: 5.11.0-22-generic
Groovy: 5.8.0-59-generic
Focal: 5.4.0-77-generic
Bionic: 4.15.0-147-generic

The HWE equivalents have also been released to -updates.

You may now install these kernels to your systems and enjoy fast block discard for your Raid10 arrays.

Thanks,
Matthew
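
A quick way to confirm a machine is running at least one of the fixed releases listed above is a `sort -V` version comparison against the relevant baseline. A minimal sketch (the `kernel_at_least` helper is hypothetical; `5.4.0-77` is the Focal baseline from the comment above, so substitute the version for your series):

```shell
# kernel_at_least CURRENT REQUIRED -> exit 0 if CURRENT >= REQUIRED.
# Relies on GNU sort's version ordering (-V), which is available on all
# of the Ubuntu releases listed above.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: check the running kernel against the Focal fixed release.
if kernel_at_least "$(uname -r)" "5.4.0-77"; then
    echo "kernel carries the raid10 discard fix (Focal baseline)"
fi
```
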
