NFSv4.1: Interrupted connections cause high bandwidth RPC ping-pong between client and server

Bug #1887607 reported by Matthew Ruffell
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Bionic
Fix Released
Medium
Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/1887607

[Impact]

There is a bug in NFS v4.1 that causes a large amount of RPC calls between a client and server when a previous RPC call is interrupted. This uses a large amount of bandwidth and can saturate the network.

The symptoms are so:

* On NFS clients:
Attempts to access mounted NFS shares associated with the affected server block indefinitely.

* On the network:
A storm of repeated RPCs between NFS client and server uses a lot of bandwidth. Each RPC is acknowledged by the server with an NFS4ERR_SEQ_MISORDERED error.

* Other NFS clients connected to the same NFS server:
Performance drops dramatically.

This occurs during a "false retry", when a client attempts to make a new RPC call using a slot+sequence number that references an older, cached call. This happens when a user process interrupts an RPC call that is in progress.

I had previously fixed this for Disco in bug 1828978, and now a customer has run into the issue in Bionic. A reproducer is supplied in the testcase section, which was something missing from bug 1828978, since we never determined how the issue actually occurred back then.

[Fix]

This was fixed in 5.1 upstream with the below commit:

commit 3453d5708b33efe76f40eca1c0ed60923094b971
Author: Trond Myklebust <email address hidden>
Date: Wed Jun 20 17:53:34 2018 -0400
Subject: NFSv4.1: Avoid false retries when RPC calls are interrupted

The fix is to pre-emptively increment the sequence number if an RPC call is interrupted, and to address corner cases we interpret the NFS4ERR_SEQ_MISORDERED error as a sign we need to locate an appropriate sequence number between the value we sent, and the last successfully acked SEQUENCE call.

The commit also requires two fixup commits, which landed in 5.5 and 5.8-rc6 respectively:

commit 5c441544f045e679afd6c3c6d9f7aaf5fa5f37b0
Author: Trond Myklebust <email address hidden>
Date: Wed Nov 13 08:34:00 2019 +0100
Subject: NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()

commit 913fadc5b105c3619d9e8d0fe8899ff1593cc737
Author: Anna Schumaker <email address hidden>
Date: Wed Jul 8 10:33:40 2020 -0400
Subject: NFS: Fix interrupted slots by sending a solo SEQUENCE operation

Commits 3453d5708b33efe76f40eca1c0ed60923094b971 and 913fadc5b105c3619d9e8d0fe8899ff1593cc737 require small backports to bionic, as struct rpc_cred changed to const struct cred in 5.0, and the backports swap them back to struct rpc_cred since that is how 4.15 works.

[Testcase]

You will need four machines. The first, is a kerberos KDC. Set up Kerberos correctly and create new service principals for the NFS server and for the client. I used: nfs/nfskerb.mydomain.com and nfs/client.mydomain.com.

The second machine will be a NFS server with the krb5p share. Add the nfs server kerberos keys to the system's keytab, and set up a NFS server that exports a directory with sec=krb5p. Example export:
/mnt/secretfolder *.mydomain.com(rw,sync,no_subtree_check,sec=krb5p)

The third machine is a regular NFS server. Export a directory with normal sec=sys security. Example export:
/mnt/sharedfolder *.mydomain.com(rw,sync)

The fourth is a desktop machine. Add the client kerberos keys to the system's keytab. Mount both NFS shares, making sure to use the NFS v4.2 protocol. I used the commands:
mount -t nfs4 nfskerb.mydomain.com:/mnt/secretfolder /mnt/secretfolder_client/
mount -t nfs4 nfs.mydomain.com:/mnt/sharedfolder /mnt/sharedfolder_client

Check "mount -l" to ensure that NFS v4.2 is used:
nfskerb.mydomain.com:/mnt/secretfolder on /mnt/secretfolder_client type nfs4 (rw,relatime,vers=4.2,<...>,sec=krb5p,<...>)
nfs.mydomain.com:/mnt/sharedfolder on /mnt/sharedfolder_client type nfs4 (rw,relatime,vers=4.2,<...>,sec=sys,<...>)

Generate some files full of random data. I found 20MB from /dev/random works great.

Open each NFS share up in tabs in Nautilus. Copy the random data files to the sec=sys NFS share. When they are done, one at a time cut and then paste the file into the sec=krb5p NFS share. The bug will trigger either on the first, or subsequent tries, but less than 10 tries are needed usually.

There is a test kernel available in the following PPA:
https://launchpad.net/~mruffell/+archive/ubuntu/sf285439-test

If you install the test kernel, files will cut and paste correctly, and NFS will work as expected.

[Regression Potential]

The changes are localised to NFS v4.1 and v4.2 only, and other versions of NFS are not affected. If a regression occurs, users can downgrade NFS versions to v4.0 or v3.x until a fix is made.

The changes only impact when connections are interrupted, and under typical blue sky scenarios would not be invoked.

There have been several attempts to fix this in the past, starting with f9312a541050 "NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY", and stemming to the commit mentioned in the fix section, along with its two other fixup commits. This seems to be an ongoing issue where edge cases crop up. I won't be surprised if there are further commits down the line.

[Other Info]

When I first submitted this fix for SRU, I believed that the fix was:

commit 02ef04e432babf8fc703104212314e54112ecd2d
Author: Chuck Lever <email address hidden>
Date: Mon Feb 11 11:25:25 2019 -0500
Subject: NFS: Account for XDR pad of buf->pages

This is not the case. This was a false positive fix. What it did was break NFSv4 GETACL and FS_LOCATIONS requests. When you tried to reproduce, the calls were never made since they were broken, and thus could not be interrupted, and cutting and pasting files worked fine.

When you applied the fixup commit 29e7ca715f2a0b6c0a99b1aec1b0956d1f271955 to fix NFSv4 GETACL and FS_LOCATIONS requests, the problem returns, as GETACL and FS_LOCATIONS are free to be interrupted and start a high bandwidth ping pong.

Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
tags: added: sts
summary: - Cutting and Pasting files from NFS sec=sys to NFS sec=krb5p causes NFS
- to hang
+ NFS4.2: Cutting and Pasting files from NFS sec=sys to NFS sec=krb5p
+ causes NFS to hang
description: updated
summary: - NFS4.2: Cutting and Pasting files from NFS sec=sys to NFS sec=krb5p
- causes NFS to hang
+ NFSv4.1: Interrupted connections cause high bandwidth RPC ping-pong
+ between client and server
description: updated
description: updated
description: updated
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Revision history for this message
Matthew Ruffell (mruffell) wrote :

I installed 4.15.0-114-generic from -proposed to my test client machine, which is a Ubuntu 18.04 Desktop VM.

I mounted two NFS shares, one with sec=sys, and the other with sec=krb5p. I then opened each share up in separate tabs in Nautilus.

I then CUT a file from the sec=sys share, and PASTED it into the sec=krb5p share. The file pasted to the new share correctly, and was removed from the original share.

I repeated this several times, and each file movement was successful with no hangs.

I then moved all files from the sec=krb5p share, back to the sec=sys share, selected all files, then CUT and PASTED them to the sec=krb5p share.

All files moved without the system hanging. I repeated this a number of times, and the system functions correctly.

There is an occasional delay when pasting some files, that lasts for no more than a few seconds, which happens when a file copy is interrupted, but the NFS subsystem recovers, likely finding the correct slot and sequence number, and the file moves successfully. The delay is likely due to the NFS client needing a few round trips to the server to discover the correct slot and sequence number.

The hang is fixed, and I am happy to mark this as verified.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (55.0 KiB)

This bug was fixed in the package linux - 4.15.0-115.116

---------------
linux (4.15.0-115.116) bionic; urgency=medium

  * bionic/linux: 4.15.0-115.116 -proposed tracker (LP: #1893055)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (4.15.0-114.115) bionic; urgency=medium

  * bionic/linux: 4.15.0-114.115 -proposed tracker (LP: #1891052)

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (4.15.0-113.114) bionic; urgency=medium

  * bionic/linux: 4.15.0-113.114 -proposed tracker (LP: #1890705)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Reapply "usb: handle warm-reset port requests on hub resume" (LP: #1859873)
    - usb: handle warm-reset port requests on hub resume

  * Bionic update: upstream stable patchset 2020-07-29 (LP: #1889474)
    - gpio: arizona: handle pm_runtime_get_sync failure case
    - gpio: arizona: put pm_runtime in case of failure
    - pinctrl: amd: fix npins for uart0 in kerncz_groups
    - mac80211: allow rx of mesh eapol frames with default rx key
    - scsi: scsi_transport_spi: Fix function pointer check
    - xtensa: fix __sync_fetch_and_{and,or}_4 declarations
    - xtensa: update *pos in cpuinfo_op.next
    - drivers/net/wan/lapbether: Fixed the value of hard_header_len
    - net: sky2: initialize return of gm_phy_read
    - drm/nouveau/i2c/g94-: increase NV_PMGR_DP_AUXCTL_TRANSACTREQ timeout
    - irqdomain/treewide: Keep firmware node unconditionally allocated
    - SUNRPC reverting d03727b248d0 ("NFSv4 fix CLOSE not waiting for direct IO
      compeletion")
    - spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
    - IB/umem: fix reference count leak in ib_umem_odp_get()
    - uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix
      GDB regression
    - ALSA: info: Drop WARN_ON() from buffer NULL sanity check
    - ASoC: rt5670: Correct RT5670_LDO_SEL_MASK
    - btrfs: fix double free on ulist after backref resolution failure
    - btrfs: fix mount failure caused by race with umount
    - btrfs: fix page leaks after failure to lock page for delalloc
    - bnxt_en: Fix race when modifying pause settings.
    - hippi: Fix a size used in a 'pci_free_consistent()' in an error handling
      path
    - ax88172a: fix ax88172a_unbind() failures
    - net: dp83640: fix SIOCSHWTSTAMP to update the struct with actual
      configuration
    - drm: sun4i: hdmi: Fix inverted HPD result
    - net: smc91x: Fix possible memory leak in smc_drv_probe()
    - bonding: check error value of register_netdevice() immediately
    - mlxsw: destroy workqueue when trap_register in mlxsw_emad_init
    - ipvs: fix the connection sync failed in some cases
    - i2c: rcar: always clear ICSAR to avoid side effects
    - bonding: check return value of register_netdevice() in bond_newlink()
    - serial: exar: Fix GPIO configuration for Sealevel cards based on XR17V35X
    - scripts/decode_stacktrace: strip basepath from all paths
    - HID: i...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.