[Hyper-V] VSS and File Copy daemons intermittently fails to start

Bug #1891224 reported by Pat Viafore
This bug affects 1 person

Affects            Status         Importance   Assigned to   Milestone
linux (Ubuntu)     Fix Released   Undecided    Unassigned
  Xenial           Fix Released   Undecided    Unassigned
  Bionic           Fix Released   Undecided    Unassigned
  Focal            Fix Released   Undecided    Unassigned
  Groovy           Fix Released   Undecided    Unassigned
systemd (Ubuntu)   Invalid        Undecided    Unassigned
  Xenial           Invalid        Undecided    Unassigned
  Bionic           Invalid        Undecided    Unassigned
  Focal            Invalid        Undecided    Unassigned
  Groovy           Invalid        Undecided    Unassigned

Bug Description

[Impact]

We have reproduced this most reliably on a Standard_B1s instance in Azure in the North Europe region (>80% of the time). Tests in other regions and on other VM types show the failure far less often (<1%). We have reproduced it on Xenial, Bionic, Focal, and Groovy. We saw an increase in test failures around a month ago.

From the journal:

Aug 11 09:55:28 ubuntu systemd[1]: sys-devices-virtual-misc-vmbus\x21hv_vss.device: Job sys-devices-virtual-misc-vmbus\x21hv_vss.device/start tim>
Aug 11 09:55:28 ubuntu systemd[1]: Timed out waiting for device sys-devices-virtual-misc-vmbus\x21hv_vss.device.
Aug 11 09:55:28 ubuntu systemd[1]: Dependency failed for Hyper-V VSS Protocol Daemon.
Aug 11 09:55:28 ubuntu systemd[1]: hv-vss-daemon.service: Job hv-vss-daemon.service/start failed with result 'dependency'.
Aug 11 09:55:28 ubuntu systemd[1]: sys-devices-virtual-misc-vmbus\x21hv_vss.device: Job sys-devices-virtual-misc-vmbus\x21hv_vss.device/start fai>
Aug 11 09:55:28 ubuntu systemd[1]: sys-devices-virtual-misc-vmbus\x21hv_fcopy.device: Job sys-devices-virtual-misc-vmbus\x21hv_fcopy.device/start>
Aug 11 09:55:28 ubuntu systemd[1]: Timed out waiting for device sys-devices-virtual-misc-vmbus\x21hv_fcopy.device.
Aug 11 09:55:28 ubuntu systemd[1]: Dependency failed for Hyper-V File Copy Protocol Daemon.
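To look for the same symptom on another instance, the journal can be filtered down to the lines shown above. This is only a minimal sketch: the function name hv_timeout_lines is made up for illustration, and the pattern is an assumption based on the messages in this report, so other systemd versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Ubuntu package): keep only the journal
# lines that mention the Hyper-V VSS / File Copy devices or their daemons.
# Reads journal text on stdin, writes the matching lines to stdout.
hv_timeout_lines() {
    grep -E 'hv_(vss|fcopy)\.device|Hyper-V (VSS|File Copy) Protocol Daemon|hv-(vss|fcopy)-daemon\.service'
}

# Usage on a live system (current boot only):
#   journalctl -b --no-pager | hv_timeout_lines
```

An empty result for a boot suggests that neither device wait timed out on that boot.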

We've seen problems in the past with the KVP daemon that looked very similar: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820063

[Test Case]

There are two main scenarios that need to be tested:

1. Azure instances:

   - Start an Azure instance using our Ubuntu images and check the status of the hv-vss-daemon and hv-fcopy-daemon services using systemctl.

   - If the issue is solved they shouldn't be listed as failed.

2. Local Hyper-V VM:

   - Create a local Hyper-V instance, enable the two systemd services (hv-vss-daemon and hv-fcopy-daemon) if necessary, and reboot.

   - You can change which integration services are enabled for the guest.

     1. With desktop integration and the backup feature disabled, the hv-fcopy-daemon and hv-vss-daemon services, respectively, should not be listed as failed.
     2. With the same features enabled, the services should start without errors.
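The "should not be listed as failed" check in both scenarios can be scripted. A minimal sketch follows, assuming the unit names hv-vss-daemon.service and hv-fcopy-daemon.service used in this report; the function name check_hv_daemons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether either Hyper-V daemon unit is in the
# "failed" state. Prints one line per unit and returns 1 if any unit failed.
check_hv_daemons() {
    rc=0
    for unit in hv-vss-daemon.service hv-fcopy-daemon.service; do
        # `systemctl is-failed` exits 0 only when the unit is in the failed state.
        if systemctl is-failed --quiet "$unit"; then
            echo "FAIL: $unit is listed as failed"
            rc=1
        else
            echo "ok: $unit is not failed"
        fi
    done
    return "$rc"
}

# Usage: run `check_hv_daemons` on the guest after boot; exit status 0 means
# neither unit is in the failed state.
```

This only covers the "not failed" condition. For the case where the integration services are enabled, `systemctl status` on each unit is still needed to confirm that the daemons actually started.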

[Regression Potential]

The major risk with a potential regression is that those systemd service units are shipped by a package produced by our generic kernels and not the linux-azure kernel. So in case of a regression we might need to re-spin the generic kernels.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1891224

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Marcelo Cerri (mhcerri)
description: updated
Changed in systemd (Ubuntu):
status: New → Invalid
Changed in systemd (Ubuntu Xenial):
status: New → Invalid
Changed in systemd (Ubuntu Bionic):
status: New → Invalid
Changed in systemd (Ubuntu Focal):
status: New → Invalid
Changed in linux (Ubuntu Groovy):
status: Incomplete → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Marcelo Cerri (mhcerri) wrote :

Changes submitted to our SRU cycle for groovy, focal, bionic and xenial.

Cover letter: https://lists.ubuntu.com/archives/kernel-team/2020-August/112922.html

Ian May (ian-may)
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
tags: added: verification-needed-xenial
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-18.19

---------------
linux (5.8.0-18.19) groovy; urgency=medium

  * groovy/linux: 5.8.0-18.19 -proposed tracker (LP: #1893047)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Groovy update: v5.8.4 upstream stable release (LP: #1893048)
    - drm/vgem: Replace opencoded version of drm_gem_dumb_map_offset()
    - drm/panel-simple: Fix inverted V/H SYNC for Frida FRD350H54004 panel
    - drm/ast: Remove unused code paths for AST 1180
    - drm/ast: Initialize DRAM type before posting GPU
    - khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter()
    - ALSA: hda: avoid reset of sdo_limit
    - ALSA: hda/realtek: Add quirk for Samsung Galaxy Flex Book
    - ALSA: hda/realtek: Add quirk for Samsung Galaxy Book Ion
    - can: j1939: transport: j1939_session_tx_dat(): fix use-after-free read in
      j1939_tp_txtimer()
    - can: j1939: socket: j1939_sk_bind(): make sure ml_priv is allocated
    - spi: Prevent adding devices below an unregistering controller
    - io_uring: find and cancel head link async work on files exit
    - mm/vunmap: add cond_resched() in vunmap_pmd_range
    - romfs: fix uninitialized memory leak in romfs_dev_read()
    - kernel/relay.c: fix memleak on destroy relay channel
    - uprobes: __replace_page() avoid BUG in munlock_vma_page()
    - squashfs: avoid bio_alloc() failure with 1Mbyte blocks
    - mm: include CMA pages in lowmem_reserve at boot
    - mm, page_alloc: fix core hung in free_pcppages_bulk()
    - ASoC: amd: renoir: restore two more registers during resume
    - RDMA/hfi1: Correct an interlock issue for TID RDMA WRITE request
    - opp: Enable resources again if they were disabled earlier
    - opp: Put opp table in dev_pm_opp_set_rate() for empty tables
    - opp: Put opp table in dev_pm_opp_set_rate() if _set_opp_bw() fails
    - ext4: do not block RWF_NOWAIT dio write on unallocated space
    - ext4: fix checking of directory entry validity for inline directories
    - jbd2: add the missing unlock_buffer() in the error path of
      jbd2_write_superblock()
    - scsi: zfcp: Fix use-after-free in request timeout handlers
    - selftests: kvm: Use a shorter encoding to clear RAX
    - s390/pci: fix zpci_bus_link_virtfn()
    - s390/pci: re-introduce zpci_remove_device()
    - s390/pci: fix PF/VF linking on hot plug
    - s390/pci: ignore stale configuration request event
    - mm/memory.c: skip spurious TLB flush for retried page fault
    - drm: amdgpu: Use the correct size when allocating memory
    - drm/amdgpu/display: use GFP_ATOMIC in dcn20_validate_bandwidth_internal
    - drm/amd/display: Fix incorrect backlight register offset for DCN
    - drm/amd/display: Fix EDID parsing after resume from suspend
    - drm/amd/display: Blank stream before destroying HDCP session
    - drm/amd/display: Fix DFPstate hang due to view port changed
    - drm/amd/display: fix pow() crashing when given base 0
    - drm/i915/pmu: Prefer drm_WARN_ON over WARN_ON
    - drm/i915: Provide the perf pmu.module
    - scsi: ufs: Add DELAY_BEFORE_LPM quirk for Micron devices
    - scsi: target: tcmu: Fix crash in tcmu_flush_dcache_range on ARM
  ...

Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-118.119

---------------
linux (4.15.0-118.119) bionic; urgency=medium

  * bionic/linux: 4.15.0-118.119 -proposed tracker (LP: #1894697)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for nvidia 450 and 450-server

  * cgroup refcount is bogus when cgroup_sk_alloc is disabled (LP: #1886860)
    - cgroup: add missing skcd->no_refcnt check in cgroup_sk_clone()

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * [Hyper-V] VSS and File Copy daemons intermittently fails to start
    (LP: #1891224)
    - [Packaging] Bind hv_vss_daemon startup to hv_vss device
    - [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * KVM: Fix zero_page reference counter overflow when using KSM on KVM compute
    host (LP: #1837810)
    - KVM: fix overflow of zero page refcount with ksm running

  * Fix false-negative return value for rtnetlink.sh in kselftests/net
    (LP: #1890136)
    - selftests: rtnetlink: correct the final return value for the test
    - selftests: rtnetlink: make kci_test_encap() return sub-test result

  * Bionic update: upstream stable patchset 2020-08-18 (LP: #1892091)
    - USB: serial: qcserial: add EM7305 QDL product ID
    - USB: iowarrior: fix up report size handling for some devices
    - usb: xhci: define IDs for various ASMedia host controllers
    - usb: xhci: Fix ASMedia ASM1142 DMA addressing
    - Revert "ALSA: hda: call runtime_allow() for all hda controllers"
    - ALSA: seq: oss: Serialize ioctls
    - staging: android: ashmem: Fix lockdep warning for write operation
    - Bluetooth: Fix slab-out-of-bounds read in hci_extended_inquiry_result_evt()
    - Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_evt()
    - Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_with_rssi_evt()
    - omapfb: dss: Fix max fclk divider for omap36xx
    - binder: Prevent context manager from incrementing ref 0
    - vgacon: Fix for missing check in scrollback handling
    - mtd: properly check all write ioctls for permissions
    - leds: wm831x-status: fix use-after-free on unbind
    - leds: da903x: fix use-after-free on unbind
    - leds: lm3533: fix use-after-free on unbind
    - leds: 88pm860x: fix use-after-free on unbind
    - net/9p: validate fds in p9_fd_open
    - drm/nouveau/fbcon: fix module unload when fbcon init has failed for some
      reason
    - drm/nouveau/fbcon: zero-initialise the mode_cmd2 structure
    - i2c: slave: improve sanity check when registering
    - i2c: slave: add sanity check when unregistering
    - usb: hso: check for return value in hso_serial_common_create()
    - firmware: Fix a reference count leak.
    - cfg80211: check vendor command doit pointer before use
    - igb: reinit_locked() should be called with rtnl_lock
    - atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
    - tools lib traceevent: Fix memory leak in process_dynamic...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-48.52

---------------
linux (5.4.0-48.52) focal; urgency=medium

  * focal/linux: 5.4.0-48.52 -proposed tracker (LP: #1894654)

  * mm/slub kernel oops on focal kernel 5.4.0-45 (LP: #1895109)
    - SAUCE: Revert "mm/slub: fix a memory leak in sysfs_slab_add()"

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for nvidia 450 and 450-server

  * [UBUNTU 20.04] zPCI attach/detach issues with PF/VF linking support
    (LP: #1892849)
    - s390/pci: fix zpci_bus_link_virtfn()
    - s390/pci: re-introduce zpci_remove_device()
    - s390/pci: fix PF/VF linking on hot plug

  * [UBUNTU 20.04] kernel: s390/cpum_cf,perf: changeDFLT_CCERROR counter name
    (LP: #1891454)
    - s390/cpum_cf, perf: change DFLT_CCERROR counter name

  * [UBUNTU 20.04] zPCI: Enabling of a reserved PCI function regression
    introduced by multi-function support (LP: #1891437)
    - s390/pci: fix enabling a reserved PCI function

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * [Hyper-V] VSS and File Copy daemons intermittently fails to start
    (LP: #1891224)
    - [Packaging] Bind hv_vss_daemon startup to hv_vss device
    - [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * alsa/hdmi: support nvidia mst hdmi/dp audio (LP: #1867704)
    - ALSA: hda - Rename snd_hda_pin_sense to snd_hda_jack_pin_sense
    - ALSA: hda - Add DP-MST jack support
    - ALSA: hda - Add DP-MST support for non-acomp codecs
    - ALSA: hda - Add DP-MST support for NVIDIA codecs
    - ALSA: hda: hdmi - fix regression in connect list handling
    - ALSA: hda: hdmi - fix kernel oops caused by invalid PCM idx
    - ALSA: hda: hdmi - preserve non-MST PCM routing for Intel platforms
    - ALSA: hda: hdmi - Keep old slot assignment behavior for Intel platforms
    - ALSA: hda - Fix DP-MST support for NVIDIA codecs

  * Focal update: v5.4.60 upstream stable release (LP: #1892899)
    - smb3: warn on confusing error scenario with sec=krb5
    - genirq/affinity: Make affinity setting if activated opt-in
    - genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()
    - PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()
    - PCI: Add device even if driver attach failed
    - PCI: qcom: Define some PARF params needed for ipq8064 SoC
    - PCI: qcom: Add support for tx term offset for rev 2.1.0
    - btrfs: allow use of global block reserve for balance item deletion
    - btrfs: free anon block device right after subvolume deletion
    - btrfs: don't allocate anonymous block device for user invisible roots
    - btrfs: ref-verify: fix memory leak in add_block_entry
    - btrfs: stop incremening log_batch for the log root tree when syncing log
    - btrfs: remove no longer needed use of log_writers for the log root tree
    - btrfs: don't traverse into the seed devices in show_devname
    - btrfs: open device...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-190.220

---------------
linux (4.4.0-190.220) xenial; urgency=medium

  * xenial/linux: 4.4.0-190.220 -proposed tracker (LP: #1893431)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * [Hyper-V] VSS and File Copy daemons intermittently fails to start
    (LP: #1891224)
    - [Packaging] Bind hv_vss_daemon startup to hv_vss device
    - [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * CVE-2019-20811
    - net-sysfs: call dev_hold if kobject_init_and_add success

  * CVE-2020-0067
    - f2fs: fix to avoid memory leakage in f2fs_listxattr

  * CVE-2019-9453
    - f2fs: fix to avoid accessing xattr across the boundary

  * Xenial update: 4.4.233 upstream stable release (LP: #1892822)
    - media: rc: prevent memory leak in cx23888_ir_probe
    - ath9k_htc: release allocated buffer if timed out
    - ath9k: release allocated buffer if timed out
    - nfs: Move call to security_inode_listsecurity into nfs_listxattr
    - PCI/ASPM: Disable ASPM on ASMedia ASM1083/1085 PCIe-to-PCI bridge
    - drm/amdgpu: Prevent kernel-infoleak in amdgpu_info_ioctl()
    - drm: hold gem reference until object is no longer accessed
    - f2fs: check memory boundary by insane namelen
    - f2fs: check if file namelen exceeds max value
    - ARM: 8986/1: hw_breakpoint: Don't invoke overflow handler on uaccess
      watchpoints
    - fbdev: Detect integer underflow at "struct fbcon_ops"->clear_margins.
    - rds: Prevent kernel-infoleak in rds_notify_queue_get()
    - net/x25: Fix x25_neigh refcnt leak when x25 disconnect
    - net/x25: Fix null-ptr-deref in x25_disconnect
    - sh: Fix validation of system call number
    - net: lan78xx: add missing endpoint sanity check
    - net: lan78xx: fix transfer-buffer memory leak
    - mlxsw: core: Increase scope of RCU read-side critical section
    - mac80211: mesh: Free ie data when leaving mesh
    - nfc: s3fwrn5: add missing release on skb in s3fwrn5_recv_frame
    - net: ethernet: ravb: exit if re-initialization fails in tx timeout
    - Revert "i2c: cadence: Fix the hold bit setting"
    - xen-netfront: fix potential deadlock in xennet_remove()
    - x86/i8259: Use printk_deferred() to prevent deadlock
    - random32: update the net random state on interrupt and activity
    - ARM: percpu.h: fix build error
    - random: fix circular include dependency on arm64 after addition of percpu.h
    - random32: remove net_rand_state from the latent entropy gcc plugin
    - random32: move the pseudo-random 32-bit definitions to prandom.h
    - ext4: fix direct I/O read error
    - USB: serial: qcserial: add EM7305 QDL product ID
    - ALSA: seq: oss: Serialize ioctls
    - Bluetooth: Fix slab-out-of-bounds read in hci_extended_inquiry_result_evt()
    - Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_evt()
    - Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_with_rssi_evt()
    - vgacon: Fix for missing check in scrollback handling
    - mtd: properly check all write ioctls for permissions
    - net/9p: validate fds in p9_fd_open
    - drm/nouveau/fbcon: fix module unload when fbcon init has failed for some
...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released