error "ib_dealloc_pd failed" when loading and unloading the ib_ipoib module

Bug #1467912 reported by Kamal Heib
This bug affects 1 person
Affects         Status        Importance  Assigned to       Milestone
linux (Ubuntu)  Fix Released  Medium      Joseph Salisbury
Vivid           Fix Released  Medium      Joseph Salisbury

Bug Description

When we try to load and unload the ib_ipoib driver we get an error message.

Steps to reproduce:

load ib_ipoib: modprobe -v ib_ipoib
configure ip: ifconfig ib0 11.135.196.7/16
unload driver: modprobe -rv ib_ipoib

We will see this message in dmesg:

[ 709.652944] ib0: ib_dealloc_pd failed
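
The steps above can be wrapped in a small script. This is only a sketch and not part of the original report: the function names has_dealloc_failure and reproduce are hypothetical, and reproduce assumes root privileges and a host with an IPoIB-capable device that shows up as ib0.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the kernel log text on stdin contains
# the failure message quoted in this report.
has_dealloc_failure() {
    grep -q 'ib_dealloc_pd failed'
}

# Hypothetical wrapper around the three reproduction steps.
# Assumption: run as root on a machine where ib_ipoib creates ib0.
# Succeeds only when every step ran and the message is in dmesg.
reproduce() {
    modprobe -v ib_ipoib &&
    ifconfig ib0 11.135.196.7/16 &&
    modprobe -rv ib_ipoib &&
    dmesg | has_dealloc_failure
}
```

Calling reproduce on an affected kernel would be expected to return 0; has_dealloc_failure can also be fed a saved dmesg file on stdin.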

uname output:

root@qa-h-vrt-035:~# uname -a
Linux qa-h-vrt-035 3.19.0-16-generic #16-Ubuntu SMP Thu Apr 30 16:09:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The following two upstream commits fix this issue:

commit 9ab874b6593045886b699df2bc3ff803d88a9f7c
Author: Doug Ledford <email address hidden>
Date: Sat Feb 21 19:27:00 2015 -0500

    IB/ipoib: change init sequence ordering

    In preparation for using per device work queues, we need to move the
    start of the neighbor thread task to after ipoib_ib_dev_init and move
    the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
    Otherwise we will end up freeing our workqueue with work possibly
    still on it.

    Signed-off-by: Doug Ledford <email address hidden>

commit 6387d8d5896536b904ba6937fe019a29548e3a86
Author: Doug Ledford <email address hidden>
Date: Sat Feb 21 19:26:59 2015 -0500

    IB/ipoib: factor out ah flushing

    Create ipoib_flush_ah and ipoib_stop_ah routines to use at
    appropriate times to flush out all remaining ah entries before we shut
    the device down.

    Because neighbors and mcast entries can each have a reference on any
    given ah, we must make sure to free all of those first before our ah
    will actually have a 0 refcount and be able to be reaped.

    This factoring is needed in preparation for having per-device work
    queues. The original per-device workqueue code resulted in the following
    error message:

    <ibdev>: ib_dealloc_pd failed

    That error was tracked down to this issue. With the changes to which
    workqueues were flushed when, there were no flushes of the per device
    workqueue after the last ah's were freed, resulting in an attempt to
    dealloc the pd with outstanding resources still allocated. This code
    puts the explicit flushes in the needed places to avoid that problem.

    Signed-off-by: Doug Ledford <email address hidden>
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jun 23 15:16 seq
 crw-rw---- 1 root audio 116, 33 Jun 23 15:16 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.17.2-0ubuntu1.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 15.04
HibernationDevice: RESUME=UUID=00a63930-d83a-446b-a3e4-b5910e884c7f
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Red Hat KVM
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: root=/dev/sda2 console=tty0 console=ttyS0,115200n8 rhgb
ProcVersionSignature: Ubuntu 3.19.0-20.20-generic 3.19.8
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-20-generic N/A
 linux-backports-modules-3.19.0-20-generic N/A
 linux-firmware 1.143
RfKill: Error: [Errno 2] No such file or directory
Tags: vivid
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
Uname: Linux 3.19.0-20-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 01/01/2011
dmi.bios.vendor: Bochs
dmi.bios.version: Bochs
dmi.chassis.type: 1
dmi.chassis.vendor: Bochs
dmi.modalias: dmi:bvnBochs:bvrBochs:bd01/01/2011:svnRedHat:pnKVM:pvrRHEL7.0.0PC(i440FX+PIIX,1996):cvnBochs:ct1:cvr:
dmi.product.name: KVM
dmi.product.version: RHEL 7.0.0 PC (i440FX + PIIX, 1996)
dmi.sys.vendor: Red Hat

Kamal Heib (kamalh-s)
description: updated
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1467912

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Kamal Heib (kamalh-s) wrote : apport information

Attached: CRDA.txt, CurrentDmesg.txt, JournalErrors.txt, Lspci.txt, ProcCpuinfo.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt

tags: added: apport-collected
description: updated

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
Changed in linux (Ubuntu Vivid):
status: New → Triaged
importance: Undecided → Medium
Joseph Salisbury (jsalisbury) wrote :

I built a Vivid test kernel with a cherry pick of the following two commits:

be7aa66 IB/ipoib: change init sequence ordering
e135106 IB/ipoib: factor out ah flushing

The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1467912/

Can you test this kernel and see if it resolves this bug?

Thanks in advance!

Kamal Heib (kamalh-s) wrote :

Hi Joseph,

Sure, I'll test it and update you.

Could you please also target this bug for Ubuntu 14.04.3?

Thanks,
Kamal

Kamal Heib (kamalh-s) wrote :

Hi Joseph,

I tested the kernel you provided and it resolves this bug.

One more thing: as I said in my previous comment, could you please target this issue to Ubuntu 14.04.3?

Thanks a lot,
Kamal

Changed in linux (Ubuntu Vivid):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in linux (Ubuntu Vivid):
assignee: Rafael David Tinoco (inaddy) → nobody
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Vivid):
status: Triaged → In Progress
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

Hi Kamal,

I will send an SRU request for these two patches to be included in Vivid, whose kernel (3.19) is the one used in 14.04.3.

Do you happen to know if any earlier releases (Utopic, Trusty, Precise) also experience this bug and require these patches?

Thanks!

Kamal Heib (kamalh-s) wrote :

Hello Joseph,

Thanks a lot for the update.

Regarding earlier releases: I don't think this bug affects them, because we saw it only in kernel 3.19 (Ubuntu 15.04).

Thanks,

Kamal

Joseph Salisbury (jsalisbury) wrote :

Hi Kamal,

I've asked the 3.19 stable upstream maintainer, Kamal Mostafa, to include these two patches in the upstream kernel.

I've also sent an SRU request for inclusion in Vivid.

Thanks!

Brad Figg (brad-figg)
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid
Kamal Heib (kamalh-s) wrote :

Hi,

I tested the kernel and it fixes this issue.

Verified with command:
#for i in {0..5}; do modprobe -rv ib_ipoib; sleep 5; modprobe -v ib_ipoib; sleep 5; ifconfig ib0 11.135.196.7/16; sleep 5; done

uname -a:
Linux reg-l-vrt-036-007 3.19.0-23-generic #24-Ubuntu SMP Tue Jul 7 18:52:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

dmesg:
[ 15.208908] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 639.091197] mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)
[ 724.788318] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 724.801980] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 772.052993] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 772.056608] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 794.909398] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 794.912980] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 810.996256] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 810.999475] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 826.182059] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 826.185445] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 843.741871] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 843.745190] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 860.196937] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 860.200349] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 879.430210] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 879.433895] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 899.534312] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 899.538097] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 1108.823736] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 1108.828015] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

Thanks,
Kamal
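
The verification loop and the dmesg check above can also be scripted. The following is a rough sketch under assumptions, not the command that was actually run: cycle_ipoib, count_ib0_link_ready and log_is_clean are hypothetical names, cycle_ipoib needs root and an ib0 device, and the default of 6 iterations mirrors the {0..5} range in the one-liner.

```shell
#!/bin/sh
# Repeat the unload / load / configure cycle N times (default 6).
# Assumption: run as root on a host where ib_ipoib creates ib0.
cycle_ipoib() {
    n=${1:-6}
    i=0
    while [ "$i" -lt "$n" ]; do
        modprobe -rv ib_ipoib; sleep 5
        modprobe -v ib_ipoib; sleep 5
        ifconfig ib0 11.135.196.7/16; sleep 5
        i=$((i + 1))
    done
}

# Print how many "ib0: link becomes ready" lines the log on stdin has.
count_ib0_link_ready() {
    grep -c 'ib0: link becomes ready'
}

# Succeed only when the log on stdin has no "ib_dealloc_pd failed" line.
log_is_clean() {
    ! grep -q 'ib_dealloc_pd failed'
}
```

After cycle_ipoib finishes, piping dmesg into log_is_clean gives a pass/fail exit status, and count_ib0_link_ready gives a rough count of how many times the interface came up.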

tags: added: verification-done-vivid
removed: verification-needed-vivid
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-23.24

---------------
linux (3.19.0-23.24) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1472346

  [ Chris J Arges ]

  * SAUCE: Don't use atomic read in evlist.c
    - LP: #1410673

linux (3.19.0-23.23) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1472048

  [ Chris J Arges ]

  * [Config] Add CRYPTO_DEV_NX_*, 842_* as modules
    - LP: #1454687

  [ Lu, Han ]

  * SAUCE: i915_bpo: drm/i915/audio: add codec wakeup override
    enabled/disable callback
    - LP: #1460674

  [ Timo Aaltonen ]

  * SAUCE: Backport I915_OVERLAY_DISABLE_DEST_COLORKEY
    - LP: #1460674
  * SAUCE: i915_bpo: Rebase to drm-intel-next-fixes-2015-05-29
    - LP: #1460674
  * SAUCE: i915_bpo: Revert "drm/i915: Implement the intel_dp_autotest_edid
    function for DP EDID complaince tests"
    - LP: #1460674
  * SAUCE: i915_bpo: Revert "drm/i915: Add debugfs test control files for
    Displayport compliance testing"
    - LP: #1460674
  * SAUCE: Load i915_bpo from the hda driver on SKL/CHV
    - LP: #1460674
  * SAUCE: i915_bpo: Don't try to support BXT
    - LP: #1460674
  * SAUCE: i915_bpo: drm/i915/skl: Fix DMC API version.

  [ Upstream Kernel Changes ]

  * Revert "usb: dwc2: add bus suspend/resume for dwc2"
    - LP: #1471252
  * Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820"
    - LP: #1471252
  * Revert "KVM: x86: drop fpu_activate hook"
    - LP: #1471252
  * Revert "libceph: clear r_req_lru_item in __unregister_linger_request()"
    - LP: #1471252
  * drm/i915: add component support
    - LP: #1460661
  * ALSA: hda: export struct hda_intel
    - LP: #1460661
  * ALSA: hda: pass intel_hda to all i915 interface functions
    - LP: #1460661
  * ALSA: hda: add component support
    - LP: #1460661
  * drm/atomic-helpers: Fix documentation typos and wrong copy&paste
    - LP: #1460674
  * drm/atomic: Rename drm_atomic_helper_commit_pre_planes() state argument
    - LP: #1460674
  * drm/atomic-helper: Rename commmit_post/pre_planes
    - LP: #1460674
  * drm/atomic-helpers: make mode_set hooks optional
    - LP: #1460674
  * drm/atomic-helper: Fix kerneldoc for prepare_planes
    - LP: #1460674
  * drm: Complete moving rotation property to core
    - LP: #1460674
  * drm: Share plane pixel format check code between legacy and atomic
    - LP: #1460674
  * drm/atomic: Constify a bunch of functions pointer structs
    - LP: #1460674
  * drm: Fix some typo mistake of the annotations
    - LP: #1460674
  * drm: change connector to tmp_connector
    - LP: #1460674
  * drm: atomic: Expose CRTC active property
    - LP: #1460674
  * drm: atomic: Allow setting CRTC active property
    - LP: #1460674
  * drm/atomic-helpers: Properly avoid full modeset dance
    - LP: #1460674
  * drm/atomic: Add helpers for state-subclassing drivers
    - LP: #1460674
  * drm: Fix some typos
    - LP: #1460674
  * drm/atomic: Add for_each_{connector,crtc,plane}_in_state helper macros
    - LP: #1460674
  * drm/atomic-helper: Don't call atomic_update_plane when it stays off
    - LP: #1460674
  * drm/atomic-helper: Really recover pre-atomic plane/cursor behavior
 ...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
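
To check whether a given Vivid machine already runs a kernel at or above the version named in the Launchpad Janitor message (3.19.0-23.24), something like the sketch below may help. It is an illustration only: the function names are hypothetical, it assumes a sort that supports -V (as GNU coreutils does), and it assumes the running kernel exposes /proc/version_signature in the same format as the ProcVersionSignature field in the bug description.

```shell
#!/bin/sh
# Extract the package version (for example 3.19.0-20.20) from a signature
# string such as "Ubuntu 3.19.0-20.20-generic 3.19.8".
pkg_version_from_signature() {
    printf '%s\n' "$1" | awk '{ print $2 }' | sed 's/-[a-z][a-z0-9]*$//'
}

# Succeed when version $1 is greater than or equal to version $2.
# Assumption: sort supports -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check against the running kernel.
# Assumption: /proc/version_signature exists on this system.
has_fixed_kernel() {
    version_ge "$(pkg_version_from_signature "$(cat /proc/version_signature)")" 3.19.0-23.24
}
```

The comparison only says that the package version is new enough; it does not inspect whether the two IB/ipoib patches are present in a custom-built kernel.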