Ubuntu 21.04 crashes with nouveau_drm_ioctl

Asked by George Law

Ubuntu - currently running 21.04 upgraded from 20.10

BIOSTAR motherboard :
product: A960D+ (To Be Filled By O.E.M.)
vendor: BIOSTAR Group

GeForce 8400 GS video card :
01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)

$ sudo lshw -c display
  *-display
       description: VGA compatible controller
       product: G98 [GeForce 8400 GS Rev. 2]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:30 memory:fd000000-fdffffff memory:d0000000-dfffffff memory:fa000000-fbffffff ioport:d800(size=128) memory:c0000-dffff

I had been getting random hangs. The machine was only running some docker containers and seeing minimal activity: displaying a remote VM over an SSH spice:// connection with vinagre.

I installed kdump this morning to try to get a vmcore and I've gotten 3 this morning after enabling all of the panic sysctls
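
The post does not say which sysctls were enabled. As a rough sketch only, assuming three commonly used panic-related settings (the function name panic_sysctl_conf and the file name 90-panic.conf are made up for illustration), a fragment like the following turns an oops or lockup into a panic so that a loaded kdump kernel can write a vmcore:

```shell
# Print a sysctl fragment that turns oopses and lockups into panics,
# giving a loaded kdump kernel the chance to capture a vmcore.
# Assumption: these three settings are a guess, not the poster's actual list.
panic_sysctl_conf() {
    cat <<'EOF'
kernel.panic_on_oops = 1
kernel.softlockup_panic = 1
kernel.hung_task_panic = 1
EOF
}

# Possible usage (needs root, so it is left commented out here):
#   panic_sysctl_conf | sudo tee /etc/sysctl.d/90-panic.conf
#   sudo sysctl --system
```

Printing the fragment rather than writing it directly keeps the sketch harmless to run and makes it easy to review the settings before they are applied.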

Here is a snippet of the most recent one. The uptime was only 1765 seconds, just short of 30 minutes.

[  979.118573] virbr0: topology change detected, propagating
[ 1765.577128] general protection fault, probably for non-canonical address 0x2e874c263be30124: 0000 [#1] SMP NOPTI
[ 1765.577144] CPU: 1 PID: 1646 Comm: Xorg Kdump: loaded Not tainted 5.11.0-16-generic #17-Ubuntu
[ 1765.577153] Hardware name: BIOSTAR Group A960D+/A960D+, BIOS 080015 09/04/2014
[ 1765.577158] RIP: 0010:kmem_cache_alloc_trace+0x90/0x200
[ 1765.577229] Code: 94 92 7b 49 8b 00 49 83 78 10 00 48 89 45 c0 0f 84 46 01 00 00 48 85 c0 0f 84 3d 01 00 00 41 8b 4c 24 28 49 8b 3c 24 48 01 c1 <48> 8b 19 48 89 ce 49 33 9c 24 b8 00 00 00 48 8d 4a 01 48 0f ce 48
[ 1765.577241] RSP: 0018:ffffb5db4139bc60 EFLAGS: 00010206
[ 1765.577252] RAX: 2e874c263be300f4 RBX: ffff8d25fe924a00 RCX: 2e874c263be30124
[ 1765.577259] RDX: 00000000000574bd RSI: 0000000000000cc0 RDI: 00000000000310c0
[ 1765.577266] RBP: ffffb5db4139bca8 R08: ffff8d26a6c710c0 R09: 0000000000000000
[ 1765.577273] R10: ffff8d258b70c700 R11: ffff8d258b70c002 R12: ffff8d2580043a00
[ 1765.577278] R13: ffff8d2580043a00 R14: 0000000000000048 R15: 0000000000000cc0
[ 1765.577285] FS: 00007ffb16e59a40(0000) GS:ffff8d26a6c40000(0000) knlGS:0000000000000000
[ 1765.577293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1765.577299] CR2: 00007ffb0c003050 CR3: 0000000107280000 CR4: 00000000000406e0
[ 1765.577306] Call Trace:
[ 1765.577313] ? nouveau_gem_object_unmap.constprop.0+0x5c/0xd0 [nouveau]
[ 1765.577596] nouveau_gem_object_unmap.constprop.0+0x5c/0xd0 [nouveau]
[ 1765.577869] nouveau_gem_object_close+0x9f/0x100 [nouveau]
[ 1765.578104] drm_gem_object_release_handle+0x30/0x80 [drm]
[ 1765.578193] drm_gem_handle_delete+0x59/0xa0 [drm]
[ 1765.578266] ? drm_gem_handle_create+0x40/0x40 [drm]
[ 1765.578339] drm_gem_close_ioctl+0x24/0x30 [drm]
[ 1765.578412] drm_ioctl_kernel+0xae/0xf0 [drm]
[ 1765.578486] drm_ioctl+0x245/0x400 [drm]
[ 1765.578560] ? drm_gem_handle_create+0x40/0x40 [drm]
[ 1765.578633] ? __fget_files+0x5f/0x90
[ 1765.578643] ? __fget_light+0x32/0x80
[ 1765.578655] nouveau_drm_ioctl+0x66/0xc0 [nouveau]
[ 1765.578885] __x64_sys_ioctl+0x91/0xc0
[ 1765.578894] do_syscall_64+0x38/0x90

The dmesg from the latest crash:
https://paste.ubuntu.com/p/7SsDBq5vr3/

I can provide the dump if needed.

Question information

Language: English
Status: Expired
For: Ubuntu
Assignee: No assignee
George Law (9n01bo-q7sdgm-bc94vy) said :
#1

Running the latest kernel available :

[glaw@fedora 202105051125]$ uname -a
Linux fedora 5.11.0-16-generic #17-Ubuntu SMP Wed Apr 14 20:12:43 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

George Law (9n01bo-q7sdgm-bc94vy) said :
#2

Some additional notes: I do not think this is a new issue.

The following is from one of my earlier kernel.log files, from before the upgrade to 21.04, on the 5.8 kernel:

Apr 6 10:41:16 fedora kernel: [95565.555713] general protection fault, probably for non-canonical address 0xee27b5839e1aa064: 0000 [#1] SMP NOPTI
Apr 6 10:41:16 fedora kernel: [95565.555722] CPU: 2 PID: 1548 Comm: Xorg Not tainted 5.8.0-48-generic #54-Ubuntu
Apr 6 10:41:16 fedora kernel: [95565.555725] Hardware name: BIOSTAR Group A960D+/A960D+, BIOS 080015 09/04/2014
Apr 6 10:41:16 fedora kernel: [95565.555734] RIP: 0010:kmem_cache_alloc_trace+0x8c/0x240
Apr 6 10:41:16 fedora kernel: [95565.555740] Code: 08 65 4c 03 05 7d b9 d4 48 49 83 78 10 00 4d 8b 20 0f 84 8e 01 00 00 4d 85 e4 0f 84 85 01 00 00 41 8b 41 20 49 8b 39 4c 01 e0 <48> 8b 18 48 89 c1 49 33 99 70 01 00 00 4c 89 e0 48 0f c9 48 31 cb
Apr 6 10:41:16 fedora kernel: [95565.555743] RSP: 0018:ffffac6501887c60 EFLAGS: 00010282
Apr 6 10:41:16 fedora kernel: [95565.555748] RAX: ee27b5839e1aa064 RBX: 0000000000000000 RCX: ffff934ddfcc7710
Apr 6 10:41:16 fedora kernel: [95565.555751] RDX: 0000000000d38d1f RSI: 0000000000000cc0 RDI: 00000000000310c0
Apr 6 10:41:16 fedora kernel: [95565.555753] RBP: ffffac6501887c90 R08: ffff934e66cb10c0 R09: ffff934e65c07480
Apr 6 10:41:16 fedora kernel: [95565.555756] R10: ffff934dfe384dd8 R11: ffff934e1b426912 R12: ee27b5839e1aa034
Apr 6 10:41:16 fedora kernel: [95565.555759] R13: 0000000000000cc0 R14: 0000000000000048 R15: ffff934e65c07480
Apr 6 10:41:16 fedora kernel: [95565.555763] FS: 00007f867c855a40(0000) GS:ffff934e66c80000(0000) knlGS:0000000000000000
Apr 6 10:41:16 fedora kernel: [95565.555766] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 6 10:41:16 fedora kernel: [95565.555768] CR2: 00007f8671755000 CR3: 00000001e0c54000 CR4: 00000000000406e0
Apr 6 10:41:16 fedora kernel: [95565.555771] Call Trace:
Apr 6 10:41:16 fedora kernel: [95565.555904] ? nouveau_gem_object_unmap.constprop.0+0x5c/0xd0 [nouveau]
Apr 6 10:41:16 fedora kernel: [95565.556019] nouveau_gem_object_unmap.constprop.0+0x5c/0xd0 [nouveau]
Apr 6 10:41:16 fedora kernel: [95565.556134] nouveau_gem_object_close+0xad/0x110 [nouveau]
Apr 6 10:41:16 fedora kernel: [95565.556186] drm_gem_object_release_handle+0x35/0xa0 [drm]
Apr 6 10:41:16 fedora kernel: [95565.556222] drm_gem_handle_delete+0x59/0xa0 [drm]
Apr 6 10:41:16 fedora kernel: [95565.556259] ? drm_gem_handle_create+0x40/0x40 [drm]
Apr 6 10:41:16 fedora kernel: [95565.556296] drm_gem_close_ioctl+0x24/0x30 [drm]
Apr 6 10:41:16 fedora kernel: [95565.556333] drm_ioctl_kernel+0xae/0xf0 [drm]
Apr 6 10:41:16 fedora kernel: [95565.556371] drm_ioctl+0x238/0x3d0 [drm]
Apr 6 10:41:16 fedora kernel: [95565.556408] ? drm_gem_handle_create+0x40/0x40 [drm]
Apr 6 10:41:16 fedora kernel: [95565.556414] ? hrtimer_try_to_cancel.part.0+0x54/0xf0
Apr 6 10:41:16 fedora kernel: [95565.556529] nouveau_drm_ioctl+0x66/0xc0 [nouveau]
Apr 6 10:41:16 fedora kernel: [95565.556536] ksys_ioctl+0x8e/0xc0
Apr 6 10:41:16 fedora kernel: [95565.556540] __x64_sys_ioctl+0x1a/0x20
Apr 6 10:41:16 fedora kernel: [95565.556545] do_syscall_64+0x49/0xc0
Apr 6 10:41:16 fedora kernel: [95565.556550] entry_SYSCALL_64_after_hwframe+0x44/0xa9
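
Both traces fault in kmem_cache_alloc_trace and pass through the same nouveau and drm functions. One way to check that without reading every line is to strip each trace down to its function names. The following is a minimal sketch, not part of any existing tool: the helper name trace_symbols is hypothetical, and it assumes the log text is fed in on standard input.

```shell
# Hypothetical helper: read kernel log text on stdin and print one function
# name per stack frame, dropping timestamps, "?" markers, offsets and module tags.
trace_symbols() {
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' | sed 's/+0x.*$//'
}

# Two frames copied from the 5.8 trace above.
printf '%s\n' \
    'Apr 6 10:41:16 fedora kernel: [95565.556134] nouveau_gem_object_close+0xad/0x110 [nouveau]' \
    'Apr 6 10:41:16 fedora kernel: [95565.556536] ksys_ioctl+0x8e/0xc0' | trace_symbols
# prints:
#   nouveau_gem_object_close
#   ksys_ioctl
```

With each trace saved to its own file, the two resulting name lists can be sorted and compared with diff or comm to see which frames they share.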

This only started pushing my buttons recently because the machine was locking up overnight and I was having to do a hard power cycle every morning.

Daniel Letzeisen (dtl131) said :
#3

It could be the GPU going bad. Nvidia GPUs from this era are notorious for failing prematurely.

George Law (9n01bo-q7sdgm-bc94vy) said :
#4

Thanks Daniel. I just ordered an ATI FireGL V7350 card that will be here Monday. 🤞 Hopefully it will solve my problem, although whatever the underlying issue is with nouveau could remain. I will update this if it clears up with the new card, which I fully expect.

Daniel Letzeisen (dtl131) said :
#5

Have you tried the nvidia blob driver to see if it does any better?

Launchpad Janitor (janitor) said :
#6

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

George Law (9n01bo-q7sdgm-bc94vy) said :
#7

I replaced the card, swapping the NVIDIA card for an AMD/ATI one:
  *-display:0
       description: VGA compatible controller
       product: R520 GL [FireGL V7350]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=radeon latency=0
       resources: irq:28 memory:d0000000-dfffffff memory:febf0000-febfffff ioport:d000(size=256) memory:c0000-dffff

I ran several hours of video conversion on that machine yesterday, trying to push it to the point where it previously hung with nouveau. It was 100% smooth sailing, without issue.

With nouveau removed from the equation, this can be closed.