EEH recovery fails for shinner T on firestone
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Vivid | Fix Released | Undecided | Tim Gardner |
Wily | Fix Released | Undecided | Unassigned |
Xenial | Fix Released | Undecided | Unassigned |
Bug Description
== Comment: #0 - Manvanthara B. Puttashankar <email address hidden> - 2015-07-27 02:38:12 ==
---Problem Description---
EEH recovery fails for shinner T on firestone
Contact Information = <email address hidden>
---uname output---
Linux rcx2c309 3.19.0-23-generic #24~14.04.1-Ubuntu SMP Wed Jul 8 11:17:19 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = firestone
---Debugger---
A debugger is not configured
---Steps to Reproduce---
root@rcx2c309:~# uname -a
Linux rcx2c309 3.19.0-23-generic #24~14.04.1-Ubuntu SMP Wed Jul 8 11:17:19 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
root@rcx2c309:~# ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 100baseT/Half 100baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Advertised link modes: 100baseT/Half 100baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
Link partner advertised pause frame use: Transmit-only
Link partner advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 17
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000000 (0)
Link detected: yes
root@rcx2c309:
total 0
drwxr-xr-x 2 root root 0 Jul 24 04:23 ./
drwxr-xr-x 58 root root 0 Jul 24 03:45 ../
lrwxrwxrwx 1 root root 0 Jul 26 23:17 eth0 -> ../../devices/
lrwxrwxrwx 1 root root 0 Jul 24 07:33 eth1 -> ../../devices/
lrwxrwxrwx 1 root root 0 Jul 24 03:45 lo -> ../../devices/
lrwxrwxrwx 1 root root 0 Jul 24 03:45 virbr0 -> ../../devices/
Every 2.0s: netstat -i Sun Jul 26 23:26:16 2015
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1 1500 0 1230820 0 12167 0 45239 0 0 0 BMRU
lo 65536 0 22 0 0 0 22 0 0 0 LRU
virbr0 1500 0 0 0 0 0 0 0 0 0 BMU
syslog:
Jul 27 01:09:54 rcx2c309 kernel: [ 68.122649] EEH: Frozen PE#1 on PHB#1 detected
Jul 27 01:09:54 rcx2c309 kernel: [ 68.122790] EEH: PE location: N/A, PHB location: N/A
Jul 27 01:09:54 rcx2c309 kernel: [ 68.123539] EEH: This PCI device has failed 1 times in the last hour
Jul 27 01:09:54 rcx2c309 kernel: [ 68.123540] EEH: Notify device drivers to shutdown
Jul 27 01:09:54 rcx2c309 kernel: [ 68.123545] bnx2x: [bnx2x_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.123706] bnx2x: [bnx2x_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.154922] bnx2x: [bnx2x_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.155146] EEH: Collect temporary log
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235532] PHB3 PHB#1 Diag-data (Version: 1)
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235535] brdgCtl: 00000002
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235538] RootSts: 00000040 00400000 f0820048 00100147 00002000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235541] PhbSts: 0000001c00000000 0000001c00000000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235543] Lem: 0000001000000004 42498e327f502eae 0000000000000000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235546] OutErr: 0000000800000000 0000000800000000 0204006000003b10 113c7cd800000000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235549] InBErr: 0000000000000020 0000000000000020 4001010000000000 0000000000000000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235551] PE[ 1] A/B: 8400001b00000000 80003b10113c7cd8
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235554] EEH: Reset without hotplug activity
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236546] EEH: PHB#1 failure detected, location: N/A
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236698] CPU: 9 PID: 1093 Comm: kworker/9:1 Tainted: G OE 3.19.0-23-generic #24~14.04.1-Ubuntu
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236704] Workqueue: events linkwatch_event
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236706] Call Trace:
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236709] [c000003c9923b6c0] [c000000000a26690] dump_stack+
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236713] [c000003c9923b6f0] [c000000000036a5c] eeh_dev_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236715] [c000003c9923b790] [c000000000036e14] eeh_check_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236737] [c000003c9923b7d0] [d00000001c7854a0] bnx2x_get_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236746] [c000003c9923b830] [d00000001c794c34] bnx2x_fill_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236754] [c000003c9923b8e0] [d00000001c79f2ac] bnx2x_get_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236761] [c000003c9923b910] [d00000001e34f9b0] netdevice_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236765] [c000003c9923ba90] [c0000000000dbce8] notifier_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236767] [c000003c9923bae0] [c0000000008b796c] call_netdevice_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236770] [c000003c9923bb60] [c0000000008bde48] netdev_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236772] [c000003c9923bba0] [c0000000008db014] linkwatch_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236773] [c000003c9923bbd0] [c0000000008db54c] __linkwatch_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236775] [c000003c9923bc40] [c0000000008db6b4] linkwatch_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236778] [c000003c9923bc60] [c0000000000d291c] process_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236780] [c000003c9923bcf0] [c0000000000d31c0] worker_
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236782] [c000003c9923bd80] [c0000000000da4f4] kthread+0x114/0x140
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236785] [c000003c9923be30] [c00000000000956c] ret_from_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038711] pnv_ioda_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038713] eeh_pci_enable: Unexpected state change 2 on PHB#1-PE#1, err=-5
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038937] pnv_ioda_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038938] eeh_pci_enable: Unexpected state change 3 on PHB#1-PE#1, err=-5
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038940] EEH: Notify device drivers the completion of reset
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038943] bnx2x: [bnx2x_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039706] EEH: Frozen PHB#1-PE#1 detected
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039733] EEH: PE location: N/A, PHB location: N/A
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039767] CPU: 9 PID: 812 Comm: eehd Tainted: G OE 3.19.0-23-generic #24~14.04.1-Ubuntu
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039768] Call Trace:
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039770] [c000003ca1e6f840] [c000000000a26690] dump_stack+
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039772] [c000003ca1e6f870] [c000000000036d74] eeh_dev_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039775] [c000003ca1e6f910] [c000000000076c9c] pnv_pci_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039778] [c000003ca1e6f960] [c000000000561204] pci_bus_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039781] [c000003ca1e6f9c0] [c00000000056f574] pci_enable_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039790] [c000003ca1e6fa10] [d00000001c761dc4] bnx2x_io_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039792] [c000003ca1e6fad0] [c00000000003ab04] eeh_report_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039793] [c000003ca1e6fb10] [c0000000000395c8] eeh_pe_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039795] [c000003ca1e6fba0] [c00000000003b584] eeh_handle_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039797] [c000003ca1e6fc20] [c00000000003b968] eeh_handle_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039799] [c000003ca1e6fcd0] [c00000000003bce8] eeh_event_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039801] [c000003ca1e6fd80] [c0000000000da4f4] kthread+0x114/0x140
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039803] [c000003ca1e6fe30] [c00000000000956c] ret_from_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.054577] pci_raw_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.054580] bnx2x 0001:01:00.0: Refused to change power state, currently in D3
Jul 27 01:09:56 rcx2c309 kernel: [ 70.114605] bnx2x: [bnx2x_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.114817] bnx2x: [bnx2x_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.130577] bnx2x 0001:01:00.1: Refused to change power state, currently in D3
Jul 27 01:09:56 rcx2c309 kernel: [ 70.214576] bnx2x: [bnx2x_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.214790] Unable to handle kernel paging request for data at address 0xd0000801827fffff
Jul 27 01:09:56 rcx2c309 kernel: [ 70.214965] Faulting instruction address: 0xd00000001c742a70
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215007] Oops: Kernel access of bad area, sig: 11 [#1]
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215039] SMP NR_CPUS=2048 NUMA PowerNV
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215074] Modules linked in: ipt_MASQUERADE nf_nat_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215777] CPU: 9 PID: 812 Comm: eehd Tainted: G OE 3.19.0-23-generic #24~14.04.1-Ubuntu
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215834] task: c000003ca0139100 ti: c000003ca1e6c000 task.ti: c000003ca1e6c000
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216017] NIP: d00000001c742a70 LR: d00000001c742a50 CTR: c000000000036d90
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216066] REGS: c000003ca1e6f710 TRAP: 0300 Tainted: G OE (3.19.0-23-generic)
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216122] MSR: 9000000100009033 <SF,HV,
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] CFAR: c000000000036e24 DAR: d0000801827fffff DSISR: 40000000 SOFTE: 1
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR00: d00000001c742a50 c000003ca1e6f990 d00000001c809348 d0000801827fffff
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR04: 0000000000000001 c000003ca1e6f970 9000000100009033 0000000000000001
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR08: 0000000000000000 0000000000000000 0000000000000000 d00000001c7d2030
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR12: 0000000000008800 c00000000fb85100 c0000000000da3e8 c000001fe2931980
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000c51108
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR24: c000000000c510e0 0000000000100100 c000001fe25d0000 c000001fe25d0000
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR28: ffffffffffffffff 0000000000000033 00000000ffffffff c000001fe198c900
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217034] NIP [d00000001c742a70] bnx2x_init_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217081] LR [d00000001c742a50] bnx2x_init_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217122] Call Trace:
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217145] [c000003ca1e6f990] [d00000001c742a50] bnx2x_init_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217217] [c000003ca1e6fa10] [d00000001c761f48] bnx2x_io_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217274] [c000003ca1e6fad0] [c00000000003ab04] eeh_report_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217331] [c000003ca1e6fb10] [c0000000000395c8] eeh_pe_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217389] [c000003ca1e6fba0] [c00000000003b584] eeh_handle_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217445] [c000003ca1e6fc20] [c00000000003b968] eeh_handle_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217502] [c000003ca1e6fcd0] [c00000000003bce8] eeh_event_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217558] [c000003ca1e6fd80] [c0000000000da4f4] kthread+0x114/0x140
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217608] [c000003ca1e6fe30] [c00000000000956c] ret_from_
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217798] Instruction dump:
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217825] 40820014 792a07e1 4182000c 4808f5e5 e8410018 893f0033 e87f0020 939f0928
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217917] 79291768 7fde4a14 7c63f214 7c0004ac <81230000> 0c090000 4c00012c 2f89ffff
Jul 27 01:09:56 rcx2c309 kernel: [ 70.218009] ---[ end trace 8d49f86574f73f94 ]---
Jul 27 01:09:56 rcx2c309 kernel: [ 70.218041]
Userspace tool common name: EEH
The userspace tool has the following bit modes: ppc64le
Userspace rpm: EEH
Userspace tool obtained from project website: na
*Additional Instructions for <email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach ltrace and strace of userspace application.
== Comment: #8 - Guo Wen Shan <email address hidden> - 2015-08-06 21:01:21 ==
Manvanthara, please contact me through Sametime and provide the machine access information so that I can debug the problem and come up with a patch to fix it, thanks!
== Comment: #10 - Mukesh K. Ojha <email address hidden> - 2015-08-18 04:50:19 ==
Hi All,
Any update on this issue?
== Comment: #13 - Guo Wen Shan <email address hidden> - 2015-08-27 20:38:18 ==
Actually, Manvanthara is reporting two different issues in comment #0 and comment #7. I'm looking at the problem reported in comment #7, which can be reproduced with 4.2-rc8 (upstream kernel). If Manvanthara agrees, I think we could open another bug to track the issue from comment #7 and let this bug track the issue from comment #0, since they look like different issues from my perspective, thanks!
== Comment: #14 - Guo Wen Shan <email address hidden> - 2015-08-27 22:03:16 ==
A patch was sent to the community for review and is tracked at the following link. I also installed a private kernel built from 4.2-rc8 plus the patch, and with it the EEH error is recovered successfully. Anybody who wants to try it can select the kernel from the petitboot menu as "Ubuntu, with Linux 4.2.0-rc8gavin+", thanks!
https:/
== Comment: #15 - Guo Wen Shan <email address hidden> - 2015-08-27 23:42:16 ==
Please ignore the "there're different issues" part of comment #13. It should read: they are the same issue. So we don't need to open another bug at all. Sorry for the confusion :-)
== Comment: #16 - Guo Wen Shan <email address hidden> - 2015-08-27 23:43:36 ==
Michael Ellerman told me the patch will be merged into 4.3-rc3. Closing this as "fixed".
tags: added: architecture-ppc64le bugnameltc-128071 severity-critical targetmilestone-inin---
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Ubuntu-4.2.0-14.16