Issue
- The following errors are reported on the server console.
0.000033 Thu Feb 13 2014 20:07:05 CPU 2 has an internal error (IERR).
Normal 0.000032 Thu Feb 13 2014 09:04:50 An OEM diagnostic event has occurred.
Critical 0.000031 Thu Feb 13 2014 09:04:50 A bus fatal error was detected on a component at bus 0 device 5 function 0.
Critical 0.000030 Thu Feb 13 2014 09:04:50 A bus fatal error was detected on a component at bus 9 device 0 function 0.
Environment
- Red Hat Enterprise Linux Server 5
- Red Hat Enterprise Linux Server 6
Question
I need help with this Dell R510 server: it keeps getting a fatal hardware error and rebooting.
A fatal hardware error has occurred.
Component: PCI Express Root Port
Error Source: GenericBus:Device:Function: 0x0:0xA:0x0
Vendor ID:Device ID: 0x8086:0x3411
Class Code: 0x60400
The details view of this entry contains further information.
[ Name] Microsoft-Windows-WHEA-Logger
[ Guid] {C26C4F3C-3F66-4E99-8F8A-39405CFED220}
Keywords 0x8000000000000000
[ SystemTime] 2016-05-21T01:26:57.796273100Z
[ ActivityID] {3843F13F-98CC-4457-BCE3-AC8C702907D5}
[ ProcessID] 1788
[ ThreadID] 1776
FRUId {00000000-0000-0000-0000-000000000000}
UncorrectableErrorStatus 0x4000
CorrectableErrorStatus 0x0
HeaderLog 00000000000000000000000000000000
RawData 435045521002FFFFFFFF01000100000002000000980100002A150100150510140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB5713167A4623E40AB9A40A698F362D464B38F2A68110688ADD101000000004552000000000000000000000000000000000000C8000000D0000000010200000100000054E995D9C1BB0F43AD91B44DCB3C6F3500000000000000000000000000000000010000000000000000000000000000000000000000000000EF000000000000000400000000010000470510400000000086801134000406000A0000000500000000000000000000000060070010E04201218000002C010400423C3B0A40004130800C0800C00348010E000100000000000000000000000000000000000000000000000000000000000100011500400000008031003070060000000000C13100000E00000000000000000000000000000000000000000000005C0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Data from iDRAC
Sat May 21 2016 02:21:46 A runtime critical stop occured.
Sat May 21 2016 02:21:39 An OEM diagnostic event has occurred.
Sat May 21 2016 02:21:39 A bus fatal error was detected on a component at bus 0 device 10 function 0.
Sat May 21 2016 02:21:39 An OEM diagnostic event has occurred.
Sat May 21 2016 02:21:39 A bus fatal error was detected on a component at slot 1.
Sat May 14 2016 03:26:34 This is an OEM record.
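The Windows event and the iDRAC log above can be cross-checked with a few lines of Python. This is only a rough sketch: it assumes the WHEA `UncorrectableErrorStatus` field mirrors the PCIe AER Uncorrectable Error Status register, and the helper names `decode_uncorrectable` and `parse_whea_source` are made up for illustration. The bit names follow the PCIe AER definitions.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register.
# Assumption: the WHEA "UncorrectableErrorStatus" value uses the same layout.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_uncorrectable(status):
    """Return the names of all known error bits set in the status value."""
    return [name for bit, name in sorted(AER_UNCORRECTABLE_BITS.items())
            if status & (1 << bit)]


def parse_whea_source(text):
    """Turn a WHEA 'Bus:Device:Function' string into a tuple of integers."""
    bus, device, function = (int(part, 16) for part in text.split(":"))
    return bus, device, function


if __name__ == "__main__":
    # Values copied from the event shown above.
    print(decode_uncorrectable(0x4000))
    print(parse_whea_source("0x0:0xA:0x0"))
```

Under those assumptions, 0x4000 is bit 14 (Completion Timeout), and the hexadecimal source 0x0:0xA:0x0 is the same address the iDRAC reports in decimal as bus 0 device 10 function 0.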
Hi,
We are running some R720s for VMware View, and they are all having the same issue: they crash with a purple screen of death (PSOD), and the logs show the following errors:
A bus fatal error was detected on a component at bus 0 device 3 function 0.
A bus fatal error was detected on a component at slot 6.
I tried lspci and I do get output for the device, but all I see is:
0000:00:03.0 Bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 3a [PCIe RP[0000:00:03.0]]
I can’t get any further than this; the -s switch that some people have suggested is not accepted.
Can anyone help me determine which device is actually causing the issue? Which physical slot does 3a refer to?
I would be very grateful for some help here.
Error Code
Message Information
Action
Cycle input power, update component drivers, if device is removable, reinstall the device.
PCI1360
Message
A bus fatal error was detected on a component at slot <number>.
LCD Message
Bus fatal error on slot <number>. Re-seat PCI card.
Details
System performance may be degraded, or system may fail to operate.
Action
Cycle input power, update component drivers, if device is removable, reinstall the device.
PCI1362
Message
Bus performance degraded for a component at slot <number>.
Details
System performance may be degraded. The bus is not operating at maximum speed or width.
Action
Cycle input power, update component drivers, remove and reinstall the device at the next scheduled service time.
PCI2000
Message
A fatal IO error detected on a component at bus <bus> device <device> function <func>.
LCD Message
Fatal IO error on bus <bus> device <device> function <func>.
Details
System performance may be degraded, or system may fail to operate.
Action
Cycle input power, update component drivers, remove and reinstall the device.
PCI2002
Message
A fatal IO error detected on a component at slot <number>.
LCD Message
Fatal IO error on slot <number>.
Details
System performance may be degraded, or system may fail to operate.
Action
Cycle input power, update component drivers, remove and reinstall the device.
PCI3000
Message
Device option ROM on embedded NIC failed to support Link Tuning or FlexAddress.
Details
Either the BIOS, BMC/iDRAC, or LOM firmware is out of date and does not support FlexAddress.
Action
Update BIOS, BMC/iDRAC, and LOM firmware. If the issue persists, see Getting Help.
PCI3002
Message
Failed to program virtual MAC address on a component at bus <bus> device <device> function <func>.
I had a Dell R815 host crash yesterday with the following PSOD error message:
The system has found a problem on your machine and cannot continue.
LINT1 motherboard interrupt. This is a hardware problem; please contact your hardware vendor.
When I checked the system logs on the iDRAC, I could see a bus fatal error logged:
System Event Logs
Severity | Time | Description |
---|---|---|
Critical | 18:24:36 | The watchdog timer expired. |
Normal | 18:16:37 | An OEM diagnostic event has occurred. |
Critical | 18:16:36 | A bus fatal error was detected on a component at bus 4 device 4 function 0. |
I ran the integrated hardware diagnostics from the system services menu at boot (F10), which confirmed these errors, but only because it read the system logs. I find this really annoying: if I had cleared the event logs before running the diagnostics, no errors would have been reported, so now I’m not sure whether the hardware is faulty or not. Here are the reported errors:
Either way, I can’t put it back into production without further analysis, and I need to find out which hardware component is located at bus 4 device 4 function 0 so that I can log a support call with Dell. It turns out this is really easy using the lspci command, which returns detailed info on all PCI devices.
lspci prints each device address in the [domain]:[bus]:[device].[function] format, so it’s easy to grep for that address and see only the component in question rather than every PCI device. Here is what mine returned:
~ # lspci | grep '000:004:04.0'
000:004:04.0 Bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
~ # lspci --help
lspci
-p --pciinfo Prints detailed info on all PCI devices
-n --nolookup Don't look up PCI device names and info
-d --dump Print hex dump of the full config space
-v --verbose Verbose information
So now I know there was a problem with the PCI bridge and can log this to Dell in the hope that they simply replace the component under warranty.
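For anyone who wants to script the lookup, here is a rough Python sketch of the same idea: pull the bus, device and function numbers out of the iDRAC message and match them against saved lspci output. It rests on two assumptions: the iDRAC prints the numbers in decimal, and lspci prints the address fields in hexadecimal (standard for Linux pciutils, assumed here for the ESXi format too). The function names `parse_sel` and `find_devices` are made up for this example, and the two sample lspci lines are the ones quoted in the posts above.

```python
import re

# Matches iDRAC/SEL text such as "... at bus 4 device 4 function 0."
SEL_PATTERN = re.compile(r"bus (\d+) device (\d+) function (\d+)")

# Matches the address at the start of an lspci line, with or without a
# leading domain: "000:004:04.0", "0000:00:03.0" or "00:03.0".
LSPCI_PATTERN = re.compile(
    r"^(?:[0-9a-fA-F]+:)?([0-9a-fA-F]+):([0-9a-fA-F]+)\.([0-7])\s"
)

SAMPLE_LSPCI = """\
000:004:04.0 Bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
0000:00:03.0 Bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 3a [PCIe RP[0000:00:03.0]]
"""


def parse_sel(message):
    """Return (bus, device, function) from an iDRAC message, or None."""
    match = SEL_PATTERN.search(message)
    if match is None:
        return None
    # Assumption: the iDRAC reports these numbers in decimal.
    return tuple(int(value) for value in match.groups())


def find_devices(sel_message, lspci_output):
    """Return the lspci lines whose address matches the iDRAC message."""
    target = parse_sel(sel_message)
    if target is None:
        return []
    hits = []
    for line in lspci_output.splitlines():
        line = line.strip()
        match = LSPCI_PATTERN.match(line)
        # Assumption: lspci address fields are hexadecimal.
        if match and tuple(int(v, 16) for v in match.groups()) == target:
            hits.append(line)
    return hits


if __name__ == "__main__":
    message = ("A bus fatal error was detected on a component "
               "at bus 4 device 4 function 0.")
    for hit in find_devices(message, SAMPLE_LSPCI):
        print(hit)
```

Comparing numbers rather than strings avoids having to guess how many digits a given lspci build pads each field to, and it still matches once a device number goes above 9, where the decimal and hexadecimal spellings differ.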
An independent IT contractor with a strong focus on VMware virtualisation and infrastructure operations. I am inspired by technology, not afraid to question the status quo and balance my professional commitments with entertaining my three awesome kids (Ashton, Oliver and Lara).
View all posts by Jon Munday
Error Code
Message Information
Action
Cycle input power, update component drivers, if device is removable, reinstall the device.
PCI1320
Message
A bus fatal error was detected on a component at bus <bus> device <device> function <func>.
LCD Message
Bus fatal error on bus <bus> device <device> function <func>. Power cycle system.
Details
System performance may be degraded, or system may fail to operate.
Action
Cycle input power, update component drivers, if device is removable, reinstall the device.
PCI1342
Message
A bus time-out was detected on a component at slot <number>.
Details
System performance may be degraded, or system may fail to operate.
Action
Cycle input power, update component drivers, if device is removable, reinstall the device.
PCI1348
Message
A PCI parity error was detected on a component at slot <number>.
LCD Message
PCI parity error on slot <number>. Re-seat PCI card.
Details
System performance may be degraded, or system may fail to operate.
Action
Cycle input power, update component drivers, if device is removable, reinstall the device.
PCI1360
Message
A bus fatal error was detected on a component at slot <number>.
LCD Message
Bus fatal error on slot <number>. Re-seat PCI card.
Details
System performance may be degraded, or system may fail to operate.
Action
Cycle input power, update component drivers, if device is removable, reinstall the device.
PDR0001
Message
Fault detected on drive <number>.
LCD Message
Fault detected on drive <number>. Check drive.
Details
The controller detected a failure on the disk and has taken the disk offline.
Action
PDR1016
Message
Drive <number> is removed from disk drive bay <bay>.