An uncorrectable bus error has occurred on bus cpus

Problem

System x3630 M4/x3530 M4 servers may report an uncorrectable bus error on the Central Processing Unit (CPU) and an uncorrectable memory error while running a memory-intensive application or a memory diagnostic.

The Integrated Management Module (IMM) chassis event log records the following errors, and the system restarts.

CHASSIS:(12/23/2014 02:23:11) An Uncorrectable Bus Error has occurred on bus CPUs.
CHASSIS:(12/23/2014 02:23:01) Uncorrectable error detected for One of the DIMMs on Subsystem System Memory.

The symptom is observed only in configurations with one DIMM per channel: three (3) DIMMs for one (1) CPU, or six (6) DIMMs for two (2) CPUs.
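
When checking a log export for this pair of events, it can help to parse the entries and put them in time order. The following is a minimal Python sketch, not an IMM tool: the function name `parse_event` and the regular expression are illustrative, and the code assumes every entry follows the `SOURCE:(MM/DD/YYYY HH:MM:SS) message` layout of the two lines quoted above.

```python
import re
from datetime import datetime

# Assumed layout, taken from the two sample lines in this tip:
#   CHASSIS:(12/23/2014 02:23:11) An Uncorrectable Bus Error has occurred on bus CPUs.
LINE_RE = re.compile(
    r"^(?P<source>[A-Z]+):"
    r"\((?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\)\s*"
    r"(?P<message>.+)$"
)


def parse_event(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # Assumption: the date is month/day/year, as in the sample (12/23/2014).
    stamp = datetime.strptime(match.group("stamp"), "%m/%d/%Y %H:%M:%S")
    return {
        "source": match.group("source"),
        "time": stamp,
        "message": match.group("message"),
    }


log_lines = [
    "CHASSIS:(12/23/2014 02:23:11) An Uncorrectable Bus Error has occurred on bus CPUs.",
    "CHASSIS:(12/23/2014 02:23:01) Uncorrectable error detected for One of the DIMMs on Subsystem System Memory.",
]

# Oldest entry first; lines that do not match the assumed layout are skipped.
events = sorted(
    (event for event in map(parse_event, log_lines) if event is not None),
    key=lambda event: event["time"],
)

for event in events:
    print(event["time"].isoformat(), event["source"], event["message"])
```

Sorted this way, the sample shows the memory error logged ten seconds before the bus error.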

Resolving The Problem

Source

RETAIN tip: H213778

Affected configurations

The system may be any of the following IBM servers:

  • System x3530 M4, type 7160 E5-xxxxV2, any model
  • System x3630 M4, type 7158 E5-xxxxV2, any model

This tip is not software specific.

This tip is not option specific.

The following system BIOS/UEFI levels are affected: UEFI
versions 1.60, 1.70, and 1.71.

Solution

The issue has been addressed by the UEFI firmware code version
2.12 or later.

The file is or will be available by selecting the appropriate
Product Group, type of System, Product name, Product machine type,
and operating system on IBM Support’s Fix Central web page, at the
following URL:

http://www.ibm.com/support/fixcentral/
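
The conditions in this tip (affected UEFI levels, the one-DIMM-per-channel population, and the fixed level) can be summarized as a small check. This is a hedged Python sketch: the names `assess`, `AFFECTED_UEFI`, and `FIXED_UEFI` are illustrative, and the tip says nothing about UEFI levels other than 1.60, 1.70, 1.71, and 2.12 or later, so every other level is reported as unknown rather than guessed.

```python
# UEFI levels this tip lists as affected.
AFFECTED_UEFI = {"1.60", "1.70", "1.71"}
# First UEFI level this tip lists as containing the fix.
FIXED_UEFI = (2, 12)

# (CPU count, DIMM count) pairs this tip gives for one DIMM per channel.
ONE_DIMM_PER_CHANNEL = {(1, 3), (2, 6)}


def parse_version(version):
    """Turn a 'major.minor' string such as '2.12' into a tuple (2, 12)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def assess(uefi_version, cpus, dimms):
    """Classify a system against the conditions described in this tip."""
    if (cpus, dimms) not in ONE_DIMM_PER_CHANNEL:
        return "not the DIMM population described in this tip"
    if uefi_version in AFFECTED_UEFI:
        return "affected"
    if parse_version(uefi_version) >= FIXED_UEFI:
        return "fixed"
    # Levels the tip does not mention are not classified.
    return "unknown"


print(assess("1.70", cpus=2, dimms=6))
print(assess("2.12", cpus=1, dimms=3))
print(assess("1.70", cpus=2, dimms=12))
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.9" would sort after "2.12".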

Workaround

This issue can be worked around by changing the UEFI setting to
lower the memory speed from the default 1600 MT/s to 1333 MT/s.

In the UEFI setup menu, select ‘System Settings’ -> ‘Operating
Modes’, set ‘Choose Operating Mode’ to ‘Custom Mode’, and then
change ‘Memory Speed’ to ‘Balanced’.

Additional information

The system reports these errors by mistake. The UEFI
firmware will address this issue in the next release.

Document Location

Worldwide

Operating System

System x:Operating system independent / None

Posted by dianesalas 2017-06-14T09:22:32Z

Hello.

I have replaced motherboard and IO shuttle after a severe power failure.  I am now getting:

0x8007020c2587ffff - Sensor "MemSpareErr7" has transitioned to critical from a less severe state

0x806f08131801ffff - A Uncorrectable Bus Error has occurred on system "Host"

"MemSpareErr7" and "Host" are not found in any documentation and all the info is vague.  Can someone point me in a good direction?

3 Replies

  • Not sure if that one will help:

    https://www.manualslib.com/manual/1217508/Ibm-X3950-X5.html?page=750

    806f0813-1801ffff A Uncorrectable Bus Error has occurred on system [ComputerSystemElementName].
    Explanation: IMM has detected a bus uncorrectable error.
    May also be shown as 806f08131801ffff or 0x806f08131801ffff
    Severity: Error
    Alert Category: Critical
    Serviceable: No
    CIM Information: Prefix: PLAT and ID: 0240
    SNMP Trap ID: 50
    Automatically notify Support: No
    User response: This is a UEFI detected event. The UEFI diagnostic code for this event can be found in the logged
    IMM message text. Please refer to the UEFI diagnostic code in the "UEFI diagnostic code" section of the Info Center
    for the appropriate user response
    

    And these:

    http://systemx.lenovofiles.com/help/index.jsp?topic=%2Fcom.lenovo.conv.5462.doc%2Fr_imm_error_messag…

    http://systemx.lenovofiles.com/help/index.jsp?topic=%2Fcom.lenovo.conv.5462.doc%2F8007020c2502ffff.h…

    8007020c-2502ffff : Sensor [SensorElementName] has transitioned to critical from a less severe state. (nvDIMM 02 Status)
    This message is for the use case when an implementation has detected a Sensor transitioned to critical from less severe.
    May also be shown as 8007020c2502ffff or 0x8007020c2502ffff
    Severity: Error
    Alert Category: Critical - Other
    Serviceable: Yes
    CIM Information: Prefix: PLAT and ID: 0522
    SNMP Trap ID: 50
    Automatically Notify Support: No
    User Response: None
    



  • Author Tom Demetriou

    Did you check the firmware levels?

    Is everything reseated properly and grounded?

    Check all memory, CPUs, and cards.

    Can you access IMM or does it not even get that far?



  • Author Tom Demetriou

    If you get into the BIOS/UEFI check to see that it recognizes the drives

    Play with those settings



  • #1

Good afternoon! We have an HP DL 560 Gen9 server running Windows Server 2016. It hosts Microsoft SQL Server 2016 and a 1C:Enterprise server. Over the weekend, error emails arrived from iLO, the server stopped responding to ping, and it apparently hung. The errors were:

- PCI Bus 01/26/2021 15:15 01/26/2021 15:15 1 PCI Bus Error (Slot 0, Bus 0, Device 0, Function 0)
- System Error 01/26/2021 15:15 01/26/2021 15:15 1 Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible
- PCI Bus 01/26/2021 15:15 01/26/2021 15:15 1 Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 0, Function 0, Error status 0x00000000)
- PCI Bus 01/26/2021 15:15 [NOT SET] 1 Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 0, Function 0, Error status 0x00000000)

I rebooted the server remotely through iLO, and it seems to have come back to life. But for how long? Please help me understand what this error is and what has failed.

  • #2

HP ProLiant Servers — How to Decode Uncorrectable PCI Express Error

Information

This document helps the user decode an Uncorrectable PCI Express Error.
Example: Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 8, Function 0, Error status 0x00000000)

Details

This PCI Express error can be decoded using either of the logs listed below.

  1. Advanced Survey Report.
  2. lspci Output from a Linux Machine or ESX Machine.

Advanced Survey Report:

[Screenshot c03525471.jpg: Advanced Survey Report]

NOTE: Use the Vendor ID and the Device ID to determine the hardware device.

lspci output:
If the server is running Linux or ESX, collect the OS logs from the server.
Check lspci.txt in the OS logs. It should contain the information shown in the screenshot below:

[Screenshot c03525472.jpg: lspci output]

In this example, check the numbers listed before the word "Bridge":
000:000:08.0 Bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8.
000 -> PCI domain (every PCI domain can have 256 PCI buses)
000 -> Bus
08 -> Device
0 -> Function
Using either of these logs, the PCI Express error can be narrowed down to the hardware device causing it.
NOTE: The values in the IML logs and in the Advanced Survey Report are decimal, whereas the values in the lspci output are hexadecimal. Convert the IML values to hexadecimal before comparing them with the lspci output.
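
The decimal-to-hexadecimal conversion described in the note can be sketched in a few lines of Python. The helper names `iml_to_lspci` and `lspci_to_iml` are illustrative, not part of any HP tool. As an assumption, the output uses the `domain:bus:device.function` layout printed by `lspci -D` on Linux (four hex digits for the domain, two for the bus); the ESX-style listing above pads the fields differently, so compare the numeric values rather than the exact strings.

```python
def iml_to_lspci(bus, device, function, domain=0):
    """Format decimal IML values as a Linux-style lspci address (hex fields)."""
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


def lspci_to_iml(address):
    """Convert an lspci address back to decimal (bus, device, function)."""
    location, function = address.rsplit(".", 1)
    fields = location.split(":")
    # The last two colon-separated fields are bus and device; a leading
    # domain field, if present, is ignored here.
    bus = int(fields[-2], 16)
    device = int(fields[-1], 16)
    return bus, device, int(function, 16)


# The example from the document: Bus 0, Device 8, Function 0.
print(iml_to_lspci(0, 8, 0))

# A case where decimal and hex differ: decimal bus 10 and device 16
# appear as 0a and 10 in lspci output.
print(iml_to_lspci(10, 16, 0))
print(lspci_to_iml("0000:0a:10.0"))
```

For small values such as Device 8 the decimal and hexadecimal forms coincide, which is why the mismatch is easy to miss until a bus or device number reaches 10 or higher.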

  • #3

If the firmware is not the latest version, you can try to fix this with the SPP service pack, SPP Gen9 Production Version *: 2021.10.0
Alerts of this kind are usually caused by outdated firmware for components such as the System ROM and iLO.

  • #4

I will try it, but I don't think it will help. I'll write back later. Right now I can't reboot the server.

Surf_rider


  • #5

Whenever you contact HP support, or in any other unclear situation, they give the standard answer: update the firmware.

  • #6

I updated all the firmware. So far, so good. I think it helped.

806f0813-2581ffff • 806f0813-2584ffff

806f0813-2581ffff A Uncorrectable Bus Error has occurred on system [ComputerSystemElementName].

Explanation: IMM has detected a bus uncorrectable error.

May also be shown as 806f08132581ffff or 0x806f08132581ffff

Severity: Error

Alert Category: Critical

Serviceable: Yes

CIM Information: Prefix: PLAT and ID: 0240

SNMP Trap ID: 50

Automatically notify Support: Yes

User response: This is a UEFI detected event. The UEFI diagnostic code for this event can be found in the logged IMM message text. Please refer to the UEFI diagnostic code in the "UEFI diagnostic code" section of the Info Center for the appropriate user response.

806f0813-2582ffff A Uncorrectable Bus Error has occurred on system [ComputerSystemElementName].

Explanation: IMM has detected a Bus Uncorrectable Error.

May also be shown as 806f08132582ffff or 0x806f08132582ffff

Severity: Error

Alert Category: Critical

Serviceable: Yes

CIM Information: Prefix: PLAT and ID: 0240

SNMP Trap ID: 50

Automatically notify Support: Yes

User response: This is a UEFI detected event. The UEFI diagnostic code for this event can be found in the logged IMM message text. Please refer to the UEFI diagnostic code in the "UEFI diagnostic code" section of the Info Center for the appropriate user response.

806f0813-2584ffff A Uncorrectable Bus Error has occurred on system [ComputerSystemElementName].

Explanation: IMM has detected a bus uncorrectable error.

May also be shown as 806f08132584ffff or 0x806f08132584ffff

Severity: Error

Alert Category: Critical

Serviceable: No

CIM Information: Prefix: PLAT and ID: 0240

SNMP Trap ID: 50

Automatically notify Support: No

User response: This is a UEFI detected event. The UEFI diagnostic code for this event can be found in the logged IMM message text. Please refer to the UEFI diagnostic code in the "UEFI diagnostic code" section of the Info Center for the appropriate user response.
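
Each entry above notes that the same event ID may be shown in three forms (with a hyphen, without one, or with a 0x prefix). When searching logs or documentation, it can help to reduce all three to one form first. The following Python sketch does that; the function name `normalize_event_id` and the choice of the hyphenated lowercase form as the canonical one are assumptions made for illustration.

```python
import re

# Accepts 806f0813-2581ffff, 806f08132581ffff, and 0x806f08132581ffff.
_EVENT_ID_RE = re.compile(
    r"^(?:0x)?([0-9a-f]{8})-?([0-9a-f]{8})$",
    re.IGNORECASE,
)


def normalize_event_id(text):
    """Return the event ID in hyphenated lowercase form, e.g. 806f0813-2581ffff."""
    match = _EVENT_ID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a 16-digit hexadecimal event ID: {text!r}")
    return f"{match.group(1)}-{match.group(2)}".lower()


for raw in ("806f0813-2581ffff", "806f08132581ffff", "0x806f08132581ffff"):
    print(raw, "->", normalize_event_id(raw))
```

With the IDs normalized, a value copied from an IMM log with the 0x prefix can be matched directly against the hyphenated headings used in the message reference.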

Appendix B. Integrated management module error messages
