Correctable memory error logging disabled for a memory device at location - Fixing errors and finding optimal solutions to problems

Error Code

Message Information

MEM8000

Message

Correctable memory error logging disabled for a memory device at location <location>.

Details

Errors are being corrected but no longer logged.

Action

Review system logs for memory exceptions. Reinstall memory at location <location>.

PCI1302

Message

A bus time-out was detected on a component at bus <bus> device <device> function <func>.

Details

System performance may be degraded. The device has failed to
respond to a transaction.

Action

Cycle input power and update component drivers. If the device is removable, reinstall the device.

PCI1304

Message

An I/O channel check error was detected.

Action

Cycle input power and update component drivers. If the device is removable, reinstall the device.

PCI1308

Message

A PCI parity error was detected on a component at bus <bus> device <device> function <func>.

Details

System performance may be degraded, PCI device may fail to
operate, or system may fail to operate.

Action

Cycle input power and update component drivers. If the device is removable, reinstall the device.

PCI1320

Message

A bus fatal error was detected on a component at bus <bus> device <device> function <func>.

Details

System performance may be degraded, or system may fail to operate.

Action

Cycle input power and update component drivers. If the device is removable, reinstall the device.

PCI1342

Message

A bus time-out was detected on a component at slot <number>.

Details

System performance may be degraded, or system may fail to operate.

Action

Cycle input power and update component drivers. If the device is removable, reinstall the device.

PCI1348

Message

A PCI parity error was detected on a component at slot <number>.
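The PCI13xx messages above carry the failing component's location either as bus/device/function or as a slot number. As a rough illustration, the following Python sketch pulls that location out of a message string. The function name `parse_pci_location` is hypothetical, and it assumes the placeholders are filled with decimal numbers and that spacing in real logs may vary.

```python
import re

# Patterns modelled on the message templates listed above. How real logs
# space and format the values is an assumption, so whitespace is matched
# loosely and the values are assumed to be decimal.
BDF_RE = re.compile(
    r"at bus\s*(?P<bus>\d+)\s*device\s*(?P<device>\d+)\s*function\s*(?P<func>\d+)",
    re.IGNORECASE,
)
SLOT_RE = re.compile(r"at slot\s*(?P<slot>\d+)", re.IGNORECASE)


def parse_pci_location(message):
    """Return the component location found in a PCI13xx message, or None."""
    match = BDF_RE.search(message)
    if match:
        return {key: int(value) for key, value in match.groupdict().items()}
    match = SLOT_RE.search(message)
    if match:
        return {"slot": int(match.group("slot"))}
    # Messages such as PCI1304 carry no location at all.
    return None
```

A message without a location, such as the PCI1304 I/O channel check error, simply yields `None`.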

Source

Updated 14.12.2016

Hello everyone. Today the error Correctable ECC memory error logging limit reached appeared on an IBM Blade HS22, and I will describe how to resolve it. The problem shows up in the AMM logs; for those not familiar with it, AMM is the web interface for managing a chassis of IBM blade servers.

This is how the error looks in AMM.

[Screenshot: the Correctable ECC memory error logging limit reached error on IBM HS22]

The Correctable ECC memory error logging limit reached error points to a problem with RAM. IBM itself advises first updating all firmware to the latest levels and, if that does not help, pulling the blade and reseating the DDR memory.

The error is also present in the logs, with the code 0x806f050c.

[Screenshot: the Correctable ECC memory error logging limit reached error in the log on IBM HS22]

I took the first route and decided to update everything. I described earlier how to update all firmware on an IBM Blade HS22.

After the update, the logs show the error in the recovery state.

[Screenshot: the error in the recovery state on IBM HS22]

Once the blade is rebooted after the update, you will see that the error has disappeared and everything is green.

[Screenshot: blade status after updating all firmware on IBM Blade HS22]

That is how simply the Correctable ECC memory error logging limit reached error on IBM HS22 is resolved.

Material from pyatilistnik.org

Dec 14, 2016 10:49

Source

Suggest you check the BIOS logs on the servers that have had the problem.

We get that once every so often on a Dell 2950 with quad-core Intels. I looked in the BIOS log the other day and found a note that said that the number 8 memory DIMM had been 'disabled' due to it failing an ECC check. I rebooted the server after ESX failed and it will run for ages, then the BIOS will lock out DIMM8 and things go hinky again.

It doesn't seem to matter which physical DIMM is in socket 8. Even when the memory is swapped out we get the same issue: an uncorrectable memory error in ESX and a 'DIMM8 disabled due to ECC failure' in the BIOS.

I ended up leaving the last pair of DIMMs out and the error hasn't occurred now for about 3 months.

Suspicion: something I saw with HP DL servers once. Each DIMM can be seen as single rank per side or dual rank per side, and some servers can only take so many ranks. The DL380 G4 had a limitation (I seem to recall) of 8 ranks total and not more than dual-rank (2 ranks) per DIMM. A dual-sided, dual-rank DIMM therefore counted as four ranks (and thus wouldn't work, or wouldn't function as expected).

With the Dell, I believe that the 4 GB DIMMs I had in it (which counted as dual-sided AND dual-rank) might somehow exceed whatever the rank limitation is. I haven't found a published limit; this is just a theory based on experience.
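The rank-counting theory above is simple arithmetic, and a small Python sketch makes it concrete. It follows the poster's own counting model (sides multiplied by ranks per side) and the limit of 8 ranks that the poster recalls for the DL380 G4. Both are recollections from the post, not published specifications, and the function names are hypothetical.

```python
def total_ranks(dimms):
    """Sum ranks over installed DIMMs.

    Each DIMM is a (sides, ranks_per_side) tuple, following the counting
    model described in the post above (an assumption, not a vendor spec).
    """
    return sum(sides * ranks_per_side for sides, ranks_per_side in dimms)


def fits_rank_limit(dimms, max_total_ranks):
    """Return True if the installed DIMMs stay within the rank limit."""
    return total_ranks(dimms) <= max_total_ranks


# Under this model a dual-sided, dual-rank DIMM counts as four ranks,
# so two of them fill a hypothetical 8-rank budget and a third exceeds it.
two_dimms = [(2, 2), (2, 2)]
three_dimms = two_dimms + [(2, 2)]
```

Under these assumptions, `fits_rank_limit(two_dimms, 8)` holds while `fits_rank_limit(three_dimms, 8)` does not, which matches the poster's observation that leaving out the last pair of DIMMs made the error go away.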

Source


Detailed Description

The memory may not be operational. This is an early indicator of a possible future uncorrectable error.

Recommended Response Action

Re-install the memory component. If the problem continues, contact support.

Category

System Health

SubCategory

MEM = Memory

Severity

Severity 1 (Critical)

Trap/EventID

FALSE

LCD Message

Correctable memory error rate exceeded for <location>. Reseat memory.

Initial Default.

FALSE

Filter Visibility

IPMI Alert, SNMP Alert, LC Log, LCD, Power Off, Power Cycle, Reset

FALSE

MEM8000

Message

Correctable memory error logging disabled for a memory device at location <location>.

Arguments

• arg1 = location

Detailed Description

Errors are being corrected but no longer logged.

Recommended Response Action

Review system logs for memory exceptions. Re-install memory at location <location>.

Category

System Health

SubCategory

MEM = Memory

Severity

Severity 1 (Critical)

Trap/EventID

FALSE

LCD Message

SBE log disabled on <location>. Reseat memory

Initial Default.

FALSE

Filter Visibility

IPMI Alert, SNMP Alert, LC Log, LCD, Power Off, Power Cycle, Reset

FALSE

MEM8001

Message

Persistent correctable memory error logging enabled for a memory device at location <location>.

Arguments

• arg1 = location
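Since MEM8000 and MEM8001 differ only in their fixed message text and both take the location as arg1, a log line can be mapped back to its message ID with a simple prefix match. The Python sketch below illustrates this. The function name `classify_memory_event` is hypothetical, and the location value used in the comment (`DIMM_A1`) is only an illustrative placeholder.

```python
import re

# Fixed message text copied from the two entries above; the trailing
# location is arg1. Whether real logs end with a period is an assumption,
# so the period is treated as optional.
TEMPLATES = {
    "MEM8000": "Correctable memory error logging disabled for a memory device at location",
    "MEM8001": "Persistent correctable memory error logging enabled for a memory device at location",
}


def classify_memory_event(text):
    """Return (message_id, location) for a known message, else None."""
    for message_id, prefix in TEMPLATES.items():
        pattern = re.escape(prefix) + r"\s+(?P<loc>.+?)\.?\s*$"
        match = re.match(pattern, text.strip(), re.IGNORECASE)
        if match:
            return message_id, match.group("loc")
    return None


# Example with a made-up location name:
# classify_memory_event("Correctable memory error logging disabled "
#                       "for a memory device at location DIMM_A1.")
```

Keeping the templates in one dictionary means further message IDs can be added without touching the matching logic.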

Source

Problem

The Error Light Emitting Diode (LED) is illuminated on the chassis and the BladeCenter HS22 blade server front information panel. The Advanced Management Module (AMM) system status indicates that there is a "correctable ECC memory error logging limit reached" error. The AMM logs the following errors:

19 E Blade_05 12/08/09, 11:29:06 (octans012)
Correctable memory error logging limit reached

20 E Blade_05 12/08/09, 11:29:05 (octans012)
Correctable memory error logging limit reached on DIMM 5

The memory errors occur in the following BladeCenter HS22 configuration:

- CPU C-States [Enable]

- Thermal Mode [Normal] double refresh rate

- 4 Gigabyte (GB) Samsung VLP DIMMs installed, Option part number 44T1488, replacement part number (FRU) 44T1498.
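When several blades report this error, it can help to extract the affected DIMM numbers from exported AMM log text. The Python sketch below parses entries shaped like the two log records quoted above: a header line with index, severity, source, date, time and a name in parentheses, followed by a message line. The field names and the function name `parse_amm_log` are hypothetical, and the layout is inferred only from those two records.

```python
import re

# Header layout inferred from the two sample records above (an assumption).
HEADER_RE = re.compile(
    r"^(?P<index>\d+)\s+(?P<severity>[A-Z])\s+(?P<source>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2}),\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\((?P<name>[^)]*)\)$"
)
# Tolerates a missing space in "limit reached", which occurs in some copies.
DIMM_RE = re.compile(r"logging limit\s*reached on DIMM\s+(?P<dimm>\d+)", re.IGNORECASE)


def parse_amm_log(text):
    """Parse AMM log text into a list of entry dictionaries."""
    entries = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        header = HEADER_RE.match(line)
        if header:
            # Start a new entry; its message arrives on the following line(s).
            current = header.groupdict()
            current["message"] = ""
            current["dimm"] = None
            entries.append(current)
        elif current is not None:
            current["message"] = (current["message"] + " " + line).strip()
            dimm = DIMM_RE.search(current["message"])
            if dimm:
                current["dimm"] = int(dimm.group("dimm"))
    return entries
```

Entries that do not name a DIMM keep `dimm` set to `None`, so the blade-level and DIMM-level records can be told apart.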

Resolving The Problem

Source

RETAIN tip: H196525


Affected configurations

The system may be any of the following IBM servers:

BladeCenter HS22, Type 1936, any model
BladeCenter HS22, Type 7870, any model

This tip is not software specific.
This tip is not option specific.

The system has the symptom described above.

Solution

Choose one of the following two (2) methods to resolve the errors:

Method 1:

Change Thermal Mode setting (preferred method)

Boot the blade into the F1 "System Configuration and Boot Management" screen. Highlight "System Settings." Press Enter and select Memory. Select Thermal Mode and change the setting to "Performance."
Press the Esc key twice to get to "System Configuration and Boot Management" and then select Save Settings and Exit Setup.
Follow the instructions on the next screen to exit the "Setup Utility."
Power the blade off for the changes to take effect and restart.

Changing "Normal" mode to "Performance" mode affects the way that the Dual In-Line Memory Modules (DIMMs) are refreshed. This results in a DIMM temperature warning message occurring at a 10 degree lower temperature. This causes no impact in most industry standard data centers.

Method 2:

Disable CPU C-State

Boot the blade into the F1 "System Configuration and Boot Management" screen. Highlight System Settings, press Enter, and select Processors. Select CPU C-States, and then change the setting to "Disable."
Press the Esc key twice to get to "System Configuration and Boot Management" and then select Save Settings and Exit Setup.
Follow the instructions on the next screen to exit the "Setup Utility."
Power the blade off for the changes to take effect and restart.

If the LED stays on after the changes have been made, do one of the following to turn it off:

- Use the IPMItool application (a third-party application available for Windows and Linux):
1. ipmitool sel list (to verify the log contains messages)
2. ipmitool sel clear
3. ipmitool sel list (to verify the log is now empty)
4. Restart the IMM. This can be done via the AMM GUI (select Blade Tasks, Power/Restart, and Restart Blade System Mgmt Processor for the appropriate blade) or with the ASU command line tool (asu rebootimm).
- Fully power the blade off, then power it back on (do not restart the blade). This can be done with the AMM or locally at the blade.
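The list, clear, list sequence for IPMItool can be scripted. The Python sketch below wraps the three `ipmitool sel` commands from the steps above with `subprocess`. The function names are hypothetical, it assumes `ipmitool` is on the PATH and already configured to reach the blade's management processor, and it does not restart the IMM.

```python
import subprocess


def run_ipmitool(args):
    """Run ipmitool with the given arguments and return its stdout."""
    result = subprocess.run(
        ["ipmitool", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def clear_sel(run=run_ipmitool):
    """List the SEL, clear it, then list it again.

    Returns (before, after) as lists of non-empty output lines. The `run`
    parameter exists only so the sequence can be exercised without hardware.
    """
    before = [line for line in run(["sel", "list"]).splitlines() if line.strip()]
    run(["sel", "clear"])
    after = [line for line in run(["sel", "list"]).splitlines() if line.strip()]
    return before, after
```

Calling `clear_sel()` on a host with access to the management processor returns the entries seen before and after the clear. Some management controllers may record the clear operation itself as a new event, so the second list is not guaranteed to be empty.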

Additional information

This error message usually indicates a failing DIMM; however, a very rare condition has been identified with Samsung DIMMs that can cause a false error. By implementing either of the recommended workarounds above, the false "correctable ECC memory error logging limit reached" error should not occur.

Note: The false "correctable ECC memory error logging limit reached" error does not indicate defective DIMMs.
