L2/L3 cache error was detected on the RAID controller

Hi! :) I have just experienced what I presume is a hardware malfunction on my SAS9341-8i RAID card. While the server was running, the RAID suddenly got a lot of I/O errors from a program that was writing to it, and then the RAID disappeared in Windows. After reboot, I now get I/O...

Hi everybody,

thank you for posting the above. It helped to figure out what was going on.

And I have some good news as well: In my case I did not even have to re-flash the firmware. Here is a description of what happened to hopefully help others, but also for myself in case this happens again ;-)

Setup: I am using the controller for a FreeNAS VM running on ESXi, with the controller passed through to the VM. As is preferred for ZFS, I use JBOD only, so there was no controller-level RAID to worry about. In my case the controller is a SAS 3008 on the mobo.

Situation: shutting down the FreeNAS VM hard-reset or purple-screened the ESXi server. On the next boot vSphere would restart the VM and I’d be back to square one. Disabling vSphere HA helped to finally get into ESXi maintenance mode. However, somewhere in the half-dozen crashes or so, I am guessing, the configuration stored on the controller got corrupted.

In FreeNAS I saw this in the system log:

Jan 6 12:32:39 fns mfi0: <Fury> port 0xb000-0xb0ff mem 0xfcef0000-0xfcefffff,0xfcd00000-0xfcdfffff irq 17 at device 0.0 on pci28
Jan 6 12:32:39 fns mfi0: Using MSI
Jan 6 12:32:39 fns mfi0: Megaraid SAS driver Ver 4.23
Jan 6 12:32:39 fns mfi0: Firmware fault
Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6
Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6
Jan 6 12:32:39 fns mfi0: <Fury> port 0xb000-0xb0ff mem 0xfcef0000-0xfcefffff,0xfcd00000-0xfcdfffff irq 17 at device 0.0 on pci28
Jan 6 12:32:39 fns mfi0: Using MSI
Jan 6 12:32:39 fns mfi0: Megaraid SAS driver Ver 4.23
Jan 6 12:32:39 fns mfi0: Firmware fault
Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6
Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6

To make the nested setup work, I had the OpROM control for the controller disabled in the Intel mobo BIOS. After I went into the BIOS and re-enabled the OpROM:

  • F2 on boot to get into BIOS
  • «Setup Menu»
  • «Advanced»
  • «PCI Configuration»
  • «PCIe Port Oprom Control»
  • «Enabled» on all entries

On the next boot I got exactly the same error that Apil posted at the beginning of the thread:

L2L3_cache_error.jpg

Pressing X to continue and Ctrl-R to get into the RAID controller BIOS, I set the controller to factory defaults:

  • Ctrl-N twice to get to the «Ctrl Mgmt» page
  • Tab a lot to get to «Set Factory Defaults»
  • Ctrl-S to save
  • Esc a lot to get all the way out to the prompt that tells you to use Ctrl-Alt-Del

factory_reset.jpg

On the next boot the error was not there anymore, and it listed the connected physical (JBOD) drives instead, as per normal. Yes!

Clean-up: back into the mobo BIOS to disable the OpROM for the controller.

After booting ESXi, turning vSphere HA back on, and booting the FreeNAS VM, the controller, all the disks, and the ZFS mirrored pool were back as if nothing had ever happened.

:)
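For reference, the recovered controller and pool can be sanity-checked from inside FreeNAS with FreeBSD's mfiutil and zpool; a minimal sketch, assuming the card comes up as unit 0 (mfi0) and the JBOD/ZFS-mirror layout described above:

# confirm the controller attached and report its firmware/state
mfiutil -u 0 show adapter
mfiutil -u 0 show firmware
# list the JBOD drives the card sees
mfiutil -u 0 show drives
# and confirm ZFS is happy with the pool on top of them
zpool status -x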

Update 2018-09-08: I am glad I made this post, because it just saved my bacon again. Somebody (kids) stacked boxes in front of my home server rack, and I am assuming the controller overheated, being cooked by all the disks. The LSI controller probably got into an inconsistent state when it did a thermally triggered emergency shutdown, and I can’t really blame it for that. Anyhow, with my own instructions I got everything back up and running, but boy is it scary when your disks go missing.
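On MegaRAID cards that report it, the RoC temperature can also be watched from the OS with StorCLI, which makes it easier to catch this kind of overheating before the controller trips; a minimal sketch, assuming storcli64 is installed and the card is controller 0 (not every model exposes a temperature reading):

# dump everything the controller reports and pick out the temperature lines
storcli64 /c0 show all | grep -i temp
# quick overall health summary
storcli64 /c0 show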

  • #1

Why Hello There!

This evening I performed a regular upgrade on a client’s long-standing (~2 years) hyperconverged PVE cluster. This cluster has some history but has been trouble-free for the most part. The cluster is made up of mostly homogeneous hardware, all Gen8 HPE ProLiant servers, with one node being a different model from the rest. The boot devices are single SSDs.

The software upgrades went without issue: no warnings or errors from apt/dpkg, and migration of guests off each PVE node to other nodes before rebooting was occurring as expected. However, two things changed their behaviour upon reboot:

  1. The HBA (MegaRAID RAID Controller in IT/HBA Mode) now asks me to press «X» on startup claiming the following error:

    Code:

    Caution: Memory conflict detected. You may face boot problem.
    L2/L3 Cache error was detected on the RAID controller.
    Please contact technical support to resolve this issue. Press "X" to
    continue or else power off your system, replace the controller and reboot.

    Of course, a cluster full of HBAs doesn’t all simultaneously fail right after four Linux machines upgrade their kernels; no, this is a smart and valid detection by the HBA. However, I will need to refresh its memory in the Ctrl+R configuration menu, or, if someone has a method of validating the data on the boot partitions in Linux, I would greatly appreciate that (see the sketch after this list).

  2. Grub boots to 5.15.35-1 and runs into the following repeated errors:

    Code:

    Volume group "pve" not found
    Cannot process volume group pve

    (Then it ends with the following)

    Code:

    Gave up waiting for root file system device. Common problems:

    (TL;DR it said to increase the root delay or check for missing modules, then I get spat out to BusyBox.)
    The issue with increasing the root delay is that there is already a noticeable delay: the errors repeat about 10-15 times before quitting to BusyBox, so there is no further reasonable delay that could help.
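A rough recovery/validation sketch for both points above, assuming the root volume group really is named pve and the new kernel is 5.15.35-1-pve as in the messages; these are standard Debian/Proxmox tools, but treat it as a starting point rather than a verified fix:

Code:

    # 1) At the BusyBox (initramfs) prompt: try activating the VG by hand.
    #    If this works, the LVM metadata is fine and the initramfs/driver is suspect.
    lvm vgscan
    lvm vgchange -ay pve
    exit                      # resume boot once /dev/pve/root exists

    # 2) From a working (older-kernel) boot: rebuild the 5.15 initramfs and
    #    check that the LVM and storage-driver bits actually made it in.
    update-initramfs -u -k 5.15.35-1-pve
    lsinitramfs /boot/initrd.img-5.15.35-1-pve | grep -Ei 'lvm|megaraid|mpt3sas'

    # 3) Rough "validate the boot partition data" check: verify the installed
    #    kernel package's files against their checksums (debsums may need installing).
    apt install debsums
    debsums -s pve-kernel-5.15.35-1-pve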

The upgrade for all nodes included the same message (because they are always upgraded at the same time and have the same software installed):

Code:

The following packages were automatically installed and are no longer required:
  bsdmainutils golang-docker-credential-helpers libsecret-1-0 libsecret-common
  libzpool4linux pve-kernel-5.4.114-1-pve pve-kernel-5.4.119-1-pve
  python3-asn1crypto python3-dockerpycreds
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
  gnutls-bin libdrm-common libdrm2 libepoxy0 libgbm1 libgnutls-dane0 libjs-qrcodejs
  libjson-glib-1.0-0 libjson-glib-1.0-common libopts25 libposix-strptime-perl
  libproxmox-rs-perl libtpms0 libunbound8 libvirglrenderer1 libwayland-server0
  libzpool5linux proxmox-websocket-tunnel pve-kernel-5.11.22-7-pve pve-kernel-5.15
  pve-kernel-5.15.35-1-pve swtpm swtpm-libs swtpm-tools
The following packages will be upgraded:
  base-files bind9-dnsutils bind9-host bind9-libs bsdextrautils bsdutils btrfs-progs
  ceph ceph-base ceph-common ceph-fuse ceph-mds ceph-mgr ceph-mgr-modules-core
  ceph-mon ceph-osd corosync cryptsetup-bin curl dirmngr distro-info-data dnsutils
  eject fdisk gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client
  gpg-wks-server gpgconf gpgsm gpgv gzip krb5-locales libarchive13 libblkid1
  libc-bin libc-dev-bin libc-devtools libc-l10n libc6 libc6-dev libc6-i386
  libcephfs2 libcfg7 libcmap4 libcorosync-common4 libcpg4 libcryptsetup12
  libcurl3-gnutls libcurl4 libexpat1 libfdisk1 libflac8 libgmp10 libgssapi-krb5-2
  libgssrpc4 libjaeger libjs-jquery-ui libk5crypto3 libknet1 libkrad0 libkrb5-3
  libkrb5support0 libldap-2.4-2 libldap-common libldb2 liblzma5 libmount1 libnozzle1
  libnss-systemd libnss3 libntfs-3g883 libnvpair3linux libpam-modules
  libpam-modules-bin libpam-runtime libpam-systemd libpam0g libperl5.32
  libproxmox-acme-perl libproxmox-acme-plugins libproxmox-backup-qemu0
  libpve-access-control libpve-cluster-api-perl libpve-cluster-perl
  libpve-common-perl libpve-guest-common-perl libpve-http-server-perl libpve-rs-perl
  libpve-storage-perl libpve-u2f-server-perl libquorum5 librados2 libradosstriper1
  librbd1 librgw2 libsasl2-2 libsasl2-modules-db libseccomp2 libsmartcols1
  libsmbclient libssl1.1 libsystemd0 libtiff5 libudev1 libuuid1 libuutil3linux
  libvotequorum8 libwbclient0 libxml2 libzfs4linux linux-libc-dev locales lxc-pve
  lxcfs lynx lynx-common mount novnc-pve ntfs-3g openssl perl perl-base
  perl-modules-5.32 proxmox-backup-client proxmox-backup-file-restore
  proxmox-mini-journalreader proxmox-ve proxmox-widget-toolkit pve-cluster
  pve-container pve-docs pve-edk2-firmware pve-firewall pve-firmware pve-ha-manager
  pve-i18n pve-kernel-5.11 pve-kernel-5.11.22-3-pve pve-kernel-helper
  pve-lxc-syscalld pve-manager pve-qemu-kvm pve-xtermjs python3-ceph-argparse
  python3-ceph-common python3-cephfs python3-ldb python3-pil python3-rados
  python3-rbd python3-reportbug python3-rgw python3-waitress qemu-server reportbug
  rsync samba-common samba-libs smartmontools smbclient spl systemd systemd-sysv
  systemd-timesyncd sysvinit-utils tasksel tasksel-data tzdata udev usb.ids
  util-linux uuid-runtime vim-common vim-tiny wget xxd xz-utils zfs-initramfs
  zfs-zed zfsutils-linux zlib1g
185 upgraded, 24 newly installed, 0 to remove and 0 not upgraded.
Need to get 505 MB of archives.
After this operation, 949 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Please let me know if you have any guidance, I appreciate your support.

Tmanok

oguz

Proxmox Retired Staff


  • #2

hi,

* are you able to boot an older kernel at the grub menu? if yes, does it make a difference?

* have you checked for any BIOS upgrades for the servers?

* was any hardware changed recently?

  • #4

hi,

* are you able to boot an older kernel at the grub menu? if yes, does it make a difference?

* have you checked for any BIOS upgrades for the servers?

* was any hardware changed recently?

Hi Oguz,

  1. Yes. That is what we have been doing in the meantime: going back to 5.11 without issue, but we have noticed poor disk performance (see the fio sketch after this list).
  2. The BIOS is not the latest, but admittedly we don’t have a copy of the latest BIOS or the latest HBA firmware.
  3. No HW changes at all.
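To put a number on the disk-performance difference, the same read-only fio run can be repeated under 5.11 and 5.15 and compared; a sketch only, where /dev/sdX is a placeholder for one of the data disks and the parameters are arbitrary:

Code:

    # read-only random-read benchmark; run the identical command under each kernel
    fio --name=kernel-compare --filename=/dev/sdX --readonly \
        --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
        --runtime=60 --time_based --direct=1 --group_reporting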

Hi Neobin,

You may be on the right track about why we cannot boot into 5.15. However, it does not explain our disk performance degradation, so booting into 5.15 with IOMMU disabled is probably worth a try. Also, to clarify, HPE Gen8 corresponds to the Dell R#20 series, so R620 = HP DL360 Gen8 and R720 = HP DL380 Gen8. ;)
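If you do test 5.15 with IOMMU off, the usual way on a GRUB-booted PVE node is a kernel command-line flag; a sketch, assuming Intel CPUs (Gen8 ProLiant) and a stock /etc/default/grub:

Code:

    # /etc/default/grub -- append the flag to the existing line, e.g.:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"

    # regenerate the boot config and reboot into 5.15
    update-grub
    # (on nodes booted via proxmox-boot-tool, run "proxmox-boot-tool refresh" instead)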

Thank you both for your time and support!

Tmanok

Last edited: Jun 8, 2022

Question: Replacement RAID controller on System x3650 M5


  • Thread starter

    Ingenetic


  • Start date

    Mar 6, 2020





  • #1

Hi everyone,

I want to ask for some information. I have one System x3650 M5 server, using RAID 10 with 4 HDDs. Suddenly my server doesn’t reboot correctly; there is a warning:

L2/L3 Cache error was detected on the RAID controller.
"Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot."

Then I press X to continue, and the server boots normally into my CentOS 7.

For now, I plan to replace the RAID controller, which is a ServeRAID M1215 SAS/SATA.

I will replace it with the same model, an M1215.

The big question for me: will my existing RAID be destroyed when I replace the RAID controller, or will the new controller detect the RAID on the HDDs normally (because it’s the same model) and boot normally into my OS?

Please advise.

Thank you and regards,

InGenetic





  • #2

It depends on where the RAID information is stored: on the controller or on the drives. I’m not familiar enough to know the answer, but I know the guys on the servethehome forum would know this in a second.
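For what it's worth, MegaRAID-family cards (which the ServeRAID M1215 is based on) also keep a copy of the array configuration on the disks, so a same-model replacement normally offers to import it as a "foreign" configuration. A hedged sketch with StorCLI, assuming the replacement card enumerates as controller 0 and storcli64 is available; preview first and only import once it matches the old RAID 10:

# list any foreign (on-disk) configuration the replacement card can see
storcli64 /c0/fall show all

# dry-run the import, then import for real if the preview looks right
storcli64 /c0/fall import preview
storcli64 /c0/fall import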


Hi,

I was copying files to datastore2 on my ESXi 5.1.0 host when the copy process failed and the datastore showed negative free space. Restarting the machine didn’t help, but after some investigation I noticed that the RAID 0 logical drive has failed. The status of the physical disks shows OK. Here are the outputs:

/opt/hp/hpssacli/bin # ./hpssacli ctrl all show config detail

Smart Array P410 in Slot 4

   Bus Interface: PCI

   Slot: 4

   Serial Number: PACCRID122003KK

   Cache Serial Number: PAAVPID1152078F

   RAID 6 (ADG) Status: Disabled

   Controller Status: OK

   Hardware Revision: C

   Firmware Version: 5.14

   Rebuild Priority: Medium

   Expand Priority: Medium

   Surface Scan Delay: 3 secs

   Surface Scan Mode: Idle

   Queue Depth: Automatic

   Monitor and Performance Delay: 60  min

   Elevator Sort: Enabled

   Degraded Performance Optimization: Disabled

   Inconsistency Repair Policy: Disabled

   Wait for Cache Room: Disabled

   Surface Analysis Inconsistency Notification: Disabled

   Post Prompt Timeout: 15 secs

   Cache Board Present: True

   Cache Status: OK

   Cache Status Details: A cache error was detected. Run a diagnostic report for more information.

   Cache Ratio: 25% Read / 75% Write

   Drive Write Cache: Disabled

   Total Cache Size: 512 MB

   Total Cache Memory Available: 400 MB

   No-Battery Write Cache: Disabled

   Cache Backup Power Source: Batteries

   Battery/Capacitor Count: 1

   Battery/Capacitor Status: OK

   SATA NCQ Supported: True

   Number of Ports: 2 Internal only

   Driver Name: hpsa

   Driver Version: 5.0.0-21vmw

   Driver Supports HP SSD Smart Path: False

   Array: A

      Interface Type: SATA

      Unused Space: 0  MB

      Status: OK

      Array Type: Data

      Logical Drive: 1

         Size: 232.9 GB

         Fault Tolerance: 1

         Heads: 255

         Sectors Per Track: 32

         Cylinders: 59844

         Strip Size: 256 KB

         Full Stripe Size: 256 KB

         Status: OK

         Caching:  Enabled

         Unique Identifier: 600508B1001CBF6A42A47C82264D178D

         Disk Name: vmhba2:C0:T0:L1

         Mount Points: None

         Logical Drive Label: A0C6705BPACCRID122003KKE364

         Mirror Group 0:

            physicaldrive 2I:0:5 (port 2I:box 0:bay 5, SATA, 250 GB, OK)

         Mirror Group 1:

            physicaldrive 2I:0:6 (port 2I:box 0:bay 6, SATA, 250 GB, OK)

         Drive Type: Data

         LD Acceleration Method: Controller Cache

      physicaldrive 2I:0:5

         Port: 2I

         Box: 0

         Bay: 5

         Status: OK

         Drive Type: Data Drive

         Interface Type: SATA

         Size: 250 GB

         Native Block Size: 512

         Rotational Speed: 7200

         Firmware Revision: HPG7

         Serial Number: Z2AS7ZW0

         Model: ATA     VB0250EAVER

         SATA NCQ Capable: True

         SATA NCQ Enabled: True

         PHY Count: 1

         PHY Transfer Rate: 3.0Gbps

      physicaldrive 2I:0:6

         Port: 2I

         Box: 0

         Bay: 6

         Status: OK

         Drive Type: Data Drive

         Interface Type: SATA

         Size: 250 GB

         Native Block Size: 512

         Rotational Speed: 7200

         Firmware Revision: HPG7

         Serial Number: Z2ARX763

         Model: ATA     VB0250EAVER

         SATA NCQ Capable: True

         SATA NCQ Enabled: True

         PHY Count: 1

         PHY Transfer Rate: 3.0Gbps

   Array: B

      Interface Type: SATA

      Unused Space: 0  MB

      Status: OK

      Array Type: Data

      Logical Drive: 2

         Size: 5.5 TB

         Fault Tolerance: 0

         Heads: 255

         Sectors Per Track: 32

         Cylinders: 65535

         Strip Size: 256 KB

         Full Stripe Size: 256 KB

         Status: Failed

         Caching:  Enabled

         Unique Identifier: 600508B1001C29307A23E7A75EAFF878

         Disk Name: unknown

         Mount Points: None

         Logical Drive Label: A019CCEFPACCRID122003KK75DB

         Drive Type: Data

         LD Acceleration Method: Controller Cache

      physicaldrive 2I:0:7

         Port: 2I

         Box: 0

         Bay: 7

         Status: OK

         Drive Type: Data Drive

         Interface Type: SATA

         Size: 3 TB

         Native Block Size: 4096

         Rotational Speed: 7200

         Firmware Revision: CV13

         Serial Number:             Z1F1159J

         Model: ATA     ST3000VX000-9YW1

         SATA NCQ Capable: True

         SATA NCQ Enabled: True

         PHY Count: 1

         PHY Transfer Rate: 3.0Gbps

      physicaldrive 2I:0:8

         Port: 2I

         Box: 0

         Bay: 8

         Status: OK

         Drive Type: Data Drive

         Interface Type: SATA

         Size: 3 TB

         Native Block Size: 4096

         Rotational Speed: 7200

         Firmware Revision: CV13

         Serial Number:             Z1F115AR

         Model: ATA     ST3000VX000-9YW1

         SATA NCQ Capable: True

         SATA NCQ Enabled: True

         PHY Count: 1

         PHY Transfer Rate: 3.0Gbps

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250

      Device Number: 250

      Firmware Version: RevC

      WWID: 5001438021D4731F

      Vendor ID: PMCSIERA

      Model:  SRC 8x6G

/opt/hp/hpssacli/bin # ./hpssacli ctrl all show config

Smart Array P410 in Slot 4                (sn: PACCRID122003KK)

   array A (SATA, Unused Space: 0  MB)

      logicaldrive 1 (232.9 GB, RAID 1, OK)

      physicaldrive 2I:0:5 (port 2I:box 0:bay 5, SATA, 250 GB, OK)

      physicaldrive 2I:0:6 (port 2I:box 0:bay 6, SATA, 250 GB, OK)

   array B (SATA, Unused Space: 0  MB)

      logicaldrive 2 (5.5 TB, RAID 0, Failed)

      physicaldrive 2I:0:7 (port 2I:box 0:bay 7, SATA, 3 TB, OK)

      physicaldrive 2I:0:8 (port 2I:box 0:bay 8, SATA, 3 TB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 5001438021D4731F)

What can I do to recover the data? If I delete and recreate the RAID 0 volume will the data be lost?

Any pointers appreciated!
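Since both member disks still show OK, one thing sometimes worth trying before deleting anything is asking the controller to re-enable the failed logical drive. A cautious sketch using the same hpssacli binary as above; the forced re-enable can cause data loss if a disk really did drop mid-write, so image the drives or confirm backups first, and treat it as a last resort rather than a guaranteed fix:

/opt/hp/hpssacli/bin # ./hpssacli ctrl slot=4 show status
/opt/hp/hpssacli/bin # ./hpssacli ctrl slot=4 ld 2 show
/opt/hp/hpssacli/bin # ./hpssacli ctrl slot=4 ld 2 modify reenable forced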

Replacement RAID controller on System x3650 M5

Ingenetic

New Member

I want to ask for some information. I have one System x3650 M5 server, using RAID 10 with 4 HDDs. Suddenly my server doesn’t reboot correctly; there is a warning:

L2/L3 Cache error was detected on the RAID controller.
"Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot."

Then I press X to continue, and the server boots normally into my CentOS 7.

For now, I plan to replace the RAID controller, which is a ServeRAID M1215 SAS/SATA.

I will replace it with the same model, an M1215.

The big question for me: will my existing RAID be destroyed when I replace the RAID controller, or will the new controller detect the RAID on the HDDs normally (because it’s the same model) and boot normally into my OS?

Thank you and regards,

kapone

Well-Known Member

Ingenetic

New Member

I’m using RAID 10 with 4 HDDs, but suddenly I got the warning shown in the first post.
I’m not experienced with RAID controller replacement; I’ve only ever had one hard drive fail and replaced it with a new one.

ari2asem

Active Member

My experience is with Areca cards.

I have a machine with 12 HDDs; it went dead about 15 years ago. It was a RAID-6 setup with 12x 250 GB drives and no hot spare. The Areca card had 16 real SATA ports, not SFF ports.

Almost a year ago I bought another Areca card, with 4x SFF-8087 ports. Connecting the 12 HDDs via 3x SFF-8087-to-SATA cables to that totally different Areca card, I could recover my files with a recovery program (GetDataBack) in Windows 10.

I didn’t change any RAID setting or any file system. My RAID volume was not visible under Windows XP 15 years ago, and it was not visible under Windows 10 either.

I just swapped Areca cards, did no RAID rebuilding, ran GetDataBack (it was an NTFS file system), and recovered my files.

I would say: give it a shot and try it.

Your situation is not that bad, meaning you just replace your dead card with the same model.
I replaced my card with a totally different model (but the same brand) and I was able to recover my files.

Just try it and keep us updated with your progress.

Ingenetic

New Member


Thanks ari2asem for your advice.

Has anyone else here had the same experience? I mean replacing the RAID controller with the same model: how does it work? Will it have to rebuild the RAID, or will the controller detect the existing RAID and bring the system up normally with all HDDs detected?

Source


Intel Integrated RAID Module RMS3CC0 reporting L2/L3 cache error (Intel S2600WFT)


I have a server that refuses to boot. I’ve tried three replacement RMS3CC0 RAID controllers, but I always get stopped on boot (in the POST environment) by the "Driver Health Manager":

It tells me that an L2/L3 Cache error was detected on the RAID controller:

I’ve tried the F9 (reset to defaults) option, entering some text, and pressing F10. I’ve also tried hitting X and Ctrl-X, but I cannot resume from this screen. I’ve also tried disconnecting the power, opening the unit, and disconnecting the supercapacitor for the RAID mezzanine board for 10 minutes, and I’ve tried three different controller cards (all the same make and model). Any idea how to get the server past this error? We’re not concerned with saving any live data on the server; it is part of a test environment, and we were rebuilding it when the anomaly was discovered. In operation, we were getting various filesystem errors on the volumes controlled by this RAID controller (mostly on the JBOD SSD root filesystem) that resulted in CentOS 7.5 changing the volume to read-only status. We also tried three different Intel Data Center edition SSDs, with no change.
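One additional data point on these Intel server boards is the BMC's system event log, which records hardware errors independently of the OS and can help support narrow things down; a minimal sketch using ipmitool from another machine, with placeholder BMC address and credentials:

# read the server's event log over the network (values in <> are placeholders)
ipmitool -I lanplus -H <bmc-ip> -U <user> -P <password> sel elist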

Source


[Solved] LSI 9341-8i L2/L3 Cache Error.

New Member

I have just experienced what I presume is a hardware malfunction on my SAS9341-8i RAID card.
While the server was running, the RAID suddenly got a lot of I/O errors from a program that was writing to it, and then the RAID disappeared in Windows.
After reboot, I now get an I/O error in Device Manager in Windows Server 2012 R2.
And after POST, it tells me: L2/L3 Cache error was detected on the RAID controller.
"Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot."

Tom5051

Active Member

These cards are generally pretty reliable when they report a problem like this. I would suggest replacing the controller with another one that is known to work; hopefully you can borrow one from a friend?
Otherwise replace the controller; a firmware update is unlikely to succeed or to cure the card.
Does it have the correct airflow over the card? They get pretty hot.

Also, you didn’t say what RAID level the array was built with. It’s possible that with a replacement controller the array will still be optimal, but there is always the chance that it has degraded or failed.
You may need backups.
The replacement controller will attempt to read the array configuration from the disks if it is not corrupt.
Quite often I move an array of 8 disks between servers, and the RAID cards pick up the array config and boot no problem.

New Member

Thanks for your reply.

The only option I have to replace it is to buy a brand new one, so it’s not that easy.
And here I have some doubts about what will happen to the existing RAID if I replace the card.

It’s a RAID 5 with 8 disks, and all disks seem to be fine.
But again, I cannot open the MegaRAID software in Windows anymore.
Though in the MegaRAID config during boot, it says that the array is optimal etc. and finds all disks.

What strikes me as strange with this "L2/L3 cache error" is that the card doesn’t have any cache?

From what I read the card is known to run hot normally, and it has been, at around 90 degrees Celsius.
There has been no dedicated fan pointed straight at the card, but plenty of case airflow passing it, which I thought was sufficient.
I have now set a 120 mm fan straight on the card, but it is probably too late.

Something still tells me that it would be strange if this were a hardware fault and permanent damage?
Especially since the card does not have any cache?

Thanks again for any help, it is very much appreciated! Since my RAID is currently down 🙁

Best regards
Apil

Tom5051

Active Member
New Member

I agree; the first thing I did was try to Google it, and I got absolutely nothing.

Do you have any idea how I can check whether the read/write cache is enabled/disabled?
Is it a jumper on the board, or a BIOS setting?

Best regards
Apil
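As far as I know there is no jumper for it; on MegaRAID cards the read/write cache policy is a per-virtual-drive setting, visible in the Ctrl-R utility under the VD properties or from the OS with StorCLI. A minimal sketch, assuming storcli64 is installed and the card is controller 0; note that the 9341-8i has no cache DRAM, so it normally runs write-through regardless:

# show the cache/IO policy for every virtual drive on controller 0
storcli64 /c0/vall show all | grep -iE 'cache|wrcache|rdcache'

# example of changing it (write-through, no read-ahead); adjust /v0 to the right VD
storcli64 /c0/v0 set wrcache=wt rdcache=nora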

Tom5051

Active Member

Tom5051

Active Member
New Member

Yeah, I can still access it after this error; I was looking around in there yesterday and didn’t find anything interesting.
Do you know which option in there to disable/enable?
Else I’ll try to have a look around again.

The server is a Dell T20 that I pulled out and placed in a custom rack-mounted case with added case fans: a Xeon E3-1225 v2/v3 (can’t remember), an Intel dual-gigabit NIC, and 8x WD Red 3 TB disks, with a Kingston 120 SSD as the system disk.

I did update the BIOS and firmware of the motherboard and RAID card when I built it 6-12 months ago, because I was having problems getting the card to work (the classic "cannot start hardware", Error 10 in Device Manager). That seemed to be because the card does not have any RAM/cache, so I had to disable/enable some settings in the motherboard BIOS to get it to start, and since then it has been running flawlessly, until now.

New Member

Yeah, that makes sense. I just thought that since this is the 9341 version and not the 9361, there was no RAM/cache on the board, and therefore it utilized system RAM or the CPU’s cache, since it has no dedicated memory.

By the way, sorry for my bad English and lack of correct terms.

Tom5051

Active Member
New Member

If needed I can provide some more screenshots.

Thanks again
-Apil


vanfawx

Active Member

Unfortunately, I think it’s talking about the on-board L2/L3 cache of the RAID card’s CPU, not the on-board RAM cache. If the CPU’s L2/L3 cache has failed, then it’s a sign the CPU itself might be failing on the RAID card.

Hope that helps.

New Member

New Member

Maybe I should try to downgrade the FW?
[Edit] Trying to update the driver in Windows now.

New Member

vanfawx

Active Member

Tom5051

Active Member
New Member

stin9ray

New Member


Source
