L2/L3 cache error was detected on the RAID controller

Hi! :) I have just experienced what I presume is a hardware malfunction on my SAS9341-8i RAID card. While the server was running, the RAID suddenly got a lot of I/O errors from a program that was writing to it, and then the RAID disappeared in Windows. After reboot, I now get I/O...

Hi everybody,

thank you for posting the above. It helped to figure out what was going on.

And I have some good news as well: In my case I did not even have to re-flash the firmware. Here is a description of what happened to hopefully help others, but also for myself in case this happens again ;-)

Setup: I am using the controller for a FreeNAS VM running on ESXi, with the controller passed through to the VM. As is preferred for ZFS, I use JBOD only, so there was no controller-level RAID to worry about. In my case the controller is a SAS 3008 on the mobo.

Situation: shutting down the FreeNAS VM hard-reset or purple-screened the ESXi server. On the next boot vSphere would restart the VM and I’d be back to square one. Disabling vSphere HA helped to finally get into ESXi maintenance mode. However, somewhere in the half-dozen crashes or so, I am guessing, the configuration stored on the controller got corrupted.

In FreeNAS I saw this in the system log:

Jan 6 12:32:39 fns mfi0: <Fury> port 0xb000-0xb0ff mem 0xfcef0000-0xfcefffff,0xfcd00000-0xfcdfffff irq 17 at device 0.0 on pci28
Jan 6 12:32:39 fns mfi0: Using MSI
Jan 6 12:32:39 fns mfi0: Megaraid SAS driver Ver 4.23
Jan 6 12:32:39 fns mfi0: Firmware fault
Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6
Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6
Jan 6 12:32:39 fns mfi0: <Fury> port 0xb000-0xb0ff mem 0xfcef0000-0xfcefffff,0xfcd00000-0xfcdfffff irq 17 at device 0.0 on pci28
Jan 6 12:32:39 fns mfi0: Using MSI
Jan 6 12:32:39 fns mfi0: Megaraid SAS driver Ver 4.23
Jan 6 12:32:39 fns mfi0: Firmware fault
Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6
Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6

To make the nested setup work, I had the OpROM control for the controller disabled in the Intel mobo BIOS. After I went into the BIOS and re-enabled the OpROM:

  • F2 on boot to get into BIOS
  • «Setup Menu»
  • «Advanced»
  • «PCI Configuration»
  • «PCIe Port Oprom Control»
  • «Enabled» on all entries

On the next boot I got exactly the same error that Apil posted at the beginning of the thread:

L2L3_cache_error.jpg

Pressing X to continue and Ctrl-R to get into the RAID controller BIOS, I set the controller to factory defaults:

  • Ctrl-N twice to get to the «Ctrl Mgmt» page
  • Tab a lot to get to «Set Factory Defaults»
  • Ctrl-S to save
  • Esc a lot to get all the way out to the prompt that tells you to use Ctrl-Alt-Del

factory_reset.jpg

On the next boot the error was not there anymore, and it listed the connected physical (JBOD) drives instead, as per normal. Yes!

Clean-up: back into the mobo BIOS to disable the OpROM for the controller.

After booting ESXi, turning vSphere HA back on, and booting the FreeNAS VM, the controller, all the disks, and the ZFS mirrored pool were back as if nothing had ever happened.

:)
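For reference, the recovered controller and pool can be sanity-checked from inside FreeNAS with FreeBSD's mfiutil and zpool; a minimal sketch, assuming the card comes up as unit 0 (mfi0) and the JBOD/ZFS-mirror layout described above:

# confirm the controller attached and report its firmware/state
mfiutil -u 0 show adapter
mfiutil -u 0 show firmware
# list the JBOD drives the card sees
mfiutil -u 0 show drives
# and confirm ZFS is happy with the pool on top of them
zpool status -x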

Update 2018-09-08: I am glad I made this post, because it just saved my bacon again. Somebody (kids) stacked boxes in front of my home server rack, and I am assuming the controller overheated, being cooked by all the disks. The LSI controller probably got into an inconsistent state when it did a thermally triggered emergency shutdown, and I can’t really blame it for that. Anyhow, with my own instructions I got everything back up and running, but boy is it scary when your disks go missing.
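On MegaRAID cards that report it, the RoC temperature can also be watched from the OS with StorCLI, which makes it easier to catch this kind of overheating before the controller trips; a minimal sketch, assuming storcli64 is installed and the card is controller 0 (not every model exposes a temperature reading):

# dump everything the controller reports and pick out the temperature lines
storcli64 /c0 show all | grep -i temp
# quick overall health summary
storcli64 /c0 show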

  • #1

Why Hello There!

This evening I performed a regular upgrade on a client’s long-standing (~2 years) hyperconverged PVE cluster. This cluster has some history but has been trouble-free for the most part. The cluster is made up of mostly homogeneous hardware, all Gen8 HPE ProLiant servers, with one node being a different model from the rest. The boot devices are single SSDs.

The software upgrades went without issue: no warnings or errors from apt/dpkg, and migration of guests off each PVE node to other nodes before rebooting was occurring as expected. However, two things changed their behaviour upon reboot:

  1. The HBA (MegaRAID RAID Controller in IT/HBA Mode) now asks me to press «X» on startup claiming the following error:

    Code:

    Caution: Memory conflict detected. You may face boot problem.
    L2/L3 Cache error was detected on the RAID controller.
    Please contact technical support to resolve this issue. Press "X" to
    continue or else power off your system, replace the controller and reboot.

    Of course, a cluster full of HBAs doesn’t all simultaneously fail right after four Linux machines upgrade their kernels; no, this is a smart and valid detection by the HBA. However, I will need to refresh its memory in the Ctrl+R configuration menu, or, if someone has a method of validating the data on the boot partitions in Linux, I would greatly appreciate that (see the sketch after this list).

  2. Grub boots to 5.15.35-1 and runs into the following repeated errors:

    Code:

    Volume group "pve" not found
    Cannot process volume group pve

    (Then it ends with the following)

    Code:

    Gave up waiting for root file system device. Common problems:

    (TL;DR it said to increase the root delay or check for missing modules, then I get spat out to BusyBox.)
    The issue with increasing the root delay is that there is already a noticeable delay: the errors repeat about 10-15 times before quitting to BusyBox, so there is no further reasonable delay that could help.
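A rough recovery/validation sketch for both points above, assuming the root volume group really is named pve and the new kernel is 5.15.35-1-pve as in the messages; these are standard Debian/Proxmox tools, but treat it as a starting point rather than a verified fix:

Code:

    # 1) At the BusyBox (initramfs) prompt: try activating the VG by hand.
    #    If this works, the LVM metadata is fine and the initramfs/driver is suspect.
    lvm vgscan
    lvm vgchange -ay pve
    exit                      # resume boot once /dev/pve/root exists

    # 2) From a working (older-kernel) boot: rebuild the 5.15 initramfs and
    #    check that the LVM and storage-driver bits actually made it in.
    update-initramfs -u -k 5.15.35-1-pve
    lsinitramfs /boot/initrd.img-5.15.35-1-pve | grep -Ei 'lvm|megaraid|mpt3sas'

    # 3) Rough "validate the boot partition data" check: verify the installed
    #    kernel package's files against their checksums (debsums may need installing).
    apt install debsums
    debsums -s pve-kernel-5.15.35-1-pve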

The upgrade for all nodes included the same message (because they are always upgraded at the same time and have the same software installed):

Code:

The following packages were automatically installed and are no longer required:
  bsdmainutils golang-docker-credential-helpers libsecret-1-0 libsecret-common
  libzpool4linux pve-kernel-5.4.114-1-pve pve-kernel-5.4.119-1-pve
  python3-asn1crypto python3-dockerpycreds
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
  gnutls-bin libdrm-common libdrm2 libepoxy0 libgbm1 libgnutls-dane0 libjs-qrcodejs
  libjson-glib-1.0-0 libjson-glib-1.0-common libopts25 libposix-strptime-perl
  libproxmox-rs-perl libtpms0 libunbound8 libvirglrenderer1 libwayland-server0
  libzpool5linux proxmox-websocket-tunnel pve-kernel-5.11.22-7-pve pve-kernel-5.15
  pve-kernel-5.15.35-1-pve swtpm swtpm-libs swtpm-tools
The following packages will be upgraded:
  base-files bind9-dnsutils bind9-host bind9-libs bsdextrautils bsdutils btrfs-progs
  ceph ceph-base ceph-common ceph-fuse ceph-mds ceph-mgr ceph-mgr-modules-core
  ceph-mon ceph-osd corosync cryptsetup-bin curl dirmngr distro-info-data dnsutils
  eject fdisk gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client
  gpg-wks-server gpgconf gpgsm gpgv gzip krb5-locales libarchive13 libblkid1
  libc-bin libc-dev-bin libc-devtools libc-l10n libc6 libc6-dev libc6-i386
  libcephfs2 libcfg7 libcmap4 libcorosync-common4 libcpg4 libcryptsetup12
  libcurl3-gnutls libcurl4 libexpat1 libfdisk1 libflac8 libgmp10 libgssapi-krb5-2
  libgssrpc4 libjaeger libjs-jquery-ui libk5crypto3 libknet1 libkrad0 libkrb5-3
  libkrb5support0 libldap-2.4-2 libldap-common libldb2 liblzma5 libmount1 libnozzle1
  libnss-systemd libnss3 libntfs-3g883 libnvpair3linux libpam-modules
  libpam-modules-bin libpam-runtime libpam-systemd libpam0g libperl5.32
  libproxmox-acme-perl libproxmox-acme-plugins libproxmox-backup-qemu0
  libpve-access-control libpve-cluster-api-perl libpve-cluster-perl
  libpve-common-perl libpve-guest-common-perl libpve-http-server-perl libpve-rs-perl
  libpve-storage-perl libpve-u2f-server-perl libquorum5 librados2 libradosstriper1
  librbd1 librgw2 libsasl2-2 libsasl2-modules-db libseccomp2 libsmartcols1
  libsmbclient libssl1.1 libsystemd0 libtiff5 libudev1 libuuid1 libuutil3linux
  libvotequorum8 libwbclient0 libxml2 libzfs4linux linux-libc-dev locales lxc-pve
  lxcfs lynx lynx-common mount novnc-pve ntfs-3g openssl perl perl-base
  perl-modules-5.32 proxmox-backup-client proxmox-backup-file-restore
  proxmox-mini-journalreader proxmox-ve proxmox-widget-toolkit pve-cluster
  pve-container pve-docs pve-edk2-firmware pve-firewall pve-firmware pve-ha-manager
  pve-i18n pve-kernel-5.11 pve-kernel-5.11.22-3-pve pve-kernel-helper
  pve-lxc-syscalld pve-manager pve-qemu-kvm pve-xtermjs python3-ceph-argparse
  python3-ceph-common python3-cephfs python3-ldb python3-pil python3-rados
  python3-rbd python3-reportbug python3-rgw python3-waitress qemu-server reportbug
  rsync samba-common samba-libs smartmontools smbclient spl systemd systemd-sysv
  systemd-timesyncd sysvinit-utils tasksel tasksel-data tzdata udev usb.ids
  util-linux uuid-runtime vim-common vim-tiny wget xxd xz-utils zfs-initramfs
  zfs-zed zfsutils-linux zlib1g
185 upgraded, 24 newly installed, 0 to remove and 0 not upgraded.
Need to get 505 MB of archives.
After this operation, 949 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Please let me know if you have any guidance, I appreciate your support.

Tmanok

oguz

Proxmox Retired Staff


  • #2

hi,

* are you able to boot an older kernel at the grub menu? if yes, does it make a difference?

* have you checked for any BIOS upgrades for the servers?

* was any hardware changed recently?

  • #4

hi,

* are you able to boot an older kernel at the grub menu? if yes, does it make a difference?

* have you checked for any BIOS upgrades for the servers?

* was any hardware changed recently?

Hi Oguz,

  1. Yes. That is what we have been doing in the meantime: going back to 5.11 without issue, but we have noticed poor disk performance (see the fio sketch after this list).
  2. The BIOS is not the latest, but admittedly we don’t have a copy of the latest BIOS or the latest HBA firmware.
  3. No HW changes at all.
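To put a number on the disk-performance difference, the same read-only fio run can be repeated under 5.11 and 5.15 and compared; a sketch only, where /dev/sdX is a placeholder for one of the data disks and the parameters are arbitrary:

Code:

    # read-only random-read benchmark; run the identical command under each kernel
    fio --name=kernel-compare --filename=/dev/sdX --readonly \
        --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
        --runtime=60 --time_based --direct=1 --group_reporting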

Hi Neobin,

You may be on the right track about why we cannot boot into 5.15. However, it does not explain our disk performance degradation, so booting into 5.15 with IOMMU disabled is probably worth a try. Also, to clarify, HPE Gen8 corresponds to the Dell R#20 series, so R620 = HP DL360 Gen8 and R720 = HP DL380 Gen8. ;)
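If you do test 5.15 with IOMMU off, the usual way on a GRUB-booted PVE node is a kernel command-line flag; a sketch, assuming Intel CPUs (Gen8 ProLiant) and a stock /etc/default/grub:

Code:

    # /etc/default/grub -- append the flag to the existing line, e.g.:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"

    # regenerate the boot config and reboot into 5.15
    update-grub
    # (on nodes booted via proxmox-boot-tool, run "proxmox-boot-tool refresh" instead)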

Thank you both for your time and support!

Tmanok

Last edited: Jun 8, 2022

Question: Replacement RAID controller on System x3650 M5


  • Thread starter

    Ingenetic


  • Start date

    Mar 6, 2020





  • #1

Hi everyone,

I want to ask for some information. I have one System x3650 M5 server, using RAID 10 with 4 HDDs. Suddenly my server doesn’t reboot correctly; there is a warning:

L2/L3 Cache error was detected on the RAID controller.
"Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot."

Then I press X to continue, and the server boots normally into my CentOS 7.

For now, I plan to replace the RAID controller, which is a ServeRAID M1215 SAS/SATA.

I will replace it with the same model, an M1215.

The big question for me: will my existing RAID be destroyed when I replace the RAID controller, or will the new controller detect the RAID on the HDDs normally (because it’s the same model) and boot normally into my OS?

Please advise.

Thank you and regards,

InGenetic





  • #2

It depends on where the RAID information is stored: on the controller or on the drives. I’m not familiar enough to know the answer, but I know the guys on the servethehome forum would know this in a second.
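For what it's worth, MegaRAID-family cards (which the ServeRAID M1215 is based on) also keep a copy of the array configuration on the disks, so a same-model replacement normally offers to import it as a "foreign" configuration. A hedged sketch with StorCLI, assuming the replacement card enumerates as controller 0 and storcli64 is available; preview first and only import once it matches the old RAID 10:

# list any foreign (on-disk) configuration the replacement card can see
storcli64 /c0/fall show all

# dry-run the import, then import for real if the preview looks right
storcli64 /c0/fall import preview
storcli64 /c0/fall import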


Hi,

I was copying files to datastore2 on my ESXi 5.1.0 host when the copy process failed and the datastore showed negative free space. Restarting the machine didn’t help, but after some investigation I noticed that the RAID 0 logical drive has failed. The status of the physical disks shows OK. Here are the outputs:

/opt/hp/hpssacli/bin # ./hpssacli ctrl all show config detail

Smart Array P410 in Slot 4

   Bus Interface: PCI

   Slot: 4

   Serial Number: PACCRID122003KK

   Cache Serial Number: PAAVPID1152078F

   RAID 6 (ADG) Status: Disabled

   Controller Status: OK

   Hardware Revision: C

   Firmware Version: 5.14

   Rebuild Priority: Medium

   Expand Priority: Medium

   Surface Scan Delay: 3 secs

   Surface Scan Mode: Idle

   Queue Depth: Automatic

   Monitor and Performance Delay: 60  min

   Elevator Sort: Enabled

   Degraded Performance Optimization: Disabled

   Inconsistency Repair Policy: Disabled

   Wait for Cache Room: Disabled

   Surface Analysis Inconsistency Notification: Disabled

   Post Prompt Timeout: 15 secs

   Cache Board Present: True

   Cache Status: OK

   Cache Status Details: A cache error was detected. Run a diagnostic report for more information.

   Cache Ratio: 25% Read / 75% Write

   Drive Write Cache: Disabled

   Total Cache Size: 512 MB

   Total Cache Memory Available: 400 MB

   No-Battery Write Cache: Disabled

   Cache Backup Power Source: Batteries

   Battery/Capacitor Count: 1

   Battery/Capacitor Status: OK

   SATA NCQ Supported: True

   Number of Ports: 2 Internal only

   Driver Name: hpsa

   Driver Version: 5.0.0-21vmw

   Driver Supports HP SSD Smart Path: False

   Array: A

      Interface Type: SATA

      Unused Space: 0  MB

      Status: OK

      Array Type: Data

      Logical Drive: 1

         Size: 232.9 GB

         Fault Tolerance: 1

         Heads: 255

         Sectors Per Track: 32

         Cylinders: 59844

         Strip Size: 256 KB

         Full Stripe Size: 256 KB

         Status: OK

         Caching:  Enabled

         Unique Identifier: 600508B1001CBF6A42A47C82264D178D

         Disk Name: vmhba2:C0:T0:L1

         Mount Points: None

         Logical Drive Label: A0C6705BPACCRID122003KKE364

         Mirror Group 0:

            physicaldrive 2I:0:5 (port 2I:box 0:bay 5, SATA, 250 GB, OK)

         Mirror Group 1:

            physicaldrive 2I:0:6 (port 2I:box 0:bay 6, SATA, 250 GB, OK)

         Drive Type: Data

         LD Acceleration Method: Controller Cache

      physicaldrive 2I:0:5

         Port: 2I

         Box: 0

         Bay: 5

         Status: OK

         Drive Type: Data Drive

         Interface Type: SATA

         Size: 250 GB

         Native Block Size: 512

         Rotational Speed: 7200

         Firmware Revision: HPG7

         Serial Number: Z2AS7ZW0

         Model: ATA     VB0250EAVER

         SATA NCQ Capable: True

         SATA NCQ Enabled: True

         PHY Count: 1

         PHY Transfer Rate: 3.0Gbps

      physicaldrive 2I:0:6

         Port: 2I

         Box: 0

         Bay: 6

         Status: OK

         Drive Type: Data Drive

         Interface Type: SATA

         Size: 250 GB

         Native Block Size: 512

         Rotational Speed: 7200

         Firmware Revision: HPG7

         Serial Number: Z2ARX763

         Model: ATA     VB0250EAVER

         SATA NCQ Capable: True

         SATA NCQ Enabled: True

         PHY Count: 1

         PHY Transfer Rate: 3.0Gbps

   Array: B

      Interface Type: SATA

      Unused Space: 0  MB

      Status: OK

      Array Type: Data

      Logical Drive: 2

         Size: 5.5 TB

         Fault Tolerance: 0

         Heads: 255

         Sectors Per Track: 32

         Cylinders: 65535

         Strip Size: 256 KB

         Full Stripe Size: 256 KB

         Status: Failed

         Caching:  Enabled

         Unique Identifier: 600508B1001C29307A23E7A75EAFF878

         Disk Name: unknown

         Mount Points: None

         Logical Drive Label: A019CCEFPACCRID122003KK75DB

         Drive Type: Data

         LD Acceleration Method: Controller Cache

      physicaldrive 2I:0:7

         Port: 2I

         Box: 0

         Bay: 7

         Status: OK

         Drive Type: Data Drive

         Interface Type: SATA

         Size: 3 TB

         Native Block Size: 4096

         Rotational Speed: 7200

         Firmware Revision: CV13

         Serial Number:             Z1F1159J

         Model: ATA     ST3000VX000-9YW1

         SATA NCQ Capable: True

         SATA NCQ Enabled: True

         PHY Count: 1

         PHY Transfer Rate: 3.0Gbps

      physicaldrive 2I:0:8

         Port: 2I

         Box: 0

         Bay: 8

         Status: OK

         Drive Type: Data Drive

         Interface Type: SATA

         Size: 3 TB

         Native Block Size: 4096

         Rotational Speed: 7200

         Firmware Revision: CV13

         Serial Number:             Z1F115AR

         Model: ATA     ST3000VX000-9YW1

         SATA NCQ Capable: True

         SATA NCQ Enabled: True

         PHY Count: 1

         PHY Transfer Rate: 3.0Gbps

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250

      Device Number: 250

      Firmware Version: RevC

      WWID: 5001438021D4731F

      Vendor ID: PMCSIERA

      Model:  SRC 8x6G

/opt/hp/hpssacli/bin # ./hpssacli ctrl all show config

Smart Array P410 in Slot 4                (sn: PACCRID122003KK)

   array A (SATA, Unused Space: 0  MB)

      logicaldrive 1 (232.9 GB, RAID 1, OK)

      physicaldrive 2I:0:5 (port 2I:box 0:bay 5, SATA, 250 GB, OK)

      physicaldrive 2I:0:6 (port 2I:box 0:bay 6, SATA, 250 GB, OK)

   array B (SATA, Unused Space: 0  MB)

      logicaldrive 2 (5.5 TB, RAID 0, Failed)

      physicaldrive 2I:0:7 (port 2I:box 0:bay 7, SATA, 3 TB, OK)

      physicaldrive 2I:0:8 (port 2I:box 0:bay 8, SATA, 3 TB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 5001438021D4731F)

What can I do to recover the data? If I delete and recreate the RAID 0 volume will the data be lost?

Any pointers appreciated!
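Since both member disks still show OK, one thing sometimes worth trying before deleting anything is asking the controller to re-enable the failed logical drive. A cautious sketch using the same hpssacli binary as above; the forced re-enable can cause data loss if a disk really did drop mid-write, so image the drives or confirm backups first, and treat it as a last resort rather than a guaranteed fix:

/opt/hp/hpssacli/bin # ./hpssacli ctrl slot=4 show status
/opt/hp/hpssacli/bin # ./hpssacli ctrl slot=4 ld 2 show
/opt/hp/hpssacli/bin # ./hpssacli ctrl slot=4 ld 2 modify reenable forced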

Replacement RAID controller on System x3650 M5

Ingenetic

New Member

I want to ask for some information. I have one System x3650 M5 server, using RAID 10 with 4 HDDs. Suddenly my server doesn’t reboot correctly; there is a warning:

L2/L3 Cache error was detected on the RAID controller.
"Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot."

Then I press X to continue, and the server boots normally into my CentOS 7.

For now, I plan to replace the RAID controller, which is a ServeRAID M1215 SAS/SATA.

I will replace it with the same model, an M1215.

The big question for me: will my existing RAID be destroyed when I replace the RAID controller, or will the new controller detect the RAID on the HDDs normally (because it’s the same model) and boot normally into my OS?

Thank you and regards,

kapone

Well-Known Member

Ingenetic

New Member

I’m using RAID 10 with 4 HDDs, but suddenly I got the warning shown in the first post.
I’m not experienced with RAID controller replacement; I’ve only ever had one hard drive fail and replaced it with a new one.

ari2asem

Active Member

My experience is with Areca cards.

I have a machine with 12 HDDs; it went dead about 15 years ago. It was a RAID-6 setup with 12x 250 GB drives and no hot spare. The Areca card had 16 real SATA ports, not SFF ports.

Almost a year ago I bought another Areca card, with 4x SFF-8087 ports. Connecting the 12 HDDs via 3x SFF-8087-to-SATA cables to that totally different Areca card, I could recover my files with a recovery program (GetDataBack) in Windows 10.

I didn’t change any RAID setting or any file system. My RAID volume was not visible under Windows XP 15 years ago, and it was not visible under Windows 10 either.

I just swapped Areca cards, did no RAID rebuilding, ran GetDataBack (it was an NTFS file system), and recovered my files.

I would say: give it a shot and try it.

Your situation is not that bad, meaning you just replace your dead card with the same model.
I replaced my card with a totally different model (but the same brand) and I was able to recover my files.

Just try it and keep us updated with your progress.

Ingenetic

New Member


Thanks ari2asem for your advice.

Has anyone else here had the same experience? I mean replacing the RAID controller with the same model: how does it work? Will it have to rebuild the RAID, or will the controller detect the existing RAID and bring the system up normally with all HDDs detected?

Source


Intel Integrated RAID Module RMS3CC0 reporting L2/L3 cache error (Intel S2600WFT)


I have a server that refuses to boot. I’ve tried three replacement RMS3CC0 RAID controllers, but I always get stopped on boot (in the POST environment) by the "Driver Health Manager":

It tells me that an L2/L3 Cache error was detected on the RAID controller:

I’ve tried the F9 (reset to defaults) option, entering some text, and pressing F10. I’ve also tried hitting X and Ctrl-X, but I cannot resume from this screen. I’ve also tried disconnecting the power, opening the unit, and disconnecting the supercapacitor for the RAID mezzanine board for 10 minutes, and I’ve tried three different controller cards (all the same make and model). Any idea how to get the server past this error? We’re not concerned with saving any live data on the server; it is part of a test environment, and we were rebuilding it when the anomaly was discovered. In operation, we were getting various filesystem errors on the volumes controlled by this RAID controller (mostly on the JBOD SSD root filesystem) that resulted in CentOS 7.5 changing the volume to read-only status. We also tried three different Intel Data Center edition SSDs, with no change.
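One additional data point on these Intel server boards is the BMC's system event log, which records hardware errors independently of the OS and can help support narrow things down; a minimal sketch using ipmitool from another machine, with placeholder BMC address and credentials:

# read the server's event log over the network (values in <> are placeholders)
ipmitool -I lanplus -H <bmc-ip> -U <user> -P <password> sel elist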

Source


[Solved] LSI 9341-8i L2/L3 Cache Error.

New Member

I have just experienced what I presume is a hardware malfunction on my SAS9341-8i RAID card.
While the server was running, the RAID suddenly got a lot of I/O errors from a program that was writing to it, and then the RAID disappeared in Windows.
After reboot, I now get an I/O error in Device Manager in Windows Server 2012 R2.
And after POST, it tells me: L2/L3 Cache error was detected on the RAID controller.
"Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot."

Tom5051

Active Member

These cards are generally pretty reliable when they report a problem like this. I would suggest replacing the controller with another one that is known to work; hopefully you can borrow one from a friend?
Otherwise replace the controller; a firmware update is unlikely to succeed or to cure the card.
Does it have the correct airflow over the card? They get pretty hot.

Also, you didn’t say what RAID level the array was built with. It’s possible that with a replacement controller the array will still be optimal, but there is always the chance that it has degraded or failed.
You may need backups.
The replacement controller will attempt to read the array configuration from the disks if it is not corrupt.
Quite often I move an array of 8 disks between servers, and the RAID cards pick up the array config and boot no problem.

New Member

Thanks for your reply.

The only option I have to replace it is to buy a brand new one, so it’s not that easy.
And here I have some doubts about what will happen to the existing RAID if I replace the card.

It’s a RAID 5 with 8 disks, and all disks seem to be fine.
But again, I cannot open the MegaRAID software in Windows anymore.
Though in the MegaRAID config during boot, it says that the array is optimal etc. and finds all disks.

What strikes me as strange with this "L2/L3 cache error" is that the card doesn’t have any cache?

From what I read the card is known to run hot normally, and it has been, at around 90 degrees Celsius.
There has been no dedicated fan pointed straight at the card, but plenty of case airflow passing it, which I thought was sufficient.
I have now set a 120 mm fan straight on the card, but it is probably too late.

Something still tells me that it would be strange if this were a hardware fault and permanent damage?
Especially since the card does not have any cache?

Thanks again for any help, it is very much appreciated! Since my RAID is currently down 🙁

Best regards
Apil

Tom5051

Active Member
New Member

I agree; the first thing I did was try to Google it, and I got absolutely nothing.

Do you have any idea how I can check whether the read/write cache is enabled/disabled?
Is it a jumper on the board, or a BIOS setting?

Best regards
Apil
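As far as I know there is no jumper for it; on MegaRAID cards the read/write cache policy is a per-virtual-drive setting, visible in the Ctrl-R utility under the VD properties or from the OS with StorCLI. A minimal sketch, assuming storcli64 is installed and the card is controller 0; note that the 9341-8i has no cache DRAM, so it normally runs write-through regardless:

# show the cache/IO policy for every virtual drive on controller 0
storcli64 /c0/vall show all | grep -iE 'cache|wrcache|rdcache'

# example of changing it (write-through, no read-ahead); adjust /v0 to the right VD
storcli64 /c0/v0 set wrcache=wt rdcache=nora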

Tom5051

Active Member

Tom5051

Active Member
New Member

Yeah, I can still access it after this error; I was looking around in there yesterday and didn’t find anything interesting.
Do you know which option in there to disable/enable?
Else I’ll try to have a look around again.

The server is a Dell T20 that I pulled out and placed in a custom rack-mounted case with added case fans: a Xeon E3-1225 v2/v3 (can’t remember), an Intel dual-gigabit NIC, and 8x WD Red 3 TB disks, with a Kingston 120 SSD as the system disk.

I did update the BIOS and firmware of the motherboard and RAID card when I built it 6-12 months ago, because I was having problems getting the card to work (the classic "cannot start hardware", Error 10 in Device Manager). That seemed to be because the card does not have any RAM/cache, so I had to disable/enable some settings in the motherboard BIOS to get it to start, and since then it has been running flawlessly, until now.

New Member

Yeah, that makes sense. I just thought that since this is the 9341 version and not the 9361, there was no RAM/cache on the board, and therefore it utilized system RAM or the CPU’s cache, since it has no dedicated memory.

By the way, sorry for my bad English and lack of correct terms.

Tom5051

Active Member
New Member

If needed I can provide some more screenshots.

Thanks again
-Apil


vanfawx

Active Member

Unfortunately, I think it’s talking about the on-board L2/L3 cache of the RAID card’s CPU, not the on-board RAM cache. If the CPU’s L2/L3 cache has failed, then it’s a sign the CPU itself might be failing on the RAID card.

Hope that helps.

New Member

New Member

Maybe I should try to downgrade the FW?
[Edit] Trying to update the driver in Windows now.

New Member

vanfawx

Active Member

Tom5051

Active Member
New Member

stin9ray

New Member


Source
