Hi everybody,
thank you for posting the above. It helped me figure out what was going on.
And I have some good news as well: in my case I did not even have to re-flash the firmware. Here is a description of what happened, to hopefully help others, but also for myself in case this happens again.
Setup: I am using the controller for a FreeNAS VM running on ESXi, with the controller passed through to the VM. As preferred for ZFS usage I use JBOD only, so there was no controller-level RAID to worry about. In my case the controller is a SAS 3008 on the motherboard.
Situation: shutting down the FreeNAS VM hard-reset or purple-screened the ESXi server. On the next boot vSphere would restart the VM and I'd be back to square one. Disabling vSphere HA helped to finally get into ESXi maintenance mode. However, somewhere in the half dozen crashes or so, I am guessing the configuration stored on the controller got corrupted.
In FreeNAS I saw this in the system log:
Jan 6 12:32:39 fns mfi0: <Fury> port 0xb000-0xb0ff mem 0xfcef0000-0xfcefffff,0xfcd00000-0xfcdfffff irq 17 at device 0.0 on pci28
Jan 6 12:32:39 fns mfi0: Using MSI
Jan 6 12:32:39 fns mfi0: Megaraid SAS driver Ver 4.23
Jan 6 12:32:39 fns mfi0: Firmware fault
Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6
Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6
Jan 6 12:32:39 fns mfi0: <Fury> port 0xb000-0xb0ff mem 0xfcef0000-0xfcefffff,0xfcd00000-0xfcdfffff irq 17 at device 0.0 on pci28
Jan 6 12:32:39 fns mfi0: Using MSI
Jan 6 12:32:39 fns mfi0: Megaraid SAS driver Ver 4.23
Jan 6 12:32:39 fns mfi0: Firmware fault
Jan 6 12:32:39 fns mfi0: Firmware not in READY state, error 6
Jan 6 12:32:39 fns device_attach: mfi0 attach returned 6
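If it helps anyone, that failure signature can be counted mechanically from a saved copy of the log. This is just a sketch; the helper name `mfi_fault_lines` and the example path are mine, not from the post:

```sh
# Sketch: count mfi(4) firmware-fault lines in a saved kernel log.
# The function name and example log path are illustrative.
mfi_fault_lines() {
  grep -Ec 'mfi[0-9]+: (Firmware fault|Firmware not in READY state)' "$1"
}
```

Running `mfi_fault_lines /var/log/messages` prints the number of matching lines; anything non-zero means the driver saw the controller stuck outside the READY state.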
To make the nested setup work, I had the OpROM control for the controller disabled in the Intel motherboard BIOS. So I went into the BIOS and re-enabled the OpROM:
- F2 on boot to get into BIOS
- «Setup Menu»
- «Advanced»
- «PCI Configuration»
- «PCIe Port Oprom Control»
- «Enabled» on all entries
On the next boot I got exactly the same error as Apil posted at the beginning of the thread.
Pressing X to continue and Ctrl-R to get into the RAID controller BIOS, I set the controller to factory defaults:
- Ctrl-N twice to get to the «Ctrl Mgmt» page
- Tab repeatedly to get to «Set Factory Defaults»
- Ctrl-S to save
- Esc repeatedly to get all the way out to the prompt that tells you to use Ctrl-Alt-Del
On the next boot the error was gone, and it listed the connected physical (JBOD) drives instead, as per normal. Yes!
Clean-up: back into the motherboard BIOS to disable the OpROM for the controller.
After booting ESXi, turning vSphere HA back on, and booting the FreeNAS VM, the controller, all the disks, and the ZFS mirrored pool were back as if nothing had ever happened.
Update 2018-09-08: I am glad I made this post because it just saved my bacon again. Somebody (kids) stacked boxes in front of my home server rack and I am assuming the controller overheated being cooked by all the disks. The LSI controller probably got into an inconsistent state when it did a thermally triggered emergency shut down, and I can’t really blame it for that. Anyhow, with my own instructions I got everything back up and running, but boy is it scary when your disks go missing.
-
#1
Why Hello There!
This evening I performed a regular upgrade on a client's long-standing (~2 years) hyperconverged PVE cluster. This cluster has some history but has mostly been trouble-free. It is made up of largely homogeneous hardware, all HPE ProLiant Gen8 servers, with one node being a different model from the rest. The boot devices are single SSDs.
The software upgrades went without issue: no warnings or errors with apt/dpkg, and migration off of PVE nodes to other nodes before rebooting was occurring as expected. However, two things changed their behaviour upon reboot:
- The HBA (MegaRAID controller in IT/HBA mode) now asks me to press «X» on startup, reporting the following error:
Code:
Caution: Memory conflict detected. You may face boot problem.
L2/L3 Cache error was detected on the RAID controller. Please contact technical support to resolve this issue. Press "X" to continue or else power off your system, replace the controller and reboot.
Of course, a cluster full of HBAs doesn't all simultaneously fail after four Linux machines upgrade their kernels; no, this is a smart and valid detection by the HBA. However, I will need to refresh its memory in the CTRL+R configuration menu, or, if someone has a method of validating the data on the boot partitions in Linux, I would greatly appreciate that.
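On the boot-partition question: one low-tech approach (a sketch; `boot_manifest` is a name I made up) is to build a checksum manifest of /boot on each node and diff it against a known-good node. `debsums` or `dpkg --verify` can likewise check packaged files against their recorded md5sums:

```sh
# Sketch: print a sorted "sha256  ./relative/path" manifest for a directory
# such as /boot, so manifests from two cluster nodes can be diffed.
boot_manifest() {
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 sha256sum )
}
```

e.g. `boot_manifest /boot > node1.sha256` on each node, then `diff node1.sha256 node2.sha256`. Note that initramfs images are generated per node, so those files will legitimately differ even between healthy nodes.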
- Grub boots to 5.15.35-1 and runs into the following repeated errors:
Code:
Volume group "pve" not found
Cannot process volume group pve
(Then it ends with the following)
Code:
Gave up waiting for root file system device. Common problems:
(TL;DR it said root delay or missing modules, then I get spat out to busybox)
The issue with the root delay increase is that there is already a noticeable delay, the errors repeat about 10-15 times before quitting to BusyBox so there’s no further reasonable delay that could help.
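For what it's worth, when those errors appear, the usual first diagnostic at the BusyBox/(initramfs) prompt is to ask LVM directly whether the volume group is visible at all. This is a generic sketch, not specific to the poster's setup:

```sh
# At the (initramfs) prompt, as a sketch:
lvm pvscan            # are the physical volumes detected at all?
lvm vgscan            # is the "pve" volume group found?
lvm vgchange -ay pve  # try to activate it manually
exit                  # if activation succeeded, resume booting
```

If manual activation works, the problem is usually a missing driver or a device that appears too late, rather than lost LVM metadata.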
The upgrade for all nodes included the same message (because they are always upgraded at the same time and have the same software installed):
Code:
The following packages were automatically installed and are no longer required:
bsdmainutils golang-docker-credential-helpers libsecret-1-0 libsecret-common
libzpool4linux pve-kernel-5.4.114-1-pve pve-kernel-5.4.119-1-pve
python3-asn1crypto python3-dockerpycreds
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
gnutls-bin libdrm-common libdrm2 libepoxy0 libgbm1 libgnutls-dane0 libjs-qrcodejs
libjson-glib-1.0-0 libjson-glib-1.0-common libopts25 libposix-strptime-perl
libproxmox-rs-perl libtpms0 libunbound8 libvirglrenderer1 libwayland-server0
libzpool5linux proxmox-websocket-tunnel pve-kernel-5.11.22-7-pve pve-kernel-5.15
pve-kernel-5.15.35-1-pve swtpm swtpm-libs swtpm-tools
The following packages will be upgraded:
base-files bind9-dnsutils bind9-host bind9-libs bsdextrautils bsdutils btrfs-progs
ceph ceph-base ceph-common ceph-fuse ceph-mds ceph-mgr ceph-mgr-modules-core
ceph-mon ceph-osd corosync cryptsetup-bin curl dirmngr distro-info-data dnsutils
eject fdisk gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client
gpg-wks-server gpgconf gpgsm gpgv gzip krb5-locales libarchive13 libblkid1
libc-bin libc-dev-bin libc-devtools libc-l10n libc6 libc6-dev libc6-i386
libcephfs2 libcfg7 libcmap4 libcorosync-common4 libcpg4 libcryptsetup12
libcurl3-gnutls libcurl4 libexpat1 libfdisk1 libflac8 libgmp10 libgssapi-krb5-2
libgssrpc4 libjaeger libjs-jquery-ui libk5crypto3 libknet1 libkrad0 libkrb5-3
libkrb5support0 libldap-2.4-2 libldap-common libldb2 liblzma5 libmount1 libnozzle1
libnss-systemd libnss3 libntfs-3g883 libnvpair3linux libpam-modules
libpam-modules-bin libpam-runtime libpam-systemd libpam0g libperl5.32
libproxmox-acme-perl libproxmox-acme-plugins libproxmox-backup-qemu0
libpve-access-control libpve-cluster-api-perl libpve-cluster-perl
libpve-common-perl libpve-guest-common-perl libpve-http-server-perl libpve-rs-perl
libpve-storage-perl libpve-u2f-server-perl libquorum5 librados2 libradosstriper1
librbd1 librgw2 libsasl2-2 libsasl2-modules-db libseccomp2 libsmartcols1
libsmbclient libssl1.1 libsystemd0 libtiff5 libudev1 libuuid1 libuutil3linux
libvotequorum8 libwbclient0 libxml2 libzfs4linux linux-libc-dev locales lxc-pve
lxcfs lynx lynx-common mount novnc-pve ntfs-3g openssl perl perl-base
perl-modules-5.32 proxmox-backup-client proxmox-backup-file-restore
proxmox-mini-journalreader proxmox-ve proxmox-widget-toolkit pve-cluster
pve-container pve-docs pve-edk2-firmware pve-firewall pve-firmware pve-ha-manager
pve-i18n pve-kernel-5.11 pve-kernel-5.11.22-3-pve pve-kernel-helper
pve-lxc-syscalld pve-manager pve-qemu-kvm pve-xtermjs python3-ceph-argparse
python3-ceph-common python3-cephfs python3-ldb python3-pil python3-rados
python3-rbd python3-reportbug python3-rgw python3-waitress qemu-server reportbug
rsync samba-common samba-libs smartmontools smbclient spl systemd systemd-sysv
systemd-timesyncd sysvinit-utils tasksel tasksel-data tzdata udev usb.ids
util-linux uuid-runtime vim-common vim-tiny wget xxd xz-utils zfs-initramfs
zfs-zed zfsutils-linux zlib1g
185 upgraded, 24 newly installed, 0 to remove and 0 not upgraded.
Need to get 505 MB of archives.
After this operation, 949 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Please let me know if you have any guidance, I appreciate your support.
Tmanok
oguz
Proxmox Retired Staff
-
#2
hi,
* are you able to boot an older kernel at the grub menu? if yes, does it make a difference?
* have you checked for any BIOS upgrades for the servers?
* was any hardware changed recently?
-
#4
Hi Oguz,
- Yes. That is what we have been doing in the meantime: going back to 5.11 without issue, but we have noticed poor disk performance.
- The BIOS is not the latest, but admittedly we don't have a copy of the latest BIOS or the latest HBA firmware.
- No hardware changes at all.
Hi Neobin,
You may be on the right track about why we cannot boot into 5.15. However, it does not explain our disk performance degradation, so booting into 5.15 with the IOMMU disabled is worth a try. Also, to clarify: HP Gen8 = Dell R#20 series, so R620 = HP DL360 Gen8 and R720 = HP DL380 Gen8.
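For reference, the IOMMU test would be a one-line kernel-parameter change. This is a sketch of /etc/default/grub; the rest of the line is whatever is already on the node:

```sh
# /etc/default/grub (sketch): append intel_iommu=off for the test boot
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"
```

followed by `update-grub` and a reboot; the parameter can be removed again afterwards.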
Thank you both for your time and support!
Tmanok
Last edited: Jun 8, 2022
Question Replacement raid controller on system x 3650 M5
Thread starter: Ingenetic
Start date: Mar 6, 2020
#1
I want to ask for some information. I have one System x3650 M5 server, using RAID 10 with 4 HDDs. Suddenly my server doesn't reboot correctly; there is a warning:
L2/L3 Cache error was detected on the RAID controller.
«Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot.»
I press X to continue, and the server boots normally into my CentOS 7.
For now, I plan to replace the RAID controller, which is a ServeRAID M1215 SAS/SATA,
with the same model, M1215.
The big question for me: will my existing RAID be destroyed when I replace the RAID controller? Or will the HDDs' RAID be detected normally because it is the same model, and the system boot normally into my OS?
Please advise.
Thank you and regards,
InGenetic
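For what it's worth, on a StorCLI-managed ServeRAID/MegaRAID card the existing array normally shows up on the replacement controller as a «foreign» configuration that can be previewed before it is imported. This is a sketch, assuming the storcli utility is installed; check the syntax against the documentation before running:

```sh
storcli /c0 show                 # controller, drive and array summary
storcli /c0/fall show            # list foreign configuration found on the disks
storcli /c0/fall import preview  # dry run: show what would be imported
storcli /c0/fall import          # import the array metadata from the disks
```

An import picks the array definition up from the disks' metadata; no rebuild should be needed if the disks themselves are healthy.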
Hi,
I was copying files onto datastore2 of my ESXi 5.1.0 host when the copy process failed and the datastore showed negative free space. Restarting the machine didn't help, but after some investigation I noticed that the RAID 0 logical drive has failed. The status of the physical disks shows OK. Here are the outputs:
/opt/hp/hpssacli/bin # ./hpssacli ctrl all show config detail
Smart Array P410 in Slot 4
Bus Interface: PCI
Slot: 4
Serial Number: PACCRID122003KK
Cache Serial Number: PAAVPID1152078F
RAID 6 (ADG) Status: Disabled
Controller Status: OK
Hardware Revision: C
Firmware Version: 5.14
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: OK
Cache Status Details: A cache error was detected. Run a diagnostic report for more information.
Cache Ratio: 25% Read / 75% Write
Drive Write Cache: Disabled
Total Cache Size: 512 MB
Total Cache Memory Available: 400 MB
No-Battery Write Cache: Disabled
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Number of Ports: 2 Internal only
Driver Name: hpsa
Driver Version: 5.0.0-21vmw
Driver Supports HP SSD Smart Path: False
Array: A
Interface Type: SATA
Unused Space: 0 MB
Status: OK
Array Type: Data
Logical Drive: 1
Size: 232.9 GB
Fault Tolerance: 1
Heads: 255
Sectors Per Track: 32
Cylinders: 59844
Strip Size: 256 KB
Full Stripe Size: 256 KB
Status: OK
Caching: Enabled
Unique Identifier: 600508B1001CBF6A42A47C82264D178D
Disk Name: vmhba2:C0:T0:L1
Mount Points: None
Logical Drive Label: A0C6705BPACCRID122003KKE364
Mirror Group 0:
physicaldrive 2I:0:5 (port 2I:box 0:bay 5, SATA, 250 GB, OK)
Mirror Group 1:
physicaldrive 2I:0:6 (port 2I:box 0:bay 6, SATA, 250 GB, OK)
Drive Type: Data
LD Acceleration Method: Controller Cache
physicaldrive 2I:0:5
Port: 2I
Box: 0
Bay: 5
Status: OK
Drive Type: Data Drive
Interface Type: SATA
Size: 250 GB
Native Block Size: 512
Rotational Speed: 7200
Firmware Revision: HPG7
Serial Number: Z2AS7ZW0
Model: ATA VB0250EAVER
SATA NCQ Capable: True
SATA NCQ Enabled: True
PHY Count: 1
PHY Transfer Rate: 3.0Gbps
physicaldrive 2I:0:6
Port: 2I
Box: 0
Bay: 6
Status: OK
Drive Type: Data Drive
Interface Type: SATA
Size: 250 GB
Native Block Size: 512
Rotational Speed: 7200
Firmware Revision: HPG7
Serial Number: Z2ARX763
Model: ATA VB0250EAVER
SATA NCQ Capable: True
SATA NCQ Enabled: True
PHY Count: 1
PHY Transfer Rate: 3.0Gbps
Array: B
Interface Type: SATA
Unused Space: 0 MB
Status: OK
Array Type: Data
Logical Drive: 2
Size: 5.5 TB
Fault Tolerance: 0
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 256 KB
Full Stripe Size: 256 KB
Status: Failed
Caching: Enabled
Unique Identifier: 600508B1001C29307A23E7A75EAFF878
Disk Name: unknown
Mount Points: None
Logical Drive Label: A019CCEFPACCRID122003KK75DB
Drive Type: Data
LD Acceleration Method: Controller Cache
physicaldrive 2I:0:7
Port: 2I
Box: 0
Bay: 7
Status: OK
Drive Type: Data Drive
Interface Type: SATA
Size: 3 TB
Native Block Size: 4096
Rotational Speed: 7200
Firmware Revision: CV13
Serial Number: Z1F1159J
Model: ATA ST3000VX000-9YW1
SATA NCQ Capable: True
SATA NCQ Enabled: True
PHY Count: 1
PHY Transfer Rate: 3.0Gbps
physicaldrive 2I:0:8
Port: 2I
Box: 0
Bay: 8
Status: OK
Drive Type: Data Drive
Interface Type: SATA
Size: 3 TB
Native Block Size: 4096
Rotational Speed: 7200
Firmware Revision: CV13
Serial Number: Z1F115AR
Model: ATA ST3000VX000-9YW1
SATA NCQ Capable: True
SATA NCQ Enabled: True
PHY Count: 1
PHY Transfer Rate: 3.0Gbps
SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250
Device Number: 250
Firmware Version: RevC
WWID: 5001438021D4731F
Vendor ID: PMCSIERA
Model: SRC 8x6G
/opt/hp/hpssacli/bin # ./hpssacli ctrl all show config
Smart Array P410 in Slot 4 (sn: PACCRID122003KK)
array A (SATA, Unused Space: 0 MB)
logicaldrive 1 (232.9 GB, RAID 1, OK)
physicaldrive 2I:0:5 (port 2I:box 0:bay 5, SATA, 250 GB, OK)
physicaldrive 2I:0:6 (port 2I:box 0:bay 6, SATA, 250 GB, OK)
array B (SATA, Unused Space: 0 MB)
logicaldrive 2 (5.5 TB, RAID 0, Failed)
physicaldrive 2I:0:7 (port 2I:box 0:bay 7, SATA, 3 TB, OK)
physicaldrive 2I:0:8 (port 2I:box 0:bay 8, SATA, 3 TB, OK)
SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 5001438021D4731F)
What can I do to recover the data? If I delete and recreate the RAID 0 volume, will the data be lost?
Any pointers appreciated!
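One direction worth checking before deleting anything (a sketch; the exact syntax is my assumption and should be verified against HP's Smart Storage CLI documentation, ideally after imaging both disks): a logical drive the controller marked Failed while both physical disks report OK can sometimes be forced back online rather than recreated. Deleting and recreating the array rewrites its metadata, and whether the data survives depends on recreating it with exactly the same settings, so re-enabling is the safer first attempt:

```sh
./hpssacli ctrl slot=4 ld 2 show detail             # confirm the current state
./hpssacli ctrl slot=4 ld 2 modify reenable forced  # try to force the failed LD back online
```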
Replacement raid controller on system x 3650 M5
kapone
Well-Known Member
Ingenetic
New Member
I'm using RAID 10 with 4 HDDs, but suddenly I got the warning in the first post.
I'm not experienced with RAID controller replacement; I have only ever replaced a failed hard drive with a new one.
ari2asem
Active Member
My experience with Areca cards:
I have a machine with 12 HDDs; it went dead about 15 years ago. It was a RAID-6 setup with 12x 250 GB, no hot spare. The Areca card had 16 real SATA ports, not SFF ports.
Almost a year ago I bought another Areca card with 4x SFF-8087 ports. Connecting the 12 HDDs via 3x SFF-8087-to-SATA cables to that totally different Areca card, I could recover my files with a recovery program (GetDataBack) in Windows 10.
I didn't change any RAID setting, nor any file system. My RAID volume was not visible under Windows XP 15 years ago, nor was it visible under Windows 10.
I just swapped Areca cards, with no RAID rebuilding, ran GetDataBack (it was an NTFS file system) and recovered my files.
I should say: give it a shot and try it.
Your situation is not that bad; you are just replacing your dead card with the same model.
I replaced my card with a totally different model (but the same brand) and was able to recover my files.
Just try it and keep us updated with your progress.
Ingenetic
New Member
Thanks ari2asem for your advice.
Has anyone here had the same experience? I mean replacing the RAID controller with the same model: how does it work? Will it have to rebuild the RAID, or will the controller detect the existing RAID and the system come up normally with all HDDs detected?
Intel Communities
Intel Integrated RAID module RMS3CC0 reporting L2/L3 cache error (Intel S2600WFT)
I have a server that refuses to boot. I've tried three replacement RMS3CC0 RAID controllers, but I always get stopped on boot (in the POST environment) by the «Driver Health Manager»:
It tells me that an L2/L3 Cache error was detected on the RAID controller:
I've tried the F9 (reset to defaults) option, entering some text, and pressing F10. I've also tried hitting X and Ctrl-X, but I cannot resume from this screen. I've also tried disconnecting the power, opening the unit and disconnecting the supercapacitor for the RAID mezzanine board for 10 minutes, and I've tried three different controller cards (all the same make and model). Any idea how to get the server past this error? We're not concerned with saving any live data on the server; it is part of a test environment, and we were rebuilding it when the anomaly was discovered. In operation, we were getting various filesystem errors on the volumes controlled by this RAID controller (mostly on the JBOD SSD root filesystem) that resulted in CentOS 7.5 changing the volume to read-only status. We also tried three different Intel Data Center edition SSDs, with no change.
[Solved] LSI 9341-8i L2/L3 Cache Error.
New Member
I have just experienced what I presume is a hardware malfunction on my SAS 9341-8i RAID card.
While the server was running, the RAID suddenly got a lot of I/O errors from a program that was writing to it, and then the RAID disappeared in Windows.
After a reboot, I now get an I/O error in Device Manager in Windows Server 2012 R2.
And after POST, it tells me:
L2/L3 Cache error was detected on the RAID controller.
«Please contact technical support to resolve this issue. Press 'X' to continue or else power off the system, replace the controller and reboot.»
Tom5051
Active Member
These cards are generally pretty reliable when they report a problem like this. I would suggest replacing the controller with another one that is known to work; hopefully you can borrow one from a friend?
Otherwise replace the controller; a firmware update is unlikely to be successful or to cure the card.
Has it got the correct airflow over the card? They get pretty hot.
Also you didn’t say what level of RAID the array was built with. It’s possible that with a replacement controller, the array will still be optimal but there is always the chance that it has degraded or failed.
You may need backups.
The replacement controller will attempt to get the array configuration from the disks if it is still not corrupt.
Quite often I move an array of 8 disks between servers and the RAID cards pick up the array config and boots no problem.
New Member
Thanks for your reply.
The only option I have to replace it is to buy a brand new one, so it's not that easy.
Here I have some doubts about what will happen to the existing RAID if I replace the card.
It's a RAID 5 with 8 disks, and all the disks seem to be fine.
But again, I cannot open the MegaRAID software in Windows anymore.
Though in the MegaRAID config during boot, it says that the array is optimal etc., and finds all the disks.
What strikes me as strange with this «L2/L3 cache error» is that the card doesn't have any cache?
From what I read, the card is known to run hot normally, and it has been, at around 90 degrees Celsius.
There has been no dedicated fan straight on the card, but plenty of cabinet airflow that passes the card, which I thought was sufficient.
I have now set a 120 mm fan straight on the card, but it is probably too late.
Something still tells me that it would be strange if this were a hardware fault and permanent damage?
Especially since the card does not have any cache?
Thanks again for any help; it is very much appreciated, since my RAID is currently down 🙁
Best regards
Apil
Tom5051
Active Member
New Member
I agree. The first thing I did was try to google it, and I got absolutely nothing.
Do you have any idea how I check whether the read/write cache is enabled or disabled?
Is it a jumper on the board, or a BIOS setting?
Best regards
Apil
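In case it helps, on LSI cards the per-virtual-drive cache policy can be read from the command line. This assumes the storcli utility is installed and is a sketch, not taken from the thread:

```sh
storcli /c0/vall show all   # per-VD properties, including the cache policy
# e.g. a cache string like "NRWTD" would decode as No Read ahead, Write
# Through, Direct IO; the 9341, having no cache RAM, runs write-through.
```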
Tom5051
Active Member
Tom5051
Active Member
New Member
Yes, I can still access the card's configuration utility after this error; I was looking around in there yesterday and didn't find anything interesting.
Do you know which option in there to disable/enable?
Else I'll try to have a look around again.
The server is a Dell T20 that I pulled out and placed in a custom rack-mounted case with added case fans.
A Xeon E3-1225 v2/3 (can't remember), an Intel dual-gigabit NIC, and 8x WD Red 3 TB disks, with a Kingston 120 GB SSD as the system disk.
I did update the BIOS and firmware of the motherboard and the RAID card when I built it 6-12 months ago, because I was having problems getting the card to work (the classic «cannot start hardware», Error 10, in Device Manager). It seemed to be because the card does not have any RAM/cache, so I had to disable/enable some settings in the motherboard BIOS to get it to start, and since then it had been running flawlessly, until now.
New Member
Yeah, that makes sense. I just thought that since this is the 9341 version and not the 9361, there was no RAM/cache on the board, and therefore it utilized system RAM or the CPU cache, since it has no dedicated memory.
By the way, sorry for my bad English and lack of correct terms.
Tom5051
Active Member
New Member
If needed I can provide some more screenshots.
Thanks again
-Apil
vanfawx
Active Member
Unfortunately I think it’s talking about the on-board L2/L3 cache of the raid card CPU, not the on-board RAM cache. If the CPU L2/L3 cache has failed, then it’s a sign the CPU itself might be failing on the raid card.
Hope that helps.
New Member
Maybe I should try to downgrade the firmware?
[Edit] Trying to update the driver in Windows now.
New Member