Communications error with the storage controller

HPE MSA 2040 Unresponsive, Dead, or Failed Controller, Controller won’t boot This weekend I came across a big issue with my HPE MSA 2040 where one of the SAN controllers became unresponsive, and appeared to had failed because it would not boot. It all started when I decided to clean the MSA SAN. I […]

Содержание

  1. HPE MSA 2040 Unresponsive, Dead, or Failed Controller, Controller won’t boot
  2. The Problem
  3. The Fix
  4. Disclaimer
  5. Update – 24 Hours later
  6. Update – April 2nd 2019
  7. 18 Responses to “HPE MSA 2040 Unresponsive, Dead, or Failed Controller, Controller won’t boot”
  8. Communication error with the storage controller

HPE MSA 2040 Unresponsive, Dead, or Failed Controller, Controller won’t boot

This weekend I came across a big issue with my HPE MSA 2040 where one of the SAN controllers became unresponsive, and appeared to had failed because it would not boot.

It all started when I decided to clean the MSA SAN. I try to clean the components once or twice a year to remove dust and make sure it’s not getting all jammed up. Sometimes I’ll shut the entire unit down and remove the individual components, other times I’ll remove them while operating. Because of the redundancies and since I have two controllers, I can remove and clean each controller individually at separate times.

Please Note: When dusting equipment with fans, never allow the fans to spin up with compressed air as this can generate current which can damage components. Never allow compressed air flow to spin up fans.

After cleaning out the power supplies, it came time to clean the controllers.

The Problem

As always, I logged in to the SMU to shutdown controller A (storage). I shut it down, the blue LED illuminated it was safe for removal. I then proceeded to remove it, clean it, and re-insert it. The controller came back online, and ownership of the applicable disk groups were successfully moved back. Controller A was now completed successfully. I continued to do the same for controller B: I logged in to shutdown controller B (storage). It shut down just like controller A, the blue LED removable light illuminated. I was able to remove it, clean it, and re-insert it.

However, controller B did not come back online.

After inserting controller B, the status light was flashing (as if it was booting). I waited 20 minutes with no change. The SMU on controller B was responding to HTTPS requests, however you could not log on due to the error “system is initializing”. SSH was functioning and you could log in and issue commands, however any command to get information would return “Please wait while this information is pulled from the MC controller”, and ultimately fail. The SMU on controller A would report a controller fault on controller B, and not provide any other information (including port status on controller B).

I then tried to re-seat the controller with the array still running. Gave it plenty of time with no effect.

I then removed the failed controller, shutdown the unit, powered it back on (only with controller A), and re-inserted Controller B. Again, no effect.

The Fix

At this point I’m thinking the controller may have failed or died during the cleaning process. I was just about to call HPE support for a replacement until I noticed the “Power LED” light inside of the failed controller would flash every 5 seconds while removed.

This made me start to wonder if there was an issue writing the cache to the compact flash card, or if the controller was still running off battery power but had completely frozen.

I tried these 3 things on the failed controller while it was unplugged and removed:

  1. I left the controller untouched for 1 hour out of the array (to maybe let it finish whatever it was doing while on battery power)
  2. There’s an unlabeled button on the back of the controller. As a last resort (thinking it was a reset button), I pressed and held it for 20 seconds, waited a minute, then briefly pressed it for 1 second while it was out of the unit.
  3. I removed the Compact Flash card from the controller for 1 minute, then re-inserted it. In hoping this would fail the cache copy if it was stuck in the process of writing cache to compact flash.

I then re-inserted the controller, and it booted fine! It was not functioning and working (and came up very fast). Looking at the logs, it has no record of what occurred between the first shutdown, and final boot. I hope this post helps someone else with the same issue, it can save you a support ticket, and time with a controller down.

Disclaimer

PLEASE NOTE: I could not find any information on the unlabeled button on the controller, and it’s hard to know exactly what it does. Perform this at your own risk (make sure you have a backup). Since I have 2 controllers, and my MSA 2040 was running fine on Controller A, I felt comfortable doing this, as if this did reset controller B, the configuration would replicate back from controller A. I would not do this in a single controller environment.

Update – 24 Hours later

After I got everything up and running, I checked the logs of the unit and couldn’t find anything on controller B that looked out of ordinary. However, 24 hours later, I logged back in and noticed some new events showed up from the day before (from the day I had the issues):

You’ll notice the event log with severity error:

One thing that’s very odd is that I know for a fact the time is wrong on the error severity log entry, this could be due to the fact we had a daylight savings time change last night at midnight. Either way it appears that it finally did detect that the Storage controller was in an error state and logged it, but it still would have been nice for some more information.

On a final note, the unit has been running perfectly for over 24 hours.

Update – April 2nd 2019

Well, in March a new firmware update was released for the MSA. I went to upgrade and the same issue as above occurred. During the firmware upgrade, at one point of the firmware update process a step had failed and repeated 4 times until successful.

The firmware update log (below was repeated):

During the Storage Controller restarting process, the controller never came back up. I removed the controller 1 hour, re-inserted and the above fix did not work. I then tried it after 2 hours of disconnection.

At this point I contacted HPE, who is sending a replacement controller.

The following day (12 hours of controller removed), I re-inserted it again and it actually booted up, was working with the new firmware, and then did a PFU (Partner Firmware Update) of controller A.

While it is working now, I’m still going to replace the controller as I believe something is not functioning correctly.

18 Responses to “HPE MSA 2040 Unresponsive, Dead, or Failed Controller, Controller won’t boot”

Thank You for the great post it saved my life 🙂
I had exactly the same issue. It has begun with a host port IP setting.
After a few minutes controller B showed up the CopactFlash fault.
First i tryed to reboot the controller and aftert this step instant controller failure.
After it i unplugged the controller ad did the reset via your advice and a miracle is happened 🙂

Glad to hear it helped! Always be careful though, doing this will corrupt or clear the flash and compact flash on the controller… Only do this if the other controller is valid, responding, functioning properly, and has an OK status.

Dear Steve,
We are migrating from Netapp to a HPE MSA 2050. I’ve just started to work with it and I got the error message. I have no experience with the HPE, but it looks fairly straight forward in installing, much better then the Netapp. I will be visiting your blog in the next couple of weeks whilst setting up the HPE. 😉
By the way is there a big difference between the 2040 and 2050?

Have a great day,
André

Glad to hear, the MSA storage arrays are beautiful products!

There is a few differences, the main being that the 205x series are the next generation agyer the 204x series.

Hi Stephen! My name is Alex, i got some problem with my MSA, I hope You can give me good advice. So, i Have two MC on HP MSA, each of MC connected via FibreChannel modules to HP Proliant Servers throught SAN switch. And there are dozen of VMs on servers. I need to replace MC B, without stopping (power off) whole MSA. How do You think, is it possible without data loss?

As long as you configured the unit properly, and have the proper channels of redundancy (which you likely do, if the controller has failed and everything is still working), you just simply follow the standard procedures for swapping the controller.

No data loss should occur if the unit was configured properly, and if you follow standard procedures.

Stephen, thanx for fast answer, yeah, looks like everything configured good, but i have to be sure. Well, i do not really know, witch MC is bad, and a problem i have – a can’t see any mappings via web-interface or CLI, but DG, volumes and other stuff i can see clear. So, HP sent me new MC, i whatnt to make hot replace, but data in MSA very important…

You need to identify which controller is bad first.

Have you logged on to the SMU on both controllers? One should have good information.

No, i cant see any mappings via A and via B MC. And in CLI – too, but “show volume-maps” – still works. And other one symptom – web-interface often unavailable, just disconnecting.

Have you tried to restart one of the management controllers (not the storage controller)?

What did HPe advise?

Are there any amber lights on the back of the storage controllers?

There is no any amber lights. All HP recommendations i made at all (restart of MSA, FW upgrade and downgrade, etc,) – no result. They sent me new MC and wants me to replace it, but i cant stop MSA in september…

Did you restart the Storage Controllers, or Management Controllers?

The storage controllers handle the actual storage, whereas the management controllers provide the CLI and SMU.

Sometimes the management controllers can crash, but everything still works. In this case it would require a restart of only the management controllers.

However, if you’ve restarted the SC’s, the MCs, and the entire SAN with no effect, I’m wondering if there could be an issue with the midplane or something.

When you log on to vSphere, are all paths active? If some paths are offline, you can use this to determine which storage controller may be problematic. In this case, if there are paths offline, I’d replace the storage controller that has the failed paths.

Yes, Stephen, i restarted MCs and SCs, and all the paths online…HPe 2 and 3 level support do not have descision still. As for me – some issue, that common for MCs: Compact Flash card trouble, Firmware trouble…Maybe the fact important that I use virtual volumes.

I have HP MSA 2040 storage which we are going to repurpose. The device is stacked but I am unable to take its serial and it always says COM4/7 doesn’t exists. So, I reset the device and in next attempt I was able to connect through serial port and when i tried to reset it to default settings it started giving me below error.

(none) login: restoredefaults
Password:

2019-09-09 14:59:05: Running MC-only restoredefaults on partner controller.
2019-09-09 14:59:05: Running MC-only restoredefaults on this controller.
2019-09-09 14:59:05: Perform Restore Defaults
/app/default/restoreDefaults.sh: source: line 96: can’t open ‘/mem/relinfo/features.env’

So I dont know its password and i cant reset it as well. Any idea how to fix it?

Can you try reflashing the firmware using the restoredefaults account? I’m not familiar with that account/method, but I’m curious if it’s possible.

Do you have active warranty? I’d check the serial number, and if it has warranty, try submitting a ticket with HPe.

I have HP MSA 2040 storage with two controllers and with D2700 disk enclosure with two controllers.
Now SMU periodically reports warning (The algorithm for best-path routing selected the alternate path to the indicated disk because the I/O error count on the primary path reached its threshold.) and then corresponding disk becomes degraded. I have blinking amber light 2 on controller B of D2700 which indicates SAS port error detected by application client). Now I already have 8 degraded disks of 12 disks on controller storage B of D2700.
How do you think, can I try to restart the storage controller B without any loss of data to fix this issue?

First and foremost, make sure you have an active valid backup of the data.

I’d recommend contacting HPe and submitting a support ticket. It sounds like either a disk, backplane, D2700 controller, or SAS cable might not be functioning.

As for restarting controllers, if the path is failing on that controller, in a dual controller configuration you can simply restart it. However if it’s using that controller, I wouldn’t recommend doing that.

If you’re not going to call HPe support, I might actually recommend restarting the entire storage system properly and see what happens. But make sure you have a backup first as this could make the situation worse.

Источник

Communication error with the storage controller

Yesterday I had an issue with my VMWare farm and my SAN. Im using an HP MSA1000 on a 3 system farm. The farm stopped working altogether and restarting the MSA resolved the issue. In virtual center I brought up the logs and found this entry.

Feb 14 00:15:44 jgssesx01 vobd: Feb 14 00:15:44.125:

Feb 14 00:15:52 jgssesx01 vobd: Feb 14 00:15:52.108:

This repeats tiself about every 15-20 seconds. Is this just an indicator that the storage device is unreachable? I have 2 fiber switch in it for redundancy and find it hard to believe that both switches failed.

Message was edited by: tom howarth to remove the MS Word XML formating

  • Mark as New
  • Bookmark
  • Subscribe
  • Mute
  • Subscribe to RSS Feed
  • Permalink
  • Print
  • Report Inappropriate Content

Please can you give some infomation as to the version of ESX you are running for your Hosts and the version of vCenter.

Further what firmware is your MSA running, etc

If you found this or any other answer useful please consider the use of the Helpful or correct buttons to award points

Tom Howarth VCP / vExpert

VMware Communities User Moderator

Contributing author on «[VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment|http://www.amazon.co.uk/VMware-VSphere-Virtual-Infrastructure-Security/dp/0137158009/ref=sr_1_1?ie=UTF8&s=books&qid=1256146240&sr=1-1]”.

  • Mark as New
  • Bookmark
  • Subscribe
  • Mute
  • Subscribe to RSS Feed
  • Permalink
  • Print
  • Report Inappropriate Content

vSphere v4.0.0 build 162856

vCenter v4.0.0 build 162856

I dont have the firmware version for the MSA but the firmware was updated prior to implementation so it shouldn’t be too old if it’s not current.

  • Mark as New
  • Bookmark
  • Subscribe
  • Mute
  • Subscribe to RSS Feed
  • Permalink
  • Print
  • Report Inappropriate Content

I am moving your post to the dedicated vSphere forums as you will receive a better response

If you found this or any other answer useful please consider the use of the Helpful or correct buttons to award points

Tom Howarth VCP / vExpert

VMware Communities User Moderator

Contributing author on «[VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment|http://www.amazon.co.uk/VMware-VSphere-Virtual-Infrastructure-Security/dp/0137158009/ref=sr_1_1?ie=UTF8&s=books&qid=1256146240&sr=1-1]”.

  • Mark as New
  • Bookmark
  • Subscribe
  • Mute
  • Subscribe to RSS Feed
  • Permalink
  • Print
  • Report Inappropriate Content

There is a very, very long list of people that have had similar issues with MSA1000/MSA1500 dropping off the planet after extended periods of uptime. This is known to happen on all versions of VMware ESX. I recall having the problem way back on 2.5.0. and also now after I’ve upgraded to 4.0 in late 2009.

It seems to depend on the load generated and is probably related to the number of SCSI reservations, so snapshots, plenty of VM shutdown/startup events or VM deployments would agravate the problem. Some people report as short as a week, while others (including myself) get to run for about 3 months before a crash occurs.

There are workarounds available, such as rebooting the controllers via the serial interface every few days/weeks, but they’re clunky. We’re not that mission critical, so if we remember, we reboot the MSA about every three months, otherwise we just restart it when it crashes.

You can get some excellent details at Eric Miller’s site on http://www.msa1500cs.com on how to work around the problem and address performance and stability in general.

I’ve just noticed that HP finally posted the v7.20 firmware for public download on their web site.

One of the fixes is supposedly:

«Resolved issue of MSA persistent reservation table becoming full when used with various supported levels of VMware.»

Maybe that is (finally) the silver bullet that will resolve the «timebomb» facility the array has. I will be applying it in the next week or two.

Источник

Avatar of Williams225

Williams225

 asked on 4/30/2013

Hello experts,

I have a serious problem on a  hp p2000 storage. Both controllers (A and B) have communication problems, so the disks, volumes….are not available!!!

I was still able to login to the cli interface and GUI.

Please help me. Below are the log session and screenshots. Its an emergency

p2000storage.txt

screenshot 1
screenshot 2

Storage HardwareStorage SoftwareStorage

Avatar of undefined

Last Comment

Williams225


8/22/2022 — Mon

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.

View this solution by signing up for a free trial.

Members can start a

7-Day free trial

and enjoy unlimited access to the platform.

hum its not really the error, both controllers are unavailable

Have you tried restarting the storage controller? I can see you did «restart MC» but as far as I can see the MCs are fine and it’s the SCs that need restarting with «restart sc both» or just power cycling.

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.

View this solution by signing up for a free trial.

Members can start a

7-Day free trial

and enjoy unlimited access to the platform.

Hello andyalder I have tried to restart the storage controller from ssh and on GUI but the error message is the same : «the storage controller is not ready for this operation»
failed to restart storage controller

I have also collected the logs via ftp, and I don’t see infos about the disks, luns…. its a lil bit weird.

controllerlogs.rtf

I can see no option but to power cycle it but you may get more from HP support.

what about firmware update?

If te MC can’t talk to the SC firmware update won’t work until it’s rebooted.

@andyalder.

You are right. I have unplugged and replugged both controllers, i have restarted the storage but the result is the same, «Error: Communications error with the Storage Controller.. » 

I have tried a firmware update with no success. I am lost, don’t know what to do…

I also have Expander Controller Code Version invalid, don’t know what it means? may be thats the issue?

expender controller code version invalid, what does it mean?
Firmware Update Error
Error when scanning disks
red led in front of the storage
back of the storage

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.

View this solution by signing up for a free trial.

Members can start a

7-Day free trial

and enjoy unlimited access to the platform.

thanx @andyalder, jaymenagy.

The luns where visible when i restarted the p2000 with only one controller!  

So i have used jaymenagy procedure to update the storage expansion firmware for both storage, and after that i did a firmware update for both controllers!!!! thanx a lot!

Модераторы: Trinity admin`s, Free-lance moderator`s

buro

Advanced member
Сообщения: 88
Зарегистрирован: 13 окт 2014, 18:17
Откуда: Киев

MSA 2312SA

доброго дня.

Приключилась беда с СХД. Нет доступа по веб. через CLI пускает, но не ресет, ни выключение не работают.
Логи сбросились, только за последние 2 дня остались.

По светодиодам —
на контроллере А:
восклицательный знак — моргает оранжевый, активити — помигивает иногда зеленым
на контроллере В:
восклицательный знак — постоянно горит оранжевый, активити — постоянно горит зеленый

Если вытащить один из контроллеров. ничего не меняется в лучшую сторону.

Вот как-то так.
Куда копать?
Спасибо.


Аватара пользователя

Stranger03

Сотрудник Тринити
Сотрудник Тринити
Сообщения: 12979
Зарегистрирован: 14 ноя 2003, 16:25
Откуда: СПб, Екатеринбург
Контактная информация:

Re: MSA 2312SA

Сообщение

Stranger03 » 01 сен 2016, 09:06

buro писал(а):Куда копать?

Вообще это в сервис. Попробуйте спаять ком-кабель, подключиться к контроллеру и посмотреть вывод команд, что там.


buro

Advanced member
Сообщения: 88
Зарегистрирован: 13 окт 2014, 18:17
Откуда: Киев

Re: MSA 2312SA

Сообщение

buro » 01 сен 2016, 09:12

ком кабель есть, а какие команды вводить?
вводил такие:
show configuration
show version
но толку от них, на данном контроллере ничего они не показывают — как бы нет контроллера.

Что еще можно предпринять?

З.Ы. сервис хорошо, но оф. от НР уже закончилась, а сторонней организации разбирающейся пока не нашел.

Спасибо.


Аватара пользователя

Stranger03

Сотрудник Тринити
Сотрудник Тринити
Сообщения: 12979
Зарегистрирован: 14 ноя 2003, 16:25
Откуда: СПб, Екатеринбург
Контактная информация:

Re: MSA 2312SA

Сообщение

Stranger03 » 01 сен 2016, 09:19

buro писал(а):ком кабель есть, а какие команды вводить?
вводил такие:
show configuration
show version

А там есть возможность просмотреть лог? Типа show log? Вообще по идее при начальной загрузке в терминале должно бы быть видны ошибки.


ostrov

Power member
Сообщения: 37
Зарегистрирован: 31 мар 2005, 17:26
Откуда: Мытищи Московской области
Контактная информация:

Re: MSA 2312SA

Сообщение

ostrov » 02 сен 2016, 15:17

На MSA2324SA команда HELP выводит все команды и синтаксис, остается только выбрать нужное


buro

Advanced member
Сообщения: 88
Зарегистрирован: 13 окт 2014, 18:17
Откуда: Киев

Re: MSA 2312SA

Сообщение

buro » 11 окт 2016, 15:53

Что имеем на сегодняшний день.

Контроллер А заводится, горят зеленым светодиод «ОК» и «Cache», в CLI видим вот такую картину:

Код: Выделить всё

FLASH LOADER v12.015 Dec  4 2008 11:32:25

MANAGEMENT CONTROLLER

<Hold down the spacebar for Loader Menu>

Loading controller...

Use new code in primary.
...copy primary...decomp [lzma]...exit loader...


Starting VxWorks ...
initElanRegs Done...
Dumping chipset info...
REVID      0xFFFEF000  0x11
CPUCTL     0xFFFEF002  0x1
HBCTL      0xFFFEF060  0x500
SYSARBCTL  0xFFFEF070  0x2
SYARBMENB  0xFFFEF072  0x8
ARBPRICTL  0xFFFEF074  0x40000F03
PIOFS L    0xFFFEFC20  0x2004
PIOFS H    0xFFFEFC22  0x14
PIO DIR L  0xFFFEFC2A  0x2B9
PIO DIR H  0xFFFEFC2C  0x0
PIO DAT L  0xFFFEFC30  0xEE3F
PIO DAT H  0xFFFEFC32  0xFFFB
Chipset info done.
MC CODE: W441R57 Oct  2 2013 15:31:51



Hold down Spacebar key to enter diags...

Entering POST...
Checking PCI bus integrity...Test passed.
IP  = 010.100.001.201
MAC = 00:c0:ff:10:70:e7
POST Done.
COM 2

-----------------------------------------
Copyright (c) 2006
BUILD TIME: Oct  2 2013  15:31:45
-----------------------------------------
IP Address = 010.100.001.201
IP Mask    = 255.255.255.000
Gateway    = 010.100.001.249
MAC        = 00:c0:ff:10:70:e7
HP StorageWorks MSA2312sa
System Name:
System Location:
Version: W441R57
#

И больше ни на что не отвечает

Контроллер В мигает оранжевым светодиод «!» в CLI имеем следующее:

Код: Выделить всё

Loading controller...

Use new code in primary.
...copy primary...decomp [lzma]...exit loader...


Starting VxWorks ...
initElanRegs Done...
Dumping chipset info...
REVID      0xFFFEF000  0x11
CPUCTL     0xFFFEF002  0x1
HBCTL      0xFFFEF060  0x500
SYSARBCTL  0xFFFEF070  0x2
SYARBMENB  0xFFFEF072  0x8
ARBPRICTL  0xFFFEF074  0x40000F03
PIOFS L    0xFFFEFC20  0x2004
PIOFS H    0xFFFEFC22  0x14
PIO DIR L  0xFFFEFC2A  0x2B9
PIO DIR H  0xFFFEFC2C  0x0
PIO DAT L  0xFFFEFC30  0xFE3F
PIO DAT H  0xFFFEFC32  0xFFF9
Chipset info done.
MC CODE: W441P28 Oct 19 2010 13:17:12



Hold down Spacebar key to enter diags...

Entering POST...
Checking PCI bus integrity...Test passed.
IP  = 010.000.000.001
MAC = 00:50:13:00:00:aa
POST Done.
COM 2

-----------------------------------------
Copyright (c) 2006
BUILD TIME: Oct 19 2010  13:17:02
-----------------------------------------
IP Address = 010.000.000.001
IP Mask    = 255.255.255.000
Gateway    = 000.000.000.000
MAC        = 00:50:13:00:00:aa
HP StorageWorks MSA2312sa
System Name:
System Location:
Version: W441P28

Отвечает на Ентер, но по командам глухо:

Код: Выделить всё

# show controller
controller-date     controllers
# show controllers
Cannot obtain data lock. Will continue to wait
Cannot obtain data lock. Will continue to wait
Controllers
-----------
Controller ID: A
Serial Number: N/A
Hardware Version:
CPLD Version:
MAC Address: 00:00:00:00:00:00
WWNN: 0000000000000000
IP Address: No_data
IP Subnet Mask: No_data
IP Gateway: No_data
Disks: 0
Vdisks: 0
Cache Memory Size (MB): 0
Host Ports: 0
Disk Channels: 0
Disk Bus Type: Unknown
Status: Running
Failed Over: No
Fail Over Reason: Not applicable

Controller ID: B
Serial Number: N/A
Hardware Version:
CPLD Version:
MAC Address: 00:00:00:00:00:00
WWNN: 0000000000000000
IP Address: No_data
IP Subnet Mask: No_data
IP Gateway: No_data
Disks: 0
Vdisks: 0
Cache Memory Size (MB): 0
Host Ports: 0
Disk Channels: 0
Disk Bus Type: Unknown
Status: Running
Failed Over: No
Fail Over Reason: Not applicable

Error: Communications Error with Storage Controller (-2194) - Could not read SAS info, so skipping enclosure 0.

Error: Communications Error with Storage Controller (-2194) - Could not read SAS info, so skipping enclosure 0.
#
#
# show controllers
Cannot obtain data lock. Will continue to wait
Controllers
-----------
Controller ID: A
Serial Number: N/A
Hardware Version:
CPLD Version:
MAC Address: 00:00:00:00:00:00
WWNN: 0000000000000000
IP Address: No_data
IP Subnet Mask: No_data
IP Gateway: No_data
Disks: 0
Vdisks: 0
Cache Memory Size (MB): 0
Host Ports: 0
Disk Channels: 0
Disk Bus Type: Unknown
Status: Running
Failed Over: No
Fail Over Reason: Not applicable

Controller ID: B
Serial Number: N/A
Hardware Version:
CPLD Version:
MAC Address: 00:00:00:00:00:00
WWNN: 0000000000000000
IP Address: No_data
IP Subnet Mask: No_data
IP Gateway: No_data
Disks: 0
Vdisks: 0
Cache Memory Size (MB): 0
Host Ports: 0
Disk Channels: 0
Disk Bus Type: Unknown
Status: Running
Failed Over: No
Fail Over Reason: Not applicable

Error: Communications Error with Storage Controller (-2194) - Could not read SASPress any key to continue (Q to qu info, so skipping enclosure 0.

Error: Communications Error with Storage Controller (-2194) - Could not read SASPress any key to continue (Q to qu info, so skipping enclosure 0.
#
#
#
#
#
# show system
System Information
------------------
System Name:
System Contact:
System Location:
System Information:
Vendor Name: HP StorageWorks
Product ID: MSA2312sa
Product Brand: MSA Storage
SCSI Vendor ID: HP
Enclosure Count: 0
Health: OK
Supported Locales: English (English), Spanish (español), French (français), German (Deutsch), Italian (italiano), Japanese (日本語), Dutch (Nederlands), Chinese-simplified (简体中文), Chinese-traditional (繁體中文), Korean (한국어)
#

# show debug-log-parameters
Cannot obtain data lock. Will continue to wait
Error: Communications Error with Storage Controller (-2194)
#

=-=-=-=-=-=—=-=-=

Возможно их как-то вылечить или сбросить на дефолт?
Помогите, кто чем может.


Аватара пользователя

Stranger03

Сотрудник Тринити
Сотрудник Тринити
Сообщения: 12979
Зарегистрирован: 14 ноя 2003, 16:25
Откуда: СПб, Екатеринбург
Контактная информация:

Re: MSA 2312SA

Сообщение

Stranger03 » 12 окт 2016, 09:17

buro писал(а):Что имеем на сегодняшний день.

Немного погуглив, есть два варианта:
1. У вас софтовый сбой прошивок обоих контроллеров. Надо попробовать по очереди по одному пробовать запустить, через CLI попытаться сделать factory reset и прошить контроллер. Если помогло, устанавливать второй. Если эта процедура не помогла, то:
2. Есть вероятность сбойного бекплейна и в этом случае только платный ремонт


buro

Advanced member
Сообщения: 88
Зарегистрирован: 13 окт 2014, 18:17
Откуда: Киев

Re: MSA 2312SA

Сообщение

buro » 12 окт 2016, 09:29

Stranger03 писал(а):

buro писал(а):Что имеем на сегодняшний день.

Немного погуглив, есть два варианта:
1. У вас софтовый сбой прошивок обоих контроллеров. Надо попробовать по очереди по одному пробовать запустить, через CLI попытаться сделать factory reset и прошить контроллер. Если помогло, устанавливать второй. Если эта процедура не помогла, то:
2. Есть вероятность сбойного бекплейна и в этом случае только платный ремонт

Спасибо.
а не подскажете как сделать «factory reset», т.к. через CLI не могу ничего сделать..


Аватара пользователя

Stranger03

Сотрудник Тринити
Сотрудник Тринити
Сообщения: 12979
Зарегистрирован: 14 ноя 2003, 16:25
Откуда: СПб, Екатеринбург
Контактная информация:

Re: MSA 2312SA

Сообщение

Stranger03 » 12 окт 2016, 09:45

buro писал(а):а не подскажете как сделать «factory reset», т.к. через CLI не могу ничего сделать..

К нам через некоторое время может придет инженер по ХП, а пока сорри, я в них слабо разбираюсь, разве что гугл, :wink: . Погуглите, :wink:


buro

Advanced member
Сообщения: 88
Зарегистрирован: 13 окт 2014, 18:17
Откуда: Киев

Re: MSA 2312SA

Сообщение

buro » 13 окт 2016, 12:01

Вставил контроллер в живую полку с живым вторым контроллером, зашел на CLI, сделал «restore defaults factory», перегрузил контроллер.
имеем дефолтный IP 10.0.0.2
Перенес контроллер в родную полку. При загрузке получил опять старый IP(это получается еще и Мидплайн хранит какие-то настройки?) и не отвечающий контроллер на команды через CLI.

Второй контроллер на котором мигает светодиод «!» — сброс и т.п. не работает.


Аватара пользователя

Stranger03

Сотрудник Тринити
Сотрудник Тринити
Сообщения: 12979
Зарегистрирован: 14 ноя 2003, 16:25
Откуда: СПб, Екатеринбург
Контактная информация:

Re: MSA 2312SA

Сообщение

Stranger03 » 13 окт 2016, 12:17

buro писал(а):Вставил контроллер в живую полку с живым вторым контроллером, зашел на CLI, сделал «restore defaults factory», перегрузил контроллер.
имеем дефолтный IP 10.0.0.2

Дык вы попробуйте с одним живым контроллером стартануть, с фактори ресет. Там скорей всего в контроллерах в прошивках чехарда, друг друга мОчат. Берете контроллер, фактори ресет, с ним и пробуйте оживить, его перешить и как система начнет оживать, на ходу вставляйте второй полудохлый. Авось он подхватит прошивку первого контроллера и начнет сам себя прошивать.


buro

Advanced member
Сообщения: 88
Зарегистрирован: 13 окт 2014, 18:17
Откуда: Киев

Re: MSA 2312SA

Сообщение

buro » 13 окт 2016, 12:22

Так вот с одним и пробую стартануть, но при загрузке вижу новый-старый IP и далее на команды не отвечает (висит как бы)
——————————————
Copyright (c) 2006
BUILD TIME: Oct 2 2013 15:31:45
——————————————
IP Address = 010.100.001.201
IP Mask = 255.255.255.000
Gateway = 010.100.001.249
MAC = 00:c0:ff:10:70:e7
HP StorageWorks MSA2312sa
System Name:
System Location:
Version: W441R57

И больше ни на что не отвечает
но самое интересное, что прошивка остается самой последней, не сбрасывается. M114P01-01


Аватара пользователя

Stranger03

Сотрудник Тринити
Сотрудник Тринити
Сообщения: 12979
Зарегистрирован: 14 ноя 2003, 16:25
Откуда: СПб, Екатеринбург
Контактная информация:

Re: MSA 2312SA

Сообщение

Stranger03 » 13 окт 2016, 13:26

buro писал(а):Так вот с одним и пробую стартануть, но при загрузке вижу новый-старый IP и далее на команды не отвечает (висит как бы)

Тогда есть вероятность, что сдох бекплейн, а это увы, мертво. Он один и кроме его замены ничего не сделать.


buro

Advanced member
Сообщения: 88
Зарегистрирован: 13 окт 2014, 18:17
Откуда: Киев

Re: MSA 2312SA

Сообщение

buro » 18 окт 2016, 11:09

Stranger03 писал(а):

buro писал(а):Так вот с одним и пробую стартануть, но при загрузке вижу новый-старый IP и далее на команды не отвечает (висит как бы)

Тогда есть вероятность, что сдох бекплейн, а это увы, мертво. Он один и кроме его замены ничего не сделать.

а еще такой вопрос, а может есть вариант сбросить на дефолт Midplane СХД?


Аватара пользователя

Stranger03

Сотрудник Тринити
Сотрудник Тринити
Сообщения: 12979
Зарегистрирован: 14 ноя 2003, 16:25
Откуда: СПб, Екатеринбург
Контактная информация:

Re: MSA 2312SA

Сообщение

Stranger03 » 18 окт 2016, 13:23

buro писал(а):а еще такой вопрос, а может есть вариант сбросить на дефолт Midplane СХД?

По крайней мере я не знаю, думаю что нет. Это все-таки плата с дорожками, там нет «активной» электроники.


Вернуться в «Массивы — Технические вопросы, решение проблем.»


Перейти

  • Серверы
  • ↳   Серверы — Конфигурирование
  • ↳   Конфигурации сервера для 1С
  • ↳   Серверы — Решение проблем
  • ↳   Серверы — ПО, Unix подобные системы
  • ↳   Серверы — ПО, Windows система, приложения.
  • ↳   Серверы — ПО, Базы Данных и их использование
  • ↳   Серверы — FAQ
  • Дисковые массивы, RAID, SCSI, SAS, SATA, FC
  • ↳   Массивы — RAID технологии.
  • ↳   Массивы — Технические вопросы, решение проблем.
  • ↳   Массивы — FAQ
  • Майнинг, плоттинг, фарминг (Добыча криптовалют)
  • ↳   Proof Of Work
  • ↳   Proof Of Space
  • Кластеры — вычислительные и отказоустойчивые ( SMP, vSMP, NUMA, GRID , NAS, SAN)
  • ↳   Кластеры, Аппаратная часть
  • ↳   Deep Learning и AI
  • ↳   Кластеры, Программное обеспечение
  • ↳   Кластеры, параллельные файловые системы
  • Медиа технологии, и цифровое ТВ, IPTV, DVB
  • ↳   Станции видеомонтажа, графические системы, рендеринг.
  • ↳   Видеонаблюдение
  • ↳   Компоненты Digital TV решений
  • ↳   Студийные системы, производство ТВ, Кино и рекламы
  • Инфраструктурное ПО и его лицензирование
  • ↳   Виртуализация
  • ↳   Облачные технологии
  • ↳   Резервное копирования / Защита / Сохранение данных
  • Сетевые решения
  • ↳   Сети — Вопросы конфигурирования сети
  • ↳   Сети — Технические вопросы, решение проблем
  • Общие вопросы
  • ↳   Обсуждение общих вопросов
  • ↳   Приколы нашего IT городка
  • ↳   Регистрация на форуме

#21

Ставр

  • Supermoderator
  • Cообщений: 20 563
  • Регистрация: 16-11-2004

Отправлено 27 Июнь 2014 — 14:12

RusKG

на хранилище где то 2 tb инфа. Девать их не куда

начните с бэкапа!

  • Наверх


#22

RusKG

RusKG

  • На удаление
  • Cообщений: 180
  • Регистрация: 05-10-2009

Отправлено 03 Июль 2014 — 18:52

добрый день всем.
ESXi я поставил нормально, а СХД подключил как? С перва обратно то что было полностью удалил и заново создал, вот только тогда у меня ESXi начал видит их и я подключил все нормально.

Потом я конкретно тупанул!!!
горячую я понимал так, в рабочем ввиде я вытащил один жесткий диск для проверки. Потом вышло ошибка. А потом я СХД подкл хабу (почему долго обяснить) и он получил ip по dhcp/ Я не могу подключиться к СХД а потом через ipscan нашел его. Но когда подключаешься с браузера вот такая фигня не пропусакает. Warning
The system is currently unavailable

Потом решил сбросить все это.
подключился через кабель CLI. и запустил прогу HyperTerminal, и набрал restore defaults factory пишет такая ошибка Waiting for the other Management Controller to restore defaults …
Error: Communications error with the Storage Controller

Теперь у меня такой вопрос как еще сбросить СХД и как посмотреть логи через HyperTerminal. Через какую команду??

Прикрепленные изображения

  • SHD.png

Сообщение отредактировал RusKG: 03 Июль 2014 — 18:54

  • Наверх


#23

Desenchante

Отправлено 24 Июль 2014 — 14:23

RusKG

горячую я понимал так, в рабочем ввиде я вытащил один жесткий диск для проверки. Потом вышло ошибка. А потом я СХД подкл хабу (почему долго обяснить) и он получил ip по dhcp

Ты б еще молотком постучал для профилактики. Вдруг поможет.

Сообщение отредактировал Desenchante: 24 Июль 2014 — 14:23

  • Наверх


#24

eRIC

eRIC

  • Модераторы
  • Cообщений: 1 719
  • Регистрация: 18-11-2004

Отправлено 24 Июль 2014 — 23:28

попробуйте выключить массив, обесточив массив вытащите контроллер и обратно вставьте контроллер и потом включайте массив. далее все интересное в CLI :)

Сообщение отредактировал eRIC: 24 Июль 2014 — 23:28

  • Наверх


#25

zikoo

zikoo

  • Человеки
  • Cообщений: 1 874
  • Регистрация: 06-09-2011
  • Предупреждения: 0

Отправлено 08 Август 2015 — 18:59

Хотел спросить про кластер из серверов
нужна отказоустойчивость
Кластер создается средствами гипервизоров или нужно в гостевых ОС (на которых крутятся сервера) тоже настраивать что то?
т.е если такая возможность:
допустим есть сервер X который вращается на гипервизоре 1
можно ли на гипервизоре 2 (средствами гипервизора, минимально ковыряя гостевую ОС/т.е. сервер X) добиться того, что в случае выхода из строя железки гипервизора 1/или железки на котором крутится гипервизор, сервер X продолжил работать на гипервизоре 2
Извиняюсь что может вопрос поставлен некорректно, но думаю основную мысль объяснил :)

Сообщение отредактировал zikoo: 08 Август 2015 — 19:08

  • Наверх


#26


Tuxper2

Tuxper2

  • Заблокированные
  • Cообщений: 4 166
  • Регистрация: 25-06-2013
  • Предупреждения: 5

Отправлено 09 Август 2015 — 02:12

zikoo
Если у вас есть vcenter то создаете HA кластер из хостов esxi. В случае поломки одного из esxi, vcenter перезапустит виртуалки на других нодах кластера, если правильно настроите все.

  • Наверх


#27

*DefendeR*

Отправлено 09 Август 2015 — 10:28

Хотел спросить про кластер из серверов
нужна отказоустойчивость
Кластер создается средствами гипервизоров или нужно в гостевых ОС (на которых крутятся сервера) тоже настраивать что то?
т.е если такая возможность:
допустим есть сервер X который вращается на гипервизоре 1
можно ли на гипервизоре 2 (средствами гипервизора, минимально ковыряя гостевую ОС/т.е. сервер X) добиться того, что в случае выхода из строя железки гипервизора 1/или железки на котором крутится гипервизор, сервер X продолжил работать на гипервизоре 2
Извиняюсь что может вопрос поставлен некорректно, но думаю основную мысль объяснил :)

Proxmox cluster + DRBD

  • Наверх


#28

zikoo

zikoo

  • Человеки
  • Cообщений: 1 874
  • Регистрация: 06-09-2011
  • Предупреждения: 0

Отправлено 09 Август 2015 — 11:02

Tuxper2
*DefendeR*
Комрады спасибо за мысли
А хранилище лучше сделать локальными (ну т.е. у каждой железки на которых крутятся гипервизоры свои HDD или покупать отдельное дисковое хранилище?)
тогда стоит вопрос при использованиии хранилища: как добиться их отказоустойчивости :)

  • Наверх


#29


Tuxper2

Tuxper2

  • Заблокированные
  • Cообщений: 4 166
  • Регистрация: 25-06-2013
  • Предупреждения: 5

Отправлено 10 Август 2015 — 21:54

zikoo
В этом случае смотрите VSA (virtual storage appliance). А лучше найти схд нормальную.

  • Наверх


#30

Alexei aka HappyAlex

Alexei aka HappyAlex

  • Модераторы
  • Cообщений: 27 153
  • Регистрация: 24-08-2004

Отправлено 14 Август 2015 — 11:41

Tuxper2
ну можно запилить VSAN :)

  • Наверх


#31

zikoo

zikoo

  • Человеки
  • Cообщений: 1 874
  • Регистрация: 06-09-2011
  • Предупреждения: 0

Отправлено 14 Август 2015 — 17:03

Tuxper2
Alexei aka HappyAlex
Можно поподробнее?

  • Наверх


#32

Alexei aka HappyAlex

Alexei aka HappyAlex

  • Модераторы
  • Cообщений: 27 153
  • Регистрация: 24-08-2004

Отправлено 14 Август 2015 — 17:59

zikoo
берете 3 сервера
2 свитча
3 диска SSD
6 дисков SAS
покупаете лиц. на vmware+vsan

объединяете все это дело в виртуальный ДЦ

при использовании VSAN диски объдиняются в один пул .. и при выпадении одного сервера — вирт. просто качует на другой сервер.. пока вы делаете ремонт выпавшего сервера…

Сообщение отредактировал Alexei aka HappyAlex: 14 Август 2015 — 17:59

  • Наверх



Posted by spicehead-8914v 2019-03-29T13:09:46Z

Hi All, I have a site with two HP Storage Works MSA2000 SAN’s. Both SAN’s on GUI, SSH and Telnet display the same error «Unable to communicate with storage controller» 

Tried reseating controllers on both SANS, rebooting the SANS and reboots the MC with still the same error.

We are looking to wipe these and use them as sandbox/testing.

Anyone else had this issue and know of a fix?

2 Replies

  • Tagging HP for viability here


    1 found this helpful
    thumb_up
    thumb_down

  • Are the controllers running the same FW version?  I know we had that issue many times before I junked the SANs, always HP pointed at a FW mismatch.


    Was this post helpful?
    thumb_up
    thumb_down

Read these next…

  • Curated Green Brand Rep Wrap-Up: January 2023

    Green Brand Rep Wrap-Up: January 2023

    Spiceworks Originals

    Hi, y’all — Chad here. A while back, we used to feature the top posts from our brand reps (aka “Green Gals/Guys/et. al.) in a weekly or monthly wrap-up post. I can’t specifically recall which, as that was approximately eleven timelines ago. Luckily, our t…

  • Curated Help with domain controller setup

    Help with domain controller setup

    Windows

    I just got a new job as the only IT person for a business with around 270 employees (I would say probably less than half use computers) They don’t have any policies or procedures when it comes to IT, as they have never had an IT person. My background cons…

  • Curated Malicious URLs

    Malicious URLs

    Security

    We have firewall, we have endpoint protection, we have Safe links and Attachments for Office 365 (Microsoft Defense for Office 365 Plan 1), and still receiving links that lead to malicious web sites.It seems like security companies still didn’t develop a …

  • Curated Snap! -- Old Batteries, Lovable Bots, Quantum Breakthrough, Should We Trust AI?

    Snap! — Old Batteries, Lovable Bots, Quantum Breakthrough, Should We Trust AI?

    Spiceworks Originals

    Your daily dose of tech news, in brief.

    Welcome to the Snap!

    Flashback: February 8, 1996: The massive Internet collaboration “24 Hours in Cyberspace” takes place (Read more HERE.)

    Bonus Flashback: February 8, 1974: Americans end outer spa…

  • Curated Large collection of Mac Minis

    Large collection of Mac Minis

    Best Practices & General IT

    We are getting rid of a lot of older equipment that doesn’t have a purpose anymore on our campus. Most of it is 2010 and 2014 Mac Minis. When they were purchased, they were the absolute base model, so nothing special about them. I’ve reached out to multip…

In this post we are going to look at basic troubleshooting steps a System Administrator can carry out to identify and potentially resolve a faulty storage controller in a HP MSA 2040.

This post is going to focus more on dual controller setups, with both a Controller A and a Controller B. In my case the faulty controller was brought to my attention by the health ember light on the front of the SAN and using the methods displayed in this article. I was able to gather information and ultimately bring the controller back up.

First step is to log into the SMU (Storage Management Utility)

Using the SMU you can use the event logs as a first step to identifying the error/problem

In my case I was getting the error below:

Error: Critical Error Fault Type: NMI p1: 0x037454E, p2: 0x0000000 ….. No Cur Thread

SMU Controller

Overall there wasn’t a lot of information to go off aside from confirmation that a serious error had occurred on Controller A and as a result Controller B had initiated the failover process.

Controller A was still powered on and ping able, plus I could get to it’s SMU. However, when I tried to log in it would either give an unsuccessful login or state that the controller was ‘initializing’.

Get more information using CLI

A good way to gather more information is to telnet or ssh into the working controller’s command line interface (CLI). An easy method to do this is to use a piece of software called Putty.

Just enter the controllers IP and press Open

Putty

When prompted put in your credentials (the one you would use to log into the SMU)

If a putty connection over your network isn’t working, try the below method to connect via USB.

Connecting to MSA CLI via Serial Connecton using USB to Mini USB

-Install driver for the USB connection. (http://h20564.www2.hp.com/hpsc/swd/public/detail?swItemId=MTX_8de10954d645450f9e3c0d015d )

-Check COM port on Device Manager from Windows.

USB device manager

-Connect USB to the Mini USB port of the MSA

-Use Putty, or Hyperterminal.

putty USB

-Hit enter to connect to the MSA.

Useful CLI Commands

Get information about both controllers

show controllers

Show recent controller events (can copy output into notepad)

show events

In my case the show controllers command displayed a health note advising me to restart the problem controller A.

Health Reason: The controller is not Healthy

Recommendation: – Restart the Storage Controller in this controller module, unless it is performing an operation where it is normal for it to be shut down, such as firmware update.

For all CLI commands, you can refer to HP’s CLI Reference Guide –

Click to access 723979-001.pdf

Re-Seating the Problem Controller

Ultimately I chose to reseat the problem controller in an attempt to prompt a clean restart. Essentially what this involves is unscrewing the two pins holding the problem controller, followed by pulling it slightly out of the controller slot and putting it back in.

Although doing this caused the controller to show no health issues when re-running the #show controller command on the working Controller (B). I had no connectivity to the problem controller (A).

Progress! At least at this point I didn’t have any errors in my working controller’s (B) SMU and nothing was showing as unhealthy in the CLI.

Restarting the Controller

All had to do at this point was restart the problem controller (A) via CLI. It came back up healthy with network connectivity.

restart mc a

Additional info concerning CLI restarting: Syntax restart sc|mc a|b|both
Parameters sc|mc
The controller to restart:
– sc: Storage Controller
– mc: Management Controller
a|b|both

I could now connect using putty to Controller A, I could login into it’s SMU and the ember lights had cleared from the physical panel on the SAN. Happy days!

**If a simple reseat, restart, or cold reboot of the problem controller still fails to resolve your issue or you want to try and gather more information as to the preceding events leading up to the error. There is the option to generate detailed logs via CLI.

Get more detailed logs via CLI

  • Enter the command in CLI “#show protocols”
    • If FTP is not Enabled –
    • Enter the command: “#set protocols ftp enabled”
    • Exit the telnet session, putty etc.
    • Once FTP is enabled:
      • Go to Start – Command Prompt – enter “cd ../..” (Go to C:/ Drive or the location where you want to save the logs)
      • Type “#ftp ”
      • Then type the user name and password of controller to authenticate.
    • Use the command “#get logs filename.zip”

Once you have your logs, you can use Trace32 to view the logs. (Can use notepad but it’s very difficult to view due to a lack of formatting.

You can download Trace32 here:

Download Microsoft Config Manager tools and choose to only install Common Tools in the wizard.

https://www.microsoft.com/en-us/download/details.aspx?id=9257

configmgr

Tracer32

Hope this was helpful!


Thanks for reading – feel free to follow and stay updated 🙂 View sysadminguides’s profile on Facebook View GuidesSysadmin’s profile on Twitter View 115372466162675927272’s profile on Google+

Понравилась статья? Поделить с друзьями:
  • Communication error майнкрафт
  • Communication error лазерный станок
  • Communication error как переводится
  • Communication error вася диагност решение ошибки
  • Communication error you may have selected the wrong printer