Post error 1787 drive array operating in interim recovery mode

Compaq ProLiant ML370 User Manual, page 75, Diagnostics and Troubleshooting 3-21, Table 3-2 POST Error Messages. Probable Source of Problem: 1786 - Drive Array Recovery Needed. The following SCSI drive(s) need Automatic Data Recovery: SCSI port 1: SCSI ID 0. Select F1 to continue with recovery of data to drive. Select F2 to continue without recovery of data to drive. Slot 1 Drive Array Recovery needed. Automatic Data Recovery […]


Certified Specialists Club

Problem with RAID


Good afternoon. I have an HP ProLiant ML 350 G3 with 4 hard drives in RAID 5. It runs Server 2003, and Exchange is installed on it as well. In the evening the mail stopped working. I could not do anything remotely from home: the server only answered pings and showed no other signs of life. After 4 hours everything started working again, and I dug into the logs. It all began with this error:

Event Type: Error
Event Source: cpqcissm
Event Category: None
Event ID: 9
Date: 28.05.2007
Time: 18:56:29
User: N/A
Computer: EXCHANGE
The device, \Device\Scsi\cpqcissm1, did not respond within the timeout period.

Many more errors of the same kind followed, then a warning:

Event Type: Warning
Event Source: CPQCISSE
Event Category: None
Event ID: 24683
Date: 28.05.2007
Time: 19:10:04
User: N/A
Computer: EXCHANGE
SCSI bus fault occurred on Storage Box box 0, , Port 0 of
Array Controller in slot 2.
This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

It all ended with this error:

Event Type: Error
Event Source: CPQCISSE
Event Category: None
Event ID: 24597
Date: 28.05.2007
Time: 21:44:55
User: N/A
Computer: EXCHANGE
Physical Drive on DEVICE ID 1 on Port 1 of
Array Controller in slot 2 has failed.
Failure Code: 0x05.

After that it logged this message:

Event Type: Information
Event Source: CPQCISSE
Event Category: None
Event ID: 24598
Date: 28.05.2007
Time: 21:44:55
User: N/A
Computer: EXCHANGE
Logical Drive 1 of
Array Controller in slot 2 has changed from status code 0 to status code 3.
Status Codes:
0 = OK

This raised some questions:
If something happened to a disk, why was the server unresponsive for 4 hours? It should have kept running.
And what should I do now, and what is the server doing while the disk is in INTERIM RECOVERY MODE?
The LED of this hard drive is lit red.
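
The timestamps in the event log entries quoted in this post give a rough timeline of the incident. The sketch below only does the date arithmetic. The event list is copied by hand from the post, and the names `EVENTS`, `parse_time`, and `elapsed_between` are made up for illustration.

```python
from datetime import datetime

# Events copied by hand from the log excerpts in the post (dates are DD.MM.YYYY).
EVENTS = [
    ("28.05.2007 18:56:29", "cpqcissm", 9, "device timeout"),
    ("28.05.2007 19:10:04", "CPQCISSE", 24683, "SCSI bus fault warning"),
    ("28.05.2007 21:44:55", "CPQCISSE", 24597, "physical drive failed"),
    ("28.05.2007 21:44:55", "CPQCISSE", 24598, "logical drive status 0 -> 3"),
]


def parse_time(stamp):
    """Parse a 'DD.MM.YYYY HH:MM:SS' timestamp as written in the post."""
    return datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S")


def elapsed_between(first_id, last_id, events=EVENTS):
    """Return the time between the first events carrying the two event IDs."""
    times = {}
    for stamp, _source, event_id, _note in events:
        # Keep only the earliest occurrence of each event ID.
        times.setdefault(event_id, parse_time(stamp))
    return times[last_id] - times[first_id]


if __name__ == "__main__":
    print(elapsed_between(9, 24597))  # prints 2:48:26
```

With the values from the post, the controller marked the drive as failed 2 hours 48 minutes 26 seconds after the quoted timeout error.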


Post error 1787 drive array operating in interim recovery mode

The following SCSI drive(s) need Automatic Data Recovery: SCSI Port Y: SCSI ID Z

Select F1 to retry Automatic Data Recovery to drive. Select F2 to continue without starting Automatic Data Recovery.

Audible Beeps: None

Possible Cause: System is in Interim Data Recovery Mode and a failed or replacement drive has not yet been rebuilt. This message is displayed if the F2 key was pressed during a previous boot or if the F1 key was pressed during a previous boot and the system rebooted before the rebuild of the drive completed.

  • Perform one of the suggested actions:
    • Press the F1 key to retry Automatic Data Recovery to the drive. Data will be automatically restored to drive X when a failed drive has been replaced, or to the original drive if it is working again without errors.
    • Press the F2 key to continue without recovery of data to the drive. The failed drive will not be rebuilt and the system will continue to operate in a failed state of Interim Data Recovery Mode.
  • If drive recovery is not successful, run ADU for more information.
    • If the replacement drive failed, replace with another drive.
    • If the rebuild was aborted due to a read error from another physical drive in the array, back up all readable data on the array, run ADU, and then restore the data.
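
The choices described for POST message 1787 can be restated as a small lookup. This is only a paraphrase of the manual text above in code form, not a model of the controller firmware. The function names `post_1787_outcome` and `follow_up` and their return strings are invented for illustration.

```python
def post_1787_outcome(key):
    """Map the key pressed at the 1787 prompt to the outcome the manual describes."""
    outcomes = {
        "F1": "retry Automatic Data Recovery (rebuild the failed or replaced drive)",
        "F2": "continue without rebuild (array stays in Interim Data Recovery Mode)",
    }
    try:
        return outcomes[key.upper()]
    except KeyError:
        raise ValueError(f"the 1787 prompt only offers F1 or F2, got {key!r}") from None


def follow_up(rebuild_ok, replacement_failed=False, read_error_on_other_drive=False):
    """Return the follow-up step the manual lists after an F1 rebuild attempt."""
    if rebuild_ok:
        return "none"
    if replacement_failed:
        return "replace with another drive"
    if read_error_on_other_drive:
        return "back up readable data, run ADU, restore data"
    return "run ADU for more information"


if __name__ == "__main__":
    print(post_1787_outcome("F1"))
    print(follow_up(rebuild_ok=False))
```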


Recovering a RAID array with hpacucli

Comrades, I need help. I have never used hardware RAID before, and now I have to revive one. But first things first.
We have a Debian server with a Smart Array P400 controller and RAID 1+0 on 5 disks (1 spare).
Two disks failed. Two Seagate disks were bought, but without the HP label (the supplier claimed they would work because they are the ES series).
The new disks were physically connected. Then the real struggle began. I learned that there is a utility called "hpacucli" and started reading the man pages for it. But the questions are not getting any fewer.

# hpacucli ctrl all show config

Smart Array P400 in Slot 6
array A (SATA, Unused Space: 0 MB)
logicaldrive 1 (1.8 TB, RAID 1+0, Interim Recovery Mode)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 0 MB, Failed)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 0 MB, Failed)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 1TB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 1TB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 1TB, OK, active spare)
You can see that 2 disks are Failed. Moving on.
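
Output like the listing above can be filtered in a script rather than read by eye. The following is a minimal sketch that parses text in the format shown in this post. It assumes the output has already been captured into a string, and the names `SAMPLE`, `parse_config`, and `failed_drives` are illustrative, not part of hpacucli. Other hpacucli versions may format these lines differently.

```python
import re

# Output copied from the post.
SAMPLE = """\
Smart Array P400 in Slot 6
array A (SATA, Unused Space: 0 MB)
logicaldrive 1 (1.8 TB, RAID 1+0, Interim Recovery Mode)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 0 MB, Failed)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 0 MB, Failed)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 1TB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 1TB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 1TB, OK, active spare)
"""

# Matches "logicaldrive <id> (<fields>)" and "physicaldrive <id> (<fields>)".
LINE_RE = re.compile(r"^\s*(logicaldrive|physicaldrive)\s+(\S+)\s+\((.*)\)\s*$")


def parse_config(text):
    """Return (kind, id, fields) tuples for each drive line in the text."""
    drives = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            kind, drive_id, inner = match.groups()
            fields = [part.strip() for part in inner.split(",")]
            drives.append((kind, drive_id, fields))
    return drives


def failed_drives(text):
    """List IDs of physical drives whose fields include 'Failed'."""
    return [
        drive_id
        for kind, drive_id, fields in parse_config(text)
        if kind == "physicaldrive" and "Failed" in fields
    ]


if __name__ == "__main__":
    print(failed_drives(SAMPLE))  # prints ['1I:1:1', '1I:1:2']
```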

# hpacucli ctrl all show config detail

... blah blah blah ...
physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: Failed
Drive Type: Data Drive
Interface Type: SATA
Size: 0 MB
SATA NCQ Capable: False
PHY Count: 1
PHY Transfer Rate: 1.5GBPS

Why was the drive not detected? Did the controller fail to recognize it? Or what?
So for now I have not done anything further, because the machine is remote and its system is (still) alive.
I very much hope someone can explain the procedure for replacing the dead disks.
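
The `show config detail` block quoted in this post is a list of `Key: Value` lines, so it can be turned into a dictionary for inspection. This is a minimal sketch, assuming that layout holds. `DETAIL` is copied from the post and `parse_detail` is a hypothetical helper name.

```python
# Detail block copied from the post.
DETAIL = """\
physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: Failed
Drive Type: Data Drive
Interface Type: SATA
Size: 0 MB
SATA NCQ Capable: False
PHY Count: 1
PHY Transfer Rate: 1.5GBPS
"""


def parse_detail(text):
    """Parse 'Key: Value' lines; a line without ': ' is taken as the drive name."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(": ")
        if sep:
            info[key] = value
        else:
            info["name"] = line
    return info


if __name__ == "__main__":
    drive = parse_detail(DETAIL)
    print(drive["name"], drive["Status"], drive["Size"])
```

A reported size of 0 MB alongside `Status: Failed` is the detail the poster is asking about.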


Array status still says interim recovery

A little over 3 weeks ago a server started having issues with one of its 2 arrays. The troublesome array is RAID 10, has 2 physical HDDs, and only holds the OS for the server. The array would randomly start to rebuild. Not knowing much about array controllers, I reached out to the company we bought the server from, and have a warranty with, to figure out what was going on and what needed to be done.

A couple of mornings later one of the hard drives in this array failed. I hot-swapped the failed HDD, the array started to rebuild, and I ordered a replacement HDD from the company. Later that evening I received an alert indicating the physical server was offline. Since we had been having issues with this server and I was going on vacation in a few days, I went back to the office to figure out what was wrong. It turned out that the array "broke" and the server would no longer boot to the OS.

Again I don’t know much about arrays and the following morning I again reached out to the company for help with fixing the array. Ended up deleting the broken array, recreating it and reloading the OS. I’m happy, all my users are happy, I go and enjoy my vacation, and I have multiple backups created while I’m gone.

I come back from vacation and re-enable the HP software alerts. After a couple days I start getting notifications that the same array is rebuilding and this happens at least once a day. The only difference is that only 1 of the 2 drives is mentioned instead of both. I’m still waiting for my replacement drive from the company so I contact them and ask if there is a way to speed up the process. No luck.

I received the replacement drive yesterday and replaced the HDD that is implicated in these alert messages. However, since replacing the drive the array has had a status of "interim recovery" and does not indicate any rebuilding.

Any recommendations on what to do to get the array to rebuild without having to recreate the array again?



