Post error 1787 drive array operating in interim recovery mode

Contents

Compaq PROLIANT ML370 User Manual
Page 75
Certification.ru
RAID problem
Post error 1787 drive array operating in interim recovery mode
Restoring a RAID with hpacucli
Array status still says interim recovery

Compaq PROLIANT ML370 User Manual

Page 75

Diagnostics and Troubleshooting 3-21

Table 3-2
POST Error Messages

Probable Source of Problem

1786-Drive Array Recovery Needed. The following SCSI drive(s) need Automatic Data Recovery: SCSI port 1: SCSI ID 0.
Select F1 to continue with recovery of data to drive. Select F2 to continue without recovery of data to drive.

Slot 1 Drive Array Recovery needed. Automatic Data Recovery previously Aborted! The following SCSI drive(s) need Automatic Data Recovery: SCSI port 1: SCSI ID 0.
Select F1 to retry Automatic Data Recovery to drive. Select F2 to continue without starting Automatic Data Recovery to drive.

System is in Interim Data Recovery mode. Data has not yet been recovered.

Press F1 to allow Automatic Data Recovery to begin. Data will automatically be restored to drive X now that the drive has been replaced or now seems to be working.

Press F2 and the system will continue to operate in the interim Data Recovery mode.

The "previously aborted" version of the 1786 POST message will appear if the previous rebuild attempt was aborted for any reason. Run Drive Array Advanced Diagnostics (ADU) for more information. If the replacement drive failed, try using another replacement drive. If rebuild was aborted due to a read error from another physical drive in the array, back up all readable data on the array, run Diagnostics Surface Analysis, and then restore your data.

1787-Slot x Drive Array Operating in Interim Recovery Mode. The following SCSI drive(s) should be replaced: SCSI port (y): SCSI ID (x)

Hard drive X failed or a cable is loose or defective. Following a system restart, this message reminds you that drive X is defective and fault tolerance is being used.
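
The two table entries above can be turned into a small lookup, which may help when scanning a saved POST screen or console capture. This is only a sketch: the function name `explain_post_line` and the summary strings are illustrative, and the summaries paraphrase the table rather than quote it.

```python
import re

# Summaries paraphrased from the POST Error Messages table above.
POST_MESSAGES = {
    1786: ("Drive array recovery needed: F1 starts Automatic Data Recovery, "
           "F2 keeps the system in Interim Data Recovery mode."),
    1787: ("Drive array operating in Interim Recovery Mode: a drive failed "
           "or a cable is loose or defective."),
}

# A POST line starts with a four-digit code followed by a hyphen or a long dash.
POST_CODE = re.compile(r"^\s*(\d{4})\s*[-\u2014]")


def explain_post_line(line):
    """Return the summary for a known POST code, or None if unknown."""
    match = POST_CODE.match(line)
    if match is None:
        return None
    return POST_MESSAGES.get(int(match.group(1)))


if __name__ == "__main__":
    print(explain_post_line("1787-Slot 2 Drive Array Operating in Interim Recovery Mode"))
```

Codes that are not in the dictionary return `None`, so the lookup never guesses at messages the table does not cover.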


Certification.ru

Certified Specialists Club

RAID problem

Good afternoon. I have an HP ProLiant ML 350 G3 with 4 hard drives in RAID 5. Everything runs under Server 2003, and Exchange is installed on it as well. In the evening the mail stopped working. I could not do anything remotely from home: the server only answered pings and showed no other signs of life. After 4 hours everything started working again, so I looked in the logs. It all began with this error:

Event Type: Error
Event Source: cpqcissm
Event Category: None
Event ID: 9
Date: 28.05.2007
Time: 18:56:29
User: N/A
Computer: EXCHANGE
Description:
The device, \Device\Scsi\cpqcissm1, did not respond within the timeout period.

Then came many more errors like this one, followed by a warning:

Event Type: Warning
Event Source: CPQCISSE
Event Category: None
Event ID: 24683
Date: 28.05.2007
Time: 19:10:04
User: N/A
Computer: EXCHANGE
Description:
SCSI bus fault occurred on Storage Box box 0, , Port 0 of
Array Controller in slot 2.
This may result in a «downshift» in transfer rate for one or more hard drives on the bus.

It all ended with this error:

Event Type: Error
Event Source: CPQCISSE
Event Category: None
Event ID: 24597
Date: 28.05.2007
Time: 21:44:55
User: N/A
Computer: EXCHANGE
Description:
Physical Drive on DEVICE ID 1 on Port 1 of
Array Controller in slot 2 has failed.
Failure Code: 0x05.

After that it logged this message:

Event Type: Information
Event Source: CPQCISSE
Event Category: None
Event ID: 24598
Date: 28.05.2007
Time: 21:44:55
User: N/A
Computer: EXCHANGE
Description:
Logical Drive 1 of
Array Controller in slot 2 has changed from status code 0 to status code 3.
Status Codes:
0 = OK
1 = FAILED
2 = NOT CONFIGURED
3 = INTERIM RECOVERY MODE
4 = READY FOR RECOVERY
5 = RECOVERING
6 = WRONG PHYSICAL DRIVE REPLACED
7 = PHYSICAL DRIVE NOT PROPERLY CONNECTED
8 = HARDWARE IS OVERHEATING
9 = HARDWARE HAS OVERHEATED
10 = EXPANDING
11 = NOT YET AVAILABLE
12 = QUEUED FOR EXPANSION
13 = UNKNOWN
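
The status-code list in this event can be read mechanically. The sketch below decodes the "changed from status code X to status code Y" wording quoted above. The function name `describe_status_change` is illustrative, and the pattern is assumed to match only this exact wording.

```python
import re

# Status codes exactly as listed in the event description above.
LOGICAL_DRIVE_STATUS = {
    0: "OK",
    1: "FAILED",
    2: "NOT CONFIGURED",
    3: "INTERIM RECOVERY MODE",
    4: "READY FOR RECOVERY",
    5: "RECOVERING",
    6: "WRONG PHYSICAL DRIVE REPLACED",
    7: "PHYSICAL DRIVE NOT PROPERLY CONNECTED",
    8: "HARDWARE IS OVERHEATING",
    9: "HARDWARE HAS OVERHEATED",
    10: "EXPANDING",
    11: "NOT YET AVAILABLE",
    12: "QUEUED FOR EXPANSION",
    13: "UNKNOWN",
}

# Matches the wording of event 24598 as quoted in the post.
STATUS_CHANGE = re.compile(r"changed from status code (\d+) to status code (\d+)")


def describe_status_change(description):
    """Return (old_name, new_name), or None if no status change is present."""
    match = STATUS_CHANGE.search(description)
    if match is None:
        return None
    old_code, new_code = (int(group) for group in match.groups())
    return (LOGICAL_DRIVE_STATUS.get(old_code, "UNKNOWN"),
            LOGICAL_DRIVE_STATUS.get(new_code, "UNKNOWN"))


if __name__ == "__main__":
    event_text = ("Logical Drive 1 of Array Controller in slot 2 has changed "
                  "from status code 0 to status code 3.")
    print(describe_status_change(event_text))
```

For the event in the post this gives a transition from OK to INTERIM RECOVERY MODE, which means the logical drive is still serving data but without redundancy.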

This raises some questions:
If something happened to a disk, why was the server out of action for 4 hours? It should have kept working.
And what should I do now, and what is the server doing while the disk is in INTERIM RECOVERY MODE?
The LED on this hard drive is lit red.


…The following SCSI drive(s) need Automatic Data Recovery: SCSI Port Y: SCSI ID Z

Select F1 to retry Automatic Data Recovery to drive. Select F2 to continue without starting Automatic Data Recovery.

Audible Beeps: None

Possible Cause: System is in Interim Data Recovery Mode and a failed or replacement drive has not yet been rebuilt. This message is displayed if the F2 key was pressed during a previous boot or if the F1 key was pressed during a previous boot and the system rebooted before the rebuild of the drive completed.

Perform one of the suggested actions:
- Press the F1 key to retry Automatic Data Recovery to the drive. Data will be automatically restored to drive X when a failed drive has been replaced, or to the original drive if it is working again without errors.
- Press the F2 key to continue without recovery of data to the drive. The failed drive will not be rebuilt and the system will continue to operate in a failed state of Interim Data Recovery Mode.
If drive recovery is not successful, run ADU for more information.
- If the replacement drive failed, replace with another drive.
- If the rebuild was aborted due to a read error from another physical drive in the array, back up all readable data on the array, run ADU, and then restore the data.


Restoring a RAID with hpacucli

Comrades, I need help. I have never used hardware RAID before, and now I have to revive one. But first things first.
We have a Debian server with a Smart Array P400 controller and a RAID 1+0 of 5 disks (1 spare).
Two disks dropped out. Two Seagate disks were bought, but without the HP label (the supplier claimed they would work because they are ES series).
The new disks have been physically connected. Then the trouble started. I found out that a utility called "hpacucli" exists and started reading its man pages, but the questions are not getting any fewer.

# hpacucli ctrl all show config

Smart Array P400 in Slot 6
array A (SATA, Unused Space: 0 MB)
logicaldrive 1 (1.8 TB, RAID 1+0, Interim Recovery Mode)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 0 MB, Failed)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 0 MB, Failed)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 1TB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 1TB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 1TB, OK, active spare)
You can see that 2 disks are Failed. Moving on.
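
On a controller with many bays it can be easier to pick the failed drives out of this listing with a script than by eye. The sketch below parses only the `physicaldrive` lines shown above. The field order (location, interface, size, status, optional role) is an assumption taken from this one sample, and the helper names are illustrative.

```python
import re

# Text copied from the `hpacucli ctrl all show config` output above.
SAMPLE = """\
Smart Array P400 in Slot 6
array A (SATA, Unused Space: 0 MB)
logicaldrive 1 (1.8 TB, RAID 1+0, Interim Recovery Mode)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 0 MB, Failed)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 0 MB, Failed)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 1TB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 1TB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 1TB, OK, active spare)
"""

PHYSICAL_DRIVE = re.compile(r"^\s*physicaldrive\s+(\S+)\s+\((.*)\)\s*$")


def physical_drives(config_text):
    """Map each drive id to its status field."""
    drives = {}
    for line in config_text.splitlines():
        match = PHYSICAL_DRIVE.match(line)
        if match is None:
            continue
        fields = [field.strip() for field in match.group(2).split(",")]
        if len(fields) < 4:
            continue  # layout differs from the sample, skip rather than guess
        # Assumed layout: location, interface, size, status[, role]
        drives[match.group(1)] = fields[3]
    return drives


def failed_drives(config_text):
    """List the ids of drives whose status is anything other than OK."""
    return [drive for drive, status in physical_drives(config_text).items()
            if status != "OK"]


if __name__ == "__main__":
    print(failed_drives(SAMPLE))
```

In practice the text would come from the command itself, for example captured with `subprocess.run`, instead of the embedded sample.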

# hpacucli ctrl all show config detail

... blah blah blah ...
physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: Failed
Drive Type: Data Drive
Interface Type: SATA
Size: 0 MB
Serial Number: INQUIRY FAILED
SATA NCQ Capable: False
PHY Count: 1
PHY Transfer Rate: 1.5GBPS

Why was the drive not detected? Did the controller fail to recognize it, or what?
Anyway, I have not done anything further yet, because the machine is remote and the system is (for now) still alive.
I really hope someone can explain the procedure for replacing the dead disks.
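
The detail block above hints at an answer to the first question: a serial number of "INQUIRY FAILED" together with a size of 0 MB suggests the controller got no identity data back from whatever sits in that bay. That is an interpretation of the output, not something the utility states. A minimal sketch for spotting such entries follows, with illustrative helper names.

```python
# Text copied from the `show config detail` output above.
SAMPLE_DETAIL = """\
physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: Failed
Drive Type: Data Drive
Interface Type: SATA
Size: 0 MB
Serial Number: INQUIRY FAILED
SATA NCQ Capable: False
PHY Count: 1
PHY Transfer Rate: 1.5GBPS
"""


def parse_drive_detail(text):
    """Turn the 'Key: value' lines of one drive block into a dict."""
    fields = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("physicaldrive"):
            continue  # the drive id itself contains colons, so skip it
        key, separator, value = stripped.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def looks_unidentified(fields):
    """Assumed heuristic: no serial number or zero size means no identity data."""
    return (fields.get("Serial Number") == "INQUIRY FAILED"
            or fields.get("Size") == "0 MB")


if __name__ == "__main__":
    detail = parse_drive_detail(SAMPLE_DETAIL)
    print(detail["Status"], looks_unidentified(detail))
```

A drive flagged this way is worth checking for seating, cabling, and compatibility before assuming that the replacement itself is dead.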


Array status still says interim recovery

Starting a little over 3 weeks ago a server started having issues with one of the 2 arrays. This troublesome array is a raid 10, has 2 physical hdd’s and only holds the OS for the server. Randomly the array would start to rebuild. Not knowing much about array controllers I reached out to the company we bought the server from and have a warranty with to try and figure out what was going on and what needed to be done.

A couple mornings later one of the hard drives in this array failed. I hot swapped the failed HDD, the array started to rebuild and I ordered a replacement hdd from the company. Later that evening I received an alert indicating the physical server was offline. Since we have been having some issues with this server and I was going on vacation in a few days I went back to the office to figure out what was wrong. It ended up being that the array «broke» and the server would no longer boot to the OS.

Again I don’t know much about arrays and the following morning I again reached out to the company for help with fixing the array. Ended up deleting the broken array, recreating it and reloading the OS. I’m happy, all my users are happy, I go and enjoy my vacation, and I have multiple backups created while I’m gone.

I come back from vacation and re-enable the HP software alerts. After a couple days I start getting notifications that the same array is rebuilding and this happens at least once a day. The only difference is that only 1 of the 2 drives is mentioned instead of both. I’m still waiting for my replacement drive from the company so I contact them and ask if there is a way to speed up the process. No luck.

I received the replacement drive yesterday and replaced the hdd that is implicated in these alert messages. However, since replacing the drive the array has had a status of «interim recover» and does not indicate any rebuilding.

Any recommendations on what to do to get the array to rebuild without having to recreate the array again?



Error messages 121

1786-Slot 1 Drive Array Recovery Needed. Automatic Data Recovery Previously Aborted!…

…The following SCSI drive(s) need Automatic Data Recovery: SCSI Port Y: SCSI ID Z
Select F1 to retry Automatic Data Recovery to drive. Select F2 to continue without starting Automatic Data Recovery.

Audible Beeps: None

Possible Cause: System is in Interim Data Recovery Mode and a failed or replacement drive has not yet been rebuilt. This message is displayed if the F2 key was pressed during a previous boot or if the F1 key was pressed during a previous boot and the system rebooted before the rebuild of the drive completed.

Action:
- Perform one of the suggested actions:
  - Press the F1 key to retry Automatic Data Recovery to the drive. Data will be automatically restored to drive X when a failed drive has been replaced, or to the original drive if it is working again without errors.
  - Press the F2 key to continue without recovery of data to the drive. The failed drive will not be rebuilt and the system will continue to operate in a failed state of Interim Data Recovery Mode.
- If drive recovery is not successful, run ADU for more information.
- If the replacement drive failed, replace with another drive.
- If the rebuild was aborted due to a read error from another physical drive in the array, back up all readable data on the array, run ADU, and then restore the data.

1787-Drive Array Operating in Interim Recovery Mode…

…Physical drive replacement needed: Drive X

Audible Beeps: None

Possible Cause: Hard drive X failed or cable is loose or defective. Following a system restart, this message notes that drive X is defective and fault tolerance is being used.

Action:
- Be sure all cables are connected properly and securely.
- Test and replace defective cables.
- Replace drive X. (Depending on the fault-tolerance level, all data may be lost if another drive fails.)

1788-Slot X Drive Array Reports Incorrect Drive Replacement…

…The following SCSI drive(s) should have been replaced: SCSI Port Y: SCSI ID Z.
The following SCSI drive(s) were incorrectly replaced: SCSI Port y: SCSI ID z.
Select F1 to continue – drive array will remain disabled.
Select F2 to reset configuration – all data will be lost.

Audible Beeps: None

Possible Cause:
- Replacement drives may have been installed in the wrong drive bays.
- A bad power cable connection to the drive, noise on the data cable, or defective SCSI cable exists.

Action:
- If replacement drives are installed in the wrong bays, properly reinstall the drives as indicated and:
  - Press the F1 key to restart the server with the drive array disabled.
  - Press the F2 key to use the drives as configured and lose all the data on them.
- If a bad power cable connection exists:
