Error: vdisk(s) offline, see MSA storage virtual disk overview panel - Troubleshooting errors and finding optimal solutions

buro: Advanced member; Posts: 88; Registered: 13 Oct 2014, 18:17; Location: Kyiv

Repairing a vdisk

Good day.
Please advise how to repair a vdisk built on an HP MSA2312sa storage array.
Current state:
Health Fault
Health Reason The virtual disk is quarantined and is offline.
Name vd01
Size 3597.0GB
Free 0B
Current Owner A
Preferred Owner A
Serial Number 00c0ff1070e70000146b9d4d00000000
RAID RAID5
Disks 7
Spares 1
Chunk Size 64k
Created 2011-04-07 07:43:16
Minimum Disk Size 599.5GB
Status QTOF
Current Job

The Dequarantine command returns this message:
Unable to dequarantine the vdisk.
Command failed.

Any advice is welcome.
Thanks.
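
The figures in the status dump above are consistent with plain RAID5 arithmetic: seven member disks of 599.5GB each, minus one disk's worth of parity, gives the reported 3597.0GB. A minimal sketch of that check, assuming the spare is not counted among the seven members (the function name is made up for illustration):

```python
def raid5_usable_gb(disk_count: int, min_disk_gb: float) -> float:
    """Usable capacity of a RAID5 set.

    One disk's worth of space is consumed by parity, so the usable
    capacity is (disk_count - 1) times the smallest member size.
    """
    if disk_count < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disk_count - 1) * min_disk_gb


# Values taken from the post: Disks 7, Minimum Disk Size 599.5GB
usable = raid5_usable_gb(7, 599.5)
print(f"{usable:.1f}GB")
```

The same arithmetic explains why RAID5 tolerates only a single missing member: there is exactly one disk's worth of redundancy.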

Stranger03: Trinity employee; Posts: 12979; Registered: 14 Nov 2003, 16:25; Location: St. Petersburg, Yekaterinburg

Re: Repairing a vdisk

Stranger03 » 01 Aug 2016, 13:13

buro,
Most likely one of the disks has dropped out, possibly several. Sort out the disks first, then see whether the vdisk can be repaired.

Re: Repairing a vdisk

buro » 15 Sep 2016, 15:36

It turns out the problem goes deeper: one controller has (possibly) died, and the array has been stuck in this sorry state ever since.

Re: Repairing a vdisk

Stranger03 » 16 Sep 2016, 10:18

buro wrote: It turns out the problem goes deeper: one controller has (possibly) died, and the array has been stuck in this sorry state ever since.

Check the logs; there should be entries showing that one of the controllers is dead.

Re: Repairing a vdisk

buro » 16 Sep 2016, 11:02

The odd part is that the controller logs contain no such entries: they were wiped at the moment of the failure, and it is unclear how that happened. The only logs left are from after the failure.


Troubleshooting HPE MSA 1040/2040 Storage

Table 8 Vdisk or disk group status

| Status | Displayed in the CLI or SMU | Description |
|---|---|---|
| Critical | CRIT | The Vdisk or disk group is online; however, some drives are down and the Vdisk or disk group is not fault tolerant. |
| Fault Tolerant with down drives | FTDN | The Vdisk or disk group is online and fault tolerant; however, some drives are down. |
| Fault Tolerant and online | FTOL | The Vdisk or disk group is online and fault tolerant. |
| Offline | OFFL | The Vdisk or disk group is offline because it is using offline initialization or the drives are down and data is lost. |
| Quarantined critical | QTCR | The Vdisk or disk group is in a critical state and quarantined because some drives are missing. |
| Quarantined offline | QTOF | The Vdisk or disk group is offline and quarantined because some drives are missing. |
| Quarantined with down drives | QTDN | The Vdisk or disk group is offline and quarantined because at least one drive is missing; however, the Vdisk or disk group is accessed and is fault tolerant. Example: One drive is missing from a RAID-6. |
| Up | UP | The Vdisk or disk group is online and does not have fault-tolerant attributes. |

Vdisk or Disk Group Quarantined

If a virtual disk group is quarantined, see Accessing Hewlett Packard Enterprise Support. Do not dequarantine virtual disk groups without the assistance of knowledgeable support personnel.

Table 9 Quarantine during array boot (available in all FW versions)

| Level | Symptom |
|---|---|
| RAID 5 | During boot, multiple disk drives from the Vdisk or disk group go missing and the Vdisk or disk group status is marked as QTOF. |
| RAID 6 | During boot, more than two disk drives go missing from the Vdisk or disk group and the Vdisk or disk group status is marked as QTOF. |

Action Required: The system de-quarantines the Vdisk or disk group once the missing disk drives are recognized after a rescan. If the Vdisk or disk group is not de-quarantined, a manual de-quarantine is performed which brings the Vdisk or disk group status to OFFL (if the drives are not recognized during a rescan). If the status of the Vdisk or disk group changes to
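
For scripting against CLI output, the status codes and the boot-time quarantine rule from the two tables above can be captured in a small lookup. This is only a sketch: the dictionary and function names are invented for illustration, the `UP` label is inferred because the excerpt shows that description without its status name, and the thresholds simply restate the Table 9 symptoms (more than one missing drive for RAID 5, more than two for RAID 6).

```python
# Status codes as displayed in the CLI or SMU (from Table 8).
VDISK_STATUS = {
    "CRIT": "online, some drives down, not fault tolerant",
    "FTDN": "online and fault tolerant, some drives down",
    "FTOL": "online and fault tolerant",
    "OFFL": "offline (offline initialization, or drives down and data lost)",
    "QTCR": "critical and quarantined, some drives missing",
    "QTOF": "offline and quarantined, some drives missing",
    "QTDN": "quarantined, at least one drive missing, still fault tolerant",
    "UP": "online, no fault-tolerant attributes",  # label inferred, see lead-in
}

QUARANTINED = {"QTCR", "QTOF", "QTDN"}

# Largest number of missing drives that does NOT trigger QTOF at boot,
# restating the Table 9 symptoms.
MAX_MISSING_AT_BOOT = {"RAID5": 1, "RAID6": 2}


def is_quarantined(status: str) -> bool:
    """True for any of the three quarantine states."""
    return status in QUARANTINED


def quarantined_offline_at_boot(raid_level: str, missing_drives: int) -> bool:
    """True if Table 9 says the vdisk would be marked QTOF during boot."""
    return missing_drives > MAX_MISSING_AT_BOOT[raid_level]


print(VDISK_STATUS["QTOF"])
print(quarantined_offline_at_boot("RAID5", 2))
```

A script that finds a quarantined status should stop and alert rather than attempt a dequarantine, in line with the warning above about involving support.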

Also, you should most likely consult HP to understand exactly which events and changes occurred and may have caused the current situation. And you have good backups, right?

If you have exhausted all the options above, here is a quick process we worked through recently after a power hiccup left a disk in the LEFTOVER state (amber light). We know this LEFTOVER disk was not moved from another slot and did not come from another MSA or system; we are sure the LEFTOVER event was caused by the power event. The RAID/vdisk had not failed (although it was in critical condition), and hosts were still accessing the volumes on that vdisk. With this background, here are the detailed steps. You will need Word or OpenOffice to read them.

How to rebuild a degraded vdisk in an HP MSA P2000 / 2012fc: a disk has changed state to “LEFTOVER”, the original spare disk shows state “AVAIL”, and the RAID5 has become CRITICAL

We had a power event and the MSA had a power hiccup. It is possible that some disks came back online AFTER the controller came up. Initially we could not reach the MSA controllers through the SMU/web interface, through PuTTY, or even through the USB cable connecting controller A to the backup server. After moving the USB cable to controller B (bottom), we regained access through the serial port (COM3) via PuTTY. We restarted the management interfaces of both controllers and could then communicate with the controllers via PuTTY and via the SMU/web management interface.

Checked through the SMU, the Narrows MSA P2000 indicated a vdisk failure and reported the following messages:
Degraded.
Virtual-disk is not fault tolerant.

Event 55: A disk drive reported a SMART event
Event 314: There is a problem with a FRU.

The physical hard disk drive had likely not failed, but the vdisk array was in a degraded state. One physical disk (enclosure 2, disk 4) was lit AMBER, indicating some sort of failure or malfunction.

First, the metadata on the leftover disk must be cleared before the disk can rejoin the RAID as either a regular disk or a dedicated spare.

Clearing Disk Meta Data

Each disk has metadata that identifies whether the disk is a member of a vdisk, and identifies other members of that vdisk. If a disk’s metadata says the disk is a member of a vdisk but other members’ metadata say the disk isn’t a member, the disk becomes a LEFTOVER. The system overview and enclosure overview pages show the disk’s How-Used value as LEFTOVR. A leftover disk’s Fault/UID LED is illuminated amber.

Before you can use that disk in a new vdisk or as a spare, you must clear that disk’s metadata.
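
The LEFTOVER rule described above can be modelled in a few lines. This is a simplified illustration, not the controller's actual logic: the `DiskMetadata` fields, the `classify` function, and the disk IDs are all hypothetical, and a disk is treated as leftover only when none of its peers list it as a member.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class DiskMetadata:
    disk_id: str
    vdisk: Optional[str]  # vdisk this disk believes it belongs to
    known_members: FrozenSet[str] = frozenset()  # member list stored on this disk


def classify(disk: DiskMetadata, all_disks: List[DiskMetadata]) -> str:
    """Return AVAIL, VDISK or LEFTOVR for one disk (simplified model)."""
    if disk.vdisk is None:
        return "AVAIL"  # no metadata: free to use
    peers = [d for d in all_disks
             if d.disk_id != disk.disk_id and d.vdisk == disk.vdisk]
    # The disk claims membership, but no peer's metadata lists it.
    if peers and all(disk.disk_id not in p.known_members for p in peers):
        return "LEFTOVR"
    return "VDISK"


agreed = frozenset({"2.1", "2.2", "2.3"})
disks = [
    DiskMetadata("2.1", "Tier 2 Storage", agreed),
    DiskMetadata("2.2", "Tier 2 Storage", agreed),
    DiskMetadata("2.3", "Tier 2 Storage", agreed),
    # Stale metadata: still thinks it is a member, peers disagree.
    DiskMetadata("2.4", "Tier 2 Storage", agreed | {"2.4"}),
    DiskMetadata("2.5", None),
]
for d in disks:
    print(d.disk_id, classify(d, disks))
```

Clearing metadata corresponds to resetting `vdisk` to `None` in this model, which is why the disk then shows up as AVAIL and can be added back as a spare.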

To clear metadata from LEFTOVER disks

1. In the Configuration View panel, right-click the system and then select Tools > Clear Disk Metadata.
2. In the main panel, select the disks to clear metadata from.
3. Click Clear Metadata. When processing is complete, a success dialog appears.
4. Click OK.

Re-Join Physical Disk to VDisk RAID

1. In the HP MSA Storage Management Utility, right-click the affected vdisk and select Configuration > Manage Dedicated Spares.
2. Your disk should appear in the list of available drives with a state of AVAIL. Tick the drive, then click the Modify Spares button.
3. The disk will be re-joined to the array. Initially it may be listed as a spare, but the MSA will automatically re-join the disk as an active member of the RAID if this is how it was originally configured.
4. The array will begin the reconstruction process automatically. This can take a very long time depending on the size of your drives.
5. In this situation, the original SPARE has become AVAIL, so re-add this AVAIL disk as a spare using the steps above. Once set as a spare, the disk shows a status of VDISKSP or SPARE.

State of the disks before: one disk stuck in LEFTOVER, while the original SPARE is stuck in the AVAIL state in enclosure 2 (Tier 2 Storage, RAID5 now critical):

State of the disks after clearing the metadata of the disk stuck in LEFTOVER and rebuilding the vdisk:

Notice that the disks are now marked as vdisk members under reconstruction. See the RAID rebuild progress below:
# show vdisks
Name Size Free Own Pref RAID Disks Spr Chk Status Jobs Serial Number Drive Spin Down Spin Down Delay
------------------------------------------------------------------------------------------
ESX Prod Clstr 5995.1GB 0B B B RAID5 11 1 64k FTOL 00c0ffdb2868000084498b4d00000000 Disabled 0
Tier 2 Storage 9991.8GB 775.8GB A A RAID5 11 1 64k CRIT RCON 2% 00c0ffdb2f970000c84a924c00000000 Disabled 0
------------------------------------------------------------------------------------------
Success: Command completed successfully.

# show vdisks
Name Size Free Own Pref RAID Disks Spr Chk Status Jobs Serial Number Drive Spin Down Spin Down Delay
------------------------------------------------------------------------------------------
ESX Prod Clstr 5995.1GB 0B B B RAID5 11 1 64k FTOL 00c0ffdb2868000084498b4d00000000 Disabled 0
Tier 2 Storage 9991.8GB 775.8GB A A RAID5 11 1 64k CRIT RCON 19% 00c0ffdb2f970000c84a924c00000000 Disabled 0
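
If you want to watch the rebuild from a script rather than re-running the command by hand, the data rows of `show vdisks` can be picked apart with a regular expression. The sketch below assumes the column layout seen in the sample output above (names may contain spaces; the job and percentage are optional); other firmware versions may print different columns, so treat the pattern as a starting point.

```python
import re

# Column layout assumed from the sample output above.
VDISK_LINE = re.compile(
    r"^(?P<name>.+?)\s+"
    r"(?P<size>[\d.]+[KMGT]?B)\s+"
    r"(?P<free>[\d.]+[KMGT]?B)\s+"
    r"(?P<own>[AB])\s+(?P<pref>[AB])\s+"
    r"(?P<raid>\S+)\s+(?P<disks>\d+)\s+(?P<spares>\d+)\s+(?P<chunk>\S+)\s+"
    r"(?P<status>[A-Z]{2,4})"
    r"(?:\s+(?P<job>[A-Z]+)\s+(?P<pct>\d+)%)?"  # e.g. "RCON 19%"
    r"\s+(?P<serial>[0-9a-f]{32})\b"
)


def parse_vdisk_line(line):
    """Return a dict of fields for a data row, or None for other lines."""
    m = VDISK_LINE.match(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    if row["pct"] is not None:
        row["pct"] = int(row["pct"])
    return row


sample = [
    "Name Size Free Own Pref RAID Disks Spr Chk Status Jobs Serial Number",
    "ESX Prod Clstr 5995.1GB 0B B B RAID5 11 1 64k FTOL "
    "00c0ffdb2868000084498b4d00000000 Disabled 0",
    "Tier 2 Storage 9991.8GB 775.8GB A A RAID5 11 1 64k CRIT RCON 19% "
    "00c0ffdb2f970000c84a924c00000000 Disabled 0",
]
for line in sample:
    row = parse_vdisk_line(line)
    if row:  # header and separator lines are skipped
        print(row["name"], row["status"], row["job"], row["pct"])
```

Anchoring the match on the 32-character serial number is what lets the vdisk name contain spaces without confusing the parser.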

After the vdisk/RAID5 reconstruction finishes, the vdisk status shows normal (FTOL):

Events that occurred as these steps were performed, explained in detail:

Hello

We have several hyper-converged environments based on HP ProLiant DL360/DL380.
We have 3-node and 2-node clusters running Windows 2016 with current patches and firmware updates applied, and a witness configured.

The following issue occurs on at least one 3-node and one 2-node cluster:
When we put one node into maintenance mode (correctly, as described in the Microsoft docs, after checking that everything is fine) and reboot that node, one of the Cluster Virtual Disks can go offline. It is always the disk "Performance", which uses the SSD-only storage in each environment. The issue is intermittent: sometimes I can reboot the nodes one after another several times in a row and everything is fine, but sometimes the disk "Performance" goes offline. I cannot bring this disk back online until the rebooted node comes back online. Once the node that was down for maintenance is back online, the virtual disk can be brought online without any issues.

We have created 3 Cluster Virtual Disks & CSV Volumes on these clusters:
1x Volume with only SSD Storage, called Performance
1x Volume with Mixed Storage (SSD, HDD), called Mixed
1x Volume with Capacity Storage (HDD only), called Capacity

Disk Setup for Storage Spaces Direct (per Host):
— P440ar Raid Controller
— 2 x HP 800 GB NVME (803200-B21)
— 2 x HP 1.6 TB 6G SATA SSD (804631-B21)
— 4 x HP 2 TB 12G SAS HDD (765466-B21)
— No spare Disks
— Network Adapter for Storage: HP 10 GBit/s 546FLR-SFP+ (2 storage networks for redundancy)
— 3 Node Cluster Storage Network Switch: HPE FlexFabric 5700 40XG 2QSFP+ (JG896A), 2 Node Cluster directly connected with each other

Cluster Events Log is showing the following errors when the issue occurs:

Error 1069 FailoverClustering
Cluster resource ‘Cluster Virtual Disk (Performance)’ of type ‘Physical Disk’ in clustered role ‘6ca63b55-1a16-4bb2-ac53-2b23619e258a’ failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Warning 5120 FailoverClustering
Cluster Shared Volume ‘Performance’ (‘Cluster Virtual Disk (Performance)’) has entered a paused state because of ‘STATUS_NO_SUCH_DEVICE(c000000e)’. All I/O will temporarily be queued until a path to the volume is reestablished.

Error 5150 FailoverClustering
Cluster physical disk resource ‘Cluster Virtual Disk (Performance)’ failed. The Cluster Shared Volume was put in failed state with the following error: ‘Failed to get the volume number for \\?\GLOBALROOT\Device\Harddisk10\ClusterPartition2 (error 2)’

Error 1205 FailoverClustering
The Cluster service failed to bring clustered role ‘6ca63b55-1a16-4bb2-ac53-2b23619e258a’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Error 1254 FailoverClustering
Clustered role ‘6ca63b55-1a16-4bb2-ac53-2b23619e258a’ has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

Error 5142 FailoverClustering
Cluster Shared Volume ‘Performance’ (‘Cluster Virtual Disk (Performance)’) is no longer accessible from this cluster node because of error ‘(1460)’. Please troubleshoot this node’s connectivity to the storage device and network connectivity.
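
Error 1254 above refers to a failover threshold: a configured number of attempts within a configured period. The idea can be illustrated with a sliding-window counter. This is only a conceptual sketch and not how the Cluster service is implemented; the class name and the limits used (2 failures per 3600 seconds) are made-up values.

```python
from collections import deque


class FailoverThreshold:
    """Allow at most max_failures failures within any period_s window."""

    def __init__(self, max_failures: int, period_s: float):
        self.max_failures = max_failures
        self.period_s = period_s
        self._times = deque()  # timestamps of recent failures

    def record_failure(self, now_s: float) -> bool:
        """Record a failure at now_s.

        Returns True if another attempt is still allowed, False if the
        threshold is exceeded and the role would be left failed.
        """
        # Forget failures that fell out of the window.
        while self._times and now_s - self._times[0] >= self.period_s:
            self._times.popleft()
        self._times.append(now_s)
        return len(self._times) <= self.max_failures


threshold = FailoverThreshold(max_failures=2, period_s=3600)
for t in (0, 10, 20):
    print(t, threshold.record_failure(t))
```

In this model, a burst of failures while the rebooted node is still down quickly exhausts the allowance, which matches the observation that the disk stays failed until someone brings it online manually.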

Any hints or input appreciated. Has anyone seen something similar?

Thanks in advance

Philippe

Hello. We have an HP MSA 2012fc disk array with two more shelves daisy-chained to it as expansion enclosures. After a firmware update on the head unit (the 2012fc), the disks in one of the expansion shelves went to LEFTOVER status and the vdisk is offline. What does this status mean? Can the data be recovered?

It means the disk is physically intact but has dropped out of the vdisk. Clear the metadata.
1. Select Manage > Utilities > Disk Drive Utilities > Clear Metadata.
An enclosure view is displayed in which only Leftover and Available drives are selectable. Available drives are considered to have had their metadata cleared, but are selectable in case a drive with partial metadata has been inserted into the system.
2. Select the drives whose metadata you want to clear.
3. Click Clear Metadata For Selected Disk Drives.

OK, but all the disks in my shelf have this status.

Recreate the vdisk; it looks like everything has fallen apart. Do you have a backup?

I think I need to go into the server's RAID management utility and simply add the dropped disk back into the RAID. What is this utility called on DELL servers?
P.S. On HPE servers it is the Array Configuration Utility; what is it on DELL?

You get to it through the BIOS during boot.
