ATA read DMA error

Moderator: Bizdelnick

dergachev: Posts: 847; OS: archlinux

Solved: Errors at boot, something about DMA

To be honest, I don't know what this has to do with Unix (most likely it is a hardware problem), but since under Windows I would never have found out about it at all, I am posting here.
I am getting errors like these.

Code:

ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x64
ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/100
ata2: EH complete
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x64
ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/100
ata2: EH complete
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x64
ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/100
ata2: EH complete
ata2.01: limiting speed to UDMA/66:PIO4
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x64
ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/66
ata2: EH complete

Meanwhile, Windows 7 died recently: first it simply started taking ten minutes to boot, then it gave some kind of input-output error at boot. Various Linux systems started spewing roughly these errors at boot, although afterwards everything works perfectly. Moreover, when one particular disk is connected, the errors sometimes also keep coming for about ten minutes before anything starts working; if it is not connected (that is, not mounted), I only get what is shown above.

What should I do?

Code:

# fdisk -l

Disk /dev/sda: 750.2 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xbfb2917c

Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        1101     8843751   83  Linux
/dev/sda2   *        1102       91201   723728249    7  HPFS/NTFS

Disk /dev/sdc: 300.1 GB, 300069052416 bytes
255 heads, 63 sectors/track, 36481 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xf515ed38

Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        6785    54500481   83  Linux
/dev/sdc2            6786       36481   238533120   83  Linux

Disk /dev/sdb: 320.1 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x2f479f33

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        8317    66806271   83  Linux
/dev/sdb3            8318       38914   245757952    7  HPFS/NTFS
Partition 3 does not end on cylinder boundary.

xoomer: Posts: 201

Re: Solved: Errors at boot, something about DMA

xoomer » 05.05.2010 00:14

dergachev, what if you try disabling UDMA and enabling PIO mode on all disks?

dergachev wrote: 04.05.2010 21:18

ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }

I wish I knew what ata2.01 means. As I understand it, that is the drive on the 4th SATA connector?? (I may be wrong)

What I would do:
- as for PIO, I already wrote about that
- try disabling the second branch of the SATA controller
- and think about preserving the data on the hard disks, and the hard disks themselves. Ideally I would use an old disk for investigating all of this, since as I understand it the problem is not in the HDD but in something else…

Far behind the skies…
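
On the naming question above: in libata messages the prefix has the form ataP.DD, where P is the port number and DD is the device on that port; on a parallel ATA channel, 00 is normally the master and 01 the slave. Read that way, ata2.01 would be the second device on port 2 rather than a connector number. A minimal Python sketch that pulls the two numbers out of a log line (the function name parse_ata_name is made up for illustration):

```python
import re

# Matches prefixes such as "ata2:" or "ata2.01:" anywhere in a log line.
ATA_NAME = re.compile(r"\bata(\d+)(?:\.(\d+))?:")

def parse_ata_name(line):
    """Return (port, device) from a libata log line, or None if absent.

    device is None for port-level messages such as "ata2: EH complete".
    """
    m = ATA_NAME.search(line)
    if m is None:
        return None
    port = int(m.group(1))
    device = int(m.group(2)) if m.group(2) is not None else None
    return port, device

print(parse_ata_name("ata2.01: failed command: READ DMA EXT"))
print(parse_ata_name("ata2: soft resetting link"))
```

In the first post, ata2.00 and ata2.01 are both reconfigured after every reset of ata2, which is consistent with two devices sharing one channel.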

dergachev: Posts: 847; OS: archlinux

Re: Solved: Errors at boot, something about DMA

dergachev » 15.06.2010 19:24

Since I was recently very surprised at how high unixforum.org ranks in search engines, I decided to report the solution after all.
Yes indeed, it was a bad IDE cable. I replaced it and happiness ensued.
True, in the meantime the local Windows install met its end, but it had it coming
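
The ICRC flag in the error lines of the first post fits this outcome: ICRC marks an interface CRC error, that is, data damaged in transit between drive and controller, which is why the cable is a common first suspect. A rough Python sketch of a helper that extracts the flags from a libata "error: { ... }" line and attaches a hint; the names error_flags and hints are hypothetical, and the hint texts are an informal summary, not kernel output:

```python
import re

# Informal hints per flag (an assumption-level summary, not kernel text).
HINTS = {
    "ICRC": "interface CRC error in transit; check cable and connectors first",
    "UNC": "uncorrectable media error; the drive could not read the sector",
    "ABRT": "command aborted by the device",
    "IDNF": "requested address not found",
}

def error_flags(line):
    """Return the flags listed inside 'error: { ... }', or an empty list."""
    m = re.search(r"error:\s*\{([^}]*)\}", line)
    return m.group(1).split() if m else []

def hints(line):
    """Map every flag on the line to a short hint."""
    return [(flag, HINTS.get(flag, "no hint available")) for flag in error_flags(line)]

for flag, hint in hints("ata2.01: error: { ICRC ABRT }"):
    print(flag, "->", hint)
```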

Source

Arch Linux

#1 2014-12-07 07:31:05

[SOLVED] libata: status failed command: READ DMA

I acquired a new AMD R7 SSD to boost my ASUS 1000H experience (it actually manages about double the speed of the built-in rotating Seagate) and am seeing these error messages during boot.

The last 3 lines are for support, in case I have to return the drive. (Nobody wants to know his car will surely explode and keep riding it until it really does.)

No occasional errors.
The drive doesn't spit out errors if plugged in through multiple 2.5" to USB adaptors (of different brands, too). It isn't affected by the kernel version, since I started with 3.16 and lately upgraded to 3.17, and the error message persists in a similar manner.
Using a different PC (Wortmann/Clevo 1547p) results in no error messages (same procedure: 3.16 and 3.17 kernels, internal SATA and external through the adaptors).
Using the HDD that came with the machine, or the various other 2.5" HDDs lying around on my table, doesn't result in error messages either, so I think the SATA port isn't damaged on the PC side. Blowing dust from the connectors and making sure the drive was seated [check!]

I tried both multiple times, with live and installed OS.
No bad sectors, last time trimmed: 05122014.

To conclude: the error messages do not pose a threat to performance, they are simply annoying :s Googling did reveal similar problems and error messages but never exactly this, so I consider my case worth a post.

I’m using openrc.
P.S. on systemd it also spits out the errors above

Solution:
I read the libata source code and grepped for the error messages in the ATA part of the kernel sources, and found some known issues with OCZ SSDs in combination with DMA.
Disabling DMA solved the problem but made drive access ultra slow, so I searched anew. Another read of the sources led me to assume that something could be wrong with AHCI; I googled and found
https://www.bios-mods.com/forum/Thread- … eePC-1000H
which proposes a BIOS mod that enables AHCI capabilities already present on the ASUS 1000H.
/*not encouraging you to upgrade your BIOS, consider your own case!*/

I changed the BIOS to the modded one and, lo and behold, the error is gone and the speed is remarkably better.

Sadly I'm not able to explain the issue or the solution in technically correct terms.

With kind regards, frig

Last edited by frig (2015-01-13 02:46:07)

Source


[HDD] failed command: READ DMA EXT

Good day.

I noticed the following in the logs:

I paid attention to it when I noticed occasional freezes while reading from this disk.

smartctl -H /dev/sda reports no complaints, but the self-tests give this result:

Should I prepare for the worst?

I am looking for bad blocks now.

PS The hard drive is a WD Green 1TB; the filesystem is reiserfs, and its check finds no errors

1. Try replacing the cable.

2. Try switching between IDE/AHCI/RAID modes

For a start, just unplug the cable and plug it back in; sometimes that helps.

I was also going to write about the cable, but it looks like it is the drive after all, not the cable. Let me explain why: my cable came loose just the other day, and there were plenty of errors, but different ones.

Here: host bus error

At the same time, grep media /var/log/messages* returns nothing on my machine, whereas in this case it is precisely a media error.

So yes, prepare for the worst.

> I was also going to write about the cable, but it looks like it is the drive after all, not the cable

Why "looks like"? That is exactly it: the drive reports a media error

> Why "looks like"? That is exactly it: the drive reports a media error

Well, that is what I mean; "looks like" covers the random 0.001% chance of "anything can happen, you never know".
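
The distinction argued over above (bus error versus media error) is printed by libata in parentheses after the Emask value on the "res" line. A small Python sketch, assuming log lines in the format quoted in these threads, that counts how often each class appears; count_error_classes is a hypothetical helper name:

```python
import re
from collections import Counter

# Only "res" lines carry a parenthesised description right after the Emask value.
EMASK = re.compile(r"Emask 0x[0-9a-fA-F]+ \(([^)]+)\)")

def count_error_classes(lines):
    """Count libata error classes such as 'media error' or 'ATA bus error'."""
    counts = Counter()
    for line in lines:
        m = EMASK.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6",
    "         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)",
    "         res 51/40:ef:57:aa:d8/40:00:4a:00:00/e0 Emask 0x9 (media error)",
]
print(count_error_classes(sample))
```

To run it over a real log, pass an open file object instead of the sample list; the function only iterates over lines.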

Just yesterday I fixed something similar on my own machine (successfully). MHDD recovered three soft bad sectors.

I do suspect these are soft bad sectors; in fact, I started looking into it after the root filesystem crashed because of a bad contact on the power connector.

badblocks /dev/sda3 (the partition specifically, not the whole device) was finding bad blocks. Out of frustration I wiped the whole filesystem (all the more since everything there was old and it was time to redo a few things), and the bad blocks disappeared.

In this case things are somewhat more complicated, because the partition holds about 800 GB of data. Not much of it is really needed, but I still would not want to lose it, and I have nowhere to back it up for now. I will try some magic with MHDD.

Plus, SMART being calm gives me some ideas; once something becomes clear I will report back.

MHDD with remap: slow, of course, but what can you do.
badblocks is a rather dumb tool, since it does not show the nature of the damage. In my case there were several hard reboots, precisely because of problems with the power connector.

I get UNC (Unrecoverable?) errors on drives with bad sectors. Most are cured by MHDD or Victoria with Erase Delays

MHDD found no problems, so it is the filesystem. I will play with it one of these days.

So the problem is in the interface, and certainly not in the filesystem. They are at different levels

>>MHDD found no problems, so it is the filesystem. I will play with it one of these days.

> So the problem is in the interface, and certainly not in the filesystem. They are at different levels

Not necessarily. The disk could have remapped the problem sector. A media error cannot come from the interface or the filesystem (unless, of course, it is a driver bug).

Nothing seems to have been remapped, and recent tests evidently pass without finding errors. The only worrying thing is Offline_Uncorrectable = 4, which, if I am not mistaken, means there are (or once were) 4 sectors with slow access. I will try running a long test, but right now there are no problems.
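
To watch this attribute over time, the attribute table printed by smartctl -A can be parsed. A Python sketch, assuming the usual ten-column layout of that table; the sample text is illustrative only (just the raw value 4 comes from the post), and parse_smart_attributes is a made-up name:

```python
def parse_smart_attributes(text):
    """Return {attribute_name: (id, raw_value)} from 'smartctl -A' style text.

    Assumes the columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH,
    TYPE, UPDATED, WHEN_FAILED, RAW_VALUE; only the first token of the
    raw value is kept.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = (int(fields[0]), fields[9])
    return attrs

# Illustrative sample, not output from the drive discussed in the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       4
"""

print(parse_smart_attributes(SAMPLE)["Offline_Uncorrectable"])
```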

By the way, the errors started appearing after the power problems, and only when I tried to download via torrent the things that were being downloaded at the moment of the power failure. Errors were later found in that download. There seem to be no problems now.

All in all, I found no hardware problems with the drive; even the Offline_Uncorrectable value went back to zero. However, recently (as the drive filled up) I found this in the logs:

As I understand it, when the drive had its power problems (while data was being written to it), it scattered garbage over the unused space, and as the disk fills up such errors surface. I should have done this from the very beginning:

--scan-whole-partition, -S This option causes --rebuild-tree to scan the whole partition but not only the used space on the partition

Source

[SOLVED] Failed command: READ DMA

# 6 years, 8 months ago (edited 6 years, 8 months ago)
Hello!
I have a freshly installed Arch Linux system (only base, base-devel and grub-bios are installed). When I try to boot into it I see the message
and the system does not boot any further (it ends in a kernel panic).

The message appears on attempts to mount the tmp, var, home and boot partitions, in short everything except root: root mounts successfully and fsck checks it. SmartMonTools shows no errors, and MHDD says the HDD is simply perfect. I tried the ext4 and ext3 filesystems.
With ext4 and the kernel parameter libata.force=noncq there is also this error
I blame the Arch kernel, because if I install RFRemix (Fedora) 23, everything works just perfectly.

Please advise which direction to dig in. Thanks in advance!
HDD: SAMSUNG Spinpoint M8 ST1000LM024 (HN-M101MBB)
Kernel: 4.4.5-1
Notebook: Lenovo P585
smartctl -a /dev/sda1
Interrupted (host reset): the laptop was switched off during a repeat scan.

Well then, switch the laptop on and run the tests to completion. Your error clearly points to hardware problems.
Or format with a bad-block check, but that will not last long anyway. (IMHO)
# 6 years, 8 months ago (edited 6 years, 8 months ago)

kurych
> Well then, switch the laptop on and run the tests to completion. Your error clearly points to hardware problems.
> Or format with a bad-block check, but that will not last long anyway. (IMHO)

reports (0/0/0 errors)

As for the tests: that was a repeat run which got interrupted.
If the problem is clearly hardware (I leaned towards that at first too), then why does F23 start fine without complaining about anything?

# 6 years, 8 months ago (edited 6 years, 8 months ago) and the problem was solved)
P.S. a bug in kernel 4.5 with AMD A* CPUs


Source

ATA subsystem causes kernel to lock (no panic) if atacontrol detach is executed without remembering to umount relevant filesystems beforehand

ATA subsystem acts erratically/incorrectly when a SATA disk is removed from the system without doing atacontrol detach prior to the removal.

Reference: http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html

Easily reproducible on any hardware sporting a commercial-grade hot-swap SATA backplane.

Intel MatrixRAID: New ar(4) device created when bad disk in RAID-1 array replaced with new disk

Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=121899

Patch available in PR.

Intel MatrixRAID: Array goes incorrectly into READY state when rebooting machine in the middle of an array rebuild

Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=102210

Patch available in PR 102210, and has been available since 2006.

Intel MatrixRAID: Kernel panics when a disk is lost and reattached

Open PRs:
Patch available in PR 102211, and has been available since 2006.

Numerous problems with embedded LSI v3 MegaRAID

http://www.freebsd.org/cgi/query-pr.cgi?pr=101819

Patches available in PR 92786 and PR 101819. Patches have been available since 2006.

ServerWorks HT1000 chipsets causing SATA data corruption

Known to affect at least Dell PowerEdge SC1435 systems

ATA/SATA other issues

SMART monitoring: Using the -s flag in smartd.conf to run periodic short/long offline tests results in DMA timeouts

Workaround: Stop using this feature. I explain why in this post.

I am in the process of communicating with Bruce Allen (author of smartmontools) to discuss why this feature exists, why it’s advocated in the man page and example smartd.conf, and why one would want to perform these tests on a regular basis.

ATA/SATA DMA timeout issues

Symptom: messages similar to those below are output by the kernel. Sometimes harmless, sometimes fatal. LBAs listed are scattered, and SMART statistics for the disk in question show no sign of increased error rates or sector issues:
References:
PATA only: Set hw.ata.ata_dma=0 in /boot/loader.conf. This will disable use of ATA DMA. NOTE: This workaround greatly decreases I/O performance. You have been warned.

Volker Theile of the FreeNAS project informs me that they have solved most of the DMA problems by increasing a hard-coded arbitrary timeout value of 5 (seconds) in the ATA code to 10 or 15, while simultaneously making the timeout value adjustable via sysctl. Volker submitted patches to sos@ over a year ago, but never received a response.
FreeBSD 7.0 patch: http://freenas.svn.sourceforge.net/viewvc/freenas/trunk/build/kernel-patches/ata/files/patch-ata.diff?view=markup

As of 2008/02/27, Scott Long has offered to help track this problem down. Those who are able to reproduce the problem reliably should get in contact with Scott; serial console access will very likely be mandatory.

SATA disk troubleshooting

Understanding what you’re dealing with

A substantial number of FreeBSD users report SATA disk problems. It is difficult to determine the source of these problems, due to the complex nature of hard disks and all related pieces (cabling and power, disk mechanics, disk firmware, protocol/transport, kernel driver, etc.). Even with a thorough understanding of how SATA disks work, there is a decent chance that even the most skilled system administrator won't be able to determine the root cause. To make matters worse, many system administrators do not have fail-over systems in place, which makes thorough analysis and troubleshooting impossible ("This system has to be up and running 24×7, I can't afford the downtime for others to look at it"). And then there's the issue of finances: sometimes cash is required to work around issues ("Do we know if this Adaptec SATA controller even works? Maybe we should switch to Areca or 3ware or Promise."), while not everyone has such funds available.

But back to the disks themselves. Comparatively, with regards to bad block management, SCSI disks behave quite differently than SATA. SCSI will report any disk errors with sense code, ASC, and ASCQ, and may even automatically mark that block as a "grown defect" (a user-manageable list of bad blocks), while SATA disks will silently attempt to remap bad blocks, keeping track of such defects internally, and will not report to the transport layer (e.g. operating system) that anything had happened. For example, assuming the block was remapped successfully, even SMART statistics are usually left untouched; while in the case of a remapping failure, SMART attribute 198 (Offline_Uncorrectable) may get incremented.

In the case of SATA, such a scenario can take time, and depends greatly upon the type of error. Some errors (such as soft errors) may take under a second to recover from, while others (hard errors) may take longer periods, and some may cause the disk to lock up entirely, requiring the disk to be power-cycled and the SATA channel reattached. FreeBSD expects that all ATA commands (that includes SATA!) sent to a device receive a response within 5 seconds. The timeout is hard-coded, and is entirely arbitrary; it has no implied meaning. It was chosen by sos@freebsd.org probably based on personal choice.

What FreeBSD has to say

So what happens when a disk operation is executed, but takes longer than 5 seconds to return a sense code? Well, FreeBSD spits out quite a lot of crap to the kernel console (see dmesg or /var/log/console.log), such as:

This tells you a few things, most of which are low-level:

The disk which experienced the problem was ad0

A time-out occurred when attempting a write operation

FreeBSD attempted to write data to LBA XXXXX via standard DMA (which uses 28-bit LBA addressing), experienced a time-out, and attempted a write retry once

FreeBSD attempted to write data to LBA YYYYY via 48-bit DMA, experienced a time-out, and attempted a write retry twice

FreeBSD deemed the write operation a failure

The ATA status result is value 0x51 (bits 6 (DRDY), 4 (not applicable), and 0 (ERR) set)

The ATA error result is value 0x10 (bit 4 set), which according to ATA-7 specification, Section 6.59.6 is: "IDNF shall be set to one if a user-accessible address could not be found. IDNF shall be set to one if an address outside of the range of user-accessible addresses is requested if command aborted is not returned." FreeBSD labels this bit as NID_NOT_FOUND

The IDNF bit seems to indicate that a particular LBA on the disk was inaccessible; I interpret this to mean "the LBA you're trying to access is within an invalid LBA range" (which would strongly indicate a bug in FreeBSD), but there's a good chance I'm reading the description wrong. This needs some further research/clarification, particularly by those more familiar with the ATA protocol semantics than I am.
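
The register values in messages like these can be decoded bit by bit. A Python sketch using the classic ATA status and error register layout; the names used for bits that later ATA revisions mark obsolete (DSC, CORR, IDX, AMNF) follow the older naming, and the helper name decode is made up for illustration:

```python
# Bit position -> name for the ATA status register.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
               3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}

# Bit position -> name for the ATA error register.
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF",
              3: "MCR", 2: "ABRT", 1: "NM", 0: "AMNF"}

def decode(value, names):
    """Return the names of all set bits, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]

print(decode(0x51, STATUS_BITS))  # status value discussed above
print(decode(0x10, ERROR_BITS))   # error value discussed above
print(decode(0x84, ERROR_BITS))   # error value from the Linux "ICRC ABRT" log
```

The same table explains the Linux messages quoted elsewhere on this page: an error byte of 0x40 corresponds to UNC, and 0x84 to ICRC together with ABRT.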

None of these are very helpful though, are they? To a system administrator, this means "there's something wrong, possibly around 48-bit LBA YYYYY, or maybe 28-bit LBA XXXXX". Most administrators know that if the LBA seeing errors is always the same that the disk itself is likely the cause, but what if the LBA is random?

What disks have to say

Rather than try to decipher what FreeBSD says, a more logical approach is to examine the disk to see if it logged any sort of error in SMART.

I need to make something clear: SMART is not a guaranteed way to determine the current state of a disk, or past events on a disk. SMART is entirely dependent upon the level of pedantry of the disk firmware programmer him/herself. Some SMART implementations don’t even bother to log real errors; others increment counters only when offline SMART tests are run. The trick is knowing how to interpret SMART stats for each disk vendor (Western Digital, Seagate, Fujitsu, etc.). Sometimes it gets even more granular than that (different models of disks behaving differently when it comes to SMART).

JeremyChadwick/ATA_issues_and_troubleshooting (last edited 2020-04-26T06:18:24+0000 by MarkLinimon )

Source

Good day.

I noticed the following in the logs:

debian kernel: [ 1358.084798] ata3.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6
debian kernel: [ 1358.084806] ata3.00: BMDMA stat 0x25
debian kernel: [ 1358.084812] ata3.00: failed command: READ DMA EXT
debian kernel: [ 1358.084821] ata3.00: cmd 25/00:00:4d:aa:d8/00:01:4a:00:00/e0 tag 0 dma 131072 in
debian kernel: [ 1358.084823]          res 51/40:ef:57:aa:d8/40:00:4a:00:00/e0 Emask 0x9 (media error)
debian kernel: [ 1358.084828] ata3.00: status: { DRDY ERR }
debian kernel: [ 1358.084831] ata3.00: error: { UNC }
debian kernel: [ 1358.084847] ata3: hard resetting link
debian kernel: [ 1358.404059] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 3F0)
debian kernel: [ 1358.421075] ata3.00: configured for UDMA/133
debian kernel: [ 1358.421098] ata3: EH complete

I paid attention to it when I noticed occasional freezes while reading from this disk.

smartctl -H /dev/sda reports no complaints, but the self-tests give this result:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      2142         858299943
# 2  Short offline       Completed: read failure       90%      2142         858299943

Should I prepare for the worst?

I am looking for bad blocks now.

PS The hard drive is a WD Green 1TB; the filesystem is reiserfs, and its check finds no errors

Source

The first error you reported:

ata1:00: status: { DRDY ERR }
ata1.00: error {UNC }
ata1:00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1:00: BMDMA stat 0x25
ata1:00: failed command: READ DMA

says that a READ DMA ATA command to a disk on ATA port 1 failed (status includes ERR for error). That port is most likely the hard disk, and the error points toward the drive having problems. The DMA part can likely be ignored; DMA is Direct Memory Access which is the dominant transfer mode these days, and if you were having RAM or RAM bus problems to the degree that you were hitting something like that repeatedly, you’d likely be seeing a ton more errors if the system was able to function at all.

The second error:

end_request: critical target error, dev sda, sector 32839936
EXT4_fs error: (device sda5): ext4_find_entry:935: inode #393217: comm init: reading directory lblock 0
INIT: No inittab file found

says there is some problem on /dev/sda, sector 32839936, which with 512-byte sectors puts us physically toward the end of the /dev/sda5 partition, which adds up with device sda5 as reported by the file system driver. The error reported by init together with the file system driver’s error details points toward a problem with the file system causing /etc/inittab to be unavailable or (less likely) unreadable. This would mean that either the root directory, the /etc directory, or the /etc/inittab file entry are somehow involved in the corruption. Given the inode number, I’d take a shot at /etc/inittab specifically being the culprit, until proven wrong.
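
The sector arithmetic can be checked quickly. A Python sketch, assuming 512-byte sectors as stated above; it only converts the absolute sector number into a byte offset on /dev/sda and says nothing about where the partition starts (the function name is made up for illustration):

```python
SECTOR_SIZE = 512  # bytes per sector, as assumed in the answer above

def sector_to_offset(sector, sector_size=SECTOR_SIZE):
    """Byte offset of the first byte of the given absolute sector."""
    return sector * sector_size

offset = sector_to_offset(32839936)
print(offset)                      # byte offset from the start of the disk
print(round(offset / 2**30, 2))    # the same offset in GiB
```

With those inputs the offset is 16814047232 bytes, roughly 15.66 GiB from the start of the disk.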

You write (my emphasis):

Suspecting a HDD crash, I took it out and used in another PC as an external USB HDD drive and I was able to mount & see all partitions and files within. So I assume Disc is OK.

I would say that your assumption is unfounded. The disk is obviously having some problem; with any luck, it’ll be easy to fix.

The first thing I would do in your situation is to refresh my backup of everything that is on that disk. Make sure that you do not overwrite or delete anything from your most recent backup, as there is certainly a possibility that you will need it. Perhaps the best option is to make a fresh backup onto a new (or at least not previously used for your own backups) drive of everything that you are able to access. Expect some I/O errors on the source while making that copy.

Second comes attempting recovery. With any luck, given the errors, this is a single-sector or few-sectors problem which has caused a small amount of file system corruption, in which case e2fsck should be able to repair most of the damage. Some of your files are likely gone, but with some luck, you might be able to find them in /lost+found under the file system’s mount root (meaning for example /data/lost+found if you mount /dev/sda5 on /data) after having e2fsck do what it can. Otherwise, do a comparison against your most recent backup from before the problems started, and restore relevant files from the backup. (Did I mention backups are useful if bad things ever happen, as they inevitably do?)

Third comes the question of whether you can trust the drive for future use. A few bad sectors don't have to be catastrophic from the drive's point of view, but rotational drives about 100 GB in size practically cannot be sourced new today in most form factors, which points to this being a relatively old drive. Personally, I'd probably just accept that the drive has outlived its useful life at this point and get a replacement, but then again I am rather paranoid when it comes to my data; your mileage may vary. You will have to weigh the cost of a replacement drive against the risk of total failure of the drive and subsequent total loss of all the data on the drive.

Source

Good day once again!
So, here is what I have:
FreeBSD 7.0-RELEASE i386
CPU: Intel(R) Celeron(R) CPU 2.93GHz (2929.51-MHz 686-class CPU)
real memory = 2146697216 (2047 MB)
avail memory = 2095570944 (1998 MB)
Four disks were installed: ad0, ad1, ad4, ad6

Code:

ad0: 76319MB <Seagate ST380011A 8.01> at ata0-master UDMA100
ad1: 305245MB <WDC WD3200AAJB-00WGA0 00.02C01> at ata0-slave UDMA100
ad4: 715404MB <Seagate ST3750640AS 3.AAE> at ata2-master SATA150
ad6: 152627MB <WDC WD1600AAJS-22WAA0 58.01D58> at ata3-master SATA150

The machine is basically a production box and served faithfully for 6 months practically without downtime, naturally pausing only for maintenance.
There were no problems with it. However, a week ago errors started to appear

Code:

ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=619745471
ad1s1d[READ(offset=314088423424, length=16384)]error = 5
ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=620121823
g_vfs_done():ad1s1d[READ(offset=314281115648, length=16384)]error = 5
ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=620121823
g_vfs_done():ad1s1d[READ(offset=314281115648, length=16384)]error = 5
ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=622003583
g_vfs_done():ad1s1d[READ(offset=315244576768, length=16384)]error = 5
ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=624638047
g_vfs_done():ad1s1d[READ(offset=316593422336, length=16384)]error = 5
ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=619745471
g_vfs_done():ad1s1d[READ(offset=314088423424, length=16384)]error = 5

Naturally, the first thing I did was blame the disk; since the machine is in production, I urgently disconnected the disk. I thought the problem was solved, but no such luck:

Code:

ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5

As you can see, errors now started coming from disk ad0, which had never shown anything like this before.
The thought immediately crept in that the disks were hardly the cause. I started to blame the ribbon cable. I replaced it and in the end got:

Code:

ad0: 76319MB <Seagate ST380011A 8.01> at ata0-master UDMA33
ad4: 715404MB <Seagate ST3750640AS 3.AAE> at ata2-master SATA150
ad6: 152627MB <WDC WD1600AAJS-22WAA0 58.01D58> at ata3-master SATA150
...
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5

After all these antics I checked both disks; they are fine
Please tell me, has anyone perhaps run into a situation like this?

Right now I am forced to keep running the server while ignoring these errors, but that is an extremely wrong approach, so I have started preparing a new machine as a replacement. Still, I would like to figure out what on earth it wants from me
Thanks a lot in advance!

I don't know who wrote my life, but I feel like a beta tester…

Source

Hello there,

I acquired a new AMD R7 SSD to boost my ASUS 1000H experience (it actually manages about double the speed of the built-in rotating Seagate) and am seeing these error messages during boot.

[    2.263498] ata1: SATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
[    2.420827] ata1.00: ATA-8: Radeon R7, 1.00, max UDMA/133
[    2.423824] ata1.00: 234441648 sectors, multi 1: LBA48 NCQ (depth 0/32)
[    2.433962] ata1.00: configured for UDMA/133
[    2.480106] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.483457] ata1.00: BMDMA stat 0x24
[    2.486730] ata1.00: failed command: READ DMA
[    2.489985] ata1.00: cmd c8/00:08:a8:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    2.496697] ata1.00: status: { DRDY ERR }
[    2.500063] ata1.00: error: { ABRT }
[    2.510710] ata1.00: configured for UDMA/133
[    2.514043] ata1: EH complete
[    2.526934] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.530401] ata1.00: BMDMA stat 0x24
[    2.533724] ata1.00: failed command: READ DMA
[    2.537034] ata1.00: cmd c8/00:08:88:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    2.543709] ata1.00: status: { DRDY ERR }
[    2.546931] ata1.00: error: { ABRT }
[    2.557328] ata1.00: configured for UDMA/133
[    2.560569] ata1: EH complete
[    2.573579] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.576977] ata1.00: BMDMA stat 0x24
[    2.580177] ata1.00: failed command: READ DMA
[    2.583434] ata1.00: cmd c8/00:08:90:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    2.590086] ata1.00: status: { DRDY ERR }
[    2.593391] ata1.00: error: { ABRT }
[    2.603991] ata1.00: configured for UDMA/133
[    2.607257] ata1: EH complete
[    2.620245] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.623682] ata1.00: BMDMA stat 0x24
[    2.627010] ata1.00: failed command: READ DMA
[    2.630268] ata1.00: cmd c8/00:08:98:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    2.636873] ata1.00: status: { DRDY ERR }
[    2.640123] ata1.00: error: { ABRT }
[    2.650659] ata1.00: configured for UDMA/133
[    2.653825] ata1: EH complete
[    2.666904] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.670101] ata1.00: BMDMA stat 0x24
[    2.673076] ata1.00: failed command: READ DMA
[    2.676084] ata1.00: cmd c8/00:08:a0:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    2.682256] ata1.00: status: { DRDY ERR }
[    2.685253] ata1.00: error: { ABRT }
[    2.693994] ata1.00: configured for UDMA/133
[    2.697046] ata1: EH complete
[    2.726959] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.730130] ata1.00: BMDMA stat 0x24
[    2.736257] ata1.00: failed command: READ DMA
[    2.739324] ata1.00: cmd c8/00:08:00:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    2.745689] ata1.00: status: { DRDY ERR }
[    2.748865] ata1.00: error: { ABRT }
[    2.757231] ata1.00: configured for UDMA/133
[    2.760390] ata1: EH complete
[    2.773584] ata1.00: limiting speed to UDMA/100:PIO4
[    2.776803] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[    2.779847] ata1.00: BMDMA stat 0x24
[    2.782890] ata1.00: failed command: READ DMA
[    2.785980] ata1.00: cmd c8/00:08:a0:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    2.792354] ata1.00: status: { DRDY ERR }
[    2.795499] ata1.00: error: { ABRT }
[    2.798626] ata1: soft resetting link
[    2.960611] ata1.00: configured for UDMA/100
[    2.963739] ata1: EH complete
[    2.976877] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.980091] ata1.00: BMDMA stat 0x24
[    2.983144] ata1.00: failed command: READ DMA
[    2.986226] ata1.00: cmd c8/00:08:a8:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    2.992504] ata1.00: status: { DRDY ERR }
[    2.995652] ata1.00: error: { ABRT }
[    3.003925] ata1.00: configured for UDMA/100
[    3.007042] ata1: EH complete
[    3.020170] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.023399] ata1.00: BMDMA stat 0x24
[    3.026462] ata1.00: failed command: READ DMA
[    3.029549] ata1.00: cmd c8/00:08:a8:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.035927] ata1.00: status: { DRDY ERR }
[    3.039085] ata1.00: error: { ABRT }
[    3.047327] ata1.00: configured for UDMA/100
[    3.051577] ata1: EH complete
[    3.063562] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.066735] ata1.00: BMDMA stat 0x24
[    3.069694] ata1.00: failed command: READ DMA
[    3.072671] ata1.00: cmd c8/00:08:70:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.078680] ata1.00: status: { DRDY ERR }
[    3.082313] ata1.00: error: { ABRT }
[    3.090631] ata1.00: configured for UDMA/100
[    3.093767] ata1: EH complete
[    3.106937] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.110106] ata1.00: BMDMA stat 0x24
[    3.113094] ata1.00: failed command: READ DMA
[    3.116073] ata1.00: cmd c8/00:08:b0:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.122146] ata1.00: status: { DRDY ERR }
[    3.125194] ata1.00: error: { ABRT }
[    3.137418] ata1.00: configured for UDMA/100
[    3.140507] ata1: EH complete
[    3.153643] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.156823] ata1.00: BMDMA stat 0x24
[    3.159807] ata1.00: failed command: READ DMA
[    3.162839] ata1.00: cmd c8/00:08:20:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.168998] ata1.00: status: { DRDY ERR }
[    3.172007] ata1.00: error: { ABRT }
[    3.180668] ata1.00: configured for UDMA/100
[    3.183723] ata1: EH complete
[    3.196959] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.200133] ata1.00: BMDMA stat 0x24
[    3.203119] ata1.00: failed command: READ DMA
[    3.206164] ata1.00: cmd c8/00:08:60:49:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.212299] ata1.00: status: { DRDY ERR }
[    3.215307] ata1.00: error: { ABRT }
[    3.223992] ata1.00: configured for UDMA/100
[    3.227045] ata1: EH complete
[    3.240272] ata1.00: limiting speed to UDMA/33:PIO4
[    3.243410] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[    3.246425] ata1.00: BMDMA stat 0x24
[    3.249432] ata1.00: failed command: READ DMA
[    3.252409] ata1.00: cmd c8/00:08:08:49:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.258466] ata1.00: status: { DRDY ERR }
[    3.261493] ata1.00: error: { ABRT }
[    3.264538] ata1: soft resetting link
[    3.427332] ata1.00: configured for UDMA/33
[    3.430433] ata1: EH complete
[    3.443595] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.446820] ata1.00: BMDMA stat 0x24
[    3.449852] ata1.00: failed command: READ DMA
[    3.452832] ata1.00: cmd c8/00:08:d0:48:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.459026] ata1.00: status: { DRDY ERR }
[    3.462171] ata1.00: error: { ABRT }
[    3.470657] ata1.00: configured for UDMA/33
[    3.473739] ata1: EH complete
[    3.486913] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.490139] ata1.00: BMDMA stat 0x24
[    3.493166] ata1.00: failed command: READ DMA
[    3.496142] ata1.00: cmd c8/00:08:20:48:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.502331] ata1.00: status: { DRDY ERR }
[    3.505461] ata1.00: error: { ABRT }
[    3.514007] ata1.00: configured for UDMA/33
[    3.517105] ata1: EH complete
[    3.530253] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.533492] ata1.00: BMDMA stat 0x24
[    3.536538] ata1.00: failed command: READ DMA
[    3.539551] ata1.00: cmd c8/00:08:e0:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.545772] ata1.00: status: { DRDY ERR }
[    3.548906] ata1.00: error: { ABRT }
[    3.557336] ata1.00: configured for UDMA/33
[    3.560498] ata1: EH complete
[    3.576774] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.580911] ata1.00: BMDMA stat 0x24
[    3.584568] ata1.00: failed command: READ DMA
[    3.587701] ata1.00: cmd c8/00:08:d0:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.593924] ata1.00: status: { DRDY ERR }
[    3.598096] ata1.00: error: { ABRT }
[    3.607296] ata1.00: configured for UDMA/33
[    3.610368] ata1: EH complete
[    3.623605] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.626779] ata1.00: BMDMA stat 0x24
[    3.629744] ata1.00: failed command: READ DMA
[    3.632766] ata1.00: cmd c8/00:08:f8:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.638870] ata1.00: status: { DRDY ERR }
[    3.641872] ata1.00: error: { ABRT }
[    3.650611] ata1.00: configured for UDMA/33
[    3.653642] ata1: EH complete
[    3.683604] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.686772] ata1.00: BMDMA stat 0x24
[    3.689736] ata1.00: failed command: READ DMA
[    3.692743] ata1.00: cmd c8/00:08:00:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.698878] ata1.00: status: { DRDY ERR }
[    3.701864] ata1.00: error: { ABRT }
[    3.710673] ata1.00: configured for UDMA/33
[    3.713704] ata1: EH complete
[    3.726837] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.729990] ata1.00: BMDMA stat 0x24
[    3.732974] ata1.00: failed command: READ DMA
[    3.735978] ata1.00: cmd c8/00:08:78:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.742069] ata1.00: status: { DRDY ERR }
[    3.745055] ata1.00: error: { ABRT }
[    3.753991] ata1.00: configured for UDMA/33
[    3.757019] ata1: EH complete
[    3.773597] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.776769] ata1.00: BMDMA stat 0x24
[    3.779733] ata1.00: failed command: READ DMA
[    3.782731] ata1.00: cmd c8/00:07:88:4b:f9/00:00:00:00:00/ed tag 0 dma 3584 in
[    3.788819] ata1.00: status: { DRDY ERR }
[    3.791802] ata1.00: error: { ABRT }
[    3.800656] ata1.00: configured for UDMA/33
[    3.803685] ata1: EH complete
[    3.816836] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.819987] ata1.00: BMDMA stat 0x24
[    3.822972] ata1.00: failed command: READ DMA
[    3.825977] ata1.00: cmd c8/00:08:88:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.832062] ata1.00: status: { DRDY ERR }
[    3.835048] ata1.00: error: { ABRT }
[    3.843961] ata1.00: configured for UDMA/33
[    3.846993] ata1: EH complete
[    3.860289] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    3.863469] ata1.00: BMDMA stat 0x24
[    3.866447] ata1.00: failed command: READ DMA
[    3.869446] ata1.00: cmd c8/00:08:80:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    3.875548] ata1.00: status: { DRDY ERR }
[    3.878532] ata1.00: error: { ABRT }
[    3.887264] ata1.00: configured for UDMA/33
[    3.890287] ata1: EH complete
[    6.760125] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.760135] ata1.00: BMDMA stat 0x24
[    6.760145] ata1.00: failed command: READ DMA
[    6.760162] ata1.00: cmd c8/00:08:00:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.760171] ata1.00: status: { DRDY ERR }
[    6.760178] ata1.00: error: { ABRT }
[    6.767301] ata1.00: configured for UDMA/33
[    6.767364] ata1: EH complete
[    6.776800] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.776811] ata1.00: BMDMA stat 0x24
[    6.776819] ata1.00: failed command: READ DMA
[    6.776838] ata1.00: cmd c8/00:08:a0:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.776847] ata1.00: status: { DRDY ERR }
[    6.776854] ata1.00: error: { ABRT }
[    6.783957] ata1.00: configured for UDMA/33
[    6.784011] ata1: EH complete
[    6.800140] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.800152] ata1.00: BMDMA stat 0x24
[    6.800160] ata1.00: failed command: READ DMA
[    6.800178] ata1.00: cmd c8/00:08:a8:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.800187] ata1.00: status: { DRDY ERR }
[    6.800194] ata1.00: error: { ABRT }
[    6.807270] ata1.00: configured for UDMA/33
[    6.807333] ata1: EH complete
[    6.816800] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.816811] ata1.00: BMDMA stat 0x24
[    6.816819] ata1.00: failed command: READ DMA
[    6.816838] ata1.00: cmd c8/00:08:a8:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.816848] ata1.00: status: { DRDY ERR }
[    6.816854] ata1.00: error: { ABRT }
[    6.823953] ata1.00: configured for UDMA/33
[    6.824006] ata1: EH complete
[    6.834673] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.834685] ata1.00: BMDMA stat 0x24
[    6.834693] ata1.00: failed command: READ DMA
[    6.834710] ata1.00: cmd c8/00:08:70:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.834720] ata1.00: status: { DRDY ERR }
[    6.834726] ata1.00: error: { ABRT }
[    6.841449] ata1.00: configured for UDMA/33
[    6.841504] ata1: EH complete
[    6.850112] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.850122] ata1.00: BMDMA stat 0x24
[    6.850131] ata1.00: failed command: READ DMA
[    6.850149] ata1.00: cmd c8/00:08:b0:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.850159] ata1.00: status: { DRDY ERR }
[    6.850166] ata1.00: error: { ABRT }
[    6.857270] ata1.00: configured for UDMA/33
[    6.857325] ata1: EH complete
[    6.866792] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.866801] ata1.00: BMDMA stat 0x24
[    6.866810] ata1.00: failed command: READ DMA
[    6.866829] ata1.00: cmd c8/00:08:20:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.866838] ata1.00: status: { DRDY ERR }
[    6.866845] ata1.00: error: { ABRT }
[    6.873943] ata1.00: configured for UDMA/33
[    6.873998] ata1: EH complete
[    6.883475] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.883486] ata1.00: BMDMA stat 0x24
[    6.883495] ata1.00: failed command: READ DMA
[    6.883512] ata1.00: cmd c8/00:08:60:49:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.883521] ata1.00: status: { DRDY ERR }
[    6.883528] ata1.00: error: { ABRT }
[    6.890605] ata1.00: configured for UDMA/33
[    6.892878] ata1: EH complete
[    6.903459] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.903469] ata1.00: BMDMA stat 0x24
[    6.903478] ata1.00: failed command: READ DMA
[    6.903496] ata1.00: cmd c8/00:08:08:49:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.903506] ata1.00: status: { DRDY ERR }
[    6.903513] ata1.00: error: { ABRT }
[    6.910612] ata1.00: configured for UDMA/33
[    6.910675] ata1: EH complete
[    6.923479] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.923489] ata1.00: BMDMA stat 0x24
[    6.923498] ata1.00: failed command: READ DMA
[    6.923516] ata1.00: cmd c8/00:08:d0:48:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.923526] ata1.00: status: { DRDY ERR }
[    6.923532] ata1.00: error: { ABRT }
[    6.926080] ata1.00: configured for UDMA/33
[    6.926140] ata1: EH complete
[    6.936806] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.936816] ata1.00: BMDMA stat 0x24
[    6.936825] ata1.00: failed command: READ DMA
[    6.936842] ata1.00: cmd c8/00:08:20:48:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.936852] ata1.00: status: { DRDY ERR }
[    6.936859] ata1.00: error: { ABRT }
[    6.945210] ata1.00: configured for UDMA/33
[    6.945269] ata1: EH complete
[    6.953493] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.953503] ata1.00: BMDMA stat 0x24
[    6.953512] ata1.00: failed command: READ DMA
[    6.953531] ata1.00: cmd c8/00:08:e0:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.953541] ata1.00: status: { DRDY ERR }
[    6.953547] ata1.00: error: { ABRT }
[    6.960623] ata1.00: configured for UDMA/33
[    6.960685] ata1: EH complete
[    6.972021] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.972032] ata1.00: BMDMA stat 0x24
[    6.972041] ata1.00: failed command: READ DMA
[    6.972059] ata1.00: cmd c8/00:08:d0:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.972069] ata1.00: status: { DRDY ERR }
[    6.972076] ata1.00: error: { ABRT }
[    6.978496] ata1.00: configured for UDMA/33
[    6.978551] ata1: EH complete
[    6.987805] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    6.987816] ata1.00: BMDMA stat 0x24
[    6.987825] ata1.00: failed command: READ DMA
[    6.987844] ata1.00: cmd c8/00:08:f8:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    6.987855] ata1.00: status: { DRDY ERR }
[    6.987862] ata1.00: error: { ABRT }
[    6.994054] ata1.00: configured for UDMA/33
[    6.994112] ata1: EH complete
[    7.061253] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    7.061262] ata1.00: BMDMA stat 0x24
[    7.061269] ata1.00: failed command: READ DMA
[    7.061281] ata1.00: cmd c8/00:08:00:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    7.061288] ata1.00: status: { DRDY ERR }
[    7.061293] ata1.00: error: { ABRT }
[    7.067754] ata1.00: configured for UDMA/33
[    7.067793] ata1: EH complete
[    7.076833] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    7.076844] ata1.00: BMDMA stat 0x24
[    7.076853] ata1.00: failed command: READ DMA
[    7.076872] ata1.00: cmd c8/00:08:78:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    7.076882] ata1.00: status: { DRDY ERR }
[    7.076888] ata1.00: error: { ABRT }
[    7.084012] ata1.00: configured for UDMA/33
[    7.084060] ata1: EH complete
[    7.096775] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    7.096783] ata1.00: BMDMA stat 0x24
[    7.096790] ata1.00: failed command: READ DMA
[    7.096802] ata1.00: cmd c8/00:07:88:4b:f9/00:00:00:00:00/ed tag 0 dma 3584 in
[    7.096809] ata1.00: status: { DRDY ERR }
[    7.096813] ata1.00: error: { ABRT }
[    7.104009] ata1.00: configured for UDMA/33
[    7.104046] ata1: EH complete
[    7.113495] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    7.113509] ata1.00: BMDMA stat 0x24
[    7.113522] ata1.00: failed command: READ DMA
[    7.113547] ata1.00: cmd c8/00:08:88:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    7.113560] ata1.00: status: { DRDY ERR }
[    7.113570] ata1.00: error: { ABRT }
[    7.120540] ata1.00: configured for UDMA/33
[    7.120578] ata1: EH complete
[    7.130188] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    7.130203] ata1.00: BMDMA stat 0x24
[    7.130216] ata1.00: failed command: READ DMA
[    7.130242] ata1.00: cmd c8/00:08:80:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
[    7.130255] ata1.00: status: { DRDY ERR }
[    7.130264] ata1.00: error: { ABRT }
[    7.137317] ata1.00: configured for UDMA/33
[    7.137353] ata1: EH complete
[   10.733468] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   10.733482] ata1.00: BMDMA stat 0x24
[   10.733494] ata1.00: failed command: READ DMA
[   10.733513] ata1.00: cmd c8/00:01:87:4b:f9/00:00:00:00:00/ed tag 0 dma 512 in
[   10.733523] ata1.00: status: { DRDY ERR }
[   10.733531] ata1.00: error: { ABRT }
[   10.740782] ata1.00: configured for UDMA/33
[   10.740833] ata1: EH complete
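
The `cmd` lines in the log above pack the ATA taskfile into hex fields, so the failing LBA can be recovered from them. The following is a minimal sketch, assuming the field order is command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device (this layout is an assumption about how libata prints the taskfile; it is not stated in the post). The function name `decode_cmd` is hypothetical.

```python
import re

# Assumed field order of a libata "cmd" line (an assumption, not from the post):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE_RE = re.compile(
    r"cmd ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)

def decode_cmd(line):
    """Return (command, sector_count, lba) for one libata 'cmd' log line."""
    match = TASKFILE_RE.search(line)
    if match is None:
        raise ValueError("no taskfile found in line")
    command = int(match.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in match.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in match.group(3).split(":")
    )
    device = int(match.group(4), 16)
    low24 = (lbah << 16) | (lbam << 8) | lbal
    if command == 0x25:
        # READ DMA EXT: 48-bit LBA, the upper three bytes are in the "hob" fields.
        lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | low24
        count = ((hob_nsect << 8) | nsect) or 65536  # a count of 0 means 65536 sectors
    else:
        # 28-bit commands such as READ DMA (0xc8): LBA bits 27..24 are the low
        # nibble of the device register.
        lba = ((device & 0x0F) << 24) | low24
        count = nsect or 256  # a count of 0 means 256 sectors
    return command, count, lba

line = "[    2.489985] ata1.00: cmd c8/00:08:a8:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in"
command, count, lba = decode_cmd(line)
print(hex(command), count, lba)
```

If the assumed layout holds, the first failing command reads 8 sectors starting at LBA 234441640, which ends exactly at the 234441648 sectors the kernel reported for the drive.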

#Product description
[    2.420827] ata1.00: ATA-8: Radeon R7, 1.00, max UDMA/133
[    2.437274] scsi 0:0:0:0: Direct-Access     ATA      Radeon R7        1.00 PQ: 0 ANSI: 5

The last 3 lines are for support in case I have to return the drive. (Nobody wants to know that his car will explode for sure and keep driving it until it really does.)

I tried both several times, with a live and an installed OS.
No bad sectors, last trimmed: 05122014.

To conclude: the error messages do not pose a threat to performance, they are simply annoying :s Googling did reveal similar problems and error messages, but never {DRDY ERR} on its own, so I consider my case worth posting.

I’m using openrc.
P.S. With systemd it also spits out the errors above.

I changed the BIOS to the modded one and the error is gone; the speed is remarkably better too.

Sadly, I’m not able to explain the issue or the solution in technically correct terms.

With kind regards, frig

Last edited by frig (2015-01-13 02:46:07)

Source

ATA/SATA kernel issues

ATA subsystem causes kernel to lock (no panic) if atacontrol detach <channel> is executed without remembering to umount relevant filesystems beforehand
- Reference: Kernel — Panic occurs when a mounted device is removed
- Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=89102
ATA subsystem acts erratically/incorrectly when a SATA disk is removed from the system without doing atacontrol detach <channel> prior to the removal.
- Reference: http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html
- Easily reproducible on any hardware sporting a commercial-grade hot-swap SATA backplane.
Intel MatrixRAID: New ar(4) device created when bad disk in RAID-1 array replaced with new disk
- Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=121899
- Patch available in PR.
Intel MatrixRAID: Array goes incorrectly into READY state when rebooting machine in the middle of an array rebuild
- Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=102210
- Patch available in PR 102210, and has been available since 2006.
Intel MatrixRAID: Kernel panics when a disk is lost and reattached
- Open PRs:
  - http://www.freebsd.org/cgi/query-pr.cgi?pr=102211
  - http://www.freebsd.org/cgi/query-pr.cgi?pr=108924
- Patch available in PR 102211, and has been available since 2006.
Numerous problems with embedded LSI v3 MegaRAID
- Reference: http://butcher.heavennet.ru/patches/kernel/ata/LSIv3/
- Open PRs:
  - http://www.freebsd.org/cgi/query-pr.cgi?pr=92786
  - http://www.freebsd.org/cgi/query-pr.cgi?pr=95260
  - http://www.freebsd.org/cgi/query-pr.cgi?pr=101819
- Patches available in PR 92786 and PR 101819. Patches have been available since 2006.
ServerWorks HT1000 chipsets causing SATA data corruption
- Known to affect at least Dell PowerEdge SC1435 systems
- Troubleshooting details: http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045549.html
- Reference: http://lists.freebsd.org/pipermail/freebsd-current/2007-December/081429.html
- Reference: http://lists.freebsd.org/pipermail/freebsd-current/2008-March/084272.html
- Supposedly fixed in January 2008 on RELENG_7 and HEAD.
Adaptec 1420SA support
- Reference: http://lists.freebsd.org/pipermail/freebsd-current/2008-April/084974.html

ATA/SATA other issues

SMART monitoring: Using the -s flag in smartd.conf to run periodic short/long offline tests results in DMA timeouts
- Reference: http://lists.freebsd.org/pipermail/freebsd-stable/2008-October/046208.html
- Workaround: Stop using this feature. I explain why in this post.
- I am in the process of communicating with Bruce Allen (author of smartmontools) to discuss why this feature exists, why it’s advocated in the man page and example smartd.conf, and why one would want to perform these tests on a regular basis.

ATA/SATA DMA timeout issues

Symptom: messages similar to those below appear in the kernel output. Sometimes harmless, sometimes fatal. The LBAs listed are scattered, and SMART statistics for the disk in question show no sign of increased error rates or sector issues:

ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54112319
ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764596887
ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764596887
ad0: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=764596887
ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=453849407
ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=453849407
ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)
ad0: TIMEOUT - FLUSHCACHE retrying (0 retries left)

ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=193407827

References:
- http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040744.html
- http://lists.freebsd.org/pipermail/freebsd-stable/2008-March/041427.html
Troubleshooting: http://lists.freebsd.org/pipermail/freebsd-stable/2008-January/039983.html
Workarounds:
- PATA only: Set hw.ata.ata_dma=0 in /boot/loader.conf. This will disable use of ATA DMA. NOTE: This workaround greatly decreases I/O performance. You have been warned…
- Volker Theile of the FreeNAS project informs me that they have solved most of the DMA problems by increasing a hard-coded arbitrary timeout value of 5 (seconds) in the ATA code to 10 or 15, while simultaneously making the timeout value adjustable via sysctl. Volker submitted patches to sos@ over a year ago, but never received a response.
  - FreeBSD 6.3 patch: http://freenas.svn.sourceforge.net/viewvc/freenas/branches/0.69/build/kernel-patches/ata/files/patch-ata.diff?view=markup
  - FreeBSD 7.0 patch: http://freenas.svn.sourceforge.net/viewvc/freenas/trunk/build/kernel-patches/ata/files/patch-ata.diff?view=markup
As of 2008/02/27, Scott Long has offered to help track this problem down. Those who are able to reproduce the problem reliably should get in contact with Scott; serial console access will very likely be mandatory.

SATA disk troubleshooting

Understanding what you’re dealing with

A substantial number of FreeBSD users report SATA disk problems. It is difficult to determine the source of these problems, due to the complex nature of hard disks and all related pieces (cabling and power, disk mechanics, disk firmware, protocol/transport, kernel driver, etc.). Even with a thorough understanding of how SATA disks work, there is a decent chance that even the most skilled system administrator won’t be able to determine the root cause. To make matters worse, many system administrators do not have fail-over systems in place, which makes thorough analysis and troubleshooting impossible («This system has to be up and running 24×7, I can’t afford the downtime for others to look at it»). And then there’s the issue of finances: sometimes cash is required to work around issues («Do we know if this Adaptec SATA controller even works? Maybe we should switch to Areca or 3ware or Promise…»), while not everyone has such funds available.

What FreeBSD has to say

ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=XXXXX
ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=YYYYY
ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=YYYYY
ad0: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=YYYYY

This tells you a few things, most of which are low-level:

- The disk which experienced the problem was ad0
- A time-out occurred when attempting a write operation
- FreeBSD attempted to write data to LBA XXXXX via standard DMA (which uses 28-bit LBA addressing), experienced a time-out, and attempted a write retry once
- FreeBSD attempted to write data to LBA YYYYY via 48-bit DMA, experienced a time-out, and attempted a write retry twice
- FreeBSD deemed the write operation a failure
- The ATA status result is value 0x51 (bits 6 (DRDY), 4 (not applicable), and 0 (ERR) set)
- The ATA error result is value 0x10 (bit 4 set), which according to ATA-7 specification, Section 6.59.6 is: «IDNF shall be set to one if a user-accessible address could not be found. IDNF shall be set to one if an address outside of the range of user-accessible addresses is requested if command aborted is not returned.» FreeBSD labels this bit as NID_NOT_FOUND
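
The bit arithmetic behind the status and error values can be sketched in a few lines. This is a minimal illustration, not FreeBSD's own decoder: the bit names in the two tables are the labels commonly used for these registers, several of them (DSC, CORR, IDX, AMNF) are obsolete or command-specific in later ATA revisions, and the helper name `decode_bits` is hypothetical.

```python
# Commonly used names for the ATA status and error register bits. These labels
# are an assumption for illustration; some bits are obsolete in newer ATA revisions.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC", 3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF", 3: "MCR", 2: "ABRT", 1: "NM", 0: "AMNF"}

def decode_bits(value, names):
    """List the names of all bits set in an 8-bit register value, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]

print(decode_bits(0x51, STATUS_BITS))  # status=51 from the log
print(decode_bits(0x10, ERROR_BITS))   # error=10 from the log
print(decode_bits(0x40, ERROR_BITS))   # error=40, as in the READ_DMA failures
```

With these tables, 0x51 decodes to DRDY, DSC and ERR, which lines up with the READY,DSC,ERROR that FreeBSD prints, and 0x10 decodes to IDNF.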

None of these are very helpful though, are they? To a system administrator, this means «there’s something wrong, possibly around 48-bit LBA YYYYY… or maybe 28-bit LBA XXXXX». Most administrators know that if the LBA seeing errors is always the same, the disk itself is likely the cause, but what if the LBA is random?
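
Whether the failing LBA is always the same or scattered can be checked mechanically. The sketch below counts how often each (disk, LBA) pair shows up in FAILURE and TIMEOUT lines; the regular expression and the function name `lba_histogram` are assumptions written against the log lines quoted on this page, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches FreeBSD ata(4) lines such as
# "ad0: FAILURE - READ_DMA status=51<...> error=40<...> LBA=39788895".
LBA_RE = re.compile(r"^(ad\d+): (?:FAILURE|TIMEOUT) - \S+ .*LBA=(\d+)")

def lba_histogram(log_text):
    """Count how often each (disk, LBA) pair appears in FAILURE/TIMEOUT lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LBA_RE.match(line.strip())
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts

sample = """\
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54112319
ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764596887
ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764596887
ad0: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=764596887
ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=453849407
ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=453849407
ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)
"""

counts = lba_histogram(sample)
for (disk, lba), hits in counts.most_common():
    print(f"{disk} LBA={lba}: {hits} line(s)")
print("distinct LBAs:", len(counts))
```

A single LBA that dominates the histogram points toward the disk surface, while many distinct LBAs fit the scattered pattern described for the DMA timeout issue.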

…more to follow…

What disks have to say

Rather than try to decipher what FreeBSD says, a more logical approach is to examine the disk to see if it logged any sort of error in SMART.
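
As one possible way to do this, the attribute table printed by smartctl from smartmontools can be parsed and a few attributes checked for non-zero raw values. This is only a sketch: the choice of attributes in `WATCHED` is an assumption made for illustration, the helper names are hypothetical, and the sample rows follow the ten-column layout smartctl uses for its attribute table.

```python
# Attributes to check for non-zero raw values. This selection is an assumption
# for illustration, not a rule taken from the page.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

def parse_attributes(report):
    """Parse smartctl attribute rows into {name: {value, worst, thresh, raw}}."""
    rows = {}
    for line in report.splitlines():
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            rows[fields[1]] = {
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9].strip(),
            }
    return rows

def suspicious(rows):
    """Return the watched attributes whose raw value starts with a non-zero number."""
    return sorted(
        name for name, row in rows.items()
        if name in WATCHED and int(row["raw"].split()[0]) != 0
    )

sample = """\
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   063   040   000    Old_age   Always       -       37 (Min/Max 18/61)
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
"""

rows = parse_attributes(sample)
print(suspicious(rows))
```

For the sample rows nothing is flagged, which fits a report that looks clean even though the kernel logs errors.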

…more to follow…


JeremyChadwick/ATA_issues_and_troubleshooting (last edited 2020-04-26T06:18:24+0000 by MarkLinimon)

Source

# 6 лет, 9 месяцев назад (отредактировано 6 лет, 9 месяцев назад)
Темы: 10 Сообщения: 36 Участник с: 03 мая 2016	Здравствуйте! Есть свежеустановленная система ArchLinux(поставлены только base, base-devl, grub-bios), при попытке загрузиться в неё вижу сообщение `Failed command: READ DMA ...` и дальше система не грузится(доходит до kernel panic). Сообщение возникает при попытке подмонтировать разделы tmp, var, home, boot, в общем всего кроме root — root монтируется удачно и fsck его проверяет. SmartMonTools ни каких ошибок не показывает, mHDD говорит что hdd просто идеальный. Пробовал файловые системы ext4 и ext3. При использовании ext4 с параметром ядра libata.force=noncq ещё есть вот такая вот ошибка `EXT_4 fs error: _ext4_get_inode_loc unable to read itable block` Грешу на ядро арча, тк если установить RFRemix(Fedora) 23 — всё работает просто идеально. Прошу совета в какую сторону копать. Заранее спасибо! HDD: SAMSUNG Spinpoint M8 ST1000LM024 (HN-M101MBB) Kernel: 4.4.5-1 Notebook: Lenovo P585 smartctl -a /dev/sda1 smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.5-1-ARCH] (local build) Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Samsung SpinPoint M8 (AF) Device Model: ST1000LM024 HN-M101MBB Serial Number: S2U5J9EC831769 LU WWN Device Id: 5 0004cf 2083a59e1 Firmware Version: 2AR10001 User Capacity: 1,000,204,886,016 bytes [1.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Form Factor: 2.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ATA8-ACS T13/1699-D revision 6 SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s) Local Time is: Tue May 3 07:09:09 2016 UTC SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. 
Auto Offline Data Collection: Disabled. Self-test execution status: ( 39) The self-test routine was interrupted by the host with a hard or soft reset. Total time to complete Offline data collection: (13140) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 219) minutes. SCT capabilities: (0x003f) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 7 2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0 3 Spin_Up_Time 0x0023 089 088 025 Pre-fail Always - 3452 4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1609 5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0 8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 1335 10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 46 12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1695 191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 236 192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0 194 Temperature_Celsius 0x0002 063 
040 000 Old_age Always - 37 (Min/Max 18/61) 195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0 196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 4347 223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 46 225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 8109 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Interrupted (host reset) 70% 1332 - # 2 Short offline Completed without error 00% 1331 - # 3 Vendor (0x50) Completed without error 00% 1 - SMART Selective self-test log data structure revision number 0 Note: revision number not 1 implies that no selective self-test has ever been run SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Interrupted [70% left] (0-65535) 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Interrupted (host reset) — при повторном сканировании выключили ноут.

tekma	# 6 years, 9 months ago (edited 6 years, 9 months ago)
Member since: 3 May 2016

Hello!
I have a freshly installed Arch Linux system (only base, base-devel and grub-bios are installed). When I try to boot it, I see this message:


Failed command: READ DMA
...

and the system does not boot any further (it ends in a kernel panic).


EXT_4 fs error: _ext4_get_inode_loc unable to read itable block

I suspect the Arch kernel, because if I install RFRemix (Fedora) 23 instead, everything works perfectly.

Please advise which direction to dig in. Thanks in advance!
HDD: SAMSUNG Spinpoint M8 ST1000LM024 (HN-M101MBB)
Kernel: 4.4.5-1
Notebook: Lenovo P585
smartctl -a /dev/sda1


smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.5-1-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Samsung SpinPoint M8 (AF)
Device Model:     ST1000LM024 HN-M101MBB
Serial Number:    S2U5J9EC831769
LU WWN Device Id: 5 0004cf 2083a59e1
Firmware Version: 2AR10001
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue May  3 07:09:09 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (  39)	The self-test routine was interrupted
					by the host with a hard or soft reset.
Total time to complete Offline
data collection: 		(13140) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 219) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       7
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   089   088   025    Pre-fail  Always       -       3452
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1609
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1335
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       46
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1695
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       236
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   063   040   000    Old_age   Always       -       37 (Min/Max 18/61)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4347
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       46
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       8109

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      70%      1332         -
# 2  Short offline       Completed without error       00%      1331         -
# 3  Vendor (0x50)       Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Interrupted [70% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Interrupted (host reset): the laptop was powered off during a repeat scan.
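A dump this long is easier to read when only the attributes usually associated with failing media or a bad link are pulled out. The sketch below is an illustration, not part of the original post: the helper name `smart_key_attrs` is hypothetical, and the choice of IDs 5, 197, 198 and 199 is an assumption about which attributes matter most. In the table above, all four have a raw value of 0.

```shell
# Print ID, name and raw value of selected SMART attributes.
# Reads smartctl attribute-table output on stdin.
# Assumption: IDs 5, 197, 198 and 199 are the ones worth checking first.
smart_key_attrs() {
    # Field 3 must be the hex FLAG column, so rows from other tables
    # (for example the selective self-test spans) are not matched.
    awk '$1 ~ /^(5|197|198|199)$/ && $3 ~ /^0x[0-9a-f]+$/ { print $1, $2, $NF }'
}

# Typical use (needs root and smartmontools; /dev/sda is an example device):
#   smartctl -A /dev/sda | smart_key_attrs
```

Non-zero raw values here would point towards the disk or its cabling; zeros do not prove the hardware is healthy, they only fail to confirm a fault.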

kurych	# 6 years, 9 months ago
Topics: 0 Posts: 1394 Member since: 6 November 2011
Well, turn the laptop on and run the tests to completion. Your error clearly points to a hardware problem. Alternatively, format with a bad-block check, but that will not last long either. (IMHO)


tekma	# 6 years, 9 months ago (edited 6 years, 9 months ago)
Topics: 10 Posts: 36 Member since: 3 May 2016
kurych wrote: Well, turn the laptop on and run the tests to completion. Your error clearly points to a hardware problem. Alternatively, format with a bad-block check, but that will not last long either. (IMHO)
`fsck -t -y -f -c` reports (0/0/0 errors).
As for the tests: that was a repeat run that got cut off. If the problem is clearly hardware (I leaned that way at first too), why does F23 start normally without complaining about anything?


tekma	# 6 years, 9 months ago (edited 6 years, 9 months ago)
Topics: 10 Posts: 36 Member since: 3 May 2016
`modprobe.blacklist=sp5100_tco` and the problem was solved :) P.S. It is a bug involving kernel 4.5 and AMD A* CPUs.
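The post gives only the kernel parameter, not how it was applied. Since the original install used GRUB, one way to make it permanent is to add it to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` and regenerate the config. This is a minimal sketch under assumptions: the helper `add_kernel_param` is hypothetical, and it expects the variable to be on one line with a double-quoted value.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is
# already there. Arguments: path to the GRUB defaults file, parameter.
# Assumption: the value is double-quoted and sits on a single line.
add_kernel_param() {
    file=$1
    param=$2
    # Skip the edit if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file" \
        > "$file.tmp" && mv "$file.tmp" "$file"
}

# Typical use, as root (paths are the usual GRUB locations on Arch):
#   add_kernel_param /etc/default/grub modprobe.blacklist=sp5100_tco
#   grub-mkconfig -o /boot/grub/grub.cfg
```

For a one-off test before changing any file, the same parameter can be typed onto the `linux` line from the GRUB menu editor (key `e`).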

