Ata read dma error

Moderator: Bizdelnick

dergachev

Posts: 847
OS: archlinux

Solved: Boot errors, something about DMA

Honestly, I don't know what this has to do with Unix (most likely it's a hardware problem), but since under Windows I would never even have found out about it, I'm posting here.
I'm getting errors like these.

ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x64
ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/100
ata2: EH complete
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x64
ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/100
ata2: EH complete
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x64
ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/100
ata2: EH complete
ata2.01: limiting speed to UDMA/66:PIO4
ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata2.01: BMDMA stat 0x64
ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
         res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/66
ata2: EH complete
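
The cmd and res lines in a report like the one above carry the ATA taskfile registers as hex bytes. A minimal sketch of unpacking them, assuming the field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and a 48-bit (EXT) command. The helper name parse_taskfile and the dictionary keys are made up for illustration and are not part of any library:

```python
import re

# Assumed layout of a libata "cmd"/"res" line:
#   XX/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# XX is the command opcode on a "cmd" line and the status register on a "res" line.
TASKFILE = re.compile(
    r"(cmd|res) ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)

SECTOR_SIZE = 512  # bytes; matches the fdisk output later in this thread


def parse_taskfile(line):
    """Hypothetical helper: unpack one cmd/res line into a dict of integers."""
    m = TASKFILE.search(line)
    if m is None:
        raise ValueError("no cmd/res taskfile found")
    kind = m.group(1)
    first = int(m.group(2), 16)
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(3).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(4).split(":")
    )
    device = int(m.group(5), 16)
    # 48-bit LBA: the "hob" (high order byte) registers hold bits 24..47.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "kind": kind,
        "command_or_status": first,
        "feature_or_error": feature,
        "sectors": (hob_nsect << 8) | nsect,
        "lba": lba,
        "device_number": (device >> 4) & 1,  # DEV bit: 0 = master, 1 = slave
    }


cmd = parse_taskfile("ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in")
print(hex(cmd["command_or_status"]), cmd["lba"], cmd["sectors"] * SECTOR_SIZE, cmd["device_number"])
```

Under these assumptions the sector count times 512 comes to 4096 bytes, which agrees with "dma 4096" on the same line, and the device number 1 agrees with the ".01" suffix of ata2.01. For 28-bit commands such as READ DMA the LBA layout differs, so this sketch would not apply as written.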

Meanwhile, Windows 7 recently died: first it just started taking ten minutes to boot, then it gave some kind of input-output error at boot. Various Linux distributions, on the other hand, started spewing errors roughly like these at boot, although afterwards everything works perfectly. Moreover, when one particular disk is connected, the errors sometimes also pour out for about ten minutes before anything starts working; if it is not connected (that is, not mounted), I only get what is shown above.

What should I do?

# fdisk -l

Disk /dev/sda: 750.2 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xbfb2917c

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        1101     8843751   83  Linux
/dev/sda2   *        1102       91201   723728249    7  HPFS/NTFS

Disk /dev/sdc: 300.1 GB, 300069052416 bytes
255 heads, 63 sectors/track, 36481 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xf515ed38

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        6785    54500481   83  Linux
/dev/sdc2            6786       36481   238533120   83  Linux

Disk /dev/sdb: 320.1 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x2f479f33

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        8317    66806271   83  Linux
/dev/sdb3            8318       38914   245757952    7  HPFS/NTFS
Partition 3 does not end on cylinder boundary.

xoomer

Posts: 201

Re: Solved: Boot errors, something about DMA

xoomer » 05.05.2010 00:14

dergachev, what if you try disabling UDMA and enabling PIO mode on all disks?

dergachev wrote (04.05.2010 21:18):

ata2.01: failed command: READ DMA EXT
ata2.01: cmd 25/00:08:01:ad:ee/00:00:22:00:00/f0 tag 0 dma 4096 in
res 51/84:00:08:ad:ee/84:00:22:00:00/f0 Emask 0x10 (ATA bus error)
ata2.01: status: { DRDY ERR }
ata2.01: error: { ICRC ABRT }

If only I knew what ata2.01 means. As I understand it, that's the drive on the 4th SATA connector?? (I may be wrong)

What I would do:
- I already wrote about PIO
- I would try disabling the second branch of the SATA controller
- and I would think about saving the data on the hard drives and about saving the hard drives themselves. Ideally you'd want an old disk for troubleshooting all this, since as I understand it the problem is not in the HDD but in something else…

Far behind the skies…

dergachev

Posts: 847
OS: archlinux

Re: Solved: Boot errors, something about DMA

dergachev » 15.06.2010 19:24

Since I was recently very surprised at how high unixforum.org ranks in search engines, I decided to report back on the solution after all.
Yes indeed, it was a bad IDE cable; I replaced it and happiness ensued.
True, in the meantime the local Windows install had already met its end, but serves it right :rolleyes:

Contents

  1. Arch Linux
  2. #1 2014-12-07 07:31:05
  3. [SOLVED] libata: status failed command: READ DMA
  4. unixforum.org
  5. Solved: Boot errors, something about DMA (help me investigate)
  6. Solved: Boot errors, something about DMA
  7. Re: Solved: Boot errors, something about DMA
  8. [HDD] failed command: READ DMA EXT
  9. [SOLVED] Failed command: READ DMA
  10. Ata read dma error
  11. ATA/SATA other issues
  12. ATA/SATA DMA timeout issues
  13. SATA disk troubleshooting
  14. Understanding what you’re dealing with
  15. What FreeBSD has to say
  16. What disks have to say

Arch Linux

#1 2014-12-07 07:31:05

[SOLVED] libata: status failed command: READ DMA

I acquired a new AMD R7 SSD to speed up my ASUS 1000H (it actually manages about double the throughput of the built-in rotating Seagate) and am seeing these error messages during boot.

The last 3 lines are for support in case I have to return the drive. (Nobody wants to know his car will explode for sure and keep driving it until it really does.)

No occasional errors.
The drive doesn't produce errors when plugged in through several 2.5" to USB adapters (of different brands). The kernel version doesn't matter either: I started with 3.16 and recently upgraded to 3.17, and the error message persists in similar form.
Using a different PC (Wortmann/Clevo 1547p) results in no error messages (same procedure: 3.16 and 3.17 kernels, internal SATA and external through the adapters).
Using the HDD the netbook shipped with, or the various other 2.5" HDDs lying around on my table, doesn't produce error messages either, so I think the SATA port isn't damaged on the PC side. Blowing dust from the connectors and ensuring the drive was seated properly [check!]

I tried both multiple times, with live and installed OS.
No bad sectors, last time trimmed: 05122014.

To conclude: the error messages do not pose a threat to performance, they are simply annoying :s Googling revealed similar problems and error messages, but never exactly this one, so I consider my case worth posting.

I'm using openrc.
P.S. on systemd it also spits out the errors above

Solution:
I read the libata source code and grepped for the error messages in the ATA part of the kernel sources, and found some known issues with OCZ SSDs in combination with DMA.
Disabling DMA solved the problem but made drive access ultra slow, so I searched again. Another read through the sources led me to suspect that something was wrong with AHCI; I googled and found
https://www.bios-mods.com/forum/Thread- … eePC-1000H
which proposes a BIOS mod that enables AHCI capabilities already present on the ASUS 1000H.
/*not encouraging you to upgrade your BIOS, consider your own case!*/

I flashed the modded BIOS and, lo and behold, the error is gone and the speed is remarkably better.

Sadly I'm not able to explain the issue or the solution in technically correct terms.

With kind regards, frig

Last edited by frig (2015-01-13 02:46:07)

unixforum.org

A forum for users of UNIX-like systems

Solved: Boot errors, something about DMA (help me investigate)

[HDD] failed command: READ DMA EXT

Good day.

I noticed the following in the logs:

I noticed it when I found occasional read freezes on this disk.

smartctl -H /dev/sda reports no complaints, but the self-tests give this result:

Should I prepare for the worst?

I'm looking for bad blocks now.

PS The HDD is a WD Green 1TB; the file system is reiserfs, and it finds no errors

1. Try replacing the cable.

2. Try switching between IDE/AHCI/RAID modes

To start with, just unplug the cable and plug it back in; sometimes that helps.

I also wanted to write about the cable, but it seems it's not the cable after all, it's the drive. Let me explain why: just the other day my cable came loose, and there were lots of errors, but different ones.

Here: host bus error

At the same time, grep media /var/log/messages* returns nothing for me, whereas in this case it is precisely a media error.

So yes, prepare for the worst.

> I also wanted to write about the cable, but it seems it's not the cable after all, it's the drive

Why "seems"? That's exactly what it is: the drive is reporting a media error

> Why "seems"? That's exactly what it is: the drive is reporting a media error

That's what I mean; the "seems" is for the random 0.001% chance of "anything can happen, you never know".

Just yesterday I fixed something similar on my own machine (successfully). MHDD recovered three soft bad sectors.

I do suspect these are soft bad sectors. In fact, I started looking into it after the root file system crashed because of a poor contact in the power connector.

badblocks /dev/sda3 (the partition specifically, not the whole device) was finding bad blocks. In frustration I wiped the whole file system (especially since everything there was old and it was time to redo a few things), and the bad blocks disappeared.

In this case everything is somewhat more complicated, because there is about 800 GB of data on the partition. Not much of it is really needed, but I still wouldn't want to lose it, and there's nowhere to back it up for now. I'll try working some magic with mhdd.

+ SMART's calm makes me wonder. As soon as something becomes clear, I'll report back.

MHDD with remap: it takes a long time, of course, but what can you do.
badblocks is a rather dumb tool, since it doesn't show the nature of the damage. In my case there were several hard reboots precisely because of problems with the power connector.

I get UNC (Unrecoverable?) errors on drives with bad sectors. Most are cured by MHDD or Victoria with Erase Delays

MHDD found no problems; it's the file system. I'll play with it one of these days.

Then the problem is in the interface, but definitely not in the FS. They are at different levels

>> MHDD found no problems; it's the file system. I'll play with it one of these days.

> Then the problem is in the interface, but definitely not in the FS. They are at different levels

Not necessarily. The disk may have remapped the problem sector. A media error cannot be caused by the interface or the FS (unless, of course, it's a driver bug).

It looks like nothing has been remapped, and fresh tests pass without finding errors. The only thing that bothers me is Offline_Uncorrectable = 4, which, if I'm not mistaken, means there are (or once were) 4 sectors with slowed access. I'll try running a long test, but right now there are no problems.

By the way, the errors started appearing after the power problems, and only when I tried to download via torrent the things that were being downloaded at the moment of the power failure. Errors were then found in the download. There seem to be no problems now.

In short, I found no hardware problems with the drive; even the Offline_Uncorrectable value reset to zero. However, recently (as the drive fills up) I found this in the logs:

As I understand it, when the HDD was having power problems (while data was being written to it), it scattered garbage into the unused space, and as the disk fills up such errors surface. I should have done this from the very beginning:

--scan-whole-partition, -S This option causes --rebuild-tree to scan the whole partition but not only the used space on the partition

[SOLVED] Failed command: READ DMA

# 6 years, 8 months ago (edited 6 years, 8 months ago)
Hello!
I have a freshly installed Arch Linux system (only base, base-devel, and grub-bios installed). When I try to boot into it I see the message
and the system does not boot any further (it ends in a kernel panic).

The message appears when trying to mount the tmp, var, home, and boot partitions, basically everything except root; root mounts successfully and fsck checks it. SmartMonTools shows no errors, and MHDD says the HDD is simply perfect. I tried the ext4 and ext3 file systems.
When using ext4 with the kernel parameter libata.force=noncq there is also this error
I blame the Arch kernel, since if I install RFRemix (Fedora) 23 everything works perfectly.

Please advise which direction to dig in. Thanks in advance!
HDD: SAMSUNG Spinpoint M8 ST1000LM024 (HN-M101MBB)
Kernel: 4.4.5-1
Notebook: Lenovo P585
smartctl -a /dev/sda1
Interrupted (host reset): the laptop was switched off during a repeat scan.

Well, turn the laptop on and run the tests to completion. Your error clearly points to hardware problems.
Or format with a bad-block check, but that won't last long anyway. (IMHO)
# 6 years, 8 months ago (edited 6 years, 8 months ago)

kurych wrote:
> Well, turn the laptop on and run the tests to completion. Your error clearly points to hardware problems.
> Or format with a bad-block check, but that won't last long anyway. (IMHO)

it reports (0/0/0 errors)

As for the tests: that was a repeat run that got interrupted.
If the problem is clearly hardware (I leaned that way at first too), then why does F23 start normally without complaining about anything?

# 6 years, 8 months ago (edited 6 years, 8 months ago) and the problem was solved :)
P.S. It was a bug in kernel 4.5 with AMD A* CPUs

Ata read dma error

  • ATA subsystem causes kernel to lock (no panic) if atacontrol detach is executed without remembering to umount relevant filesystems beforehand
  • ATA subsystem acts erratically/incorrectly when a SATA disk is removed from the system without doing atacontrol detach prior to the removal.
    • Reference: http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html
    • Easily reproducible on any hardware sporting a commercial-grade hot-swap SATA backplane.
  • Intel MatrixRAID: New ar(4) device created when bad disk in RAID-1 array replaced with new disk
    • Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=121899
    • Patch available in PR.
  • Intel MatrixRAID: Array goes incorrectly into READY state when rebooting machine in the middle of an array rebuild
    • Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=102210
    • Patch available in PR 102210, and has been available since 2006.
  • Intel MatrixRAID: Kernel panics when a disk is lost and reattached
    • Open PRs: http://www.freebsd.org/cgi/query-pr.cgi?pr=108924
    • Patch available in PR 102211, and has been available since 2006.
  • Numerous problems with embedded LSI v3 MegaRAID
    • http://www.freebsd.org/cgi/query-pr.cgi?pr=101819
    • Patches available in PR 92786 and PR 101819. Patches have been available since 2006.
  • ServerWorks HT1000 chipsets causing SATA data corruption
    • Known to affect at least Dell PowerEdge SC1435 systems

    ATA/SATA other issues

    SMART monitoring: Using the -s flag in smartd.conf to run periodic short/long offline tests results in DMA timeouts

    Workaround: Stop using this feature. I explain why in this post.

    I am in the process of communicating with Bruce Allen (author of smartmontools) to discuss why this feature exists, why it’s advocated in the man page and example smartd.conf, and why one would want to perform these tests on a regular basis.

    ATA/SATA DMA timeout issues

    • Symptom: messages similar to those below appear in kernel output. Sometimes harmless, sometimes fatal. The LBAs listed are scattered, and SMART statistics for the disk in question show no sign of increased error rates or sector issues:
    • References:

      PATA only: Set hw.ata.ata_dma=0 in /boot/loader.conf. This will disable use of ATA DMA. NOTE: This workaround greatly decreases I/O performance. You have been warned.

      Volker Theile of the FreeNAS project informs me that they have solved most of the DMA problems by increasing a hard-coded arbitrary timeout value of 5 (seconds) in the ATA code to 10 or 15, while simultaneously making the timeout value adjustable via sysctl. Volker submitted patches to sos@ over a year ago, but never received a response.

      FreeBSD 7.0 patch: http://freenas.svn.sourceforge.net/viewvc/freenas/trunk/build/kernel-patches/ata/files/patch-ata.diff?view=markup

  • As of 2008/02/27, Scott Long has offered to help track this problem down. Those who are able to reproduce the problem reliably should get in contact with Scott; serial console access will very likely be mandatory.
    SATA disk troubleshooting

    Understanding what you’re dealing with

    A substantial number of FreeBSD users report SATA disk problems. It is difficult to determine the source of these problems, due to the complex nature of hard disks and all related pieces (cabling and power, disk mechanics, disk firmware, protocol/transport, kernel driver, etc.). Even with a thorough understanding of how SATA disks work, there is a decent chance that even the most skilled system administrator won’t be able to determine the root cause. To make matters worse, many system administrators do not have fail-over systems in place, which makes thorough analysis and troubleshooting impossible ("This system has to be up and running 24×7, I can’t afford the downtime for others to look at it"). And then there’s the issue of finances: sometimes cash is required to work around issues ("Do we know if this Adaptec SATA controller even works? Maybe we should switch to Areca or 3ware or Promise."), while not everyone has such funds available.

    But back to the disks themselves. With regard to bad block management, SCSI disks behave quite differently from SATA disks. SCSI will report any disk errors with sense code, ASC, and ASCQ, and may even automatically mark that block as a "grown defect" (a user-manageable list of bad blocks), while SATA disks will silently attempt to remap bad blocks, keeping track of such defects internally, and will not report to the transport layer (e.g. operating system) that anything had happened. For example, assuming the block was remapped successfully, even SMART statistics are usually left untouched; while in the case of a remapping failure, SMART attribute 198 (Offline_Uncorrectable) may get incremented.

    In the case of SATA, such a scenario can take time, and depends greatly upon the type of error. Some errors (such as soft errors) may take under a second to recover from, while others (hard errors) may take longer, and some may cause the disk to lock up entirely, requiring the disk to be power-cycled and the SATA channel reattached. FreeBSD expects that all ATA commands (that includes SATA!) sent to a device receive a response within 5 seconds. The timeout is hard-coded, and is entirely arbitrary; it has no implied meaning. It was chosen by sos@freebsd.org probably based on personal choice.

    What FreeBSD has to say

    So what happens when a disk operation is executed, but takes longer than 5 seconds to return a sense code? Well, FreeBSD spits out quite a lot of crap to the kernel console (see dmesg or /var/log/console.log), such as:

    This tells you a few things, most of which are low-level:

  • The disk which experienced the problem was ad0
  • A time-out occurred when attempting a write operation
  • FreeBSD attempted to write data to LBA XXXXX via standard DMA (which uses 28-bit LBA addressing), experienced a time-out, and attempted a write retry once
  • FreeBSD attempted to write data to LBA YYYYY via 48-bit DMA, experienced a time-out, and attempted a write retry twice
  • FreeBSD deemed the write operation a failure
  • The ATA status result is value 0x51 (bits 6 (DRDY), 4 (not applicable), and 0 (ERR) set)
  • The ATA error result is value 0x10 (bit 4 set), which according to ATA-7 specification, Section 6.59.6 is: "IDNF shall be set to one if a user-accessible address could not be found. IDNF shall be set to one if an address outside of the range of user-accessible addresses is requested if command aborted is not returned." FreeBSD labels this bit as NID_NOT_FOUND

    The IDNF bit seems to indicate that a particular LBA on the disk was inaccessible; I interpret this to mean "the LBA you’re trying to access is within an invalid LBA range" (which would strongly indicate a bug in FreeBSD), but there’s a good chance I’m reading the description wrong. This needs some further research/clarification, particularly by those more familiar with the ATA protocol semantics than I am.
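
The status and error values discussed here are plain 8-bit bitmasks, so they can be decoded mechanically. A minimal sketch, assuming the conventional ATA bit positions for the abbreviations that appear in the logs on this page (DRDY, ERR, ICRC, UNC, IDNF, ABRT); the decode helper and the two tables are illustrative and are not taken from FreeBSD or Linux source:

```python
# Assumed bit positions. Only DRDY, ERR, ICRC, UNC, IDNF and ABRT appear in the
# logs on this page; BSY, DF and DRQ are added from the usual ATA status layout.
# Bits without an entry are reported as "bitN" rather than guessed.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}


def decode(value, names):
    """Return the names of the set bits in an 8-bit register, highest bit first."""
    return [names.get(bit, f"bit{bit}") for bit in range(7, -1, -1) if value & (1 << bit)]


print(decode(0x51, STATUS_BITS))  # ['DRDY', 'bit4', 'ERR']
print(decode(0x10, ERROR_BITS))   # ['IDNF']
print(decode(0x84, ERROR_BITS))   # ['ICRC', 'ABRT']
print(decode(0x40, ERROR_BITS))   # ['UNC']
```

The last two calls use the error values from the Linux logs elsewhere on this page (res 51/84 and res 51/40) and reproduce the "{ ICRC ABRT }" and "{ UNC }" annotations printed there.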

    None of these are very helpful though, are they? To a system administrator, this means "there’s something wrong, possibly around 48-bit LBA YYYYY, or maybe 28-bit LBA XXXXX". Most administrators know that if the LBA seeing errors is always the same that the disk itself is likely the cause, but what if the LBA is random?

    What disks have to say

    Rather than try to decipher what FreeBSD says, a more logical approach is to examine the disk to see if it logged any sort of error in SMART.

    I need to make something clear: SMART is not a guaranteed way to determine the current state of a disk, or past events on a disk. SMART is entirely dependent upon the level of pedantry of the disk firmware programmer him/herself. Some SMART implementations don’t even bother to log real errors; others increment counters only when offline SMART tests are run. The trick is knowing how to interpret SMART stats for each disk vendor (Western Digital, Seagate, Fujitsu, etc.). Sometimes it gets even more granular than that (different models of disks behaving differently when it comes to SMART).

    JeremyChadwick/ATA_issues_and_troubleshooting (last edited 2020-04-26T06:18:24+0000 by MarkLinimon )

    Good day.

    I noticed the following in the logs:

    debian kernel: [ 1358.084798] ata3.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6
    debian kernel: [ 1358.084806] ata3.00: BMDMA stat 0x25
    debian kernel: [ 1358.084812] ata3.00: failed command: READ DMA EXT
    debian kernel: [ 1358.084821] ata3.00: cmd 25/00:00:4d:aa:d8/00:01:4a:00:00/e0 tag 0 dma 131072 in
    debian kernel: [ 1358.084823]          res 51/40:ef:57:aa:d8/40:00:4a:00:00/e0 Emask 0x9 (media error)
    debian kernel: [ 1358.084828] ata3.00: status: { DRDY ERR }
    debian kernel: [ 1358.084831] ata3.00: error: { UNC }
    debian kernel: [ 1358.084847] ata3: hard resetting link
    debian kernel: [ 1358.404059] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 3F0)
    debian kernel: [ 1358.421075] ata3.00: configured for UDMA/133
    debian kernel: [ 1358.421098] ata3: EH complete
    

    I noticed it when I found occasional read freezes on this disk.

    smartctl -H /dev/sda reports no complaints, but the self-tests give this result:

    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure       90%      2142         858299943
    # 2  Short offline       Completed: read failure       90%      2142         858299943
    

    Should I prepare for the worst?

    I'm looking for bad blocks now.

    PS The HDD is a WD Green 1TB; the file system is reiserfs, and it finds no errors

    The first error you reported:

    ata1:00: status: { DRDY ERR }
    ata1.00: error {UNC }
    ata1:00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    ata1:00: BMDMA stat 0x25
    ata1:00: failed command: READ DMA
    

    says that a READ DMA ATA command to a disk on ATA port 1 failed (status includes ERR for error). That port is most likely the hard disk, and the error points toward the drive having problems. The DMA part can likely be ignored; DMA is Direct Memory Access which is the dominant transfer mode these days, and if you were having RAM or RAM bus problems to the degree that you were hitting something like that repeatedly, you’d likely be seeing a ton more errors if the system was able to function at all.

    The second error:

    end_request: critical target error, dev sda, sector 32839936
    EXT4_fs error: (device sda5): ext4_find_entry:935: inode #393217: comm init: reading directory lblock 0
    INIT: No inittab file found
    

    says there is some problem on /dev/sda, sector 32839936, which with 512-byte sectors puts us physically toward the end of the /dev/sda5 partition, which adds up with device sda5 as reported by the file system driver. The error reported by init together with the file system driver’s error details points toward a problem with the file system causing /etc/inittab to be unavailable or (less likely) unreadable. This would mean that either the root directory, the /etc directory, or the /etc/inittab file entry are somehow involved in the corruption. Given the inode number, I’d take a shot at /etc/inittab specifically being the culprit, until proven wrong.
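
The sector arithmetic in the paragraph above can be checked in a couple of lines. This is only a sketch: the 512-byte sector size is the assumption the answer itself makes, and the helper name is made up for illustration.

```python
SECTOR_SIZE = 512  # bytes per sector, as assumed in the paragraph above


def sector_to_byte_offset(sector, sector_size=SECTOR_SIZE):
    """Byte offset of the first byte of a sector, counted from the start of the device."""
    return sector * sector_size


offset = sector_to_byte_offset(32839936)
print(offset, round(offset / 2**30, 2))
```

With these numbers the failing sector starts 16814047232 bytes (roughly 15.7 GiB) into /dev/sda. Where that falls relative to /dev/sda5 depends on the partition table, which is not shown here.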

    You write (my emphasis):

    Suspecting a HDD crash, I took it out and used in another PC as an external USB HDD drive and I was able to mount & see all partitions and files within. So I assume Disc is OK.

    I would say that your assumption is unfounded. The disk is obviously having some problem; with any luck, it’ll be easy to fix.

    The first thing I would do in your situation is to refresh my backup of everything that is on that disk. Make sure that you do not overwrite or delete anything from your most recent backup, as there is certainly a possibility that you will need it. Perhaps the best option is to make a fresh backup onto a new (or at least not previously used for your own backups) drive of everything that you are able to access. Expect some I/O errors on the source while making that copy.

    Second comes attempting recovery. With any luck, given the errors, this is a single-sector or few-sectors problem which has caused a small amount of file system corruption, in which case e2fsck should be able to repair most of the damage. Some of your files are likely gone, but with some luck, you might be able to find them in /lost+found under the file system’s mount root (meaning for example /data/lost+found if you mount /dev/sda5 on /data) after having e2fsck do what it can. Otherwise, do a comparison against your most recent backup from before the problems started, and restore relevant files from the backup. (Did I mention backups are useful if bad things ever happen, as they inevitably do?)

    Third comes the question of whether you can trust the drive for future use. A few bad sectors doesn’t have to be catastrophic from the drive’s point of view, but rotational drives about 100 GB in size practically cannot be sourced new today in most form factors, which points to this being a relatively old drive. Personally, I’d probably just accept that the drive has outlived its useful life at this point and get a replacement, but then again I am rather paranoid when it comes to my data; your mileage may vary. You will have to weigh the cost of a replacement drive against the risk of total failure of the drive and subsequent total loss of all the data on the drive.

    Good day once again!
    So, here is what I have:
    FreeBSD 7.0-RELEASE i386
    CPU: Intel(R) Celeron(R) CPU 2.93GHz (2929.51-MHz 686-class CPU)
    real memory = 2146697216 (2047 MB)
    avail memory = 2095570944 (1998 MB)
    Four disks were installed: ad0, ad1, ad4, ad6

    ad0: 76319MB <Seagate ST380011A 8.01> at ata0-master UDMA100
    ad1: 305245MB <WDC WD3200AAJB-00WGA0 00.02C01> at ata0-slave UDMA100
    ad4: 715404MB <Seagate ST3750640AS 3.AAE> at ata2-master SATA150
    ad6: 152627MB <WDC WD1600AAJS-22WAA0 58.01D58> at ata3-master SATA150

    The machine is basically a production box; it served faithfully for 6 months with practically no downtime, naturally stopping only for maintenance.
    There were no problems with it. However, a week ago errors started to appear

    ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=619745471
    ad1s1d[READ(offset=314088423424, length=16384)]error = 5
    ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=620121823
    g_vfs_done():ad1s1d[READ(offset=314281115648, length=16384)]error = 5
    ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=620121823
    g_vfs_done():ad1s1d[READ(offset=314281115648, length=16384)]error = 5
    ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=622003583
    g_vfs_done():ad1s1d[READ(offset=315244576768, length=16384)]error = 5
    ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=624638047
    g_vfs_done():ad1s1d[READ(offset=316593422336, length=16384)]error = 5
    ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=619745471
    g_vfs_done():ad1s1d[READ(offset=314088423424, length=16384)]error = 5
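
Each FAILURE line above reports a device LBA, and the line after it reports a byte offset within ad1s1d. A rough consistency check between the two, assuming 512-byte sectors and assuming the reported LBA is the first sector of the failed read (neither is stated in the post); the helper name is hypothetical:

```python
SECTOR_SIZE = 512  # assumed sector size for these drives

# (LBA, byte offset) pairs copied from the ad1 log lines above.
PAIRS = [
    (619745471, 314088423424),
    (620121823, 314281115648),
    (622003583, 315244576768),
    (624638047, 316593422336),
]


def partition_start_sector(lba, offset, sector_size=SECTOR_SIZE):
    """Implied first sector of the partition, if the LBA is the first sector of the read."""
    return lba - offset // sector_size


starts = {partition_start_sector(lba, off) for lba, off in PAIRS}
print(starts)
```

All four pairs give the same start sector (6291519), so under these assumptions the kernel and GEOM lines refer to the same places on the disk, and the log shows some of those places failing more than once.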

    Naturally, the first thing I did was blame the disk; since the machine is in production, I urgently disconnected the disk. I thought the problem was solved, but no such luck:

    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5

    As you can see, errors have now appeared on disk ad0, which had never shown any before.
    The thought immediately crept in that the disks were unlikely to be the cause. I began to suspect the ribbon cable. I replaced it and ended up with:

    ad0: 76319MB <Seagate ST380011A 8.01> at ata0-master UDMA33
    ad4: 715404MB <Seagate ST3750640AS 3.AAE> at ata2-master SATA150
    ad6: 152627MB <WDC WD1600AAJS-22WAA0 58.01D58> at ata3-master SATA150
    ...
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5
    ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=39788895
    g_vfs_done():ad0s1e[READ(offset=14466301952, length=16384)]error = 5

    After all of this I checked both disks, and they are fine :pardon:
    Could anyone advise? Maybe someone has run into this situation before?

    For now I have to keep the server running while ignoring these errors, but that is clearly the wrong approach, so I have started preparing a replacement machine. Still, I would like to understand what this thing wants from me :smile:
    Thanks a lot in advance!

    I don't know who wrote my life, but I feel like a beta tester…

    Hello there,

    I acquired a new AMD R7 SSD to speed up my ASUS 1000H (it actually delivers about double the speed of the built-in rotating Seagate) and am seeing these error messages during boot.

    [    2.263498] ata1: SATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
    [    2.420827] ata1.00: ATA-8: Radeon R7, 1.00, max UDMA/133
    [    2.423824] ata1.00: 234441648 sectors, multi 1: LBA48 NCQ (depth 0/32)
    [    2.433962] ata1.00: configured for UDMA/133
    [    2.480106] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    2.483457] ata1.00: BMDMA stat 0x24
    [    2.486730] ata1.00: failed command: READ DMA
    [    2.489985] ata1.00: cmd c8/00:08:a8:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    2.496697] ata1.00: status: { DRDY ERR }
    [    2.500063] ata1.00: error: { ABRT }
    [    2.510710] ata1.00: configured for UDMA/133
    [    2.514043] ata1: EH complete
    [    2.526934] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    2.530401] ata1.00: BMDMA stat 0x24
    [    2.533724] ata1.00: failed command: READ DMA
    [    2.537034] ata1.00: cmd c8/00:08:88:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    2.543709] ata1.00: status: { DRDY ERR }
    [    2.546931] ata1.00: error: { ABRT }
    [    2.557328] ata1.00: configured for UDMA/133
    [    2.560569] ata1: EH complete
    [    2.573579] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    2.576977] ata1.00: BMDMA stat 0x24
    [    2.580177] ata1.00: failed command: READ DMA
    [    2.583434] ata1.00: cmd c8/00:08:90:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    2.590086] ata1.00: status: { DRDY ERR }
    [    2.593391] ata1.00: error: { ABRT }
    [    2.603991] ata1.00: configured for UDMA/133
    [    2.607257] ata1: EH complete
    [    2.620245] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    2.623682] ata1.00: BMDMA stat 0x24
    [    2.627010] ata1.00: failed command: READ DMA
    [    2.630268] ata1.00: cmd c8/00:08:98:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    2.636873] ata1.00: status: { DRDY ERR }
    [    2.640123] ata1.00: error: { ABRT }
    [    2.650659] ata1.00: configured for UDMA/133
    [    2.653825] ata1: EH complete
    [    2.666904] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    2.670101] ata1.00: BMDMA stat 0x24
    [    2.673076] ata1.00: failed command: READ DMA
    [    2.676084] ata1.00: cmd c8/00:08:a0:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    2.682256] ata1.00: status: { DRDY ERR }
    [    2.685253] ata1.00: error: { ABRT }
    [    2.693994] ata1.00: configured for UDMA/133
    [    2.697046] ata1: EH complete
    [    2.726959] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    2.730130] ata1.00: BMDMA stat 0x24
    [    2.736257] ata1.00: failed command: READ DMA
    [    2.739324] ata1.00: cmd c8/00:08:00:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    2.745689] ata1.00: status: { DRDY ERR }
    [    2.748865] ata1.00: error: { ABRT }
    [    2.757231] ata1.00: configured for UDMA/133
    [    2.760390] ata1: EH complete
    [    2.773584] ata1.00: limiting speed to UDMA/100:PIO4
    [    2.776803] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
    [    2.779847] ata1.00: BMDMA stat 0x24
    [    2.782890] ata1.00: failed command: READ DMA
    [    2.785980] ata1.00: cmd c8/00:08:a0:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    2.792354] ata1.00: status: { DRDY ERR }
    [    2.795499] ata1.00: error: { ABRT }
    [    2.798626] ata1: soft resetting link
    [    2.960611] ata1.00: configured for UDMA/100
    [    2.963739] ata1: EH complete
    [    2.976877] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    2.980091] ata1.00: BMDMA stat 0x24
    [    2.983144] ata1.00: failed command: READ DMA
    [    2.986226] ata1.00: cmd c8/00:08:a8:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    2.992504] ata1.00: status: { DRDY ERR }
    [    2.995652] ata1.00: error: { ABRT }
    [    3.003925] ata1.00: configured for UDMA/100
    [    3.007042] ata1: EH complete
    [    3.020170] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.023399] ata1.00: BMDMA stat 0x24
    [    3.026462] ata1.00: failed command: READ DMA
    [    3.029549] ata1.00: cmd c8/00:08:a8:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.035927] ata1.00: status: { DRDY ERR }
    [    3.039085] ata1.00: error: { ABRT }
    [    3.047327] ata1.00: configured for UDMA/100
    [    3.051577] ata1: EH complete
    [    3.063562] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.066735] ata1.00: BMDMA stat 0x24
    [    3.069694] ata1.00: failed command: READ DMA
    [    3.072671] ata1.00: cmd c8/00:08:70:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.078680] ata1.00: status: { DRDY ERR }
    [    3.082313] ata1.00: error: { ABRT }
    [    3.090631] ata1.00: configured for UDMA/100
    [    3.093767] ata1: EH complete
    [    3.106937] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.110106] ata1.00: BMDMA stat 0x24
    [    3.113094] ata1.00: failed command: READ DMA
    [    3.116073] ata1.00: cmd c8/00:08:b0:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.122146] ata1.00: status: { DRDY ERR }
    [    3.125194] ata1.00: error: { ABRT }
    [    3.137418] ata1.00: configured for UDMA/100
    [    3.140507] ata1: EH complete
    [    3.153643] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.156823] ata1.00: BMDMA stat 0x24
    [    3.159807] ata1.00: failed command: READ DMA
    [    3.162839] ata1.00: cmd c8/00:08:20:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.168998] ata1.00: status: { DRDY ERR }
    [    3.172007] ata1.00: error: { ABRT }
    [    3.180668] ata1.00: configured for UDMA/100
    [    3.183723] ata1: EH complete
    [    3.196959] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.200133] ata1.00: BMDMA stat 0x24
    [    3.203119] ata1.00: failed command: READ DMA
    [    3.206164] ata1.00: cmd c8/00:08:60:49:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.212299] ata1.00: status: { DRDY ERR }
    [    3.215307] ata1.00: error: { ABRT }
    [    3.223992] ata1.00: configured for UDMA/100
    [    3.227045] ata1: EH complete
    [    3.240272] ata1.00: limiting speed to UDMA/33:PIO4
    [    3.243410] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
    [    3.246425] ata1.00: BMDMA stat 0x24
    [    3.249432] ata1.00: failed command: READ DMA
    [    3.252409] ata1.00: cmd c8/00:08:08:49:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.258466] ata1.00: status: { DRDY ERR }
    [    3.261493] ata1.00: error: { ABRT }
    [    3.264538] ata1: soft resetting link
    [    3.427332] ata1.00: configured for UDMA/33
    [    3.430433] ata1: EH complete
    [    3.443595] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.446820] ata1.00: BMDMA stat 0x24
    [    3.449852] ata1.00: failed command: READ DMA
    [    3.452832] ata1.00: cmd c8/00:08:d0:48:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.459026] ata1.00: status: { DRDY ERR }
    [    3.462171] ata1.00: error: { ABRT }
    [    3.470657] ata1.00: configured for UDMA/33
    [    3.473739] ata1: EH complete
    [    3.486913] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.490139] ata1.00: BMDMA stat 0x24
    [    3.493166] ata1.00: failed command: READ DMA
    [    3.496142] ata1.00: cmd c8/00:08:20:48:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.502331] ata1.00: status: { DRDY ERR }
    [    3.505461] ata1.00: error: { ABRT }
    [    3.514007] ata1.00: configured for UDMA/33
    [    3.517105] ata1: EH complete
    [    3.530253] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.533492] ata1.00: BMDMA stat 0x24
    [    3.536538] ata1.00: failed command: READ DMA
    [    3.539551] ata1.00: cmd c8/00:08:e0:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.545772] ata1.00: status: { DRDY ERR }
    [    3.548906] ata1.00: error: { ABRT }
    [    3.557336] ata1.00: configured for UDMA/33
    [    3.560498] ata1: EH complete
    [    3.576774] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.580911] ata1.00: BMDMA stat 0x24
    [    3.584568] ata1.00: failed command: READ DMA
    [    3.587701] ata1.00: cmd c8/00:08:d0:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.593924] ata1.00: status: { DRDY ERR }
    [    3.598096] ata1.00: error: { ABRT }
    [    3.607296] ata1.00: configured for UDMA/33
    [    3.610368] ata1: EH complete
    [    3.623605] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.626779] ata1.00: BMDMA stat 0x24
    [    3.629744] ata1.00: failed command: READ DMA
    [    3.632766] ata1.00: cmd c8/00:08:f8:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.638870] ata1.00: status: { DRDY ERR }
    [    3.641872] ata1.00: error: { ABRT }
    [    3.650611] ata1.00: configured for UDMA/33
    [    3.653642] ata1: EH complete
    [    3.683604] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.686772] ata1.00: BMDMA stat 0x24
    [    3.689736] ata1.00: failed command: READ DMA
    [    3.692743] ata1.00: cmd c8/00:08:00:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.698878] ata1.00: status: { DRDY ERR }
    [    3.701864] ata1.00: error: { ABRT }
    [    3.710673] ata1.00: configured for UDMA/33
    [    3.713704] ata1: EH complete
    [    3.726837] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.729990] ata1.00: BMDMA stat 0x24
    [    3.732974] ata1.00: failed command: READ DMA
    [    3.735978] ata1.00: cmd c8/00:08:78:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.742069] ata1.00: status: { DRDY ERR }
    [    3.745055] ata1.00: error: { ABRT }
    [    3.753991] ata1.00: configured for UDMA/33
    [    3.757019] ata1: EH complete
    [    3.773597] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.776769] ata1.00: BMDMA stat 0x24
    [    3.779733] ata1.00: failed command: READ DMA
    [    3.782731] ata1.00: cmd c8/00:07:88:4b:f9/00:00:00:00:00/ed tag 0 dma 3584 in
    [    3.788819] ata1.00: status: { DRDY ERR }
    [    3.791802] ata1.00: error: { ABRT }
    [    3.800656] ata1.00: configured for UDMA/33
    [    3.803685] ata1: EH complete
    [    3.816836] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.819987] ata1.00: BMDMA stat 0x24
    [    3.822972] ata1.00: failed command: READ DMA
    [    3.825977] ata1.00: cmd c8/00:08:88:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.832062] ata1.00: status: { DRDY ERR }
    [    3.835048] ata1.00: error: { ABRT }
    [    3.843961] ata1.00: configured for UDMA/33
    [    3.846993] ata1: EH complete
    [    3.860289] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    3.863469] ata1.00: BMDMA stat 0x24
    [    3.866447] ata1.00: failed command: READ DMA
    [    3.869446] ata1.00: cmd c8/00:08:80:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    3.875548] ata1.00: status: { DRDY ERR }
    [    3.878532] ata1.00: error: { ABRT }
    [    3.887264] ata1.00: configured for UDMA/33
    [    3.890287] ata1: EH complete
    [    6.760125] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.760135] ata1.00: BMDMA stat 0x24
    [    6.760145] ata1.00: failed command: READ DMA
    [    6.760162] ata1.00: cmd c8/00:08:00:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.760171] ata1.00: status: { DRDY ERR }
    [    6.760178] ata1.00: error: { ABRT }
    [    6.767301] ata1.00: configured for UDMA/33
    [    6.767364] ata1: EH complete
    [    6.776800] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.776811] ata1.00: BMDMA stat 0x24
    [    6.776819] ata1.00: failed command: READ DMA
    [    6.776838] ata1.00: cmd c8/00:08:a0:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.776847] ata1.00: status: { DRDY ERR }
    [    6.776854] ata1.00: error: { ABRT }
    [    6.783957] ata1.00: configured for UDMA/33
    [    6.784011] ata1: EH complete
    [    6.800140] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.800152] ata1.00: BMDMA stat 0x24
    [    6.800160] ata1.00: failed command: READ DMA
    [    6.800178] ata1.00: cmd c8/00:08:a8:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.800187] ata1.00: status: { DRDY ERR }
    [    6.800194] ata1.00: error: { ABRT }
    [    6.807270] ata1.00: configured for UDMA/33
    [    6.807333] ata1: EH complete
    [    6.816800] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.816811] ata1.00: BMDMA stat 0x24
    [    6.816819] ata1.00: failed command: READ DMA
    [    6.816838] ata1.00: cmd c8/00:08:a8:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.816848] ata1.00: status: { DRDY ERR }
    [    6.816854] ata1.00: error: { ABRT }
    [    6.823953] ata1.00: configured for UDMA/33
    [    6.824006] ata1: EH complete
    [    6.834673] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.834685] ata1.00: BMDMA stat 0x24
    [    6.834693] ata1.00: failed command: READ DMA
    [    6.834710] ata1.00: cmd c8/00:08:70:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.834720] ata1.00: status: { DRDY ERR }
    [    6.834726] ata1.00: error: { ABRT }
    [    6.841449] ata1.00: configured for UDMA/33
    [    6.841504] ata1: EH complete
    [    6.850112] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.850122] ata1.00: BMDMA stat 0x24
    [    6.850131] ata1.00: failed command: READ DMA
    [    6.850149] ata1.00: cmd c8/00:08:b0:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.850159] ata1.00: status: { DRDY ERR }
    [    6.850166] ata1.00: error: { ABRT }
    [    6.857270] ata1.00: configured for UDMA/33
    [    6.857325] ata1: EH complete
    [    6.866792] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.866801] ata1.00: BMDMA stat 0x24
    [    6.866810] ata1.00: failed command: READ DMA
    [    6.866829] ata1.00: cmd c8/00:08:20:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.866838] ata1.00: status: { DRDY ERR }
    [    6.866845] ata1.00: error: { ABRT }
    [    6.873943] ata1.00: configured for UDMA/33
    [    6.873998] ata1: EH complete
    [    6.883475] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.883486] ata1.00: BMDMA stat 0x24
    [    6.883495] ata1.00: failed command: READ DMA
    [    6.883512] ata1.00: cmd c8/00:08:60:49:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.883521] ata1.00: status: { DRDY ERR }
    [    6.883528] ata1.00: error: { ABRT }
    [    6.890605] ata1.00: configured for UDMA/33
    [    6.892878] ata1: EH complete
    [    6.903459] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.903469] ata1.00: BMDMA stat 0x24
    [    6.903478] ata1.00: failed command: READ DMA
    [    6.903496] ata1.00: cmd c8/00:08:08:49:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.903506] ata1.00: status: { DRDY ERR }
    [    6.903513] ata1.00: error: { ABRT }
    [    6.910612] ata1.00: configured for UDMA/33
    [    6.910675] ata1: EH complete
    [    6.923479] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.923489] ata1.00: BMDMA stat 0x24
    [    6.923498] ata1.00: failed command: READ DMA
    [    6.923516] ata1.00: cmd c8/00:08:d0:48:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.923526] ata1.00: status: { DRDY ERR }
    [    6.923532] ata1.00: error: { ABRT }
    [    6.926080] ata1.00: configured for UDMA/33
    [    6.926140] ata1: EH complete
    [    6.936806] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.936816] ata1.00: BMDMA stat 0x24
    [    6.936825] ata1.00: failed command: READ DMA
    [    6.936842] ata1.00: cmd c8/00:08:20:48:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.936852] ata1.00: status: { DRDY ERR }
    [    6.936859] ata1.00: error: { ABRT }
    [    6.945210] ata1.00: configured for UDMA/33
    [    6.945269] ata1: EH complete
    [    6.953493] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.953503] ata1.00: BMDMA stat 0x24
    [    6.953512] ata1.00: failed command: READ DMA
    [    6.953531] ata1.00: cmd c8/00:08:e0:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.953541] ata1.00: status: { DRDY ERR }
    [    6.953547] ata1.00: error: { ABRT }
    [    6.960623] ata1.00: configured for UDMA/33
    [    6.960685] ata1: EH complete
    [    6.972021] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.972032] ata1.00: BMDMA stat 0x24
    [    6.972041] ata1.00: failed command: READ DMA
    [    6.972059] ata1.00: cmd c8/00:08:d0:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.972069] ata1.00: status: { DRDY ERR }
    [    6.972076] ata1.00: error: { ABRT }
    [    6.978496] ata1.00: configured for UDMA/33
    [    6.978551] ata1: EH complete
    [    6.987805] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    6.987816] ata1.00: BMDMA stat 0x24
    [    6.987825] ata1.00: failed command: READ DMA
    [    6.987844] ata1.00: cmd c8/00:08:f8:47:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    6.987855] ata1.00: status: { DRDY ERR }
    [    6.987862] ata1.00: error: { ABRT }
    [    6.994054] ata1.00: configured for UDMA/33
    [    6.994112] ata1: EH complete
    [    7.061253] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    7.061262] ata1.00: BMDMA stat 0x24
    [    7.061269] ata1.00: failed command: READ DMA
    [    7.061281] ata1.00: cmd c8/00:08:00:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    7.061288] ata1.00: status: { DRDY ERR }
    [    7.061293] ata1.00: error: { ABRT }
    [    7.067754] ata1.00: configured for UDMA/33
    [    7.067793] ata1: EH complete
    [    7.076833] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    7.076844] ata1.00: BMDMA stat 0x24
    [    7.076853] ata1.00: failed command: READ DMA
    [    7.076872] ata1.00: cmd c8/00:08:78:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    7.076882] ata1.00: status: { DRDY ERR }
    [    7.076888] ata1.00: error: { ABRT }
    [    7.084012] ata1.00: configured for UDMA/33
    [    7.084060] ata1: EH complete
    [    7.096775] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    7.096783] ata1.00: BMDMA stat 0x24
    [    7.096790] ata1.00: failed command: READ DMA
    [    7.096802] ata1.00: cmd c8/00:07:88:4b:f9/00:00:00:00:00/ed tag 0 dma 3584 in
    [    7.096809] ata1.00: status: { DRDY ERR }
    [    7.096813] ata1.00: error: { ABRT }
    [    7.104009] ata1.00: configured for UDMA/33
    [    7.104046] ata1: EH complete
    [    7.113495] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    7.113509] ata1.00: BMDMA stat 0x24
    [    7.113522] ata1.00: failed command: READ DMA
    [    7.113547] ata1.00: cmd c8/00:08:88:4a:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    7.113560] ata1.00: status: { DRDY ERR }
    [    7.113570] ata1.00: error: { ABRT }
    [    7.120540] ata1.00: configured for UDMA/33
    [    7.120578] ata1: EH complete
    [    7.130188] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [    7.130203] ata1.00: BMDMA stat 0x24
    [    7.130216] ata1.00: failed command: READ DMA
    [    7.130242] ata1.00: cmd c8/00:08:80:4b:f9/00:00:00:00:00/ed tag 0 dma 4096 in
    [    7.130255] ata1.00: status: { DRDY ERR }
    [    7.130264] ata1.00: error: { ABRT }
    [    7.137317] ata1.00: configured for UDMA/33
    [    7.137353] ata1: EH complete
    [   10.733468] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    [   10.733482] ata1.00: BMDMA stat 0x24
    [   10.733494] ata1.00: failed command: READ DMA
    [   10.733513] ata1.00: cmd c8/00:01:87:4b:f9/00:00:00:00:00/ed tag 0 dma 512 in
    [   10.733523] ata1.00: status: { DRDY ERR }
    [   10.733531] ata1.00: error: { ABRT }
    [   10.740782] ata1.00: configured for UDMA/33
    [   10.740833] ata1: EH complete
    
    #Product description
    [    2.420827] ata1.00: ATA-8: Radeon R7, 1.00, max UDMA/133
    [    2.437274] scsi 0:0:0:0: Direct-Access     ATA      Radeon R7        1.00 PQ: 0 ANSI: 5

    The last 3 lines are for support, in case I have to return the drive. (Nobody wants to know for sure that his car will explode and keep driving it until it really does.)

    The errors are not occasional; they show up on every boot.
    The drive doesn't produce errors when connected through several 2.5" to USB adapters (of different brands). The kernel version doesn't matter either: I started with 3.16 and recently upgraded to 3.17, and the error messages persist in much the same form.
    Using a different PC (Wortmann/Clevo 1547p) results in no error messages (same procedure: 3.16 and 3.17 kernels, internal SATA and external through the adapters).
    The HDD the netbook shipped with and the various other 2.5" HDDs lying around on my desk don't produce error messages either, so I think the SATA port isn't damaged on the PC side. Blowing dust off the connectors and making sure the drive was seated properly [check!]

    I tried both multiple times, with a live and an installed OS.
    No bad sectors, last time trimmed: 05122014.

    To conclude: the error messages pose no threat to performance; they are simply annoying :s Googling did reveal similar problems and error messages, but never with just { DRDY ERR } as here, so I consider my case worth posting.

    I'm using OpenRC.
    P.S. On systemd it also prints the errors above.

    Solution:
    I read through the libata source code, grepped for the error messages in the ATA part of the kernel sources, and found some known issues with OCZ SSDs in combination with DMA.
    Disabling DMA solved the problem but made drive access extremely slow, so I searched again. Another read through the sources led me to suspect that something was wrong with AHCI; googling turned up
    https://www.bios-mods.com/forum/Thread- … eePC-1000H
    which proposes a BIOS mod that enables the AHCI capabilities already present on the ASUS 1000H.
    /*not encouraging you to upgrade your BIOS, consider your own case!*/

    I flashed the modded BIOS and the error is gone; the speed is remarkably better.

    Sadly, I'm not able to explain the issue or the solution in technically correct terms.

    With kind regards, frig

    Last edited by frig (2015-01-13 02:46:07)
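
For readers who want to dig further into the Linux log in the post above, here is a minimal Python sketch that decodes a libata `cmd` line into a command name, starting LBA, and sector count. It assumes the twelve hex bytes appear in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, which is how I read the libata error handler's output; the function name `decode_cmd` and the small opcode tables are illustrative, not part of any kernel tool.

```python
import re

# Opcodes assumed to use 48-bit (EXT) addressing; anything else is
# treated as a 28-bit command in this sketch.
LBA48_COMMANDS = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}
LBA28_COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}

# Twelve hex bytes separated by '/' or ':' after the literal "cmd ".
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_cmd(line):
    """Decode the taskfile bytes of a libata 'cmd' log line (hypothetical helper)."""
    match = CMD_RE.search(line)
    if match is None:
        raise ValueError("no libata cmd field found")
    fields = [int(x, 16) for x in re.split(r"[/:]", match.group(1))]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields
    if command in LBA48_COMMANDS:
        # 48-bit LBA: the "hob" (high order byte) registers hold bits 24..47.
        lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
               | lbah << 16 | lbam << 8 | lbal)
        count = (hob_nsect << 8 | nsect) or 65536  # 0 means 65536 sectors
        name = LBA48_COMMANDS[command]
    else:
        # 28-bit LBA: bits 24..27 live in the low nibble of the device byte.
        lba = (device & 0x0F) << 24 | lbah << 16 | lbam << 8 | lbal
        count = nsect or 256  # 0 means 256 sectors
        name = LBA28_COMMANDS.get(command, "unknown")
    return {"command": name, "lba": lba, "sectors": count}
```

Applied to the first failing command in the log (`c8/00:08:a8:4b:f9/00:00:00:00:00/ed`), this gives a READ DMA of 8 sectors starting at LBA 234441640. If the decoding assumption holds, that read ends exactly at the 234441648 sectors the drive reports.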

    ATA/SATA kernel issues

    • ATA subsystem causes kernel to lock (no panic) if atacontrol detach <channel> is executed without remembering to umount relevant filesystems beforehand

      • Reference: Kernel — Panic occurs when a mounted device is removed

      • Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=89102

    • ATA subsystem acts erratically/incorrectly when a SATA disk is removed from the system without doing atacontrol detach <channel> prior to the removal.

      • Reference: http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html

      • Easily reproducible on any hardware sporting a commercial-grade hot-swap SATA backplane.
    • Intel MatrixRAID: New ar(4) device created when bad disk in RAID-1 array replaced with new disk

      • Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=121899

      • Patch available in PR.
    • Intel MatrixRAID: Array goes incorrectly into READY state when rebooting machine in the middle of an array rebuild
      • Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=102210

      • Patch available in PR 102210, and has been available since 2006.
    • Intel MatrixRAID: Kernel panics when a disk is lost and reattached
      • Open PRs:
        • http://www.freebsd.org/cgi/query-pr.cgi?pr=102211

        • http://www.freebsd.org/cgi/query-pr.cgi?pr=108924

      • Patch available in PR 102211, and has been available since 2006.
    • Numerous problems with embedded LSI v3 MegaRAID
      • Reference: http://butcher.heavennet.ru/patches/kernel/ata/LSIv3/

      • Open PRs:
        • http://www.freebsd.org/cgi/query-pr.cgi?pr=92786

        • http://www.freebsd.org/cgi/query-pr.cgi?pr=95260

        • http://www.freebsd.org/cgi/query-pr.cgi?pr=101819

      • Patches available in PR 92786 and PR 101819. Patches have been available since 2006.
    • ServerWorks HT1000 chipsets causing SATA data corruption

      • Known to affect at least Dell PowerEdge SC1435 systems

      • Troubleshooting details: http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045549.html

      • Reference: http://lists.freebsd.org/pipermail/freebsd-current/2007-December/081429.html

      • Reference: http://lists.freebsd.org/pipermail/freebsd-current/2008-March/084272.html

      • Supposedly fixed in January 2008 on RELENG_7 and HEAD.
    • Adaptec 1420SA support
      • Reference: http://lists.freebsd.org/pipermail/freebsd-current/2008-April/084974.html

    ATA/SATA other issues

    • SMART monitoring: Using the -s flag in smartd.conf to run periodic short/long offline tests results in DMA timeouts

      • Reference: http://lists.freebsd.org/pipermail/freebsd-stable/2008-October/046208.html

      • Workaround: Stop using this feature. I explain why in this post.

      • I am in the process of communicating with Bruce Allen (author of smartmontools) to discuss why this feature exists, why it’s advocated in the man page and example smartd.conf, and why one would want to perform these tests on a regular basis.

    ATA/SATA DMA timeout issues

    • Symptom: messages similar to those below appear in the kernel output. Sometimes harmless, sometimes fatal. LBAs listed are scattered, and SMART statistics for the disk in question show no sign of increased error rates or sector issues:
      • ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54112319
        ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=764596887
        ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=764596887
        ad0: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=764596887
        ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=453849407
        ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=453849407
        ad0: TIMEOUT - FLUSHCACHE retrying (1 retry left)
        ad0: TIMEOUT - FLUSHCACHE retrying (0 retries left) 

        ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
        ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
        ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
        ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
        ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
        ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=193407827 

    • References:
      • http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040744.html

      • http://lists.freebsd.org/pipermail/freebsd-stable/2008-March/041427.html

    • Troubleshooting: http://lists.freebsd.org/pipermail/freebsd-stable/2008-January/039983.html

    • Workarounds:
      • PATA only: Set hw.ata.ata_dma=0 in /boot/loader.conf. This will disable use of ATA DMA. NOTE: This workaround greatly decreases I/O performance. You have been warned…

      • Volker Theile of the FreeNAS project informs me that they have solved most of the DMA problems by increasing a hard-coded arbitrary timeout value of 5 (seconds) in the ATA code to 10 or 15, while simultaneously making the timeout value adjustable via sysctl. Volker submitted patches to sos@ over a year ago, but never received a response.

        • FreeBSD 6.3 patch: http://freenas.svn.sourceforge.net/viewvc/freenas/branches/0.69/build/kernel-patches/ata/files/patch-ata.diff?view=markup

        • FreeBSD 7.0 patch: http://freenas.svn.sourceforge.net/viewvc/freenas/trunk/build/kernel-patches/ata/files/patch-ata.diff?view=markup

    • As of 2008/02/27, Scott Long has offered to help track this problem down. Those who are able to reproduce the problem reliably should get in contact with Scott; serial console access will very likely be mandatory.

    SATA disk troubleshooting

    Understanding what you’re dealing with

    A substantial number of FreeBSD users report SATA disk problems. It is difficult to determine the source of these problems, due to the complex nature of hard disks and all related pieces (cabling and power, disk mechanics, disk firmware, protocol/transport, kernel driver, etc.). Even with a thorough understanding of how SATA disks work, there is a decent chance that even the most skilled system administrator won't be able to determine the root cause. To make matters worse, many system administrators do not have fail-over systems in place, which makes thorough analysis and troubleshooting impossible ("This system has to be up and running 24×7, I can't afford the downtime for others to look at it"). And then there's the issue of finances: sometimes cash is required to work around issues ("Do we know if this Adaptec SATA controller even works? Maybe we should switch to Areca or 3ware or Promise…"), while not everyone has such funds available.

    But back to the disks themselves. With regard to bad block management, SCSI disks behave quite differently from SATA disks. SCSI will report any disk errors with sense code, ASC, and ASCQ, and may even automatically mark that block as a "grown defect" (a user-manageable list of bad blocks), while SATA disks will silently attempt to remap bad blocks, keeping track of such defects internally, and will not report to the transport layer (e.g. operating system) that anything had happened. For example, assuming the block was remapped successfully, even SMART statistics are usually left untouched; while in the case of a remapping failure, SMART attribute 198 (Offline_Uncorrectable) may get incremented.

    In the case of SATA, such a scenario can take time, and depends greatly upon the type of error. Some errors (such as soft errors) may take under a second to recover from, while others (hard errors) may take longer, and some may cause the disk to lock up entirely, requiring the disk to be power-cycled and the SATA channel reattached. FreeBSD expects that all ATA commands (that includes SATA!) sent to a device receive a response within 5 seconds. The timeout is hard-coded and entirely arbitrary; it has no implied meaning. It was probably chosen by sos@freebsd.org based on personal preference.

    What FreeBSD has to say

    So what happens when a disk operation is executed, but takes longer than 5 seconds to return a sense code? Well, FreeBSD spits out quite a lot of crap to the kernel console (see dmesg or /var/log/console.log), such as:

    ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=XXXXX
    ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=YYYYY
    ad0: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=YYYYY
    ad0: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=YYYYY

    This tells you a few things, most of which are low-level:

    • The disk which experienced the problem was ad0

    • A time-out occurred when attempting a write operation
    • FreeBSD attempted to write data to LBA XXXXX via standard DMA (which uses 28-bit LBA addressing), experienced a time-out, and attempted a write retry once
    • FreeBSD attempted to write data to LBA YYYYY via 48-bit DMA, experienced a time-out, and attempted a write retry twice
    • FreeBSD deemed the write operation a failure
    • The ATA status result is value 0x51 (bits 6 (DRDY), 4 (not applicable; FreeBSD prints it as DSC), and 0 (ERR) set)

    • The ATA error result is value 0x10 (bit 4 set), which according to ATA-7 specification, Section 6.59.6 is: «IDNF shall be set to one if a user-accessible address could not be found. IDNF shall be set to one if an address outside of the range of user-accessible addresses is requested if command aborted is not returned.» FreeBSD labels this bit as NID_NOT_FOUND
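The status and error values in the list above are plain bit fields, so they can be decoded mechanically. Below is a minimal Python sketch of that decoding. The bit names are the traditional ATA task-file names as I understand them (several, such as DSC, CORR and IDX, are obsolete or command-specific in later ATA revisions), so treat the tables as an assumption to check against the specification rather than as an authoritative reference. The function names are made up for this illustration.

```python
# Traditional ATA status/error register bit names (assumed; several are
# obsolete or command-specific in newer ATA revisions).
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
               3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF",
              3: "MCR", 2: "ABRT", 1: "NM", 0: "AMNF"}


def decode(value, names):
    """Return the names of the bits set in an 8-bit value, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def decode_status(value):
    """Decode an ATA status register value such as 0x51."""
    return decode(value, STATUS_BITS)


def decode_error(value):
    """Decode an ATA error register value such as 0x10."""
    return decode(value, ERROR_BITS)


if __name__ == "__main__":
    print(decode_status(0x51))  # ['DRDY', 'DSC', 'ERR']
    print(decode_error(0x10))   # ['IDNF']
```

FreeBSD's own labels differ slightly (READY for DRDY, NID_NOT_FOUND for IDNF), but the bit positions are the same.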

    The IDNF bit seems to indicate that a particular LBA on the disk was inaccessible; I interpret this to mean «the LBA you’re trying to access is within an invalid LBA range» (which would strongly indicate a bug in FreeBSD), but there’s a good chance I’m reading the description wrong. This needs some further research/clarification, particularly by those more familiar with the ATA protocol semantics than I am.

    None of these are very helpful though, are they? To a system administrator, this means «there’s something wrong, possibly around 48-bit LBA YYYYY… or maybe 28-bit LBA XXXXX». Most administrators know that if the errors always hit the same LBA, the disk itself is the likely cause, but what if the LBA is random?
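One way to answer the «same LBA or random?» question is to tally the LBAs from the kernel messages. The sketch below assumes the log lines look exactly like the sample output quoted earlier; the regular expression and the function name are illustrative, not part of any FreeBSD tool. By default it counts only FAILURE lines, so the retries of a single event are not counted several times.

```python
import re
from collections import Counter

# Matches lines shaped like the sample output, for example:
# "ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=123456"
LINE_RE = re.compile(
    r"^(?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\S+) .*LBA=(?P<lba>\d+)"
)


def lba_counts(lines, include_timeouts=False):
    """Count how often each (device, LBA) pair shows up in the log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not an ATA timeout/failure line
        if match.group("kind") == "TIMEOUT" and not include_timeouts:
            continue  # a retry notice, not a final failure
        counts[(match.group("dev"), int(match.group("lba")))] += 1
    return counts
```

Feeding it the contents of /var/log/console.log and looking at `lba_counts(lines).most_common(5)` shows whether a handful of LBAs dominate or whether the failures are spread across the disk.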

    …more to follow…

    What disks have to say

    Rather than try to decipher what FreeBSD says, a more logical approach is to examine the disk to see if it logged any sort of error in SMART.

    I need to make something clear: SMART is not a guaranteed way to determine the current state of a disk, or past events on a disk. SMART is entirely dependent upon the level of pedantry of the disk firmware programmer him/herself. :-) Some SMART implementations don’t even bother to log real errors; others increment counters only when offline SMART tests are run. The trick is knowing how to interpret SMART stats for each disk vendor (Western Digital, Seagate, Fujitsu, etc.). Sometimes it gets even more granular than that (different models of disks behaving differently when it comes to SMART).

    …more to follow…



    JeremyChadwick/ATA_issues_and_troubleshooting (last edited 2020-04-26T06:18:24+0000 by MarkLinimon)

    6 years, 9 months ago (edited 6 years, 9 months ago)
    Topics: 10, Posts: 36, Member since: 03 May 2016

    Hello!
    I have a freshly installed Arch Linux system (only base, base-devel, and grub-bios are installed). When I try to boot into it, I see the message

    
    Failed command: READ DMA
    ...
    

    and the system does not boot any further (it ends in a kernel panic).

    The message appears when the tmp, var, home, and boot partitions are being mounted, that is, everything except root; root mounts successfully and fsck checks it. smartmontools shows no errors, and MHDD says the HDD is in perfect condition. I tried both ext4 and ext3.
    With ext4 and the kernel parameter libata.force=noncq there is also this error:

    
    EXT_4 fs error: _ext4_get_inode_loc unable to read itable block
    

    I suspect the Arch kernel, because if I install RFRemix (Fedora) 23, everything works perfectly.

    Please advise which direction to dig in. Thanks in advance!
    HDD: SAMSUNG Spinpoint M8 ST1000LM024 (HN-M101MBB)
    Kernel: 4.4.5-1
    Notebook: Lenovo P585
    smartctl -a /dev/sda1

    
    smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.5-1-ARCH] (local build)
    Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Seagate Samsung SpinPoint M8 (AF)
    Device Model:     ST1000LM024 HN-M101MBB
    Serial Number:    S2U5J9EC831769
    LU WWN Device Id: 5 0004cf 2083a59e1
    Firmware Version: 2AR10001
    User Capacity:    1,000,204,886,016 bytes [1.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5400 rpm
    Form Factor:      2.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS T13/1699-D revision 6
    SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Tue May  3 07:09:09 2016 UTC
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)	Offline data collection activity
    					was never started.
    					Auto Offline Data Collection: Disabled.
    Self-test execution status:      (  39)	The self-test routine was interrupted
    					by the host with a hard or soft reset.
    Total time to complete Offline
    data collection: 		(13140) seconds.
    Offline data collection
    capabilities: 			 (0x5b) SMART execute Offline immediate.
    					Auto Offline data collection on/off support.
    					Suspend Offline collection upon new
    					command.
    					Offline surface scan supported.
    					Self-test supported.
    					No Conveyance Self-test supported.
    					Selective Self-test supported.
    SMART capabilities:            (0x0003)	Saves SMART data before entering
    					power-saving mode.
    					Supports SMART auto save timer.
    Error logging capability:        (0x01)	Error logging supported.
    					General Purpose Logging supported.
    Short self-test routine
    recommended polling time: 	 (   2) minutes.
    Extended self-test routine
    recommended polling time: 	 ( 219) minutes.
    SCT capabilities: 	       (0x003f)	SCT Status supported.
    					SCT Error Recovery Control supported.
    					SCT Feature Control supported.
    					SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       7
      2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
      3 Spin_Up_Time            0x0023   089   088   025    Pre-fail  Always       -       3452
      4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1609
      5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
      8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1335
     10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       46
     12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1695
    191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       236
    192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
    194 Temperature_Celsius     0x0002   063   040   000    Old_age   Always       -       37 (Min/Max 18/61)
    195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
    196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4347
    223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       46
    225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       8109
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Interrupted (host reset)      70%      1332         -
    # 2  Short offline       Completed without error       00%      1331         -
    # 3  Vendor (0x50)       Completed without error       00%         1         -
    
    SMART Selective self-test log data structure revision number 0
    Note: revision number not 1 implies that no selective self-test has ever been run
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Interrupted [70% left] (0-65535)
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    

    Interrupted (host reset): the laptop was powered off during a repeat scan.
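To check a claim like «smartmontools shows no errors» against the attribute table itself, the `smartctl -a` output can be parsed and a few counters inspected. The following Python sketch does that for the table format shown above. The choice of attributes to watch (5, 197, 198, 199) is my assumption about which counters usually matter for media and cabling problems, and the function names are invented for this example.

```python
def parse_attributes(text):
    """Parse the SMART attribute table from `smartctl -a` output into a dict keyed by ID."""
    rows = {}
    for line in text.splitlines():
        parts = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(parts) == 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            rows[int(parts[0])] = {
                "name": parts[1],
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": parts[9].strip(),
            }
    return rows


def nonzero_watched(rows, watch=(5, 197, 198, 199)):
    """Return {id: raw} for watched attributes whose raw counter is above zero."""
    found = {}
    for attr_id in watch:
        row = rows.get(attr_id)
        if row is None:
            continue  # this drive does not report the attribute
        raw = int(row["raw"].split()[0])  # leading integer of the raw value
        if raw > 0:
            found[attr_id] = raw
    return found
```

For the output above, all four watched counters have a raw value of 0, so `nonzero_watched` returns an empty dict, which is consistent with the poster's statement. As the wiki text stresses, a clean SMART table does not prove the disk or the link is healthy.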

    kurych

    6 years, 9 months ago
    Topics: 0, Posts: 1394, Member since: 06 November 2011

    Well, turn the laptop on and run the tests to completion. Your error clearly points to a hardware problem.
    Or format with a bad-block check, but that will not last long anyway. (IMHO)

    tekma

    6 years, 9 months ago (edited 6 years, 9 months ago)
    Topics: 10, Posts: 36, Member since: 03 May 2016

    kurych
    Well, turn the laptop on and run the tests to completion. Your error clearly points to a hardware problem.
    Or format with a bad-block check, but that will not last long anyway. (IMHO)

    fsck -t -y -f -c

    reports (0/0/0 errors)

    As for the tests: that was a repeat run that got interrupted.
    If the problem is clearly hardware (I leaned that way at first too), then why does F23 start up fine without complaining about anything?

    tekma

    6 years, 9 months ago (edited 6 years, 9 months ago)
    Topics: 10, Posts: 36, Member since: 03 May 2016

    modprobe.blacklist=sp5100_tco

    and the problem was solved :)
    P.S. It is a bug in kernel 4.5 with AMD A* CPUs
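After adding a parameter like the one above to the boot loader configuration, it can be worth confirming that the running kernel actually received it. This is a small Python sketch that extracts the module names from a kernel command line; it assumes the comma-separated `modprobe.blacklist=` syntax, and the helper names are hypothetical.

```python
def blacklisted_modules(cmdline):
    """Return the set of module names listed in modprobe.blacklist= parameters."""
    modules = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "modprobe.blacklist" and sep:
            modules.update(name for name in value.split(",") if name)
    return modules


def running_cmdline(path="/proc/cmdline"):
    """Read the command line the running Linux kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

On the affected machine, `"sp5100_tco" in blacklisted_modules(running_cmdline())` should evaluate to True after a reboot with the new parameter.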
