CAM status: ATA Status Error

Hi everyone, on one server I got some disk-related errors. There are not many (the dmesg shown covers about 5 months), but they are frightening anyway. I have had no data loss so far, thanks to mirrored ZFS. Do these messages point to a real hard disk controller failure? Or only to a badly configured kernel...
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 0b 00 40 00 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 9f 50 40 5d 01 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 a1 50 40 5d 01 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
(ada1:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 f0 2c 9c 40 60 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
ahcich0: Timeout on slot 18 port 0
ahcich0: is 00000000 cs 003c0000 ss 003c0000 rs 003c0000 tfd c0 serr 00000000 cmd 0000d217
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 88 d9 2d 40 5c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 113720, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1063093, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1058432, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 606048, size: 8192
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 30 35 fc 40 9d 00 00 01 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 18 36 fc 00 9d 00 00 00 01
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 f8 31 1f 40 9d 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 00 32 1f 00 9d 00 00 10 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 3 port 0
ahcich0: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd c0 serr 00000000 cmd 0000c317
(ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 290833, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 637539, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1082327, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 767227, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 586772, size: 12288
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 290833, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1057171, size: 24576
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 201066, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1055856, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 854055, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 637539, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1082327, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 767227, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 586772, size: 12288
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1057171, size: 24576
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 174964, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1051025, size: 36864
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1028930, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 201066, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1055856, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 854055, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1082327, size: 32768
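
For anyone reading the ada0 part of the dmesg above: the RES: lines can be checked for retries that keep failing at the same address. The following is only a rough Python sketch. It assumes the eleven RES bytes are ordered status, error, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, sector_count, sector_count_exp; that layout is not stated anywhere in this thread, and the function names are made up for the example.

```python
from collections import Counter


def res_lba(res_line):
    """Extract the 48-bit LBA from the hex bytes after 'RES:'.

    Assumed byte order (not confirmed by the thread): status, error,
    lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp,
    lba_high_exp, sector_count, sector_count_exp.
    """
    hex_part = res_line.split("RES:", 1)[1]
    b = [int(tok, 16) for tok in hex_part.split()]
    if len(b) != 11:
        raise ValueError("expected 11 result bytes, got %d" % len(b))
    low24 = b[2] | (b[3] << 8) | (b[4] << 16)
    high24 = b[6] | (b[7] << 8) | (b[8] << 16)
    return (high24 << 24) | low24


def repeated_lbas(log_lines):
    """Return {lba: count} for LBAs that show up in more than one RES: line."""
    counts = Counter(res_lba(line) for line in log_lines if "RES:" in line)
    return {lba: n for lba, n in counts.items() if n > 1}


if __name__ == "__main__":
    # RES lines copied from the ada0 errors in the dmesg above.
    sample = [
        "(ada0:ahcich0:0:0:0): RES: 41 40 18 36 fc 00 9d 00 00 00 01",
        "(ada0:ahcich0:0:0:0): RES: 41 40 00 32 1f 00 9d 00 00 10 00",
        "(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00",
        "(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00",
        "(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00",
    ]
    for lba, n in repeated_lbas(sample).items():
        print("LBA 0x%x reported %d times" % (lba, n))
```

If the assumed byte order is right, the last three UNC errors on ada0 all report the same address, which would fit a sector that stays unreadable rather than a one-off glitch.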

Topic: [SOLVED] CAM status: ATA Status Error

Hi Everyone,
I have two Soekris devices with CF cards, running OPNsense 16.7.2-i386.

On both I get the following messages in the System Log File:
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): Error 5, Retries exhausted
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): RES: 51 04 6f 63 6b 45 45 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): CAM status: ATA Status Error
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): Retrying command
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): RES: 51 04 6f 63 6b 45 45 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): CAM status: ATA Status Error
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): Retrying command
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): RES: 51 04 6f 63 6b 45 45 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): CAM status: ATA Status Error
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): Retrying command
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): RES: 51 04 6f 63 6b 45 45 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): CAM status: ATA Status Error
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): Retrying command
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): RES: 51 04 6f 63 6b 45 45 00 00 01 00
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): CAM status: ATA Status Error
Aug 26 12:16:51    kernel: (ada0:ata0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
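
The status and error bytes in these log lines are bit fields, and the names in parentheses are the decoded bits. Here is a minimal Python sketch of that decoding, assuming the usual ATA bit order (most significant bit first in the lists below). The bit names are taken from the labels printed in the logs on this page plus standard ATA register naming, and the helper names are hypothetical.

```python
# Bit names listed from 0x80 down to 0x01.
# Assumption: this ordering matches what the kernel prints in parentheses.
STATUS_BITS = ["BSY", "DRDY", "DF", "SERV", "DRQ", "CORR", "IDX", "ERR"]
ERROR_BITS = ["ICRC", "UNC", "MC", "IDNF", "MCR", "ABRT", "NM", "ILI"]


def decode_bits(value, names):
    """Return the names of the bits set in an 8-bit register value."""
    return [name for i, name in enumerate(names) if value & (0x80 >> i)]


def decode_ata(status, error):
    """Decode the bytes printed after 'ATA status:' and 'error:'."""
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


if __name__ == "__main__":
    # Values copied from the DSM TRIM failures above.
    status_names, error_names = decode_ata(0x51, 0x04)
    print("status:", " ".join(status_names))
    print("error:", " ".join(error_names))
```

ABRT (command aborted) in reply to a DSM TRIM request is what a device that does not implement TRIM would be expected to return.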

It has to do with the disk. Has anyone seen this before, and is there a solution for it?

Greets,
Rosie



Hi Rosie,

CF card and TRIM in the same post is suspicious.

Assuming this is a Nano image, can you edit your /etc/fstab accordingly:

DISCLAIMER: Editing the fstab should not be done lightly, system bootup may fail. That’s why OPNsense code never touches the file beyond image building or the bsdinstaller’s installation process.

Add " # notrim" (without the quotes) at the very end of the root partition line (the device should be /dev/ufs/opnsense0) and reboot.

It’ll disable TRIM and hopefully the errors go away as the CF card is likely unable to handle the command.
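
The edit described above is small: one marker appended to the root line. As a hedged illustration only, here is a Python sketch that applies it to fstab text held in a string. It does not touch /etc/fstab itself, the function name is made up, and the sample line is the root entry quoted later in this thread.

```python
def add_fstab_marker(fstab_text, marker="notrim"):
    """Append ' # <marker>' to the root ('/') entry of an fstab text.

    Lines that are comments, or that already carry a '#', are left alone,
    so calling this twice does not add the marker twice.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 2 and not line.lstrip().startswith("#")
        if is_entry and fields[1] == "/" and "#" not in line:
            line = line.rstrip() + " # " + marker
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Root entry as shown by 'cat /etc/fstab' in this thread.
    original = "/dev/ufs/OPNsense0 / ufs rw,async,noatime 1 1\n"
    print(add_fstab_marker(original), end="")
```

Working on a string keeps the sketch harmless; given the disclaimer above, any real change to the file is better made by hand with a backup copy at hand.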

Cheers,
Franco



Hi Franco,
Thanks for the reply.
I have done some research as well and found tunefs and fsck.
As I have a CF card, there are 2 slices.
ad0s1a has TRIM enabled and ad0s2a has TRIM disabled.
When I try to disable it on ad0s1a, I am not able to, because this slice is still read-only.
I thought I could disable it by booting into the second slice, but still no success.
Can you tell me how to enable writing to the first slice?
Greetings,
Rosie




Didn’t you mean it’s still read-write? That’s when you can’t run tunefs to modify it.

If you use the fstab trick the RC system will do the disabling for you on reboot. That’s why I mentioned it. :)

Cheers,
Franco




Hi Franco,
I’m still struggling with this one.
What I do:
First I reboot and start OPNsense with option 2.
Then I do:
# cat /etc/fstab
/dev/ufs/OPNsense0 / ufs rw,async,noatime 1 1
# vi /etc/fstab
ex/vi: Error: /var/tmp/vi.recover/: Read-only file system
ex/vi: Modifications not recoverable if the session fails
ex/vi: Error: /etc/fstab: Read-only file system
ex/vi: Error: Unable to create temporary file: Read-only file system
#
Sorry, but I still don’t know how to make this partition read-write.
I lack the knowledge to get this slice read-write.
What is the trick?




Franco,
I don’t know what happened, but the system is back on OPNsense 15.7.18_1-i386 and I had to assign the NICs and IPs again.
I was working over the serial connection and reset the Soekris system with the reset button.
I will build the system up again and see what happens.
I will keep you informed.
Roro




Well, I booted the system from OPNsense1. I will update this slice and see what happens next.
Fetching the update files now. Upgrade in progress:
Fetching libevent2-2.0.22_1.txz: 100%  254 KiB 260.0kB/s    00:01
Fetching libedit-3.1.20150325_1.txz: 100%  119 KiB 121.8kB/s    00:01
Fetching ldns-1.6.17_5.txz: 100%  379 KiB 388.3kB/s    00:01
Fetching jansson-2.7_1.txz: 100%   39 KiB  40.4kB/s    00:01
Fetching idnkit-1.0_5.txz: 100%  184 KiB 188.5kB/s    00:01
Fetching gmp-5.1.3_2.txz: 100%  474 KiB 484.9kB/s    00:01
Fetching gettext-runtime-0.19.6.txz: 100%  144 KiB 147.7kB/s    00:01
Fetching freetype2-2.6.2.txz: 100%  535 KiB 547.4kB/s    00:01
Fetching easy-rsa-3.0.1.txz: 100%   31 KiB  32.1kB/s    00:01
Fetching dnsmasq-2.75_1,1.txz: 100%  257 KiB 262.7kB/s    00:01
Fetching dhcp6-20080615_5.txz: 100%  104 KiB 106.2kB/s    00:01
Fetching curl-7.46.0_2.txz: 100%    1 MiB   1.5MB/s    00:01
Fetching choparp-20150613.txz: 100%    7 KiB   7.2kB/s    00:01
Fetching ca_root_nss-3.21.txz: 100%  330 KiB 337.7kB/s    00:01
Fetching bind910-9.10.3P2.txz: 100%    6 MiB   5.9MB/s    00:01
Fetching apinger-0.6.1_4.txz: 100%   32 KiB  33.0kB/s    00:01
Fetching libucl-0.7.3_1.txz: 100%   79 KiB  80.7kB/s    00:01
Checking integrity… done (0 conflicting)
[1/74] Upgrading openssl from 1.0.2_4 to 1.0.2_6…
[1/74] Extracting openssl-1.0.2_6:  62%




I’m not exactly sure what you’re trying to do. The second slice is dormant, so it wasn’t up to date. It doesn’t have a shared configuration directory, that’s also by design. We’d either go ahead and make the second slice of nano usable, or we’d at some point see that the second slice isn’t as useful. We’ve now seen the latter. :)

Boot from the first slice as usual (not single-user mode), edit /etc/fstab accordingly and reboot and TRIM will be off…

Cheers,
Franco




I spent the whole evening upgrading the OPNsense CF i386 Soekris box from version 15.x.x to 16.7.2 (OpenSSL).
1  OPNsense
2  OPNsense

F6 PXE
Boot:  2

It took a long time, but there were no errors at all.
As usual I switched from OpenSSL to LibreSSL, and then the messages like:

(ada0:ata0:0:0:0): CAM status: ATA Status Error
etc.

appeared again. After that I went to sleep.

Now I will go back to OpenSSL and see what happens.

The system is now running OPNsense 16.7.2 (i386/OpenSSL), but:

 0) Logout                             7) Ping host
 1) Assign Interfaces                  8) Shell
 2) Set interface(s) IP address        9) pfTop
 3) Reset the root password           10) Filter Logs
 4) Reset to factory defaults         11) Restart web interface
 5) Power off system                  12) Upgrade from console
 6) Reboot system                     13) Restore a configuration

Enter an option: (ada0:ata0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
(ada0:ata0:0:0:0): CAM status: ATA Status Error
(ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(ada0:ata0:0:0:0): RES: 51 04 20 3c 2f 74 74 00 00 01 00
(ada0:ata0:0:0:0): Retrying command

So I think the switch from OpenSSL to LibreSSL causes these messages in the system log.
Is this solvable in the installation or otherwise?

Greets.




If it’s not the TRIM support, then the CF card’s first slice is wearing out. LibreSSL will not cause physical errors on your media. Note that your second slice is pretty young in terms of write cycles compared to the first one.




Recap:
After a few days of hard trial and error.
Soekris net5501-70, serial console.
OPNsense i386 on a CF card.

Findings:
1. I could not see much via serial because the speeds did not match (Soekris 9600, OPNsense 115200). Fixed.
2. the nanobsd boot menu. Here you can choose:
   1  OPNsense
   2  OPNsense

   F6 PXE
   Boot:  1

3. Then the FreeBSD/OPNsense logo appears, but it doesn’t display well on serial output, so you miss the FreeBSD/OPNsense menu with Multiuser, Singleuser etc.

4. Then I had issues with vi and /etc/fstab. I chose to use ee to edit this file and was able to add "# notrim".

5. Finally SUCCESS. The TRIM messages are gone. Thanks to Franco’s help.

root@opn01:~ # tunefs -p /dev/ad0s1a
tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       disabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         disabled
tunefs: maximum blocks per file in a cylinder group: (-e)  512
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            1032
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)                                 OPNsense0

root@opn01:~ # tunefs -p /dev/ada0s1a
tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       disabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         disabled
tunefs: maximum blocks per file in a cylinder group: (-e)  512
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            1032
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)                                 OPNsense0
root@opn01:~ #
root@opn01:~ #
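
A quick way to check settings such as trim or soft updates in tunefs -p output like the above is to parse it into a table. This is a minimal Python sketch, assuming the line format is exactly as printed in this post; parse_tunefs is a hypothetical helper, not part of any tool.

```python
import re

# Matches lines such as:
#   tunefs: trim: (-t)                                         disabled
LINE_RE = re.compile(r"^tunefs:\s+(.*?):\s+\((-\w)\)\s+(\S+)\s*$")


def parse_tunefs(output):
    """Map each flag (for example '-t') to a (description, value) pair."""
    settings = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            description, flag, value = match.groups()
            settings[flag] = (description, value)
    return settings


if __name__ == "__main__":
    # A few lines copied from the output above.
    sample = (
        "tunefs: soft updates: (-n)                                 enabled\n"
        "tunefs: trim: (-t)                                         disabled\n"
        "tunefs: volume label: (-L)                                 OPNsense0\n"
    )
    for flag, (description, value) in parse_tunefs(sample).items():
        print(flag, description, "=", value)
```

With the output above, the entry for -t reads disabled, which matches the point of the recap.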




Hi Rosie,

Sounds good. I’m not entirely sure why the system thinks the card is TRIM-capable, but I have only ever heard of two such cases. I think chemlud had similar issues; we added # notrim at his request. Maybe he has more background on this?

You can probably improve the console experience by disabling the second console (new in 16.7.3) and changing the serial speed to match the Soekris, all under System: Settings: Administration.

Cheers,
Franco




Hi Franco,
I’m getting the messages again.
Is it also possible to turn off:
tunefs: soft updates: (-n)                                 enabled
Greets




And it is driving me crazy. >:(




Try " # notrim,nosoft" instead. :)




Post subject: Is the disk dying?

gmax007
Posted: Mon 02 Jun, 2014 5:31 pm

Please tell me what is wrong with this disk. It is FreeBSD 9.2 with Samba on it. Users started complaining that the shares report no free space. I looked with ls: half of the directories give input/output error. I rebooted; during boot it reported that the secondary GPT was corrupt, which I restored with gpart recover. Then it said the filesystem was not clean, so I ran fsck and cleaned it up. I rebooted and everything seemed to mount. But then I rebooted several more times and it reported dma read error, a whole pile of such errors, and also CAM status: ATA Status Error. In short, I think the disk is failing (there was an office move, maybe the computer got knocked about in the car), and it also creaks in a rather unhealthy way whenever I access it. Right now some directories and files are missing from the share; after a reboot, different ones are missing. It is all random. Is the disk dying?

erema15
Posted: Mon 02 Jun, 2014 5:53 pm

grayich
Posted: Mon 02 Jun, 2014 6:03 pm

Take a look at SMART.
You can also run badblocks.

gmax007
Posted: Mon 02 Jun, 2014 6:28 pm

SMART there already says it does not know the device type, and I may leave a bad-block scan running overnight, if I can connect to work now and if the disk has not dropped off completely while I was driving home :) A question: can overheating cause this, and how critical is overheating? It is a pity, I wanted to copy another share of about 100 GB onto it and set up rsync with another server, but did not get to it :)

grayich
Posted: Mon 02 Jun, 2014 6:39 pm

I have never heard of SMART data being impossible to read... maybe it is simply not enabled.

Code:

smartctl -s on /dev/ad0 # enable
smartctl -a /dev/ad0 # current info
smartctl -t long /dev/ad0  # test, after which the info from smartctl -a /dev/ad0 may change significantly

Replace /dev/ad0 with your own device, of course.

Maybe the cable or the controller has died?
It would not hurt to connect the disk to another box.

gmax007
Posted: Mon 02 Jun, 2014 6:49 pm

SMART should have been enabled; it kept emailing me logs, and nobody turned it off there. No idea. Right now I cannot get in via PuTTY; it looks like everything there has fallen over. So I will restore the backup for the users and then test this computer to exhaustion and swap boxes. By the way, I did open it up, thinking I would reseat the cables. The plastic tip of the SATA power connector fell apart right in my hands :) I replaced it straight away, but it made no difference. Tell me, how often are problems like this related to RAM or the PSU? That is, I want to understand how risky it is to run old computers as servers.

grayich
Posted: Mon 02 Jun, 2014 7:02 pm

Power problems are common in old boxes; otherwise they are usually fine.
Capacitors often bulge on the motherboard and in the PSU; resoldering fixes it.

xemul
Posted: Mon 02 Jun, 2014 7:37 pm

You have already been told about the electrolytic capacitors on the motherboard and in the PSU.
Also check the condition of the contacts on the disk's controller board. Do you have a Torx T9 bit at hand?

I strongly advise against running any tests (even read-only ones) while the state of the hardware is unclear.

gmax007
Posted: Mon 02 Jun, 2014 7:54 pm

I can borrow a bit like that from a friend. I need to talk management into a new computer, a couple of disks and a RAID controller; I will not even mention a server, we are that poor.
By the way, about the unclear state of the hardware: when I rebooted the computer the first time, it hung at the very first stage, before POST started, on the motherboard vendor's splash screen. What causes that? It did not hang there again afterwards. And in general, which hard disks would you recommend buying, not enterprise level of course, just in terms of reliability for a Samba server on FreeBSD in a small office?

grayich
Posted: Mon 02 Jun, 2014 8:10 pm

Hangs at POST could be the power supply, could be a chip.

xemul
Posted: Mon 02 Jun, 2014 8:14 pm

Telepathically: 99% it is a power problem, and specifically the electrolytic capacitors on the motherboard.
Among consumer disks, over recent years I have probably had the most positive statistics with WD Green (as long as you remember to disable the "green" timeout on them). Three disks in RAIDZ plus 4 GB of RAM make a cheap (home) file dump.

gmax007
Posted: Mon 02 Jun, 2014 9:46 pm

That is exactly why I mentioned power: the disk was bought not that long ago, and something is always dying in there. A couple of years ago that computer had a different disk, which also failed on my watch. It was running CentOS 5 then. There is always some kind of trouble with that computer. The capacitors look fine by eye, but obviously a verdict would require measuring all the power rails and key components. The computer is junk; it would be easier to throw it out. Later there was another problem: the mount for the CPU fan broke. No big deal, I screwed the fan straight onto the board. In short, this computer is quite the "survivor" :)) The green timeout, what is that exactly? Is it something to do with the disk firmware? As for "home": I have about 30 "household members" using the dump, so please give a rough hardware configuration, in your opinion, for 30 people on FreeBSD and Samba. 4 GB of RAM? Never heard of it :)

xemul
Posted: Tue 03 Jun, 2014 11:26 am

To reach a verdict you do not need to "measure" all the rails. Poking around a little with an oscilloscope is enough.
The top part of old CPU sockets (with the lugs for mounting the cooler) is easy to replace. Screwing the cooler to the motherboard is also an option; everyone is their own worst enemy.
Your story in the spirit of "And otherwise everything is fine" is gripping; I await the next twist.
The WD Green series parks the heads and slows the spindle when there has been no access for a programmable timeout (by default, I believe, 8 s), which can be disabled. Google wdidle on the WD site.
I will not say anything about the configuration, since the criterion of "30 household members" is not enough even for choosing office furniture.

The keyboard has various useful keys such as Shift, Enter, ... Do not be shy about using them.

gmax007
Posted: Tue 03 Jun, 2014 1:59 pm

I took the computer out of the server room; the disk was hot as hell. Once it cooled down, it booted without errors and everything mounted. I ran a bad-block check with dd rescue: 0 errors. As far as I understand, it was acting up because of overheating.


This appears on a second system with the same FreeNAS version, the same up-to-date BIOS version and the same hardware you can find in my signature, but equipped with a G2020 processor and three WD30EFRX drives in RAIDZ1. The first system (described in my signature) works like a charm.

I am not sure I have understood what you are saying, but I physically replaced the drive with the system OFF. Hotplug is disabled in the BIOS.

Unfortunately the system was running 9.1.1 before the replacement and is still on the same version. I have only replaced the drive.

I thought this was related to the power management of the drive, so I disabled it and set HDD standby to "always on". I left the smartd check interval set to 125.
Then I rebooted the system.

Code:

Dec 13 22:17:09 freenas newsyslog[1489]: logfile first created
Dec 13 22:17:09 freenas syslogd: kernel boot file is /boot/kernel/kernel
Dec 13 22:17:09 freenas kernel: Copyright (c) 1992-2013 The FreeBSD Project.
Dec 13 22:17:09 freenas kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Dec 13 22:17:09 freenas kernel: The Regents of the University of California. All rights reserved.
Dec 13 22:17:09 freenas kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Dec 13 22:17:09 freenas kernel: FreeBSD 9.1-STABLE #0 r+16f6355: Tue Aug 27 00:38:40 PDT 2013
Dec 13 22:17:09 freenas kernel: root@build.ixsystems.com:/tank/home/jkh/src/freenas/os-base/amd64/tank/home/jkh/src/freenas/FreeBSD/src/sys/FREENAS.amd64 amd64
Dec 13 22:17:09 freenas kernel: gcc version 4.2.1 20070831 patched [FreeBSD]
Dec 13 22:17:09 freenas kernel: CPU: Intel(R) Pentium(R) CPU G2020 @ 2.90GHz (2900.08-MHz K8-class CPU)
Dec 13 22:17:09 freenas kernel: Origin = "GenuineIntel"  Id = 0x306a9  Family = 0x6  Model = 0x3a  Stepping = 9
Dec 13 22:17:09 freenas kernel: Features=0xbfebfbff
Dec 13 22:17:09 freenas kernel: Features2=0xd9ae3bf
Dec 13 22:17:09 freenas kernel: AMD Features=0x28100800
Dec 13 22:17:09 freenas kernel: AMD Features2=0x1
Dec 13 22:17:09 freenas kernel: Standard Extended Features=0x281
Dec 13 22:17:09 freenas kernel: TSC: P-state invariant, performance statistics
Dec 13 22:17:09 freenas kernel: real memory  = 17179869184 (16384 MB)
Dec 13 22:17:09 freenas kernel: avail memory = 16204607488 (15453 MB)
Dec 13 22:17:09 freenas kernel: Event timer "LAPIC" quality 600
Dec 13 22:17:09 freenas kernel: ACPI APIC Table:
Dec 13 22:17:09 freenas kernel: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
Dec 13 22:17:09 freenas kernel: FreeBSD/SMP: 1 package(s) x 2 core(s)
Dec 13 22:17:09 freenas kernel: cpu0 (BSP): APIC ID:  0
Dec 13 22:17:09 freenas kernel: cpu1 (AP): APIC ID:  2
Dec 13 22:17:09 freenas kernel: WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
Dec 13 22:17:09 freenas kernel: ACPI Warning: FADT (revision 5) is longer than ACPI 2.0 version, truncating length 268 to 244 (20110527/tbfadt-320)
Dec 13 22:17:09 freenas kernel: ioapic0  irqs 0-23 on motherboard
Dec 13 22:17:09 freenas kernel: kbd1 at kbdmux0
Dec 13 22:17:09 freenas kernel: aesni0: No AESNI support.
Dec 13 22:17:09 freenas kernel: cryptosoft0:  on motherboard
Dec 13 22:17:09 freenas kernel: acpi0:  on motherboard
Dec 13 22:17:09 freenas kernel: ACPI Error: [RAMB] Namespace lookup failure, AE_NOT_FOUND (20110527/psargs-392)
Dec 13 22:17:09 freenas kernel: ACPI Exception: AE_NOT_FOUND, Could not execute arguments for [RAMW] (Region) (20110527/nsinit-380)
Dec 13 22:17:09 freenas kernel: acpi0: Power Button (fixed)
Dec 13 22:17:09 freenas kernel: acpi0: reservation of 67, 1 (4) failed
Dec 13 22:17:09 freenas kernel: cpu0:  on acpi0
Dec 13 22:17:09 freenas kernel: cpu1:  on acpi0
Dec 13 22:17:09 freenas kernel: hpet0:  iomem 0xfed00000-0xfed003ff on acpi0
Dec 13 22:17:09 freenas kernel: Timecounter "HPET" frequency 14318180 Hz quality 950
Dec 13 22:17:09 freenas kernel: Event timer "HPET" frequency 14318180 Hz quality 550
Dec 13 22:17:09 freenas kernel: Event timer "HPET1" frequency 14318180 Hz quality 440
Dec 13 22:17:09 freenas kernel: Event timer "HPET2" frequency 14318180 Hz quality 440
Dec 13 22:17:09 freenas kernel: Event timer "HPET3" frequency 14318180 Hz quality 440
Dec 13 22:17:09 freenas kernel: Event timer "HPET4" frequency 14318180 Hz quality 440
Dec 13 22:17:09 freenas kernel: Event timer "HPET5" frequency 14318180 Hz quality 440
Dec 13 22:17:09 freenas kernel: Event timer "HPET6" frequency 14318180 Hz quality 440
Dec 13 22:17:09 freenas kernel: atrtc0:  port 0x70-0x77 irq 8 on acpi0
Dec 13 22:17:09 freenas kernel: atrtc0: Warning: Couldn't map I/O.
Dec 13 22:17:09 freenas kernel: Event timer "RTC" frequency 32768 Hz quality 0
Dec 13 22:17:09 freenas kernel: attimer0:  port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Dec 13 22:17:09 freenas kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Dec 13 22:17:09 freenas kernel: Event timer "i8254" frequency 1193182 Hz quality 100
Dec 13 22:17:09 freenas kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
Dec 13 22:17:09 freenas kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
Dec 13 22:17:09 freenas kernel: pcib0:  port 0xcf8-0xcff on acpi0
Dec 13 22:17:09 freenas kernel: pci0:  on pcib0
Dec 13 22:17:09 freenas kernel: pcib1:  irq 16 at device 1.0 on pci0
Dec 13 22:17:09 freenas kernel: pci1:  on pcib1
Dec 13 22:17:09 freenas kernel: vgapci0:  port 0xf000-0xf03f mem 0xf7800000-0xf7bfffff,0xe0000000-0xefffffff irq 16 at device 2.0 on pci0
Dec 13 22:17:09 freenas kernel: agp0:  on vgapci0
Dec 13 22:17:09 freenas kernel: agp0: aperture size is 256M, detected 262140k stolen memory
Dec 13 22:17:09 freenas kernel: xhci0:  mem 0xf7c00000-0xf7c0ffff irq 16 at device 20.0 on pci0
Dec 13 22:17:09 freenas kernel: xhci0: 32 byte context size.
Dec 13 22:17:09 freenas kernel: usbus0 on xhci0
Dec 13 22:17:09 freenas kernel: pci0:  at device 22.0 (no driver attached)
Dec 13 22:17:09 freenas kernel: ehci0:  mem 0xf7c14000-0xf7c143ff irq 23 at device 26.0 on pci0
Dec 13 22:17:09 freenas kernel: usbus1: EHCI version 1.0
Dec 13 22:17:09 freenas kernel: usbus1 on ehci0
Dec 13 22:17:09 freenas kernel: pcib2:  irq 16 at device 28.0 on pci0
Dec 13 22:17:09 freenas kernel: pci2:  on pcib2
Dec 13 22:17:09 freenas kernel: pcib3:  irq 16 at device 28.4 on pci0
Dec 13 22:17:09 freenas kernel: pci3:  on pcib3
Dec 13 22:17:09 freenas kernel: re0:  port 0xe000-0xe0ff mem 0xf0004000-0xf0004fff,0xf0000000-0xf0003fff irq 16 at device 0.0 on pci3
Dec 13 22:17:09 freenas kernel: re0: Using 1 MSI-X message
Dec 13 22:17:09 freenas kernel: re0: Chip rev. 0x48000000
Dec 13 22:17:09 freenas kernel: re0: MAC rev. 0x00000000
Dec 13 22:17:09 freenas kernel: miibus0:  on re0
Dec 13 22:17:09 freenas kernel: rgephy0:  PHY 1 on miibus0
Dec 13 22:17:09 freenas kernel: rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
Dec 13 22:17:09 freenas kernel: re0: Ethernet address: ac:22:0b:75:96:c7
Dec 13 22:17:09 freenas kernel: ehci1:  mem 0xf7c13000-0xf7c133ff irq 23 at device 29.0 on pci0
Dec 13 22:17:09 freenas kernel: usbus2: EHCI version 1.0
Dec 13 22:17:09 freenas kernel: usbus2 on ehci1
Dec 13 22:17:09 freenas kernel: isab0:  at device 31.0 on pci0
Dec 13 22:17:09 freenas kernel: isa0:  on isab0
Dec 13 22:17:09 freenas kernel: ahci0:  port 0xf0b0-0xf0b7,0xf0a0-0xf0a3,0xf090-0xf097,0xf080-0xf083,0xf060-0xf07f mem 0xf7c12000-0xf7c127ff irq 19 at device 31.2 on pci0
Dec 13 22:17:09 freenas kernel: ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
Dec 13 22:17:09 freenas kernel: ahcich0:  at channel 0 on ahci0
Dec 13 22:17:09 freenas kernel: ahcich1:  at channel 1 on ahci0
Dec 13 22:17:09 freenas kernel: ahcich2:  at channel 2 on ahci0
Dec 13 22:17:09 freenas kernel: pci0:  at device 31.3 (no driver attached)
Dec 13 22:17:09 freenas kernel: acpi_button0:  on acpi0
Dec 13 22:17:09 freenas kernel: acpi_tz0:  on acpi0
Dec 13 22:17:09 freenas kernel: acpi_tz1:  on acpi0
Dec 13 22:17:09 freenas kernel: orm0:  at iomem 0xc0000-0xce7ff,0xce800-0xcf7ff on isa0
Dec 13 22:17:09 freenas kernel: sc0:  at flags 0x100 on isa0
Dec 13 22:17:09 freenas kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Dec 13 22:17:09 freenas kernel: vga0:  at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Dec 13 22:17:09 freenas kernel: atkbdc0:  at port 0x60,0x64 on isa0
Dec 13 22:17:09 freenas kernel: atkbd0:  irq 1 on atkbdc0
Dec 13 22:17:09 freenas kernel: kbd0 at atkbd0
Dec 13 22:17:09 freenas kernel: atkbd0: [GIANT-LOCKED]
Dec 13 22:17:09 freenas kernel: ppc0: cannot reserve I/O port range
Dec 13 22:17:09 freenas kernel: coretemp0:  on cpu0
Dec 13 22:17:09 freenas kernel: est0:  on cpu0
Dec 13 22:17:09 freenas kernel: p4tcc0:  on cpu0
Dec 13 22:17:09 freenas kernel: coretemp1:  on cpu1
Dec 13 22:17:09 freenas kernel: est1:  on cpu1
Dec 13 22:17:09 freenas kernel: p4tcc1:  on cpu1
Dec 13 22:17:09 freenas kernel: Timecounters tick every 1.000 msec
Dec 13 22:17:09 freenas kernel: ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled
Dec 13 22:17:09 freenas kernel: DUMMYNET 0xfffffe0003e8c440 with IPv6 initialized (100409)
Dec 13 22:17:09 freenas kernel: load_dn_sched dn_sched WF2Q+ loaded
Dec 13 22:17:09 freenas kernel: load_dn_sched dn_sched FIFO loaded
Dec 13 22:17:09 freenas kernel: load_dn_sched dn_sched PRIO loaded
Dec 13 22:17:09 freenas kernel: load_dn_sched dn_sched QFQ loaded
Dec 13 22:17:09 freenas kernel: load_dn_sched dn_sched RR loaded
Dec 13 22:17:09 freenas kernel: usbus0: 5.0Gbps Super Speed USB v3.0
Dec 13 22:17:09 freenas kernel: usbus1: 480Mbps High Speed USB v2.0
Dec 13 22:17:09 freenas kernel: usbus2: 480Mbps High Speed USB v2.0
Dec 13 22:17:09 freenas kernel: ugen0.1: <0x8086> at usbus0
Dec 13 22:17:09 freenas kernel: uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
Dec 13 22:17:09 freenas kernel: ugen1.1:  at usbus1
Dec 13 22:17:09 freenas kernel: uhub1:  on usbus1
Dec 13 22:17:09 freenas kernel: ugen2.1:  at usbus2
Dec 13 22:17:09 freenas kernel: uhub2:  on usbus2
Dec 13 22:17:09 freenas kernel: ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
Dec 13 22:17:09 freenas kernel: ada0:  ATA-9 SATA 3.x device
Dec 13 22:17:09 freenas kernel: ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Dec 13 22:17:09 freenas kernel: ada0: Command Queueing enabled
Dec 13 22:17:09 freenas kernel: ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
Dec 13 22:17:09 freenas kernel: ada0: quirks=0x1<4K>
Dec 13 22:17:09 freenas kernel: ada0: Previously was known as ad4
Dec 13 22:17:09 freenas kernel: ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
Dec 13 22:17:09 freenas kernel: ada1:  ATA-9 SATA 3.x device
Dec 13 22:17:09 freenas kernel: ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Dec 13 22:17:09 freenas kernel: ada1: Command Queueing enabled
Dec 13 22:17:09 freenas kernel: ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
Dec 13 22:17:09 freenas kernel: ada1: quirks=0x1<4K>
Dec 13 22:17:09 freenas kernel: ada1: Previously was known as ad6
Dec 13 22:17:09 freenas kernel: ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
Dec 13 22:17:09 freenas kernel: ada2:  ATA-9 SATA 3.x device
Dec 13 22:17:09 freenas kernel: ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
Dec 13 22:17:09 freenas kernel: ada2: Command Queueing enabled
Dec 13 22:17:09 freenas kernel: ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
Dec 13 22:17:09 freenas kernel: ada2: quirks=0x1<4K>
Dec 13 22:17:09 freenas kernel: ada2: Previously was known as ad8
Dec 13 22:17:09 freenas kernel: SMP: AP CPU #1 Launched!
Dec 13 22:17:09 freenas kernel: Timecounter "TSC-low" frequency 1450042036 Hz quality 1000
Dec 13 22:17:09 freenas kernel: uhub0: 8 ports with 8 removable, self powered
Dec 13 22:17:09 freenas kernel: uhub1: 2 ports with 2 removable, self powered
Dec 13 22:17:09 freenas kernel: uhub2: 2 ports with 2 removable, self powered
Dec 13 22:17:09 freenas kernel: Root mount waiting for: usbus2 usbus1
Dec 13 22:17:09 freenas kernel: ugen1.2:  at usbus1
Dec 13 22:17:09 freenas kernel: uhub3:  on usbus1
Dec 13 22:17:09 freenas kernel: ugen2.2:  at usbus2
Dec 13 22:17:09 freenas kernel: uhub4:  on usbus2
Dec 13 22:17:09 freenas kernel: Root mount waiting for: usbus2 usbus1
Dec 13 22:17:09 freenas kernel: uhub3: 6 ports with 6 removable, self powered
Dec 13 22:17:09 freenas kernel: uhub4: 8 ports with 8 removable, self powered
Dec 13 22:17:09 freenas kernel: Root mount waiting for: usbus2
Dec 13 22:17:09 freenas kernel: ugen2.3:  at usbus2
Dec 13 22:17:09 freenas kernel: umass0:  on usbus2
Dec 13 22:17:09 freenas kernel: Trying to mount root from ufs:/dev/ufs/FreeNASs1a [ro]...
Dec 13 22:17:09 freenas kernel: mountroot: waiting for device /dev/ufs/FreeNASs1a ...
Dec 13 22:17:09 freenas kernel: da0 at umass-sim0 bus 0 scbus3 target 0 lun 0
Dec 13 22:17:09 freenas kernel: da0:  Fixed Direct Access SCSI-6 device
Dec 13 22:17:09 freenas kernel: da0: 40.000MB/s transfers
Dec 13 22:17:09 freenas kernel: da0: 7633MB (15633408 512 byte sectors: 255H 63S/T 973C)
Dec 13 22:17:09 freenas kernel: da0: quirks=0x2
Dec 13 22:17:09 freenas kernel: ZFS filesystem version: 5
Dec 13 22:17:09 freenas kernel: ZFS storage pool version: features support (5000)
Dec 13 22:17:09 freenas kernel: GEOM_RAID5: Module loaded, version 1.1.20110927.40 (rev 00ce00e5abb4)
Dec 13 22:17:09 freenas kernel: GEOM_ELI: Device ada1p1.eli created.
Dec 13 22:17:09 freenas kernel: GEOM_ELI: Encryption: AES-XTS 256
Dec 13 22:17:09 freenas kernel: GEOM_ELI:    Crypto: software
Dec 13 22:17:09 freenas kernel: GEOM_ELI: Device ada2p1.eli created.
Dec 13 22:17:09 freenas kernel: GEOM_ELI: Encryption: AES-XTS 256
Dec 13 22:17:09 freenas kernel: GEOM_ELI:    Crypto: software
Dec 13 22:17:09 freenas kernel: GEOM_ELI: Device ada0p1.eli created.
Dec 13 22:17:09 freenas kernel: GEOM_ELI: Encryption: AES-XTS 256
Dec 13 22:17:09 freenas kernel: GEOM_ELI:    Crypto: software
Dec 13 22:17:09 freenas root: /etc/rc: WARNING: failed precmd routine for vmware_guestd
Dec 13 22:17:09 freenas ntpd[1793]: ntpd 4.2.4p5-a (1)
Dec 13 22:17:09 freenas kernel: re0: link state changed to UP
Dec 13 22:17:10 freenas proftpd[2062]: 127.0.0.1 - ProFTPD 1.3.4c (maint) (built Tue Aug 27 2013 07:56:15 UTC) standalone mode STARTUP
Dec 13 22:17:15 freenas avahi-daemon[2799]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
Dec 13 22:17:15 freenas INADYN[2819]: Fri Dec 13 22:17:15 2013: W:INADYN: IP address for alias 'xxxxxxxxxxx.no-ip.biz:auto' needs update to 'xx.xx.xx.xx'...
Dec 13 22:17:16 freenas INADYN[2819]: Fri Dec 13 22:17:16 2013: W:INADYN: Alias 'xx.xx.xx.xx.no-ip.biz' to IP 'xx.xx.xx.xx' updated successfully.
Dec 13 22:17:16 freenas INADYN[2819]: Fri Dec 13 22:17:16 2013: W:INADYN: DYNDNS Server response: HTTP/1.1 200 OK^M Date: Fri, 13 Dec 2013 21:17:16 GMT^M Server: Apache/2^M Content-Location: update.php^M Vary: negotiate^M TCN: choice^M Content-Length: 19^M Connection: close^M Content-Type: text/plain; charset=UTF-8^M ^M nochg 81.249.42.154
Dec 13 22:17:16 freenas kernel: bridge0: Ethernet address: 02:57:8c:50:a0:00
Dec 13 22:17:16 freenas kernel: bridge0: link state changed to UP
Dec 13 22:17:16 freenas kernel: re0: promiscuous mode enabled
Dec 13 22:17:16 freenas kernel: epair0a: Ethernet address: 02:90:6b:00:08:0a
Dec 13 22:17:16 freenas kernel: epair0b: Ethernet address: 02:90:6b:00:09:0b
Dec 13 22:17:16 freenas kernel: epair0a: link state changed to UP
Dec 13 22:17:16 freenas kernel: epair0b: link state changed to UP
Dec 13 22:17:16 freenas kernel: re0: link state changed to DOWN
Dec 13 22:17:16 freenas kernel: epair0a: promiscuous mode enabled
Dec 13 22:17:17 freenas ntpd[1794]: sendto(xx.xx.xx.xx) (fd=22): No route to host
Dec 13 22:17:17 freenas ntpd[1794]: bind() fd 28, family AF_INET6, port 123, scope 8, addr fe80::90:6bff:fe00:80a, mcast=0 flags=0x11 fails: Can't assign requested address
Dec 13 22:17:17 freenas ntpd[1794]: unable to create socket on epair0a (7) for fe80::90:6bff:fe00:80a#123
Dec 13 22:17:20 freenas kernel: re0: link state changed to UP
Dec 13 22:17:42 freenas manage.py: [common.pipesubr:57] Popen()ing: /usr/local/bin/warden list  -v
Dec 13 22:17:42 freenas last message repeated 2 times
Dec 13 22:17:44 freenas manage.py: [py.warnings:193] /usr/local/lib/python2.7/site-packages/django/http/request.py:193: DeprecationWarning: HttpRequest.raw_post_data has been deprecated. Use HttpRequest.body instead.  warnings.warn('HttpRequest.raw_post_data has been deprecated. Use HttpRequest.body instead.', DeprecationWarning)
Dec 13 22:17:44 freenas manage.py: [common.pipesubr:57] Popen()ing: /usr/local/bin/warden list  -v
Dec 13 22:17:44 freenas manage.py: [common.pipesubr:57] Popen()ing: /usr/local/bin/warden list  -v
Dec 13 22:41:32 freenas manage.py: [common.pipesubr:57] Popen()ing: /usr/local/bin/warden list  -v
Dec 13 22:47:38 freenas last message repeated 9 times
Dec 14 00:20:16 freenas kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 e8 fa ea 40 0a 00 00 00 00 00
Dec 14 00:20:16 freenas kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Dec 14 00:20:16 freenas kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Dec 14 00:20:16 freenas kernel: (ada0:ahcich0:0:0:0): RES: 41 10 e8 fa ea 40 0a 00 00 00 00
Dec 14 00:20:16 freenas kernel: (ada0:ahcich0:0:0:0): Retrying command
Dec 14 01:25:13 freenas kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 98 a0 eb 40 0a 00 00 00 00 00
Dec 14 01:25:13 freenas kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Dec 14 01:25:13 freenas kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Dec 14 01:25:13 freenas kernel: (ada0:ahcich0:0:0:0): RES: 41 10 98 a0 eb 40 0a 00 00 00 00
Dec 14 01:25:13 freenas kernel: (ada0:ahcich0:0:0:0): Retrying command
Dec 14 02:00:13 freenas kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 20 25 40 40 b8 00 00 00 00 00
Dec 14 02:00:13 freenas kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Dec 14 02:00:13 freenas kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Dec 14 02:00:13 freenas kernel: (ada0:ahcich0:0:0:0): RES: 41 40 20 25 40 40 b8 00 00 00 00
Dec 14 02:00:13 freenas kernel: (ada0:ahcich0:0:0:0): Retrying command
Dec 14 02:00:14 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs snapshot -r Volume1/toto@auto-20131214.0200-1m
Dec 14 02:00:14 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs get -H freenas:state Volume1/toto@auto-20131113.0200-1m
Dec 14 02:00:14 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs destroy -r Volume1/toto@auto-20131113.0200-1m
Dec 14 02:00:14 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs get -H freenas:state Volume1/toto@auto-20131112.0400-1m
Dec 14 02:00:14 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs destroy -r Volume1/toto@auto-20131112.0400-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs get -H freenas:state Volume1/toto@auto-20131112.0500-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs destroy -r Volume1/toto@auto-20131112.0500-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs get -H freenas:state Volume1/toto@auto-20131113.0600-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs destroy -r Volume1/toto@auto-20131113.0600-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs get -H freenas:state Volume1/toto@auto-20131113.0400-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs destroy -r Volume1/toto@auto-20131113.0400-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs get -H freenas:state Volume1/toto@auto-20131112.0600-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs destroy -r Volume1/toto@auto-20131112.0600-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs get -H freenas:state Volume1/toto@auto-20131113.0500-1m
Dec 14 02:00:15 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs destroy -r Volume1/toto@auto-20131113.0500-1m
Dec 14 02:00:16 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs get -H freenas:state Volume1/toto@auto-20131113.0300-1m
Dec 14 02:00:16 freenas autosnap.py: [tools.autosnap:57] Popen()ing: /sbin/zfs destroy -r Volume1/toto@auto-20131113.0300-1m
Dec 14 02:40:13 freenas kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 08 6b 40 40 38 00 00 00 00 00
Dec 14 02:40:13 freenas kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Dec 14 02:40:13 freenas kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Dec 14 02:40:13 freenas kernel: (ada0:ahcich0:0:0:0): RES: 41 10 08 6b 40 40 38 00 00 00 00
Dec 14 02:40:13 freenas kernel: (ada0:ahcich0:0:0:0): Retrying command
Dec 14 03:50:15 freenas kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f0 40 ee 40 0a 00 00 00 00 00
Dec 14 03:50:15 freenas kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Dec 14 03:50:15 freenas kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Dec 14 03:50:15 freenas kernel: (ada0:ahcich0:0:0:0): RES: 41 10 f0 40 ee 40 0a 00 00 00 00
Dec 14 03:50:15 freenas kernel: (ada0:ahcich0:0:0:0): Retrying command
Dec 14 04:30:17 freenas kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 60 4d ee 40 0a 00 00 00 00 00
Dec 14 04:30:17 freenas kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Dec 14 04:30:17 freenas kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Dec 14 04:30:17 freenas kernel: (ada0:ahcich0:0:0:0): RES: 41 10 60 4d ee 40 0a 00 00 00 00
Dec 14 04:30:17 freenas kernel: (ada0:ahcich0:0:0:0): Retrying command
Dec 14 04:50:15 freenas kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 00 94 40 40 38 00 00 00 00 00
Dec 14 04:50:15 freenas kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Dec 14 04:50:15 freenas kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Dec 14 04:50:15 freenas kernel: (ada0:ahcich0:0:0:0): RES: 41 10 00 94 40 40 38 00 00 00 00
Dec 14 04:50:15 freenas kernel: (ada0:ahcich0:0:0:0): Retrying command
Dec 14 06:55:33 freenas manage.py: [common.pipesubr:57] Popen()ing: /usr/local/bin/warden list  -v
Dec 14 06:55:34 freenas last message repeated 4 times
Dec 14 07:11:13 freenas kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 38 79 ee 40 0a 00 00 00 00 00
Dec 14 07:11:13 freenas kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Dec 14 07:11:13 freenas kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF )
Dec 14 07:11:13 freenas kernel: (ada0:ahcich0:0:0:0): RES: 41 10 38 79 ee 40 0a 00 00 00 00
Dec 14 07:11:13 freenas kernel: (ada0:ahcich0:0:0:0): Retrying command

As you can see, the errors continue to occur «randomly», every few hours, which makes troubleshooting tedious.

I will try connecting the drive to another SATA port.
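The hexadecimal fields in these kernel messages can be decoded by hand. Below is a minimal Python sketch, not a FreeBSD tool: `decode_res` is a hypothetical helper. It assumes the eleven RES bytes are ordered status, error, LBA low/mid/high, device, LBA low/mid/high (extended), sector count, sector count (extended). The bit names that do not appear in the log above (BSY, DF, ICRC and so on) are also an assumption, based on the usual ATA register layout.

```python
import re

# Bit names, highest bit (0x80) first. DRDY, SERV, ERR, UNC, IDNF and ABRT
# appear in the logs above; the remaining names are assumed.
STATUS_BITS = ["BSY", "DRDY", "DF", "SERV", "DRQ", "CORR", "IDX", "ERR"]
ERROR_BITS = ["ICRC", "UNC", "MC", "IDNF", "MCR", "ABRT", "NM", "ILI"]


def flags(value, names):
    """Return the names of the bits set in an 8-bit register value."""
    return [name for i, name in enumerate(names) if value & (0x80 >> i)]


def decode_res(line):
    """Decode a 'RES: ..' kernel log line (assumed byte order, see above)."""
    match = re.search(r"RES:((?:\s+[0-9a-fA-F]{2}){11})", line)
    if match is None:
        raise ValueError("no 11-byte RES field found")
    b = [int(x, 16) for x in match.group(1).split()]
    # 48-bit LBA: bytes 2-4 are the low 24 bits, bytes 6-8 the high 24 bits.
    lba = (b[2] | (b[3] << 8) | (b[4] << 16)
           | (b[6] << 24) | (b[7] << 32) | (b[8] << 40))
    return {
        "status": flags(b[0], STATUS_BITS),
        "error": flags(b[1], ERROR_BITS),
        "lba": lba,
    }


if __name__ == "__main__":
    print(decode_res("(ada0:ahcich0:0:0:0): RES: 41 10 e8 fa ea 40 0a 00 00 00 00"))
```

Under these assumptions, the IDNF write errors and the UNC read error in the log land on different LBAs, which is easier to see once the bytes are reassembled into a single number.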

Solved CAM status: ATA Status Error

Ordoban

On one server I got some disk-related errors. There are not many (the dmesg(1) output shown covers about 5 months), but they are frightening anyway. I have had no data loss so far, thanks to mirrored ZFS. Do these messages point to a real hard disk controller failure? Or only a badly configured kernel module? Are there kernel parameters to tweak, such as bus timing settings?

SirDice

Administrator

Re: CAM status: ATA Status Error

Looking at the lifetime (982 days) and the type of errors my first guess would be a disk that’s close to dying.

Ordoban

Both at the same time?

Ordoban

These are 2 different errors: the «CAM status» one and the «swap_pager» one. The first is rare and seems not critical, but the second one points to a real disk fault. The Reallocated_Sector_Ct of the first disk jumped from 0 to 20k within the last 2 days! The disk has been replaced now and all is fine.
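A jump like this is easier to catch if the raw value is read out regularly. The following is a minimal Python sketch that pulls one attribute's raw value out of text in the shape of `smartctl -A` output. `raw_attribute` is a hypothetical helper, and the sample table contains made-up numbers for illustration, not data from this thread.

```python
# Illustrative sample in the column layout printed by `smartctl -A`.
# The numbers are invented for this example.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       20000
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23568
"""


def raw_attribute(smartctl_output, name):
    """Return the RAW_VALUE of the named attribute as an int, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the name;
        # the raw value is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            return int(fields[9])
    return None


if __name__ == "__main__":
    print(raw_attribute(SAMPLE, "Reallocated_Sector_Ct"))
```

Storing the value from each run and comparing it with the previous one would flag a disk whose reallocation count starts climbing.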

(How can I mark this thread as solved?)

worldi

Ordoban

Terry_Kennedy

This does seem a bit fishy:

Ordoban

emmex

This does seem a bit fishy:

You don’t indicate the drive manufacturer / model, but it seems odd that 2 drives had an error on the same disk block, and that block just happened to be the «magic» last addressable LBA in pre-LBA48 mode. In theory, a drive should reject a command to access a block outside its capacity, but it may be that the model you’re using barfs and logs a SMART error instead.
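For reference, the «magic» boundary mentioned here follows from the 28-bit LBA field used before LBA48. A short Python sketch of the arithmetic, assuming 512-byte sectors (the block number from the original SMART log is not preserved in this copy of the thread, so none is checked here):

```python
SECTOR_BYTES = 512   # assumed logical sector size
LBA28_BITS = 28      # width of the LBA field before LBA48

# Highest block number that fits in 28 bits.
last_lba28 = (1 << LBA28_BITS) - 1

# Total capacity addressable with 28-bit LBAs.
capacity_bytes = (1 << LBA28_BITS) * SECTOR_BYTES


def is_lba28_boundary(lba):
    """True if the given LBA is the last block addressable with 28 bits."""
    return lba == last_lba28


if __name__ == "__main__":
    print(hex(last_lba28), capacity_bytes // 2**30, "GiB")
```

An error LBA equal to `last_lba28` on a multi-terabyte disk would be suspicious in exactly the way described above, because nothing about the media makes that block special.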

CAM status: ATA Status Error

SmallGuy

Then I ran a short SMART test followed by a long one, and everything seems to be fine:

Here is also the gpart list output (I don’t have enough knowledge to extract any information from it):

I have tried changing the SATA cable and I have tested the RAM with Memtest, with nothing to report.
I don’t understand what has happened or what to do now.
Does anybody have any idea?

cyberjock

Inactive Account

I got nothing. Everything looks fine. Even the cable error values are zero per SMART. So you should start looking elsewhere. Maybe a crappy power supply or something?

You have made the choice to buy an Asus desktop motherboard, which is not exactly a choice that is even remotely recommended around here. You also didn’t go with ECC RAM, not recommended around here. Granted, ECC shouldn’t be making drives sound funny. But there’s 2 ways to build a server. The way we recommend and any other way. The recommended way is obviously easier as you KNOW everything should work properly. As soon as you want to go any other way you accept certain risks.

Unfortunately, you are on your own at this point, as none of your info gives me any impression that anything is wrong. This is why I think it’s smarter for people just learning FreeNAS/FreeBSD to stick to recommended hardware, as you are not familiar enough with the OS to go out buying random hardware. This isn’t Windows. Not 100% of your hardware will work in FreeBSD. Even if it works, that doesn’t mean it will work properly or even work reliably.

SmallGuy

cyberjock

Inactive Account

1. Sure, but Asus adds their own stuff to the board. Are you 100% sure that Asus’s own additions to the board aren’t responsible? Hint: You cannot answer that with a «yes» unless you have a duplicate system with the exact same hardware and BIOS version and it doesn’t do it.
2. No, but that error is usually caused by an issue like a bad hard drive. In your case it gives a generic ATA status error and no disk. This makes me think something is up with your SATA, or something is interfering with your SATA and controller, causing the problem.
3. If it only appears after a disk replacement then I’d ignore it. It should not be giving that error on disk replacement if your hardware supports hot swap. Hot plugging and hot swapping will depend on whether your hardware supports it and whether the FreeBSD driver supports it. Hot plugging is NOT the same as hot swapping.
4. You’re right. And that’s why I think it might be an unspecified error with your hardware. Unfortunately, you are totally on your own since you aren’t using hardware that’s been used a lot. If you had a board like mine and still had the error, at least I (or someone on the forums) could vouch that it *should* be working properly with a given version. But we can’t, since you are on an island by yourself regarding the hardware. There’s nobody to vouch that your exact hardware should work.
5. If you look at gpart list’s output you’ll see a line for each disk that says «state: OK». That means it is definitely NOT a partition problem. Of course, the error made it obvious (in my opinion) that it wasn’t a partition problem, but that state: OK makes it a «for certain» condition.

And regarding Haswell, I don’t recommend them yet because of those issues. Those issues are similar to yours in that the error messages are unspecified and you’d have to figure it out by process of elimination or by finding someone with the same error(remember, you can’t do this because of your hardware).

Unless you can provide more specific conditions on which the error occurs you are pretty much on your own to find the cause. Sorry.

UFS CAM status : ATA Status Error while I try to boot FreeBSD in multi-user-mode.

ziomario

Something bad happened to the disk I was working on. I have never seen this error before and I don’t know what to do. I tried to fix it with fsck -y /dev/ada2p2 (the main partition) in single-user mode, but it didn’t work. Very odd error. I can boot FreeBSD in single-user mode but I can’t boot it in multi-user mode. I ran the check several times; only the first time did it clean the disk. The other times it was already clean. The error is still there. fsck is not able to fix that kind of error. I suspect that there is a bug behind that.

diizzy

ziomario

eternal_noob

LiveCDs – smartmontools

grahamperrin@

Single user mode requires reading from a subset of the file system.

An exit (from single user mode) to multi-user mode will require reading from a larger set, and some writes. If it’s a hard disk, there might be a problem with an area of the disk that’s occupied by all or part of a file in the larger set.

If I’m not mistaken, your photograph shows failure before multi-user mode. Do check the disk but also, check cabling and other hardware.

If you temporarily disconnect:

  • the other two or more internal disks
  • all non-essential peripherals (leaving only the keyboard, mouse and display)

– then can the computer boot in multi-user mode from the suspect disk alone?

ziomario

> then can the computer boot in multi-user mode from the suspect disk alone?

No. The other disks are good. The only damaged disk is the one where I installed FreeBSD. If I were on Linux, I would have used a USB live CD. But it seems that there isn’t any live CD for FreeBSD (by live CD I mean the full OS running from a USB stick). Someone should create one. It’s useful.

System doesn’t recognize hdd after boot

Vovas

Vovas

Sebulon

could you please also share the output of:
# gpart show

and:
# zdb | grep ashift

Vovas

Sebulon

Thank you. You also have some other «problems», which are unlikely to contribute to your issue, but I can at least start by explaining them.

The disks you used to build your pool are «Advanced Format (AF)» drives that have 4k physical sectors, but they lie and present themselves as 512b so as not to confuse less knowledgeable beings, e.g. Windows XP. When you create the pool on these drives raw, ZFS sends all IO unaligned, which severely impacts performance. So the first thing you have to do is partition the hard drives aligned to 1MiB.

The second problem is the ashift value that ZFS uses to determine the smallest IO it can send. «ashift: 9» stands for «I will send 512b IOs», while «ashift: 12» stands for «I will send 4k IOs», which is what these drives like, since that’s what they really are.
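The ashift value is the base-2 logarithm of the smallest IO size in bytes, which is why 9 means 512 and 12 means 4096. A minimal Python sketch of the conversion (the function names are made up for this illustration):

```python
def ashift_to_bytes(ashift):
    """Smallest IO size in bytes for a given ashift value (2 ** ashift)."""
    return 1 << ashift


def bytes_to_ashift(sector_bytes):
    """ashift matching a power-of-two sector size in bytes."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


if __name__ == "__main__":
    print(ashift_to_bytes(9), ashift_to_bytes(12), bytes_to_ashift(4096))
```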

Remediation: back up and recreate. Sorry, there’s no other way.

Aligned partitioning:
# gpart create -s gpt ada(1,2,3)
# gpart add -t freebsd-zfs -b 2048 -a 4k -l disk(1,2,3) ada(1,2,3)

Pool creation with «ashift: 12»:
# gnop create -S 4096 /dev/gpt/disk1
# zpool create storage raidz gpt/disk1.nop gpt/disk2 gpt/disk3
# zpool export storage
# gnop destroy /dev/gpt/disk1.nop
# zpool import -d /dev/gpt storage

This will land you with aligned partitions and ZFS sending 4k IOs for optimal performance.
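The `-b 2048` start sector used in the gpart command can be checked with simple arithmetic: with 512-byte logical sectors it puts the partition at exactly 1 MiB, which is also a multiple of 4096. A small Python sketch, assuming 512-byte logical sectors; the helper names and the sector-63 counter-example are hypothetical and not taken from this thread:

```python
def partition_offset_bytes(start_sector, logical_sector_bytes=512):
    """Byte offset of a partition that starts at the given logical sector."""
    return start_sector * logical_sector_bytes


def is_aligned(start_sector, alignment_bytes, logical_sector_bytes=512):
    """True if the partition start is a multiple of alignment_bytes."""
    offset = partition_offset_bytes(start_sector, logical_sector_bytes)
    return offset % alignment_bytes == 0


if __name__ == "__main__":
    print(partition_offset_bytes(2048))   # start used in the gpart command
    print(is_aligned(2048, 4096))         # 4k physical sector boundary
    print(is_aligned(63, 4096))           # hypothetical legacy-style start
```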

But there’s another «snag» with these drives: their firmware says «park the read head if idle for 5 secs». The problem is that ZFS is a transactional database that buffers IO for about 5 secs between flushes, which means that these drives park and unpark their heads a gazillion times more often than any other drive used with ZFS. The specification says they should be good for about a gazillion parking cycles, but acting like that may cause unnecessary wear. There is some sort of DOS firmware modifier that removes that behavior. I think it is called «wdidle». Might be worth looking into.

About your original issue, maybe the BIOS is wonky? Make sure it’s set to AHCI mode and that all SATA ports are treated equally.

Upgrade from 9.2 to 9.3 — CAM status: ATA Status Error on /dev/ada1 (HP 40L, Toshiba DT01ACA300)

victorhooi

Contributor

I have an HP ProLiant N40L MicroServer that previously ran FreeNAS 9.2.

It has four SATA drives, and FreeNAS 9.2 (Beta, I can’t recall which build) installed on a USB stick.

I recently decided to wipe the USB stick and install the FreeNAS 9.3 release onto it.

However, during boot, I’m getting a whole bunch of error messages about «CAM status: ATA Status Error» and «ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )» on what I believe is the USB stick (/dev/ada1; the SATA drives shouldn’t have been mounted at this point).

I’ve tried two different USB sticks (one of which is known good, and the other is the one that was previously running 9.2 Beta successfully for several months), and it exhibits the same symptoms each time.

Eventually, it does finish booting, if you leave it long enough — however, then when you try to run the initial wizard from a web browser, it then prints out those errors again, and hangs.

Any thoughts on what’s going on, or what the next diagnosis steps might be?

victorhooi

Contributor

Damn, I’m silly. I’m fairly sure /dev/ada1 was one of the SATA drives, not the USB stick. I just booted it up with all SATA drives ejected, and it booted up fine, with no error messages.

Hmm, ok, that’s not good.

I have three SATA drives in it. This is an old NAS I had set up at my parents’ place; I was visiting today so I figured I’d upgrade it.

I am really hoping that I set it up in RAID-Z mode, otherwise I’m guessing I’m out of luck, right? Three disks, hmm, yeah, RAID-Z still works. Right. Ugh.

/dev/ada1 would put it as the second drive in the set, so slot 2? Or is the ordering of the device names not really related?

Should I boot up the machine with that disk ejected, and see how things go?

Would it be safe to do a ZFS import with only two drives, and see what happens there?

Or what else would be a safe next step?

If it helps, the drives are Toshiba (Hitachi) DT01ACA300 3.0 TB drives.

July 11 2013, 11:04

Help me make sense of a disk error

FreeBSD 9.1. Errors like the following show up periodically:

Jul 10 21:16:04 book-mf-1 kernel: (ada5:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jul 10 21:16:04 book-mf-1 kernel: (ada5:ahcich5:0:0:0): CAM status: ATA Status Error
Jul 10 21:16:04 book-mf-1 kernel: (ada5:ahcich5:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
Jul 10 21:16:04 book-mf-1 kernel: (ada5:ahcich5:0:0:0): RES: 51 04 38 df f7 47 00 00 00 00 00
Jul 10 21:16:04 book-mf-1 kernel: (ada5:ahcich5:0:0:0): Retrying command
Jul 10 21:17:13 book-mf-1 kernel: ahcich5: Timeout on slot 23 port 0
Jul 10 21:17:13 book-mf-1 kernel: ahcich5: is 00000000 cs 00800000 ss 00000000 rs 00800000 tfd 10c1 serr 00000000 cmd 0004d717
Jul 10 21:17:13 book-mf-1 kernel: ahcich5: Error while READ LOG EXT
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 12 02 75 13 40 21 00 00 00 00 00
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): CAM status: ATA Status Error
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): ATA status: 00 ()
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): Retrying command
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 1c 75 13 40 21 00 00 00 00 00
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): CAM status: ATA Status Error
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): ATA status: 00 ()
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00
Jul 10 21:17:13 book-mf-1 kernel: (ada5:ahcich5:0:0:0): Retrying command

Replacing the disk did not make the errors go away. The controller is on the motherboard. Could this be a cable problem? What can I do to narrow things down (please do not suggest replacing the motherboard, the whole machine, the political system of the Russian Federation, or the Earth’s orbit)?

Freebsd ata status error

ATA subsystem causes kernel to lock (no panic) if atacontrol detach is executed without remembering to umount relevant filesystems beforehand

ATA subsystem acts erratically/incorrectly when a SATA disk is removed from the system without doing atacontrol detach prior to the removal.

Reference: http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040534.html

  • Easily reproducible on any hardware sporting a commercial-grade hot-swap SATA backplane.
  • Intel MatrixRAID: New ar(4) device created when bad disk in RAID-1 array replaced with new disk

    Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=121899

  • Patch available in PR.
  • Intel MatrixRAID: Array goes incorrectly into READY state when rebooting machine in the middle of an array rebuild

      Open PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=102210

    • Patch available in PR 102210, and has been available since 2006.
  • Intel MatrixRAID: Kernel panics when a disk is lost and reattached
    • Open PRs:

        http://www.freebsd.org/cgi/query-pr.cgi?pr=108924

    • Patch available in PR 102211, and has been available since 2006.
  • Numerous problems with embedded LSI v3 MegaRAID

      http://www.freebsd.org/cgi/query-pr.cgi?pr=101819

  • Patches available in PR 92786 and PR 101819. Patches have been available since 2006.
  • ServerWorks HT1000 chipsets causing SATA data corruption

    Known to affect at least Dell PowerEdge SC1435 systems

    ATA/SATA other issues

    SMART monitoring: Using the -s flag in smartd.conf to run periodic short/long offline tests results in DMA timeouts

    Workaround: Stop using this feature. I explain why in this post.

    I am in the process of communicating with Bruce Allen (author of smartmontools) to discuss why this feature exists, why it’s advocated in the man page and example smartd.conf, and why one would want to perform these tests on a regular basis.

    ATA/SATA DMA timeout issues

    • Symptom: messages similar to the ones below are output by the kernel. Sometimes harmless, sometimes fatal. LBAs listed are scattered, and SMART statistics for the disk in question show no sign of increased error rates or sector issues:
    • References:

      PATA only: Set hw.ata.ata_dma=0 in /boot/loader.conf. This will disable use of ATA DMA. NOTE: This workaround greatly decreases I/O performance. You have been warned.

      Volker Theile of the FreeNAS project informs me that they have solved most of the DMA problems by increasing a hard-coded arbitrary timeout value of 5 (seconds) in the ATA code to 10 or 15, while simultaneously making the timeout value adjustable via sysctl. Volker submitted patches to sos@ over a year ago, but never received a response.

      FreeBSD 7.0 patch: http://freenas.svn.sourceforge.net/viewvc/freenas/trunk/build/kernel-patches/ata/files/patch-ata.diff?view=markup

  • As of 2008/02/27, Scott Long has offered to help track this problem down. Those who are able to reproduce the problem reliably should get in contact with Scott; serial console access will very likely be mandatory.
  • SATA disk troubleshooting

    Understanding what you’re dealing with

    A substantial number of FreeBSD users report SATA disk problems. It is difficult to determine the source of these problems, due to the complex nature of hard disks and all related pieces (cabling and power, disk mechanics, disk firmware, protocol/transport, kernel driver, etc.). Even with a thorough understanding of how SATA disks work, there is a decent chance that even the most skilled system administrator won’t be able to determine the root cause. To make matters worse, many system administrators do not have fail-over systems in place, which makes thorough analysis and troubleshooting impossible («This system has to be up and running 24×7, I can’t afford the downtime for others to look at it»). And then there’s the issue of finances: sometimes cash is required to work around issues («Do we know if this Adaptec SATA controller even works? Maybe we should switch to Areca or 3ware or Promise.»), while not everyone has such funds available.

    But back to the disks themselves. Comparatively, with regards to bad block management, SCSI disks behave quite differently than SATA. SCSI will report any disk errors with sense code, ASC, and ASCQ, and may even automatically mark that block as a «grown defect» (a user-manageable list of bad blocks) — while SATA disks will silently attempt to remap bad blocks, keeping track of such defects internally, and will not report to the transport layer (e.g. operating system) that anything had happened. For example, assuming the block was remapped successfully, even SMART statistics are usually left untouched; while in the case of a remapping failure, SMART attribute 198 (Offline_Uncorrectable) may get incremented.

    In the case of SATA, such a scenario can take time, and depends greatly upon the type of error. Some errors (such as soft errors) may take under a second to recover from, while others (hard errors) may take longer, and some may cause the disk to lock up entirely, requiring the disk to be power-cycled and the SATA channel reattached. FreeBSD expects that all ATA commands (that includes SATA!) sent to a device receive a response within 5 seconds. The timeout is hard-coded, and is entirely arbitrary; it has no implied meaning. It was chosen by sos@freebsd.org probably based on personal choice.

    What FreeBSD has to say

    So what happens when a disk operation is executed, but takes longer than 5 seconds to return a sense code? Well, FreeBSD spits out quite a lot of crap to the kernel console (see dmesg or /var/log/console.log), such as:

    This tells you a few things, most of which are low-level:

    The disk which experienced the problem was ad0

  • A time-out occurred when attempting a write operation
  • FreeBSD attempted to write data to LBA XXXXX via standard DMA (which uses 28-bit LBA addressing), experienced a time-out, and attempted a write retry once
  • FreeBSD attempted to write data to LBA YYYYY via 48-bit DMA, experienced a time-out, and attempted a write retry twice
  • FreeBSD deemed the write operation a failure

    The ATA status result is value 0x51 (bits 6 (DRDY), 4 (not applicable), and 0 (ERR) set)

    The ATA error result is value 0x10 (bit 4 set), which according to ATA-7 specification, Section 6.59.6 is: «IDNF shall be set to one if a user-accessible address could not be found. IDNF shall be set to one if an address outside of the range of user-accessible addresses is requested if command aborted is not returned.» FreeBSD labels this bit as NID_NOT_FOUND

    The IDNF bit seems to indicate that a particular LBA on the disk was inaccessible; I interpret this to mean «the LBA you’re trying to access is within an invalid LBA range» (which would strongly indicate a bug in FreeBSD), but there’s a good chance I’m reading the description wrong. This needs some further research/clarification, particularly by those more familiar with the ATA protocol semantics than I am.
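The status and error bytes are plain bit fields, so they can be decoded mechanically. The following is a minimal Python sketch, not FreeBSD's own code. Several bit names (DRDY, SERV, ERR, ICRC, UNC, IDNF, ABRT) are taken from the kernel messages quoted on this page; the remaining names are my assumption about the usual ATA register layout, and the function names are invented for the example.

```python
# Status register bits, most significant first. Names follow the labels
# seen in "ATA status: .. (...)" kernel lines; DF, DRQ, CORR and IDX are
# assumed from the usual ATA register layout.
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "SERV"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Error register bits. MC, MCR, NM and ILI are likewise assumptions.
ERROR_BITS = [
    (0x80, "ICRC"), (0x40, "UNC"), (0x20, "MC"), (0x10, "IDNF"),
    (0x08, "MCR"), (0x04, "ABRT"), (0x02, "NM"), (0x01, "ILI"),
]


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def decode_status(value):
    return decode_bits(value, STATUS_BITS)


def decode_error(value):
    return decode_bits(value, ERROR_BITS)


if __name__ == "__main__":
    print(decode_status(0x51))  # ['DRDY', 'SERV', 'ERR']
    print(decode_error(0x10))   # ['IDNF']
    print(decode_error(0x84))   # ['ICRC', 'ABRT']
```

Note that 0x51 is binary 0101 0001, so the set bits are 6, 4 and 0.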

    None of these are very helpful though, are they? To a system administrator, this means «there’s something wrong, possibly around 48-bit LBA YYYYY, or maybe 28-bit LBA XXXXX». Most administrators know that if the LBA seeing errors is always the same, the disk itself is likely the cause, but what if the LBA is random?

    What disks have to say

    Rather than try to decipher what FreeBSD says, a more logical approach is to examine the disk to see if it logged any sort of error in SMART.

    I need to make something clear: SMART is not a guaranteed way to determine the current state of a disk, or past events on a disk. SMART is entirely dependent upon the level of pedantry of the disk firmware programmer him/herself. Some SMART implementations don’t even bother to log real errors; others increment counters only when offline SMART tests are run. The trick is knowing how to interpret SMART stats for each disk vendor (Western Digital, Seagate, Fujitsu, etc.). Sometimes it gets even more granular than that (different models of disks behaving differently when it comes to SMART).
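As a rough illustration of reading SMART counters programmatically, the sketch below parses an attribute table of the kind printed by `smartctl -A` (smartmontools) and picks out a few counters that the posts on this page refer to. The sample text is invented for the example, and the ten-column layout is an assumption. Which attributes matter varies by vendor, so the watch list is illustrative only.

```python
# Attribute IDs to watch: reallocated, pending, offline-uncorrectable
# sectors and the interface CRC error counter. Illustrative choice.
WATCHED_IDS = (5, 197, 198, 199)

# Hypothetical sample in the usual `smartctl -A` table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text):
    """Map attribute ID -> (name, raw value) for rows with an integer raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            raw = int(fields[9])
        except ValueError:
            continue  # vendor-specific raw formats are skipped in this sketch
        attrs[int(fields[0])] = (fields[1], raw)
    return attrs


def nonzero_watched(attrs, watched=WATCHED_IDS):
    """Names of watched attributes whose raw value is above zero."""
    return [attrs[i][0] for i in watched if i in attrs and attrs[i][1] > 0]


if __name__ == "__main__":
    print(nonzero_watched(parse_smart_attributes(SAMPLE)))
```

A non-zero raw value is only a hint to look closer; as the text says, firmware differs in what it bothers to count.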

    JeremyChadwick/ATA_issues_and_troubleshooting (last edited 2020-04-26T06:18:24+0000 by MarkLinimon )


    Solved CAM status: ATA Status Error

    Ordoban

    On one server I got some disk-related errors. There are not many (the dmesg(1) output shown covers about 5 months), but they are frightening anyway. I have had no data loss so far, thanks to mirrored ZFS. Do these messages point to a real hard disk controller failure? Or only to a badly configured kernel module? Are there kernel parameters to tweak, something like bus timing settings?


    SirDice

    Administrator

    Re: CAM status: ATA Status Error

    Looking at the lifetime (982 days) and the type of errors my first guess would be a disk that’s close to dying.

    Ordoban

    Re: CAM status: ATA Status Error

    Both at the same time?

    Re: CAM status: ATA Status Error

    Ordoban

    Re: CAM status: ATA Status Error

    These are 2 different errors: the «CAM status» one and the «swap_pager» one. The first is rare and does not seem critical, but the second one led me to a real disk fault. The Reallocated_Sector_Ct of the first disk jumped from 0 to 20k in the last 2 days! The disk has now been replaced and all is fine.

    (How can I mark this thread as solved?)

    worldi

    Re: CAM status: ATA Status Error

    Ordoban

    Terry_Kennedy

    Re: CAM status: ATA Status Error

    This does seem a bit fishy:

    Ordoban

    emmex

    Re: CAM status: ATA Status Error

    This does seem a bit fishy:

    You don’t indicate the drive manufacturer / model, but it seems odd that 2 drives had an error on the same disk block, and that block just happened to be the «magic» last addressable LBA in pre-LBA48 mode. In theory, a drive should reject a command to access a block outside its capacity, but it may be that the model you’re using barfs and logs a SMART error instead.
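For reference, the «magic» block number mentioned here follows from simple arithmetic on the 28-bit address width. The sketch below works it out; the 512-byte sector size and the helper name are assumptions made for the example, and it does not reproduce the poster's actual log.

```python
SECTOR_SIZE = 512  # bytes per sector; assumed for this example

LBA28_SECTORS = 2 ** 28         # sectors addressable with a 28-bit LBA
LBA28_LAST = LBA28_SECTORS - 1  # last addressable block: 0x0FFFFFFF


def crosses_lba28_limit(lba, count=1):
    """True if a transfer of `count` sectors starting at `lba` reaches past LBA28."""
    return lba + count - 1 > LBA28_LAST


if __name__ == "__main__":
    print(hex(LBA28_LAST), LBA28_LAST)             # 0xfffffff 268435455
    print(LBA28_SECTORS * SECTOR_SIZE // 2 ** 30)  # 128 (GiB)
    print(crosses_lba28_limit(LBA28_LAST))         # False
    print(crosses_lba28_limit(LBA28_LAST, 2))      # True
```

An error logged at exactly 268435455 is therefore worth a second look, since it sits on the boundary between the two addressing modes.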


    CAM status: ATA Status Error

    SmallGuy

    Then I ran a short SMART test followed by a long one, and all seems to be fine:

    Here is also the gpart list output (I don’t have enough knowledge to extract any information from it):

    I have tried changing the SATA cable and I have tested the RAM with Memtest; nothing to report.
    I don’t understand what’s happening or what to do now.
    Does anybody have any idea?

    cyberjock

    Inactive Account

    I got nothing. Everything looks fine. Even the cable error values are zero per SMART. So you should start looking elsewhere. Maybe a crappy power supply or something?

    You have made the choice to buy an Asus desktop motherboard, which is not exactly a choice that is even remotely recommended around here. You also didn’t go with ECC RAM, not recommended around here. Granted, ECC shouldn’t be making drives sound funny. But there’s 2 ways to build a server. The way we recommend and any other way. The recommended way is obviously easier as you KNOW everything should work properly. As soon as you want to go any other way you accept certain risks.

    Unfortunately, you are on your own at this point as none of your info gives me any impression anything is wrong. This is why I think its smarter for people just learning FreeNAS/FreeBSD to stick to recommended hardware as you are not familiar enough with the OS to go out buying random hardware. This isn’t Windows. 100% of your hardware won’t work in FreeBSD. Even if it works, that doesn’t mean it will work properly or even work reliably.

    SmallGuy


    cyberjock

    Inactive Account

    1. Sure, but Asus adds their own stuff to the board. Are you 100% sure that Asus own additions to the board aren’t responsible? Hint: You cannot answer that with a «yes» unless you have a duplicate system with the exact same hardware and BIOS version and it doesn’t do it.
    2. No, but that error is usually caused by an issue like a bad hard drive. In your case it gives a generic ATA status error and no disk. This makes me think something is up with your SATA or something is interfering with your SATA and controller causing the problem.
    3. If it only appears after a disk replacement then I’d ignore it. It should not be giving that error on disk replacement if your hardware supports hotswap. Hot plugging and hot swapping will depend on whether your hardware supports it and whether the FreeBSD driver supports it. Hot plugging is NOT the same as hot swapping.
    4. You’re right. And that’s why I think it might be an unspecified error with your hardware. Unfortunately, you are totally on your own since you aren’t using hardware that’s been used a lot. If you had a board like mine and still had the error at least I (or someone on the forums) could vouch that it *should* be working properly with a given version. But we can’t since you are on an island by yourself regarding the hardware. There’s nobody to vouch that your exact hardware should work.
    5. If you look at gpart list’s output you’ll see a line for each disk that says «state: OK». That means it is definitely NOT a partition problem. Of course, the error made it obvious (in my opinion) that it wasn’t a partition problem, but that state: OK makes it a «for certain» condition.

    And regarding Haswell, I don’t recommend them yet because of those issues. Those issues are similar to yours in that the error messages are unspecified and you’d have to figure it out by process of elimination or by finding someone with the same error(remember, you can’t do this because of your hardware).

    Unless you can provide more specific conditions on which the error occurs you are pretty much on your own to find the cause. Sorry.


  • Server needs a reboot once a day


    Денис

    Server needs a reboot once a day

    Hello. The server runs FreeBSD 9.1-RELEASE #0: Fri Sep 27 01:29:26 MSK 2013.
    The system sits on two RAID arrays:

    Name            Status    Components
    mirror/boot     COMPLETE  ada0p1 (ACTIVE)
                              ada1p1 (ACTIVE)
    mirror/swap     COMPLETE  ada0p2 (ACTIVE)
                              ada1p2 (ACTIVE)
    mirror/root     COMPLETE  ada0p3 (ACTIVE)
                              ada1p3 (ACTIVE)
    mirror/web      COMPLETE  ada2p1 (ACTIVE)
                              ada3p1 (ACTIVE)
    mirror/storage  COMPLETE  ada2p2 (ACTIVE)
                              ada3p2 (ACTIVE)

    Once a day it needs a reboot, roughly 24 hours after the previous one. A foolproof check is nslookup: it cannot find the server. I cannot find anything in the logs that would hint at where to look. The server runs everything: mail, web and MySQL.
    One more thing: you can tell that the server has «hung» by sound. The disks make a barely audible «clink», after which I check nslookup and there is no response. Please advise where to look for the cause.





    tom.cat » 2013-10-18 14:49:06

    snorlov wrote: How do you do the reboot…

    Reset button?


    QweЯty » 2013-10-20 18:51:45

    Same topic here, except the interval is 3-5 days, and sometimes a week…
    It stops handing out IP addresses, answering over ssh and responding to ping… BUT, judging by the LEDs, it keeps working…
    In the logs, all.log:

    Oct 20 02:10:10 radist04ka named[6408]: client 80.77.172.138#62034 (xn--80aaasphcburb2bjg5q.su): query (cache) ‘xn--80aaasphcburb2bjg5q.su/SOA/IN’ denied
    Oct 20 02:11:00 radist04ka /usr/sbin/cron[17823]: (root) CMD (/usr/local/etc/rrd/base/mem_update.sh)
    Oct 20 02:11:00 radist04ka /usr/sbin/cron[17828]: (root) CMD (/usr/local/etc/rrd/base/net_graph.sh)
    Oct 20 02:11:00 radist04ka /usr/sbin/cron[17830]: (operator) CMD (/usr/libexec/save-entropy)
    Oct 20 02:11:00 radist04ka /usr/sbin/cron[17829]: (root) CMD (/usr/local/etc/rrd/base/net_update.sh)
    Oct 20 02:11:00 radist04ka /usr/sbin/cron[17831]: (root) CMD (/usr/local/etc/rrd/base/cpu_graph.sh)
    Oct 20 02:11:00 radist04ka /usr/sbin/cron[17834]: (root) CMD (/usr/local/etc/rrd/base/cpu_update.sh)
    Oct 20 18:42:00 radist04ka syslogd: restart

    Oct 20 18:42:00 radist04ka syslogd: kernel boot file is /boot/kernel/kernel
    Oct 20 18:42:00 radist04ka kernel: Copyright (c) 1992-2013 The FreeBSD Project.
    Oct 20 18:42:00 radist04ka kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    Oct 20 18:42:00 radist04ka kernel: The Regents of the University of California. All rights reserved.
    Oct 20 18:42:00 radist04ka kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
    Oct 20 18:42:00 radist04ka kernel: FreeBSD 9.1-STABLE #0: Sun Jul 28 15:55:49 FET 2013
    Oct 20 18:42:00 radist04ka kernel: radist@radist04ka.localdoiman:/sys/i386/compile/RADIST.28.07.2013 i386
    Oct 20 18:42:00 radist04ka kernel: gcc version 4.2.1 20070831 patched [FreeBSD]
    Oct 20 18:42:00 radist04ka kernel: CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (3147.83-MHz 686-class CPU)
    Oct 20 18:42:00 radist04ka kernel: Origin = «GenuineIntel» Id = 0xf34 Family = 0xf Model = 0x3 Stepping = 4

    console.log

    Oct 19 23:15:59 radist04ka kernel: Oct 19 23:15:59 radist04ka dhcpd: Dynamic and static leases present for 192.168.7.2.
    Oct 19 23:15:59 radist04ka kernel: Oct 19 23:15:59 radist04ka dhcpd: Remove host declaration Loner-XP or remove 192.168.7.2
    Oct 19 23:15:59 radist04ka kernel: Oct 19 23:15:59 radist04ka dhcpd: from the dynamic address pool for 192.168.7.0/24
    Oct 19 23:45:59 radist04ka kernel: Oct 19 23:45:59 radist04ka dhcpd: Dynamic and static leases present for 192.168.7.2.
    Oct 19 23:45:59 radist04ka kernel: Oct 19 23:45:59 radist04ka dhcpd: Remove host declaration Loner-XP or remove 192.168.7.2
    Oct 19 23:45:59 radist04ka kernel: Oct 19 23:45:59 radist04ka dhcpd: from the dynamic address pool for 192.168.7.0/24
    Oct 20 00:16:00 radist04ka kernel: Oct 20 00:16:00 radist04ka dhcpd: Dynamic and static leases present for 192.168.7.2.
    Oct 20 00:16:00 radist04ka kernel: Oct 20 00:16:00 radist04ka dhcpd: Remove host declaration Loner-XP or remove 192.168.7.2
    Oct 20 00:16:00 radist04ka kernel: Oct 20 00:16:00 radist04ka dhcpd: from the dynamic address pool for 192.168.7.0/24
    Oct 20 18:42:00 radist04ka kernel: Setting hostuuid: 00020003-0004-0005-0006-000700080009.

    Oct 20 18:42:00 radist04ka kernel: Setting hostid: 0x81f4ec68.
    Oct 20 18:42:00 radist04ka kernel: Entropy harvesting: interrupts ethernet point_to_point kickstart.
    Oct 20 18:42:00 radist04ka kernel: Starting file system checks:
    Oct 20 18:42:00 radist04ka kernel: ** SU+J Recovering /dev/ada2p2
    Oct 20 18:42:00 radist04ka kernel: ** Reading 33554432 byte journal from inode 4.
    Oct 20 18:42:00 radist04ka kernel: ** Building recovery table.

    dmesg.yesterday

    (ada1:ata3:0:0:0): READ_DMA48. ACB: 25 00 bf 10 ff 40 73 00 00 00 00 01
    (ada1:ata3:0:0:0): CAM status: ATA Status Error
    (ada1:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
    (ada1:ata3:0:0:0): RES: 51 84 bf 10 ff 73 73 00 00 ef 00
    (ada1:ata3:0:0:0): Retrying command
    (ada1:ata3:0:0:0): READ_DMA48. ACB: 25 00 3f 46 a4 40 3c 00 00 00 00 01
    (ada1:ata3:0:0:0): CAM status: ATA Status Error
    (ada1:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
    (ada1:ata3:0:0:0): RES: 51 84 3f 46 a4 3c 3c 00 00 4f 00
    (ada1:ata3:0:0:0): Retrying command
    (ada1:ata3:0:0:0): READ_DMA48. ACB: 25 00 ff 5c c3 40 2b 00 00 00 00 01
    (ada1:ata3:0:0:0): CAM status: ATA Status Error
    (ada1:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
    (ada1:ata3:0:0:0): RES: 51 84 ff 5c c3 2b 2b 00 00 1f 00
    (ada1:ata3:0:0:0): Retrying command
    rl0: link state changed to DOWN
    (ada1:ata3:0:0:0): READ_DMA48. ACB: 25 00 7f c2 7c 40 2f 00 00 00 a0 00
    (ada1:ata3:0:0:0): CAM status: ATA Status Error
    (ada1:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 84 (ICRC ABRT )
    (ada1:ata3:0:0:0): RES: 51 84 7f c2 7c 2f 2f 00 00 0f 00
    (ada1:ata3:0:0:0): Retrying command
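
    To compare the block addresses in lines like these, the ACB dump can be unpacked into an LBA and a sector count. This is a sketch under an assumption: that the twelve bytes are printed in the order command, features, LBA low/mid/high, device, LBA low/mid/high (extended), features (extended), sector count, sector count (extended), which is my reading of how FreeBSD's CAM ATA layer formats them. The function name is invented for the example. For NCQ commands (READ/WRITE_FPDMA_QUEUED) the transfer length is carried in the features bytes instead, which this sketch does not handle.

```python
def parse_acb(acb_hex):
    """Unpack a 12-byte ACB dump such as '25 00 bf 10 ff 40 73 00 00 00 00 01'.

    The byte order is an assumption (see the note above the code).
    """
    b = [int(x, 16) for x in acb_hex.split()]
    if len(b) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(b))
    # The 48-bit LBA is split into low/mid/high and their extended counterparts.
    lba = b[2] | b[3] << 8 | b[4] << 16 | b[6] << 24 | b[7] << 32 | b[8] << 40
    # 16-bit sector count; a value of 0 has a special meaning in ATA and is
    # returned unchanged here.
    sectors = b[10] | b[11] << 8
    return {"command": b[0], "lba": lba, "sectors": sectors}


if __name__ == "__main__":
    for acb in (
        "25 00 bf 10 ff 40 73 00 00 00 00 01",
        "25 00 3f 46 a4 40 3c 00 00 00 00 01",
        "25 00 ff 5c c3 40 2b 00 00 00 00 01",
        "25 00 7f c2 7c 40 2f 00 00 00 a0 00",
    ):
        info = parse_acb(acb)
        print(hex(info["command"]), hex(info["lba"]), info["sectors"])
```

    If that byte order holds, the four failing reads start at widely separated addresses rather than at one repeating block.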

    Only cutting the power helps… the reset button is disabled in the kernel :)

    Hm… is my disk dying?!


    Dominator » 2013-10-20 19:27:41

    QweЯty wrote: Hm… is my disk dying?!

    Possibly, and CAM is finishing it off with retries. On my production server (FreeBSD 9.1 amd64) I cut CAM out right away.

    Денис wrote: A foolproof check is nslookup: it cannot find the server.

    /etc/rc.d/netif restart


    QweЯty » 2013-10-20 20:12:41

    On my production server (FreeBSD 9.1 amd64) I cut CAM out right away.

    Um… what is it, what does that give you, and how do you cut it out…


    QweЯty » 2013-10-20 22:35:27

    ———
    #
    # Areca 11xx and 12xx series of SATA II RAID controllers.
    # CAM is required.
    #
    device arcmsr # Areca SATA II RAID

    ———————
    #
    # 3ware 9000 series PATA/SATA RAID controller driver and options.
    # The driver is implemented as a SIM, and so, needs the CAM infrastructure.
    #
    options TWA_DEBUG # 0-10; 10 prints the most messages.
    options TWA_FLASH_FIRMWARE # firmware image bundled when defined.
    device twa # 3ware 9000 series PATA/SATA RAID
    ————-
    #
    # Adaptec FSA RAID controllers, including integrated DELL controllers,
    # the Dell PERC 2/QC and the HP NetRAID-4M
    device aac
    device aacp # SCSI Passthrough interface (optional, CAM required)

    # The ‘asr’ driver provides support for current DPT/Adaptec SCSI RAID
    # controllers (SmartRAID V and VI and later).
    # These controllers require the CAM infrastructure.
    #
    device asr

    Those are the four places where CAM comes up in NOTES…
    but what is responsible for what…

    That said, GENERIC does have:

    cat GENERIC | grep CAM
    options ATA_CAM # Handle legacy controllers with CAM
    device ctl # CAM Target Layer
    device aacp # SCSI passthrough for aac (requires CAM)



    snorlov » 2013-10-21 8:19:14

    QweЯty wrote:

    On my production server (FreeBSD 9.1 amd64) I cut CAM out right away.

    Um… what is it, what does that give you, and how do you cut it out…

    Try replacing the cable as well…


    QweЯty » 2013-10-21 22:13:28

    I have replaced the cable :(
    Many times…


    QweЯty » 2013-10-29 19:20:50

    Anyway, that disk is not the cause…
    From 2013-10-21 23:13:28, give or take a couple of hours, until today the machine has been running without the problem disk.

    The logs are quiet too…


    Dominator » 2014-02-15 14:56:30

    QweЯty wrote:

    On my production server (FreeBSD 9.1 amd64) I cut CAM out right away.

    Um… what is it, what does that give you, and how do you cut it out…

    In the kernel config you remove every mention of CAM and in its place add something like

    I don't remember exactly; take a look at the default FreeBSD 8.0 config.


    guest » 2014-02-15 17:32:16

    Dominator wrote:

    QweЯty wrote:

    On my production server (FreeBSD 9.1 amd64) I cut CAM out right away.

    Um… what is it, what does that give you, and how do you cut it out…

    In the kernel config you remove every mention of CAM and in its place add something like

    I don't remember exactly; take a look at the default FreeBSD 8.0 config.

    Nonsense, he «cut out» CAM…
    «CAM finishes the disk off with retries»: apparently the only word understood in the log was «retry».

    CAM is the Common Access Method. It was originally designed for SCSI devices, but it is a general-purpose
    access method; the new ATA driver was rewritten on top of the CAM interface.


    Dominator » 2014-02-16 7:10:40

    guest wrote:
    Nonsense, he «cut out» CAM…

    I don't know about anyone else, but after doing that I had far less junk in my logs and I was able to back up the faulty disk. So before making loud statements, I recommend trying this on a test machine.

    P.S. guest, I see only your emotions, not arguments.


    guest » 2014-02-16 12:31:59

    Dominator wrote:

    guest wrote:
    Nonsense, he «cut out» CAM…

    I don't know about anyone else, but after doing that I had far less junk in my logs and I was able to back up the faulty disk. So before making loud statements, I recommend trying this on a test machine.

    P.S. guest, I see only your emotions, not arguments.

    What emotions or arguments can there be in response to nonsense?

    Just to get the general picture, read up on what CAM is and think about how the old and new ATA drivers could
    affect HDDs, damage them, and write less to the logs.


    Dominator » 2014-02-16 19:35:43

    guest wrote: Just to get the general picture, read up on what CAM is and think about how the old and new ATA drivers could
    affect HDDs, damage them, and write less to the logs.

    However nicely all of that is written, without testing under production conditions it means nothing. I am not used to trusting what is written until I have checked it myself. I have never seen a console flooded as badly as it was with CAM.

    After rolling back to the old driver, the one from 8.x, there was a single error message on the console, followed by complaints from mc, which I was using to rescue whatever could be rescued. Most importantly, everything was fast, unlike CAM, which stalled for 2-3 minutes after every bad block.

    P.S. I had not sworn that much since 2009, when I yanked out a USB stick that was still mounted on 6.2 while installing programs from ports and damaged their configuration. So whatever anyone says, I am no longer friends with CAM: it goes straight under the scalpel. Although I admit it has some handy perks.


    The other day I got hold of three 3 TB WD Caviar Green disks. I was overjoyed :) But disappointment followed the joy. So, a little background.

    The disks were bought to set up file storage, however pompous that may sound :)

    The box is used as a media server. All media content is played through a dune hd tv101 set-top box over SMB.

    Naturally, I decided to use ZFS to build a RAID5 array: RAID0 is rather silly for this kind of job, RAID1 does not work out with three disks, and buying a fourth makes no sense for now.

    Without a second thought, I typed:

    # zpool create storage raidz ada1 ada2 ada3

    Then I created the directories and happily started moving all sorts of stuff from the system disk to the new array.

    After a few minutes of copying, I saw this in /var/log/messages:

    (aprobe0:ahcich1:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    (aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error
    (aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
    (aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
    (aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted
    (aprobe1:ahcich1:0:15:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    (aprobe1:ahcich1:0:15:0): CAM status: ATA Status Error
    (aprobe1:ahcich1:0:15:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
    (aprobe1:ahcich1:0:15:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
    (aprobe1:ahcich1:0:15:0): Error 5, Retries exhausted
    (aprobe0:ahcich1:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    (aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error
    (aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
    (aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
    (aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted

    Here is what camcontrol devlist showed:

    beast# camcontrol devlist
    <SAMSUNG HD160JJ ZM100-47>         at scbus0 target 0 lun 0 (pass0,ada0)
    <WDC WD30EZRX-00MMMB0 80.00A80>    at scbus2 target 0 lun 0 (pass2,ada2)
    <WDC WD30EZRX-00MMMB0 80.00A80>    at scbus3 target 0 lun 0 (pass3,ada3)
    <Generic- SD/MMC 1.00>             at scbus6 target 0 lun 0 (da0,pass4)
    <Generic- Compact Flash 1.01>      at scbus6 target 0 lun 1 (da1,pass5)
    <Generic- SM/xD-Picture 1.02>      at scbus6 target 0 lun 2 (da2,pass6)
    <Generic- MS/MS-Pro 1.03>          at scbus6 target 0 lun 3 (da3,pass7)

    Hmm, the first of the new disks (ada1 on scbus1) has dropped out.

    I ran camcontrol with more verbose output:

    beast# camcontrol devlist -v
    scbus0 on ahcich0 bus 0:
    <SAMSUNG HD160JJ ZM100-47>         at scbus0 target 0 lun 0 (pass0,ada0)
    <>                                 at scbus0 target -1 lun -1 ()
    scbus1 on ahcich1 bus 0:
    <WDC WD30EZRX-00MMMB0 80.00A80>    at scbus1 target 0 lun 0 (pass1)
    <>                                 at scbus1 target -1 lun -1 ()
    scbus2 on ahcich2 bus 0:
    <WDC WD30EZRX-00MMMB0 80.00A80>    at scbus2 target 0 lun 0 (pass2,ada2)
    <>                                 at scbus2 target -1 lun -1 ()
    scbus3 on ahcich3 bus 0:
    <WDC WD30EZRX-00MMMB0 80.00A80>    at scbus3 target 0 lun 0 (pass3,ada3)
    <>                                 at scbus3 target -1 lun -1 ()
    scbus4 on ahcich4 bus 0:
    <>                                 at scbus4 target -1 lun -1 ()
    scbus5 on ahcich5 bus 0:
    <>                                 at scbus5 target -1 lun -1 ()
    scbus6 on umass-sim0 bus 0:
    <Generic- SD/MMC 1.00>             at scbus6 target 0 lun 0 (da0,pass4)
    <Generic- Compact Flash 1.01>      at scbus6 target 0 lun 1 (da1,pass5)
    <Generic- SM/xD-Picture 1.02>      at scbus6 target 0 lun 2 (da2,pass6)
    <Generic- MS/MS-Pro 1.03>          at scbus6 target 0 lun 3 (da3,pass7)
    scbus-1 on xpt0 bus 0:
    <>                                 at scbus-1 target -1 lun -1 (xpt0)

    Meanwhile, dmesg kept showing that the disk was supposedly in place and visible to the system:

    beast# dmesg -a | grep ada
    ada0: <SAMSUNG HD160JJ ZM100-47> ATA-7 SATA 2.x device
    ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
    ada0: Command Queueing enabled
    ada0: 152627MB (312581808 512 byte sectors: 16H 63S/T 16383C)
    ada0: Previously was known as ad4
    ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
    ada1: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
    ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
    ada1: Command Queueing enabled
    ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
    ada1: Previously was known as ad6
    ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
    ada2: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
    ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
    ada2: Command Queueing enabled
    ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
    ada2: Previously was known as ad8
    ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
    ada3: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
    ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
    ada3: Command Queueing enabled
    ada3: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
    ada3: Previously was known as ad10
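The mismatch between the two views (dmesg lists ada1, while the verbose camcontrol listing shows only pass1 on scbus1) can be spotted mechanically. The following is a minimal sketch, assuming the listing format shown above; the function name unattached is made up for illustration and is not part of camcontrol.

```shell
# unattached: read `camcontrol devlist -v` output on stdin and print the
# device lines that have only a passN node and no adaN/daN peripheral.
# Hypothetical helper; it relies on the "(pass1)" vs "(pass2,ada2)" layout.
unattached() {
    awk -F'[()]' '/ at scbus[0-9]/ && $2 ~ /^pass[0-9]+$/ { print $0 }'
}

# Typical use, run as root on the affected machine:
#   camcontrol devlist -v | unattached
```

On the listing above this would single out the WD disk on scbus1, the one ZFS later reports as REMOVED.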

    The output of pciconf -lv was as follows:

    beast# pciconf -lv
    ahci0@pci0:0:31:2:      class=0x010601 card=0x82d41043 chip=0x3a228086 rev=0x00 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = '82801JI (ICH10 Family) SATA AHCI Controller'
        class      = mass storage
        subclass   = SATA
        bar   [10] = type I/O Port, range 32, base 0xac00, size  8, enabled
        bar   [14] = type I/O Port, range 32, base 0xa880, size  4, enabled
        bar   [18] = type I/O Port, range 32, base 0xa800, size  8, enabled
        bar   [1c] = type I/O Port, range 32, base 0xa480, size  4, enabled
        bar   [20] = type I/O Port, range 32, base 0xa400, size 32, enabled
        bar   [24] = type Memory, range 32, base 0xf9ffc000, size 2048, enabled
        cap 05[80] = MSI supports 16 messages enabled with 1 message
        cap 01[70] = powerspec 3  supports D0 D3  current D0
        cap 12[a8] = SATA Index-Data Pair
        cap 13[b0] = PCI Advanced Features: FLR TP

    AHCI mode had been enabled in the BIOS from the start. One more thing worth noting:

    In FreeBSD 9,

    device ahci

    is built into the GENERIC kernel, so there is no need to add these options to /boot/loader.conf:

    ahci_load="YES"
    ahci_enable="YES"

    as earlier releases (FreeBSD 8.x) required.

    The pool status looked like this:

    # zpool status
    pool: storage
    state: DEGRADED
    status: One or more devices has been removed by the administrator.
            Sufficient replicas exist for the pool to continue functioning in a
            degraded state.
    action: Online the device using 'zpool online' or replace the device with
            'zpool replace'.
    scan: resilvered 36,5M in 0h2m with 0 errors on Wed Oct 24 15:11:44 2012
    config:
            NAME                     STATE     READ WRITE CKSUM
            storage                  DEGRADED     0     0     0
              raidz1-0               DEGRADED     0     0     0
                1180438976994044890  REMOVED      0     0     0  was /dev/ada1
                ada2                 ONLINE       0     0     0
                ada3                 ONLINE       0     0     0
    errors: No known data errors

    After much thought and scouring the internet, the solution was suggested to me on the official FreeBSD forum.

    So what was the problem, and how do you avoid the headaches when using terabyte-class disks? The solution applies to WD drives and to others as well.

    These disks use Advanced Format (AF): the physical sector size is 4K, but the disk reports 512-byte sectors to the system. They lie, the rascals.

    Creating a ZFS pool from raw disks is not recommended here, because ZFS may then issue I/O that is not aligned to the 4K physical sectors, which can hurt performance badly.

    So the disks should be partitioned, with the partitions starting on a 1 MB boundary.

    The next problem is the ZFS ashift value. When I created the pool from raw disks, it was:

    beast# zdb | grep ashift
                ashift: 9

    This means ZFS will issue 512-byte I/O, whereas with ashift: 12 it issues 4K I/O, which is what these disks like.
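The ashift value is an exponent: the smallest I/O size ZFS uses on a vdev is 2 raised to that power, in bytes. A minimal sketch of the arithmetic; ashift_to_bytes is a hypothetical helper name, not a ZFS command.

```shell
# ashift_to_bytes: convert an ashift value to the I/O size in bytes (2^ashift).
# Hypothetical helper for illustration only.
ashift_to_bytes() {
    echo $((1 << $1))
}

ashift_to_bytes 9    # prints 512
ashift_to_bytes 12   # prints 4096
```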

    If you have important data, make a backup. I had none, so I simply destroyed the pool and set about fixing my mistakes.

    Create the partitions (shown for ada1; repeat for ada2 and ada3 with the labels disk2 and disk3):

    # gpart create -s gpt ada1
    # gpart add -t freebsd-zfs -b 2048 -a 4k -l disk1 ada1
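The same two gpart commands have to be issued for each of the three disks. Below is a minimal sketch of that loop, written for /bin/sh rather than the csh that root often uses on FreeBSD. It only prints the commands so they can be reviewed first; the DISKS variable and the print_gpart_cmds name are assumptions made for illustration.

```shell
# Disk numbers to partition (ada1, ada2, ada3 in the article).
DISKS="1 2 3"

# print_gpart_cmds: print, but do not run, the partitioning commands
# for every disk listed in DISKS. Hypothetical helper.
print_gpart_cmds() {
    for n in $DISKS; do
        echo "gpart create -s gpt ada$n"
        echo "gpart add -t freebsd-zfs -b 2048 -a 4k -l disk$n ada$n"
    done
}

print_gpart_cmds

# Alignment check: the partition starts at sector 2048, i.e. 2048 * 512
# bytes = 1 MiB, which is a whole multiple of the 4096-byte physical sector.
echo $((2048 * 512 % 4096))   # prints 0
```

Once the printed commands look right, they can be executed as root with `print_gpart_cmds | sh`. Keep in mind that they overwrite the partition tables of the listed disks.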

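    Create the pool with ashift: 12. The temporary gnop device reports 4096-byte sectors for disk1, so ZFS picks ashift 12 for the whole vdev; after the export it is destroyed and the pool is imported from the plain GPT labels: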

    # gnop create -S 4096 /dev/gpt/disk1
    # zpool create storage raidz gpt/disk1.nop gpt/disk2 gpt/disk3
    # zpool export storage
    # gnop destroy /dev/gpt/disk1.nop
    # zpool import -d /dev/gpt storage
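After the import it is worth confirming that the pool really ended up with ashift 12, using the same zdb output that was grepped earlier. A minimal sketch, assuming zdb prints lines of the form "ashift: N" as shown above; check_ashift is a hypothetical helper, not part of ZFS.

```shell
# check_ashift: read zdb output on stdin; exit status 0 only if at least one
# ashift line is present and every one of them reports 12.
check_ashift() {
    awk '
        $1 == "ashift:" { seen = 1; if ($2 != 12) bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Typical use, run as root:
#   zdb | check_ashift && echo "ashift is 12" || echo "ashift is not 12"
```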

    So much for the pool. Now for the disks themselves. They have an unpleasant habit: the heads park after only a few seconds of idle time. You can read about this behavior online; suffice it to say that it is not good for disks that belong to a ZFS array. They wear out sooner.
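One way to see how often the heads actually park is to watch the Load_Cycle_Count SMART attribute grow over time. The sketch below assumes that smartmontools is installed (it is not part of the base system) and that `smartctl -A` prints its usual ten-column attribute table; load_cycles is a hypothetical helper name.

```shell
# load_cycles: read `smartctl -A` output on stdin and print the raw value
# (tenth column) of the Load_Cycle_Count attribute.
load_cycles() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}

# Typical use, run as root; compare two readings taken some hours apart:
#   smartctl -A /dev/ada1 | load_cycles
```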

    There is a way out. WD has a tool called wdidle3. You can download it from the manufacturer's official site, or get the Ultimate Boot CD (version 5.11, if I remember correctly), which includes the utility.


    In my case none of this helped. It turned out that the disk itself was faulty, and the seller replaced it under warranty.

    In any case, before taking a suspect disk to a service center, test it with different SATA cables and ports.

    Make sure the controller on the motherboard supports disks like these and that the BIOS is up to date.
