Critical target error dev sda sector

After upgrade to 3.12.28 (de69b134dc6e4066fe70db29816d57895dffd9b9) my raspberry started to show kernel errors while trying to write to USB HD, causing the filesystem to go into read-only mode. [ 3...

Comments

popcornmix added a commit to raspberrypi/firmware that referenced this issue on Dec 13, 2014
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue on Dec 13, 2014
bosscyril referenced this issue in Hexxeh/rpi-firmware on Apr 14, 2015
neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue on Feb 27, 2017
popcornmix pushed a commit that referenced this issue on Nov 11, 2019:

Geneve implementation changed mlx5 tc to use a direct pointer to tunnel_key
action's internal struct ip_tunnel_info instance. However, this leads to a
use-after-free error when the initial filter that caused creation of a new encap
entry is deleted or when the tunnel_key action is manually overwritten through
the action API. Moreover, with the recent TC offloads API unlocking change, struct
flow_action_entry->tunnel points to a temporary copy of tunnel info that is
deallocated after the filter is offloaded to hardware, which causes the bug to
reproduce every time a new filter is attached to an existing encap entry, with
the following KASAN bug:

[  314.885555] ==================================================================
[  314.886641] BUG: KASAN: use-after-free in memcmp+0x2c/0x60
[  314.886864] Read of size 1 at addr ffff88886c746280 by task tc/2682

[  314.887179] CPU: 22 PID: 2682 Comm: tc Not tainted 5.3.0-rc7+ #703
[  314.887188] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  314.887195] Call Trace:
[  314.887215]  dump_stack+0x9a/0xf0
[  314.887236]  print_address_description+0x67/0x323
[  314.887248]  ? memcmp+0x2c/0x60
[  314.887257]  ? memcmp+0x2c/0x60
[  314.887272]  __kasan_report.cold+0x1a/0x3d
[  314.887474]  ? __mlx5e_tc_del_fdb_peer_flow+0x100/0x1b0 [mlx5_core]
[  314.887484]  ? memcmp+0x2c/0x60
[  314.887509]  kasan_report+0xe/0x12
[  314.887521]  memcmp+0x2c/0x60
[  314.887662]  mlx5e_tc_add_fdb_flow+0x51b/0xbe0 [mlx5_core]
[  314.887838]  ? mlx5e_encap_take+0x110/0x110 [mlx5_core]
[  314.887902]  ? lockdep_init_map+0x87/0x2c0
[  314.887924]  ? __init_waitqueue_head+0x4f/0x60
[  314.888062]  ? mlx5e_alloc_flow.isra.0+0x18c/0x1c0 [mlx5_core]
[  314.888207]  __mlx5e_add_fdb_flow+0x2d7/0x440 [mlx5_core]
[  314.888359]  ? mlx5e_tc_update_neigh_used_value+0x6f0/0x6f0 [mlx5_core]
[  314.888374]  ? match_held_lock+0x2e/0x240
[  314.888537]  mlx5e_configure_flower+0x830/0x16a0 [mlx5_core]
[  314.888702]  ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
[  314.888713]  ? down_read+0x118/0x2c0
[  314.888728]  ? down_read_killable+0x300/0x300
[  314.888882]  ? mlx5e_rep_get_ethtool_stats+0x180/0x180 [mlx5_core]
[  314.888899]  tc_setup_cb_add+0x127/0x270
[  314.888937]  fl_hw_replace_filter+0x2ac/0x380 [cls_flower]
[  314.888976]  ? fl_hw_destroy_filter+0x1b0/0x1b0 [cls_flower]
[  314.888990]  ? fl_change+0xbcf/0x27ef [cls_flower]
[  314.889030]  ? fl_change+0xa57/0x27ef [cls_flower]
[  314.889069]  fl_change+0x16bd/0x27ef [cls_flower]
[  314.889135]  ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[  314.889167]  ? __radix_tree_lookup+0xa4/0x130
[  314.889200]  ? fl_get+0x169/0x240 [cls_flower]
[  314.889218]  ? fl_walk+0x230/0x230 [cls_flower]
[  314.889249]  tc_new_tfilter+0x5e1/0xd40
[  314.889281]  ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[  314.889309]  ? tc_del_tfilter+0xa30/0xa30
[  314.889335]  ? __lock_acquire+0x5b5/0x2460
[  314.889378]  ? find_held_lock+0x85/0xa0
[  314.889442]  ? tc_del_tfilter+0xa30/0xa30
[  314.889465]  rtnetlink_rcv_msg+0x4ab/0x5f0
[  314.889488]  ? rtnl_dellink+0x490/0x490
[  314.889518]  ? lockdep_hardirqs_on+0x260/0x260
[  314.889538]  ? netlink_deliver_tap+0xab/0x5a0
[  314.889550]  ? match_held_lock+0x1b/0x240
[  314.889575]  netlink_rcv_skb+0xd0/0x200
[  314.889588]  ? rtnl_dellink+0x490/0x490
[  314.889605]  ? netlink_ack+0x440/0x440
[  314.889635]  ? netlink_deliver_tap+0x161/0x5a0
[  314.889648]  ? lock_downgrade+0x360/0x360
[  314.889657]  ? lock_acquire+0xe5/0x210
[  314.889686]  netlink_unicast+0x296/0x350
[  314.889707]  ? netlink_attachskb+0x390/0x390
[  314.889726]  ? _copy_from_iter_full+0xe0/0x3a0
[  314.889738]  ? __virt_addr_valid+0xbb/0x130
[  314.889771]  netlink_sendmsg+0x394/0x600
[  314.889800]  ? netlink_unicast+0x350/0x350
[  314.889817]  ? move_addr_to_kernel.part.0+0x90/0x90
[  314.889852]  ? netlink_unicast+0x350/0x350
[  314.889872]  sock_sendmsg+0x96/0xa0
[  314.889891]  ___sys_sendmsg+0x482/0x520
[  314.889919]  ? copy_msghdr_from_user+0x250/0x250
[  314.889930]  ? __fput+0x1fa/0x390
[  314.889941]  ? task_work_run+0xb7/0xf0
[  314.889957]  ? exit_to_usermode_loop+0x117/0x120
[  314.889972]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  314.889982]  ? do_syscall_64+0x74/0xe0
[  314.889992]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  314.890012]  ? mark_lock+0xac/0x9a0
[  314.890028]  ? __lock_acquire+0x5b5/0x2460
[  314.890053]  ? mark_lock+0xac/0x9a0
[  314.890083]  ? __lock_acquire+0x5b5/0x2460
[  314.890112]  ? match_held_lock+0x1b/0x240
[  314.890144]  ? __fget_light+0xa1/0xf0
[  314.890166]  ? sockfd_lookup_light+0x91/0xb0
[  314.890187]  __sys_sendmsg+0xba/0x130
[  314.890201]  ? __sys_sendmsg_sock+0xb0/0xb0
[  314.890225]  ? __blkcg_punt_bio_submit+0xd0/0xd0
[  314.890264]  ? lockdep_hardirqs_off+0xbe/0x100
[  314.890274]  ? mark_held_locks+0x24/0x90
[  314.890286]  ? do_syscall_64+0x1e/0xe0
[  314.890308]  do_syscall_64+0x74/0xe0
[  314.890325]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  314.890336] RIP: 0033:0x7f00ca33d7b8
[  314.890348] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
[  314.890356] RSP: 002b:00007ffea2983928 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  314.890369] RAX: ffffffffffffffda RBX: 000000005d777d5b RCX: 00007f00ca33d7b8
[  314.890377] RDX: 0000000000000000 RSI: 00007ffea2983990 RDI: 0000000000000003
[  314.890384] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[  314.890392] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[  314.890400] R13: 000000000047f640 R14: 00007ffea2987b58 R15: 0000000000000021

[  314.890529] Allocated by task 2687:
[  314.890684]  save_stack+0x1b/0x80
[  314.890694]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[  314.890705]  __kmalloc_track_caller+0x102/0x340
[  314.890721]  kmemdup+0x1d/0x40
[  314.890730]  tc_setup_flow_action+0x731/0x2c27
[  314.890743]  fl_hw_replace_filter+0x23b/0x380 [cls_flower]
[  314.890756]  fl_change+0x16bd/0x27ef [cls_flower]
[  314.890765]  tc_new_tfilter+0x5e1/0xd40
[  314.890776]  rtnetlink_rcv_msg+0x4ab/0x5f0
[  314.890786]  netlink_rcv_skb+0xd0/0x200
[  314.890796]  netlink_unicast+0x296/0x350
[  314.890805]  netlink_sendmsg+0x394/0x600
[  314.890815]  sock_sendmsg+0x96/0xa0
[  314.890825]  ___sys_sendmsg+0x482/0x520
[  314.890834]  __sys_sendmsg+0xba/0x130
[  314.890844]  do_syscall_64+0x74/0xe0
[  314.890854]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  314.890937] Freed by task 2687:
[  314.891076]  save_stack+0x1b/0x80
[  314.891086]  __kasan_slab_free+0x12c/0x170
[  314.891095]  kfree+0xeb/0x2f0
[  314.891106]  tc_cleanup_flow_action+0x69/0xa0
[  314.891119]  fl_hw_replace_filter+0x2c5/0x380 [cls_flower]
[  314.891132]  fl_change+0x16bd/0x27ef [cls_flower]
[  314.891140]  tc_new_tfilter+0x5e1/0xd40
[  314.891151]  rtnetlink_rcv_msg+0x4ab/0x5f0
[  314.891161]  netlink_rcv_skb+0xd0/0x200
[  314.891170]  netlink_unicast+0x296/0x350
[  314.891180]  netlink_sendmsg+0x394/0x600
[  314.891190]  sock_sendmsg+0x96/0xa0
[  314.891200]  ___sys_sendmsg+0x482/0x520
[  314.891208]  __sys_sendmsg+0xba/0x130
[  314.891218]  do_syscall_64+0x74/0xe0
[  314.891228]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  314.891315] The buggy address belongs to the object at ffff88886c746280
                which belongs to the cache kmalloc-96 of size 96
[  314.891762] The buggy address is located 0 bytes inside of
                96-byte region [ffff88886c746280, ffff88886c7462e0)
[  314.892196] The buggy address belongs to the page:
[  314.892387] page:ffffea0021b1d180 refcount:1 mapcount:0 mapping:ffff88835d00ef80 index:0x0
[  314.892398] flags: 0x57ffffc0000200(slab)
[  314.892413] raw: 0057ffffc0000200 ffffea00219e0340 0000000800000008 ffff88835d00ef80
[  314.892423] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[  314.892430] page dumped because: kasan: bad access detected

[  314.892515] Memory state around the buggy address:
[  314.892707]  ffff88886c746180: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  314.892976]  ffff88886c746200: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  314.893251] >ffff88886c746280: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  314.893522]                    ^
[  314.893657]  ffff88886c746300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  314.893924]  ffff88886c746380: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
[  314.894189] ==================================================================

Fix the issue by duplicating tunnel info into per-encap copy that is
deallocated with encap structure. Also, duplicate tunnel info in flow parse
attribute to support cases when flow might be attached asynchronously.

Fixes: 1f6da30 ("net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

The first error you reported:

ata1:00: status: { DRDY ERR }
ata1.00: error {UNC }
ata1:00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1:00: BMDMA stat 0x25
ata1:00: failed command: READ DMA

says that a READ DMA ATA command to a disk on ATA port 1 failed (status includes ERR for error). That port is most likely the hard disk, and the error points toward the drive having problems. The DMA part can likely be ignored; DMA is Direct Memory Access which is the dominant transfer mode these days, and if you were having RAM or RAM bus problems to the degree that you were hitting something like that repeatedly, you’d likely be seeing a ton more errors if the system was able to function at all.
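For reference, the names the kernel prints in braces are the set bits of the ATA Status and Error registers. Here is a minimal Python sketch of that decoding. The bit masks are the standard ATA assignments; the raw values 0x41 and 0x40 are assumptions chosen to correspond to { DRDY ERR } and { UNC }, because the log above shows only the decoded names, not the register contents.

```python
# Standard ATA register bit assignments (only the commonly reported bits).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True) if value & mask]


if __name__ == "__main__":
    # Assumed raw register values matching the flags in the log excerpt.
    print("status:", decode_bits(0x41, STATUS_BITS))
    print("error: ", decode_bits(0x40, ERROR_BITS))
```

UNC stands for uncorrectable data: the drive itself could not return the contents of the sector, which is consistent with reading this as a drive problem rather than a cabling or RAM problem.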

The second error:

end_request: critical target error, dev sda, sector 32839936
EXT4_fs error: (device sda5): ext4_find_entry:935: inode #393217: comm init: reading directory lblock 0
INIT: No inittab file found

says there is some problem on /dev/sda, sector 32839936, which with 512-byte sectors puts us physically toward the end of the /dev/sda5 partition, which adds up with device sda5 as reported by the file system driver. The error reported by init together with the file system driver’s error details points toward a problem with the file system causing /etc/inittab to be unavailable or (less likely) unreadable. This would mean that either the root directory, the /etc directory, or the /etc/inittab file entry are somehow involved in the corruption. Given the inode number, I’d take a shot at /etc/inittab specifically being the culprit, until proven wrong.
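The sector arithmetic is easy to check. This sketch assumes 512-byte logical sectors, as the paragraph above does. The partition start and length passed to the second helper are hypothetical parameters that you would read from the partition table (for example from /sys/block/sda/sda5/start and /sys/block/sda/sda5/size), since the partition layout is not shown in the question.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def sector_to_offset(sector, sector_size=SECTOR_SIZE):
    """Byte offset of a sector, counted from the start of the disk."""
    return sector * sector_size


def position_in_partition(sector, part_start, part_sectors):
    """Fraction (0.0 to 1.0) of the way through a partition, or None if outside it."""
    if not part_start <= sector < part_start + part_sectors:
        return None
    return (sector - part_start) / part_sectors


if __name__ == "__main__":
    offset = sector_to_offset(32839936)
    print(offset, "bytes, about", round(offset / 2**30, 2), "GiB into /dev/sda")
```

A result from position_in_partition close to 1.0 would confirm that the failing sector sits near the end of the partition.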

You write (my emphasis):

Suspecting a HDD crash, I took it out and used in another PC as an external USB HDD drive and I was able to mount & see all partitions and files within. So I assume Disc is OK.

I would say that your assumption is unfounded. The disk is obviously having some problem; with any luck, it’ll be easy to fix.

The first thing I would do in your situation is to refresh my backup of everything that is on that disk. Make sure that you do not overwrite or delete anything from your most recent backup, as there is certainly a possibility that you will need it. Perhaps the best option is to make a fresh backup onto a new (or at least not previously used for your own backups) drive of everything that you are able to access. Expect some I/O errors on the source while making that copy.

Second comes attempting recovery. With any luck, given the errors, this is a single-sector or few-sectors problem which has caused a small amount of file system corruption, in which case e2fsck should be able to repair most of the damage. Some of your files are likely gone, but with some luck, you might be able to find them in /lost+found under the file system’s mount root (meaning for example /data/lost+found if you mount /dev/sda5 on /data) after having e2fsck do what it can. Otherwise, do a comparison against your most recent backup from before the problems started, and restore relevant files from the backup. (Did I mention backups are useful if bad things ever happen, as they inevitably do?)

Third comes the question of whether you can trust the drive for future use. A few bad sectors don’t have to be catastrophic from the drive’s point of view, but rotational drives about 100 GB in size practically cannot be sourced new today in most form factors, which points to this being a relatively old drive. Personally, I’d probably just accept that the drive has outlived its useful life at this point and get a replacement, but then again I am rather paranoid when it comes to my data; your mileage may vary. You will have to weigh the cost of a replacement drive against the risk of total failure of the drive and subsequent total loss of all the data on the drive.

First of all, make a copy of all important data on the server and verify that the data in the copy is not corrupted.

ESXI 6.5 is deployed … The disks run in RAID 1+0 on an HP b120i controller

ESXI and the RAID controller are indeed two "layers" that can get in the way of talking to the disks directly. At the very least, with the RAID controller you need to study how it works, what its drivers allow, and what software is available.

If you cannot get through from the native OS, number the disks, take the array apart, and connect the disks directly to a Windows computer. Windows has long been the industry standard in data recovery, and all the most interesting software is developed for it, regardless of which storage devices are being worked on. If it offers to initialize or format the disks, or starts a check, decline or stop it.

Download and unpack R.tester: https://rlab.ru/tools/rtester.html
It lets you both view SMART and run very detailed read tests that show the state of the surface.
You can also run write tests, but they destroy everything irrecoverably, so you need to prepare beforehand (back up the data or take images of the disks).

I have a server running a customized version of Debian. It is attached to a Sun storage RAID. It has very limited tools, and installing new tools is not allowed. :(

This message I see in dmesg:

end_request: critical target error, dev sda, sector 556782970
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda]  Result: hostbyte=0x10 driverbyte=0x08
sd 0:0:0:0: [sda]  Sense Key : 0x4 [current] 
sd 0:0:0:0: [sda]  ASC=0x44 ASCQ=0x0
sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 21 2f d5 7a 00 00 08 00
end_request: critical target error, dev sda, sector 556782970

So it seems there is a bad block at sector 556782970, but I don’t know which hard drive it belongs to, so I cannot tell which one to get replaced.
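The fields in this dmesg excerpt can be decoded by hand. Here is a minimal Python sketch, assuming the CDB is a standard READ(10) command (opcode 0x28) and using the sense key and ASC/ASCQ names as I understand them from the SCSI tables. Only the codes that appear in the log, plus a couple of neighbours, are listed.

```python
import struct

# Small excerpts of the SCSI sense key and ASC/ASCQ tables.
SENSE_KEYS = {0x3: "Medium Error", 0x4: "Hardware Error", 0xB: "Aborted Command"}
ASC_ASCQ = {(0x44, 0x00): "Internal target failure"}


def decode_read10(cdb_hex):
    """Return (starting LBA, block count) from a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    (lba,) = struct.unpack(">I", cdb[2:6])      # bytes 2..5: big-endian LBA
    (blocks,) = struct.unpack(">H", cdb[7:9])   # bytes 7..8: transfer length
    return lba, blocks


if __name__ == "__main__":
    lba, blocks = decode_read10("28 00 21 2f d5 7a 00 00 08 00")
    print("LBA", lba, "blocks", blocks)          # LBA 556782970 blocks 8
    print(SENSE_KEYS[0x4], "/", ASC_ASCQ[(0x44, 0x00)])
```

The LBA in the CDB matches the sector in the end_request line. The sense data is a hardware error with an internal target failure rather than a medium error, which would suggest that the failure is being reported by the RAID target for the logical drive, so the sector number by itself does not identify a member disk.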

  • arcconf getlogs command does not reveal any issues with drives.

  • I did arcconf task start 1 logicaldrive 0 verify_fix but this didn’t help.

  • I ran an e2fsck check. It attempted to fix some inodes, but the issue above remained (I assume because it is a physical issue).

more info: http://pastebin.com/cJ2bUywj

Tools not available :(
smartctl
badblocks

asked Apr 23, 2014 at 17:32

If you actually use arcconf you can see physical drive status like this:

arcconf getconfig 1 PD

Look for drives with Failed state to identify drives that have been marked as failed. For your reference the output would look something like this:

  Device #6
     Device is a Hard drive
     State                              : Failed
     Block Size                         : Unknown
     Supported                          : Yes
     Reported Channel,Device(T:L)       : 0,15(15:0)
     Vendor                             : *MISSING*
     Model                              : 
     Firmware                           : 
     Total Size                         : 0 MB
     Write Cache                        : Unknown
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     SSD                                : No
     MaxCache Capable                   : No
     MaxCache Assigned                  : No
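With many drives, scanning that output by eye gets tedious. The following is a minimal Python sketch that extracts the state of each device from the text of arcconf getconfig 1 PD. It assumes the "Device #N" and "State : ..." layout shown in the sample above; other arcconf versions may format the output differently.

```python
import re


def drive_states(output):
    """Map device number to its reported State from `arcconf getconfig 1 PD` text."""
    states = {}
    current = None
    for line in output.splitlines():
        header = re.match(r"\s*Device #(\d+)", line)
        if header:
            current = int(header.group(1))
            continue
        state = re.match(r"\s*State\s*:\s*(.+?)\s*$", line)
        if state and current is not None:
            # Keep only the first State line seen for each device.
            states.setdefault(current, state.group(1))
    return states


def failed_devices(output):
    """Device numbers whose State is exactly 'Failed'."""
    return [dev for dev, state in drive_states(output).items() if state == "Failed"]
```

You would feed it the captured command output, for example the contents of a file you redirected the command into, and check whether failed_devices returns anything.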

answered Dec 8, 2014 at 19:19 by ILIV

Based on the end_request: critical target error, dev sda, sector 556782970 line, I assume that /dev/sda is the trouble child. You can find the serial number of that device using:

ls -l /dev/disk/by-id | grep "sda$"
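If grep is awkward to use on a restricted system but Python is available, the same lookup can be sketched as follows. It simply resolves each symlink under /dev/disk/by-id and keeps those that point at the device; the function name and its parameters are illustrative, not part of any existing tool.

```python
import os


def ids_for_device(device="/dev/sda", by_id_dir="/dev/disk/by-id"):
    """Return the by-id link names that resolve to the given block device."""
    target = os.path.realpath(device)
    names = []
    for name in sorted(os.listdir(by_id_dir)):
        if os.path.realpath(os.path.join(by_id_dir, name)) == target:
            names.append(name)
    return names


if __name__ == "__main__":
    for name in ids_for_device():
        print(name)
```

The link names usually embed the model and serial number of the device that the kernel sees, which behind a RAID controller may be the logical drive rather than a physical disk.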

answered Apr 23, 2014 at 17:58 by jkt123

While there might be an answer that gets you the info in the OS, odds are the controller firmware can tell you if you access it during boot.

Another thing I noticed in your paste is that your battery status says «failed». I wonder whether the stripes are reporting failure because the write cache battery is dead? Though if it is reporting an actual bad inode, that would probably mean you have two problems.

In my experience with Adaptec controllers, though, when the battery dies the controller typically just disables write caching as a preemptive measure.

answered Apr 23, 2014 at 18:33 by MikeAWood

Topic: Disk error: I/O error, dev sda, sector XXXXX

p4sh

When the PC starts, I see a lot of errors in dmesg:

https://paste.ubuntu.com/p/YKY74JTwsD/

If I read any individual sector manually, I sometimes get:

root@mail:~# hdparm --read-sector 25523880 /dev/sda

/dev/sda:
reading sector 25523880: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e1 01 11 04 00 00 00 a8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
succeeded


But sometimes it is just "succeeded", that is, the sectors are readable.
I checked SMART; it reports no errors on the disk.

The problem is that after some time one of the file systems (/var) goes read-only and many programs stop working.
What would you advise?


The topic starter has not appeared on the forum for more than six months as of 22/07/2019 (last visit: 23/11/2018). The section moderator has decided to close the topic.
zg_nico


bearpuh

I checked SMART; it reports no errors on the disk.

And doesn't this tell you anything?

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     20772         25523880
# 2  Short offline       Completed: read failure       90%     20677         1057345043
# 3  Short offline       Completed: read failure       90%     20677         1057345043
# 4  Short offline       Completed: read failure       90%     20677         1057345043
# 5  Short offline       Completed: read failure       90%     20677         1057345043

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     29074         24928270

There are unreadable sectors on the disks.
Back up first, then check with Victoria or badblocks

sudo /usr/sbin/badblocks -o /path/to/file/badblocks.list -b 4096 -s -v /dev/sdX


p4sh

That's the thing, the sectors are readable (or am I mistaken? Please correct me):

root@mail:~# hdparm --read-sector 1873032872 /dev/sda
/dev/sda:
reading sector 1873032872: succeeded
0000 0000 f40f 0c01 4442 4537 4136 3534
3937 3335 6857 5806 1400 0c01 4433 3639
.......

root@mail:~# hdparm --read-sector 148453280 /dev/sda

/dev/sda:
reading sector 148453280: succeeded
bb10 5600 0c00 0102 2e00 0000 ba10 5600
3000 0202 2e2e 0000 bc10 5600 2400 1c01

root@mail:~# hdparm --read-sector 1285929908 /dev/sda

/dev/sda:
reading sector 1285929908: SG_IO: bad/missing sense data, sb[]:  70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 b4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
succeeded
0000 0000 0000 0000 0000 0000 0000 0000


Thanks!


bearpuh

That's the thing, the sectors are readable

To be sure of that, you need to check.
I have already written how.
I would also connect the disk to a different controller or computer to check.


Sly_tom_cat

Problems shown in SMART cannot be solved by replacing the controller.

Controller problems usually show up in:

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

But here it is clean.


bearpuh

Problems shown in SMART cannot be solved by replacing the controller.

Agreed. This just confused me:

SG_IO: bad/missing sense data


snowin

That's the thing, the sectors are readable (or am I mistaken? Please correct me)

you are mistaken


ReNzRv

It is better to check and repair from the bootable Seagate Tools for DOS image,
using the commands Zero All (wipes all sectors) and Long Test (DST), a full check of all sectors with bad sectors remapped at the level of the disk controller.


p4sh

man hdparm

       --read-sector
              Reads from the specified sector number, and dumps the contents in hex to standard output.  The sector number must be given (base10) after this option.  hdparm will issue a
              low-level  read (completely bypassing the usual block layer read/write mechanisms) for the specified sector.  This can be used to definitively check whether a given sector
              is bad (media error) or not (doing so through the usual mechanisms can sometimes give false positives).

you are mistaken

I don't understand. Could you explain in more detail why reading with hdparm gives "SUCCESS" while the sectors are "unreadable"? Is the software no good?
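One way to approach this question is to decode the sense buffer that hdparm printed next to "succeeded". Here is a minimal Python sketch, assuming the sb[] bytes are SCSI fixed-format sense data (response code 0x70), with the sense key in byte 2 and ASC/ASCQ in bytes 12 and 13. The code tables are short excerpts as I understand them from the SCSI specification.

```python
SENSE_KEYS = {0x0: "No Sense", 0x1: "Recovered Error", 0x3: "Medium Error",
              0x4: "Hardware Error", 0xB: "Aborted Command"}
ASC_ASCQ = {
    (0x11, 0x00): "Unrecovered read error",
    (0x11, 0x04): "Unrecovered read error - auto reallocate failed",
}


def decode_fixed_sense(hex_bytes):
    """Extract sense key, ASC and ASCQ from fixed-format sense data given as hex text."""
    sb = bytes.fromhex(hex_bytes)
    if len(sb) < 14 or (sb[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {"sense_key": sb[2] & 0x0F, "asc": sb[12], "ascq": sb[13]}


if __name__ == "__main__":
    # Sense bytes printed for sector 1285929908 earlier in the thread.
    sense = decode_fixed_sense(
        "70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 "
        "a0 b4 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
    )
    print(SENSE_KEYS.get(sense["sense_key"]),
          ASC_ASCQ.get((sense["asc"], sense["ascq"])))
```

Read this way, the drive reported a medium error for that sector even though the tool printed "succeeded", and the all-zero data dumped for it fits that reading. The sectors that returned real data without any sense bytes are a different case.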



bearpuh

with hdparm gives "SUCCESS" while the sectors are "unreadable"?

And how long does it take to read that sector?
By what criterion does Victoria HDD, for example, decide that a sector is "bad"?
Read that sector with Victoria; it may become clearer.
If you want the theory, here it is, from the author of smartmontools: https://www.smartmontools.org/wiki/BadBlockHowto#ext2ext3secondexample


User added a message on 15 August 2018, 10:13:40:


Also note this.
Several sectors on both of your disks are candidates for reallocation.

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4


User added a message on 15 August 2018, 10:15:17:


They can be "given a kick": forced reallocation.
The details are in the smartmontools link above.


snowin

They can be "given a kick": forced reallocation.

it is enough to simply write to them and read them back, several times if needed
if these are real bad sectors, the drive will remap them itself; otherwise they are just so-called "soft bads" and they should disappear from SMART


p4sh

What I did:
booted from a live USB, assembled the array, and checked the FS:
e2fsck -ct /dev/…
Ran the tests again.
Rebooted, and I am now monitoring the state of the FS.
SMART has also been updated:
Multi_Zone_Error_Rate changed
One sector pending reallocation remains on sda: Current_Pending_Sector 1

/dev/sda

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    24
  3 Spin_Up_Time            POS--K   179   172   021    -    4033
  4 Start_Stop_Count        -O--CK   099   099   000    -    1401
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   072   072   000    -    20819
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    876
192 Power-Off_Retract_Count -O--CK   199   199   000    -    758
193 Load_Cycle_Count        -O--CK   200   200   000    -    642
194 Temperature_Celsius     -O---K   102   081   000    -    45
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1

/dev/sdb

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    209
  3 Spin_Up_Time            POS--K   191   173   021    -    3416
  4 Start_Stop_Count        -O--CK   099   099   000    -    1680
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   060   060   000    -    29215
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    773
192 Power-Off_Retract_Count -O--CK   200   200   000    -    650
193 Load_Cycle_Count        -O--CK   200   200   000    -    1029
194 Temperature_Celsius     -O---K   103   091   000    -    44
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    4
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    3


I'll check with Victoria now (I think it used to do auto-remapping).
Thanks to everyone for the answers; this is a very useful topic for me!
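The attributes discussed in this thread can be pulled out of the tables above mechanically. This is a minimal Python sketch that assumes the eight-column layout printed here (ID, name, flags, value, worst, threshold, fail, raw value); the list of watched attributes is a choice made for this example, not a smartmontools feature.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def suspect_attributes(table, watched=WATCHED):
    """Return {attribute name: raw value} for watched attributes with a nonzero raw value."""
    found = {}
    for line in table.splitlines():
        fields = line.split()
        # Expected columns: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
        if len(fields) < 8 or not fields[0].isdigit() or fields[1] not in watched:
            continue
        try:
            raw = int(fields[7])
        except ValueError:
            continue  # raw value is not a plain integer; skip it
        if raw > 0:
            found[fields[1]] = raw
    return found


if __name__ == "__main__":
    sda = """
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
"""
    print(suspect_attributes(sda))
```

A nonzero Current_Pending_Sector or Offline_Uncorrectable with a zero Reallocated_Sector_Ct, as on sda here, means the drive has sectors it could not read but has not yet remapped.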


snowin

I'll check with Victoria now (I think it used to do auto-remapping).

you don't need a remap
replace the cable first
on both drives
as for

That's the thing, the sectors are readable (or am I mistaken? Please correct me):

it seems that you do not understand at all what you are doing or why
you take a random sector on the disk, test-read it with hdparm, and claim that it is readable
while you do not check the problem sectors
nevertheless, your random, reckless actions (reassembling the RAID) led to better results
but that is a crude method

I just ordered a new server with a 1TB Samsung SSD. Installed Ubuntu 14.04.5 LTS.

After booting into the newly installed system, I see this in my dmesg and /var/log/syslog. Output of grep error /var/log/syslog:

May 12 03:47:34 lf5 kernel: [    0.373789] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:47:34 lf5 kernel: [   10.382147] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   10.382152]          res 40/00:e0:f8:69:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   10.712517] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   10.712521]          res 40/00:d0:38:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.119541] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   11.119545]          res 40/00:40:30:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526336] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 03:47:34 lf5 kernel: [   11.526341]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526345]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526348]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526351]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   21.349950] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 03:51:10 lf5 kernel: [    0.389787] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:51:10 lf5 kernel: [   10.906423] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   10.906429]          res 40/00:80:08:00:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   11.488276] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   11.488281]          res 40/00:c0:28:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   11.960792] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   11.960796]          res 40/00:b8:b0:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   12.366482] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   12.366486]          res 40/00:60:e0:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   20.918620] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:07:19 lf5 kernel: [    0.390011] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:07:19 lf5 kernel: [   10.349119] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   10.349124]          res 40/00:88:a8:6d:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   10.738449] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   10.738453]          res 40/00:20:60:6b:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   11.072972] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   11.072976]          res 40/00:60:50:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   11.471777] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   11.471781]          res 40/00:48:c8:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   20.651217] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:18:16 lf5 kernel: [    0.389808] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:18:17 lf5 kernel: [   10.762352] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [   10.762360]          res 40/00:40:08:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   11.338565]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338569]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338572]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338576]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   20.087229]          res 41/84:08:b8:14:7d/00:00:63:00:00/00 Emask 0x410 (ATA bus error) <F>
May 12 17:18:17 lf5 kernel: [   20.298295] ata8.00: error: { ICRC ABRT }
May 12 17:18:17 lf5 kernel: [   21.176551] sd 7:0:0:0: [sda] tag#0 Add. Sense: Scsi parity error
May 12 17:18:17 lf5 kernel: [   21.316632] blk_update_request: I/O error, dev sda, sector 1669074520
May 12 17:18:17 lf5 kernel: [   21.542013] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [   21.759477]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.052681]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.347138]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.642363]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.938868]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.239764]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.542336]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.840288]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.138769]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.439063]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.740494]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.047057]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.354884]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.662079]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.967498]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.273208]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.579035]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.884890]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.190868]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.496523]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.801825]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.106876]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.412223]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.717662]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.022620]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.326675]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.629826]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.932271]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   30.234666]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   30.537024]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   31.765128] blk_update_request: I/O error, dev sda, sector 1669071496
May 12 17:18:17 lf5 kernel: [   32.143969] blk_update_request: I/O error, dev sda, sector 1669071504
May 12 17:18:17 lf5 kernel: [   32.527171] blk_update_request: I/O error, dev sda, sector 1669071512
May 12 17:18:17 lf5 kernel: [   32.915371] blk_update_request: I/O error, dev sda, sector 1669071544
May 12 17:18:17 lf5 kernel: [   33.308218] blk_update_request: I/O error, dev sda, sector 1669071552
May 12 17:18:17 lf5 kernel: [   33.706503] blk_update_request: I/O error, dev sda, sector 1669071520
May 12 17:18:17 lf5 kernel: [   34.108892] blk_update_request: I/O error, dev sda, sector 1669071528
May 12 17:18:17 lf5 kernel: [   34.516541] blk_update_request: I/O error, dev sda, sector 1669071536
May 12 17:18:17 lf5 kernel: [   34.929267] blk_update_request: I/O error, dev sda, sector 1669071368
May 12 17:18:17 lf5 kernel: [   35.347838] blk_update_request: I/O error, dev sda, sector 1669071376
May 12 17:18:17 lf5 kernel: [   36.004437]          res 41/04:a8:90:d2:89/00:00:5f:00:00/00 Emask 0x401 (device error) <F>
May 12 17:18:17 lf5 kernel: [   36.257143] ata8.00: error: { ABRT }
May 12 17:18:17 lf5 kernel: [   37.681581] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 17:18:17 lf5 kernel: [   37.681586]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681590]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681593]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681596]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681599]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681602]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681605]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681608]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681611]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681615]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681618]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681621]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681624]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681627]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681630]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681633]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681636]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681639]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681642]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681645]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681649]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681652]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681655]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   38.005003] blk_update_request: I/O error, dev sda, sector 1891370112
May 12 17:18:17 lf5 kernel: [   38.005009] blk_update_request: I/O error, dev sda, sector 1891370120
May 12 17:18:17 lf5 kernel: [   38.005013] blk_update_request: I/O error, dev sda, sector 1891370128
May 12 17:18:17 lf5 kernel: [   38.005017] blk_update_request: I/O error, dev sda, sector 1891370136
May 12 17:18:17 lf5 kernel: [   38.005021] blk_update_request: I/O error, dev sda, sector 1891370144
May 12 17:18:17 lf5 kernel: [   38.005025] blk_update_request: I/O error, dev sda, sector 1891370152
May 12 17:18:17 lf5 kernel: [   38.005029] blk_update_request: I/O error, dev sda, sector 1891370160
May 12 17:18:17 lf5 kernel: [   38.005032] blk_update_request: I/O error, dev sda, sector 1891370168
May 12 17:18:17 lf5 kernel: [   38.005036] blk_update_request: I/O error, dev sda, sector 1891370176
May 12 17:18:17 lf5 kernel: [   38.005040] blk_update_request: I/O error, dev sda, sector 1891370184
May 12 17:18:17 lf5 kernel: [   49.093973] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro

I am mostly concerned about these entries: blk_update_request: I/O error, dev sda, sector xxxxxxxxxxx
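
To see which areas of the disk those entries touch, the sector numbers can be pulled out of the log and collapsed into runs. This is only a rough sketch: the sample lines are copied from the log above, the helper names are made up for illustration, and the step of 8 sectors is an assumption taken from the spacing seen in these particular entries.

```python
import re

# A few lines copied from the kernel log above.
LOG = """\
May 12 17:18:17 lf5 kernel: [   31.765128] blk_update_request: I/O error, dev sda, sector 1669071496
May 12 17:18:17 lf5 kernel: [   32.143969] blk_update_request: I/O error, dev sda, sector 1669071504
May 12 17:18:17 lf5 kernel: [   38.005003] blk_update_request: I/O error, dev sda, sector 1891370112
May 12 17:18:17 lf5 kernel: [   38.005009] blk_update_request: I/O error, dev sda, sector 1891370120
"""

PATTERN = re.compile(r"blk_update_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(text):
    """Return a sorted list of (device, sector) pairs found in the log text."""
    return sorted((m.group(1), int(m.group(2))) for m in PATTERN.finditer(text))


def sector_ranges(sectors, step=8):
    """Collapse sorted sector numbers into (first, last) runs spaced `step` apart.

    step=8 is an assumption based on the spacing in the log above.
    """
    runs = []
    for s in sectors:
        if runs and s - runs[-1][1] == step:
            runs[-1][1] = s          # extend the current run
        else:
            runs.append([s, s])      # start a new run
    return [tuple(r) for r in runs]


pairs = failed_sectors(LOG)
print(sector_ranges([sector for _, sector in pairs]))
```

Feeding it the full log instead of the four sample lines would show whether the failures cluster in a few regions or are spread across the device.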

I ran badblocks -v /dev/sda which returned no errors.

I then ran smartctl --all /dev/sda, which also returned no errors. The output below includes a short self-test.

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 850 EVO 1TB
Serial Number:    S3PHNF0JC00710K
LU WWN Device Id: 5 002538 d428254a0
Firmware Version: EMT03B6Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat May 12 19:08:22 2018 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 512) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       8
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       31
177 Wear_Leveling_Count     0x0013   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   099   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   069   067   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       20
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       25
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       55078112

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         8         -

SMART Selective self-test log data structure revision number 1
 SPAN    MIN_LBA    MAX_LBA  CURRENT_TEST_STATUS
    1          0          0  Not_testing
    2          0          0  Not_testing
    3          0          0  Not_testing
    4          0          0  Not_testing
    5          0          0  Not_testing
  255  116055040  116120575  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
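
A quick way to scan the attribute table is to parse the raw values and list the counters that are not zero. The sketch below is only an illustration: the three rows are copied from the output above, and the watch list is a hypothetical choice, not something smartctl defines. In this output the media-related counters are 0 while UDMA_CRC_Error_Count, which counts CRC errors on the interface, is 20; that does not by itself establish a cause.

```python
# Three rows copied from the attribute table above.
TABLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       20
"""


def parse_attributes(text):
    """Map attribute name -> raw value for each row of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


# Hypothetical watch list; which counters matter is a judgment call.
WATCH = ("Reallocated_Sector_Ct", "Reported_Uncorrect", "UDMA_CRC_Error_Count")

attrs = parse_attributes(TABLE)
nonzero = {name: attrs[name] for name in WATCH if attrs.get(name, 0) > 0}
print(nonzero)
```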

My question is simple: what do you think might be wrong? The SSD should be brand new. It’s hard for me, in good conscience, to put this server into production with those errors in the logs, even though the box is otherwise acting normally.

On firmware v2.07(AAUW.5)C3 I likewise could not connect to the router from a Nexus 5 through the my.keenetic app, either by QR code or over Wi-Fi. It showed a message along the lines of "Connecting to Wi-Fi" (even though the phone was already connected to Wi-Fi), and nothing happened.

I updated to v2.09(AAUW.3)A0 and things got much better:

Internet on mobile devices now works normally, and sites open quickly.

After the update I was able to connect to the router through the my.keenetic app, and management is available.

But there are still problems.

1. Ping from the PC to the router: 1 ms (occasionally 2-4 ms). 2.4 GHz channel.

Ping from the Nexus 5 to the router jumps between 9 and 20 ms. Tested on 2.4 GHz and 5 GHz.

Ping from the PC to 2ip.ru (178.63.151.224): 55-60 ms.

Ping from the Nexus 5 to 2ip.ru (178.63.151.224): 65-75 ms.

Test conditions: the PC, the phone, and the router are in the same room, 1.5 meters apart.

2. The errors in the logs remain:

Jan 07 14:06:32 ndm kernel: end_request: critical target error, dev sda, sector 0

Jan 07 14:06:32 ndm Core::Syslog: last message repeated 21 times.

Jan 07 14:07:02 ndm kernel: end_request: critical target error, dev sda, sector 0

Jan 07 14:07:02 ndm Core::Syslog: last message repeated 23 times.


Edited January 7, 2017 by Cobain

Thanks for the links to that thread, really insightful.

My version info:

modinfo mpt3sas
filename:       /lib/modules/4.19.24-Unraid/kernel/drivers/scsi/mpt3sas/mpt3sas.ko.xz
alias:          mpt2sas
version:        26.100.00.00

Also tried some commands from the mentioned thread.

# hdparm -I /dev/sdm | grep TRIM
       *    Data Set Management TRIM supported (limit 8 blocks)
       *    Deterministic read ZEROs after TRIM
# hdparm -I /dev/sdj | grep TRIM
       *    Data Set Management TRIM supported (limit 1 block)
       *    Deterministic read data after TRIM
# hdparm -I /dev/sdl | grep TRIM
       *    Data Set Management TRIM supported (limit 8 blocks)
# hdparm -I /dev/sdk | grep TRIM
       *    Data Set Management TRIM supported (limit 1 block)
       *    Deterministic read data after TRIM
# fstrim -av
/etc/libvirt: 926.5 MiB (971513856 bytes) trimmed on /dev/loop3
/var/lib/docker: 13.7 GiB (14724616192 bytes) trimmed on /dev/loop2

fstrim: /mnt/cache: FITRIM ioctl failed: Remote I/O error

The reason I say only the Intel drive doesn’t work is that the Unraid logs mention only the Intel one (sdj) as having problems.

Feb 25 16:50:01 unraid kernel: print_req_error: critical target error, dev sdj, sector 232785982
Feb 25 16:50:01 unraid kernel: BTRFS warning (device sdj1): failed to trim 1 device(s), last error -121
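
The -121 in the BTRFS warning and the "Remote I/O error" reported by fstrim appear to be the same error code seen from two places. A minimal check, assuming it is run on Linux (errno numbering is platform-specific, so other systems may print something different):

```python
import errno
import os

code = 121  # absolute value of "last error -121" in the BTRFS warning

# errno.errorcode maps numeric codes to their symbolic names on this platform.
name = errno.errorcode.get(code, "unknown")
print(name, "-", os.strerror(code))
```

On Linux, errno 121 is EREMOTEIO, whose message text matches the fstrim output above.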

I see that /mnt/cache is mounted from /dev/sdj1. Could that be why it complains about sdj above? I thought it would have said sdj1 in both errors.

Seems like it’s time to look for one of those 9300-8i soon. Thanks for your help.


Edited February 26, 2019 by Nischi

A friend of mine gave me his external 2TB Seagate HDD which appeared to be somewhat faulty.
And, it is indeed pretty faulty.

First, I tried a lot of "common" commands, spent a few hours googling, tried both Linux and Windows (for chkdsk), and opened the HDD case to plug the drive directly into SATA. I’ll add that I do not need to recover the data; I just need to format the disk.

lsblk

NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda            8:0    0   1,8T  0 disk 

Here, sda is the disk; its size, 1,8T, seems correct.

In GParted, the disk only appears to be ~1.9GB. I can create a partition table but I cannot create a valid partition. And even if I could, it could only be 1.9GB.

dd if=/dev/zero of=/dev/sda

dd: error writing '/dev/sda': No space left on device
3782129+0 records in
3782128+0 records out
1936449536 bytes (1,9 GB, 1,8 GiB) copied, 7,04022 s, 275 MB/s

smartctl -a /dev/sda

Read Device Identity failed: Invalid argument

parted -l

Error: Unable to open /dev/sda - unrecognised disk label.   
Model:  (file)                                                           
Disk /dev/sda : 1936MB
Sector size (logical/physical): 512B/512B
Partition table : unknown

dmesg

[ 7925.612174] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 7925.862625] sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 7931.193045] sd 2:0:0:0: [sda] 3809353968 512-byte logical blocks: (1.95 TB/1.77 TiB)
[ 7931.193049] sd 2:0:0:0: [sda] 4096-byte physical blocks
[ 7931.193313] sd 2:0:0:0: [sda] Write Protect is off
[ 7931.193316] sd 2:0:0:0: [sda] Mode Sense: 2f 00 00 00
[ 7931.193593] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 7931.193995] sd 2:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[ 7931.390515] sd 2:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7931.390523] sd 2:0:0:0: [sda] tag#18 Sense Key : Illegal Request [current] 
[ 7931.390529] sd 2:0:0:0: [sda] tag#18 Add. Sense: Invalid command operation code
[ 7931.390536] sd 2:0:0:0: [sda] tag#18 CDB: Read(6) 08 00 00 00 08 00
[ 7931.390545] blk_update_request: critical target error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 7931.390558] Buffer I/O error on dev sda, logical block 0, async page read
[ 7931.500384] sd 2:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7931.500451] sd 2:0:0:0: [sda] tag#19 Sense Key : Illegal Request [current] 
[ 7931.500461] sd 2:0:0:0: [sda] tag#19 Add. Sense: Invalid command operation code
[ 7931.500472] sd 2:0:0:0: [sda] tag#19 CDB: Read(6) 08 00 00 00 08 00
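
The CDB bytes in those lines can be decoded by hand. The sketch below follows the standard READ(6) layout (opcode 0x08, a 21-bit LBA, a one-byte transfer length where 0 means 256 blocks); the function name is made up for illustration.

```python
def decode_read6(cdb_hex):
    """Decode a 6-byte READ(6) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    # The LBA is 21 bits: low 5 bits of byte 1, then bytes 2 and 3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # In READ(6), a transfer length of 0 means 256 blocks.
    blocks = cdb[4] or 256
    return {"lba": lba, "blocks": blocks, "control": cdb[5]}


# CDB copied from the dmesg output above.
print(decode_read6("08 00 00 00 08 00"))
```

For the logged CDB this gives LBA 0 and 8 blocks, i.e. a 4096-byte read at 512 bytes per logical block, which lines up with the "sector 0" and "logical block 0" messages. The sense data says the device rejected the opcode itself ("Invalid command operation code") rather than reporting a media error.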

Do you have any idea? I guess the HDD may be dead, but I’m not quite sure.
What I find intriguing is the 1.8TB size with lsblk and 1.9GB elsewhere.
And again, I do not need to recover previous data (and since I did write a lot of 0’s, they’re probably gone for good :p). I just want to format the disk to make it usable again.
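
To put numbers on that discrepancy, here is the arithmetic as a small sketch. Both figures are copied from the dmesg and dd output above; nothing here explains why the two sizes differ.

```python
SECTOR = 512  # logical block size reported by the kernel

kernel_blocks = 3809353968  # "3809353968 512-byte logical blocks" in dmesg
dd_bytes = 1936449536       # bytes dd wrote before "No space left on device"

kernel_bytes = kernel_blocks * SECTOR
print(f"kernel: {kernel_bytes} bytes = {kernel_bytes / 1e12:.2f} TB = {kernel_bytes / 2**40:.2f} TiB")
print(f"dd:     {dd_bytes} bytes = {dd_bytes / 1e9:.2f} GB = {dd_bytes / 2**30:.2f} GiB")
print(f"dd stopped after {dd_bytes // SECTOR} sectors")
```

The kernel's figure works out to the 1.95 TB / 1.77 TiB that lsblk rounds to 1,8T, while dd stopped after exactly 3782128 sectors, matching its "records out" count and the roughly 1.9 GB that GParted and parted report.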

Thanks for your time :)
