popcornmix pushed a commit that referenced this issue on Nov 11, 2019:
The Geneve implementation changed mlx5 tc to use a direct pointer to the tunnel_key action's internal struct ip_tunnel_info instance. However, this leads to a use-after-free error when the initial filter that caused creation of a new encap entry is deleted, or when the tunnel_key action is manually overwritten through the action API. Moreover, with the recent TC offloads API unlocking change, struct flow_action_entry->tunnel points to a temporary copy of the tunnel info that is deallocated after the filter is offloaded to hardware, which causes the bug to reproduce every time a new filter is attached to an existing encap entry, with the following KASAN report:

[ 314.885555] ==================================================================
[ 314.886641] BUG: KASAN: use-after-free in memcmp+0x2c/0x60
[ 314.886864] Read of size 1 at addr ffff88886c746280 by task tc/2682
[ 314.887179] CPU: 22 PID: 2682 Comm: tc Not tainted 5.3.0-rc7+ #703
[ 314.887188] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 314.887195] Call Trace:
[ 314.887215] dump_stack+0x9a/0xf0
[ 314.887236] print_address_description+0x67/0x323
[ 314.887248] ? memcmp+0x2c/0x60
[ 314.887257] ? memcmp+0x2c/0x60
[ 314.887272] __kasan_report.cold+0x1a/0x3d
[ 314.887474] ? __mlx5e_tc_del_fdb_peer_flow+0x100/0x1b0 [mlx5_core]
[ 314.887484] ? memcmp+0x2c/0x60
[ 314.887509] kasan_report+0xe/0x12
[ 314.887521] memcmp+0x2c/0x60
[ 314.887662] mlx5e_tc_add_fdb_flow+0x51b/0xbe0 [mlx5_core]
[ 314.887838] ? mlx5e_encap_take+0x110/0x110 [mlx5_core]
[ 314.887902] ? lockdep_init_map+0x87/0x2c0
[ 314.887924] ? __init_waitqueue_head+0x4f/0x60
[ 314.888062] ? mlx5e_alloc_flow.isra.0+0x18c/0x1c0 [mlx5_core]
[ 314.888207] __mlx5e_add_fdb_flow+0x2d7/0x440 [mlx5_core]
[ 314.888359] ? mlx5e_tc_update_neigh_used_value+0x6f0/0x6f0 [mlx5_core]
[ 314.888374] ? match_held_lock+0x2e/0x240
[ 314.888537] mlx5e_configure_flower+0x830/0x16a0 [mlx5_core]
[ 314.888702] ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
[ 314.888713] ? down_read+0x118/0x2c0
[ 314.888728] ? down_read_killable+0x300/0x300
[ 314.888882] ? mlx5e_rep_get_ethtool_stats+0x180/0x180 [mlx5_core]
[ 314.888899] tc_setup_cb_add+0x127/0x270
[ 314.888937] fl_hw_replace_filter+0x2ac/0x380 [cls_flower]
[ 314.888976] ? fl_hw_destroy_filter+0x1b0/0x1b0 [cls_flower]
[ 314.888990] ? fl_change+0xbcf/0x27ef [cls_flower]
[ 314.889030] ? fl_change+0xa57/0x27ef [cls_flower]
[ 314.889069] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.889135] ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[ 314.889167] ? __radix_tree_lookup+0xa4/0x130
[ 314.889200] ? fl_get+0x169/0x240 [cls_flower]
[ 314.889218] ? fl_walk+0x230/0x230 [cls_flower]
[ 314.889249] tc_new_tfilter+0x5e1/0xd40
[ 314.889281] ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[ 314.889309] ? tc_del_tfilter+0xa30/0xa30
[ 314.889335] ? __lock_acquire+0x5b5/0x2460
[ 314.889378] ? find_held_lock+0x85/0xa0
[ 314.889442] ? tc_del_tfilter+0xa30/0xa30
[ 314.889465] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.889488] ? rtnl_dellink+0x490/0x490
[ 314.889518] ? lockdep_hardirqs_on+0x260/0x260
[ 314.889538] ? netlink_deliver_tap+0xab/0x5a0
[ 314.889550] ? match_held_lock+0x1b/0x240
[ 314.889575] netlink_rcv_skb+0xd0/0x200
[ 314.889588] ? rtnl_dellink+0x490/0x490
[ 314.889605] ? netlink_ack+0x440/0x440
[ 314.889635] ? netlink_deliver_tap+0x161/0x5a0
[ 314.889648] ? lock_downgrade+0x360/0x360
[ 314.889657] ? lock_acquire+0xe5/0x210
[ 314.889686] netlink_unicast+0x296/0x350
[ 314.889707] ? netlink_attachskb+0x390/0x390
[ 314.889726] ? _copy_from_iter_full+0xe0/0x3a0
[ 314.889738] ? __virt_addr_valid+0xbb/0x130
[ 314.889771] netlink_sendmsg+0x394/0x600
[ 314.889800] ? netlink_unicast+0x350/0x350
[ 314.889817] ? move_addr_to_kernel.part.0+0x90/0x90
[ 314.889852] ? netlink_unicast+0x350/0x350
[ 314.889872] sock_sendmsg+0x96/0xa0
[ 314.889891] ___sys_sendmsg+0x482/0x520
[ 314.889919] ? copy_msghdr_from_user+0x250/0x250
[ 314.889930] ? __fput+0x1fa/0x390
[ 314.889941] ? task_work_run+0xb7/0xf0
[ 314.889957] ? exit_to_usermode_loop+0x117/0x120
[ 314.889972] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.889982] ? do_syscall_64+0x74/0xe0
[ 314.889992] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.890012] ? mark_lock+0xac/0x9a0
[ 314.890028] ? __lock_acquire+0x5b5/0x2460
[ 314.890053] ? mark_lock+0xac/0x9a0
[ 314.890083] ? __lock_acquire+0x5b5/0x2460
[ 314.890112] ? match_held_lock+0x1b/0x240
[ 314.890144] ? __fget_light+0xa1/0xf0
[ 314.890166] ? sockfd_lookup_light+0x91/0xb0
[ 314.890187] __sys_sendmsg+0xba/0x130
[ 314.890201] ? __sys_sendmsg_sock+0xb0/0xb0
[ 314.890225] ? __blkcg_punt_bio_submit+0xd0/0xd0
[ 314.890264] ? lockdep_hardirqs_off+0xbe/0x100
[ 314.890274] ? mark_held_locks+0x24/0x90
[ 314.890286] ? do_syscall_64+0x1e/0xe0
[ 314.890308] do_syscall_64+0x74/0xe0
[ 314.890325] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.890336] RIP: 0033:0x7f00ca33d7b8
[ 314.890348] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
[ 314.890356] RSP: 002b:00007ffea2983928 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 314.890369] RAX: ffffffffffffffda RBX: 000000005d777d5b RCX: 00007f00ca33d7b8
[ 314.890377] RDX: 0000000000000000 RSI: 00007ffea2983990 RDI: 0000000000000003
[ 314.890384] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 314.890392] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[ 314.890400] R13: 000000000047f640 R14: 00007ffea2987b58 R15: 0000000000000021
[ 314.890529] Allocated by task 2687:
[ 314.890684] save_stack+0x1b/0x80
[ 314.890694] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 314.890705] __kmalloc_track_caller+0x102/0x340
[ 314.890721] kmemdup+0x1d/0x40
[ 314.890730] tc_setup_flow_action+0x731/0x2c27
[ 314.890743] fl_hw_replace_filter+0x23b/0x380 [cls_flower]
[ 314.890756] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.890765] tc_new_tfilter+0x5e1/0xd40
[ 314.890776] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.890786] netlink_rcv_skb+0xd0/0x200
[ 314.890796] netlink_unicast+0x296/0x350
[ 314.890805] netlink_sendmsg+0x394/0x600
[ 314.890815] sock_sendmsg+0x96/0xa0
[ 314.890825] ___sys_sendmsg+0x482/0x520
[ 314.890834] __sys_sendmsg+0xba/0x130
[ 314.890844] do_syscall_64+0x74/0xe0
[ 314.890854] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.890937] Freed by task 2687:
[ 314.891076] save_stack+0x1b/0x80
[ 314.891086] __kasan_slab_free+0x12c/0x170
[ 314.891095] kfree+0xeb/0x2f0
[ 314.891106] tc_cleanup_flow_action+0x69/0xa0
[ 314.891119] fl_hw_replace_filter+0x2c5/0x380 [cls_flower]
[ 314.891132] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.891140] tc_new_tfilter+0x5e1/0xd40
[ 314.891151] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.891161] netlink_rcv_skb+0xd0/0x200
[ 314.891170] netlink_unicast+0x296/0x350
[ 314.891180] netlink_sendmsg+0x394/0x600
[ 314.891190] sock_sendmsg+0x96/0xa0
[ 314.891200] ___sys_sendmsg+0x482/0x520
[ 314.891208] __sys_sendmsg+0xba/0x130
[ 314.891218] do_syscall_64+0x74/0xe0
[ 314.891228] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.891315] The buggy address belongs to the object at ffff88886c746280 which belongs to the cache kmalloc-96 of size 96
[ 314.891762] The buggy address is located 0 bytes inside of 96-byte region [ffff88886c746280, ffff88886c7462e0)
[ 314.892196] The buggy address belongs to the page:
[ 314.892387] page:ffffea0021b1d180 refcount:1 mapcount:0 mapping:ffff88835d00ef80 index:0x0
[ 314.892398] flags: 0x57ffffc0000200(slab)
[ 314.892413] raw: 0057ffffc0000200 ffffea00219e0340 0000000800000008 ffff88835d00ef80
[ 314.892423] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 314.892430] page dumped because: kasan: bad access detected
[ 314.892515] Memory state around the buggy address:
[ 314.892707] ffff88886c746180: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.892976] ffff88886c746200: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893251] >ffff88886c746280: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893522] ^
[ 314.893657] ffff88886c746300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893924] ffff88886c746380: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
[ 314.894189] ==================================================================

Fix the issue by duplicating tunnel info into a per-encap copy that is deallocated with the encap structure. Also, duplicate tunnel info in the flow parse attribute to support cases when the flow might be attached asynchronously.

Fixes: 1f6da30 ("net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The first error you reported:
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x25
ata1.00: failed command: READ DMA
says that a READ DMA ATA command to a disk on ATA port 1 failed (status includes ERR for error). That port is most likely the hard disk, and the error points toward the drive having problems. The DMA part can likely be ignored; DMA is Direct Memory Access, which is the dominant transfer mode these days, and if you were having RAM or RAM bus problems to the degree that you were hitting something like that repeatedly, you’d likely be seeing a ton more errors if the system was able to function at all.
The second error:
end_request: critical target error, dev sda, sector 32839936
EXT4_fs error: (device sda5): ext4_find_entry:935: inode #393217: comm init: reading directory lblock 0
INIT: No inittab file found
says there is some problem on /dev/sda, sector 32839936, which with 512-byte sectors puts us physically toward the end of the /dev/sda5 partition, which adds up with device sda5 as reported by the file system driver. The error reported by init together with the file system driver’s error details points toward a problem with the file system causing /etc/inittab to be unavailable or (less likely) unreadable. This would mean that either the root directory, the /etc directory, or the /etc/inittab file entry is somehow involved in the corruption. Given the inode number, I’d take a shot at /etc/inittab specifically being the culprit, until proven wrong.
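As a rough check of the sector arithmetic above, the sketch below turns the reported sector number into a byte offset on the disk and into a file system block relative to a partition. The partition start sector and the 4096-byte file system block size used here are hypothetical values, since the question does not give the partition table; on a real system the start sector could be read from /sys/block/sda/sda5/start.

```python
SECTOR_SIZE = 512  # bytes per logical sector, as assumed in the answer


def sector_to_byte_offset(sector, sector_size=SECTOR_SIZE):
    """Return the byte offset of a sector from the start of the disk."""
    return sector * sector_size


def fs_block_in_partition(sector, part_start_sector,
                          fs_block_size=4096, sector_size=SECTOR_SIZE):
    """Return the file system block (relative to the partition) holding a disk sector."""
    if sector < part_start_sector:
        raise ValueError("sector lies before the partition start")
    return (sector - part_start_sector) * sector_size // fs_block_size


if __name__ == "__main__":
    bad_sector = 32839936  # from the end_request line
    offset = sector_to_byte_offset(bad_sector)
    print(f"byte offset: {offset} ({offset / 2**30:.2f} GiB into the disk)")

    hypothetical_start = 30000000  # NOT from the question; illustrative only
    print("fs block:", fs_block_in_partition(bad_sector, hypothetical_start))
```

Whether that offset really falls near the end of sda5 depends on the actual partition layout, which only the partition table can confirm.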
You write (my emphasis):
Suspecting a HDD crash, I took it out and used in another PC as an external USB HDD drive and I was able to mount & see all partitions and files within. So I assume Disc is OK.
I would say that your assumption is unfounded. The disk is obviously having some problem; with any luck, it’ll be easy to fix.
The first thing I would do in your situation is to refresh my backup of everything that is on that disk. Make sure that you do not overwrite or delete anything from your most recent backup, as there is certainly a possibility that you will need it. Perhaps the best option is to make a fresh backup onto a new (or at least not previously used for your own backups) drive of everything that you are able to access. Expect some I/O errors on the source while making that copy.
Second comes attempting recovery. With any luck, given the errors, this is a single-sector or few-sectors problem which has caused a small amount of file system corruption, in which case e2fsck should be able to repair most of the damage. Some of your files are likely gone, but with some luck, you might be able to find them in /lost+found under the file system’s mount root (meaning for example /data/lost+found if you mount /dev/sda5 on /data) after having e2fsck do what it can. Otherwise, do a comparison against your most recent backup from before the problems started, and restore relevant files from the backup. (Did I mention backups are useful if bad things ever happen, as they inevitably do?)
Third comes the question of whether you can trust the drive for future use. A few bad sectors don’t have to be catastrophic from the drive’s point of view, but rotational drives about 100 GB in size practically cannot be sourced new today in most form factors, which points to this being a relatively old drive. Personally, I’d probably just accept that the drive has outlived its useful life at this point and get a replacement, but then again I am rather paranoid when it comes to my data; your mileage may vary. You will have to weigh the cost of a replacement drive against the risk of total failure of the drive and subsequent total loss of all the data on the drive.
First of all, make a copy of all important data on the server and verify that the copy is not corrupted.
ESXI 6.5 is deployed … The disks run in RAID 1+0 under an HP b120i controller
ESXI and the RAID controller are indeed two "layers" that can get in the way of talking to the disks directly. At a minimum, for the RAID controller you need to study how it works, what its drivers allow, and what software is available.
If you cannot get through from the native OS, number the disks, take the array apart, and connect the disks directly to a computer running Windows. Windows long ago became the industry standard in data recovery, and the most interesting software is developed for it, regardless of which storage devices are involved. If it offers to initialize or format the disks, or starts a check, decline or stop it.
Download and unpack R.tester: https://rlab.ru/tools/rtester.html
With it you can both view SMART and run maximally detailed read tests that show the state of the surface.
You can also run write tests, but they destroy everything irrecoverably, so prepare beforehand (back up the data or take images of the disks).
I have a server running a customized version of Debian, with a Sun storage RAID attached to it. It has very limited tools, and installing new tools is not allowed.
This is the message I see in dmesg:
end_request: critical target error, dev sda, sector 556782970
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda] Result: hostbyte=0x10 driverbyte=0x08
sd 0:0:0:0: [sda] Sense Key : 0x4 [current]
sd 0:0:0:0: [sda] ASC=0x44 ASCQ=0x0
sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 21 2f d5 7a 00 00 08 00
end_request: critical target error, dev sda, sector 556782970
So it seems there is a bad block at sector 556782970, but I don’t know which hard drive it belongs to, so I cannot tell which one to get replaced.
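The CDB line in the log can be decoded by hand to confirm that it describes the same failing read. The sketch below assumes the standard 10-byte READ(10) layout (opcode 0x28, a big-endian 4-byte LBA in bytes 2 to 5, a big-endian 2-byte transfer length in bytes 7 to 8); the function name is made up for illustration.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "opcode": cdb[0],
        # Logical block address: bytes 2..5, most significant byte first.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Number of blocks requested: bytes 7..8, most significant byte first.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


if __name__ == "__main__":
    # Bytes copied from the "CDB:" line in the dmesg output above.
    print(decode_read10("28 00 21 2f d5 7a 00 00 08 00"))
```

The decoded LBA matches the sector in the end_request line. If sda is the logical drive exported by the RAID controller, as the arcconf usage suggests, this address is relative to the logical drive and does not by itself name a physical disk.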
- The arcconf getlogs command does not reveal any issues with the drives.
- I ran arcconf task start 1 logicaldrive 0 verify_fix, but this didn’t help.
- I ran an e2fsck check. It attempted to fix some inodes, but the above issue remained (I assume because it is a physical issue).
more info: http://pastebin.com/cJ2bUywj
Tools not available: smartctl, badblocks
asked Apr 23, 2014 at 17:32
If you actually use arcconf you can see physical drive status like this:
arcconf getconfig 1 PD
Look for drives with Failed state to identify drives that have been marked as failed. For your reference the output would look something like this:
Device #6
Device is a Hard drive
State : Failed
Block Size : Unknown
Supported : Yes
Reported Channel,Device(T:L) : 0,15(15:0)
Vendor : *MISSING*
Model :
Firmware :
Total Size : 0 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
SSD : No
MaxCache Capable : No
MaxCache Assigned : No
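With many drives, scanning that output by eye gets tedious. Here is a minimal sketch that picks out failed devices from saved arcconf getconfig 1 PD text, assuming each device section starts with a "Device #N" line and carries a "State : ..." field as in the sample above. The function name is made up, and the "Online" state in the demo input is an assumption about what a healthy drive reports.

```python
def failed_devices(arcconf_output):
    """Return the 'Device #N' labels whose State field is 'Failed'."""
    failed = []
    current = None
    for raw in arcconf_output.splitlines():
        line = raw.strip()
        if line.startswith("Device #"):
            current = line  # start of a new device section
        elif ":" in line and current is not None:
            key, _, value = line.partition(":")
            if key.strip() == "State" and value.strip() == "Failed":
                failed.append(current)
    return failed


if __name__ == "__main__":
    sample = """\
Device #5
   Device is a Hard drive
   State : Online
Device #6
   Device is a Hard drive
   State : Failed
   Reported Channel,Device(T:L) : 0,15(15:0)
"""
    print(failed_devices(sample))
```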
answered Dec 8, 2014 at 19:19
ILIV
Based on the end_request: critical target error, dev sda, sector 556782970 line, I assume that /dev/sda is the trouble child. You can find the serial number of that device using:
ls -l /dev/disk/by-id | grep "sda$"
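On a system without grep, or where the result needs to be used in a script, the same lookup can be sketched in Python by resolving the symlinks in /dev/disk/by-id. The function name is made up for illustration. For a disk behind a hardware RAID controller the link names may describe the logical drive and not a physical disk, so treat the result with care.

```python
import os


def ids_for_device(dev_name, by_id_dir="/dev/disk/by-id"):
    """Return the by-id link names that resolve to the given device (e.g. 'sda')."""
    names = []
    for entry in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, entry)
        if not os.path.islink(path):
            continue
        # Links look like ata-<model>_<serial> -> ../../sda
        target = os.path.realpath(path)
        if os.path.basename(target) == dev_name:
            names.append(entry)
    return names


if __name__ == "__main__":
    if os.path.isdir("/dev/disk/by-id"):
        print(ids_for_device("sda"))
```

Matching on the exact base name excludes partitions such as sda1, which is what the trailing "$" in the grep pattern does.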
answered Apr 23, 2014 at 17:58
jkt123
While there might be an answer that gets you the info in the OS, odds are the controller firmware can tell you if you access it during boot.
Another thing I noticed in your cut and paste is that your battery says "failed". I wonder if the stripes are reporting failure because the write cache battery is toast? Though it reporting an actual bad inode would probably mean you have two problems.
Though in my experience with Adaptec controllers, when the battery dies the controller typically disables write caching as a preemptive measure.
answered Apr 23, 2014 at 18:33
MikeAWood
Topic: Disk error: I/O error, dev sda, sector XXXXX
p4sh
When the PC starts I see a lot of errors in dmesg:
https://paste.ubuntu.com/p/YKY74JTwsD/
If I read any individual sector manually, I sometimes get
root@mail:~# hdparm --read-sector 25523880 /dev/sda
/dev/sda:
reading sector 25523880: SG_IO: bad/missing sense data, sb[]: 70 00 03 00 00 00 00 0a 40 51 e1 01 11 04 00 00 00 a8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
succeeded
But sometimes it is just succeeded, that is, the sectors are read.
I checked SMART; it says there are no errors on the disk.
The problem is that after some time one of the file systems (/var) goes read-only and a lot of programs stop working.
What would you advise?
The original poster had not appeared on the forum for more than half a year as of 22/07/2019 (last visit: 23/11/2018). The section moderator decided to close the topic.
(zg_nico)
bearpuh
Quote: "I checked SMART; it says there are no errors on the disk."
And doesn't this tell you anything?
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 20772 25523880
# 2 Short offline Completed: read failure 90% 20677 1057345043
# 3 Short offline Completed: read failure 90% 20677 1057345043
# 4 Short offline Completed: read failure 90% 20677 1057345043
# 5 Short offline Completed: read failure 90% 20677 1057345043
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 29074 24928270
The disks have unreadable sectors.
Back up first, then check with Victoria or badblocks
sudo /usr/sbin/badblocks -o /path/to/file/badblocks.list -b 4096 -s -v /dev/sdX
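The self-test logs above already name the first failing LBA on each disk, so those addresses are the natural ones to re-read individually. A minimal sketch for pulling them out of saved smartctl self-test log text, assuming the column layout shown above (the LBA is the last field of each "read failure" row); the function name is made up for illustration.

```python
def failing_lbas(selftest_log):
    """Return the distinct LBA_of_first_error values from failed self-tests, in order."""
    lbas = []
    for line in selftest_log.splitlines():
        line = line.strip()
        if not line.startswith("#") or "read failure" not in line:
            continue
        last = line.split()[-1]
        # Rows without a failing LBA print "-" in the last column.
        if last.isdigit() and int(last) not in lbas:
            lbas.append(int(last))
    return lbas


if __name__ == "__main__":
    log = """\
# 1 Extended offline Completed: read failure 90% 20772 25523880
# 2 Short offline Completed: read failure 90% 20677 1057345043
# 3 Short offline Completed: read failure 90% 20677 1057345043
"""
    for lba in failing_lbas(log):
        print(lba)
```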
p4sh
That's the thing, the sectors are read (or am I wrong? Please correct me):
root@mail:~# hdparm --read-sector 1873032872 /dev/sda
/dev/sda:
reading sector 1873032872: succeeded
0000 0000 f40f 0c01 4442 4537 4136 3534
3937 3335 6857 5806 1400 0c01 4433 3639
.......
root@mail:~# hdparm --read-sector 148453280 /dev/sda
/dev/sda:
reading sector 148453280: succeeded
bb10 5600 0c00 0102 2e00 0000 ba10 5600
3000 0202 2e2e 0000 bc10 5600 2400 1c01
root@mail:~# hdparm --read-sector 1285929908 /dev/sda
/dev/sda:
reading sector 1285929908: SG_IO: bad/missing sense data, sb[]: 70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 a0 b4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
succeeded
0000 0000 0000 0000 0000 0000 0000 0000
Thank you!
bearpuh
Quote: "That's the thing, the sectors are read"
To be sure of that, you need to check.
I have already written how.
I would also connect the disk to another controller or computer to check.
Sly_tom_cat
Problems reported in SMART cannot be solved by replacing the controller.
Controller problems usually show up in:
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
But it is clean here.
bearpuh
Quote: "Problems reported in SMART cannot be solved by replacing the controller."
Agreed. I was just thrown off by this:
SG_IO: bad/missing sense data
snowin
Quote: "That's the thing, the sectors are read (or am I wrong? Please correct me)"
you are wrong
ReNzRv
It is better to check and repair from the Seagate Tools for DOS boot image,
using the commands Zero All (wipes all sectors) and Long Test (DST), a full check of all sectors with bad sectors remapped at the disk controller level.
p4sh
man hdparm
--read-sector
Reads from the specified sector number, and dumps the contents in hex to standard output. The sector number must be given (base10) after this option. hdparm will issue a
low-level read (completely bypassing the usual block layer read/write mechanisms) for the specified sector. This can be used to definitively check whether a given sector
is bad (media error) or not (doing so through the usual mechanisms can sometimes give false positives).
Quote: "you are wrong"
I don't understand. Could you explain in more detail why reading with hdparm reports "SUCCESS" while the sectors are "unreadable"? Is the software no good?
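One way to look at this question is to decode the sense bytes that hdparm printed next to "succeeded". The sketch below assumes fixed-format SCSI sense data (response code 0x70 or 0x71), where the sense key is the low nibble of byte 2 and the additional sense code and qualifier are bytes 12 and 13. The function name and the short sense key table are illustrative, not exhaustive.

```python
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0xB: "ABORTED COMMAND",
}


def decode_fixed_sense(sb_hex):
    """Decode sense key, ASC and ASCQ from fixed-format sense data (hex string)."""
    sb = bytes.fromhex(sb_hex)
    if len(sb) < 14 or (sb[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sb[2] & 0x0F
    return {
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "other"),
        "asc": sb[12],
        "ascq": sb[13],
    }


if __name__ == "__main__":
    # sb[] bytes copied from the hdparm output for sector 1285929908.
    sb = ("70 00 03 00 00 00 00 0a 40 51 e0 01 11 04 00 00 "
          "a0 b4 00 00 00 00 00 00 00 00 00 00 00 00 00 00")
    print(decode_fixed_sense(sb))
```

For the bytes in this thread the sense key is 0x3 (MEDIUM ERROR) with ASC/ASCQ 0x11/0x04, which SCSI code tables list as an unrecovered read error where auto reallocation failed. Together with the all-zero data dumped for that sector, this suggests the "succeeded" line should not be read as a clean read.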
bearpuh
Quote: "with hdparm we get "SUCCESS", but the sectors are "unreadable"?"
And how long does reading that sector take?
By what principle does Victoria HDD, for example, decide that a sector is "bad"?
Read that sector with Victoria; it may become clearer.
If you want theory, here it is, from the authors of smartmontools: https://www.smartmontools.org/wiki/BadBlockHowto#ext2ext3secondexample
Also note this.
On both disks you have several sectors that are candidates for reallocation.
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 2
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 4
They can be given a "kick": forced reallocation.
The details are in the smartmontools link above.
snowin
Quote: "They can be given a "kick": forced reallocation."
it is enough to simply write to them and read them back, several times if needed
if they are real bad sectors, the drive will reallocate them itself; otherwise they are just so-called "soft bads" and should disappear from SMART
p4sh
What I did:
booted from a live USB, assembled the array, and checked the file system:
e2fsck -ct /dev/…
Ran the tests once more.
Rebooted and am now monitoring the state of the file system.
SMART has also been updated:
Multi_Zone_Error_Rate changed
One sector pending reallocation remains on sda: Current_Pending_Sector 1
/dev/sda
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 24
3 Spin_Up_Time POS--K 179 172 021 - 4033
4 Start_Stop_Count -O--CK 099 099 000 - 1401
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 072 072 000 - 20819
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 876
192 Power-Off_Retract_Count -O--CK 199 199 000 - 758
193 Load_Cycle_Count -O--CK 200 200 000 - 642
194 Temperature_Celsius -O---K 102 081 000 - 45
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 1
198 Offline_Uncorrectable ----CK 200 200 000 - 1
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 1
/dev/sdb
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 209
3 Spin_Up_Time POS--K 191 173 021 - 3416
4 Start_Stop_Count -O--CK 099 099 000 - 1680
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 100 253 000 - 0
9 Power_On_Hours -O--CK 060 060 000 - 29215
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 773
192 Power-Off_Retract_Count -O--CK 200 200 000 - 650
193 Load_Cycle_Count -O--CK 200 200 000 - 1029
194 Temperature_Celsius -O---K 103 091 000 - 44
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 200 200 000 - 4
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 3
I will now check with Victoria (it used to do auto-remap, I think).
Thanks to everyone for the answers; a very useful thread for me!
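To keep an eye on the attributes discussed in this thread between reboots, the tables above can be parsed mechanically. A minimal sketch, assuming the eight-column brief layout shown above (ID, name, flags, value, worst, threshold, fail, raw value); the function name and the choice of watched attributes are illustrative.

```python
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def watched_raw_values(smart_table):
    """Return {attribute_name: raw_value} for the watched SMART attributes."""
    values = {}
    for line in smart_table.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 8 columns.
        if len(parts) >= 8 and parts[0].isdigit() and parts[1] in WATCHED:
            values[parts[1]] = int(parts[7])
    return values


if __name__ == "__main__":
    sda = """\
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 1
198 Offline_Uncorrectable ----CK 200 200 000 - 1
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
"""
    for name, raw in watched_raw_values(sda).items():
        print(f"{name}: {raw}")
```

A nonzero pending or uncorrectable count alongside a zero CRC error count points at the media, while a growing CRC count would point at the cable or link, as noted earlier in the thread.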
snowin
Quote: "I will now check with Victoria (it used to do auto-remap, I think)."
you don't need a remap
for a start, replace the cable
on both drives
regarding
Quote: "That's the thing, the sectors are read (or am I wrong? Please correct me):"
it looks like you don't really understand what you are doing or why
you pick a random sector on the disk, test it for reading with hdparm, and claim that it reads
while you do not test the problem sectors
nevertheless, your random, reckless actions (reassembling the RAID) led to better results
but it is a crude method
I just ordered a new server with a 1TB Samsung SSD and installed Ubuntu 14.04.5 LTS.
After booting into the newly installed system, I see this in my dmesg and /var/log/syslog. Output of grep error /var/log/syslog:
May 12 03:47:34 lf5 kernel: [ 0.373789] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:47:34 lf5 kernel: [ 10.382147] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [ 10.382152] res 40/00:e0:f8:69:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 10.712517] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [ 10.712521] res 40/00:d0:38:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.119541] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [ 11.119545] res 40/00:40:30:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.526336] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 03:47:34 lf5 kernel: [ 11.526341] res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.526345] res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.526348] res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.526351] res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 21.349950] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 03:51:10 lf5 kernel: [ 0.389787] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:51:10 lf5 kernel: [ 10.906423] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [ 10.906429] res 40/00:80:08:00:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [ 11.488276] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [ 11.488281] res 40/00:c0:28:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [ 11.960792] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [ 11.960796] res 40/00:b8:b0:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [ 12.366482] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [ 12.366486] res 40/00:60:e0:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [ 20.918620] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:07:19 lf5 kernel: [ 0.390011] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:07:19 lf5 kernel: [ 10.349119] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [ 10.349124] res 40/00:88:a8:6d:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [ 10.738449] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [ 10.738453] res 40/00:20:60:6b:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [ 11.072972] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [ 11.072976] res 40/00:60:50:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [ 11.471777] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [ 11.471781] res 40/00:48:c8:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [ 20.651217] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:18:16 lf5 kernel: [ 0.389808] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:18:17 lf5 kernel: [ 10.762352] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [ 10.762360] res 40/00:40:08:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 11.338565] res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [ 11.338569] res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [ 11.338572] res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [ 11.338576] res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [ 20.087229] res 41/84:08:b8:14:7d/00:00:63:00:00/00 Emask 0x410 (ATA bus error) <F>
May 12 17:18:17 lf5 kernel: [ 20.298295] ata8.00: error: { ICRC ABRT }
May 12 17:18:17 lf5 kernel: [ 21.176551] sd 7:0:0:0: [sda] tag#0 Add. Sense: Scsi parity error
May 12 17:18:17 lf5 kernel: [ 21.316632] blk_update_request: I/O error, dev sda, sector 1669074520
May 12 17:18:17 lf5 kernel: [ 21.542013] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [ 21.759477] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 22.052681] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 22.347138] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 22.642363] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 22.938868] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 23.239764] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 23.542336] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 23.840288] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 24.138769] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 24.439063] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 24.740494] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 25.047057] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 25.354884] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 25.662079] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 25.967498] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 26.273208] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 26.579035] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 26.884890] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 27.190868] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 27.496523] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 27.801825] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 28.106876] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 28.412223] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 28.717662] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 29.022620] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 29.326675] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 29.629826] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 29.932271] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 30.234666] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 30.537024] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 31.765128] blk_update_request: I/O error, dev sda, sector 1669071496
May 12 17:18:17 lf5 kernel: [ 32.143969] blk_update_request: I/O error, dev sda, sector 1669071504
May 12 17:18:17 lf5 kernel: [ 32.527171] blk_update_request: I/O error, dev sda, sector 1669071512
May 12 17:18:17 lf5 kernel: [ 32.915371] blk_update_request: I/O error, dev sda, sector 1669071544
May 12 17:18:17 lf5 kernel: [ 33.308218] blk_update_request: I/O error, dev sda, sector 1669071552
May 12 17:18:17 lf5 kernel: [ 33.706503] blk_update_request: I/O error, dev sda, sector 1669071520
May 12 17:18:17 lf5 kernel: [ 34.108892] blk_update_request: I/O error, dev sda, sector 1669071528
May 12 17:18:17 lf5 kernel: [ 34.516541] blk_update_request: I/O error, dev sda, sector 1669071536
May 12 17:18:17 lf5 kernel: [ 34.929267] blk_update_request: I/O error, dev sda, sector 1669071368
May 12 17:18:17 lf5 kernel: [ 35.347838] blk_update_request: I/O error, dev sda, sector 1669071376
May 12 17:18:17 lf5 kernel: [ 36.004437] res 41/04:a8:90:d2:89/00:00:5f:00:00/00 Emask 0x401 (device error) <F>
May 12 17:18:17 lf5 kernel: [ 36.257143] ata8.00: error: { ABRT }
May 12 17:18:17 lf5 kernel: [ 37.681581] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 17:18:17 lf5 kernel: [ 37.681586] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681590] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681593] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681596] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681599] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681602] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681605] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681608] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681611] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681615] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681618] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681621] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681624] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681627] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681630] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681633] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681636] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681639] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681642] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681645] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681649] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681652] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681655] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 38.005003] blk_update_request: I/O error, dev sda, sector 1891370112
May 12 17:18:17 lf5 kernel: [ 38.005009] blk_update_request: I/O error, dev sda, sector 1891370120
May 12 17:18:17 lf5 kernel: [ 38.005013] blk_update_request: I/O error, dev sda, sector 1891370128
May 12 17:18:17 lf5 kernel: [ 38.005017] blk_update_request: I/O error, dev sda, sector 1891370136
May 12 17:18:17 lf5 kernel: [ 38.005021] blk_update_request: I/O error, dev sda, sector 1891370144
May 12 17:18:17 lf5 kernel: [ 38.005025] blk_update_request: I/O error, dev sda, sector 1891370152
May 12 17:18:17 lf5 kernel: [ 38.005029] blk_update_request: I/O error, dev sda, sector 1891370160
May 12 17:18:17 lf5 kernel: [ 38.005032] blk_update_request: I/O error, dev sda, sector 1891370168
May 12 17:18:17 lf5 kernel: [ 38.005036] blk_update_request: I/O error, dev sda, sector 1891370176
May 12 17:18:17 lf5 kernel: [ 38.005040] blk_update_request: I/O error, dev sda, sector 1891370184
May 12 17:18:17 lf5 kernel: [ 49.093973] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
I am mostly concerned about these entries: blk_update_request: I/O error, dev sda, sector xxxxxxxxxxx
I ran badblocks -v /dev/sda, which returned no errors.
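For what it's worth, the `res` lines and the `blk_update_request` sectors in the log above can be cross-checked against each other. The sketch below assumes the libata result layout `status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device` and a 48-bit LBA. `decode_res` is a hypothetical helper name used only for illustration, not part of any tool.

```python
import re

# Assumed libata layout of a "res" field:
# status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
RES_RE = re.compile(
    r"res ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def decode_res(line):
    """Return (status, error, lba) from a kernel 'res ...' line, or None."""
    m = RES_RE.search(line)
    if m is None:
        return None
    status = int(m.group(1), 16)
    error, _nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    _hfeat, _hnsect, hlbal, hlbam, hlbah = (int(b, 16) for b in m.group(3).split(":"))
    # 48-bit LBA: the "hob" (high order byte) registers carry bits 24..47.
    lba = (
        (hlbah << 40) | (hlbam << 32) | (hlbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return status, error, lba


line = "[ 22.052681] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)"
status, error, lba = decode_res(line)
print(hex(status), hex(error), lba)
```

Under that assumption the first `res` line decodes to LBA 1669071480, which lies inside the range of failing sectors (1669071368 to 1669071552) reported by `blk_update_request`, so both message types appear to describe the same failed transfers.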
I then ran smartctl --all /dev/sda, which also reported no errors. The output below includes a short self-test.
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 850 EVO 1TB
Serial Number: S3PHNF0JC00710K
LU WWN Device Id: 5 002538 d428254a0
Firmware Version: EMT03B6Q
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat May 12 19:08:22 2018 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 249) Self-test routine in progress...
90% of test remaining.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 512) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       8
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       31
177 Wear_Leveling_Count     0x0013   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   099   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   069   067   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       20
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       25
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       55078112
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 8 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
255 116055040 116120575 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
My question is simple: what do you think might be wrong? The SSD should be brand new. In good conscience, I find it hard to put this server into production with those errors in the logs, even though the box is otherwise behaving normally.
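One detail in the smartctl output above can be checked mechanically: which error-related attributes have a non-zero raw value. This is a minimal sketch, assuming the attribute table keeps the ten whitespace-separated columns shown above. The name `nonzero_raw` and the choice of watched IDs are illustrative assumptions, not a smartmontools feature.

```python
def nonzero_raw(table_text, watched_ids=(5, 183, 187, 199)):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in table_text.splitlines():
        cols = line.split()
        # Data rows have 10 columns and start with a numeric attribute ID;
        # the header row ("ID# ATTRIBUTE_NAME ...") is skipped by this check.
        if len(cols) < 10 or not cols[0].isdigit():
            continue
        attr_id, name, raw = int(cols[0]), cols[1], cols[9]
        if attr_id in watched_ids and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


# Three rows copied from the smartctl output in this post.
sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 099 099 000 Old_age Always - 20
"""
print(nonzero_raw(sample))
```

On these rows only UDMA_CRC_Error_Count is flagged. That attribute counts CRC errors on the SATA link rather than failures of the flash itself, so together with the "ATA bus error" lines and the link negotiating 3.0 Gb/s on a 6.0 Gb/s drive, the cable, connector or port may be worth ruling out first. That is a suggestion to investigate, not a confirmed diagnosis.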
On firmware v2.07(AAUW.5)C3 I likewise could not connect to the router from a Nexus 5 through the my.keenetic app, either via the QR code or via Wi-Fi. It showed a message along the lines of "Connecting to Wi-Fi" (although the phone was already connected to Wi-Fi), and nothing happened.
I updated to v2.09(AAUW.3)A0 and things got much better:
Internet on mobile devices now works normally, and sites open quickly.
After the update I was able to connect to the router through the my.keenetic app, and management is available.
But there are still problems.
1. Ping from the PC to the router: 1 ms (occasionally 2-4 ms). 2.4 GHz channel.
Ping from the Nexus 5 to the router jumps between 9 and 20 ms. Tested on both 2.4 GHz and 5 GHz.
Ping from the PC to 2ip.ru (178.63.151.224): 55-60 ms.
Ping from the Nexus 5 to 2ip.ru (178.63.151.224): 65-75 ms.
Test conditions: the PC, the phone and the router are in the same room, 1.5 metres apart.
2. The errors in the logs are still there:
Jan 07 14:06:32 ndm kernel: end_request: critical target error, dev sda, sector 0
Jan 07 14:06:32 ndm Core::Syslog: last message repeated 21 times.
Jan 07 14:07:02 ndm kernel: end_request: critical target error, dev sda, sector 0
Jan 07 14:07:02 ndm Core::Syslog: last message repeated 23 times.
Edited January 7, 2017 by Cobain
Thanks for the links to that thread, really insightful.
My version info:
modinfo mpt3sas
filename: /lib/modules/4.19.24-Unraid/kernel/drivers/scsi/mpt3sas/mpt3sas.ko.xz
alias: mpt2sas
version: 26.100.00.00
I also tried some commands from that thread.
[email protected]:~# hdparm -I /dev/sdm | grep TRIM
* Data Set Management TRIM supported (limit 8 blocks)
* Deterministic read ZEROs after TRIM
[email protected]:~# hdparm -I /dev/sdj | grep TRIM
* Data Set Management TRIM supported (limit 1 block)
* Deterministic read data after TRIM
[email protected]:~# hdparm -I /dev/sdl | grep TRIM
* Data Set Management TRIM supported (limit 8 blocks)
[email protected]:~# hdparm -I /dev/sdk | grep TRIM
* Data Set Management TRIM supported (limit 1 block)
* Deterministic read data after TRIM
[email protected]:~# fstrim -av
/etc/libvirt: 926.5 MiB (971513856 bytes) trimmed on /dev/loop3
/var/lib/docker: 13.7 GiB (14724616192 bytes) trimmed on /dev/loop2
fstrim: /mnt/cache: FITRIM ioctl failed: Remote I/O error
The reason I say only the Intel doesn't work is that the Unraid logs only mention the Intel one (sdj) as having problems.
Feb 25 16:50:01 unraid kernel: print_req_error: critical target error, dev sdj, sector 232785982
Feb 25 16:50:01 unraid kernel: BTRFS warning (device sdj1): failed to trim 1 device(s), last error -121
I see that /mnt/cache is mounted from /dev/sdj1. Could that be why it complains about sdj above? I thought it would have said sdj1 in both errors.
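As far as I understand it, the two lines come from different layers: the block layer's `print_req_error` names the whole disk (sdj), while BTRFS names the device node the filesystem was mounted from (sdj1). The sketch below ties the two together and decodes the error number. The helper names and the small errno table are assumptions for illustration; the numbers are Linux-specific.

```python
import re

# Small subset of Linux errno numbers (assumed; numbering is Linux-specific).
LINUX_ERRNO = {5: "EIO", 95: "EOPNOTSUPP", 121: "EREMOTEIO"}


def parent_disk(dev):
    """Map a partition name to its disk: sdj1 -> sdj, nvme0n1p2 -> nvme0n1.

    Purely name-based, so numbered whole devices such as loop3 are not handled.
    """
    m = re.fullmatch(r"(nvme\d+n\d+|mmcblk\d+)p\d+", dev)
    if m:
        return m.group(1)
    m = re.fullmatch(r"([a-z]+)\d+", dev)
    return m.group(1) if m else dev


def parse_btrfs_trim_warning(line):
    """Return (device, parent disk, errno name) from a BTRFS trim warning."""
    m = re.search(r"BTRFS warning \(device (\w+)\): .*last error (-?\d+)", line)
    if m is None:
        return None
    dev, code = m.group(1), abs(int(m.group(2)))
    return dev, parent_disk(dev), LINUX_ERRNO.get(code, str(code))


warning = ("Feb 25 16:50:01 unraid kernel: BTRFS warning (device sdj1): "
           "failed to trim 1 device(s), last error -121")
print(parse_btrfs_trim_warning(warning))
```

Error 121 is EREMOTEIO on Linux, which is the same "Remote I/O error" that fstrim printed for /mnt/cache, so both messages plausibly describe the same rejected TRIM on the same physical drive.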
Seems like it’s time to look for one of those 9300-8i soon. Thanks for your help.
Edited February 26, 2019 by Nischi
A friend of mine gave me his external 2TB Seagate HDD which appeared to be somewhat faulty.
And, it is indeed pretty faulty.
First, I tried a lot of "common" commands, spent a few hours googling, tried both Linux and Windows (for chkdsk), and opened the HDD case to plug the drive directly into SATA. I'll add that I do not need to recover the data; I just need to format the disk.
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1,8T 0 disk
Here, sda is the disk; its size, 1,8T, seems correct.
In GParted, the disk only appears to be ~1.9GB. I can create a partition table but I cannot create a valid partition. And even if I could, it could only be 1.9GB.
dd if=/dev/zero of=/dev/sda
dd: error writing '/dev/sda': No space left on device
3782129+0 records in
3782128+0 records out
1936449536 bytes (1,9 GB, 1,8 GiB) copied, 7,04022 s, 275 MB/s
smartctl -a /dev/sda
Read Device Identity failed: Invalid argument
parted -l
Error: Unable to open /dev/sda - unrecognised disk label.
Model: (file)
Disk /dev/sda : 1936MB
Sector size (logical/physical): 512B/512B
Partition table : unknown
dmesg
[ 7925.612174] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 7925.862625] sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 7931.193045] sd 2:0:0:0: [sda] 3809353968 512-byte logical blocks: (1.95 TB/1.77 TiB)
[ 7931.193049] sd 2:0:0:0: [sda] 4096-byte physical blocks
[ 7931.193313] sd 2:0:0:0: [sda] Write Protect is off
[ 7931.193316] sd 2:0:0:0: [sda] Mode Sense: 2f 00 00 00
[ 7931.193593] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 7931.193995] sd 2:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[ 7931.390515] sd 2:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7931.390523] sd 2:0:0:0: [sda] tag#18 Sense Key : Illegal Request [current]
[ 7931.390529] sd 2:0:0:0: [sda] tag#18 Add. Sense: Invalid command operation code
[ 7931.390536] sd 2:0:0:0: [sda] tag#18 CDB: Read(6) 08 00 00 00 08 00
[ 7931.390545] blk_update_request: critical target error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 7931.390558] Buffer I/O error on dev sda, logical block 0, async page read
[ 7931.500384] sd 2:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7931.500451] sd 2:0:0:0: [sda] tag#19 Sense Key : Illegal Request [current]
[ 7931.500461] sd 2:0:0:0: [sda] tag#19 Add. Sense: Invalid command operation code
[ 7931.500472] sd 2:0:0:0: [sda] tag#19 CDB: Read(6) 08 00 00 00 08 00
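To make the failing command in the dmesg output concrete, here is a minimal sketch that decodes the `CDB: Read(6) 08 00 00 00 08 00` bytes, assuming the standard 6-byte READ(6) layout (opcode, 21-bit LBA, transfer length, control). `decode_read6` is a hypothetical helper written for this post.

```python
def decode_read6(cdb_hex):
    """Decode a 6-byte SCSI READ(6) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    # READ(6): 21-bit LBA in the low 5 bits of byte 1 plus bytes 2 and 3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # In READ(6) a transfer length of 0 means 256 blocks.
    blocks = cdb[4] or 256
    return {"lba": lba, "blocks": blocks, "control": cdb[5]}


print(decode_read6("08 00 00 00 08 00"))
```

So the kernel asked for 8 blocks starting at LBA 0, which lines up with the "logical block 0" in the Buffer I/O error line, and the sense data ("Invalid command operation code") suggests the device rejected the READ(6) opcode itself rather than reporting a media error.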
Do you have any idea? I guess the HDD may be dead, but I’m not quite sure.
What I find intriguing is that lsblk reports a size of 1.8 TB while everything else reports 1.9 GB.
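The two sizes can be put side by side using only numbers that already appear in the output above. This is a small arithmetic sketch; the variable names are illustrative, and it assumes dd ran with its default 512-byte block size.

```python
SECTOR = 512  # logical block size from the dmesg capacity line

dmesg_blocks = 3809353968   # "3809353968 512-byte logical blocks"
dd_records_out = 3782128    # "3782128+0 records out" (dd default bs=512)

dmesg_bytes = dmesg_blocks * SECTOR
dd_bytes = dd_records_out * SECTOR

# Capacity the kernel was told about, in bytes, TB and TiB.
print(dmesg_bytes, round(dmesg_bytes / 1e12, 2), round(dmesg_bytes / 2**40, 2))
# Amount dd could write before "No space left on device", in bytes, GB and GiB.
print(dd_bytes, round(dd_bytes / 1e9, 2), round(dd_bytes / 2**30, 2))
# How many times larger the advertised capacity is than the writable size.
print(round(dmesg_bytes / dd_bytes, 1))
```

The kernel's capacity message works out to about 1.95 TB (1.77 TiB), which is what lsblk shows, while dd stopped at exactly the 1936449536 bytes that parted and GParted report. The arithmetic only confirms that the two views disagree by roughly a factor of a thousand; it does not say which layer is misreporting.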
And again, I do not need to recover the previous data (and since I wrote a lot of zeros over it, it is probably gone for good :p). I just want to format the disk to make it usable again.
Thanks for your time