When running smartctl on your hard drive, you often get a plethora of information that can be hard to interpret for inexperienced users. This post attempts to explain the technical reasons behind the error messages. If you’re looking for advice on whether to replace your hard drive, the only guidance I can give you is that it might fail at any time, so you had better back up your data, but it might also run for many years to come. Furthermore, this article does not describe basic SMART WHEN_FAILED checking but rather the interpretation of more subtle signs of possibly impending HDD failures.
One example that is particularly hard to interpret is the device error log storing the last few errors, for example
Error 8910 occurred at disk power-on lifetime: 7257 hours (302 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 1a 00 33 96 61  Error: UNC at LBA = 0x01963300 = 26620672

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 18 00 33 96 40 00      03:09:52.125  READ FPDMA QUEUED
  60 88 10 50 06 11 40 00      03:09:52.125  READ FPDMA QUEUED
  60 08 08 60 ac 5e 40 00      03:09:52.113  READ FPDMA QUEUED
  60 08 00 48 cf 6d 40 00      03:09:52.099  READ FPDMA QUEUED
  60 90 f0 b0 ef e5 40 00      03:09:52.065  READ FPDMA QUEUED
Obviously, the first line shows when this error occurred. The other lines, however, are not as obvious. Let’s examine the next section:
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 1a 00 33 96 61  Error: UNC at LBA = 0x01963300 = 26620672
While this section also shows the content of some registers at the time the error occurred, the interesting part is the error description Error: UNC at LBA = 0x01963300 = 26620672.
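The LBA in that description is not independent information: it can be reassembled from the register bytes printed in the same row. The following is a minimal sketch, assuming the classic 28-bit ATA layout (SN holds bits 0-7, CL bits 8-15, CH bits 16-23 and the low nibble of DH bits 24-27). The function name is made up for illustration and is not part of smartmontools.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register bytes.

    Assumption: classic 28-bit ATA addressing, where only the low
    four bits of DH belong to the address.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register row from the example: SN=00 CL=33 CH=96 DH=61
lba = lba28_from_registers(0x00, 0x33, 0x96, 0x61)
print(hex(lba), lba)  # 0x1963300 26620672
```

The result matches the value smartctl prints, which is a quick way to convince yourself which columns of the register dump carry the address.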
An LBA is a logical block address, i.e. the index of a logical sector on the hard drive. It is shown both in hexadecimal form (0x01963300) and in decimal form (26620672). In order to convert it to a byte address, you need to multiply it by the logical sector size listed at the head of the smartctl output:
Sector Size: 512 bytes logical/physical
In almost all cases, the logical sector size is 512 bytes, so in this example the byte offset would be 26620672 * 512 = 13629784064 bytes, or about 12.69 GiB. In some cases it might be helpful to look up this address in a tool like GParted to see which partition the error occurred in. Also see this smartmontools HOWTO describing this process in detail.
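The conversion above can be sketched in a few lines of Python. The helper names are hypothetical, and the default of 512 bytes is an assumption that should be replaced by the logical sector size smartctl reports for your drive.

```python
def lba_to_byte_offset(lba: int, logical_sector_size: int = 512) -> int:
    """Convert a logical block address to a byte offset on the device."""
    return lba * logical_sector_size


def format_gib(num_bytes: int) -> str:
    """Render a byte count in GiB (2**30 bytes) with two decimals."""
    return f"{num_bytes / 2**30:.2f} GiB"


offset = lba_to_byte_offset(26620672)
print(offset, format_gib(offset))  # 13629784064 12.69 GiB
```

On drives that report different logical and physical sector sizes, the LBA counts logical sectors, so the logical size is the one to multiply by.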
UNC errors
The error message now tells us that an error called UNC occurred at this LBA. UNC is shorthand for UNCorrectable, which means the data read from the hard drive at this LBA was damaged and could not be corrected.
Hard drives do not only store your data itself, but also automatically compute a so-called error-correcting code (ECC). While there are many subtypes of those mathematical codes, they have one aspect in common: given a set of bytes (e.g. the ones stored on the hard drive) which might be slightly damaged (i.e. some 0-bits are now 1-bits or vice versa) and the matching ECC (consisting of a few extra bytes), a suitable decoder can correct a limited number of bit errors. In most cases, an ECC can also detect more errors than it can correct – for example, one specific ECC might be able to correct one bit flip in two bytes, but detect up to three bit flips in two bytes.
If there are more bitflips than the ECC can recover (but not more than it can detect), this results in an unrecoverable error – the UNC. If there are more bitflips than the ECC can detect, anything might happen: Usually, the data that is computed from the ECC will be damaged, or no error might be detected at all.
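To make the three cases (correctable, detectable but uncorrectable, undetected) concrete, here is a toy sketch using an extended Hamming(8,4) code, which corrects one bit flip and detects two. This is purely an illustration chosen for brevity: real hard drives use far stronger codes over whole sectors, and nothing here reflects any specific drive firmware.

```python
def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit         # extra parity bit over the whole word
    return word + [overall]


def decode(code):
    """Return (data, status); data is None when the error is uncorrectable."""
    word = list(code[:7])
    syndrome = 0
    for position in range(1, 8):
        if word[position - 1]:
            syndrome ^= position   # XOR of the positions of all set bits
    overall = 0
    for bit in code:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        if syndrome:
            word[syndrome - 1] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detected only.
        return None, "UNC"
    return [word[2], word[4], word[5], word[6]], status


def flip(code, *indices):
    """Return a copy of the codeword with the given bit indices inverted."""
    damaged = list(code)
    for i in indices:
        damaged[i] ^= 1
    return damaged


original = [1, 0, 1, 1]
code = encode(original)
print(decode(flip(code, 4)))        # ([1, 0, 1, 1], 'corrected')
print(decode(flip(code, 4, 6)))     # (None, 'UNC')
print(decode(flip(code, 2, 4, 5)))  # ([0, 1, 0, 1], 'corrected')
```

The last line shows the dangerous case: three flips exceed what this code can detect, so the decoder reports success while returning data that differs from the original.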
Note that this explanation is highly simplified. For example, ECC data is not necessarily stored as bytes separate from the data; instead, a mathematical function is computed on the data, resulting in a set of bytes that is larger than the original dataset and contains both the data itself and the extra error-recovery data. In other words, the ECC data and the data itself may be mixed together.
This has multiple consequences for the interpretation. Firstly, it means that the data could physically be read, yet it does not appear to be correct.
Other error messages
While UNC errors occur reasonably often, there are other, rarer errors for which you can’t find much documentation.
There is one definitive source for all smartctl error messages: the smartmontools source code.
We can find the error descriptions in ataprint.cpp (also see the GPL license information in the source tarball):
const char *abrt  = "ABRT";   // ABORTED
const char *amnf  = "AMNF";   // ADDRESS MARK NOT FOUND
const char *ccto  = "CCTO";   // COMMAND COMPLETION TIMED OUT
const char *eom   = "EOM";    // END OF MEDIA
const char *icrc  = "ICRC";   // INTERFACE CRC ERROR
const char *idnf  = "IDNF";   // ID NOT FOUND
const char *ili   = "ILI";    // MEANING OF THIS BIT IS COMMAND-SET SPECIFIC
const char *mc    = "MC";     // MEDIA CHANGED
const char *mcr   = "MCR";    // MEDIA CHANGE REQUEST
const char *nm    = "NM";     // NO MEDIA
const char *obs   = "obs";    // OBSOLETE
const char *tk0nf = "TK0NF";  // TRACK 0 NOT FOUND
const char *unc   = "UNC";    // UNCORRECTABLE
const char *wp    = "WP";     // WRITE PROTECTED
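These abbreviations correspond to individual bits of the ER (error) register shown in the error log. The sketch below decodes an ER byte for read-type commands. The bit assignment used (bit 7 ICRC, bit 6 UNC, bit 4 IDNF, bit 2 ABRT) is an assumption based on the usual ATA error register layout, not a copy of the smartmontools tables, and the meaning of a bit can differ per command (for write commands, bit 6 is typically reported as WP instead of UNC).

```python
from typing import Dict, List

# Assumed bit layout for read-type commands; check ataprint.cpp for the
# authoritative per-command tables.
READ_ERROR_BITS: Dict[int, str] = {
    7: "ICRC",
    6: "UNC",
    4: "IDNF",
    2: "ABRT",
}


def decode_read_error_register(er: int) -> List[str]:
    """Return the abbreviations of all known bits set in the ER byte."""
    return [
        name
        for bit, name in sorted(READ_ERROR_BITS.items(), reverse=True)
        if er & (1 << bit)
    ]


print(decode_read_error_register(0x40))  # ['UNC']
print(decode_read_error_register(0x84))  # ['ICRC', 'ABRT']
```

The first call uses the ER value 40 from the example error above and yields the same UNC that smartctl printed.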
Realistically, you’ll only encounter a few of these errors even if you work with hard disks professionally. Some of these errors, like MC, MCR or NM, are also related to hot-swapping of hard drives and do not necessarily represent errors related to hard drive health itself.
One important error is ICRC – the interface CRC error. This means that errors are being detected on the IDE/SATA or PCIe bus the hard drive is connected to. Although this is rare and might be caused by the HDD itself, it might mean that your chipset (the hardware controlling e.g. SATA) is damaged – in this case, replacing the hard drive would not fix the issue. Another possible cause is an intermittent cable connection.
How severe are those errors?
Over the life of most hard drives, especially consumer models, errors will occur – more often so in portable devices, where high acceleration forces are more likely to be encountered.
What separates a good hard drive from one at the end of its life (excluding those that fail without warning) is often the frequency of new errors. If you look at the total lifetime of the HDD, i.e. Power_On_Hours or similar:
9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 8586
and compare the value (in this case 8586) with the lifetime at the last error,
Error 8911 occurred at disk power-on lifetime: 7257 hours
in this case 7257, you can see that over a thousand HDD operational hours have passed since the last error. This suggests that there is no mechanical defect which could result in destruction of the hard drive, but rather a couple of defective or damaged sectors. UNC errors do not even necessarily mean that the sectors are physically damaged.
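This comparison is easy to automate. The following is a rough sketch that extracts both numbers from the two smartctl lines quoted above. The function names are made up for illustration, and it assumes the raw value of Power_On_Hours is a plain integer in the last column, which is not the case for every drive.

```python
import re


def power_on_hours(attribute_line: str) -> int:
    """Read the raw value (last column) of a Power_On_Hours attribute line."""
    return int(attribute_line.split()[-1])


def last_error_hours(error_line: str) -> int:
    """Read the power-on lifetime from an 'Error N occurred at ...' line."""
    match = re.search(r"power-on lifetime:\s*(\d+)\s*hours", error_line)
    if match is None:
        raise ValueError("not a smartctl error header line")
    return int(match.group(1))


attribute = "9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 8586"
error = "Error 8911 occurred at disk power-on lifetime: 7257 hours"

print(power_on_hours(attribute) - last_error_hours(error))  # 1329
```

A large difference, as here, means the drive has been running for a long time without logging new errors.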
Often hard drive errors are triggered when files are read that are accessed only very rarely (such as archived video files that are only opened every few years). When enough bit flips have accumulated in such files for any reason, this can result in a larger number of HDD errors appearing at once.
Another indicator is the total number of errors the hard drive has encountered, i.e. 8911 in
Error 8911 occurred at disk power-on lifetime: 7257 hours
or in
ATA Error Count: 8911 (device log contains only the most recent five errors)
While this number is not shown for all hard drives, a very high number or a rapidly growing one indicates that there is some physical issue with the drive. Issues relating to only a few bad sectors induce a sudden jump in the error counter, but after that the counter should stay roughly constant. Note, however, that there can be other reasons for a high error counter, for example a bad or intermittent physical connection to the hard drive.
Also see this previous post on how to fix bad HDD sectors.
-
#1
Hi,
Five year old system (disks replaced three years ago) Specs:
MB: Supermicro X11SSM-F
CPU: Intel Xeon E3-1220 V5 Skylake
RAM: 32 GB ECC: 2x Samsung DDR4-2133 CL15 ECC SC, 16 GB
Boot: Supermicro SATA DOM
Disks: 6x WDC WD60EFRX-68L0BN1 in raidz2
OS: TrueNAS-12.0-RELEASE (f862218137)
Received this error today after scrub:
Code:
root@freenas:~ # zpool status
  pool: freenas-boot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:01:23 with 0 errors on Fri Jun 11 03:46:23 2021
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada0p2      ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 8K in 10:37:45 with 0 errors on Tue Jun 15 12:37:48 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/788fc1ad-5f48-11e8-bd8c-0cc47ab4d97a  ONLINE       0     0     0
            gptid/13a2c7ac-5f91-11e8-be7a-0cc47ab4d97a  ONLINE       0     0     0
            gptid/28cd3c61-5fd8-11e8-b6db-0cc47ab4d97a  ONLINE       0     0     0
            gptid/e0c32132-601c-11e8-8ec6-0cc47ab4d97a  ONLINE       0     0     0
            gptid/9a4156da-605d-11e8-b0f3-0cc47ab4d97a  ONLINE       3     0     0
            gptid/60556717-60a4-11e8-bb0f-0cc47ab4d97a  ONLINE       0     0     0

errors: No known data errors
root@freenas:~ #
Smartctl shows:
Code:
root@freenas:~ # smartctl -x /dev/ada5
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RC3 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68L0BN1
Serial Number:    WD-WX31D87DYY38
LU WWN Device Id: 5 0014ee 20f4b16d2
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jun 15 14:29:34 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 6344) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 717) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   197   197   021    -    9133
  4 Start_Stop_Count        -O--CK   100   100   000    -    32
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   064   064   000    -    26369
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    31
192 Power-Off_Retract_Count -O--CK   200   200   000    -    6
193 Load_Cycle_Count        -O--CK   200   200   000    -    565
194 Temperature_Celsius     -O---K   117   106   000    -    35
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      54  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 5
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 [4] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: UNC at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 a0 00 02 8e d0 96 20 40 08 26d+00:18:12.496  READ FPDMA QUEUED
  60 00 08 00 98 00 02 4c 9d c3 c8 40 08 26d+00:18:12.496  READ FPDMA QUEUED
  60 00 50 00 90 00 01 dd 8c 4f 28 40 08 26d+00:18:12.496  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08 26d+00:18:12.495  READ LOG EXT
  60 00 40 00 80 00 02 8e d0 96 20 40 08 26d+00:18:08.000  READ FPDMA QUEUED

Error 4 [3] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: UNC at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 80 00 02 8e d0 96 20 40 08 26d+00:18:08.000  READ FPDMA QUEUED
  60 00 08 00 78 00 02 4c 9d c3 c8 40 08 26d+00:18:08.000  READ FPDMA QUEUED
  60 00 50 00 70 00 01 dd 8c 4f 28 40 08 26d+00:18:08.000  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08 26d+00:18:07.999  READ LOG EXT
  60 00 40 00 60 00 02 8e d0 96 20 40 08 26d+00:18:03.505  READ FPDMA QUEUED

Error 3 [2] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: UNC at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 60 00 02 8e d0 96 20 40 08 26d+00:18:03.505  READ FPDMA QUEUED
  60 00 08 00 58 00 02 4c 9d c3 c8 40 08 26d+00:18:03.505  READ FPDMA QUEUED
  60 00 50 00 50 00 01 dd 8c 4f 28 40 08 26d+00:18:03.505  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08 26d+00:18:03.503  READ LOG EXT
  61 00 10 00 40 00 01 f3 8a ae 48 40 08 26d+00:17:59.020  WRITE FPDMA QUEUED

Error 2 [1] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: WP at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 10 00 40 00 01 f3 8a ae 48 40 08 26d+00:17:59.020  WRITE FPDMA QUEUED
  61 00 40 00 38 00 01 f3 8a 1b 18 40 08 26d+00:17:59.020  WRITE FPDMA QUEUED
  61 00 40 00 30 00 01 f3 8a 1a c0 40 08 26d+00:17:59.020  WRITE FPDMA QUEUED
  60 00 40 00 28 00 02 8e d0 96 20 40 08 26d+00:17:59.020  READ FPDMA QUEUED
  60 00 08 00 20 00 02 4c 9d c3 c8 40 08 26d+00:17:59.019  READ FPDMA QUEUED

Error 1 [0] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: UNC at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 d8 00 01 dd 8c 60 f0 40 08 26d+00:17:54.568  READ FPDMA QUEUED
  60 00 50 00 d0 00 01 dd 8c 4f 28 40 08 26d+00:17:54.562  READ FPDMA QUEUED
  60 00 38 00 c8 00 01 dd 8c 4d 20 40 08 26d+00:17:54.558  READ FPDMA QUEUED
  60 00 08 00 c0 00 01 dd 8b ff 20 40 08 26d+00:17:54.552  READ FPDMA QUEUED
  60 00 08 00 b8 00 01 dd 8b fb 38 40 08 26d+00:17:54.546  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       127         -
# 2  Extended offline    Completed without error       00%        13         -
# 3  Extended offline    Aborted by host               90%         0         -
# 4  Conveyance offline  Completed without error       00%         0         -
# 5  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    35 Celsius
Power Cycle Min/Max Temperature:     20/38 Celsius
Lifetime    Min/Max Temperature:     20/46 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (280)

Index    Estimated Time   Temperature Celsius
 281    2021-06-15 06:32    37  ******************
 ...    ..(136 skipped).    ..  ******************
 418    2021-06-15 08:49    37  ******************
 419    2021-06-15 08:50    36  *****************
 ...    ..( 22 skipped).    ..  *****************
 442    2021-06-15 09:13    36  *****************
 443    2021-06-15 09:14    35  ****************
 ...    ..( 80 skipped).    ..  ****************
  46    2021-06-15 10:35    35  ****************
  47    2021-06-15 10:36    38  *******************
 ...    ..(106 skipped).    ..  *******************
 154    2021-06-15 12:23    38  *******************
 155    2021-06-15 12:24    37  ******************
 ...    ..(124 skipped).    ..  ******************
 280    2021-06-15 14:29    37  ******************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 2) ==
0x01  0x008  4              31  ---  Lifetime Power-On Resets
0x01  0x010  4           26369  ---  Power-on Hours
0x01  0x018  6     71888865320  ---  Logical Sectors Written
0x01  0x020  6      1818551784  ---  Number of Write Commands
0x01  0x028  6    535154745801  ---  Logical Sectors Read
0x01  0x030  6      2580298900  ---  Number of Read Commands
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4           26188  ---  Spindle Motor Power-on Hours
0x03  0x010  4           26175  ---  Head Flying Hours
0x03  0x018  4             572  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4             476  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               5  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              35  ---  Current Temperature
0x05  0x010  1              36  ---  Average Short Term Temperature
0x05  0x018  1              32  ---  Average Long Term Temperature
0x05  0x020  1              46  ---  Highest Temperature
0x05  0x028  1              20  ---  Lowest Temperature
0x05  0x030  1              43  ---  Highest Average Short Term Temperature
0x05  0x038  1              21  ---  Lowest Average Short Term Temperature
0x05  0x040  1              40  ---  Highest Average Long Term Temperature
0x05  0x048  1              26  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4             168  ---  Number of Hardware Resets
0x06  0x010  4              62  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
Index                LBA    Hours
    0         4303484304        -
    1         4303484305        -
    2         4303484306        -
    3         4303484307        -
    4         4303484308        -
    5         4303484309        -
    6         4303484310        -
    7         4303484311        -
    8         4303484312        -
    9         4303484313        -
   10         4303484314        -
   11         4303484315        -
   12         4303484316        -
   13         4303484317        -
   14         4303484318        -
   15         4303484319        -
   16         4321384408        -
   17         4321384409        -
   18         4321384410        -
   19         4321384411        -
   20         4321384412        -
   21         4321384413        -
   22         4321384414        -
   23         4321384415        -
   24         4321384424        -
   25         4321384425        -
   26         4321384426        -
   27         4321384427        -
   28         4321384428        -
   29         4321384429        -
   30         4321384430        -
... (937 entries not shown)

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            4  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4     15166624  Vendor specific
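One way to read the Pending Defects log in this output: the drive reports 512-byte logical and 4096-byte physical sectors, so eight consecutive LBAs share one physical sector. The sketch below groups the 31 listed LBAs accordingly. It assumes that physical sectors start at LBAs divisible by 8 (no alignment offset), which this output does not confirm, and the function name is made up for illustration.

```python
from collections import defaultdict

LOGICAL_SECTOR = 512
PHYSICAL_SECTOR = 4096
LBAS_PER_PHYSICAL = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8


def group_by_physical_sector(lbas):
    """Map physical sector index -> list of pending logical LBAs in it.

    Assumption: physical sector boundaries fall on multiples of 8 LBAs.
    """
    groups = defaultdict(list)
    for lba in lbas:
        groups[lba // LBAS_PER_PHYSICAL].append(lba)
    return dict(groups)


# The 31 entries listed in the Pending Defects log (indices 0 to 30).
pending = (
    list(range(4303484304, 4303484320))
    + list(range(4321384408, 4321384416))
    + list(range(4321384424, 4321384431))
)

for physical, members in sorted(group_by_physical_sector(pending).items()):
    print(physical, len(members))
# 537935538 8
# 537935539 8
# 540173051 8
# 540173053 7
```

Under that assumption, the listed entries boil down to four physical sectors; the last group is incomplete only because the listing is cut off after index 30.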
Could the ada5 disk be failing? What should be my next move? Do a smartctl long test on all the disks?
-
#2
Could the ada5 disk be failing? What should be my next move? Do a smartctl long test on all the disks?
Since you haven’t run long smart in a while that would be the first if it was me
-
#3
Since you haven’t run long smart in a while that would be the first if it was me
Thank you. Running the test now.
-
#4
Since you haven’t run long smart in a while that would be the first if it was me
Ok. I ran the test and it failed. Should I assume a broken disk and replace it or something else?
smartctl:
Code:
root@freenas:~ # smartctl -x /dev/ada5
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RC3 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD60EFRX-68L0BN1
Serial Number:    WD-WX31D87DYY38
LU WWN Device Id: 5 0014ee 20f4b16d2
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun 16 08:33:24 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 118) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                ( 6344) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 717) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   197   197   021    -    9133
  4 Start_Stop_Count        -O--CK   100   100   000    -    32
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   064   064   000    -    26388
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    31
192 Power-Off_Retract_Count -O--CK   200   200   000    -    6
193 Load_Cycle_Count        -O--CK   200   200   000    -    566
194 Temperature_Celsius     -O---K   118   106   000    -    34
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      54  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 5
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 [4] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: UNC at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 a0 00 02 8e d0 96 20 40 08 26d+00:18:12.496  READ FPDMA QUEUED
  60 00 08 00 98 00 02 4c 9d c3 c8 40 08 26d+00:18:12.496  READ FPDMA QUEUED
  60 00 50 00 90 00 01 dd 8c 4f 28 40 08 26d+00:18:12.496  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08 26d+00:18:12.495  READ LOG EXT
  60 00 40 00 80 00 02 8e d0 96 20 40 08 26d+00:18:08.000  READ FPDMA QUEUED

Error 4 [3] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: UNC at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 80 00 02 8e d0 96 20 40 08 26d+00:18:08.000  READ FPDMA QUEUED
  60 00 08 00 78 00 02 4c 9d c3 c8 40 08 26d+00:18:08.000  READ FPDMA QUEUED
  60 00 50 00 70 00 01 dd 8c 4f 28 40 08 26d+00:18:08.000  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08 26d+00:18:07.999  READ LOG EXT
  60 00 40 00 60 00 02 8e d0 96 20 40 08 26d+00:18:03.505  READ FPDMA QUEUED

Error 3 [2] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: UNC at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 40 00 60 00 02 8e d0 96 20 40 08 26d+00:18:03.505  READ FPDMA QUEUED
  60 00 08 00 58 00 02 4c 9d c3 c8 40 08 26d+00:18:03.505  READ FPDMA QUEUED
  60 00 50 00 50 00 01 dd 8c 4f 28 40 08 26d+00:18:03.505  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08 26d+00:18:03.503  READ LOG EXT
  61 00 10 00 40 00 01 f3 8a ae 48 40 08 26d+00:17:59.020  WRITE FPDMA QUEUED

Error 2 [1] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: WP at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 10 00 40 00 01 f3 8a ae 48 40 08 26d+00:17:59.020  WRITE FPDMA QUEUED
  61 00 40 00 38 00 01 f3 8a 1b 18 40 08 26d+00:17:59.020  WRITE FPDMA QUEUED
  61 00 40 00 30 00 01 f3 8a 1a c0 40 08 26d+00:17:59.020  WRITE FPDMA QUEUED
  60 00 40 00 28 00 02 8e d0 96 20 40 08 26d+00:17:59.020  READ FPDMA QUEUED
  60 00 08 00 20 00 02 4c 9d c3 c8 40 08 26d+00:17:59.019  READ FPDMA QUEUED

Error 1 [0] occurred at disk power-on lifetime: 26360 hours (1098 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 dd 8c 4f 70 40 00  Error: UNC at LBA = 0x1dd8c4f70 = 8011927408

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 d8 00 01 dd 8c 60 f0 40 08 26d+00:17:54.568  READ FPDMA QUEUED
  60 00 50 00 d0 00 01 dd 8c 4f 28 40 08 26d+00:17:54.562  READ FPDMA QUEUED
  60 00 38 00 c8 00 01 dd 8c 4d 20 40 08 26d+00:17:54.558  READ FPDMA QUEUED
  60 00 08 00 c0 00 01 dd 8b ff 20 40 08 26d+00:17:54.552  READ FPDMA QUEUED
  60 00 08 00 b8 00 01 dd 8b fb 38 40 08 26d+00:17:54.546  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%     26378         4303484304
# 2  Extended offline    Completed without error       00%       127         -
# 3  Extended offline    Completed without error       00%        13         -
# 4  Extended offline    Aborted by host               90%         0         -
# 5  Conveyance offline  Completed without error       00%         0         -
# 6  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 34 Celsius
Power Cycle Min/Max Temperature: 20/38 Celsius
Lifetime Min/Max Temperature: 20/46 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (407)
Index Estimated Time Temperature Celsius
408 2021-06-16 00:36 35 ****************
... ..(171 skipped). .. ****************
102 2021-06-16 03:28 35 ****************
103 2021-06-16 03:29 34 ***************
... ..( 69 skipped). .. ***************
173 2021-06-16 04:39 34 ***************
174 2021-06-16 04:40 35 ****************
... ..(232 skipped). .. ****************
407 2021-06-16 08:33 35 ****************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 2) ==
0x01 0x008 4 31 --- Lifetime Power-On Resets
0x01 0x010 4 26388 --- Power-on Hours
0x01 0x018 6 71923078276 --- Logical Sectors Written
0x01 0x020 6 1819607711 --- Number of Write Commands
0x01 0x028 6 535156219835 --- Logical Sectors Read
0x01 0x030 6 2580368784 --- Number of Read Commands
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 26206 --- Spindle Motor Power-on Hours
0x03 0x010 4 26193 --- Head Flying Hours
0x03 0x018 4 573 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 479 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 5 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 34 --- Current Temperature
0x05 0x010 1 35 --- Average Short Term Temperature
0x05 0x018 1 32 --- Average Long Term Temperature
0x05 0x020 1 46 --- Highest Temperature
0x05 0x028 1 20 --- Lowest Temperature
0x05 0x030 1 43 --- Highest Average Short Term Temperature
0x05 0x038 1 21 --- Lowest Average Short Term Temperature
0x05 0x040 1 40 --- Highest Average Long Term Temperature
0x05 0x048 1 26 --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 60 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 168 --- Number of Hardware Resets
0x06 0x010 4 62 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c)
Index LBA Hours
0 4303484304 -
1 4303484305 -
2 4303484306 -
3 4303484307 -
4 4303484308 -
5 4303484309 -
6 4303484310 -
7 4303484311 -
8 4303484312 -
9 4303484313 -
10 4303484314 -
11 4303484315 -
12 4303484316 -
13 4303484317 -
14 4303484318 -
15 4303484319 -
16 4321384408 -
17 4321384409 -
18 4321384410 -
19 4321384411 -
20 4321384412 -
21 4321384413 -
22 4321384414 -
23 4321384415 -
24 4321384424 -
25 4321384425 -
26 4321384426 -
27 4321384427 -
28 4321384428 -
29 4321384429 -
30 4321384430 -
... (937 entries not shown)
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 2 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 4 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 15231590 Vendor specific
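For anyone reading the register dump in this output by hand: the LBA that smartctl prints after "Error: UNC at LBA" can be recovered from the register row itself. The sketch below assumes the column order given in the log's legend (three LBA_48 bytes followed by LH, LM and LL, most significant byte first). The function name decode_lba48 is made up for this illustration and is not part of smartmontools.

```python
def decode_lba48(register_row: str) -> int:
    """Decode the 48-bit LBA from one 'registers were' row of the extended error log.

    Assumed field order: ER, '--', ST, two COUNT bytes, three LBA_48 bytes
    (upper half of the address), then LH, LM, LL (lower half), DV, DC.
    """
    fields = register_row.split()
    # Fields 5..10 are the six address bytes, already in big-endian order.
    lba_bytes = fields[5:11]
    return int("".join(lba_bytes), 16)


row = "40 -- 51 00 00 00 01 dd 8c 4f 70 40 00"
lba = decode_lba48(row)
print(hex(lba), lba)
```

Applied to the row logged for all five errors, this gives the same address that smartctl reports next to it.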
-
#5
Ok. I ran the test and it failed. Should I assume a broken disk and replace it or something else?
When SMART can't complete the test and reports the error shown in the output, the disk is dying and should be replaced. If the test had been aborted by the host, other things might be at work, but here it's the disk.
Remember to do a proper burn-in on the replacement.
-
#6
Thank you. I RMA'd the disk and am currently doing a burn-in on the new one.
A little son came to his father, and the little one asked: "Dad, is this good or very bad?"
# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.0.0-12-server] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: ST1000DM003-9YN162
Serial Number: Z1D0L2FY
LU WWN Device Id: 5 000c50 03fcee1af
Firmware Version: CC4B
User Capacity: 1 000 204 886 016 bytes [1,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Fri May 4 10:09:06 2012 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 575) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 108) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 105 100 006 Pre-fail Always - 235420362
3 Spin_Up_Time 0x0003 097 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 24
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 072 057 030 Pre-fail Always - 8622211739
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 997
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 24
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 082 082 000 Old_age Always - 18
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 074 066 045 Old_age Always - 26 (Min/Max 23/34)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 54
194 Temperature_Celsius 0x0022 026 040 000 Old_age Always - 26 (0 21 0 0)
197 Current_Pending_Sector 0x0012 100 090 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 090 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 107253923316683
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 57499686168628
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 86599643247192
SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 18 occurred at disk power-on lifetime: 961 hours (40 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 15d+04:39:26.598 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 15d+04:39:26.598 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 15d+04:39:26.598 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 15d+04:39:26.598 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 15d+04:39:26.598 SET FEATURES [Set transfer mode]
Error 17 occurred at disk power-on lifetime: 961 hours (40 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 15d+04:39:23.672 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 15d+04:39:23.672 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 15d+04:39:23.672 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 15d+04:39:23.672 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 15d+04:39:23.672 SET FEATURES [Set transfer mode]
Error 16 occurred at disk power-on lifetime: 961 hours (40 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 15d+04:39:20.721 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 15d+04:39:20.721 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 15d+04:39:20.721 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 15d+04:39:20.721 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 15d+04:39:20.721 SET FEATURES [Set transfer mode]
Error 15 occurred at disk power-on lifetime: 961 hours (40 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 15d+04:39:17.770 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 15d+04:39:17.770 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 15d+04:39:17.770 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 15d+04:39:17.770 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 15d+04:39:17.770 SET FEATURES [Set transfer mode]
Error 14 occurred at disk power-on lifetime: 961 hours (40 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 15d+04:39:14.811 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 15d+04:39:14.811 SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 00 15d+04:39:14.811 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 00 15d+04:39:14.811 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 15d+04:39:14.810 SET FEATURES [Set transfer mode]
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
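The Powered_Up_Time stamps in the log above can be turned into intervals to see how closely the errors follow one another. This is a minimal sketch that assumes the DDd+hh:mm:SS.sss format described in the legend (with the day part optional). parse_powered_up_time is a hypothetical helper, not part of smartmontools.

```python
import re
from datetime import timedelta

# Optional "<days>d+" prefix, then hh:mm:SS.sss
_POWERED_UP = re.compile(r"(?:(\d+)d\+)?(\d+):(\d{2}):(\d{2})\.(\d{3})")


def parse_powered_up_time(text: str) -> timedelta:
    """Convert a Powered_Up_Time field such as '15d+04:39:26.598' to a timedelta."""
    match = _POWERED_UP.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised Powered_Up_Time: {text!r}")
    days, hours, minutes, seconds, millis = (
        int(group) if group else 0 for group in match.groups()
    )
    return timedelta(
        days=days, hours=hours, minutes=minutes,
        seconds=seconds, milliseconds=millis,
    )


# Time stamps of the failing READ FPDMA QUEUED in errors 14 to 18, oldest first.
stamps = [
    "15d+04:39:14.811",
    "15d+04:39:17.770",
    "15d+04:39:20.721",
    "15d+04:39:23.672",
    "15d+04:39:26.598",
]
times = [parse_powered_up_time(s) for s in stamps]
gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
print(gaps)
```

For errors 14 to 18 every gap comes out at just under three seconds. That would fit a single failing read being retried several times in a row, although the log itself does not say who issued the retries.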
Hello again. This translation of the following article was prepared especially for students of the "Linux Administrator" course. Let's go!
What is S.M.A.R.T.?
S.M.A.R.T. (which stands for Self-Monitoring, Analysis, and Reporting Technology) is a technology built into storage devices such as hard drives and SSDs. Its main task is health monitoring.
In practice, S.M.A.R.T. tracks several parameters during normal drive operation, such as the number of read errors, the drive's spin-up time, and even environmental conditions. In addition, S.M.A.R.T. can run tests on the drive.
Ideally, S.M.A.R.T. makes it possible to anticipate predictable failures, such as those caused by mechanical wear or degradation of the disk surface, as well as unpredictable failures caused by some unexpected defect. Since drives usually do not fail abruptly, S.M.A.R.T. helps the operating system or the system administrator identify drives that are about to fail, so that they can be replaced before data is lost.
What S.M.A.R.T. is not
All of this is great, of course. However, S.M.A.R.T. is not a crystal ball. It cannot predict a failure with 100% certainty, nor can it guarantee that a drive will not fail without warning. At best, S.M.A.R.T. should be used to estimate the likelihood of a failure.
Given the statistical nature of failure prediction, S.M.A.R.T. is of particular interest to companies that operate large numbers of storage devices. Dedicated studies have even been carried out to find out how accurately S.M.A.R.T. can predict failures and signal the need to replace drives in data centers or server farms.
In 2016, Microsoft and Pennsylvania State University conducted a study focused on SSDs.
According to that study, some S.M.A.R.T. attributes are good indicators of imminent failure. In particular, the paper mentions:
Reallocated (Realloc) sector count:
Although the underlying technologies are radically different, this indicator remains relevant in the SSD world as well as in the hard drive world. Note that, because of the wear-leveling algorithms used in SSDs, once several sectors have failed it is highly likely that more will fail soon.
Program/Erase (P/E) fail count:
This is a symptom of a problem with the underlying flash hardware, where the drive could not erase a block or store data in it. Because the manufacturing process is imperfect, a few such errors are to be expected. However, flash memory endures only a limited number of program/erase cycles. For that reason, a sudden increase in the number of these events may indicate that the drive is reaching its end of life, and more memory cells can be expected to fail.
CRC and uncorrectable errors ("Data Error"):
Events of this type can be caused by storage errors or by problems with the drive's internal communication link. This indicator counts both corrected errors (so no problem is reported to the host system) and uncorrected errors (blocks that the drive has reported to the host system as unreadable). In other words, correctable errors are invisible to the operating system, yet they still affect drive performance and make a sector reallocation more likely.
SATA downshift count:
Because of transient interference, problems with the communication link between the drive and the host, or internal drive problems, the SATA interface may switch to a lower signaling rate. Dropping the link below its nominal speed has an obvious impact on drive performance. This indicator is therefore most significant when it correlates with one or more of the preceding indicators.
According to the study, 62% of the failed SSDs showed at least one of the above symptoms. Put the other way round, 38% of the studied drives failed without showing any of these symptoms. The study does not mention whether S.M.A.R.T. reported failures through any other "symptoms" in those cases. For that reason, these figures cannot be compared directly with the 36% of failures without warning reported in the Google paper.
The Microsoft and Pennsylvania State University study does not disclose the models of the drives examined but, according to the authors, most of the drives come from the same vendor and span several generations.
The study also noted significant differences in reliability between models. For example, the "worst" model studied shows a 20% failure rate within 9 months of the first reallocation error, and up to 36% within 9 months of the first occurrence of data errors. The "worst" model was also the oldest generation of drives examined in the paper.
On the other hand, for the same two symptoms, drives of the newer generation failed in 3% and 20% of cases, respectively. It is hard to tell whether these figures are explained by improvements in drive design and manufacturing, or whether drive aging plays a role here.
The most interesting point made in the paper (I mentioned it earlier) is that a growing number of logged errors can be a warning sign:
"Symptoms preceding SSD failures are highly likely to manifest intensely and progress rapidly, cutting the drive's remaining life to a few months."
In other words, one occasional error reported by S.M.A.R.T. should definitely not be treated as a signal of imminent failure. However, when a healthy SSD starts reporting more and more errors, a failure in the short or medium term should be expected.
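The difference between a one-off error and a growing error count can be checked mechanically if an error counter is sampled at regular intervals. The sketch below is an illustration only: the function name is_escalating, the idea of counting consecutive increases, and the default of two increases are assumptions made here and are not taken from the study. The sample series are invented for the example.

```python
def is_escalating(counts, min_increases=2):
    """Return True if the counter rose in each of the most recent intervals.

    counts        -- error counter readings, oldest first, sampled periodically
    min_increases -- how many consecutive recent rises count as a warning
                     (an arbitrary value chosen for this sketch)
    """
    streak = 0
    for earlier, later in zip(counts, counts[1:]):
        # Extend the streak on a rise, reset it when the counter stayed flat.
        streak = streak + 1 if later > earlier else 0
    return streak >= min_increases


# Invented sample data: one isolated error versus a steadily growing count.
isolated = [0, 0, 1, 1, 1]
growing = [0, 0, 1, 3, 7]
print(is_escalating(isolated), is_escalating(growing))
```

A real monitoring setup would probably also weigh how fast the counter grows and which attribute it belongs to. This version only separates "happened once" from "keeps happening".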
But how do you find out what state your SSD is in right now? Whether to satisfy your curiosity or because you want to start keeping a close eye on your drives, you can use the smartctl monitoring tool.
Using smartctl to monitor the health of your SSD in Linux
To keep track of your drive's S.M.A.R.T. status, I suggest using the smartctl tool, which is part of the smartmontools package (at least on Debian/Ubuntu).
sudo apt install smartmontools
smartctl is a command-line tool, which is especially handy when you need to automate data collection, for example from your servers.
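Because smartctl is a command-line tool, a script can call it and inspect the result. The sketch below is only an illustration and rests on two assumptions that go beyond this article: smartmontools 7.0 or later, which added the --json option (the 6.6 build used in this article predates it), and a smart_status.passed field in the JSON report. The function name smart_passed and its run parameter are hypothetical; the parameter exists only so that the call can be replaced by a stub on machines without smartctl.

```python
import json
import subprocess


def smart_passed(device: str, run=subprocess.run):
    """Return True or False for the overall-health result, or None if it is absent.

    Assumes smartctl >= 7.0 (for --json) and usually needs root privileges.
    """
    # smartctl's exit status is a bit mask, so a non-zero code does not
    # necessarily mean the command failed to run; check=True is not used.
    result = run(
        ["smartctl", "--json", "-H", device],
        capture_output=True,
        text=True,
    )
    report = json.loads(result.stdout)
    return report.get("smart_status", {}).get("passed")


# Example call (needs a real device and sufficient privileges):
#     smart_passed("/dev/sdb")
```

Collecting this value from several servers is then a matter of running the function on each host and storing the result. How the results are transported is outside the scope of this sketch.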
The first step in using smartctl is to check whether your drive has S.M.A.R.T. and whether the tool supports it:
sh$ sudo smartctl -i /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 7200.4
Device Model: ST9500420AS
Serial Number: 5VJAS7FL
LU WWN Device Id: 5 000c50 02fa0b800
Firmware Version: D005SDM1
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Mon Mar 12 15:54:43 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
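In a script, the two "SMART support is:" lines at the end of the output above are the ones to look at. This is a minimal sketch that works on the text of smartctl -i as shown here; the function name smart_support is hypothetical, and the wording of the lines is assumed to match this output.

```python
def smart_support(info_text: str) -> dict:
    """Extract availability and enablement from the text printed by `smartctl -i`."""
    values = [
        line.split(":", 1)[1].strip()
        for line in info_text.splitlines()
        if line.strip().startswith("SMART support is:")
    ]
    return {
        # "Unavailable ..." does not start with "Available", so it yields False.
        "available": any(v.startswith("Available") for v in values),
        "enabled": any(v == "Enabled" for v in values),
    }


sample = """\
Local Time is: Mon Mar 12 15:54:43 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
"""
print(smart_support(sample))
```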
As you can see, my laptop's internal hard drive does support S.M.A.R.T., and it is enabled. So how do you get the S.M.A.R.T. status now? Have any errors been recorded?
The -a option prints a report with "all S.M.A.R.T. information about the disk":
sh$ sudo smartctl -i -a /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 7200.4
Device Model: ST9500420AS
Serial Number: 5VJAS7FL
LU WWN Device Id: 5 000c50 02fa0b800
Firmware Version: D005SDM1
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Mon Mar 12 15:56:58 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 110) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 099 006 Pre-fail Always - 29694249
3 Spin_Up_Time 0x0003 100 098 085 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 095 095 020 Old_age Always - 5413
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 3
7 Seek_Error_Rate 0x000f 071 060 030 Pre-fail Always - 51710773327
9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 26423
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 096 037 020 Old_age Always - 4836
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 072 072 000 Old_age Always - 28
188 Command_Timeout 0x0032 100 096 000 Old_age Always - 4295033738
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 056 042 045 Old_age Always In_the_past 44 (Min/Max 21/44 #22)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 184
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 104
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 395415
194 Temperature_Celsius 0x0022 044 058 000 Old_age Always - 44 (0 13 0 0 0)
195 Hardware_ECC_Recovered 0x001a 050 045 000 Old_age Always - 29694249
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 25131 (246 202 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 3028413736
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 1613088055
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 3
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 3 occurred at disk power-on lifetime: 21171 hours (882 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 00:45:12.580 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 00:45:12.580 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 00:45:12.579 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 00:45:12.571 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 00:45:12.543 READ FPDMA QUEUED
Error 2 occurred at disk power-on lifetime: 21171 hours (882 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 00:45:09.456 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 00:45:09.451 READ FPDMA QUEUED
61 00 08 ff ff ff 4f 00 00:45:09.450 WRITE FPDMA QUEUED
60 00 00 ff ff ff 4f 00 00:45:08.878 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 00:45:08.856 READ FPDMA QUEUED
Error 1 occurred at disk power-on lifetime: 21131 hours (880 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 00 ff ff ff 4f 00 05:52:18.809 READ FPDMA QUEUED
61 00 00 7e fb 31 45 00 05:52:18.806 WRITE FPDMA QUEUED
60 00 00 ff ff ff 4f 00 05:52:18.571 READ FPDMA QUEUED
ea 00 00 00 00 00 a0 00 05:52:18.529 FLUSH CACHE EXT
61 00 08 ff ff ff 4f 00 05:52:18.527 WRITE FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 10904 -
# 2 Short offline Completed without error 00% 12 -
# 3 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Understanding the output of smartctl commands
The output contains a lot of information that is not always easy to understand. The most interesting part is probably the one labeled “Vendor Specific SMART Attributes with Thresholds”. It reports various statistics collected by the S.M.A.R.T. device and lets you compare those values (current or all-time worst) with a vendor-defined threshold.
For example, here is how my disk reports reallocated sectors:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 3
Note the “Pre-fail” type. It means the attribute tracks anomalies, so if its normalized value falls to or below the threshold, a failure is likely. The other category, “Old_age”, is used for attributes that reflect normal wear.
The last field (here with the value “3”) is the raw value of the attribute as reported by the drive. This number usually has a physical meaning. Here it is the actual number of reallocated sectors. For other attributes it may be a temperature in degrees Celsius, a time in hours or minutes, or the number of times the drive met a certain condition.
In addition to the raw value, a S.M.A.R.T.-capable drive must report “normalized values” (the VALUE, WORST, and THRESH fields). These values are normalized to the range 1-254 (0-255 for thresholds). The drive firmware performs this normalization using some internal algorithm. Moreover, different manufacturers may normalize the same attribute differently. Most values are presented as percentages, where higher is better, but this is not always the case. When a parameter is lower than or equal to the manufacturer-specified threshold, the drive is considered failed with respect to that attribute. Keeping in mind all the caveats from the first part of this article, when a “pre-fail” attribute does fail, the drive will most likely fail soon.
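The comparison between the normalized value and the threshold can be sketched in a few lines of Python. This is only an illustration: the names `SmartAttribute`, `parse_attribute_row`, and `is_failing` are made up for this example, and the parser assumes the ten-column layout of the attribute table shown above.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int      # normalized current value
    worst: int      # normalized all-time worst value
    thresh: int     # vendor-defined threshold
    attr_type: str  # "Pre-fail" or "Old_age"
    raw: str        # raw value, kept as text because it may contain spaces


def parse_attribute_row(row: str) -> SmartAttribute:
    """Split one row of the attribute table into its columns."""
    # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    # RAW_VALUE can look like "37 (Min/Max 11/68)", so split at most 9 times.
    f = row.split(None, 9)
    return SmartAttribute(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[6], f[9])


def is_failing(attr: SmartAttribute) -> bool:
    """True when the normalized value is at or below the vendor threshold."""
    return attr.value <= attr.thresh


row = "5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 3"
attr = parse_attribute_row(row)
print(attr.name, attr.value, attr.thresh, is_failing(attr))
```

For the row above, the normalized value 100 is well above the threshold 36, so the attribute is not failing even though three sectors have been reallocated.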
As a second example, take the “seek error rate”:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
7 Seek_Error_Rate 0x000f 071 060 030 Pre-fail Always - 51710773327
In fact (and this is the main problem with S.M.A.R.T. reporting), the exact meaning of each attribute's fields is understood only by the vendor. In my case, Seagate uses a logarithmic scale to normalize the value. So “71” means roughly one error per 10 million requests (10 to the power 7.1). Funnily enough, the all-time worst was one error per 1 million requests (10 to the power 6).
If I understand it correctly, this means my drive's heads are now positioned more accurately than before. I have not monitored this drive closely, so my reading of the data is quite subjective. Perhaps the drive simply needed some “breaking in” after it was put into service? Or maybe it is a consequence of mechanical wear of the parts, so that there is now less friction? In any case, whatever the reason, this value is more a performance indicator than an early failure warning, so it does not worry me much.
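The logarithmic reading used above can be worked through numerically. The sketch below assumes the interpretation given in this article, namely one error per 10^(VALUE/10) requests. That formula is the author's reading of Seagate's normalization, not a documented specification, and `requests_per_error` is a name invented for the example.

```python
def requests_per_error(normalized: int) -> float:
    """Requests per seek error, assuming VALUE is ten times a base-10 exponent."""
    return 10 ** (normalized / 10)


# VALUE and WORST columns of the Seek_Error_Rate row shown above.
for label, normalized in (("current", 71), ("worst", 60)):
    print(f"{label}: about one error per {requests_per_error(normalized):,.0f} requests")
```

Under that assumption the current value works out to a little over 12 million requests per error, which the text rounds to 10 million, and the worst value to 1 million.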
Apart from the above and three highly suspicious errors logged about six months ago, this drive is in surprisingly good shape (according to S.M.A.R.T.) for a stock laptop drive that has been powered on for more than 1100 days (26423 hours).
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 26423
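The conversion from the raw Power_On_Hours value to days is plain integer division. The helper below (the name `lifetime` is made up for this example) formats the result the same way the error log entries do.

```python
def lifetime(hours: int) -> str:
    """Format power-on hours as 'N hours (D days + H hours)'."""
    days, rest = divmod(hours, 24)
    return f"{hours} hours ({days} days + {rest} hours)"


print(lifetime(26423))  # prints "26423 hours (1100 days + 23 hours)"
```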
Out of curiosity, I ran the same test on a much newer laptop equipped with an SSD:
sh$ sudo smartctl -i /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-32-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA THNSNK256GVN8
Serial Number: 17FS131LTNLV
LU WWN Device Id: 5 00080d 9109b2ceb
Firmware Version: K8XA4103
User Capacity: 256 060 514 304 bytes [256 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: M.2
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Mar 13 01:03:23 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
The first thing that stands out is that, although the device supports S.M.A.R.T., it is not in the smartctl database. This does not prevent the tool from collecting data from the SSD, but it cannot report the exact meaning of the various vendor-specific attributes:
sh$ sudo smartctl -a /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-32-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 11) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 100 100 050 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0013 100 100 050 Pre-fail Always - 0
7 Unknown_SSD_Attribute 0x000b 100 100 050 Pre-fail Always - 0
8 Unknown_SSD_Attribute 0x0005 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 171
10 Unknown_SSD_Attribute 0x0013 100 100 050 Pre-fail Always - 0
12 Power_Cycle_Count 0x0012 100 100 000 Old_age Always - 105
166 Unknown_Attribute 0x0012 100 100 000 Old_age Always - 0
167 Unknown_Attribute 0x0022 100 100 000 Old_age Always - 0
168 Unknown_Attribute 0x0012 100 100 000 Old_age Always - 0
169 Unknown_Attribute 0x0013 100 100 010 Pre-fail Always - 100
170 Unknown_Attribute 0x0013 100 100 010 Pre-fail Always - 0
173 Unknown_Attribute 0x0012 200 200 000 Old_age Always - 0
175 Program_Fail_Count_Chip 0x0013 100 100 010 Pre-fail Always - 0
192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age Always - 18
194 Temperature_Celsius 0x0023 063 032 020 Pre-fail Always - 37 (Min/Max 11/68)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
240 Unknown_SSD_Attribute 0x0013 100 100 050 Pre-fail Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Above is the output for a brand-new SSD. The data is understandable even without normalization or meta-information for the vendor-specific data, as in my case with “Unknown_SSD_Attribute”. I can only hope that later versions of smartctl will include this drive model in the database, so that I can better identify potential problems.
Test your SSD in Linux with smartctl
So far we have looked at data collected during normal operation of the drive. However, the S.M.A.R.T. protocol also supports several offline testing commands for running diagnostics on demand.
Unless stated otherwise, offline testing can run during normal disk operation. Since the test and host I/O requests compete, disk performance will drop for the duration of the test. The S.M.A.R.T. specification defines several kinds of offline tests:
Short offline test (-t short)
This test checks the electrical and mechanical performance as well as the read performance of the disk. A short offline test usually takes only a few minutes (typically 2 to 10).
Extended offline test (-t long)
This test takes considerably longer. Usually it is simply a more thorough version of the short offline test. In addition, it scans the entire disk surface for data errors with no time limit. The duration of the test is proportional to the size of the disk.
Conveyance offline test (-t conveyance)
This test suite is offered as a relatively quick way to check for damage that may have occurred during transport of the device.
Here are examples taken from the same drives as above. I'll let you guess which is which:
sh$ sudo smartctl -t short /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-32-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Mon Mar 12 18:06:17 2018
Use smartctl -X to abort test.
The test is now running. Let's wait for it to finish and look at the result:
sh$ sudo sh -c 'sleep 120 && smartctl -l selftest /dev/sdb'
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.10.0-32-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 171 -
Let's run the same test on the other drive:
sh$ sudo smartctl -t short /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Mon Mar 12 21:59:39 2018
Use smartctl -X to abort test.
Once again, sleep for two minutes and look at the result:
sh$ sudo sh -c 'sleep 120 && smartctl -l selftest /dev/sdb'
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 26429 -
# 2 Short offline Completed without error 00% 10904 -
# 3 Short offline Completed without error 00% 12 -
# 4 Short offline Completed without error 00% 0 -
Interestingly, in this case it looks like the drive and computer manufacturers had already tested the drive (at lifetimes of 0 hours and 12 hours). I was definitely far less concerned about the drive's health than they were. So, since I have already shown the short tests, I will also run the extended one to see how it goes.
sh$ sudo smartctl -t long /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 110 minutes for test to complete.
Test will complete after Tue Mar 13 00:09:08 2018
Use smartctl -X to abort test.
Apparently we will have to wait much longer this time than for the short test. So let's see:
sh$ sudo bash -c 'sleep $((110*60)) && smartctl -l selftest /dev/sdb'
[sudo] password for sylvain:
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 20% 26430 810665229
# 2 Short offline Completed without error 00% 26429 -
# 3 Short offline Completed without error 00% 10904 -
# 4 Short offline Completed without error 00% 12 -
# 5 Short offline Completed without error 00% 0 -
In this last test, note the difference between the results of the short and the extended test, even though they were run one right after the other. Well, maybe this drive is not in such good shape after all! Note that the test stopped after the first read error. So if you want an exhaustive report of all read errors, you will have to resume the test after each error. I encourage you to look at the very well written smartctl(8) manual page for more information about the -t select,N-max and -t select options, so that you can do this:
sh$ sudo smartctl -t select,810665230-max /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".
SPAN STARTING_LBA ENDING_LBA
0 810665230 976773167
Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.
Testing has begun.
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Selective offline Completed without error 00% 26432 -
# 2 Extended offline Completed: read failure 20% 26430 810665229
# 3 Short offline Completed without error 00% 26429 -
# 4 Short offline Completed without error 00% 10904 -
# 5 Short offline Completed without error 00% 12 -
# 6 Short offline Completed without error 00% 0 -
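Resuming after each read error amounts to reading LBA_of_first_error from the newest failed entry and starting a selective test one block later. The Python sketch below does only the text processing. The function names are invented for this example, and the regular expression assumes the self-test log layout shown in the listings above.

```python
import re
from typing import Optional


def first_failed_lba(selftest_log: str) -> Optional[int]:
    """Return LBA_of_first_error of the newest 'read failure' entry, or None."""
    for line in selftest_log.splitlines():
        # Entry layout: "# 1 Extended offline Completed: read failure 20% 26430 810665229"
        m = re.match(r"#\s*\d+\s+.*read failure\s+\d+%\s+\d+\s+(\d+)\s*$", line)
        if m:
            return int(m.group(1))
    return None


def resume_argument(lba: int) -> str:
    """Build the value for 'smartctl -t' that resumes just past the bad block."""
    return f"select,{lba + 1}-max"


log = """\
# 1 Extended offline Completed: read failure 20% 26430 810665229
# 2 Short offline Completed without error 00% 26429 -
"""
lba = first_failed_lba(log)
if lba is not None:
    print("sudo smartctl -t", resume_argument(lba), "/dev/sdb")
```

The printed command matches the one run by hand above. Starting the test itself is deliberately left out of the sketch.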
Conclusion
S.M.A.R.T. is definitely a technology worth adding to your toolbox for monitoring the health of your servers' disks. You should also take a look at the S.M.A.R.T. Disk Monitoring Daemon smartd(8), which can help you automate monitoring through syslog reporting.
Given the statistical nature of failure prediction, I am not sure that aggressive S.M.A.R.T. monitoring is of much use on personal computers. Remember that whatever the drive, it will fail one day, and, as we saw earlier, in one third of cases it will do so without warning. So nothing protects the integrity of your data better than RAID and backups!
SMART Error Log Version: 1
ATA Error Count: 34 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 34 occurred at disk power-on lifetime: 17821 hours (742 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 8c 9c 67 8e e6 Error: UNC 140 sectors at LBA = 0x068e679c = 109995932
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 28 67 8e e6 00 01:55:17.600 READ DMA
c8 00 00 28 67 8e e6 00 01:55:12.500 READ DMA
c8 00 00 28 67 8e e6 00 01:55:07.400 READ DMA
c8 00 00 28 67 8e e6 00 01:55:02.200 READ DMA
c8 00 00 28 67 8e e6 00 01:54:57.100 READ DMA
Error 33 occurred at disk power-on lifetime: 17821 hours (742 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 8c 9c 67 8e e6 Error: UNC 140 sectors at LBA = 0x068e679c = 109995932
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 28 67 8e e6 00 01:55:12.500 READ DMA
c8 00 00 28 67 8e e6 00 01:55:07.400 READ DMA
c8 00 00 28 67 8e e6 00 01:55:02.200 READ DMA
c8 00 00 28 67 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 66 8e e6 00 01:54:57.100 READ DMA
Error 32 occurred at disk power-on lifetime: 17821 hours (742 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 8c 9c 67 8e e6 Error: UNC 140 sectors at LBA = 0x068e679c = 109995932
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 28 67 8e e6 00 01:55:07.400 READ DMA
c8 00 00 28 67 8e e6 00 01:55:02.200 READ DMA
c8 00 00 28 67 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 66 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 65 8e e6 00 01:54:57.100 READ DMA
Error 31 occurred at disk power-on lifetime: 17821 hours (742 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 8c 9c 67 8e e6 Error: UNC 140 sectors at LBA = 0x068e679c = 109995932
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 28 67 8e e6 00 01:55:02.200 READ DMA
c8 00 00 28 67 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 66 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 65 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 64 8e e6 00 01:54:57.100 READ DMA
Error 30 occurred at disk power-on lifetime: 17821 hours (742 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 8c 9c 67 8e e6 Error: UNC 140 sectors at LBA = 0x068e679c = 109995932
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 28 67 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 66 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 65 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 64 8e e6 00 01:54:57.100 READ DMA
c8 00 00 28 63 8e e6 00 01:54:57.100 READ DMA
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 10% 17823 109995932
# 2 Extended offline Aborted by host 90% 17822 -
# 3 Extended offline Completed: read failure 30% 17818 109995932
# 4 Short offline Completed without error 00% 17814 -
# 5 Extended offline Completed: read failure 10% 17814 109995932
# 6 Extended offline Completed: read failure 10% 17813 109995932
# 7 Extended offline Completed without error 00% 16713 -
# 8 Short offline Completed without error 00% 16048 -
# 9 Extended offline Aborted by host 90% 16047 -
#10 Extended offline Completed without error 00% 13652 -
#11 Short offline Completed without error 00% 13649 -
#12 Short offline Completed without error 00% 11098 -
#13 Short offline Completed without error 00% 10755 -
#14 Extended offline Completed without error 00% 9490 -
#15 Short offline Completed without error 00% 9489 -
Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
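The register line in the error entries above can be decoded by hand, which shows where the reported LBA comes from. The sketch below assumes the classic 28-bit layout (low nibble of DH, then CH, CL, SN) and that bit 6 of the error register is the UNC flag. The function names are invented for this example.

```python
def decode_lba28(registers: str) -> int:
    """Rebuild the 28-bit LBA from an 'ER ST SC SN CL CH DH' register line."""
    er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in registers.split())
    # The low nibble of DH holds LBA bits 27..24; CH, CL and SN hold bits 23..0.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def is_uncorrectable(registers: str) -> bool:
    """True when bit 6 (UNC) is set in the error register, the first byte."""
    return bool(int(registers.split()[0], 16) & 0x40)


regs = "40 51 8c 9c 67 8e e6"  # register line of Error 34 above
print(hex(decode_lba28(regs)), decode_lba28(regs), is_uncorrectable(regs))
```

For this register line the result is 0x68e679c, or 109995932 in decimal, the same LBA that smartctl prints next to "Error: UNC". An all-ones result such as 0x0fffffff, seen in some logs, suggests the real address did not fit into 28 bits.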
I have a 300G Western Digital Raptor that recently started showing UNC errors in SMART. Does anyone with experience know whether I should replace it and claim the warranty from WD?
Details of smartctl -a as follows:
smartctl 5.41 2011-06-09 r3365 [FreeBSD 8.2-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital VelociRaptor
Device Model: WDC WD3000HLFS-01G6U0
Serial Number: WD-WXD0C79C8807
LU WWN Device Id: 5 0014ee 0ac3cfaf0
Firmware Version: 04.04V01
User Capacity: 300,069,052,416 bytes [300 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Apr 19 16:03:33 2012 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 4800) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 59) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 195 195 051 Pre-fail Always - 49036
3 Spin_Up_Time 0x0003 199 196 021 Pre-fail Always - 3008
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 425
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x000e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 10292
10 Spin_Retry_Count 0x0012 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 404
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 268
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 426
194 Temperature_Celsius 0x0022 117 100 000 Old_age Always - 30
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 4
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1
SMART Error Log Version: 1
ATA Error Count: 749 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 749 occurred at disk power-on lifetime: 6972 hours (290 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 44 cb 53 40 Error: UNC at LBA = 0x0053cb44 = 5491524
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 40 10 a6 4e 58 01 08 00:16:23.812 READ FPDMA QUEUED
60 08 10 9e 32 5b 00 08 00:16:17.646 READ FPDMA QUEUED
60 08 10 9e 32 5b 00 08 00:16:17.645 READ FPDMA QUEUED
ef 02 00 00 00 00 00 08 00:16:17.645 SET FEATURES [Enable write cache]
60 08 10 9e 32 5b 00 08 00:16:11.412 READ FPDMA QUEUED
Error 748 occurred at disk power-on lifetime: 6972 hours (290 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 44 cb 53 40 Error: UNC at LBA = 0x0053cb44 = 5491524
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 10 9e 32 5b 00 08 00:16:11.412 READ FPDMA QUEUED
60 08 10 9e 32 5b 00 08 00:16:11.412 READ FPDMA QUEUED
ef 02 00 00 00 00 00 08 00:16:11.412 SET FEATURES [Enable write cache]
60 00 30 1e cb 53 06 08 00:16:05.199 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:16:05.180 READ FPDMA QUEUED
Error 747 occurred at disk power-on lifetime: 6972 hours (290 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 44 cb 53 40 Error: UNC at LBA = 0x0053cb44 = 5491524
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 1e cb 53 06 08 00:16:05.199 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:16:05.180 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:16:05.178 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:16:05.178 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:16:05.178 READ FPDMA QUEUED
Error 746 occurred at disk power-on lifetime: 6972 hours (290 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 44 cb 53 40 Error: UNC at LBA = 0x0053cb44 = 5491524
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 1e cb 53 06 08 00:15:58.945 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:15:58.945 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:15:58.945 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:15:58.945 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:15:58.944 READ FPDMA QUEUED
Error 745 occurred at disk power-on lifetime: 6972 hours (290 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 44 cb 53 40 Error: UNC at LBA = 0x0053cb44 = 5491524
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 1e cb 53 06 08 00:15:52.727 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:15:52.727 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:15:52.727 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:15:52.727 READ FPDMA QUEUED
60 00 30 1e cb 53 06 08 00:15:52.726 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
lexikon
Is the HDD dying? 5 SMART errors
Webmin keeps complaining about 5 errors in SMART.
Here is the output of smartctl --all /dev/sda1:
smartctl version 5.38 [i386-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.8 family
Device Model: ST3250823A
Serial Number: 3ND24PH4
Firmware Version: 3.03
User Capacity: 250 059 350 016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Dec 13 18:53:36 2009 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 84) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 052 044 006 Pre-fail Always - 201912783
3 Spin_Up_Time 0x0003 098 098 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 099 099 020 Old_age Always - 1174
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 082 060 030 Pre-fail Always - 199773789
9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 5629
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 099 099 020 Old_age Always - 1621
194 Temperature_Celsius 0x0022 043 049 000 Old_age Always - 43 (0 16 0 0)
195 Hardware_ECC_Recovered 0x001a 052 043 000 Old_age Always - 201912783
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 5
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 5 occurred at disk power-on lifetime: 4529 hours (188 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 4e 59 0d e0 Error: ICRC, ABRT 1 sectors at LBA = 0x000d594e = 874830
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 47 59 0d e0 00 02:14:15.475 READ DMA EXT
35 00 80 2f 80 0b e0 00 02:14:15.110 WRITE DMA EXT
35 00 80 af 7f 0b e0 00 02:14:15.109 WRITE DMA EXT
35 00 80 2f 7f 0b e0 00 02:14:15.109 WRITE DMA EXT
35 00 80 af 7e 0b e0 00 02:14:15.109 WRITE DMA EXT
Error 4 occurred at disk power-on lifetime: 4529 hours (188 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 4e 59 0d e0 Error: ICRC, ABRT 1 sectors at LBA = 0x000d594e = 874830
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 47 59 0d e0 00 02:09:41.449 READ DMA EXT
35 00 08 9f 29 61 e0 00 02:09:41.386 WRITE DMA EXT
35 00 08 f7 20 61 e0 00 02:09:41.386 WRITE DMA EXT
35 00 08 e7 20 61 e0 00 02:09:41.354 WRITE DMA EXT
35 00 08 c7 20 61 e0 00 02:09:41.354 WRITE DMA EXT
Error 3 occurred at disk power-on lifetime: 834 hours (34 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 df 52 60 e0 Error: UNC at LBA = 0x006052df = 6312671
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 b0 07 51 60 e0 00 00:07:01.877 READ DMA EXT
c8 00 01 00 00 00 e0 00 00:07:01.877 READ DMA
c8 00 01 00 00 00 e0 00 00:07:01.877 READ DMA
ca 00 08 37 00 5e e0 00 00:07:01.859 WRITE DMA
c8 00 01 00 00 00 e0 00 00:07:01.858 READ DMA
Error 2 occurred at disk power-on lifetime: 834 hours (34 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 df 52 60 e0 Error: UNC at LBA = 0x006052df = 6312671
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 b0 07 51 60 e0 00 00:07:01.877 READ DMA EXT
c8 00 01 00 00 00 e0 00 00:07:01.877 READ DMA
c8 00 01 00 00 00 e0 00 00:07:01.877 READ DMA
ca 00 08 2f 00 5e e0 00 00:07:01.859 WRITE DMA
c8 00 01 00 00 00 e0 00 00:07:01.858 READ DMA
Error 1 occurred at disk power-on lifetime: 834 hours (34 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 df 52 60 e0 Error: UNC at LBA = 0x006052df = 6312671
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 d8 b0 07 51 60 e0 00 00:06:54.168 READ DMA EXT
c8 d8 80 cf 4f 60 e0 00 00:06:54.168 READ DMA
25 d8 88 47 4d 60 e0 00 00:06:54.163 READ DMA EXT
c8 d8 40 47 49 60 e0 00 00:06:54.163 READ DMA
c8 d8 08 3f 49 60 e0 00 00:06:54.154 READ DMA
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5587 -
# 2 Short offline Aborted by host 60% 5587 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The drive is split into two partitions; smartctl --all /dev/sda2 shows the same errors.
I wish I understood any of this! Please tell me what to do. And what are those 49 days, is that how long the drive has left to live?
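On the "49 days" question: the figure is not a remaining-lifetime estimate. The legend in the output says Powered_Up_Time "wraps" after 49.710 days, which is what you would expect from a millisecond counter stored in 32 bits. A quick check, assuming that 32-bit width (the smartctl output itself does not state it):

```python
# Assumption: Powered_Up_Time is a millisecond counter held in 32 bits,
# so it rolls over to zero after 2**32 milliseconds.
MS_PER_DAY = 24 * 60 * 60 * 1000

wrap_days = 2**32 / MS_PER_DAY
print(f"counter wraps after {wrap_days:.3f} days")
```

The result matches the 49.710 days quoted in the legend, so the number describes the timestamp format, not the health of the drive.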
Re: HDD dying? 5 SMART errors
rm_ » 14.12.2009 00:02
destr wrote (13.12.2009 22:51): "The numbers are scary in general. Raw_Read_Error_Rate, Seek_Error_Rate: back up your data. Read errors and seek errors close to 200 million, that is bad."
These are perfectly normal numbers for Seagate; it is a quirk of their SMART implementation.
What really is not encouraging is this:
40 51 00 df 52 60 e0 Error: UNC at LBA = 0x006052df = 6312671
I recommend:
After the test has been running for an hour or two, look at the results (smartctl -a). If it says Test complete: read error, fill the whole disk with zeros and run the test again.
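For anyone wondering how smartctl gets from the register dump to "UNC at LBA = ...", here is a minimal sketch. It assumes the standard ATA error register bit layout and 28-bit LBA addressing (low nibble of DH holds LBA bits 27:24). The function names are made up for illustration and are not part of smartctl.

```python
# A subset of the ATA error register bits (hypothetical helper, not smartctl code).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_error_register(er: int) -> list[str]:
    """Return the names of the known bits set in the ER register."""
    return [name for bit, name in sorted(ERROR_BITS.items(), reverse=True) if er & bit]


def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Assemble a 28-bit LBA from the SN, CL, CH and DH registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers from the log line "40 51 00 df 52 60 e0"
print(decode_error_register(0x40), lba28(0xDF, 0x52, 0x60, 0xE0))
# Registers from the log line "84 51 01 4e 59 0d e0"
print(decode_error_register(0x84), lba28(0x4E, 0x59, 0x0D, 0xE0))
```

UNC means the drive could not read the sector even after error correction, so it points at the media. ICRC is a checksum failure on the interface, which more often points at the cable or connector than at the platters.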
Re: HDD dying? 5 SMART errors
Bizdelnick » 14.12.2009 00:53
rm_ wrote (14.12.2009 00:02): "These are perfectly normal numbers for Seagate; it is a quirk of their SMART implementation."
Confirmed. I see the same picture on my drive.
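A commonly repeated community explanation for the huge Seagate numbers (an assumption here; neither this thread nor the smartctl output confirms it) is that the 48-bit raw value packs two counters: an error count in the upper 16 bits and an operation count in the lower 32 bits. Under that assumption, the values from the dump above split like this:

```python
def split_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw value into (errors, operations).

    Assumed layout: upper 16 bits = error count, lower 32 bits = operation count.
    This is a community interpretation, not a documented format.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


for name, raw in [("Raw_Read_Error_Rate", 201912783), ("Seek_Error_Rate", 199773789)]:
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: errors={errors}, operations={operations}")
```

Both raw values are below 2**32, so under this reading the error part is zero and the big number is just a count of operations.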
Bizdelnick
- Модератор
- Сообщения: 19825
- Статус: nulla salus bello
- ОС: Debian GNU/Linux
Re: HDD дохнет? 5 ошибок SMART
Сообщение
Bizdelnick » 14.12.2009 02:13
lexikon писал(а): ↑
14.12.2009 01:35
а что делать с ошибками? как их вообще убрать тогда чтобы не маячили!
Я только про цифры, не про ошибки.
А так — неплохо бы и правда проверить, лучше родной сигейтовской утилой. Есть линуксовая версия, но она вроде только для SCSI, так что из-под доса.
Пишите правильно:
в консоли вку́пе (с чем-либо) в общем вообще |
в течение (часа) новичок нюанс по умолчанию |
приемлемо проблема пробовать трафик |
-
lexikon
- Сообщения: 128
Re: HDD дохнет? 5 ошибок SMART
Сообщение
lexikon » 15.12.2009 00:07
запустил тест, ввел smartctl -a /dev/sda
нужно это?
Код: Выделить всё
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
But it still shows these errors:
Error 5 occurred at disk power-on lifetime: 4529 hours (188 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 4e 59 0d e0 Error: ICRC, ABRT 1 sectors at LBA = 0x000d594e = 874830
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 47 59 0d e0 00 02:14:15.475 READ DMA EXT
35 00 80 2f 80 0b e0 00 02:14:15.110 WRITE DMA EXT
35 00 80 af 7f 0b e0 00 02:14:15.109 WRITE DMA EXT
35 00 80 2f 7f 0b e0 00 02:14:15.109 WRITE DMA EXT
35 00 80 af 7e 0b e0 00 02:14:15.109 WRITE DMA EXT
Error 4 occurred at disk power-on lifetime: 4529 hours (188 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 4e 59 0d e0 Error: ICRC, ABRT 1 sectors at LBA = 0x000d594e = 874830
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 47 59 0d e0 00 02:09:41.449 READ DMA EXT
35 00 08 9f 29 61 e0 00 02:09:41.386 WRITE DMA EXT
35 00 08 f7 20 61 e0 00 02:09:41.386 WRITE DMA EXT
35 00 08 e7 20 61 e0 00 02:09:41.354 WRITE DMA EXT
35 00 08 c7 20 61 e0 00 02:09:41.354 WRITE DMA EXT
Error 3 occurred at disk power-on lifetime: 834 hours (34 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 df 52 60 e0 Error: UNC at LBA = 0x006052df = 6312671
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 b0 07 51 60 e0 00 00:07:01.877 READ DMA EXT
c8 00 01 00 00 00 e0 00 00:07:01.877 READ DMA
c8 00 01 00 00 00 e0 00 00:07:01.877 READ DMA
ca 00 08 37 00 5e e0 00 00:07:01.859 WRITE DMA
c8 00 01 00 00 00 e0 00 00:07:01.858 READ DMA
Error 2 occurred at disk power-on lifetime: 834 hours (34 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 df 52 60 e0 Error: UNC at LBA = 0x006052df = 6312671
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 b0 07 51 60 e0 00 00:07:01.877 READ DMA EXT
c8 00 01 00 00 00 e0 00 00:07:01.877 READ DMA
c8 00 01 00 00 00 e0 00 00:07:01.877 READ DMA
ca 00 08 2f 00 5e e0 00 00:07:01.859 WRITE DMA
c8 00 01 00 00 00 e0 00 00:07:01.858 READ DMA
Error 1 occurred at disk power-on lifetime: 834 hours (34 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 df 52 60 e0 Error: UNC at LBA = 0x006052df = 6312671
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 d8 b0 07 51 60 e0 00 00:06:54.168 READ DMA EXT
c8 d8 80 cf 4f 60 e0 00 00:06:54.168 READ DMA
25 d8 88 47 4d 60 e0 00 00:06:54.163 READ DMA EXT
c8 d8 40 47 49 60 e0 00 00:06:54.163 READ DMA
c8 d8 08 3f 49 60 e0 00 00:06:54.154 READ DMA
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 5638 -
# 2 Short offline Completed without error 00% 5635 -
# 3 Short offline Aborted by host 80% 5635 -
# 4 Extended offline Aborted by host 90% 5635 -
# 5 Extended offline Aborted by host 90% 5635 -
# 6 Short offline Completed without error 00% 5587 -
# 7 Short offline Aborted by host 60% 5587 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
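The error log is a history kept by the drive, so a clean self-test does not remove old entries. What matters more is how old the entries are. A rough sketch comparing each error's power-on lifetime with the Power_On_Hours raw value (5629) from the first dump; the variable names are only for illustration:

```python
# Power_On_Hours raw value (attribute 9) taken from the first smartctl dump.
POWER_ON_HOURS = 5629

# Error number -> power-on lifetime in hours, copied from the error log.
ERROR_HOURS = {5: 4529, 4: 4529, 3: 834, 2: 834, 1: 834}

# Age of each logged error in power-on hours.
ages = {number: POWER_ON_HOURS - hours for number, hours in ERROR_HOURS.items()}
newest = min(ages.values())

for number, age in sorted(ages.items(), reverse=True):
    print(f"Error {number}: {age} power-on hours ago")
print(f"most recent error: {newest} power-on hours ago")
```

Here the newest entry is more than a thousand power-on hours old and the extended self-test completed without error, which fits the reading that these are old events and not an ongoing problem.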
I started getting some UNC errors on a drive today, so I ran an extended SMART test.
Some people online say this is not that serious, and the drive isn't making any noises. Is it faulty?
Below is the SMART output for the drive.
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.14.40-unRAID] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4N6KL7C79
LU WWN Device Id: 5 0014ee 26349a6c9
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed May 30 23:31:35 2018 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (40320) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 404) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 138
3 Spin_Up_Time POS--K 197 177 021 - 5141
4 Start_Stop_Count -O--CK 100 100 000 - 167
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 083 083 000 - 12421
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 147
192 Power-Off_Retract_Count -O--CK 200 200 000 - 116
193 Load_Cycle_Count -O--CK 200 200 000 - 1101
194 Temperature_Celsius -O---K 122 108 000 - 28
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 6
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x0c GPL R/O 2048 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 17
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 17 [16] occurred at disk power-on lifetime: 12400 hours (516 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 01 6c 0f f0 e1 00 Error: UNC at LBA = 0x016c0ff0 = 23859184
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
c8 00 00 00 00 00 00 01 6c 0f f0 e1 08 37d+08:33:30.526 READ DMA
c8 00 00 00 00 00 00 01 6c 0e f0 e1 08 37d+08:33:30.526 READ DMA
c8 00 00 00 00 00 00 01 6c 0d f0 e1 08 37d+08:33:29.959 READ DMA
c8 00 00 00 00 00 00 01 6c 0c f0 e1 08 37d+08:33:29.958 READ DMA
c8 00 00 00 00 00 00 01 6c 0b f0 e1 08 37d+08:33:29.957 READ DMA
Error 16 [15] occurred at disk power-on lifetime: 12400 hours (516 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 01 60 67 60 e1 00 Error: UNC at LBA = 0x01606760 = 23095136
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
c8 00 00 00 00 00 00 01 60 66 f0 e1 08 37d+08:33:14.521 READ DMA
c8 00 00 00 00 00 00 01 60 65 f0 e1 08 37d+08:33:14.519 READ DMA
c8 00 00 00 00 00 00 01 60 64 f0 e1 08 37d+08:33:14.518 READ DMA
c8 00 00 00 00 00 00 01 60 63 f0 e1 08 37d+08:33:14.517 READ DMA
c8 00 00 00 00 00 00 01 60 62 f0 e1 08 37d+08:33:14.516 READ DMA
Error 15 [14] occurred at disk power-on lifetime: 11552 hours (481 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 02 00 00 00 57 54 74 b8 e0 00 Error: UNC 512 sectors at LBA = 0x575474b8 = 1465152696
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 02 00 00 00 57 54 74 a8 e0 08 2d+00:32:57.790 READ DMA EXT
25 00 00 00 08 00 00 b7 13 fb 78 e0 08 2d+00:32:57.774 READ DMA EXT
25 00 00 00 88 00 00 2f e0 e2 28 e0 08 2d+00:32:57.680 READ DMA EXT
25 00 00 02 00 00 00 2f a9 ae 58 e0 08 2d+00:32:57.676 READ DMA EXT
25 00 00 00 88 00 00 2f 9d 72 60 e0 08 2d+00:32:57.654 READ DMA EXT
Error 14 [13] occurred at disk power-on lifetime: 11519 hours (479 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 20 00 00 57 54 74 40 e0 00 Error: UNC 32 sectors at LBA = 0x57547440 = 1465152576
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 20 00 00 57 54 74 30 e0 08 14:59:10.970 READ DMA EXT
25 00 00 00 30 00 00 57 61 f4 60 e0 08 14:59:10.967 READ DMA EXT
25 00 00 00 c0 00 00 57 55 5a 38 e0 08 14:59:10.967 READ DMA EXT
25 00 00 02 00 00 00 57 61 f2 60 e0 08 14:59:10.966 READ DMA EXT
25 00 00 01 00 00 00 57 55 59 38 e0 08 14:59:10.965 READ DMA EXT
Error 13 [12] occurred at disk power-on lifetime: 11318 hours (471 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 57 54 d0 80 e0 00 Error: UNC 8 sectors at LBA = 0x5754d080 = 1465176192
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 57 54 d0 80 e0 08 3d+15:42:21.276 READ DMA EXT
25 00 00 00 08 00 00 57 54 d0 60 e0 08 3d+15:42:21.276 READ DMA EXT
25 00 00 00 08 00 00 57 54 d0 40 e0 08 3d+15:42:21.274 READ DMA EXT
25 00 00 00 08 00 00 57 54 c4 d0 e0 08 3d+15:42:21.274 READ DMA EXT
25 00 00 00 08 00 00 57 54 c4 90 e0 08 3d+15:42:21.274 READ DMA EXT
Error 12 [11] occurred at disk power-on lifetime: 11250 hours (468 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 20 00 00 57 54 29 90 e0 00 Error: UNC 32 sectors at LBA = 0x57542990 = 1465133456
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 20 00 00 57 54 29 78 e0 08 19:19:07.438 READ DMA EXT
c8 00 00 00 20 00 00 03 db 6c 20 e3 08 19:19:07.414 READ DMA
25 00 00 02 28 00 01 5a 74 ac 20 e0 08 19:19:07.401 READ DMA EXT
25 00 00 04 00 00 01 5a 74 a8 20 e0 08 19:19:07.399 READ DMA EXT
25 00 00 04 08 00 01 5a 74 a4 18 e0 08 19:19:07.397 READ DMA EXT
Error 11 [10] occurred at disk power-on lifetime: 11250 hours (468 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 57 54 2a 38 e0 00 Error: UNC 8 sectors at LBA = 0x57542a38 = 1465133624
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 57 54 2a 38 e0 08 19:18:22.228 READ DMA EXT
25 00 00 05 00 00 01 5a 3d c5 b8 e0 08 19:18:22.219 READ DMA EXT
25 00 00 05 40 00 01 5a 3d c0 78 e0 08 19:18:22.185 READ DMA EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 19:18:22.164 FLUSH CACHE EXT
25 00 00 02 28 00 01 5a 3d be 50 e0 08 19:18:22.161 READ DMA EXT
Error 10 [9] occurred at disk power-on lifetime: 11250 hours (468 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 57 54 2a 38 e0 00 Error: UNC 8 sectors at LBA = 0x57542a38 = 1465133624
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 57 54 2a 38 e0 08 19:16:15.689 READ DMA EXT
25 00 00 01 08 00 01 59 9c 6d 78 e0 08 19:16:15.682 READ DMA EXT
25 00 00 04 00 00 01 59 9c 69 78 e0 08 19:16:15.680 READ DMA EXT
25 00 00 04 08 00 01 59 9c 65 70 e0 08 19:16:15.678 READ DMA EXT
25 00 00 01 a8 00 01 59 9c 63 c8 e0 08 19:16:15.657 READ DMA EXT
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 12414 8228008
# 2 Extended offline Completed without error 00% 9210 -
# 3 Short offline Completed without error 00% 3631 -
# 4 Short offline Completed without error 00% 3463 -
# 5 Short offline Completed without error 00% 3296 -
# 6 Short offline Completed without error 00% 3128 -
# 7 Short offline Completed without error 00% 2963 -
# 8 Short offline Completed without error 00% 2795 -
# 9 Short offline Completed without error 00% 2627 -
#10 Short offline Completed without error 00% 2459 -
#11 Short offline Completed without error 00% 2291 -
#12 Short offline Completed without error 00% 2123 -
#13 Short offline Completed without error 00% 1960 -
#14 Short offline Completed without error 00% 1792 -
#15 Short offline Completed without error 00% 1627 -
#16 Short offline Completed without error 00% 1461 -
#17 Short offline Completed without error 00% 1293 -
#18 Short offline Completed without error 00% 1125 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 28 Celsius
Power Cycle Min/Max Temperature: 25/34 Celsius
Lifetime Min/Max Temperature: 2/42 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (236)
Index Estimated Time Temperature Celsius
237 2018-05-30 15:34 28 *********
... ..( 70 skipped). .. *********
308 2018-05-30 16:45 28 *********
309 2018-05-30 16:46 29 **********
... ..(156 skipped). .. **********
466 2018-05-30 19:23 29 **********
467 2018-05-30 19:24 30 ***********
... ..( 7 skipped). .. ***********
475 2018-05-30 19:32 30 ***********
476 2018-05-30 19:33 31 ************
... ..( 21 skipped). .. ************
20 2018-05-30 19:55 31 ************
21 2018-05-30 19:56 30 ***********
... ..( 42 skipped). .. ***********
64 2018-05-30 20:39 30 ***********
65 2018-05-30 20:40 29 **********
... ..(111 skipped). .. **********
177 2018-05-30 22:32 29 **********
178 2018-05-30 22:33 28 *********
... ..( 57 skipped). .. *********
236 2018-05-30 23:31 28 *********
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 2) ==
0x01 0x008 4 147 --- Lifetime Power-On Resets
0x01 0x010 4 12421 --- Power-on Hours
0x01 0x018 6 20725578513 --- Logical Sectors Written
0x01 0x020 6 83945277 --- Number of Write Commands
0x01 0x028 6 227724544507 --- Logical Sectors Read
0x01 0x030 6 416125326 --- Number of Read Commands
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 11900 --- Spindle Motor Power-on Hours
0x03 0x010 4 11854 --- Head Flying Hours
0x03 0x018 4 1218 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 1816 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 17 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 27 --- Current Temperature
0x05 0x010 1 28 --- Average Short Term Temperature
0x05 0x018 1 27 --- Average Long Term Temperature
0x05 0x020 1 41 --- Highest Temperature
0x05 0x028 1 21 --- Lowest Temperature
0x05 0x030 1 37 --- Highest Average Short Term Temperature
0x05 0x038 1 22 --- Lowest Average Short Term Temperature
0x05 0x040 1 34 --- Highest Average Long Term Temperature
0x05 0x048 1 23 --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 60 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 415 --- Number of Hardware Resets
0x06 0x010 4 79324 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c) supported [please try: '-l defects']
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 8 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 8 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 3303456 Vendor specific
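One way to get a feel for how widespread the damage is: the drive reports 512-byte logical and 4096-byte physical sectors, so eight consecutive LBAs share one physical sector. The sketch below maps the logged LBAs to physical sectors and byte offsets. It assumes physical sectors are aligned at LBA 0 (no alignment offset), which the output above does not state, and the helper names are made up for illustration.

```python
LOGICAL_BYTES = 512    # "Sector Sizes: 512 bytes logical"
PHYSICAL_BYTES = 4096  # "4096 bytes physical"
LOGICAL_PER_PHYSICAL = PHYSICAL_BYTES // LOGICAL_BYTES


def physical_sector(lba: int) -> int:
    """Index of the physical sector holding this LBA (assumes alignment at LBA 0)."""
    return lba // LOGICAL_PER_PHYSICAL


def byte_offset(lba: int) -> int:
    """Byte offset of this LBA from the start of the whole device."""
    return lba * LOGICAL_BYTES


# LBAs from the eight error log entries shown above (errors 17 down to 10).
ERROR_LBAS = [23859184, 23095136, 1465152696, 1465152576,
              1465176192, 1465133456, 1465133624, 1465133624]
# LBA_of_first_error from the failed extended self-test.
SELF_TEST_LBA = 8228008

distinct = sorted({physical_sector(lba) for lba in ERROR_LBAS})
print(f"{len(ERROR_LBAS)} log entries touch {len(distinct)} distinct physical sectors")
print(f"self-test failed in physical sector {physical_sector(SELF_TEST_LBA)}, "
      f"byte offset {byte_offset(SELF_TEST_LBA)}")
```

Several distinct physical sectors failing in different regions of the disk, plus a failed extended self-test, is a stronger warning sign than one sector failing repeatedly, even though Reallocated_Sector_Ct and Current_Pending_Sector both still read 0.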