hard drive - WD1000FYPS harddrive is marked 0 mb in 3ware (and no SMART)
2013-08
After a reboot, my 1 TB SATA WD1000FYPS (previously it was marked "Drive error") is shown as 0 MB in the 3ware web GUI.
Complete message:
Available Drives (Controller ID 0)
Port 1 WDC WD1000FYPS-01ZKB0 0.00 MB NOT SUPPORTED [Remove Drive]
SMART now gives me only the Device Model and ATA protocol version 1 (not 7 or 8, as it should be for SATA).
What does it mean?
Just before the reboot, when it was marked only with "Device Error", the SMART output was:
Device Model: WDC WD1000FYPS-01ZKB0
Serial Number: WD-WCASJ1130***
Firmware Version: 02.01B01
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Mar 7 18:47:35 2010 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0003 188 186 021 Pre-fail Always - 7591
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 229
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 3
7 Seek_Error_Rate 0x000e 193 193 000 Old_age Always - 125
9 Power_On_Hours 0x0032 078 078 000 Old_age Always - 16615
10 Spin_Retry_Count 0x0012 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0012 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 77
192 Power-Off_Retract_Count 0x0032 198 198 000 Old_age Always - 1564
193 Load_Cycle_Count 0x0032 146 146 000 Old_age Always - 164824
194 Temperature_Celsius 0x0022 117 100 000 Old_age Always - 35
196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
What can be wrong with it? Can it be restored?
PS
The new SMART output is:
=== START OF INFORMATION SECTION ===
Device Model: WDC WD1000FYPS-01ZKB0
Serial Number: [No Information Found]
Firmware Version: [No Information Found]
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 1
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Mar 8 00:29:44 2010 MSK
SMART is only available in ATA Version 3 Revision 3 or greater.
We will try to proceed in spite of this.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
Checking for SMART support by trying SMART ENABLE command.
Command failed, ata.status=(0x00), ata.command=(0x51), ata.flags=(0x01)
Error SMART Enable failed: Input/output error
SMART ENABLE failed - this establishes that this device lacks SMART functionality.
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
PPS There was rapid growth of "192 Power-Off_Retract_Count" before the drive died. The drive was used in a RAID, with several drives from the same factory packaging box (close IDs). The drives were mounted identically. "Rapid" means almost linear growth from 300 to 1700 in 6-7 hours. The maximum temperature was 41C (thanks to munin's SMART monitoring).
UPDATE
On the hard drive's PCB (on the bottom) I found contact pads with unusual colors. Most of the unsoldered pads are yellow, but some are blue and some are somewhere between orange and red. The maximum temperature for the drive was 42-43 Celsius. The 2 drives that were next to the dead one are normal: all of their unsoldered pads are yellow.
The hard drive was used for 2 years in a RAID under a fairly heavy load.
The drive has failed. RMA it back to WD.
For some reason, my count of sectors pending remapping is unbelievably high (2163 currently). I've seen it go up by 20 in one week. But no sectors have been remapped. Dell's computer diagnostics utility reported no problems, smartctl -H returned PASSED, and I have yet to notice any problems with the hard drive.
So do I need to worry about such a high pending count?
Here are the results of smartctl -A:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 252 252 025 Pre-fail Always - 2062
4 Start_Stop_Count 0x0032 097 097 000 Old_age Always - 36147
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 3261
12 Power_Cycle_Count 0x0032 098 098 000 Old_age Always - 2087
191 G-Sense_Error_Rate 0x0032 002 002 000 Old_age Always - 999999
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 47
194 Temperature_Celsius 0x0022 127 094 000 Old_age Always - 37 (Lifetime Min/Max 13/48)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 191990
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 2163
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 19080
199 UDMA_CRC_Error_Count 0x0036 252 252 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 252 252 000 Old_age Always - 0
Edit:
The disk is about 1 1/2 years old. Pending Sector Count was about 2000 when I started keeping an eye on it 2 weeks ago. I have never noticed any problems with the disk. If it makes any difference, I have a Dell M1530 dual boot Vista-Ubuntu. The hard drive is a Samsung HM160HI.
Edit:
Apparently half the problem was that I didn't (still somewhat don't) know how to interpret the data.
Thanks to everyone who gave me feedback.
Your Current Pending Sector Count (2163) is higher than the Reallocated Sector Count (252).
This means that failing sectors can no longer be replaced by the disk firmware.
The disk is failing. Make sure you have backups, and get a replacement.
If the drive is under warranty, send it back for replacement. On a stable drive, that number should be 0, just like the Reallocated Sectors Count.
From your data dump, the SMART Attribute value shows 100.
Therefore, this is not a problem flagged by SMART either.
Update: That 100 is an attribute value. It just indicates the health status, not the count. The worst value had been 100 too, so it never went lower.
For example, look at ID# 194, the temperature: the raw value is 37, the attribute value is 127, and the worst went into the 90s.
Nothing to worry about there either; it is just an example of how to interpret attributes. Again, the attribute value does not suggest your drive is running at 127 C.
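To make the distinction between the normalized columns and the raw column concrete, here is a minimal Python sketch that splits a `smartctl -A` row into its fields. The `SmartAttr` type and `parse_row` helper are hypothetical names made up for this illustration, and the column order is assumed to match the header line in the dump above.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized health figure (vendor-scaled, not a count)
    worst: int   # lowest normalized value recorded so far
    thresh: int  # threshold the normalized value is compared against
    raw: str     # vendor-specific raw figure (counts, degrees, ...)


def parse_row(line: str) -> SmartAttr:
    # Assumed columns:
    # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    f = line.split(None, 9)  # keep the raw column whole, even with spaces
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[9].strip())


# Two rows copied from the dump in the question.
rows = [
    "194 Temperature_Celsius 0x0022 127 094 000 Old_age Always - 37 (Lifetime Min/Max 13/48)",
    "197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 2163",
]

for row in rows:
    a = parse_row(row)
    print(f"{a.name}: normalized={a.value} worst={a.worst} thresh={a.thresh} raw={a.raw}")
```

For attribute 197 this separates the normalized value (100) from the raw count (2163), which are the two numbers being compared in this thread.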
A couple of points from Wikipedia:
The inability to read some sectors is not always an indication that a drive is about to fail. One way that unreadable sectors may be created, even when the drive is functioning within specification, is through a sudden power failure while the drive is writing. In order to prevent this problem, modern hard drives will always finish writing at least the current sector immediately after the power fails (typically using rotational energy from the disk). Also, even if the physical disk is damaged at one location, such that a certain sector is unreadable, the disk may be able to use spare space to replace the bad area, so that the sector can be overwritten.
Number of "unstable" sectors (waiting to be remapped, because of read errors). If an unstable sector is subsequently written or read successfully, this value is decreased and the sector is not remapped. Read errors on a sector will not remap the sector (since it might be readable later); instead, the drive firmware remembers that the sector needs to be remapped, and remaps it the next time it's written.
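The bookkeeping described in that quote can be sketched as a toy model in Python. This is only one reading of the quoted text, not real firmware logic: the `ToyDrive` class and the `ok` flag (whether a read succeeded, or whether a write to the original location succeeded) are assumptions made up for the illustration.

```python
class ToyDrive:
    """Toy counter model of the quoted behaviour; not real firmware logic."""

    def __init__(self):
        self.pending = set()      # sectors waiting for a decision (attribute 197)
        self.reallocated = set()  # sectors remapped to spare space (attribute 5)

    def read(self, sector, ok):
        if ok:
            # Readable again: the sector is no longer counted as pending.
            self.pending.discard(sector)
        else:
            # A read error only marks the sector; it is not remapped yet.
            self.pending.add(sector)

    def write(self, sector, ok):
        if sector in self.pending:
            self.pending.discard(sector)
            if not ok:
                # Assumed: a failed in-place write is when a spare gets used.
                self.reallocated.add(sector)


drive = ToyDrive()
drive.read(10, ok=False)   # sector 10 becomes pending
drive.read(11, ok=False)   # sector 11 becomes pending
drive.write(10, ok=True)   # rewritten successfully: pending drops, no remap
drive.write(11, ok=False)  # rewrite fails: sector is remapped
print(len(drive.pending), len(drive.reallocated))
```

Under this reading, a pending count can stay high with no reallocations as long as the affected sectors are never rewritten.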
Further on the down vote and comment:
- A raw count at Current Pending Sectors usually implies sectors that are more or less written off by the drive. This could be for various reasons that do not always imply an impending disk failure.
- If the raw count keeps increasing at regular intervals (days/weeks), it would then suggest a likely full disk failure. For example, do you recall (or have stored data) from an earlier check that shows this count to be lower or zero?
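Checking whether the raw count keeps increasing can be sketched in Python as below. The `weekly_growth` helper is a hypothetical name, and the two samples are approximations taken from the question ("about 2000" two weeks earlier, 2163 now), so the result is only illustrative.

```python
def weekly_growth(samples):
    """Average change in the raw count per 7 days, first sample to last.

    samples: list of (day_number, raw_count) pairs, oldest first.
    """
    (d0, c0), (d1, c1) = samples[0], samples[-1]
    days = d1 - d0
    if days <= 0:
        raise ValueError("need samples taken on at least two different days")
    return (c1 - c0) * 7 / days


# Approximate figures from the question: day 0 = first check, day 14 = now.
samples = [(0, 2000), (14, 2163)]
rate = weekly_growth(samples)
print(f"pending sectors growing by about {rate:.1f} per week")
```

A rate that stays near zero over several weeks fits the first bullet; a steadily positive rate fits the second.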