I have a drive (Seagate Momentus 7200.4, 2.5-inch, 7200 rpm, 500 GB, device model ST9500420AS) that is behaving strangely. I know the answer is probably just "throw it away", but I'm curious whether there is any explanation for what I'm seeing. There is currently no data on the drive, so I'm free to experiment with it.
The drive started reallocating some sectors and failing SMART self-tests, so I began writing to those sectors manually with hdparm --write-sector. However, I'm now stuck at a point where the SMART self-test fails at a sector, reading that sector with hdparm --read-sector fails, writing the sector with hdparm --write-sector appears to succeed but does not increment Reallocated_Event_Count, and then rerunning the test fails at the same sector.
Specifically, here is the drive's status before the test:
$ sudo smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [aarch64-linux-4.19.0-8-arm64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 7200.4
Device Model: ST9500420AS
[...]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 112 090 006 Pre-fail Always - 132093263
3 Spin_Up_Time 0x0002 099 097 000 Old_age Always - 0
4 Start_Stop_Count 0x0033 093 093 000 Pre-fail Always - 8061
5 Reallocated_Sector_Ct 0x0033 091 091 036 Pre-fail Always - 187
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 90354048054
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 22236
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0033 099 037 020 Pre-fail Always - 1725
183 Runtime_Bad_Block 0x0032 100 253 000 Old_age Always - 19
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 260
188 Command_Timeout 0x0032 100 096 000 Old_age Always - 12885098584
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 050 045 045 Old_age Always In_the_past 50 (Min/Max 24/55)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 212
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 47
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 340982
194 Temperature_Celsius 0x0022 050 055 000 Old_age Always - 50 (0 15 0 0 0)
195 Hardware_ECC_Recovered 0x001a 052 032 000 Old_age Always - 132093263
196 Reallocated_Event_Count 0x0033 091 091 036 Pre-fail Always - 187
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 2
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 53
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
I run a long self-test:
$ sudo smartctl -t long /dev/sda
smartctl 6.6 2017-11-05 r4594 [aarch64-linux-4.19.0-8-arm64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 112 minutes for test to complete.
Test will complete after Wed Apr 29 14:37:18 2020
Use smartctl -X to abort test.
After a while, the test fails:
$ sudo smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [aarch64-linux-4.19.0-8-arm64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 7200.4
Device Model: ST9500420AS
[...]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 111 090 006 Pre-fail Always - 132093263
3 Spin_Up_Time 0x0002 099 097 000 Old_age Always - 0
4 Start_Stop_Count 0x0033 093 093 000 Pre-fail Always - 8061
5 Reallocated_Sector_Ct 0x0033 091 091 036 Pre-fail Always - 187
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 90354048074
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 22237
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0033 099 037 020 Pre-fail Always - 1725
183 Runtime_Bad_Block 0x0032 100 253 000 Old_age Always - 19
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 261
188 Command_Timeout 0x0032 100 096 000 Old_age Always - 12885098584
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 056 045 045 Old_age Always In_the_past 44 (Min/Max 24/55)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 212
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 47
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 340983
194 Temperature_Celsius 0x0022 044 055 000 Old_age Always - 44 (0 15 0 0 0)
195 Hardware_ECC_Recovered 0x001a 052 032 000 Old_age Always - 132093263
196 Reallocated_Event_Count 0x0033 091 091 036 Pre-fail Always - 187
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 2
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 53
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
[...]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 22236 918494442
[...]
Reading the corresponding sector fails:
$ sudo hdparm --read-sector 918494442 /dev/sda
/dev/sda:
reading sector 918494442: The running kernel lacks CONFIG_IDE_TASK_IOCTL support for this device.
FAILED: Invalid argument
Writing to the sector succeeds:
$ sudo hdparm --write-sector 918494442 --yes-i-know-what-i-am-doing /dev/sda
/dev/sda:
re-writing sector 918494442: succeeded
Reading that sector now works:
$ sudo hdparm --read-sector 918494442 /dev/sda
/dev/sda:
reading sector 918494442: succeeded
0000 0000 0000 0000 0000 0000 0000 0000
[...]
The SMART attributes show that Current_Pending_Sector has decreased, but Reallocated_Event_Count has not increased:
$ sudo smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [aarch64-linux-4.19.0-8-arm64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 7200.4
Device Model: ST9500420AS
[...]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 110 090 006 Pre-fail Always - 132096978
3 Spin_Up_Time 0x0002 099 097 000 Old_age Always - 0
4 Start_Stop_Count 0x0033 093 093 000 Pre-fail Always - 8061
5 Reallocated_Sector_Ct 0x0033 091 091 036 Pre-fail Always - 187
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 90354049556
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 22241
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0033 099 037 020 Pre-fail Always - 1725
183 Runtime_Bad_Block 0x0032 100 253 000 Old_age Always - 19
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 262
188 Command_Timeout 0x0032 100 096 000 Old_age Always - 12885098584
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 049 045 045 Old_age Always In_the_past 51 (Min/Max 24/55)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 212
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 47
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 340984
194 Temperature_Celsius 0x0022 051 055 000 Old_age Always - 51 (0 15 0 0 0)
195 Hardware_ECC_Recovered 0x001a 052 032 000 Old_age Always - 132096978
196 Reallocated_Event_Count 0x0033 091 091 036 Pre-fail Always - 187
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 53
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
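To make the before/after comparison easier to follow, here is a minimal sketch that prints only the counters whose raw value changed between two saved snapshots. The function name smart_changed and the file names are hypothetical, and it assumes the ten-column attribute layout shown in the smartctl output above.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved `smartctl -A` (or -a) outputs and
# print the sector-related counters whose RAW_VALUE differs.
# Assumes the 10-column attribute table layout shown above (raw value in column 10).
smart_changed() {
    awk '
        # Only look at attributes 5, 187, 196, 197 and 198.
        $1 ~ /^(5|187|196|197|198)$/ && NF >= 10 {
            if (FNR == NR) {            # first file: remember the raw value
                before[$1] = $10
                next
            }
            if (before[$1] != $10)      # second file: report differences
                print $2 ": " before[$1] " -> " $10
        }
    ' "$1" "$2"
}

# Example usage (file names are placeholders):
#   sudo smartctl -A /dev/sda > before.txt
#   ... write the sector ...
#   sudo smartctl -A /dev/sda > after.txt
#   smart_changed before.txt after.txt
```

Run against the snapshot taken after the failed test and the one taken after the write, this should report only Reported_Uncorrect (261 -> 262) and Current_Pending_Sector (2 -> 1), with both reallocation counters unchanged at 187.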
So I run the self-test again:
$ sudo smartctl -t long /dev/sda
smartctl 6.6 2017-11-05 r4594 [aarch64-linux-4.19.0-8-arm64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 112 minutes for test to complete.
Test will complete after Tue Apr 28 21:06:52 2020
Use smartctl -X to abort test.
And after a while the test fails again, at exactly the same location:
$ sudo smartctl -a /dev/sda
smartctl 6.6 2017-11-05 r4594 [aarch64-linux-4.19.0-8-arm64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 7200.4
Device Model: ST9500420AS
[...]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 104 090 006 Pre-fail Always - 143254342
3 Spin_Up_Time 0x0002 099 097 000 Old_age Always - 0
4 Start_Stop_Count 0x0033 093 093 000 Pre-fail Always - 8061
5 Reallocated_Sector_Ct 0x0033 091 091 036 Pre-fail Always - 187
7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 90354061019
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 22283
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0033 099 037 020 Pre-fail Always - 1725
183 Runtime_Bad_Block 0x0032 100 253 000 Old_age Always - 19
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 275
188 Command_Timeout 0x0032 100 096 000 Old_age Always - 12885098584
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 058 045 045 Old_age Always In_the_past 42 (Min/Max 24/55)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 212
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 31
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 341000
194 Temperature_Celsius 0x0022 042 055 000 Old_age Always - 42 (0 15 0 0 0)
195 Hardware_ECC_Recovered 0x001a 035 032 000 Old_age Always - 143254342
196 Reallocated_Event_Count 0x0033 091 091 036 Pre-fail Always - 187
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 53
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
[...]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 22262 918494442
# 2 Extended offline Completed: read failure 90% 22236 918494442
[...]
I have repeated this several times, to no avail (apart from one run where the test passed and one where it failed at a different location). Here is the full test log so far (the earliest entries correspond to failures where writing to the sector to force a reallocation did work):
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 10% 22272 918494442
# 2 Extended offline Completed: read failure 10% 22270 918494442
# 3 Extended offline Completed: read failure 10% 22267 918494442
# 4 Extended offline Completed: read failure 90% 22265 871827620
# 5 Extended offline Completed: read failure 90% 22262 918494442
# 6 Extended offline Completed: read failure 90% 22236 918494442
# 7 Extended offline Completed: read failure 10% 22220 918494442
# 8 Extended offline Completed: read failure 10% 22216 918494442
# 9 Extended offline Completed: read failure 10% 22214 918494442
#10 Extended offline Completed: read failure 10% 22211 918494442
#11 Extended offline Completed: read failure 10% 22200 918494442
#12 Extended offline Completed without error 00% 22198 -
#13 Extended offline Completed: read failure 10% 22196 871847239
#14 Extended offline Completed: read failure 10% 22193 871814225
#15 Extended offline Completed: read failure 90% 22189 918480478
#16 Extended offline Completed: read failure 90% 22188 918480478
#17 Extended offline Completed: read failure 90% 22175 918512077
#18 Extended offline Completed: read failure 90% 22169 918509168
#19 Extended offline Completed: read failure 90% 22169 918442466
#20 Extended offline Completed: read failure 90% 22169 918440526
#21 Extended offline Completed: read failure 90% 22169 918441496
9 of 20 failed self-tests are outdated by newer successful extended offline self-test #12
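As a rough way to see how the failures are distributed, the log can be tallied per LBA. This is only a sketch: the function name selftest_lba_counts and the file name are hypothetical, and it assumes the log has been saved to a text file in the format shown above, with the LBA as the last field of each "read failure" line.

```shell
#!/bin/sh
# Hypothetical helper: count how many failed self-tests stopped at each LBA.
# Assumes the self-test log format shown above (LBA_of_first_error is the last field).
selftest_lba_counts() {
    awk '
        /read failure/ { count[$NF]++ }                 # tally by failing LBA
        END { for (lba in count) print count[lba], lba }
    ' "$1" | sort -k1,1nr -k2,2n                        # most frequent first, then by LBA
}

# Example usage (file name is a placeholder):
#   sudo smartctl -l selftest /dev/sda > selftest.txt
#   selftest_lba_counts selftest.txt
```

For the log above, this shows that 10 of the 20 failures are at LBA 918494442, and that the remaining ones fall into two clusters, around 8718xxxxx and around 9184xxxxx to 9185xxxxx.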
I also tried running sudo badblocks -swv /dev/sda (at least the first pass), but it did not seem to trigger any errors or sector reallocations.
Again, I know this drive can no longer be trusted; I just don't understand why it is behaving so strangely. Any ideas? Thanks!