Does this SMART report indicate a failing disk?

On a remote server I have two 3 TB disks in RAID 0 behind an LSI RAID card. The server stopped responding and one disk had to be brought back "online" manually, so I tried running SMART tests to see whether the disks have problems.

The disk IDs are 4 and 5; this is the output of smartctl -a -d megaraid,5 /dev/sda

Does this indicate a serious problem? Should I move to different hardware?

smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1062.18.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" DT01ACA... Desktop HDD
Device Model:     TOSHIBA DT01ACA300
Serial Number:    68UVA3BAS
LU WWN Device Id: 5 000039 fe6e8269a
Firmware Version: MX6OABB0
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr  8 21:41:51 2020 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (21791) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    ( 364) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   092   092   016    Pre-fail  Always       -       2949120
  2 Throughput_Performance  0x0005   140   140   054    Pre-fail  Offline      -       67
  3 Spin_Up_Time            0x0007   196   196   024    Pre-fail  Always       -       254 (Average 330)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   097   097   005    Pre-fail  Always       -       217
  7 Seek_Error_Rate         0x000b   099   099   067    Pre-fail  Always       -       65536
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       12995
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       317
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       317
194 Temperature_Celsius     0x0002   200   200   000    Old_age   Always       -       30 (Min/Max 25/34)
196 Reallocated_Event_Count 0x0032   092   092   000    Old_age   Always       -       320
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       64
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       48
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 402 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 402 occurred at disk power-on lifetime: 12382 hours (515 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ff 01 a9 3d 0d  Error: UNC at LBA = 0x0d3da901 = 222144769

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 ff 00 01 a9 3d 40 00   3d+17:35:09.451  READ FPDMA QUEUED
  60 00 00 00 a8 3d 40 00   3d+17:35:09.349  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+17:35:09.286  READ LOG EXT
  60 00 00 00 a8 3d 40 00   3d+17:35:05.453  READ FPDMA QUEUED
  60 00 00 00 a6 3d 40 00   3d+17:35:05.425  READ FPDMA QUEUED

Error 401 occurred at disk power-on lifetime: 12382 hours (515 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 a9 3d 0d  Error: UNC at LBA = 0x0d3da900 = 222144768

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 a8 3d 40 00   3d+17:35:05.453  READ FPDMA QUEUED
  60 00 00 00 a6 3d 40 00   3d+17:35:05.425  READ FPDMA QUEUED
  60 00 00 00 a4 3d 40 00   3d+17:35:05.423  READ FPDMA QUEUED
  60 00 00 00 a2 3d 40 00   3d+17:35:05.353  READ FPDMA QUEUED
  60 00 00 00 a0 3d 40 00   3d+17:35:05.351  READ FPDMA QUEUED

Error 400 occurred at disk power-on lifetime: 12355 hours (514 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 eb 15 b4 3c 0d  Error: UNC at LBA = 0x0d3cb415 = 222082069

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 eb 00 15 b4 3c 40 00   2d+14:58:30.623  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   2d+14:58:30.557  READ LOG EXT
  60 ec 00 14 b4 3c 40 00   2d+14:58:26.706  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   2d+14:58:26.639  READ LOG EXT
  60 ed 00 13 b4 3c 40 00   2d+14:58:22.796  READ FPDMA QUEUED

Error 399 occurred at disk power-on lifetime: 12355 hours (514 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ec 14 b4 3c 0d  Error: UNC at LBA = 0x0d3cb414 = 222082068

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 ec 00 14 b4 3c 40 00   2d+14:58:26.706  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   2d+14:58:26.639  READ LOG EXT
  60 ed 00 13 b4 3c 40 00   2d+14:58:22.796  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   2d+14:58:22.730  READ LOG EXT
  60 ee 00 12 b4 3c 40 00   2d+14:58:18.895  READ FPDMA QUEUED

Error 398 occurred at disk power-on lifetime: 12355 hours (514 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ed 13 b4 3c 0d  Error: UNC at LBA = 0x0d3cb413 = 222082067

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 ed 00 13 b4 3c 40 00   2d+14:58:22.796  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   2d+14:58:22.730  READ LOG EXT
  60 ee 00 12 b4 3c 40 00   2d+14:58:18.895  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   2d+14:58:18.829  READ LOG EXT
  60 ef 00 11 b4 3c 40 00   2d+14:58:14.993  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     12009         -
# 2  Extended offline    Completed without error       00%     11990         -
# 3  Extended offline    Completed without error       00%      3984         -
# 4  Extended offline    Completed without error       00%      3953         -
# 5  Extended offline    Completed without error       00%      3776         -
# 6  Extended offline    Completed without error       00%      3758         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
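
For anyone checking the error log entries above by hand, here is a minimal Python sketch of how the "UNC at LBA" numbers follow from the register dump. It assumes the 28-bit register layout that this summary error log records (low nibble of DH, then CH, CL, SN); the function names are my own and are not part of smartctl.

```python
def parse_error_registers(line: str) -> dict:
    """Parse a register line such as '40 51 ff 01 a9 3d 0d ...'.

    The first seven hex tokens are ER ST SC SN CL CH DH, in the order
    printed by the report; anything after them is ignored.
    """
    names = ["ER", "ST", "SC", "SN", "CL", "CH", "DH"]
    values = [int(tok, 16) for tok in line.split()[:7]]
    return dict(zip(names, values))


def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine the registers into a 28-bit LBA (assumed layout).

    DH low nibble -> bits 27..24, CH -> 23..16, CL -> 15..8, SN -> 7..0.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def is_uncorrectable(er: int) -> bool:
    """Bit 6 (0x40) of the error register is the UNC flag."""
    return bool(er & 0x40)


# Register line copied from "Error 402" in the report.
regs = parse_error_registers(
    "40 51 ff 01 a9 3d 0d  Error: UNC at LBA = 0x0d3da901 = 222144769"
)
lba = lba28_from_registers(regs["SN"], regs["CL"], regs["CH"], regs["DH"])
print(hex(lba), lba, is_uncorrectable(regs["ER"]))  # 0xd3da901 222144769 True
```

Errors 398 to 402 decode to two tight clusters of neighbouring LBAs (222082067 to 222082069 and 222144768 to 222144769), which is simply what the log already prints on its "Error:" lines.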

I contacted the hosting company; they said it is a bad disk and that they will replace it.

This looks more like commands getting lost on the way to the disk. Before condemning the disk, try replacing the SATA cable and connecting the disk to a different power cable.
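
If it helps to check the cable theory against the report, one rough approach is to put the link-level counter next to the sector counters. Interface trouble often shows up as a rising UDMA_CRC_Error_Count (attribute 199), while attributes 5, 196, 197 and 198 track reallocated, pending and uncorrectable sectors. The sketch below is only an illustration: the grouping of IDs and the helper names are my own assumptions, and the sample rows are copied from the attribute table in the question.

```python
import re

# Illustrative grouping (an assumption, not a vendor specification).
MEDIA_IDS = {5, 196, 197, 198}  # reallocated / pending / uncorrectable sectors
LINK_IDS = {199}                # CRC errors seen on the SATA link

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(report: str) -> dict:
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    out = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            out[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return out


def summarize(report: str):
    """Return (media counters, link counters) as name -> raw value dicts."""
    attrs = raw_values(report)
    media = {attrs[i][0]: attrs[i][1] for i in sorted(MEDIA_IDS) if i in attrs}
    link = {attrs[i][0]: attrs[i][1] for i in sorted(LINK_IDS) if i in attrs}
    return media, link


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   097   097   005    Pre-fail  Always       -       217
196 Reallocated_Event_Count 0x0032   092   092   000    Old_age   Always       -       320
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       64
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       48
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

media, link = summarize(SAMPLE)
print("media:", media)
print("link: ", link)
```

Rerunning the same comparison after swapping the cables would show whether any of these counters keep climbing.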