I'm having trouble working out which disk is failing in my HP ProLiant DL360p Gen8. It has the following RAID controller: Smart Array P420i. I see a lot of errors in dmesg:
[40425140.998750] sd 0:1:0:1: [sdb] Unaligned partial completion (resid=16312, sector_sz=512)
[40425140.998763] sd 0:1:0:1: [sdb] tag#597 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[40425140.998767] sd 0:1:0:1: [sdb] tag#597 Sense Key : 0x3 [current]
[40425140.998770] sd 0:1:0:1: [sdb] tag#597 ASC=0x11 ASCQ=0x0
[40425140.998775] sd 0:1:0:1: [sdb] tag#597 CDB: opcode=0x88 88 00 00 00 00 00 3c 17 fa f8 00 00 00 08 00 00
[40425140.998778] print_req_error: critical medium error, dev sdb, sector 1008204536
[40425141.001176] sd 0:1:0:1: [sdb] Unaligned partial completion (resid=16312, sector_sz=512)
[40425141.001186] sd 0:1:0:1: [sdb] tag#597 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[40425141.001189] sd 0:1:0:1: [sdb] tag#597 Sense Key : 0x3 [current]
[40425141.001193] sd 0:1:0:1: [sdb] tag#597 ASC=0x11 ASCQ=0x0
[40425141.001197] sd 0:1:0:1: [sdb] tag#597 CDB: opcode=0x88 88 00 00 00 00 00 3c 17 fa f8 00 00 00 08 00 00
[40425141.001199] print_req_error: critical medium error, dev sdb, sector 1008204536
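For reference, the fields in these lines can be decoded by hand. Sense key 0x3 is MEDIUM ERROR and ASC 0x11 / ASCQ 0x00 is "unrecovered read error" in the SCSI tables, and opcode 0x88 is READ(16). The sketch below (the function name `decode_read16` is my own, purely for illustration) extracts the LBA and transfer length from the CDB bytes copied from the log. Note that sdb is the logical drive exposed by the controller, so the LBA is an address on the logical drive, not on any single physical disk.

```python
# Hex bytes copied verbatim from the "CDB:" dmesg line above.
CDB_HEX = "88 00 00 00 00 00 3c 17 fa f8 00 00 00 08 00 00"


def decode_read16(cdb_hex):
    """Decode a READ(16) CDB: bytes 2-9 hold the LBA, bytes 10-13 the block count."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),
        "blocks": int.from_bytes(cdb[10:14], "big"),
    }


if __name__ == "__main__":
    print(decode_read16(CDB_HEX))  # {'lba': 1008204536, 'blocks': 8}
```

The decoded LBA matches the sector number reported by print_req_error, so both log entries refer to the same 8-block (4 KiB) read on the logical drive.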
I think these mean that one of the disks in the RAID array behind sdb is failing. Unfortunately, for some reason I cannot run a self-test, and I cannot see the SMART attributes of the individual disks. Here is the smartctl output:
sudo smartctl -a /dev/sdb -d cciss,2
[sudo] password for hypervisor:
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.14.12-2-bfq-mq] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HITACHI
Product: HUC10606 CLAR600
Revision: C3B0
Compliance: SPC-4
User Capacity: 600,000,000,000 bytes [600 GB]
Logical block size: 512 bytes
Rotation Rate: 10020 rpm
Form Factor: 2.5 inches
Logical Unit id: 0x5000cca03ca74bd0
Serial number: PZJZ06RD
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Dec 16 13:19:33 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Disabled or Not Supported
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 28 C
Drive Trip Temperature: 85 C
Manufactured in week 20 of year 2013
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 122
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 2022
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 7447546408787247104
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2241169767  2284343         0  2243454110   16825755     105146.414           0
write:           0   112305         0      112305     112316      60991.348           0
verify:  722111517   138380         0   722249897      90047      25945.352           0
Non-medium error count: 0
No self-tests have been logged
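The output above covers only one physical drive (cciss,2, whose serial PZJZ06RD corresponds to bay 3 in the controller listing). A rough sketch of how the same check could be repeated for every drive: the helper names `summarize` and `smartctl_commands` are hypothetical, and the assumption that indices 0 to 7 address the eight bays in order is mine (only index 2 is confirmed by the output above). The sketch only builds the command lines and parses text; it does not run smartctl itself.

```python
import re

# Abbreviated sample copied from the smartctl output above.
SAMPLE = """\
Serial number: PZJZ06RD
SMART Health Status: OK
Elements in grown defect list: 0
read: 2241169767 2284343 0 2243454110 16825755 105146.414 0
write: 0 112305 0 112305 112316 60991.348 0
verify: 722111517 138380 0 722249897 90047 25945.352 0
Non-medium error count: 0
"""

ROW = re.compile(r"^(read|write|verify):[ \t]+(.+)$", re.MULTILINE)


def summarize(smart_text):
    """Pull the serial, grown-defect count and uncorrected-error totals."""
    serial = re.search(r"^Serial number:\s*(\S+)", smart_text, re.MULTILINE)
    defects = re.search(
        r"^Elements in grown defect list:\s*(\d+)", smart_text, re.MULTILINE
    )
    uncorrected = {}
    for op, rest in ROW.findall(smart_text):
        fields = rest.split()
        if len(fields) == 7:  # the last column is "Total uncorrected errors"
            uncorrected[op] = int(fields[-1])
    return {
        "serial": serial.group(1) if serial else None,
        "grown_defects": int(defects.group(1)) if defects else None,
        "uncorrected": uncorrected,
    }


def smartctl_commands(device, drive_count):
    """Build one smartctl invocation per cciss index (assumed 0-based)."""
    return [
        ["smartctl", "-a", "-d", "cciss,%d" % i, device]
        for i in range(drive_count)
    ]


if __name__ == "__main__":
    print(summarize(SAMPLE))
    for cmd in smartctl_commands("/dev/sdb", 8):
        print(" ".join(cmd))
```

Running each printed command with sudo and feeding its output to `summarize` would give one row per drive. A drive with a non-zero uncorrected count or a growing defect list would be the first suspect, although a drive can fail reads without either counter having moved yet.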
Here is my array configuration:
sudo ssacli ctrl slot=0 pd all show detail
Smart Array P420i in Slot 0 (Embedded)
Array A
physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 500 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: X61130WD
Serial Number: 173566421696
WWID: 3001438031683380
Model: ATA WDC WDS500G2B0A-
SATA NCQ Capable: True
SATA NCQ Enabled: True
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: True
Unrestricted Sanitize Supported: True
Shingled Magnetic Recording Support: None
physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 500 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: X61130WD
Serial Number: 173566420063
WWID: 3001438031683381
Model: ATA WDC WDS500G2B0A-
SATA NCQ Capable: True
SATA NCQ Enabled: True
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: True
Unrestricted Sanitize Supported: True
Shingled Magnetic Recording Support: None
Array B
physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 600 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: C3B0
Serial Number: PZJZ06RD
WWID: 5000CCA03CA74BD1
Model: HITACHI HUC10606 CLAR600
Current Temperature (C): 28
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 600 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: C3B0
Serial Number: PZJE23KD
WWID: 5000CCA03C887F15
Model: HITACHI HUC10606 CLAR600
Current Temperature (C): 30
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 2I:1:5
Port: 2I
Box: 1
Bay: 5
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 600 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: C3B0
Serial Number: PZJAL0JD
WWID: 5000CCA03C83F969
Model: HITACHI HUC10606 CLAR600
Current Temperature (C): 27
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 2I:1:6
Port: 2I
Box: 1
Bay: 6
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 600 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: C3B0
Serial Number: PZK1EU5D
WWID: 5000CCA03CABBAED
Model: HITACHI HUC10606 CLAR600
Current Temperature (C): 29
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 2I:1:7
Port: 2I
Box: 1
Bay: 7
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 600 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: C3B0
Serial Number: PZJJ37VD
WWID: 5000CCA03C8E04A1
Model: HITACHI HUC10606 CLAR600
Current Temperature (C): 29
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
physicaldrive 2I:1:8
Port: 2I
Box: 1
Bay: 8
Status: OK
Drive Type: Data Drive
Interface Type: SAS
Size: 600 GB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Rotational Speed: 10000
Firmware Revision: C3B0
Serial Number: PZJAH46D
WWID: 5000CCA03C83CE25
Model: HITACHI HUC10606 CLAR600
Current Temperature (C): 28
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None
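To tie a serial number reported by smartctl back to a physical bay, the listing above can be turned into a lookup table. This is a minimal sketch that assumes the `pd all show detail` text keeps the "Array X" / "physicaldrive ..." / "Key: Value" layout shown here; `parse_ssacli_pd` is a hypothetical helper, not part of ssacli, and the sample is abbreviated from the output above.

```python
# Abbreviated sample copied from the ssacli output above.
SAMPLE = """\
Smart Array P420i in Slot 0 (Embedded)
Array B
physicaldrive 1I:1:3
Bay: 3
Status: OK
Serial Number: PZJZ06RD
physicaldrive 1I:1:4
Bay: 4
Status: OK
Serial Number: PZJE23KD
"""


def parse_ssacli_pd(text):
    """Return one dict per physical drive, tagged with its array letter."""
    drives, array, current = [], None, None
    for raw in text.splitlines():
        line = raw.strip()  # indentation is ignored, so flattened output works too
        if line.startswith("Array "):
            array = line.split(None, 1)[1]
        elif line.startswith("physicaldrive "):
            current = {"array": array, "id": line.split()[1]}
            drives.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return drives


if __name__ == "__main__":
    by_serial = {
        d["Serial Number"]: (d["array"], d["Bay"], d["Status"])
        for d in parse_ssacli_pd(SAMPLE)
    }
    print(by_serial["PZJZ06RD"])  # ('B', '3', 'OK')
```

With the full listing as input, the same dictionary would cover all eight bays, so a suspicious serial number can be matched to the bay that needs attention.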
Is there a way to determine which disk in the second array is generating the critical medium errors? I don't want to replace every disk if only one of them is failing...