We have a Hadoop cluster in which random data nodes lock up. This is usually preceded by a steadily climbing load average, while CPU usage and IOwait stay close to zero. The affected machines are Hadoop data nodes with a heavy I/O workload: many large archives are unpacked, and many small and large files are written. The underlying disks use XFS on kernel 2.6.32-358.18.1.el6.x86_64. All machines have more than 32 GB of RAM and 8+ cores.
Hardware model: Dell R720xd
RAID configuration:
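On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so it can climb while CPU and IOwait stay flat. Below is a minimal Python sketch for listing such tasks, assuming the usual /proc layout. The helper names are illustrative and not part of any tool mentioned here.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Collect (pid, comm) for processes whose main thread is in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was not parseable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Running this repeatedly while the load average is rising should show whether the same processes stay stuck in state D.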
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PdList -aAll
Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5008e1f239d
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600957SS ESF76SLAH2NQ
FDE Capable: Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6bd1
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5J0NV
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 2
Device Id: 2
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e783fa9
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5FE47
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 3
Device Id: 3
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6ea9
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5J0W4
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 4
Device Id: 4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e78e8cd
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5HPC9
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 5
Device Id: 5
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6e51
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5GFW2
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 6
Device Id: 6
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6ef5
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5J0GC
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 7
Device Id: 7
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e78e991
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5GG86
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 8
Device Id: 8
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c50095a39799
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SLAQM3Y
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 9
Device Id: 9
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e78e7b1
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5HP5A
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 10
Device Id: 10
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e7b6ce5
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5J0MW
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 11
Device Id: 11
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.375 GB [0x45cc0000 Sectors]
Firmware state: Online
SAS Address(0): 0x5000c5005e78e269
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST3600057SS ES666SL5HP7Y
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
Exit Code: 0x00
RAID virtual disk configuration:
sudo /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aAll
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (Target Id: 0)
Name:OS
RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
Size:558.375 GB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:2
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disabled
Encryption Type: None
Virtual Disk: 1 (Target Id: 1)
Name:
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:4.362 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:10
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Exit Code: 0x00
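As a quick consistency check on the output above: RAID 6 reserves two drives' worth of capacity for parity, so the 10-drive virtual disk should expose eight drives' worth of space. A small Python sketch using the coerced drive size and drive count reported by MegaCli (the function name is only for illustration):

```python
def raid6_usable_gb(drive_count, drive_size_gb):
    """Usable capacity of a single-span RAID 6 set: two drives go to parity."""
    if drive_count < 4:
        raise ValueError("this sketch assumes at least 4 drives for RAID 6")
    return (drive_count - 2) * drive_size_gb


usable_gb = raid6_usable_gb(10, 558.375)   # values from the MegaCli output
usable_tb = usable_gb / 1024               # sizes here are in binary units
print(f"{usable_gb:.3f} GB = {usable_tb:.3f} TB")
```

The result agrees with the 4.362 TB reported for virtual disk 1.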
Output of iostat -x
[user@data1234.svx.foo.bar ~]$ iostat -x
Linux 2.6.32-358.18.1.el6.x86_64 (data1234.svx.foo.bar) 02/17/2016 _x86_64_ (32 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
17.72 0.00 3.54 0.10 0.00 78.65
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.31 27.97 0.49 3.35 18.59 250.38 69.96 0.01 2.26 0.31 0.12
sdb 0.00 1.51 26.10 47.14 4989.96 15418.12 278.65 2.58 35.25 0.50 3.64
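The avgrq-sz column is the average request size in 512-byte sectors, that is, sectors transferred per second divided by requests per second. The short Python sketch below recomputes it for sdb from the other columns. The column meanings are taken from the sysstat iostat man page for this output format, and the helper name is hypothetical.

```python
SECTOR_BYTES = 512  # iostat reports rsec/s and wsec/s in 512-byte sectors


def avg_request_sectors(r_per_s, w_per_s, rsec_per_s, wsec_per_s):
    """Average request size in sectors, as iostat derives avgrq-sz."""
    requests = r_per_s + w_per_s
    if requests == 0:
        return 0.0
    return (rsec_per_s + wsec_per_s) / requests


# sdb row from the iostat -x output above
sdb = avg_request_sectors(26.10, 47.14, 4989.96, 15418.12)
print(f"avgrq-sz ~ {sdb:.2f} sectors ~ {sdb * SECTOR_BYTES / 1024:.1f} KiB")
```

Note that iostat run without an interval reports averages since boot, so these figures may not reflect what the disks were doing during a hang. Something like `iostat -x 5` captured while the load is rising would be more telling.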
Contents of /etc/fstab
UUID=4fe41c9b-f3f1-4c36-99a2-30e2af5c75e1 / ext3 defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/sdb /data xfs defaults,noatime,nodiratime,logbufs=8,nobarrier 1 2
/data/home /home none bind 0 0
Output of xfs_info
xfs_info /dev/sdb
meta-data=/dev/sdb isize=256 agcount=32, agsize=36593648 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=4096 blocks=1170996736, imaxpct=5
= sunit=16 swidth=128 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
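The geometry reported by xfs_info can be cross-checked against the RAID layout. sunit and swidth are given in filesystem blocks, so with bsize=4096 they should correspond to the controller's 64 KB stripe size and to the eight data disks of a 10-drive RAID 6. A minimal Python sketch of that arithmetic, using only values from the outputs above (names are illustrative):

```python
def stripe_geometry(bsize, sunit_blks, swidth_blks):
    """Return (stripe unit in KiB, stripe width in KiB, data disks implied)."""
    stripe_unit_kib = sunit_blks * bsize // 1024
    stripe_width_kib = swidth_blks * bsize // 1024
    data_disks = swidth_blks // sunit_blks
    return stripe_unit_kib, stripe_width_kib, data_disks


unit_kib, width_kib, disks = stripe_geometry(bsize=4096, sunit_blks=16, swidth_blks=128)
print(unit_kib, width_kib, disks)

# Allocation groups (agcount * agsize) should tile the data section exactly.
assert 32 * 36593648 == 1170996736
```

The stripe unit works out to 64 KiB and the width to eight units, which matches the MegaCli output, so stripe misalignment looks like an unlikely cause here.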
Output of dmesg
INFO: task swh-logfiles_pr:22324 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000000 0 22324 22300 0x00000000
ffff881fe29cdd38 0000000000000086 ffff881fe29cdc98 ffffffff8109f641
ffff881fe29cdcc8 ffffffff8118e05d ffff881fe29cdcc8 ffff881c2e78300a
ffff881ded459ab8 ffff881fe29cdfd8 000000000000fb88 ffff881ded459ab8
Call Trace:
[<ffffffff8109f641>] ? in_group_p+0x31/0x40
[<ffffffff8118e05d>] ? acl_permission_check+0x5d/0xc0
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff81192e67>] do_filp_open+0x2d7/0xdc0
[<ffffffff8118f541>] ? path_put+0x31/0x40
[<ffffffff8119f922>] ? alloc_fd+0x92/0x160
[<ffffffff8117e249>] do_sys_open+0x69/0x140
[<ffffffff8117e360>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task swh-logfiles_pr:22345 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000001 0 22345 22323 0x00000000
ffff88201044fd38 0000000000000086 0000000000000000 ffffffff8109f641
ffff88201044fcc8 ffffffff8118e05d ffff88201044fcc8 ffff881fc7a1500a
ffff8819d03fe638 ffff88201044ffd8 000000000000fb88 ffff8819d03fe638
Call Trace:
[<ffffffff8109f641>] ? in_group_p+0x31/0x40
[<ffffffff8118e05d>] ? acl_permission_check+0x5d/0xc0
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff81192e67>] do_filp_open+0x2d7/0xdc0
[<ffffffff811b3ffb>] ? vfs_statfs+0x1b/0xb0
[<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
[<ffffffff8119f922>] ? alloc_fd+0x92/0x160
[<ffffffff8117e249>] do_sys_open+0x69/0x140
[<ffffffff8117e360>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task swh-logfiles_pr:22356 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000001 0 22356 22334 0x00000000
ffff881cc4f8f698 0000000000000086 ffff881cc4f8f85c ffff880e59395038
ffff881cc4f8f6a8 ffffffffa01a670d ffff881cc4f8f908 0000000000000000
ffff881fdf067ab8 ffff881cc4f8ffd8 000000000000fb88 ffff881fdf067ab8
Call Trace:
[<ffffffffa01a670d>] ? xfs_bmap_add_extent+0xad/0x3c0 [xfs]
[<ffffffff8150efa5>] schedule_timeout+0x215/0x2e0
[<ffffffffa01a7562>] ? xfs_bmapi+0xb42/0x1120 [xfs]
[<ffffffff8150fec2>] __down+0x72/0xb0
[<ffffffffa01e78e5>] ? _xfs_buf_find+0xe5/0x230 [xfs]
[<ffffffff8109cb61>] down+0x41/0x50
[<ffffffffa01e7751>] xfs_buf_lock+0x51/0x100 [xfs]
[<ffffffffa01e78e5>] _xfs_buf_find+0xe5/0x230 [xfs]
[<ffffffffa01e7a64>] xfs_buf_get+0x34/0x1b0 [xfs]
[<ffffffffa01e80ec>] xfs_buf_read+0x2c/0x100 [xfs]
[<ffffffffa01dd9a7>] xfs_trans_read_buf+0x1f7/0x410 [xfs]
[<ffffffffa01c0404>] xfs_read_agi+0x74/0x100 [xfs]
[<ffffffffa01c04be>] xfs_ialloc_read_agi+0x2e/0x90 [xfs]
[<ffffffffa01c07a3>] xfs_ialloc_ag_select+0x133/0x270 [xfs]
[<ffffffffa01c1e67>] xfs_dialloc+0x3d7/0x850 [xfs]
[<ffffffffa01e6e25>] ? xfs_buf_rele+0x55/0x100 [xfs]
[<ffffffffa01ddf98>] ? xfs_trans_brelse+0xe8/0x130 [xfs]
[<ffffffffa01b029b>] ? xfs_da_brelse+0x7b/0xc0 [xfs]
[<ffffffffa01c5ba0>] xfs_ialloc+0x60/0x6e0 [xfs]
[<ffffffffa01e2eaa>] ? kmem_zone_zalloc+0x3a/0x50 [xfs]
[<ffffffffa01de534>] xfs_dir_ialloc+0x74/0x2b0 [xfs]
[<ffffffffa01e0610>] xfs_create+0x440/0x640 [xfs]
[<ffffffffa01ed7bd>] xfs_vn_mknod+0xad/0x1c0 [xfs]
[<ffffffffa01ed900>] xfs_vn_create+0x10/0x20 [xfs]
[<ffffffff8118fbd4>] vfs_create+0xb4/0xe0
[<ffffffff811936a0>] do_filp_open+0xb10/0xdc0
[<ffffffff8118f541>] ? path_put+0x31/0x40
[<ffffffff8119f922>] ? alloc_fd+0x92/0x160
[<ffffffff8117e249>] do_sys_open+0x69/0x140
[<ffffffff8117e360>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task swh-logfiles_pr:22386 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000001 0 22386 22362 0x00000000
ffff88200be6dd38 0000000000000082 ffff88200be6dc98 ffffffff8109f641
ffff88200be6dcc8 ffffffff8118e05d ffff88200be6dcc8 ffff881fd395800a
ffff881fce825af8 ffff88200be6dfd8 000000000000fb88 ffff881fce825af8
Call Trace:
[<ffffffff8109f641>] ? in_group_p+0x31/0x40
[<ffffffff8118e05d>] ? acl_permission_check+0x5d/0xc0
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff81192e67>] do_filp_open+0x2d7/0xdc0
[<ffffffff8118f541>] ? path_put+0x31/0x40
[<ffffffff8119f922>] ? alloc_fd+0x92/0x160
[<ffffffff8117e249>] do_sys_open+0x69/0x140
[<ffffffff8117e360>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task swh-logfiles_pr:22415 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swh-logfiles_ D 0000000000000000 0 22415 22402 0x00000000
ffff881cd8f6dd38 0000000000000086 0000000000000000 ffffffff8109f641
ffff881cd8f6dcc8 ffffffff8118e05d ffff881cd8f6dcc8 ffff881f2073500a
ffff881fd367c5f8 ffff881cd8f6dfd8 000000000000fb88 ffff881fd367c5f8
Call Trace:
[<ffffffff8109f641>] ? in_group_p+0x31/0x40
[<ffffffff8118e05d>] ? acl_permission_check+0x5d/0xc0
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff81192e67>] do_filp_open+0x2d7/0xdc0
[<ffffffff811b3ffb>] ? vfs_statfs+0x1b/0xb0
[<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
[<ffffffff8119f922>] ? alloc_fd+0x92/0x160
[<ffffffff8117e249>] do_sys_open+0x69/0x140
[<ffffffff8117e360>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task flush-8:16:5856 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:16 D 000000000000000b 0 5856 2 0x00000000
ffff881fd151b798 0000000000000046 0000000000000000 ffff8820129af380
0000000000000086 ffff881fd151b720 ffff88200b648ea8 0000000000000001
ffff881fda34f058 ffff881fd151bfd8 000000000000fb88 ffff881fda34f058
Call Trace:
[<ffffffff8125ea61>] ? blk_queue_bio+0x121/0x5d0
[<ffffffff81510695>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff81510826>] rwsem_down_read_failed+0x26/0x30
[<ffffffff81283844>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8150fd24>] ? down_read+0x24/0x30
[<ffffffffa01c29cd>] xfs_ilock+0x9d/0xd0 [xfs]
[<ffffffffa01e491b>] xfs_map_blocks+0x1fb/0x250 [xfs]
[<ffffffffa01e4a83>] ? xfs_submit_ioend_bio+0x33/0x40 [xfs]
[<ffffffffa01e5401>] xfs_vm_writepage+0x261/0x5a0 [xfs]
[<ffffffff811198c0>] ? find_get_pages_tag+0x40/0x130
[<ffffffff8112cbb7>] __writepage+0x17/0x40
[<ffffffff8112de6d>] write_cache_pages+0x1fd/0x4c0
[<ffffffff8112cba0>] ? __writepage+0x0/0x40
[<ffffffff8112e154>] generic_writepages+0x24/0x30
[<ffffffffa01e46dd>] xfs_vm_writepages+0x5d/0x80 [xfs]
[<ffffffff8112e181>] do_writepages+0x21/0x40
[<ffffffff811aca0d>] writeback_single_inode+0xdd/0x290
[<ffffffff811ace1e>] writeback_sb_inodes+0xce/0x180
[<ffffffff811acf7b>] writeback_inodes_wb+0xab/0x1b0
[<ffffffff811ad31b>] wb_writeback+0x29b/0x3f0
[<ffffffff8150e130>] ? thread_return+0x4e/0x76e
[<ffffffff81081be2>] ? del_timer_sync+0x22/0x30
[<ffffffff811ad615>] wb_do_writeback+0x1a5/0x240
[<ffffffff811ad713>] bdi_writeback_task+0x63/0x1b0
[<ffffffff81096c67>] ? bit_waitqueue+0x17/0xd0
[<ffffffff8113cc20>] ? bdi_start_fn+0x0/0x100
[<ffffffff8113cca6>] bdi_start_fn+0x86/0x100
[<ffffffff8113cc20>] ? bdi_start_fn+0x0/0x100
[<ffffffff81096a36>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810969a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
INFO: task java:1114 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 0000000000000006 0 1114 31588 0x00000000
ffff881c5bba7dd8 0000000000000086 0000000000000000 0000000000000001
ffff881c5bba7d58 ffff881d6ccc9500 ffff881d6ccc9500 ffff881d6ccc9500
ffff881d6ccc9ab8 ffff881c5bba7fd8 000000000000fb88 ffff881d6ccc9ab8
Call Trace:
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff8118ebb0>] lookup_create+0x30/0xd0
[<ffffffff811924ac>] sys_mkdirat+0x7c/0x130
[<ffffffff81186f36>] ? sys_newstat+0x36/0x50
[<ffffffff81192578>] sys_mkdir+0x18/0x20
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task java:803 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 0000000000000004 0 803 31612 0x00000000
ffff881c2e7a1dd8 0000000000000082 0000000000000000 0000000000000001
ffff881c2e7a1d58 ffff881fe5494ae0 ffff881fe5494ae0 ffff881fe5494ae0
ffff881fe5495098 ffff881c2e7a1fd8 000000000000fb88 ffff881fe5495098
Call Trace:
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff8118ebb0>] lookup_create+0x30/0xd0
[<ffffffff811924ac>] sys_mkdirat+0x7c/0x130
[<ffffffff81186f36>] ? sys_newstat+0x36/0x50
[<ffffffff81192578>] sys_mkdir+0x18/0x20
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task java:1171 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 0000000000000000 0 1171 31636 0x00000000
ffff881961ce9dd8 0000000000000086 0000000000000000 0000000000000001
ffff881961ce9d58 ffff881cc26f3540 ffff881cc26f3540 ffff881cc26f3540
ffff881cc26f3af8 ffff881961ce9fd8 000000000000fb88 ffff881cc26f3af8
Call Trace:
[<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff8118ebb0>] lookup_create+0x30/0xd0
[<ffffffff811924ac>] sys_mkdirat+0x7c/0x130
[<ffffffff81186f36>] ? sys_newstat+0x36/0x50
[<ffffffff81192578>] sys_mkdir+0x18/0x20
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task java:950 blocked for more than 180 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 0000000000000002 0 950 31666 0x00000000
ffff88200d42bdd8 0000000000000082 0000000000000000 0000000000000001
ffff88200d42bd58 ffff881cccccc040 ffff881cccccc040 ffff881cccccc040
ffff881cccccc5f8 ffff88200d42bfd8 000000000000fb88 ffff881cccccc5f8
Call Trace:
[<ffffffff811a20d0>] ? mntput_no_expire+0x30/0x110
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
[<ffffffff8118ebb0>] lookup_create+0x30/0xd0
[<ffffffff811924ac>] sys_mkdirat+0x7c/0x130
[<ffffffff81186f36>] ? sys_newstat+0x36/0x50
[<ffffffff81192578>] sys_mkdir+0x18/0x20
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
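With this many traces it helps to group the hung tasks by name and by the first reliable frame in each stack. The rough Python sketch below does this for text in the format shown above. The regular expressions are written against this particular dmesg layout and may need adjusting for other kernels.

```python
import re
from collections import Counter

TASK_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x")


def summarize_hung_tasks(dmesg_text):
    """Count (task name, first reliable stack frame) pairs in hung-task reports."""
    counts = Counter()
    current = None  # task name still waiting for its first reliable frame
    for line in dmesg_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = task.group(1)
            continue
        frame = FRAME_RE.search(line)
        # Frames marked '?' are unreliable guesses from the stack scan; skip them.
        if current and frame and not frame.group(1):
            counts[(current, frame.group(2))] += 1
            current = None
    return counts


sample = """\
INFO: task swh-logfiles_pr:22324 blocked for more than 180 seconds.
Call Trace:
[<ffffffff8109f641>] ? in_group_p+0x31/0x40
[<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8150f62b>] mutex_lock+0x2b/0x50
INFO: task flush-8:16:5856 blocked for more than 180 seconds.
Call Trace:
[<ffffffff8125ea61>] ? blk_queue_bio+0x121/0x5d0
[<ffffffff81510695>] rwsem_down_failed_common+0x95/0x1d0
"""

for (name, func), n in summarize_hung_tasks(sample).items():
    print(f"{n} x {name} waiting in {func}")
```

In the log above, most tasks are waiting in __mutex_lock_slowpath. Task 22356 is waiting in xfs_buf_lock while reading the AGI during inode allocation, and the flush thread is blocked in xfs_ilock.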
As the kernel log says, you have a problem at the filesystem level or below. The bad news is that the hardware looks fine, and it seems more than sufficient for the current workload.
In my experience, although XFS is recommended as a scalable filesystem, using it has brought more problems than performance gains. But if switching to EXT4 is not an option for you, you can try the following tuning AT YOUR OWN RISK:
# increase the number of queued requests:
echo 4096 > /sys/block/sdb/queue/nr_requests
# use aggressive mount options:
mount -o remount,noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k,nobarrier /dev/sdb /data
You can also try remounting /data with default options and see whether the problem persists.
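Before changing anything, it may be worth recording the current values so they can be restored later. Here is a small Python sketch, assuming the device and mount point from the question (/dev/sdb on /data) and the standard /proc/mounts and sysfs layouts. The function names are made up for this example.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # a later entry overrides an earlier one
    return options


def read_nr_requests(device="sdb", sys_root="/sys"):
    """Read the block queue depth for a device, or None if it is unavailable."""
    path = f"{sys_root}/block/{device}/queue/nr_requests"
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            print("mount options:", mount_options(fh.read(), "/data"))
    except OSError:
        print("/proc/mounts not readable on this system")
    print("nr_requests:", read_nr_requests("sdb"))
```

Saving this output before and after the remount gives a simple record of what was actually in effect when the hangs occurred.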