Один из моих серверов (Ubuntu 12.04, 16CPU, 32 ГБ ОЗУ) запускает раздел приложений моего веб-сайта (Nginx 1.1.19, PHP5-FPM, Php 5.3.10, Symfony2 Web Framework)
Внезапно пользователи начали получать ошибки HTTP 5 **. Когда я зашел на сервер, был огромный дисковый ввод-вывод, и виноват был [flush]
.
Когда я просмотрел журналы ядра, используя dmesg
, это показало причину [flush]
процессам nginx требовалось слишком много времени для очистки. (Думаю)
Запись в журнале ядра с момента инцидента:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.048088] INFO: task nginx:17876 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.050180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057484] nginx D ffffffff81806240 0 17876 17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057488] ffff88075deeda68 0000000000000082 ffff88063383be38 ffff880427d450d0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057491] ffff88075deedfd8 ffff88075deedfd8 ffff88075deedfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057494] ffffffff81c0d020 ffff880734362de0 ffff88075deeda78 ffff88075deedad8
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057501] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057508] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057513] [<ffffffff8126163d>] do_get_write_access+0x27d/0x4f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057519] [<ffffffff8108b920>] ? autoremove_wake_function+0x40/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057523] [<ffffffff8129f1fe>] ? security_inode_alloc+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057526] [<ffffffff81261a00>] jbd2_journal_get_write_access+0x30/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057531] [<ffffffff812439fe>] __ext4_journal_get_write_access+0x3e/0x80
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057536] [<ffffffff8121453a>] ext4_new_inode+0x19a/0xb10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057540] [<ffffffff81220745>] ? ext4_lookup.part.27+0x75/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057543] [<ffffffff81221e58>] ext4_create+0xb8/0x140
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057547] [<ffffffff81185da4>] vfs_create+0xb4/0x120
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057551] [<ffffffff81187a79>] do_last+0x5c9/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057554] [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057559] [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057563] [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057567] [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057570] [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057575] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057578] [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057583] [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057586] [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057591] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057594] INFO: task nginx:17877 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.061709] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066548] nginx D ffffffff81806240 0 17877 17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066551] ffff880703c07c78 0000000000000082 ffff880746bb37b0 ffff880703c07e38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066554] ffff880703c07fd8 ffff880703c07fd8 ffff880703c07fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066556] ffff88075e5896f0 ffff880734360000 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066559] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066562] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066564] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066566] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066568] [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066570] [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066573] [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066575] [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066577] [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066579] [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066581] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066584] [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066586] [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066588] [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066590] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066592] INFO: task nginx:17878 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.070611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075090] nginx D ffffffff81806240 0 17878 17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075093] ffff88075d68fc78 0000000000000082 ffff880746bb37b0 ffff88075d68fe38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075098] ffff88075d68ffd8 ffff88075d68ffd8 ffff88075d68ffd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075103] ffffffff81c0d020 ffff880734365bc0 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075109] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075113] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075117] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075120] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075123] [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075127] [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075130] [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075134] [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075137] [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075140] [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075143] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075146] [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075149] [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075152] [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075156] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075158] INFO: task nginx:17879 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.078900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083527] nginx D ffffffff81806240 0 17879 17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083530] ffff880680f29c78 0000000000000082 ffff880746bb37b0 ffff880680f29e38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083536] ffff880680f29fd8 ffff880680f29fd8 ffff880680f29fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083542] ffffffff81c0d020 ffff8807343644d0 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083548] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083551] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083554] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083558] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083562] [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083565] [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083568] [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083572] [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083575] [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083578] [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083581] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083585] [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083588] [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083591] [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083594] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083597] INFO: task php5-fpm:15541 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.087245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091289] php5-fpm D 0000000000000008 0 15541 17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091291] ffff880670031c28 0000000000000082 ffff88024683e001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091294] ffff880670031fd8 ffff880670031fd8 ffff880670031fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091299] ffff88013cd75bc0 ffff88075ddc96f0 ffff880670031c78 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091305] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091309] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091313] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091316] [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091319] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091323] [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091326] [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091330] [<ffffffff8114f6be>] ? free_pages_and_swap_cache+0x9e/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091334] [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091337] [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091341] [<ffffffff812d32cd>] ? aa_dup_task_context+0x3d/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091345] [<ffffffff812d86bb>] ? apparmor_cred_prepare+0x3b/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091349] [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091352] [<ffffffff81178212>] sys_faccessat+0xa2/0x1e0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091356] [<ffffffff81095c3a>] ? do_gettimeofday+0x1a/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091359] [<ffffffff81178368>] sys_access+0x18/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091363] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091368] INFO: task php5-fpm:15643 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.094542] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098519] php5-fpm D ffffffff81806240 0 15643 17849 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098522] ffff880680d3dbb8 0000000000000086 ffff880557f73001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098527] ffff880680d3dfd8 ffff880680d3dfd8 ffff880680d3dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098533] ffffffff81c0d020 ffff88013cd72de0 ffff880680d3dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098539] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098542] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098545] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098549] [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098552] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098555] [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098558] [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098561] [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098564] [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098568] [<ffffffff8113e127>] ? zap_page_range+0xb7/0xd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098572] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098575] [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098578] [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098582] [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098585] [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098588] [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098591] [<ffffffff8113a855>] ? sys_madvise+0x225/0x280
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098595] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098598] INFO: task php5-fpm:15775 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.102580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107285] php5-fpm D ffffffff81806240 0 15775 17849 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107287] ffff880680f69bb8 0000000000000086 ffff88074bdce001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107292] ffff880680f69fd8 ffff880680f69fd8 ffff880680f69fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107298] ffff88075e5a96f0 ffff88074bbf8000 ffff880680f69c08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107303] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107307] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107311] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107314] [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107317] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107320] [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107323] [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107327] [<ffffffff810d8ea7>] ? irq_to_desc+0x17/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107331] [<ffffffff810dbd8e>] ? irq_get_irq_data+0xe/0x10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107334] [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107337] [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107340] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107344] [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107347] [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107351] [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107354] [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107357] [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107361] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107364] INFO: task php5-fpm:16184 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.110831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115362] php5-fpm D ffffffff81806240 0 16184 17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115364] ffff88070499dbb8 0000000000000082 ffff880037452001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115370] ffff88070499dfd8 ffff88070499dfd8 ffff88070499dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115376] ffff88075e6196f0 ffff8807309cc4d0 ffff88070499dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115381] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115385] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115388] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115391] [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115394] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115397] [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115400] [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115404] [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115407] [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115410] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115413] [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115416] [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115420] [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115423] [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115427] [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115431] [<ffffffff81080b0a>] ? do_sys_times+0xaa/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115434] [<ffffffff81080b6a>] ? sys_times+0x4a/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115437] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115440] INFO: task php5-fpm:16213 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.118940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123356] php5-fpm D ffffffff81806240 0 16213 17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123358] ffff880730811bb8 0000000000000086 ffff8804f99e0001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123363] ffff880730811fd8 ffff880730811fd8 ffff880730811fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123369] ffff88075e5a96f0 ffff8806d47f44d0 ffff880730811c08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123374] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123378] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123381] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123384] [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123387] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123390] [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123393] [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123397] [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123400] [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123403] [<ffffffff8113e127>] ? zap_page_range+0xb7/0xd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123406] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123410] [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123413] [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123416] [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123420] [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123423] [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123426] [<ffffffff8113a855>] ? sys_madvise+0x225/0x280
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123429] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123434] INFO: task php5-fpm:17429 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.127444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131732] php5-fpm D ffffffff81806240 0 17429 17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131735] ffff88063481dbb8 0000000000000086 ffff880634995001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131740] ffff88063481dfd8 ffff88063481dfd8 ffff88063481dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131746] ffff88075e7796f0 ffff88075d0896f0 ffff88063481dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131751] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131755] [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131758] [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131762] [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131765] [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131768] [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131771] [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131774] [<ffffffff810d8ea7>] ? irq_to_desc+0x17/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131777] [<ffffffff810dbd8e>] ? irq_get_irq_data+0xe/0x10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131780] [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131784] [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131787] [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131790] [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131794] [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131797] [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131800] [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131803] [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131807] [<ffffffff81080b0a>] ? do_sys_times+0xaa/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131810] [<ffffffff81080b6a>] ? sys_times+0x4a/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131814] [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Как я могу гарантировать, что такие случаи не создают HIGH DISK IO / Server Unstable?
Я только что столкнулся с этой проблемой на своем NVR (сетевом видеорегистраторе).
Эта ссылка: https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ указывает, что это состояние нехватки памяти:
По умолчанию Linux использует до 40% доступной памяти для кэширования файловой системы. После достижения этой отметки файловая система сбрасывает все ожидающие данные на диск, в результате чего все последующие операции ввода-вывода выполняются синхронно. Для сброса этих данных на диск по умолчанию установлено ограничение по времени в 120 секунд. В данном случае подсистема ввода-вывода недостаточно быстра, чтобы сбросить данные в течение 120 секунд. Поскольку подсистема ввода-вывода медленно реагирует и обслуживается больше запросов, системная память заполняется, что приводит к указанной выше ошибке, таким образом обслуживая HTTP-запросы.
Они предлагают:
someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_ratio=10
someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_background_ratio=5
В моей системе (Ubuntu 16.04.5) это меняет значения из dirty_ratio=20
, и dirty_background_ratio=10
.
Эти два вопроса:
Укажите средние значения ядра:
dirty_ratio
- Содержит в процентах от общей доступной памяти, которая содержит свободные страницы и восстанавливаемые страницы, количество страниц, на которых процесс, производящий запись на диск, сам начнет записывать грязные данные.
dirty_background_ratio
- Содержит в процентах от общей доступной памяти, которая содержит свободные и восстанавливаемые страницы, количество страниц, на которых фоновые потоки очистки ядра начнут записывать грязные данные.
Я не доказал это решение, но оно кажется разумным. Я постараюсь не забыть обновить это, если машина всегда будет стабильной. В моем случае кажется, что машина время от времени входит в это состояние, пока машина выполняет очистку ZFS воскресными вечерами.