Назад | Перейти на главную страницу

«Echo 0> / proc / sys / kernel / hung_task_timeout_secs» отключает это сообщение

Один из моих серверов (Ubuntu 12.04, 16CPU, 32 ГБ ОЗУ) запускает раздел приложений моего веб-сайта (Nginx 1.1.19, PHP5-FPM, Php 5.3.10, Symfony2 Web Framework)

Внезапно пользователи начали получать ошибки HTTP 5 **. Когда я зашел на сервер, был огромный дисковый ввод-вывод, и виноват был [flush].

Когда я просмотрел журналы ядра, используя dmesg, это показало причину [flush] процессам nginx требовалось слишком много времени для очистки. (Думаю)

Запись в журнале ядра с момента инцидента:

Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.048088] INFO: task nginx:17876 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.050180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057484] nginx           D ffffffff81806240     0 17876  17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057488]  ffff88075deeda68 0000000000000082 ffff88063383be38 ffff880427d450d0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057491]  ffff88075deedfd8 ffff88075deedfd8 ffff88075deedfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057494]  ffffffff81c0d020 ffff880734362de0 ffff88075deeda78 ffff88075deedad8
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057501] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057508]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057513]  [<ffffffff8126163d>] do_get_write_access+0x27d/0x4f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057519]  [<ffffffff8108b920>] ? autoremove_wake_function+0x40/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057523]  [<ffffffff8129f1fe>] ? security_inode_alloc+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057526]  [<ffffffff81261a00>] jbd2_journal_get_write_access+0x30/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057531]  [<ffffffff812439fe>] __ext4_journal_get_write_access+0x3e/0x80
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057536]  [<ffffffff8121453a>] ext4_new_inode+0x19a/0xb10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057540]  [<ffffffff81220745>] ? ext4_lookup.part.27+0x75/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057543]  [<ffffffff81221e58>] ext4_create+0xb8/0x140
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057547]  [<ffffffff81185da4>] vfs_create+0xb4/0x120
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057551]  [<ffffffff81187a79>] do_last+0x5c9/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057554]  [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057559]  [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057563]  [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057567]  [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057570]  [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057575]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057578]  [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057583]  [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057586]  [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057591]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.057594] INFO: task nginx:17877 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.061709] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066548] nginx           D ffffffff81806240     0 17877  17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066551]  ffff880703c07c78 0000000000000082 ffff880746bb37b0 ffff880703c07e38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066554]  ffff880703c07fd8 ffff880703c07fd8 ffff880703c07fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066556]  ffff88075e5896f0 ffff880734360000 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066559] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066562]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066564]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066566]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066568]  [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066570]  [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066573]  [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066575]  [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066577]  [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066579]  [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066581]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066584]  [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066586]  [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066588]  [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066590]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.066592] INFO: task nginx:17878 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.070611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075090] nginx           D ffffffff81806240     0 17878  17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075093]  ffff88075d68fc78 0000000000000082 ffff880746bb37b0 ffff88075d68fe38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075098]  ffff88075d68ffd8 ffff88075d68ffd8 ffff88075d68ffd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075103]  ffffffff81c0d020 ffff880734365bc0 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075109] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075113]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075117]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075120]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075123]  [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075127]  [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075130]  [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075134]  [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075137]  [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075140]  [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075143]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075146]  [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075149]  [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075152]  [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075156]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.075158] INFO: task nginx:17879 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.078900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083527] nginx           D ffffffff81806240     0 17879  17875 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083530]  ffff880680f29c78 0000000000000082 ffff880746bb37b0 ffff880680f29e38
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083536]  ffff880680f29fd8 ffff880680f29fd8 ffff880680f29fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083542]  ffffffff81c0d020 ffff8807343644d0 0000000000000081 ffff880746bb3bc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083548] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083551]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083554]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083558]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083562]  [<ffffffff81187764>] do_last+0x2b4/0x730
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083565]  [<ffffffff81188f81>] path_openat+0xd1/0x3f0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083568]  [<ffffffff8152beea>] ? sock_poll+0x1a/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083572]  [<ffffffff811893c2>] do_filp_open+0x42/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083575]  [<ffffffff8131b481>] ? strncpy_from_user+0x31/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083578]  [<ffffffff8118472a>] ? do_getname+0x10a/0x180
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083581]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083585]  [<ffffffff811964d7>] ? alloc_fd+0xf7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083588]  [<ffffffff81178a18>] do_sys_open+0xf8/0x240
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083591]  [<ffffffff81178b80>] sys_open+0x20/0x30
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083594]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.083597] INFO: task php5-fpm:15541 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.087245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091289] php5-fpm        D 0000000000000008     0 15541  17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091291]  ffff880670031c28 0000000000000082 ffff88024683e001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091294]  ffff880670031fd8 ffff880670031fd8 ffff880670031fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091299]  ffff88013cd75bc0 ffff88075ddc96f0 ffff880670031c78 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091305] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091309]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091313]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091316]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091319]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091323]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091326]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091330]  [<ffffffff8114f6be>] ? free_pages_and_swap_cache+0x9e/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091334]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091337]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091341]  [<ffffffff812d32cd>] ? aa_dup_task_context+0x3d/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091345]  [<ffffffff812d86bb>] ? apparmor_cred_prepare+0x3b/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091349]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091352]  [<ffffffff81178212>] sys_faccessat+0xa2/0x1e0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091356]  [<ffffffff81095c3a>] ? do_gettimeofday+0x1a/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091359]  [<ffffffff81178368>] sys_access+0x18/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091363]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.091368] INFO: task php5-fpm:15643 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.094542] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098519] php5-fpm        D ffffffff81806240     0 15643  17849 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098522]  ffff880680d3dbb8 0000000000000086 ffff880557f73001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098527]  ffff880680d3dfd8 ffff880680d3dfd8 ffff880680d3dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098533]  ffffffff81c0d020 ffff88013cd72de0 ffff880680d3dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098539] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098542]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098545]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098549]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098552]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098555]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098558]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098561]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098564]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098568]  [<ffffffff8113e127>] ? zap_page_range+0xb7/0xd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098572]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098575]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098578]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098582]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098585]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098588]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098591]  [<ffffffff8113a855>] ? sys_madvise+0x225/0x280
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098595]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.098598] INFO: task php5-fpm:15775 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.102580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107285] php5-fpm        D ffffffff81806240     0 15775  17849 0x00000000
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107287]  ffff880680f69bb8 0000000000000086 ffff88074bdce001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107292]  ffff880680f69fd8 ffff880680f69fd8 ffff880680f69fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107298]  ffff88075e5a96f0 ffff88074bbf8000 ffff880680f69c08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107303] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107307]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107311]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107314]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107317]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107320]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107323]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107327]  [<ffffffff810d8ea7>] ? irq_to_desc+0x17/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107331]  [<ffffffff810dbd8e>] ? irq_get_irq_data+0xe/0x10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107334]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107337]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107340]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107344]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107347]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107351]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107354]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107357]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107361]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.107364] INFO: task php5-fpm:16184 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.110831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115362] php5-fpm        D ffffffff81806240     0 16184  17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115364]  ffff88070499dbb8 0000000000000082 ffff880037452001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115370]  ffff88070499dfd8 ffff88070499dfd8 ffff88070499dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115376]  ffff88075e6196f0 ffff8807309cc4d0 ffff88070499dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115381] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115385]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115388]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115391]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115394]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115397]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115400]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115404]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115407]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115410]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115413]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115416]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115420]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115423]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115427]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115431]  [<ffffffff81080b0a>] ? do_sys_times+0xaa/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115434]  [<ffffffff81080b6a>] ? sys_times+0x4a/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115437]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.115440] INFO: task php5-fpm:16213 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.118940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123356] php5-fpm        D ffffffff81806240     0 16213  17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123358]  ffff880730811bb8 0000000000000086 ffff8804f99e0001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123363]  ffff880730811fd8 ffff880730811fd8 ffff880730811fd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123369]  ffff88075e5a96f0 ffff8806d47f44d0 ffff880730811c08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123374] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123378]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123381]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123384]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123387]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123390]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123393]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123397]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123400]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123403]  [<ffffffff8113e127>] ? zap_page_range+0xb7/0xd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123406]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123410]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123413]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123416]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123420]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123423]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123426]  [<ffffffff8113a855>] ? sys_madvise+0x225/0x280
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123429]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.123434] INFO: task php5-fpm:17429 blocked for more than 120 seconds.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.127444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131732] php5-fpm        D ffffffff81806240     0 17429  17849 0x00000004
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131735]  ffff88063481dbb8 0000000000000086 ffff880634995001 00000000000043ff
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131740]  ffff88063481dfd8 ffff88063481dfd8 ffff88063481dfd8 0000000000013700
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131746]  ffff88075e7796f0 ffff88075d0896f0 ffff88063481dc08 ffff88074acc9cd0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131751] Call Trace:
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131755]  [<ffffffff8165badf>] schedule+0x3f/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131758]  [<ffffffff8165c8c7>] __mutex_lock_slowpath+0xd7/0x150
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131762]  [<ffffffff81185a84>] ? path_get+0x44/0x50
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131765]  [<ffffffff8165c482>] mutex_lock+0x22/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131768]  [<ffffffff81186446>] do_lookup+0x1d6/0x310
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131771]  [<ffffffff811881ac>] path_lookupat+0x11c/0x750
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131774]  [<ffffffff810d8ea7>] ? irq_to_desc+0x17/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131777]  [<ffffffff810dbd8e>] ? irq_get_irq_data+0xe/0x10
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131780]  [<ffffffff81188811>] do_path_lookup+0x31/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131784]  [<ffffffff81189319>] user_path_at_empty+0x59/0xa0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131787]  [<ffffffff8165d9be>] ? _raw_spin_lock+0xe/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131790]  [<ffffffff8109f8f3>] ? futex_wake+0x113/0x130
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131794]  [<ffffffff81189371>] user_path_at+0x11/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131797]  [<ffffffff8117e39a>] vfs_fstatat+0x3a/0x70
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131800]  [<ffffffff8117e3ee>] vfs_lstat+0x1e/0x20
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131803]  [<ffffffff8117e58a>] sys_newlstat+0x1a/0x40
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131807]  [<ffffffff81080b0a>] ? do_sys_times+0xaa/0xc0
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131810]  [<ffffffff81080b6a>] ? sys_times+0x4a/0x60
Jan 31 22:51:25 ip-10-0-0-160 kernel: [115562.131814]  [<ffffffff81666002>] system_call_fastpath+0x16/0x1b

Как я могу гарантировать, что такие случаи не создают HIGH DISK IO / Server Unstable?

Я только что столкнулся с этой проблемой на своем NVR (сетевом видеорегистраторе).

Эта ссылка: https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ указывает, что это состояние нехватки памяти:

По умолчанию Linux использует до 40% доступной памяти для кэширования файловой системы. После достижения этой отметки файловая система сбрасывает все ожидающие данные на диск, в результате чего все последующие операции ввода-вывода выполняются синхронно. Для сброса этих данных на диск по умолчанию установлено ограничение по времени в 120 секунд. В данном случае подсистема ввода-вывода недостаточно быстра, чтобы сбросить данные в течение 120 секунд. Поскольку подсистема ввода-вывода медленно реагирует и обслуживается больше запросов, системная память заполняется, что приводит к указанной выше ошибке, таким образом обслуживая HTTP-запросы.

Они предлагают:

someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_ratio=10
someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_background_ratio=5

В моей системе (Ubuntu 16.04.5) это меняет значения из dirty_ratio=20, и dirty_background_ratio=10.

Эти два вопроса:

Укажите средние значения ядра:

dirty_ratio - Содержит в процентах от общей доступной памяти, которая содержит свободные страницы и восстанавливаемые страницы, количество страниц, на которых процесс, производящий запись на диск, сам начнет записывать грязные данные.

dirty_background_ratio - Содержит в процентах от общей доступной памяти, которая содержит свободные и восстанавливаемые страницы, количество страниц, на которых фоновые потоки очистки ядра начнут записывать грязные данные.

Я не доказал это решение, но оно кажется разумным. Я постараюсь не забыть обновить это, если машина всегда будет стабильной. В моем случае кажется, что машина время от времени входит в это состояние, пока машина выполняет очистку ZFS воскресными вечерами.