Я исследую, почему два наших процесса были убиты убийцей Linux OOM - хотя, похоже, в обоих случаях было достаточно оперативной памяти и много SWAP.
Когда я интерпретирую это как этот ответ первый запрос памяти запрашивал 2 ^ 2 = 4 страницы (16 КБ) памяти (флаг порядка) и хотел, чтобы это было из «нормальной» зоны.
Jan 27 04:26:14 kernel: [639964.652706] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
И если я правильно разбираю вывод, места более чем достаточно:
Node 0 Normal free:178144kB min:55068kB low:68832kB high:82600kB
Во второй раз тот же запрос через несколько минут - и, похоже, места достаточно.
Почему тогда сработал убийца OOM? Я неправильно разбираю информацию?
vm.overcommit_memory
настройка установлена на «0» (эвристический), что может быть неоптимальным.Первый экземпляр:
Jan 27 04:26:14 kernel: [639964.652706] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 27 04:26:14 kernel: [639964.652711] java cpuset=/ mems_allowed=0
Jan 27 04:26:14 kernel: [639964.652716] CPU: 5 PID: 2152 Comm: java Not tainted 4.4.0-59-generic #80~14.04.1-Ubuntu
Jan 27 04:26:14 kernel: [639964.652717] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
Jan 27 04:26:14 kernel: [639964.652719] 0000000000000000 ffff88041a963b38 ffffffff813dbd6c ffff88041a963cf0
Jan 27 04:26:14 kernel: [639964.652721] 0000000000000000 ffff88041a963bc8 ffffffff811fafc6 0000000000000000
Jan 27 04:26:14 kernel: [639964.652722] 0000000000000000 0000000000000000 ffff88042a6d1b88 0000000000000015
Jan 27 04:26:14 kernel: [639964.652724] Call Trace:
Jan 27 04:26:14 kernel: [639964.652731] [<ffffffff813dbd6c>] dump_stack+0x63/0x87
Jan 27 04:26:14 kernel: [639964.652736] [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
Jan 27 04:26:14 kernel: [639964.652741] [<ffffffff813766f1>] ? apparmor_capable+0xd1/0x180
Jan 27 04:26:14 kernel: [639964.652746] [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
Jan 27 04:26:14 kernel: [639964.652747] [<ffffffff8118916b>] out_of_memory+0x40b/0x460
Jan 27 04:26:14 kernel: [639964.652749] [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
Jan 27 04:26:14 kernel: [639964.652752] [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
Jan 27 04:26:14 kernel: [639964.652754] [<ffffffff8118e32d>] alloc_kmem_pages_node+0x4d/0xd0
Jan 27 04:26:14 kernel: [639964.652758] [<ffffffff8107c125>] copy_process+0x185/0x1ce0
Jan 27 04:26:14 kernel: [639964.652763] [<ffffffff810fd0b4>] ? do_futex+0xf4/0x520
Jan 27 04:26:14 kernel: [639964.652766] [<ffffffff810a71c9>] ? resched_curr+0xa9/0xd0
Jan 27 04:26:14 kernel: [639964.652768] [<ffffffff8107de1a>] _do_fork+0x8a/0x310
Jan 27 04:26:14 kernel: [639964.652769] [<ffffffff8107e149>] SyS_clone+0x19/0x20
Jan 27 04:26:14 kernel: [639964.652775] [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 27 04:26:14 kernel: [639964.652776] Mem-Info:
Jan 27 04:26:14 kernel: [639964.652780] active_anon:1596719 inactive_anon:281182 isolated_anon:0
Jan 27 04:26:14 kernel: [639964.652780] active_file:953586 inactive_file:952370 isolated_file:0
Jan 27 04:26:14 kernel: [639964.652780] unevictable:0 dirty:7358 writeback:0 unstable:0
Jan 27 04:26:14 kernel: [639964.652780] slab_reclaimable:217903 slab_unreclaimable:12162
Jan 27 04:26:14 kernel: [639964.652780] mapped:40068 shmem:34861 pagetables:8261 bounce:0
Jan 27 04:26:14 kernel: [639964.652780] free:71705 free_pcp:0 free_cma:0
Jan 27 04:26:14 kernel: [639964.652783] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 27 04:26:14 kernel: [639964.652787] lowmem_reserve[]: 0 2951 16005 16005 16005
Jan 27 04:26:14 kernel: [639964.652789] Node 0 DMA32 free:92784kB min:12448kB low:15560kB high:18672kB active_anon:1094416kB inactive_anon:368444kB active_file:579188kB inactive_file:561504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:3048784kB mlocked:0kB dirty:1188kB writeback:0kB mapped:32604kB shmem:27372kB slab_reclaimable:336288kB slab_unreclaimable:7196kB kernel_stack:1520kB pagetables:3964kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:26:14 kernel: [639964.652793] lowmem_reserve[]: 0 0 13054 13054 13054
Jan 27 04:26:14 kernel: [639964.652795] Node 0 Normal free:178144kB min:55068kB low:68832kB high:82600kB active_anon:5292460kB inactive_anon:756284kB active_file:3235156kB inactive_file:3247976kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13367448kB mlocked:0kB dirty:28244kB writeback:0kB mapped:127668kB shmem:112072kB slab_reclaimable:535324kB slab_unreclaimable:41436kB kernel_stack:3968kB pagetables:29080kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no
Jan 27 04:26:14 kernel: [639964.652798] lowmem_reserve[]: 0 0 0 0 0
Jan 27 04:26:14 kernel: [639964.652800] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Jan 27 04:26:14 kernel: [639964.652807] Node 0 DMA32: 18127*4kB (UME) 2601*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 93316kB
Jan 27 04:26:14 kernel: [639964.652814] Node 0 Normal: 32943*4kB (UMEH) 5702*8kB (UMEH) 19*16kB (H) 13*32kB (H) 9*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 178940kB
Jan 27 04:26:14 kernel: [639964.652820] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 27 04:26:14 kernel: [639964.652821] 1949078 total pagecache pages
Jan 27 04:26:14 kernel: [639964.652822] 8225 pages in swap cache
Jan 27 04:26:14 kernel: [639964.652824] Swap cache stats: add 1131771, delete 1123546, find 7366438/7540102
Jan 27 04:26:14 kernel: [639964.652824] Free swap = 4080988kB
Jan 27 04:26:14 kernel: [639964.652825] Total swap = 4194300kB
Jan 27 04:26:14 kernel: [639964.652826] 4194174 pages RAM
Jan 27 04:26:14 kernel: [639964.652826] 0 pages HighMem/MovableOnly
Jan 27 04:26:14 kernel: [639964.652827] 86139 pages reserved
Jan 27 04:26:14 kernel: [639964.652828] 0 pages cma reserved
Jan 27 04:26:14 kernel: [639964.652828] 0 pages hwpoisoned
Jan 27 04:26:14 kernel: [639964.652829] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 27 04:26:14 kernel: [639964.652834] [ 424] 0 424 4909 388 14 3 68 0 upstart-udev-br
Jan 27 04:26:14 kernel: [639964.652836] [ 439] 0 439 13075 456 29 3 322 -1000 systemd-udevd
Jan 27 04:26:14 kernel: [639964.652839] [ 724] 0 724 3816 226 13 3 53 0 upstart-socket-
Jan 27 04:26:14 kernel: [639964.652840] [ 813] 0 813 5856 449 16 3 57 0 rpcbind
Jan 27 04:26:14 kernel: [639964.652842] [ 865] 108 865 5386 456 16 3 113 0 rpc.statd
Jan 27 04:26:14 kernel: [639964.652844] [ 1034] 0 1034 3820 281 12 3 35 0 upstart-file-br
Jan 27 04:26:14 kernel: [639964.652846] [ 1041] 102 1041 9817 366 23 3 50 0 dbus-daemon
Jan 27 04:26:14 kernel: [639964.652847] [ 1045] 101 1045 65018 1203 31 3 384 0 rsyslogd
Jan 27 04:26:14 kernel: [639964.652849] [ 1056] 0 1056 10870 525 26 4 49 0 systemd-logind
Jan 27 04:26:14 kernel: [639964.652851] [ 1063] 0 1063 5870 0 16 3 53 0 rpc.idmapd
Jan 27 04:26:14 kernel: [639964.652852] [ 1153] 0 1153 2558 371 9 3 517 0 dhclient
Jan 27 04:26:14 kernel: [639964.652854] [ 1374] 0 1374 3955 401 13 3 40 0 getty
Jan 27 04:26:14 kernel: [639964.652855] [ 1377] 0 1377 3955 406 13 3 38 0 getty
Jan 27 04:26:14 kernel: [639964.652857] [ 1383] 0 1383 3955 406 13 3 39 0 getty
Jan 27 04:26:14 kernel: [639964.652858] [ 1384] 0 1384 3955 418 13 3 37 0 getty
Jan 27 04:26:14 kernel: [639964.652859] [ 1386] 0 1386 3955 418 12 3 38 0 getty
Jan 27 04:26:14 kernel: [639964.652861] [ 1403] 0 1403 15346 735 34 3 142 -1000 sshd
Jan 27 04:26:14 kernel: [639964.652863] [ 1436] 0 1436 4825 408 13 3 28 0 irqbalance
Jan 27 04:26:14 kernel: [639964.652864] [ 1440] 0 1440 1093 379 8 3 35 0 acpid
Jan 27 04:26:14 kernel: [639964.652866] [ 1442] 0 1442 4785 176 14 3 38 0 atd
Jan 27 04:26:14 kernel: [639964.652867] [ 1443] 0 1443 5914 466 17 3 43 0 cron
Jan 27 04:26:14 kernel: [639964.652869] [ 1464] 105 1464 61957 3600 59 3 273 -900 postgres
Jan 27 04:26:14 kernel: [639964.652870] [ 1561] 107 1561 7864 657 21 3 113 0 ntpd
Jan 27 04:26:14 kernel: [639964.652872] [ 1762] 105 1762 62036 35419 117 3 264 0 postgres
Jan 27 04:26:14 kernel: [639964.652873] [ 1763] 105 1763 61957 35051 117 3 266 0 postgres
Jan 27 04:26:14 kernel: [639964.652875] [ 1764] 105 1764 61957 1773 52 3 306 0 postgres
Jan 27 04:26:14 kernel: [639964.652877] [ 1765] 105 1765 62166 4999 116 3 374 0 postgres
Jan 27 04:26:14 kernel: [639964.652878] [ 1766] 105 1766 25910 617 48 3 274 0 postgres
Jan 27 04:26:14 kernel: [639964.652880] [ 1834] 1004 1834 2002886 692615 1549 10 12707 0 java
Jan 27 04:26:14 kernel: [639964.652881] [ 1921] 106 1921 5835 452 16 3 112 0 nrpe
Jan 27 04:26:14 kernel: [639964.652883] [ 1943] 0 1943 175986 420 41 4 50 0 nscd
Jan 27 04:26:14 kernel: [639964.652884] [ 1978] 109 1978 111112 309 48 4 213 0 nslcd
Jan 27 04:26:14 kernel: [639964.652886] [ 2007] 8 2007 3172 326 11 3 52 0 nullmailer-send
Jan 27 04:26:14 kernel: [639964.652887] [ 2092] 0 2092 34005 1947 70 3 3067 0 /usr/bin/monito
Jan 27 04:26:14 kernel: [639964.652889] [ 2110] 0 2110 1901 367 9 3 25 0 getty
Jan 27 04:26:14 kernel: [639964.652891] [ 2146] 65534 2146 34005 1101 67 3 3810 0 monitorix-httpd
Jan 27 04:26:14 kernel: [639964.652893] [24525] 105 24525 1826264 1151331 3568 10 299 0 postgres
Jan 27 04:26:14 kernel: [639964.652895] [20380] 105 20380 62511 36514 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652897] [21273] 105 21273 62532 36508 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652898] [22133] 105 22133 62610 36827 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652900] [22135] 105 22135 62541 34994 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652901] [22428] 0 22428 15436 739 35 4 11 0 cron
Jan 27 04:26:14 kernel: [639964.652903] [22429] 0 22429 15489 749 35 4 12 0 cron
Jan 27 04:26:14 kernel: [639964.652904] [22442] 0 22442 1112 198 8 3 0 0 sh
Jan 27 04:26:14 kernel: [639964.652906] [22443] 1004 22443 1112 191 8 3 0 0 sh
Jan 27 04:26:14 kernel: [639964.652908] [22444] 1004 22444 3102 748 11 3 0 0 syncDaily.sh
Jan 27 04:26:14 kernel: [639964.652909] [22445] 0 22445 1112 420 8 3 0 0 cron-apt
Jan 27 04:26:14 kernel: [639964.652911] [22465] 1004 22465 55074 10532 113 3 0 0 rsync
Jan 27 04:26:14 kernel: [639964.652912] [22466] 0 22466 1087 171 8 3 0 0 sleep
Jan 27 04:26:14 kernel: [639964.652914] [22467] 1004 22467 29820 9151 62 3 0 0 rsync
Jan 27 04:26:14 kernel: [639964.652915] [22468] 1004 22468 61770 7168 125 3 0 0 rsync
Jan 27 04:26:14 kernel: [639964.652917] [22990] 105 22990 62490 35099 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652919] [23138] 105 23138 62491 35578 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652920] [23139] 105 23139 62690 36657 121 3 236 0 postgres
Jan 27 04:26:14 kernel: [639964.652922] [23140] 105 23140 62455 32973 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652923] [23631] 105 23631 62518 34978 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652925] [23635] 105 23635 62506 35193 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652927] [23636] 105 23636 62455 30085 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652928] [23637] 105 23637 62470 33106 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652930] [23639] 105 23639 62511 34295 120 3 237 0 postgres
Jan 27 04:26:14 kernel: [639964.652940] Out of memory: Kill process 24525 (postgres) score 224 or sacrifice child
Jan 27 04:26:14 kernel: [639964.652975] Killed process 24525 (postgres) total-vm:7305056kB, anon-rss:4460244kB, file-rss:145080kB
Экземпляр два
Jan 27 04:34:36 kernel: [640466.131656] java invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 27 04:34:36 kernel: [640466.131660] java cpuset=/ mems_allowed=0
Jan 27 04:34:36 kernel: [640466.131665] CPU: 7 PID: 2152 Comm: java Not tainted 4.4.0-59-generic #80~14.04.1-Ubuntu
Jan 27 04:34:36 kernel: [640466.131666] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
Jan 27 04:34:36 kernel: [640466.131668] 0000000000000000 ffff88041a963b38 ffffffff813dbd6c ffff88041a963cf0
Jan 27 04:34:36 kernel: [640466.131670] 0000000000000000 ffff88041a963bc8 ffffffff811fafc6 0000000000000000
Jan 27 04:34:36 kernel: [640466.131671] 0000000000000000 0000000000000000 ffff88042a6d1b88 0000000000000015
Jan 27 04:34:36 kernel: [640466.131673] Call Trace:
Jan 27 04:34:36 kernel: [640466.131698] [<ffffffff813dbd6c>] dump_stack+0x63/0x87
Jan 27 04:34:36 kernel: [640466.131712] [<ffffffff811fafc6>] dump_header+0x5b/0x1d5
Jan 27 04:34:36 kernel: [640466.131721] [<ffffffff813766f1>] ? apparmor_capable+0xd1/0x180
Jan 27 04:34:36 kernel: [640466.131728] [<ffffffff81188b35>] oom_kill_process+0x205/0x3d0
Jan 27 04:34:36 kernel: [640466.131730] [<ffffffff8118916b>] out_of_memory+0x40b/0x460
Jan 27 04:34:36 kernel: [640466.131732] [<ffffffff811fba7f>] __alloc_pages_slowpath.constprop.87+0x742/0x7ad
Jan 27 04:34:36 kernel: [640466.131734] [<ffffffff8118e167>] __alloc_pages_nodemask+0x237/0x240
Jan 27 04:34:36 kernel: [640466.131736] [<ffffffff8118e32d>] alloc_kmem_pages_node+0x4d/0xd0
Jan 27 04:34:36 kernel: [640466.131745] [<ffffffff8107c125>] copy_process+0x185/0x1ce0
Jan 27 04:34:36 kernel: [640466.131755] [<ffffffff810fd0b4>] ? do_futex+0xf4/0x520
Jan 27 04:34:36 kernel: [640466.131761] [<ffffffff810a71c9>] ? resched_curr+0xa9/0xd0
Jan 27 04:34:36 kernel: [640466.131763] [<ffffffff8107de1a>] _do_fork+0x8a/0x310
Jan 27 04:34:36 kernel: [640466.131765] [<ffffffff8107e149>] SyS_clone+0x19/0x20
Jan 27 04:34:36 kernel: [640466.131779] [<ffffffff81802c76>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 27 04:34:36 kernel: [640466.131781] Mem-Info:
Jan 27 04:34:36 kernel: [640466.131784] active_anon:463046 inactive_anon:339934 isolated_anon:0
Jan 27 04:34:36 kernel: [640466.131784] active_file:1074992 inactive_file:1398029 isolated_file:0
Jan 27 04:34:36 kernel: [640466.131784] unevictable:0 dirty:1307 writeback:0 unstable:0
Jan 27 04:34:36 kernel: [640466.131784] slab_reclaimable:626085 slab_unreclaimable:26239
Jan 27 04:34:36 kernel: [640466.131784] mapped:40618 shmem:35429 pagetables:4038 bounce:0
Jan 27 04:34:36 kernel: [640466.131784] free:161367 free_pcp:0 free_cma:0
Jan 27 04:34:36 kernel: [640466.131788] Node 0 DMA free:15892kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 27 04:34:36 kernel: [640466.131792] lowmem_reserve[]: 0 2951 16005 16005 16005
Jan 27 04:34:36 kernel: [640466.131794] Node 0 DMA32 free:112056kB min:12448kB low:15560kB high:18672kB active_anon:465908kB inactive_anon:478436kB active_file:620808kB inactive_file:963844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:3048784kB mlocked:0kB dirty:844kB writeback:0kB mapped:11132kB shmem:5644kB slab_reclaimable:390764kB slab_unreclaimable:8488kB kernel_stack:1408kB pagetables:2304kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:34:36 kernel: [640466.131798] lowmem_reserve[]: 0 0 13054 13054 13054
Jan 27 04:34:36 kernel: [640466.131800] Node 0 Normal free:517520kB min:55068kB low:68832kB high:82600kB active_anon:1386276kB inactive_anon:881300kB active_file:3679160kB inactive_file:4628272kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13367448kB mlocked:0kB dirty:4384kB writeback:0kB mapped:151340kB shmem:136072kB slab_reclaimable:2113576kB slab_unreclaimable:96452kB kernel_stack:3904kB pagetables:13848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 27 04:34:36 kernel: [640466.131803] lowmem_reserve[]: 0 0 0 0 0
Jan 27 04:34:36 kernel: [640466.131805] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Jan 27 04:34:36 kernel: [640466.131812] Node 0 DMA32: 20157*4kB (UME) 4165*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 113948kB
Jan 27 04:34:36 kernel: [640466.131817] Node 0 Normal: 119665*4kB (UMEH) 4706*8kB (UMEH) 12*16kB (H) 13*32kB (H) 10*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 518068kB
Jan 27 04:34:36 kernel: [640466.131824] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 27 04:34:36 kernel: [640466.131825] 2516698 total pagecache pages
Jan 27 04:34:36 kernel: [640466.131826] 8199 pages in swap cache
Jan 27 04:34:36 kernel: [640466.131828] Swap cache stats: add 1131970, delete 1123771, find 7374629/7548428
Jan 27 04:34:36 kernel: [640466.131828] Free swap = 4085700kB
Jan 27 04:34:36 kernel: [640466.131829] Total swap = 4194300kB
Jan 27 04:34:36 kernel: [640466.131830] 4194174 pages RAM
Jan 27 04:34:36 kernel: [640466.131830] 0 pages HighMem/MovableOnly
Jan 27 04:34:36 kernel: [640466.131831] 86139 pages reserved
Jan 27 04:34:36 kernel: [640466.131832] 0 pages cma reserved
Jan 27 04:34:36 kernel: [640466.131832] 0 pages hwpoisoned
Jan 27 04:34:36 kernel: [640466.131833] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 27 04:34:36 kernel: [640466.131838] [ 424] 0 424 4909 388 14 3 68 0 upstart-udev-br
Jan 27 04:34:36 kernel: [640466.131841] [ 439] 0 439 13075 456 29 3 322 -1000 systemd-udevd
Jan 27 04:34:36 kernel: [640466.131843] [ 724] 0 724 3816 226 13 3 53 0 upstart-socket-
Jan 27 04:34:36 kernel: [640466.131845] [ 813] 0 813 5856 449 16 3 57 0 rpcbind
Jan 27 04:34:36 kernel: [640466.131846] [ 865] 108 865 5386 456 16 3 113 0 rpc.statd
Jan 27 04:34:36 kernel: [640466.131848] [ 1034] 0 1034 3820 281 12 3 35 0 upstart-file-br
Jan 27 04:34:36 kernel: [640466.131850] [ 1041] 102 1041 9817 366 23 3 50 0 dbus-daemon
Jan 27 04:34:36 kernel: [640466.131852] [ 1045] 101 1045 65018 1255 31 3 362 0 rsyslogd
Jan 27 04:34:36 kernel: [640466.131854] [ 1056] 0 1056 10870 525 26 4 49 0 systemd-logind
Jan 27 04:34:36 kernel: [640466.131855] [ 1063] 0 1063 5870 0 16 3 53 0 rpc.idmapd
Jan 27 04:34:36 kernel: [640466.131857] [ 1153] 0 1153 2558 371 9 3 517 0 dhclient
Jan 27 04:34:36 kernel: [640466.131858] [ 1374] 0 1374 3955 401 13 3 40 0 getty
Jan 27 04:34:36 kernel: [640466.131860] [ 1377] 0 1377 3955 406 13 3 38 0 getty
Jan 27 04:34:36 kernel: [640466.131861] [ 1383] 0 1383 3955 406 13 3 39 0 getty
Jan 27 04:34:36 kernel: [640466.131863] [ 1384] 0 1384 3955 418 13 3 37 0 getty
Jan 27 04:34:36 kernel: [640466.131864] [ 1386] 0 1386 3955 418 12 3 38 0 getty
Jan 27 04:34:36 kernel: [640466.131866] [ 1403] 0 1403 15346 735 34 3 142 -1000 sshd
Jan 27 04:34:36 kernel: [640466.131868] [ 1436] 0 1436 4825 408 13 3 28 0 irqbalance
Jan 27 04:34:36 kernel: [640466.131869] [ 1440] 0 1440 1093 379 8 3 35 0 acpid
Jan 27 04:34:36 kernel: [640466.131871] [ 1442] 0 1442 4785 176 14 3 38 0 atd
Jan 27 04:34:36 kernel: [640466.131872] [ 1443] 0 1443 5914 466 17 3 43 0 cron
Jan 27 04:34:36 kernel: [640466.131874] [ 1464] 105 1464 61957 4409 59 3 254 -900 postgres
Jan 27 04:34:36 kernel: [640466.131876] [ 1561] 107 1561 7864 657 21 3 113 0 ntpd
Jan 27 04:34:36 kernel: [640466.131877] [ 1834] 1004 1834 2002886 692883 1549 10 12598 0 java
Jan 27 04:34:36 kernel: [640466.131879] [ 1921] 106 1921 5835 452 16 3 112 0 nrpe
Jan 27 04:34:36 kernel: [640466.131880] [ 1943] 0 1943 175986 420 41 4 50 0 nscd
Jan 27 04:34:36 kernel: [640466.131882] [ 1978] 109 1978 111112 309 48 4 213 0 nslcd
Jan 27 04:34:36 kernel: [640466.131883] [ 2007] 8 2007 3172 326 11 3 52 0 nullmailer-send
Jan 27 04:34:36 kernel: [640466.131885] [ 2092] 0 2092 34005 1947 70 3 3067 0 /usr/bin/monito
Jan 27 04:34:36 kernel: [640466.131887] [ 2110] 0 2110 1901 367 9 3 25 0 getty
Jan 27 04:34:36 kernel: [640466.131888] [ 2146] 65534 2146 34005 1101 67 3 3810 0 monitorix-httpd
Jan 27 04:34:36 kernel: [640466.131891] [22428] 0 22428 15436 739 35 4 11 0 cron
Jan 27 04:34:36 kernel: [640466.131892] [22429] 0 22429 15489 749 35 4 12 0 cron
Jan 27 04:34:36 kernel: [640466.131894] [22442] 0 22442 1112 198 8 3 0 0 sh
Jan 27 04:34:36 kernel: [640466.131895] [22443] 1004 22443 1112 191 8 3 0 0 sh
Jan 27 04:34:36 kernel: [640466.131897] [22444] 1004 22444 3102 748 11 3 0 0 syncDaily.sh
Jan 27 04:34:36 kernel: [640466.131899] [22445] 0 22445 1112 420 8 3 0 0 cron-apt
Jan 27 04:34:36 kernel: [640466.131900] [22465] 1004 22465 54754 27012 113 3 0 0 rsync
Jan 27 04:34:36 kernel: [640466.131902] [22466] 0 22466 1087 171 8 3 0 0 sleep
Jan 27 04:34:36 kernel: [640466.131903] [22467] 1004 22467 34234 26953 72 3 0 0 rsync
Jan 27 04:34:36 kernel: [640466.131905] [22468] 1004 22468 62154 15613 126 3 0 0 rsync
Jan 27 04:34:36 kernel: [640466.131907] [24170] 105 24170 61990 34251 117 3 237 0 postgres
Jan 27 04:34:36 kernel: [640466.131908] [24171] 105 24171 61957 33191 115 3 238 0 postgres
Jan 27 04:34:36 kernel: [640466.131910] [24172] 105 24172 61957 1190 52 3 238 0 postgres
Jan 27 04:34:36 kernel: [640466.131911] [24173] 105 24173 62166 1333 54 3 239 0 postgres
Jan 27 04:34:36 kernel: [640466.131913] [24174] 105 24174 25876 642 48 3 242 0 postgres
Jan 27 04:34:36 kernel: [640466.131915] [24175] 105 24175 62464 35199 120 3 232 0 postgres
Jan 27 04:34:36 kernel: [640466.131916] [24203] 105 24203 62467 22296 120 3 232 0 postgres
Jan 27 04:34:36 kernel: [640466.131918] [24266] 105 24266 62475 36452 120 3 232 0 postgres
Jan 27 04:34:36 kernel: [640466.131920] [24317] 105 24317 62424 17702 119 3 232 0 postgres
Jan 27 04:34:36 kernel: [640466.131921] [24318] 105 24318 62449 24858 120 3 232 0 postgres
Jan 27 04:34:36 kernel: [640466.131923] [24320] 105 24320 62485 24779 120 3 232 0 postgres
Jan 27 04:34:36 kernel: [640466.131925] [24321] 105 24321 62449 27595 120 3 232 0 postgres
Jan 27 04:34:36 kernel: [640466.131926] [24452] 105 24452 62484 16118 120 3 232 0 postgres
Jan 27 04:34:36 kernel: [640466.131928] Out of memory: Kill process 1834 (java) score 137 or sacrifice child
Jan 27 04:34:36 kernel: [640466.132070] Killed process 1834 (java) total-vm:8011544kB, anon-rss:2763340kB, file-rss:8192kB
Одна из возможных причин заключается в том, что это ошибка ядра Ubuntu. Существует проблема с панелью запуска, которая описывает ситуации OOM для этого ядра: 1655842
После возврата к более раннему ядру (4.4.0-57) проблема больше не возникала за те две недели, которые мы наблюдали за системой.
Альтернативные объяснения / ответы по-прежнему очень приветствуются!