About once a week, the OOM-killer kills a postgres process on my server, even though `free` reports plenty of available memory.
I have read a number of threads here and there, but I don't see any real explanation. Is this really because the server has no swap? Is it a kernel bug (Ubuntu)?
And in fairness, yes, I will probably add swap. But is there no other solution? Or at least an explanation? :)
Server: Physical Dell
Memory: 64 GB RAM and 0 swap
uname: Linux server-name 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Postgres version: 9.5.10 (8 GB shared memory)
vm.overcommit_memory = 0
Output of free -m shortly before the last kill:
              total        used        free      shared  buff/cache   available
Mem:          64312        2666         450        8699       61196       52126
Swap:             0           0           0
Kernel log from the last kill:
Jun 19 21:29:49 server-name kernel: [17009377.877956] bash invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0 Jun 19 21:29:49 server-name kernel: [17009377.877959] bash cpuset=/ mems_allowed=0-1 Jun 19 21:29:49 server-name kernel: [17009377.877964] CPU: 23 PID: 61771 Comm: bash Not tainted 4.4.0-62-generic #83-Ubuntu Jun 19 21:29:49 server-name kernel: [17009377.877966] Hardware name: Dell Inc. PowerEdge M630/0R10KJ, BIOS 2.4.2 01/09/2017 Jun 19 21:29:49 server-name kernel: [17009377.877967] 0000000000000286 00000000d566bdbf ffff88001369baf0 ffffffff813f7c63 Jun 19 21:29:49 server-name kernel: [17009377.877969] ffff88001369bcc8 ffff88010d85b800 ffff88001369bb60 ffffffff8120ad4e Jun 19 21:29:49 server-name kernel: [17009377.877971] 0000000000000000 0000000000000700 ffffffff81e42a40 ffff8810547850c0 Jun 19 21:29:49 server-name kernel: [17009377.877973] Call Trace: Jun 19 21:29:49 server-name kernel: [17009377.877979] [] dump_stack+0x63/0x90 Jun 19 21:29:49 server-name kernel: [17009377.877984] [] dump_header+0x5a/0x1c5 Jun 19 21:29:49 server-name kernel: [17009377.877988] [] oom_kill_process+0x202/0x3c0 Jun 19 21:29:49 server-name kernel: [17009377.877990] [] ? oom_unkillable_task+0x9e/0xd0 Jun 19 21:29:49 server-name kernel: [17009377.877992] [] out_of_memory+0x219/0x460 Jun 19 21:29:49 server-name kernel: [17009377.877995] [] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70 Jun 19 21:29:49 server-name kernel: [17009377.877997] [] __alloc_pages_nodemask+0x286/0x2a0 Jun 19 21:29:49 server-name kernel: [17009377.877999] [] alloc_kmem_pages_node+0x4b/0xc0 Jun 19 21:29:49 server-name kernel: [17009377.878003] [] copy_process+0x1be/0x1b70 Jun 19 21:29:49 server-name kernel: [17009377.878007] [] ? handle_mm_fault+0xce0/0x1820 Jun 19 21:29:49 server-name kernel: [17009377.878010] [] ? sched_clock+0x9/0x10 Jun 19 21:29:49 server-name kernel: [17009377.878015] [] ? 
sched_clock_cpu+0x8f/0xa0 Jun 19 21:29:49 server-name kernel: [17009377.878017] [] _do_fork+0x80/0x360 Jun 19 21:29:49 server-name kernel: [17009377.878021] [] ? sigprocmask+0x6f/0xa0 Jun 19 21:29:49 server-name kernel: [17009377.878023] [] SyS_clone+0x19/0x20 Jun 19 21:29:49 server-name kernel: [17009377.878027] [] entry_SYSCALL_64_fastpath+0x16/0x71 Jun 19 21:29:49 server-name kernel: [17009377.878028] Mem-Info: Jun 19 21:29:49 server-name kernel: [17009377.878034] active_anon:2161218 inactive_anon:328736 isolated_anon:0 Jun 19 21:29:49 server-name kernel: [17009377.878034] active_file:9390648 inactive_file:3525717 isolated_file:0 Jun 19 21:29:49 server-name kernel: [17009377.878034] unevictable:923 dirty:3206 writeback:0 unstable:0 Jun 19 21:29:49 server-name kernel: [17009377.878034] slab_reclaimable:427991 slab_unreclaimable:85432 Jun 19 21:29:49 server-name kernel: [17009377.878034] mapped:2177419 shmem:2227151 pagetables:345413 bounce:0 Jun 19 21:29:49 server-name kernel: [17009377.878034] free:122878 free_pcp:1 free_cma:0 Jun 19 21:29:49 server-name kernel: [17009377.878037] Node 0 DMA free:14488kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes Jun 19 21:29:49 server-name kernel: [17009377.878041] lowmem_reserve[]: 0 1820 32015 32015 32015 Jun 19 21:29:49 server-name kernel: [17009377.878044] Node 0 DMA32 free:123340kB min:2552kB low:3188kB high:3828kB active_anon:1066728kB inactive_anon:123988kB active_file:8kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1985352kB managed:1904732kB mlocked:0kB dirty:0kB writeback:0kB mapped:736056kB shmem:744112kB slab_reclaimable:289824kB slab_unreclaimable:200192kB kernel_stack:1552kB pagetables:84288kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Jun 19 21:29:49 server-name kernel: [17009377.878048] lowmem_reserve[]: 0 0 30195 30195 30195 Jun 19 21:29:49 server-name kernel: [17009377.878062] Node 0 Normal free:154400kB min:42332kB low:52912kB high:63496kB active_anon:6913860kB inactive_anon:1054596kB active_file:10248116kB inactive_file:10244456kB unevictable:72kB isolated(anon):0kB isolated(file):0kB present:31457280kB managed:30919988kB mlocked:72kB dirty:1364kB writeback:0kB mapped:7294996kB shmem:7449576kB slab_reclaimable:826280kB slab_unreclaimable:67216kB kernel_stack:6016kB pagetables:1250072kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no Jun 19 21:29:49 server-name kernel: [17009377.878065] lowmem_reserve[]: 0 0 0 0 0 Jun 19 21:29:49 server-name kernel: [17009377.878067] Node 1 Normal free:199284kB min:45200kB low:56500kB high:67800kB active_anon:664284kB inactive_anon:136360kB active_file:27314468kB inactive_file:3858404kB unevictable:3620kB isolated(anon):0kB isolated(file):0kB present:33554432kB managed:33015880kB mlocked:3620kB dirty:11460kB writeback:0kB mapped:678624kB shmem:714916kB slab_reclaimable:595860kB slab_unreclaimable:74320kB kernel_stack:4608kB pagetables:47292kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Jun 19 21:29:49 server-name kernel: [17009377.878084] lowmem_reserve[]: 0 0 0 0 0 Jun 19 21:29:49 server-name kernel: [17009377.878086] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 0*64kB 1*128kB (U) 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14488kB Jun 19 21:29:49 server-name kernel: [17009377.878092] Node 0 DMA32: 12409*4kB (UME) 4093*8kB (UME) 839*16kB (UME) 111*32kB (UME) 68*64kB (UM) 33*128kB (UME) 24*256kB (UME) 14*512kB (UME) 2*1024kB (M) 0*2048kB 0*4096kB = 123292kB Jun 19 21:29:49 server-name kernel: [17009377.878109] Node 0 Normal: 39107*4kB (UME) 179*8kB (UME) 0*16kB 1*32kB (H) 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 159876kB Jun 19 21:29:49 server-name kernel: [17009377.878115] Node 1 Normal: 50883*4kB (UME) 133*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 204596kB Jun 19 21:29:49 server-name kernel: [17009377.878120] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Jun 19 21:29:49 server-name kernel: [17009377.878121] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Jun 19 21:29:49 server-name kernel: [17009377.878121] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Jun 19 21:29:49 server-name 
kernel: [17009377.878122] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Jun 19 21:29:49 server-name kernel: [17009377.878123] 15144218 total pagecache pages Jun 19 21:29:49 server-name kernel: [17009377.878124] 0 pages in swap cache Jun 19 21:29:49 server-name kernel: [17009377.878125] Swap cache stats: add 0, delete 0, find 0/0 Jun 19 21:29:49 server-name kernel: [17009377.878126] Free swap = 0kB Jun 19 21:29:49 server-name kernel: [17009377.878126] Total swap = 0kB Jun 19 21:29:49 server-name kernel: [17009377.878127] 16753261 pages RAM Jun 19 21:29:49 server-name kernel: [17009377.878127] 0 pages HighMem/MovableOnly Jun 19 21:29:49 server-name kernel: [17009377.878128] 289137 pages reserved Jun 19 21:29:49 server-name kernel: [17009377.878129] 0 pages cma reserved Jun 19 21:29:49 server-name kernel: [17009377.878129] 0 pages hwpoisoned Jun 19 21:29:49 server-name kernel: [17009377.878130] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name Jun 19 21:29:49 server-name kernel: [17009377.878136] [ 1181] 0 1181 21212 12109 47 3 0 0 systemd-journal Jun 19 21:29:49 server-name kernel: [17009377.878138] [ 1212] 0 1212 23694 319 16 3 0 0 lvmetad Jun 19 21:29:49 server-name kernel: [17009377.878139] [ 1226] 0 1226 11507 1164 24 3 0 -1000 systemd-udevd Jun 19 21:29:49 server-name kernel: [17009377.878141] [ 1776] 0 1776 6512 523 18 3 0 0 atd Jun 19 21:29:49 server-name kernel: [17009377.878142] [ 1778] 107 1778 10758 949 25 3 0 -900 dbus-daemon Jun 19 21:29:49 server-name kernel: [17009377.878143] [ 1790] 104 1790 64100 955 27 3 0 0 rsyslogd Jun 19 21:29:49 server-name kernel: [17009377.878144] [ 1794] 0 1794 7708 591 20 3 0 0 cron Jun 19 21:29:49 server-name kernel: [17009377.878146] [ 1798] 0 1798 7138 656 18 3 0 0 systemd-logind Jun 19 21:29:49 server-name kernel: [17009377.878147] [ 1800] 0 1800 77434 1220 20 3 0 0 lxcfs Jun 19 21:29:49 server-name kernel: [17009377.878148] [ 1805] 0 1805 69421 1395 38 4 0 0 
accounts-daemon Jun 19 21:29:49 server-name kernel: [17009377.878150] [ 1807] 0 1807 385362 6819 84 6 0 -900 snapd Jun 19 21:29:49 server-name kernel: [17009377.878151] [ 1809] 0 1809 1101 173 9 3 0 0 acpid Jun 19 21:29:49 server-name kernel: [17009377.878152] [ 1835] 0 1835 3345 42 11 3 0 0 mdadm Jun 19 21:29:49 server-name kernel: [17009377.878154] [ 1852] 0 1852 69296 1880 37 4 0 0 polkitd Jun 19 21:29:49 server-name kernel: [17009377.878155] [ 1959] 0 1959 16381 1507 36 3 0 -1000 sshd Jun 19 21:29:49 server-name kernel: [17009377.878156] [ 1972] 0 1972 1307 412 8 3 0 0 iscsid Jun 19 21:29:49 server-name kernel: [17009377.878158] [ 1973] 0 1973 1432 917 8 3 0 -17 iscsid Jun 19 21:29:49 server-name kernel: [17009377.878159] [ 2036] 0 2036 4441 383 13 3 0 0 agetty Jun 19 21:29:49 server-name kernel: [17009377.878160] [ 2095] 0 2095 4934 597 15 3 0 0 irqbalance Jun 19 21:29:49 server-name kernel: [17009377.878162] [ 2138] 111 2138 27509 1256 25 3 0 0 ntpd Jun 19 21:29:49 server-name kernel: [17009377.878163] [ 2323] 112 2323 13971 727 30 3 0 0 exim4 Jun 19 21:29:49 server-name kernel: [17009377.878164] [ 2329] 0 2329 73510 4000 43 4 0 0 fail2ban-server Jun 19 21:29:49 server-name kernel: [17009377.878166] [ 7103] 113 7103 2203146 66729 188 4 0 -900 postgres Jun 19 21:29:49 server-name kernel: [17009377.878167] [101917] 0 101917 13563 470 27 3 0 0 keepalived Jun 19 21:29:49 server-name kernel: [17009377.878169] [101918] 0 101918 14093 1204 33 3 0 0 keepalived Jun 19 21:29:49 server-name kernel: [17009377.878170] [101919] 0 101919 14093 800 32 3 0 0 keepalived Jun 19 21:29:49 server-name kernel: [17009377.878172] [126772] 115 126772 5994 664 16 4 0 0 nrpe Jun 19 21:29:49 server-name kernel: [17009377.878174] [70979] 113 70979 2203419 2135243 4232 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878175] [70980] 113 70980 2203211 282046 3748 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878176] [70981] 113 70981 2203146 5331 69 4 0 0 
postgres Jun 19 21:29:49 server-name kernel: [17009377.878178] [70982] 113 70982 2203399 1773 71 4 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878179] [70983] 113 70983 37911 1097 55 4 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878180] [70984] 113 70984 2203919 115562 1754 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878182] [70985] 113 70985 2204540 68113 1213 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878183] [70986] 113 70986 2205899 471030 3891 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878184] [70992] 113 70992 2204243 111679 1550 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878185] [70993] 113 70993 2203484 2784 75 4 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878187] [70994] 113 70994 2205941 541014 3966 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878188] [70995] 113 70995 2206035 408079 3095 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878189] [70996] 113 70996 2203934 160075 2604 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878190] [70997] 113 70997 2203936 218125 2911 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878192] [70998] 113 70998 2204811 1327751 4263 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878193] [70999] 113 70999 2206100 2081582 4267 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878194] [71000] 113 71000 2204024 1694269 4251 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878196] [71001] 113 71001 2209678 2127573 4274 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878197] [71002] 113 71002 2204028 1683854 4251 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878198] [71003] 113 71003 2209601 2118203 4273 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878199] [71004] 113 71004 2203982 955099 4247 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878201] 
[71005] 113 71005 2204924 1348990 4262 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878202] [71006] 113 71006 2203995 1255468 4247 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878203] [71014] 113 71014 2204016 1562410 4251 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878204] [71015] 113 71015 2204199 70592 1039 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878206] [71016] 113 71016 2209670 2063214 4273 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878207] [71023] 113 71023 2206079 537513 3839 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878208] [71024] 113 71024 2203922 125526 1820 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878210] [71025] 113 71025 2203943 230822 3084 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878211] [71027] 113 71027 2206498 625052 4028 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878212] [71029] 113 71029 2204012 1614770 4249 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878214] [71030] 113 71030 2209593 2083374 4272 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878215] [71033] 113 71033 2203940 178025 2673 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878216] [71034] 113 71034 2206090 426476 3624 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878218] [71057] 113 71057 2204867 2144196 4265 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878219] [71058] 113 71058 2204546 224493 2893 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878220] [71113] 113 71113 2209581 2127791 4272 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878222] [71276] 113 71276 2209713 2125684 4274 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878223] [71315] 113 71315 2203984 678258 4234 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878224] [71316] 113 71316 2209663 2137633 4273 11 0 0 
postgres Jun 19 21:29:49 server-name kernel: [17009377.878225] [71425] 113 71425 2203985 1229779 4250 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878226] [71426] 113 71426 2207773 2089808 4271 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878228] [71479] 113 71479 2209624 2137703 4273 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878229] [71566] 113 71566 2205224 109789 1497 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878230] [71634] 113 71634 2204084 42530 640 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878232] [71636] 113 71636 2204166 36964 547 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878233] [71637] 113 71637 2203758 8574 167 10 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878234] [71861] 113 71861 2204034 1659821 4249 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878235] [71878] 113 71878 2204101 27948 404 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878237] [73036] 113 73036 2204208 23556 315 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878238] [73043] 113 73043 2204332 15593 234 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878239] [73167] 113 73167 2209593 2124044 4274 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878240] [73168] 113 73168 2204014 1505503 4251 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878242] [73426] 113 73426 2206039 247425 2686 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878243] [73714] 113 73714 2204379 66347 562 9 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878244] [73735] 113 73735 2207590 2128748 4270 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878246] [74062] 113 74062 2209849 2101538 4274 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878247] [74196] 113 74196 2204283 34071 526 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878248] 
[74203] 113 74203 2204922 150413 1889 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878249] [74581] 113 74581 2209626 2125857 4273 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878250] [74817] 113 74817 2204033 1637797 4250 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878252] [75281] 113 75281 2204276 56982 690 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878253] [75282] 113 75282 2205141 106605 1299 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878254] [75283] 113 75283 2203982 27531 348 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878256] [75309] 113 75309 2205662 128795 1652 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878257] [76231] 113 76231 2204009 53847 803 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878258] [77386] 113 77386 2203883 40653 602 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878260] [77917] 113 77917 2204815 846041 4248 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878261] [77925] 113 77925 2203999 1394845 4251 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878262] [77957] 113 77957 2204961 1375124 4264 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878263] [77958] 113 77958 2203998 1645874 4250 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878264] [77994] 113 77994 2209598 2029427 4273 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878265] [78004] 113 78004 2204876 990859 4261 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878266] [78009] 113 78009 2209693 2074139 4274 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878268] [78010] 113 78010 2204012 1528597 4248 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878269] [78011] 113 78011 2209708 2130929 4274 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878270] [78012] 113 78012 2209906 2116419 4274 11 0 0 postgres 
Jun 19 21:29:49 server-name kernel: [17009377.878271] [78013] 113 78013 2203951 568349 4225 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878272] [78021] 113 78021 2203819 14483 210 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878273] [78028] 113 78028 2209930 2138334 4275 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878274] [78076] 113 78076 2204009 1648542 4248 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878275] [78077] 113 78077 2204008 1622033 4250 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878276] [78231] 113 78231 2209778 2125564 4273 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878278] [78232] 113 78232 2204006 1467730 4251 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878279] [78748] 113 78748 2209554 2091379 4272 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878281] [79120] 113 79120 2207656 2129657 4270 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878282] [79121] 113 79121 2209649 2136786 4274 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878283] [79122] 113 79122 2203949 342314 3972 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878285] [80062] 113 80062 2204008 1257889 4249 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878286] [81048] 113 81048 2205151 130415 1862 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878287] [81050] 113 81050 2203941 43779 627 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878289] [84879] 113 84879 2205000 1285857 4263 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878290] [85403] 113 85403 2204492 74870 1073 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878291] [85404] 113 85404 2204962 112681 1425 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878292] [89649] 113 89649 2204322 56650 734 11 0 0 postgres Jun 19 21:29:49 server-name kernel: 
[17009377.878294] [90729] 113 90729 2204495 95699 1334 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878295] [90732] 113 90732 2203804 9328 184 10 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878296] [90755] 113 90755 2204363 81196 1100 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878298] [92006] 113 92006 2204032 1592146 4248 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878299] [95662] 113 95662 2203779 11467 223 11 0 0 postgres Jun 19 21:29:49 server-name kernel: [17009377.878300] [100918] 113 100918 2204778 843529 4246 11 0 0 postgres ... Jun 19 21:29:49 server-name kernel: [17009377.878360] [59692] 0 59692 3610 841 12 3 0 0 bash Jun 19 21:29:49 server-name kernel: [17009377.878361] [61771] 0 61771 3610 83 11 3 0 0 bash Jun 19 21:29:49 server-name kernel: [17009377.878362] Out of memory: Kill process 71057 (postgres) score 130 or sacrifice child Jun 19 21:29:49 server-name kernel: [17009377.878616] Killed process 71057 (postgres) total-vm:8819468kB, anon-rss:5948kB, file-rss:8570836kB
The problem is explained here:
Jun 19 21:29:49 server-name kernel: [17009377.877956] bash invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
And here:
Jun 19 21:29:49 server-name kernel: [17009377.878115] Node 1 Normal: 50883*4kB (UME) 133*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 204596kB
The kernel is requesting an order-2 allocation (4 physically contiguous pages, i.e. 16 KB with 4 KB pages), and the GFP flags, as I read them, indicate that it MUST come from 'thisnode'. Since Node 1 has no free blocks of order 2 or higher, my guess is that Node 1 is the node it is trying to allocate from.
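To make the fragmentation argument concrete, here is a minimal Python sketch that parses the Node 1 free-list line quoted above and counts the free blocks large enough for an order-2 request. The function names and the 4 kB page size are assumptions made for illustration; they are not part of any kernel tooling.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, as on typical x86_64 systems


def parse_free_list(line):
    """Return {block_size_kb: count} from a free-list line of an OOM report."""
    # Matches pairs such as "50883*4kB"; the trailing "= 204596kB" has no "*".
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def blocks_at_or_above(free_list, order):
    """Count free blocks whose size is at least 2**order pages."""
    min_kb = PAGE_KB << order
    return sum(count for size, count in free_list.items() if size >= min_kb)


line = ("Node 1 Normal: 50883*4kB (UME) 133*8kB (UM) 0*16kB 0*32kB 0*64kB "
        "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 204596kB")

free_list = parse_free_list(line)
total_kb = sum(size * count for size, count in free_list.items())
usable = blocks_at_or_above(free_list, order=2)

print(f"free: {total_kb} kB, blocks usable for order 2: {usable}")
```

About 200 MB is free in that zone, yet every free block is 4 kB or 8 kB, so a request for 16 kB of contiguous memory cannot be met without reclaim or compaction.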
The flags also indicate that the kernel is allowed to swap in order to free memory for this request. But there is no swap.
I don't know for certain, but I suspect that even a small amount of swap would fix this (because the kernel could swap pages out to satisfy the request). I also suspect it would allow the kernel to perform memory compaction (which rearranges physical memory so that more higher-order blocks are available), and that might even avoid actual swapping. Note that this is only speculation.
You could upgrade to a newer kernel. I thought there was a change that improves compaction for filesystem allocations, but I don't have the commit handy. This looks like Ubuntu, so you could try Bionic.
For more on zone fragmentation, see Matthew's answer from a few years ago: Linux oom situation. You may be able to tune vm.extfrag_threshold, or manually trigger compaction in an emergency via /proc/sys/vm/compact_memory.
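As a rough way to check for this condition before the OOM-killer fires, the sketch below parses text in the /proc/buddyinfo format (one row per zone, one column of free-block counts per order, starting at order 0) and lists the zones with no free block of order 2 or higher. The helper names are hypothetical, and the sample rows are reconstructed from the free lists in the log above rather than captured from a live system.

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free-block counts, where index = order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected row shape: "Node 0, zone Normal <count order 0> <count order 1> ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(count) for count in parts[4:]]
    return zones


def starved_zones(zones, order=2):
    """Return the zones that have no free block of the given order or higher."""
    return [key for key, counts in zones.items() if sum(counts[order:]) == 0]


def trigger_compaction():
    """Ask the kernel to compact all zones.

    Assumption: run as root on a kernel built with compaction support.
    This function is defined for illustration and is not called here.
    """
    with open("/proc/sys/vm/compact_memory", "w") as f:
        f.write("1")


# Sample rows rebuilt from the OOM report above (illustrative, not live data).
sample = """\
Node 0, zone   Normal  39107    179      0      1      1      1      1      1      1      0      0
Node 1, zone   Normal  50883    133      0      0      0      0      0      0      0      0      0
"""

zones = parse_buddyinfo(sample)
starved = starved_zones(zones, order=2)
print("zones without order>=2 blocks:", starved)
```

On a live system the same parser could be fed the contents of /proc/buddyinfo, for example from a cron job that logs or alerts when a zone runs out of higher-order blocks. Note that the Node 0 blocks in the log are marked (H), so in practice they may not be available to ordinary allocations either.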