I have a web application along the lines of
Nginx (proxy) + Tomcat (backend) + PostgreSQL (database).
The application runs on an Amazon free tier instance (http://aws.amazon.com/free/), and PostgreSQL crashes quite often, 2 to 3 times a month.
Below is the log from the instance:
[516661.377137] DMA free:2464kB min:80kB low:100kB high:120kB active_anon:5752kB inactive_anon:5900kB active_file:88kB inactive_file:164kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:244kB shmem:248kB slab_reclaimable:8kB slab_unreclaimable:260kB kernel_stack:60kB pagetables:40kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:384 all_unreclaimable? yes
[516661.377273] lowmem_reserve[]: 0 594 594 594
[516661.377293] Normal free:2976kB min:3076kB low:3844kB high:4612kB active_anon:289468kB inactive_anon:289664kB active_file:684kB inactive_file:1208kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:608584kB mlocked:0kB dirty:0kB writeback:0kB mapped:11028kB shmem:13580kB slab_reclaimable:2144kB slab_unreclaimable:5204kB kernel_stack:1816kB pagetables:3824kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2848 all_unreclaimable? yes
[516661.377328] lowmem_reserve[]: 0 0 0 0
[516661.377344] DMA: 43*4kB 31*8kB 48*16kB 18*32kB 9*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2468kB
[516661.377380] Normal: 192*4kB 14*8kB 1*16kB 1*32kB 6*64kB 7*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2976kB
[516661.377416] 3992 total pagecache pages
[516661.377421] 0 pages in swap cache
[516661.377427] Swap cache stats: add 0, delete 0, find 0/0
[516661.377434] Free swap = 0kB
[516661.377439] Total swap = 0kB
[516661.379976] 157439 pages RAM
[516661.379990] 0 pages HighMem
[516661.379995] 3185 pages reserved
[516661.380004] 18287 pages shared
[516661.380040] 149569 pages non-shared
[516661.380047] Out of memory: kill process 18126 (postmaster) score 26011 or a child
[516661.380058] Killed process 18126 (postmaster) vsz:104044kB, anon-rss:2476kB, file-rss:7152kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376890] postmaster invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376908] postmaster cpuset=/ mems_allowed=0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376916] Pid: 10506, comm: postmaster Tainted: G W 2.6.35.11-83.9.amzn1.i686 #1
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376924] Call Trace:
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376938] [<c10a0ce5>] dump_header.clone.1+0x65/0x180
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376948] [<c116a899>] ? ___ratelimit+0x89/0x110
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376957] [<c10a0e53>] oom_kill_process.clone.0+0x53/0x130
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376965] [<c10a100a>] __out_of_memory+0xda/0x140
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376973] [<c10a10c2>] out_of_memory+0x52/0xc0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376982] [<c10a3c62>] __alloc_pages_nodemask+0x582/0x5a0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376992] [<c10a5b92>] __do_page_cache_readahead+0xd2/0x1f0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377000] [<c10a5cd1>] ra_submit+0x21/0x30
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377008] [<c109fc82>] filemap_fault+0x392/0x3c0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377016] [<c10b3a97>] __do_fault+0x47/0x530
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377024] [<c10b585e>] handle_mm_fault+0x19e/0xdc0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377034] [<c12aefd0>] ? do_page_fault+0x0/0x400
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377043] [<c12af0fc>] do_page_fault+0x12c/0x400
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377051] [<c10e219d>] ? sys_select+0x3d/0xb0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377060] [<c12aefd0>] ? do_page_fault+0x0/0x400
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377069] [<c12ac637>] error_code+0x73/0x78
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377076] Mem-Info:
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377081] DMA per-cpu:
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377086] CPU 0: hi: 0, btch: 1 usd: 0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377092] Normal per-cpu:
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377098] CPU 0: hi: 186, btch: 31 usd: 30
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377107] active_anon:73805 inactive_anon:73891 isolated_anon:0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377108] active_file:193 inactive_file:343 isolated_file:0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377109] unevictable:0 dirty:0 writeback:0 unstable:0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377110] free:1360 slab_reclaimable:538 slab_unreclaimable:1366
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377111] mapped:2818 shmem:3457 pagetables:966 bounce:0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377137] DMA free:2464kB min:80kB low:100kB high:120kB active_anon:5752kB inactive_anon:5900kB active_file:88kB inactive_file:164kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:244kB shmem:248kB slab_reclaimable:8kB slab_unreclaimable:260kB kernel_stack:60kB pagetables:40kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:384 all_unreclaimable? yes
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377273] lowmem_reserve[]: 0 594 594 594
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377293] Normal free:2976kB min:3076kB low:3844kB high:4612kB active_anon:289468kB inactive_anon:289664kB active_file:684kB inactive_file:1208kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:608584kB mlocked:0kB dirty:0kB writeback:0kB mapped:11028kB shmem:13580kB slab_reclaimable:2144kB slab_unreclaimable:5204kB kernel_stack:1816kB pagetables:3824kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2848 all_unreclaimable? yes
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377328] lowmem_reserve[]: 0 0 0 0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377344] DMA: 43*4kB 31*8kB 48*16kB 18*32kB 9*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2468kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377380] Normal: 192*4kB 14*8kB 1*16kB 1*32kB 6*64kB 7*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2976kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377416] 3992 total pagecache pages
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377421] 0 pages in swap cache
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377427] Swap cache stats: add 0, delete 0, find 0/0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377434] Free swap = 0kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377439] Total swap = 0kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.379976] 157439 pages RAM
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.379990] 0 pages HighMem
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.379995] 3185 pages reserved
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.380004] 18287 pages shared
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.380040] 149569 pages non-shared
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.380047] Out of memory: kill process 18126 (postmaster) score 26011 or a child
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.380058] Killed process 18126 (postmaster) vsz:104044kB, anon-rss:2476kB, file-rss:7152kB
I also saw a peak in Network Out traffic in the Amazon CloudWatch monitor at that time.
What is the problem? Has anyone run into this before?
PS: Here is the memory section of postgresql.conf:
# - Memory -
shared_buffers = 80MB # min 128kB
# (change requires restart)
#temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
#work_mem = 1MB # min 64kB
#maintenance_work_mem = 16MB # min 1MB
#max_stack_depth = 2MB # min 100kB
# - Kernel Resource Usage -
#max_files_per_process = 1000 # min 25
# (change requires restart)
#shared_preload_libraries = '' # (change requires restart)
# - Cost-Based Vacuum Delay -
#vacuum_cost_delay = 0ms # 0-100 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 1-10000 credits
# - Background Writer -
#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on buffers scanned/round
# - Asynchronous Behavior -
#effective_io_concurrency = 1 # 1-1000. 0 disables prefetching
I believe the log is fairly clear: the Linux kernel killed your PostgreSQL process because the system ran out of memory. This is standard Linux kernel behavior: when it runs out of memory, rather than bringing down every application, it picks one process to terminate. More information: http://linux-mm.org/OOM_Killer
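To make the memory pressure concrete, the page counts from the Mem-Info part of the log in the question can be converted to megabytes. This is only a rough sketch: it assumes a 4 kB kernel page (which is consistent with the per-zone kB figures in the same log), and the dictionary keys and the helper `to_mb` are names chosen here for illustration.

```python
PAGE_KB = 4  # assumed kernel page size; consistent with the per-zone kB figures in the log

# Page counts copied from the Mem-Info section of the log in the question
pages = {
    "total_ram": 157439,
    "reserved": 3185,
    "active_anon": 73805,
    "inactive_anon": 73891,
    "active_file": 193,
    "inactive_file": 343,
    "free": 1360,
}

def to_mb(n_pages):
    """Convert a page count to megabytes."""
    return n_pages * PAGE_KB / 1024

# RAM actually available to processes and caches
usable = pages["total_ram"] - pages["reserved"]
# Anonymous memory: process heaps and stacks, which cannot be dropped without swap
anon = pages["active_anon"] + pages["inactive_anon"]
anon_share = anon / usable

print(f"usable RAM:       {to_mb(usable):.0f} MB")
print(f"anonymous memory: {to_mb(anon):.0f} MB ({anon_share:.0%} of usable)")
print(f"free:             {to_mb(pages['free']):.1f} MB, swap: 0 MB")
```

By this estimate, anonymous process memory takes up nearly all usable RAM, and since the log reports `Total swap = 0kB`, the kernel has nowhere to move it. Note also that the killed postmaster process itself had a small resident size (`anon-rss:2476kB`), so it is not necessarily the process that consumed the memory.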
You can check the memory settings in postgresql.conf, located by default at /var/lib/pgsql/data/postgresql.conf
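As a quick way to see which memory settings are actually set rather than commented out, something like the following could be used. It is a minimal sketch: the function name `active_memory_settings` and the list of keys are chosen here for illustration, and the parsing is deliberately simple (it does not handle `include` directives or other advanced syntax).

```python
import re

# Matches active (uncommented) "name = value" lines; a trailing "# comment" is ignored
SETTING = re.compile(r"^\s*([a-z_]+)\s*=\s*('[^']*'|[^#\s]+)")

# Memory-related parameters from the "# - Memory -" section of the config
MEMORY_KEYS = {"shared_buffers", "temp_buffers", "work_mem",
               "maintenance_work_mem", "max_stack_depth"}

def active_memory_settings(conf_text):
    """Return the memory settings that are explicitly set in conf_text."""
    found = {}
    for line in conf_text.splitlines():
        match = SETTING.match(line)
        if match and match.group(1) in MEMORY_KEYS:
            found[match.group(1)] = match.group(2).strip("'")
    return found

# A few lines taken from the config excerpt in the question
sample = """\
shared_buffers = 80MB # min 128kB
#temp_buffers = 8MB # min 800kB
#work_mem = 1MB # min 64kB
"""
print(active_memory_settings(sample))
```

For the excerpt in the question, only `shared_buffers` is set explicitly; the remaining memory parameters are commented out and therefore use their built-in defaults. To run this against the real file, read its contents from the path above and pass them to the function.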
Note that the size of the memory buffers is determined by the page size (4K or 8K, depending on the system).
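To illustrate the relationship between a buffer setting and the page size, here is a small worked calculation for the `shared_buffers = 80MB` value from the question. It is a sketch under assumptions: 8 kB is PostgreSQL's default block size (a custom build may differ), 4 kB is taken as the kernel page size, and the helper `buffers_for` is a hypothetical name used only here.

```python
# Size suffixes as they appear in the config, expressed in kB
UNITS_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}

def buffers_for(setting, block_kb=8):
    """Return how many blocks of block_kb a setting such as '80MB' spans."""
    for unit, factor in UNITS_KB.items():
        if setting.endswith(unit):
            return int(setting[: -len(unit)]) * factor // block_kb
    raise ValueError(f"unrecognised unit in {setting!r}")

shared_buffers_blocks = buffers_for("80MB")     # in 8 kB PostgreSQL blocks (assumed default)
shared_buffers_pages = buffers_for("80MB", 4)   # in 4 kB kernel pages (assumed)
print(shared_buffers_blocks, shared_buffers_pages)
```

Under these assumptions, 80MB of shared buffers corresponds to 10240 PostgreSQL blocks, or 20480 kernel pages out of the 157439 pages of RAM reported in the log.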