Under Mesos → Completed Tasks → Sandbox, in the stdout file I can see the killTask signal:
Received killTask for task sources.b4e2c8e6-5b42-11e7-aec0-024227901b13
The full snapshot of the stdout file looks as follows. You can see that even after receiving the killTask signal my process is still running, i.e. my process does not terminate on its own.
2017-06-27 14:16:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:48,337 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:16:58,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:08,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:38,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 1, bytes sent 188 so far
2017-06-27 14:17:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:08,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:18,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:38,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:18:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:18,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:48,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:19:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:08,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:18,334 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:28,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:38,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
2017-06-27 14:20:48,332 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
Received killTask for task sources.b4e2c8e6-5b42-11e7-aec0-024227901b13
2017-06-27 14:20:58,333 INFO [Timer-0] com.informatica.vds.transport.ws.WSClient - appmonitor messages sent 2, bytes sent 376 so far
The full snapshot of the stderr file looks as follows:
I0627 19:42:51.959991 7613 fetcher.cpp:533] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":true,"value":"file:\/\/\/etc\/docker.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0\/frameworks\/0e528b66-37aa-4d7a-933e-4638aabf494a-0000\/executors\/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13\/runs\/219c102b-28ae-41d5-b98f-11829315119e"}
I0627 19:42:51.963241 7613 fetcher.cpp:444] Fetching URI 'file:///etc/docker.tar.gz'
I0627 19:42:51.963279 7613 fetcher.cpp:285] Fetching directly into the sandbox directory
I0627 19:42:51.963295 7613 fetcher.cpp:222] Fetching URI 'file:///etc/docker.tar.gz'
I0627 19:42:51.964923 7613 fetcher.cpp:207] Copied resource '/etc/docker.tar.gz' to '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz'
I0627 19:42:52.070482 7613 fetcher.cpp:123] Extracted '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz' into '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e'
I0627 19:42:52.070533 7613 fetcher.cpp:582] Fetched 'file:///etc/docker.tar.gz' to '/var/lib/mesos/slaves/632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.b4e2c8e6-5b42-11e7-aec0-024227901b13/runs/219c102b-28ae-41d5-b98f-11829315119e/docker.tar.gz'
I0627 19:42:56.096325 7643 exec.cpp:162] Version: 1.3.0
I0627 19:42:56.101958 7647 exec.cpp:237] Executor registered on agent 632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   221  100   138  100    83   8657   5207 --:--:-- --:--:-- --:--:--  9200
E0627 19:51:03.219312 7652 process.cpp:951] Failed to accept socket: future discarded
The messages "Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap." and "Failed to accept socket: future discarded" seem to be the culprits killing my container.
My question is: what is killing my container after 5-10 minutes, again and again?
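One way to narrow this down is to check whether the kernel or a framework issued the kill. A quick sketch (the INFO log path is an assumption based on the WARNING/ERROR files shown further below; adjust it to your setup):

# Did the kernel OOM killer terminate the process?
dmesg | grep -iE 'oom|killed process'
# Did a framework ask the Mesos agent to kill the task?
grep -i killtask /var/log/mesos-slave.INFO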
I also updated the /etc/default/grub file with

GRUB_CMDLINE_LINUX_DEFAULT="cgroup_enable=memory swapaccount=1"

and rebooted my system, but made no progress.
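To verify that the new kernel command line actually took effect after the reboot, a quick sanity check (assuming a cgroup v1 host, which is what Ubuntu used at the time):

cat /proc/cmdline   # should now contain cgroup_enable=memory swapaccount=1
ls /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes   # memsw.* files exist only when swap accounting is active

If the second file is missing, the swap-limit warning in stderr will keep appearing, although by itself it should not kill the container.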
Any ideas on this issue?
My Ubuntu VMware configuration is as follows:
[EDIT: adding the contents of the stderr file from the Mesos UI, located at /var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284]
Adding the stderr file contents of another task:
I0628 10:15:45.951104 4357 fetcher.cpp:533] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/29df799b-4797-41df-a005-465f211d286b-S0","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":true,"value":"file:\/\/\/etc\/docker.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/29df799b-4797-41df-a005-465f211d286b-S0\/frameworks\/0e528b66-37aa-4d7a-933e-4638aabf494a-0000\/executors\/sources.a634642c-5bbc-11e7-ba8b-024239f32c24\/runs\/1bda209c-c2b8-4bb5-a41b-26361e00a284"}
I0628 10:15:45.953835 4357 fetcher.cpp:444] Fetching URI 'file:///etc/docker.tar.gz'
I0628 10:15:45.953881 4357 fetcher.cpp:285] Fetching directly into the sandbox directory
I0628 10:15:45.953974 4357 fetcher.cpp:222] Fetching URI 'file:///etc/docker.tar.gz'
I0628 10:15:45.956663 4357 fetcher.cpp:207] Copied resource '/etc/docker.tar.gz' to '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz'
I0628 10:15:46.061069 4357 fetcher.cpp:123] Extracted '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz' into '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284'
I0628 10:15:46.061148 4357 fetcher.cpp:582] Fetched 'file:///etc/docker.tar.gz' to '/var/lib/mesos/slaves/29df799b-4797-41df-a005-465f211d286b-S0/frameworks/0e528b66-37aa-4d7a-933e-4638aabf494a-0000/executors/sources.a634642c-5bbc-11e7-ba8b-024239f32c24/runs/1bda209c-c2b8-4bb5-a41b-26361e00a284/docker.tar.gz'
I0628 10:15:49.898803 4389 exec.cpp:162] Version: 1.3.0
I0628 10:15:49.903390 4390 exec.cpp:237] Executor registered on agent 29df799b-4797-41df-a005-465f211d286b-S0
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   221  100   138  100    83   5385   3239 --:--:-- --:--:-- --:--:-- 11500
W0628 10:15:49.903390 4389 logging.cpp:91] RAW: Received signal SIGTERM from process 3287 of user 0; exiting
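The last line shows the executor exiting after a SIGTERM from PID 3287. If this reproduces, the sender can be identified while that process is still alive (standard procps flags):

ps -p 3287 -o pid,user,comm,args

The same number 3287 also appears as a thread id in the mesos-slave.ERROR log below, which suggests the mesos-slave process itself delivered the signal.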
There are no new entries in the /var/log/mesos-master.ERROR file today. Contents of the /var/log/mesos-master.WARNING file:
Log file created at: 2017/06/28 10:04:56
Running on machine: ubuntu
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0628 10:04:56.387049 3193 authenticator.cpp:512] No credentials provided, authentication requests will be refused
W0628 10:14:56.617103 3221 master.cpp:2011] Agent 632f9d21-ae71-4cca-95e4-63e2b3dbd78e-S0 (ubuntu) did not re-register within 10mins after master failover; marking it unreachable
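The second warning is the master enforcing its re-registration window after a failover; the 10-minute default comes from a master flag. If the agent legitimately needs longer, the window can be widened when starting the master (the value below is only illustrative; in older Mesos releases the flag was spelled --slave_reregister_timeout):

mesos-master --agent_reregister_timeout=15mins ...

This only explains the agent being marked unreachable, though, not the periodic killTask.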
The contents of the /var/log/mesos-slave.WARNING file are the same as those of the mesos-slave.ERROR file. Contents of the /var/log/mesos-slave.ERROR file:
Log file created at: 2017/06/28 10:05:00
Running on machine: ubuntu
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0628 10:05:00.712286 3287 shell.hpp:107] Command 'hadoop version 2>&1' failed; this is the output: sh: 1: hadoop: not found
E0628 10:24:45.502921 3326 slave.cpp:4496] Failed to update resources for container 1bda209c-c2b8-4bb5-a41b-26361e00a284 of executor 'sources.a634642c-5bbc-11e7-ba8b-024239f32c24' running task sources.a634642c-5bbc-11e7-ba8b-024239f32c24 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/4469/cgroup: Failed to open file: No such file or directory
E0628 10:33:45.789072 3327 slave.cpp:4496] Failed to update resources for container 858170ce-0775-48be-8c85-3a1dbf320569 of executor 'sources.e7e069ed-5bbd-11e7-ba8b-024239f32c24' running task sources.e7e069ed-5bbd-11e7-ba8b-024239f32c24 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/5215/cgroup: Failed to open file: No such file or directory
I noticed that the message:
Failed to read /proc/5215/cgroup: Failed to open file: No such file or directory
appears only when a container/task has been killed, while these files do exist for currently running containers. Thanks.
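For comparison, the cgroup membership of a live container process can be read directly (5215 is the PID from the error above; substitute the PID of a currently running container):

cat /proc/5215/cgroup        # fails with 'No such file or directory' once the process has exited
cat /proc/<running-pid>/cgroup   # lists the cpu/memory cgroups while the process is alive

So this error looks like a symptom of the task already being gone, not the reason it was killed.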
It seems that Marathon relies on the user to implement the health check; i.e. if we provide a health check in the application configuration, we must actually implement it. I removed all the health checks I had provided in the application configuration. After that Marathon shows the application status as unknown, but Marathon (more precisely, the mesos-slave) no longer kills the task.
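For reference, this is the kind of healthChecks stanza that produces exactly this pattern: once maxConsecutiveFailures consecutive failures accumulate, Marathon asks Mesos to kill the task, which surfaces as the killTask message in stdout. A sketch of such an app update through Marathon's REST API (host, app id, and check command are placeholders, not my actual config):

curl -X PUT http://<marathon-host>:8080/v2/apps/sources \
  -H 'Content-Type: application/json' \
  -d '{
        "healthChecks": [{
          "protocol": "COMMAND",
          "command": { "value": "curl -f http://localhost:9090/ping" },
          "gracePeriodSeconds": 300,
          "intervalSeconds": 60,
          "timeoutSeconds": 20,
          "maxConsecutiveFailures": 3
        }]
      }'

With these example values a check that never succeeds gets the task killed roughly 8 minutes after start (300 s grace plus 3 failed checks 60 s apart), which matches the 5-10 minute kill cycle observed above.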