
Debugging kubelet not starting

I'm using kubeadm to try to set up a dev master. I've run into a problem where the health check for the kubelet fails, and I'm looking for direction on how to debug it. Running the command suggested for debugging (systemctl status kubelet), I don't see the cause of the error:

kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2017-10-05 15:04:23 CDT; 4s ago
     Docs: http://kubernetes.io/docs/
  Process: 4786 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
 Main PID: 4786 (code=exited, status=1/FAILURE)

Oct 05 15:04:23 master.domain.com systemd[1]: Unit kubelet.service entered failed state.
Oct 05 15:04:23 master.domain.com systemd[1]: kubelet.service failed.

Where can I find the specific error message that says why it isn't running?

After running swapoff -a to disable swap, I still can't get Kubernetes provisioned.

Here is the full output of kubeadm init:

$ kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.2
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master.my-domain.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.xx.xx.xx 10.xx.xx.xx]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by that:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    - There is no internet connection; so the kubelet can't pull the following control plane images:
        - gcr.io/google_containers/kube-apiserver-amd64:v1.8.2
        - gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2
        - gcr.io/google_containers/kube-scheduler-amd64:v1.8.2

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster

I also tried removing the Docker repository and installing Docker 1.12, which doesn't start either: Error starting daemon: SELinux is not supported with the overlay graph driver on this kernel. Either boot into a newer kernel or disable selinux ...
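
For that particular Docker error, the message itself names the two options: boot a newer kernel or disable SELinux. As a quick (not production-appropriate) workaround, SELinux can be switched to permissive mode; a sketch:

setenforce 0                                                          # effective until reboot
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config # make it persistent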

The issue was resolved by setting --fail-swap-on=false in the systemd unit. Just edit the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:

Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"

then run systemctl daemon-reload followed by systemctl restart kubelet.
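
Spelled out as a runnable sequence, with a status check added to confirm the fix took effect:

systemctl daemon-reload
systemctl restart kubelet
systemctl status kubelet   # should now show "active (running)" rather than auto-restarting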

Found an issue about this: https://github.com/kubernetes/kubernetes/issues/53333

The previous answer worked for me, but the solution proposed in the linked issue did not.

So perhaps their suggestion of editing 90-kubeadm.conf (instead of 10-kubeadm.conf) would work as well.

All of this is already covered in the issue posted by Atom, so I don't feel I'm contributing much, but I can reproduce your problem whenever swap is enabled. So for me the solution is to disable swap and retry the initialization:

sudo -i
swapoff -a
kubeadm reset
kubeadm init
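
Note that swapoff -a only lasts until the next reboot. To keep swap off persistently, the swap entry in /etc/fstab has to be commented out as well; a sketch, assuming GNU sed:

swapon --show                             # prints nothing when no swap is active
sed -i '/\sswap\s/ s/^/#/' /etc/fstab     # comment out swap lines so it stays off after reboot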

The answer posted above also worked for me, but just in case, after systemctl daemon-reload I did a full kubeadm reset and kubeadm init rather than just systemctl restart kubelet.

If that doesn't work, could you paste the new kubeadm init output after disabling swap?

Check the error with: journalctl -xeu kubelet
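
A few journalctl variants that make the actual error easier to spot:

journalctl -xeu kubelet                        # jump to the end of the kubelet log, with extra context
journalctl -u kubelet -f                       # follow the log live while restarting the service
journalctl -u kubelet --no-pager | tail -n 50  # last 50 lines, convenient for pasting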

Note: make sure the cgroup driver used by the kubelet is the same as the one used by Docker. To ensure compatibility, you can update Docker's configuration, for example:

cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
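
Docker then has to be restarted for the new cgroup driver to take effect, and you can verify that it now matches what the kubelet expects:

systemctl restart docker
docker info | grep -i 'cgroup driver'   # should report: Cgroup Driver: systemd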

I had the same problem, but on Fedora 30, kubelet 1.15.3, docker-ce 19.03.1, and the output of systemctl status kubelet contained the same thing as in your case:

Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf

Steps to resolve it:

  1. Check that the kubelet.service and 10-kubeadm.conf files exist at the following paths:

ls /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
ls /usr/lib/systemd/system/kubelet.service

10-kubeadm.conf:

more /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf 

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
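
As the comments in this drop-in note, last-resort flag overrides belong in /etc/sysconfig/kubelet rather than in the unit files themselves. For example, the --fail-swap-on=false workaround from the earlier answer would go there like this:

# /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--fail-swap-on=false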

kubelet.service:

more /usr/lib/systemd/system/kubelet.service

[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/bin/kubelet \
        $KUBE_LOGTOSTDERR \
        $KUBE_LOG_LEVEL \
        $KUBELET_API_SERVER \
        $KUBELET_ADDRESS \
        $KUBELET_PORT \
        $KUBELET_HOSTNAME \
        $KUBE_ALLOW_PRIV \
        $KUBELET_ARGS
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target

  2. Remove the kubelet systemd units under /etc/systemd/system/ (units there shadow the ones in /usr/lib/systemd/system/):

    rm -R /etc/systemd/system/kubelet.service.d (confirm "y" for each file)
    rm /etc/systemd/system/kubelet.service
    
  3. Reload all systemd unit files and recreate the whole dependency tree:

    systemctl daemon-reload

  4. Restart the kubelet:

    systemctl restart kubelet

The output of systemctl status kubelet should now contain:

  Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf

  5. Initialize the Kubernetes control-plane node:

    kubeadm reset
    systemctl daemon-reload
    kubeadm init --pod-network-cidr=10.244.0.0/16
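
If the init now succeeds, the usual follow-up (kubeadm prints these commands itself) is to set up kubectl access and then install a pod-network add-on; 10.244.0.0/16 is flannel's default CIDR. A sketch, using the flannel manifest URL that was current at the time:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml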
    

Note: you have one or more of these problems:

`- There is no internet connection; so the kubelet can't pull the following control plane images:`

Try pulling them manually:

kubeadm config images pull
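
To see in advance which images kubeadm expects (useful for checking that this host can reach gcr.io), there is also:

kubeadm config images list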

You may need to update kubeadm, kubelet, and kubectl.
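
On a yum/dnf-based system like the ones discussed above, that update would look roughly like this, assuming the standard Kubernetes package repository is already configured:

yum install -y kubeadm kubelet kubectl --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet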