Kubelet error getting node


Contents

  1. kubelet can’t get node #70334
  2. Comments
  3. Error updating node status, will retry: error getting node #75607
  4. Comments
  5. kubelet error getting node status: unauthorized #69576
  6. Comments
  7. kubeadm init fails with node not found #2370
  8. Comments

kubelet can’t get node #70334

When I use kubeadm init --config /etc/kubernetes/kubeadm.yml to install Kubernetes, it hangs and reports:

kubelet logs as follows:

But in my /etc/hosts, it has the record:

and I can ping k8s-master-001 successfully; the output of uname -n is also k8s-master-001.

docker ps -a | grep kube returns nothing.

Why can’t kubelet recognize my host when the apiserver and etcd can? It’s so strange; can somebody explain it? Thanks!

Environment:

  • Kubernetes version (use kubectl version ): v1.12.2
  • OS (e.g. from /etc/os-release): Red Hat Enterprise Linux Server release 7.5
  • Kernel (e.g. uname -a ): 3.10.0-862.14.4.el7.x86_64


/kind bug
/sig cluster-lifecycle
/sig node

I am having a similar issue with Ubuntu 16.04 and kubeadm 1.12.2.

Downgrading to 1.11.3 solved the issue.

After I downgraded to 1.11.4, it’s OK.

please file this issue in the kubernetes/kubeadm repository so that we can keep track.
thank you!
/close

@neolit123: Closing this issue.

please file this issue in the kubernetes/kubeadm repository so that we can keep track.
thank you!
/close


I have the same issue. I think kubelet does not use /etc/hostname to resolve the name but prefers DNS, so kubelet can’t resolve the node name.
env:
kubelet ver: 1.12.2
os: CentOS 7.2
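
Since the node name kubelet registers with is, by default, the lowercased hostname (unless overridden with --hostname-override or nodeRegistration.name), a quick way to compare what /etc/hosts and DNS return for that name is sketched below; getent goes through nsswitch like glibc resolvers do, while nslookup queries DNS directly.

# Hostname kubelet derives its default node name from
uname -n
# Resolution via nsswitch (consults /etc/hosts first on most distros)
getent hosts "$(uname -n)"
# Resolution via DNS only, bypassing /etc/hosts
nslookup "$(uname -n)"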

I found the reason.

Delete imageRepository: "xxxx".
Then everything is ok.

But it is still a bug.

Hi,
Can I know where imageRepository: "xxxx" is? I ran into the same error.

Hi,
Can I know where imageRepository: "xxxx" is? I ran into the same error.

Thank you for your response. I ran find / -name "kubeadm." searching the whole box; kubeadm.yaml doesn’t seem to exist on my box.
I am on RHEL 7, Kubernetes 1.13.0 (I even tried 1.11.x, 1.12.3, and 1.12.4).
I just can’t make it work; always these same error messages.

All firewall rules have been removed, so no firewall is interfering, and SELinux is disabled.
What could be causing this?

thanks much for your help.

I am facing the same issue as mingf.
kubeadm 1.12.5-0 and kubelet 1.12.5-0 using CentOS Linux 7.
Also, I cannot find this kubeadm.yaml file anywhere.

Please post the correct path for this kubeadm.yaml,
because I don’t see it in /etc/kubernetes/manifests.

This issue can also manifest itself if your kubeadm control plane node is unable to pull the control plane images from the Internet for some reason. The error message below should really be more descriptive of the problem:

[init] this might take a minute or longer if the control plane images have to be pulled

Unfortunately, an error has occurred:
timed out waiting for the condition
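
When the image pull is the suspect, pre-pulling the images separates a registry or network failure from the generic init timeout. A minimal check, assuming the same config file used in the original report:

# Pre-pull the control plane images; any registry or network error is
# reported here directly instead of as a timeout during init.
sudo kubeadm config images pull --config /etc/kubernetes/kubeadm.yml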

@mattshma With my config, running rm -rf /var/lib/kubelet and re-initializing with kubeadm fixed this problem.
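
For reference, the sequence that commenter describes would look roughly like this, assuming the same config path as above (note that kubeadm reset destroys any existing cluster state on the node):

# Tear down kubeadm state, clear kubelet's data directory, re-init.
sudo kubeadm reset -f
sudo rm -rf /var/lib/kubelet
sudo kubeadm init --config /etc/kubernetes/kubeadm.yml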

Any updates on this yet? I’m also facing the same issue on Kubernetes v1.13.4.

the same issue on Kubernetes v1.13.4 too

the same issue on kube v1.15.3 too

the same issue on Kubernetes v1.16.0 + CentOS 8 + Docker v19.3

I have the same issue with Docker 18.09.7, Kubernetes v1.16.2, Ubuntu 16.04.

I was able to resolve this issue for my use case by having the same cgroup driver for Docker and kubelet. In my case, on CentOS 7.6, I could fix the issue by adding --exec-opt native.cgroupdriver=systemd to the Docker systemd unit and adding --cgroup-driver=systemd to the kubelet systemd unit. This way both kubelet and Docker consume the same cgroup driver and both operate normally.

Most likely these drivers can be set to other driver types as well, but that was not part of my testing.

If you try to run Kubernetes with Docker, please follow this configuration.
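
A common way to apply the same fix persistently is via /etc/docker/daemon.json rather than editing the systemd unit; a minimal sketch (this overwrites an existing daemon.json, so merge the key in by hand if you already maintain one):

# Equivalent to passing --exec-opt native.cgroupdriver=systemd:
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl restart docker
sudo systemctl restart kubelet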

I have the same issue on v1.23

I’m facing a similar issue with v1.24. I’m trying to install K8s v1.24.3 using kubeadm. The kubeadm init command fails with the following error logs:

The kubelet service is in a Running state but showing repeated logs as:

When I do docker ps -a | grep kube I get nothing.
Any help is appreciated.

I also have this error during kubeadm init with kubeadm v1.25 on a Debian 11 box running containerd.

I am absolutely at a loss as to how to further diagnose the error. How can I check whether the cgroups are correct or not?

I also have this error during kubeadm init with kubeadm v1.25 on a Debian 11 box running containerd.

I am absolutely at a loss as to how to further diagnose the error. How can I check whether the cgroups are correct or not?
Maybe more logs can be helpful?
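
For anyone else stuck at this step, a few read-only checks can answer the cgroup question, assuming the standard kubeadm and containerd file locations:

# Which cgroup version the host runs (cgroup2fs = v2, tmpfs = v1):
stat -fc %T /sys/fs/cgroup
# Which driver containerd's CRI plugin is configured to use:
grep -i SystemdCgroup /etc/containerd/config.toml
# Which driver kubelet is configured to use (written by kubeadm):
grep -i cgroupDriver /var/lib/kubelet/config.yaml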

@chenliu1993 sorry for my bad post. Here is the missing information:

I am running on a Debian GNU/Linux 11 (bullseye) system with kubeadm version 1.24.8-00.

I followed the official guideline on kubernetes.io. I installed containerd and set SystemdCgroup = true under the section [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] in /etc/containerd/config.toml.
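
For anyone reproducing this setup, the usual way to get that setting in place looks roughly like this, assuming the stock config layout:

# Generate the default config (skip if you already maintain one),
# flip the runc option, and restart containerd:
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd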

containerd seems to be fine:

When I run kubeadm init, the system hangs:

There seems to be no firewall issue, and kubeadm seems to detect containerd and the cgroups correctly:

Then the following warning shows up while waiting for the kubelet to boot. This message is shown until the timeout after 4 minutes:

Checking the kubelet status shows:

Checking the journalctl shows:

As this issue is very old, may I ask if I should open a separate one?

Source

Error updating node status, will retry: error getting node #75607

What happened:
Node not joining kops cluster

What you expected to happen:
The node to join the cluster and start scheduling pods

How to reproduce it (as minimally and precisely as possible):
Not sure

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version ):
  • Cloud provider or hardware configuration:
    AWS
  • OS (e.g: cat /etc/os-release ):

Kernel (e.g. uname -a ):
Linux ip-172-20-47-6 4.4.102-k8s #1 SMP Sun Nov 26 23:32:43 UTC 2017 x86_64 GNU/Linux

Install tools:
Kops:
Version 1.11.0 (git-2c2042465)
Helm:

From the master node:
I grepped for the IP in all the logs:

I’m trying to see what happened but can’t find it. Is there anything else I could upload to help?


These SIGs are my best guesses for this issue. Please comment /remove-sig if I am incorrect about one.

🤖 I am a bot run by @vllry. 👩‍🔬

How are you joining the worker node to this cluster (e.g. did the cluster already exist and you’re using TLS bootstrapping to add the node)?

@pulpbill thanks for posting.

In addition (or maybe instead of) this issue, I’d also recommend opening an issue on the Kops project. My guess is they’ll have the most direct expertise for debugging these types of issues.

How are you joining the worker node to this cluster (e.g. did the cluster already exist and you’re using TLS bootstrapping to add the node)?

Kops does all this work.

In addition (or maybe instead of) this issue, I’d also recommend opening an issue on the Kops project. My guess is they’ll have the most direct expertise for debugging these types of issues.

Thanks, I’ll do that. I tried to find something more specific in the logs but couldn’t.

How are you joining the worker node to this cluster (e.g. did the cluster already exist and you’re using TLS bootstrapping to add the node)?

Kops does all this work.

I see. I do think it’s a good idea to ask that project. I only asked because I set up a few clusters by hand or with kubeadm, and I ran into a similar issue that was resolved by setting the kubelet.service --node-ip parameter. This was because I had two interfaces and this was mixing things up.
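
For reference, with kubeadm-managed nodes the --node-ip flag is usually injected through KUBELET_EXTRA_ARGS; a sketch, with <node-ip> standing in for the address of the interface you want (the file is /etc/default/kubelet on Debian/Ubuntu and /etc/sysconfig/kubelet on RHEL):

# Pin kubelet to one interface when the host has several:
echo 'KUBELET_EXTRA_ARGS=--node-ip=<node-ip>' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet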

the minimum supported k8s version is 1.12.0; 1.8.4 is not supported.
please update.

you can report in the kops issue tracker, but as mentioned above, this k8s version is old and unsupported.

@neolit123: Closing this issue.

the minimum supported k8s version is 1.12.0; 1.8.4 is not supported.
please update.

you can report in the kops issue tracker, but as mentioned above, this k8s version is old and unsupported.


Source

kubelet error getting node status: unauthorized #69576

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:

I set up a new cluster in AWS EKS with RBAC and checked all logs to verify everything is good. Nodes appear to be coming online and Ready, but I’m seeing the following errors from the kubelet service on the nodes.

I have 3 nodes and am seeing the same errors on all 3.

It appears to be coming from this line in kubelet_node_status.go:

From the workers I can execute get nodes using kubectl just fine:

What you expected to happen:

I’d expect the kubelet logs not to have this error. Since the worker node can join the cluster, it would appear that I have the ClusterRoleBindings set up properly.


unauthorized errors mean the credentials the kubelet is using are invalid. are there stale/expired client certificates in the kubelet certificate directory, and was the client-ca for the cluster rotated?
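
One way to check, assuming the usual kubelet certificate locations:

# Validity window of the client certificate kubelet currently uses:
sudo openssl x509 -noout -subject -dates \
  -in /var/lib/kubelet/pki/kubelet-client-current.pem
# Subject of the cluster CA it should chain to:
sudo openssl x509 -noout -subject -in /etc/kubernetes/pki/ca.crt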

@liggitt This is a brand new cluster in EKS. I completely recreated last night just to double check everything. Nodes come up as Ready in the cluster, I then tail the logs for the kubelet service and I can see logs of pods being scheduled but like clockwork, every 21 minutes I get the unauthorized error about "error getting node"

That’s an odd interval, especially for a non-persistent error. Does it resolve itself after one occurrence without a kubelet restart? Do you know the duration of the client certificates issued to the kubelet?

@awly, do you remember if there were improvements to client cert rotation in 1.11 or 1.12 that would explain a stale client cert used for kubelet heartbeat, which would resolve itself after a connection?

Unsure about the certificate length. I’d have to research EKS a bit to find that out.

Doesn’t resolve itself. I’ve restarted the kubelet service several times, and the error persists. It doesn’t seem to be impacting anything from what I can tell. The node stays in service, and pods continue to be scheduled as I add new services. It’s just a little concerning that this could lead to some issues that I’m not aware of.

Showing the 21-minute interval of the error:

I meant does it persist at a 10-second interval (encountered at every heartbeat attempt) or at a 21-minute interval?

Only at 21 minutes. That’s pretty much all I see in the logs except for when I add a new service and it gets scheduled to the node.

@liggitt AFAIK EKS uses IAM authenticator for node auth. I’m not sure if it’s only used to provision certs or always.
/sig aws

Any updates here?

Was just going to come here and post that, @szymonpk. Going to go ahead and close this out.


Source

kubeadm init fails with node not found #2370

I am trying to initialize the cluster for the first time. It seems to go fine until it tries to boot the kubelet; then it gets a "node not found" error.

$ sudo kubeadm init --config=kubeadm_config.yaml --upload-certs -v=6
I1223 17:40:03.468687 91734 initconfiguration.go:201] loading configuration from "kubeadm_config.yaml"
I1223 17:40:03.469726 91734 initconfiguration.go:104] detected and using CRI socket: /run/containerd/containerd.sock
I1223 17:40:03.469838 91734 interface.go:400] Looking for default routes with IPv4 addresses
I1223 17:40:03.469844 91734 interface.go:405] Default route transits interface "ens192"
I1223 17:40:03.469991 91734 interface.go:208] Interface ens192 is up
I1223 17:40:03.470019 91734 interface.go:256] Interface "ens192" has 2 addresses :[10.220.12.145/24 fe80::e411:63f8:561d:1e17/64].
I1223 17:40:03.470033 91734 interface.go:223] Checking addr 10.220.12.145/24.
I1223 17:40:03.470039 91734 interface.go:230] IP found 10.220.12.145
I1223 17:40:03.470045 91734 interface.go:262] Found valid IPv4 address 10.220.12.145 for interface "ens192".
I1223 17:40:03.470050 91734 interface.go:411] Found active IP 10.220.12.145
I1223 17:40:03.479509 91734 version.go:182] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable.txt
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
I1223 17:40:03.699015 91734 checks.go:577] validating Kubernetes and kubeadm version
I1223 17:40:03.699033 91734 checks.go:166] validating if the firewall is enabled and active
I1223 17:40:03.719375 91734 checks.go:201] validating availability of port 6443
I1223 17:40:03.719507 91734 checks.go:201] validating availability of port 10259
I1223 17:40:03.719523 91734 checks.go:201] validating availability of port 10257
I1223 17:40:03.719539 91734 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
I1223 17:40:03.719553 91734 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
I1223 17:40:03.719560 91734 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
I1223 17:40:03.719567 91734 checks.go:286] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I1223 17:40:03.719575 91734 checks.go:432] validating if the connectivity type is via proxy or direct
I1223 17:40:03.719594 91734 checks.go:471] validating http connectivity to first IP address in the CIDR
I1223 17:40:03.719616 91734 checks.go:471] validating http connectivity to first IP address in the CIDR
I1223 17:40:03.719624 91734 checks.go:102] validating the container runtime
I1223 17:40:03.730798 91734 checks.go:376] validating the presence of executable crictl
I1223 17:40:03.730835 91734 checks.go:335] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I1223 17:40:03.730871 91734 checks.go:335] validating the contents of file /proc/sys/net/ipv4/ip_forward
I1223 17:40:03.730889 91734 checks.go:649] validating whether swap is enabled or not
I1223 17:40:03.730906 91734 checks.go:376] validating the presence of executable conntrack
I1223 17:40:03.730935 91734 checks.go:376] validating the presence of executable ip
I1223 17:40:03.730948 91734 checks.go:376] validating the presence of executable iptables
I1223 17:40:03.730966 91734 checks.go:376] validating the presence of executable mount
I1223 17:40:03.730985 91734 checks.go:376] validating the presence of executable nsenter
I1223 17:40:03.731001 91734 checks.go:376] validating the presence of executable ebtables
I1223 17:40:03.731010 91734 checks.go:376] validating the presence of executable ethtool
I1223 17:40:03.731019 91734 checks.go:376] validating the presence of executable socat
I1223 17:40:03.731032 91734 checks.go:376] validating the presence of executable tc
I1223 17:40:03.731053 91734 checks.go:376] validating the presence of executable touch
I1223 17:40:03.731067 91734 checks.go:520] running all checks
I1223 17:40:03.741754 91734 checks.go:406] checking whether the given node name is reachable using net.LookupHost
I1223 17:40:03.742557 91734 checks.go:618] validating kubelet version
I1223 17:40:03.815761 91734 checks.go:128] validating if the «kubelet» service is enabled and active
I1223 17:40:03.830576 91734 checks.go:201] validating availability of port 10250
I1223 17:40:03.830645 91734 checks.go:201] validating availability of port 2379
I1223 17:40:03.830670 91734 checks.go:201] validating availability of port 2380
I1223 17:40:03.830697 91734 checks.go:249] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using ‘kubeadm config images pull’
I1223 17:40:03.838710 91734 checks.go:845] pulling k8s.gcr.io/kube-apiserver:v1.20.1
I1223 17:40:06.289721 91734 checks.go:845] pulling k8s.gcr.io/kube-controller-manager:v1.20.1
I1223 17:40:08.511022 91734 checks.go:845] pulling k8s.gcr.io/kube-scheduler:v1.20.1
I1223 17:40:09.955515 91734 checks.go:845] pulling k8s.gcr.io/kube-proxy:v1.20.1
I1223 17:40:12.814572 91734 checks.go:845] pulling k8s.gcr.io/pause:3.2
I1223 17:40:13.497729 91734 checks.go:845] pulling k8s.gcr.io/etcd:3.4.13-0
I1223 17:40:18.550367 91734 checks.go:845] pulling k8s.gcr.io/coredns:1.7.0
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I1223 17:40:19.970967 91734 certs.go:110] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I1223 17:40:20.142968 91734 certs.go:474] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.nalk8s.railcarmgt.com nalshsvrk8ss01.railcarmgt.com] and IPs [10.96.0.1 10.220.12.145]
[certs] Generating "apiserver-kubelet-client" certificate and key
I1223 17:40:20.936427 91734 certs.go:110] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I1223 17:40:21.064115 91734 certs.go:474] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I1223 17:40:21.214855 91734 certs.go:110] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I1223 17:40:21.293940 91734 certs.go:474] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost nalshsvrk8ss01.railcarmgt.com] and IPs [10.220.12.145 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost nalshsvrk8ss01.railcarmgt.com] and IPs [10.220.12.145 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I1223 17:40:22.313697 91734 certs.go:76] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I1223 17:40:22.407630 91734 kubeconfig.go:101] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I1223 17:40:22.658168 91734 kubeconfig.go:101] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I1223 17:40:22.791828 91734 kubeconfig.go:101] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I1223 17:40:23.012512 91734 kubeconfig.go:101] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
I1223 17:40:23.098365 91734 kubelet.go:63] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I1223 17:40:23.372452 91734 manifests.go:96] [control-plane] getting StaticPodSpecs
I1223 17:40:23.372821 91734 certs.go:474] validating certificate period for CA certificate
I1223 17:40:23.372892 91734 manifests.go:109] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I1223 17:40:23.372898 91734 manifests.go:109] [control-plane] adding volume "etc-pki" for component "kube-apiserver"
I1223 17:40:23.372902 91734 manifests.go:109] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I1223 17:40:23.388156 91734 manifests.go:126] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I1223 17:40:23.388193 91734 manifests.go:96] [control-plane] getting StaticPodSpecs
I1223 17:40:23.388573 91734 manifests.go:109] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I1223 17:40:23.388580 91734 manifests.go:109] [control-plane] adding volume "etc-pki" for component "kube-controller-manager"
I1223 17:40:23.388585 91734 manifests.go:109] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I1223 17:40:23.388590 91734 manifests.go:109] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I1223 17:40:23.388594 91734 manifests.go:109] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I1223 17:40:23.389535 91734 manifests.go:126] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[control-plane] Creating static Pod manifest for "kube-scheduler"
I1223 17:40:23.389549 91734 manifests.go:96] [control-plane] getting StaticPodSpecs
I1223 17:40:23.389872 91734 manifests.go:109] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I1223 17:40:23.390482 91734 manifests.go:126] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I1223 17:40:23.391376 91734 local.go:74] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I1223 17:40:23.391388 91734 waitcontrolplane.go:87] [wait-control-plane] Waiting for the API server to be healthy
I1223 17:40:23.393182 91734 loader.go:379] Config loaded from file: /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I1223 17:40:23.396469 91734 round_trippers.go:445] GET https://nalshsvrk8ss01.railcarmgt.com:6443/healthz?timeout=10s in 1 milliseconds
I1223 17:40:23.904958 91734 round_trippers.go:445] GET https://nalshsvrk8ss01.railcarmgt.com:6443/healthz?timeout=10s in 6 milliseconds
I1223 17:41:02.897664 91734 round_trippers.go:445] GET https://nalshsvrk8ss01.railcarmgt.com:6443/healthz?timeout=10s in 0 milliseconds
.
[kubelet-check] Initial timeout of 40s passed.
I1223 17:41:03.397504 91734 round_trippers.go:445] GET https://nalshsvrk8ss01.railcarmgt.com:6443/healthz?timeout=10s in 0 milliseconds
.
^C

From /var/log/messages
Dec 23 17:42:36 nalshsvrk8ss01 kubelet[92039]: E1223 17:42:36.551307 92039 kubelet.go:2240] node "nalshsvrk8ss01.railcarmgt.com" not found
Dec 23 17:42:36 nalshsvrk8ss01 kubelet[92039]: E1223 17:42:36.651514 92039 kubelet.go:2240] node "nalshsvrk8ss01.railcarmgt.com" not found


Source

I restarted the server (master node), and since then I get the following message when I try to use kubectl:
The connection to the server YYY.YYY.YYY.YY:6443 was refused - did you specify the right host or port?

In the kubelet log it says "Error getting node" err="node "jupyterhub-test" not found". jupyterhub-test is the master node.

The only things I kept finding while researching, I have already tested:

  • restart kubelet
  • restart docker
  • swapoff -a

I can’t figure out from the kubelet log where exactly the problem is, and I hope you can help me.
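
A "connection refused" on port 6443 usually just means the kube-apiserver static pod is not running, so the first step is to see whether kubelet is even trying to start it; a rough triage, assuming containerd with crictl on the node:

# Is the kubelet service itself up?
systemctl status kubelet
# Are the control plane containers present (running or crashed)?
sudo crictl ps -a | grep -E 'kube-apiserver|etcd'
# Watch kubelet while it retries the static pods:
journalctl -u kubelet -f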

Cluster information:
Kubernetes version: Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:26:19Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"}
Installation method: kubeadm init
Host OS: Ubuntu 20.04.4 LTS

kubelet log:

Sep 14 08:20:37 jupyterhub-test systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: A start job for unit kubelet.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit kubelet.service has finished successfully.
--
-- The job identifier is 7916.
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: Flag --pod-infra-container-image has been deprecated, will be removed in 1.27. Image garbage collector will get sandbox image information from CRI.
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.562908  178160 server.go:193] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: Flag --pod-infra-container-image has been deprecated, will be removed in 1.27. Image garbage collector will get sandbox image information from CRI.
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.571728  178160 server.go:399] "Kubelet version" kubeletVersion="v1.24.1"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.571751  178160 server.go:401] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.571927  178160 server.go:813] "Client rotation is on, will bootstrap in background"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.572804  178160 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.573378  178160 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.575898  178160 server.go:648] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.576200  178160 container_manager_linux.go:262] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.576261  178160 container_manager_linux.go:267] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.576319  178160 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.576328  178160 container_manager_linux.go:302] "Creating device plugin manager" devicePluginEnabled=true
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.576359  178160 state_mem.go:36] "Initialized new in-memory state store"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.579169  178160 kubelet.go:376] "Attempting to sync node with API server"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.579187  178160 kubelet.go:267] "Adding static pod path" path="/etc/kubernetes/manifests"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.579203  178160 kubelet.go:278] "Adding apiserver pod source"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.579211  178160 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: WARN 08:20:37.580088  178160 reflector.go:324] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.Node: Get "https://YYY.YYY.YYY.YY:6443/api/v1/nodes?fieldSelector=metadata.name%3Djupyterhub-test&limit=500&resourceVersion=0": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.580150  178160 reflector.go:138] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://YYY.YYY.YYY.YY:6443/api/v1/nodes?fieldSelector=metadata.name%3Djupyterhub-test&limit=500&resourceVersion=0": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.580230  178160 kuberuntime_manager.go:239] "Container runtime initialized" containerRuntime="containerd" version="1.6.6" apiVersion="v1"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.581022  178160 server.go:1181] "Started kubelet"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.581694  178160 server.go:150] "Starting to listen" address="0.0.0.0" port=10250
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.582261  178160 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.582351  178160 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.582738  178160 event.go:276] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"jupyterhub-test.1714ac53a07755b3", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ZZZ_DeprecatedClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jupyterhub-test", UID:"jupyterhub-test", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"jupyterhub-test"}, FirstTimestamp:time.Date(2022, time.September, 14, 8, 20, 37, 580993971, time.Local), LastTimestamp:time.Date(2022, time.September, 14, 8, 20, 37, 580993971, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://YYY.YYY.YYY.YY:6443/api/v1/namespaces/default/events": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused'(may retry after sleeping)
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: WARN 08:20:37.582943  178160 reflector.go:324] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.Service: Get "https://YYY.YYY.YYY.YY:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.583036  178160 reflector.go:138] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://YYY.YYY.YYY.YY:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.583351  178160 server.go:410] "Adding debug handlers to kubelet server"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.582359  178160 kubelet.go:1298] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.585879  178160 volume_manager.go:289] "Starting Kubelet Volume Manager"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.585928  178160 desired_state_of_world_populator.go:145] "Desired state populator starts to run"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.586167  178160 controller.go:144] failed to ensure lease exists, will retry in 200ms, error: Get "https://YYY.YYY.YYY.YY:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/jupyterhub-test?timeout=10s": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: WARN 08:20:37.586759  178160 reflector.go:324] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.CSIDriver: Get "https://YYY.YYY.YYY.YY:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.586877  178160 reflector.go:138] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://YYY.YYY.YYY.YY:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.598425  178160 kubelet_network_linux.go:76] "Initialized protocol iptables rules." protocol=IPv4
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.614860  178160 kubelet_network_linux.go:76] "Initialized protocol iptables rules." protocol=IPv6
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.614881  178160 status_manager.go:161] "Starting to sync pod status with apiserver"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.614893  178160 kubelet.go:1974] "Starting kubelet main sync loop"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.615122  178160 kubelet.go:1998] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: WARN 08:20:37.616041  178160 reflector.go:324] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.RuntimeClass: Get "https://YYY.YYY.YYY.YY:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.616074  178160 reflector.go:138] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://YYY.YYY.YYY.YY:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.665155  178160 cpu_manager.go:213] "Starting CPU manager" policy="none"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.665168  178160 cpu_manager.go:214] "Reconciling" reconcilePeriod="10s"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.665179  178160 state_mem.go:36] "Initialized new in-memory state store"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.665277  178160 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.665286  178160 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.665291  178160 policy_none.go:49] "None policy: Start"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.666175  178160 memory_manager.go:168] "Starting memorymanager" policy="None"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.666199  178160 state_mem.go:35] "Initializing new in-memory state store"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.666468  178160 state_mem.go:75] "Updated machine memory state"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.670363  178160 manager.go:610] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.670567  178160 plugin_manager.go:114] "Starting Kubelet Plugin Manager"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.671373  178160 eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node "jupyterhub-test" not found"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.686080  178160 kubelet.go:2419] "Error getting node" err="node "jupyterhub-test" not found"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.686550  178160 kubelet_node_status.go:70] "Attempting to register node" node="jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.686984  178160 kubelet_node_status.go:92] "Unable to register node with API server" err="Post "https://YYY.YYY.YYY.YY:6443/api/v1/nodes": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused" node="jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.716206  178160 topology_manager.go:200] "Topology Admit Handler"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.716954  178160 topology_manager.go:200] "Topology Admit Handler"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.717589  178160 topology_manager.go:200] "Topology Admit Handler"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.717932  178160 status_manager.go:664] "Failed to get status for pod" podUID=6caff56f64d1ee735407a5a7ba6a787a pod="kube-system/etcd-jupyterhub-test" err="Get "https://YYY.YYY.YYY.YY:6443/api/v1/namespaces/kube-system/pods/etcd-jupyterhub-test": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.718167  178160 topology_manager.go:200] "Topology Admit Handler"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.718644  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="daf5debb6988d121c82972b4d0f6c1935c005063b7f42da0c10773efbf0525e1"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.718776  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="435139d2c5996b2f24600f7cd4911fc77a92bf06e7361347971e70a68e0a54e5"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.718849  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="3554bcaebcaf46d227f480181f505d48af9a6f9d389e93f1c6d3fab703db8df4"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.718927  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="257b423a7d1730e88f277239d535923ffac18d5430b723aee160d50af5417553"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.718995  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="cae22d913fbc0ce98a6e7b0d8b13cef758179416dea8f7a9dd875ab5172d8a4f"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719055  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="c86abd3d68a6a6f596481f8784db328cc2bf9951b0a514bff979e4243b63e9ce"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719113  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="0db53f26fd3fc078c02941576d2187b8525fe2da77a2e6dffff797bba8b12213"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719177  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="19d5c5e57cd43c00235306c69378fd6a9d224a337a79ab0d19417bb7ae8c91b4"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719239  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="84298f6839000667cb1d02bc812a84c94920e088d5649821fb6ebe3dabc13698"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719301  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="4d624b46c1721a68bf6b8b1a57d0aa23ee8d664108114337234826a1e053c991"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719370  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="03454711bc28b88c9a5b4eeb8ab8c962e758bb8ae34cbee798ecd5651ca37bc8"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719429  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="1a29e83515332ebd4cf4eabecdaa4c305e280de78a90d90210fd4d467a28bffd"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719490  178160 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="1b8edc5b07c4a5eab18f9e7edb73b089ea303a5f393b650deb550935456c5df8"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.718813  178160 status_manager.go:664] "Failed to get status for pod" podUID=8cdd26801ec71603976b3d4c3c72beae pod="kube-system/kube-apiserver-jupyterhub-test" err="Get "https://YYY.YYY.YYY.YY:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-jupyterhub-test": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.719777  178160 status_manager.go:664] "Failed to get status for pod" podUID=c9c46b502c40273f3cde89de382eb13b pod="kube-system/kube-controller-manager-jupyterhub-test" err="Get "https://YYY.YYY.YYY.YY:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-jupyterhub-test": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.720022  178160 status_manager.go:664] "Failed to get status for pod" podUID=a7527c69e782fb5d6404c82767da6341 pod="kube-system/kube-scheduler-jupyterhub-test" err="Get "https://YYY.YYY.YYY.YY:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-jupyterhub-test": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.786540  178160 controller.go:144] failed to ensure lease exists, will retry in 400ms, error: Get "https://YYY.YYY.YYY.YY:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/jupyterhub-test?timeout=10s": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.786583  178160 kubelet.go:2419] "Error getting node" err="node "jupyterhub-test" not found"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.786993  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "k8s-certs" (UniqueName: "kubernetes.io/host-path/8cdd26801ec71603976b3d4c3c72beae-k8s-certs") pod "kube-apiserver-jupyterhub-test" (UID: "8cdd26801ec71603976b3d4c3c72beae") " pod="kube-system/kube-apiserver-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.787171  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "usr-local-share-ca-certificates" (UniqueName: "kubernetes.io/host-path/8cdd26801ec71603976b3d4c3c72beae-usr-local-share-ca-certificates") pod "kube-apiserver-jupyterhub-test" (UID: "8cdd26801ec71603976b3d4c3c72beae") " pod="kube-system/kube-apiserver-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.787275  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "usr-share-ca-certificates" (UniqueName: "kubernetes.io/host-path/8cdd26801ec71603976b3d4c3c72beae-usr-share-ca-certificates") pod "kube-apiserver-jupyterhub-test" (UID: "8cdd26801ec71603976b3d4c3c72beae") " pod="kube-system/kube-apiserver-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.787372  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "etcd-certs" (UniqueName: "kubernetes.io/host-path/6caff56f64d1ee735407a5a7ba6a787a-etcd-certs") pod "etcd-jupyterhub-test" (UID: "6caff56f64d1ee735407a5a7ba6a787a") " pod="kube-system/etcd-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.787406  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "etcd-data" (UniqueName: "kubernetes.io/host-path/6caff56f64d1ee735407a5a7ba6a787a-etcd-data") pod "etcd-jupyterhub-test" (UID: "6caff56f64d1ee735407a5a7ba6a787a") " pod="kube-system/etcd-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.787431  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "ca-certs" (UniqueName: "kubernetes.io/host-path/8cdd26801ec71603976b3d4c3c72beae-ca-certs") pod "kube-apiserver-jupyterhub-test" (UID: "8cdd26801ec71603976b3d4c3c72beae") " pod="kube-system/kube-apiserver-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.787452  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "etc-ca-certificates" (UniqueName: "kubernetes.io/host-path/8cdd26801ec71603976b3d4c3c72beae-etc-ca-certificates") pod "kube-apiserver-jupyterhub-test" (UID: "8cdd26801ec71603976b3d4c3c72beae") " pod="kube-system/kube-apiserver-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.787479  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "etc-pki" (UniqueName: "kubernetes.io/host-path/8cdd26801ec71603976b3d4c3c72beae-etc-pki") pod "kube-apiserver-jupyterhub-test" (UID: "8cdd26801ec71603976b3d4c3c72beae") " pod="kube-system/kube-apiserver-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.887309  178160 kubelet.go:2419] "Error getting node" err="node "jupyterhub-test" not found"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.887665  178160 kubelet_node_status.go:70] "Attempting to register node" node="jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.889786  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "ca-certs" (UniqueName: "kubernetes.io/host-path/c9c46b502c40273f3cde89de382eb13b-ca-certs") pod "kube-controller-manager-jupyterhub-test" (UID: "c9c46b502c40273f3cde89de382eb13b") " pod="kube-system/kube-controller-manager-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.889824  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "etc-pki" (UniqueName: "kubernetes.io/host-path/c9c46b502c40273f3cde89de382eb13b-etc-pki") pod "kube-controller-manager-jupyterhub-test" (UID: "c9c46b502c40273f3cde89de382eb13b") " pod="kube-system/kube-controller-manager-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.889845  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "k8s-certs" (UniqueName: "kubernetes.io/host-path/c9c46b502c40273f3cde89de382eb13b-k8s-certs") pod "kube-controller-manager-jupyterhub-test" (UID: "c9c46b502c40273f3cde89de382eb13b") " pod="kube-system/kube-controller-manager-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.889865  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "usr-share-ca-certificates" (UniqueName: "kubernetes.io/host-path/c9c46b502c40273f3cde89de382eb13b-usr-share-ca-certificates") pod "kube-controller-manager-jupyterhub-test" (UID: "c9c46b502c40273f3cde89de382eb13b") " pod="kube-system/kube-controller-manager-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.889885  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "kubeconfig" (UniqueName: "kubernetes.io/host-path/a7527c69e782fb5d6404c82767da6341-kubeconfig") pod "kube-scheduler-jupyterhub-test" (UID: "a7527c69e782fb5d6404c82767da6341") " pod="kube-system/kube-scheduler-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.889930  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "flexvolume-dir" (UniqueName: "kubernetes.io/host-path/c9c46b502c40273f3cde89de382eb13b-flexvolume-dir") pod "kube-controller-manager-jupyterhub-test" (UID: "c9c46b502c40273f3cde89de382eb13b") " pod="kube-system/kube-controller-manager-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.889950  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "kubeconfig" (UniqueName: "kubernetes.io/host-path/c9c46b502c40273f3cde89de382eb13b-kubeconfig") pod "kube-controller-manager-jupyterhub-test" (UID: "c9c46b502c40273f3cde89de382eb13b") " pod="kube-system/kube-controller-manager-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.889970  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "usr-local-share-ca-certificates" (UniqueName: "kubernetes.io/host-path/c9c46b502c40273f3cde89de382eb13b-usr-local-share-ca-certificates") pod "kube-controller-manager-jupyterhub-test" (UID: "c9c46b502c40273f3cde89de382eb13b") " pod="kube-system/kube-controller-manager-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: INFO 08:20:37.890011  178160 reconciler.go:270] "operationExecutor.VerifyControllerAttachedVolume started for volume "etc-ca-certificates" (UniqueName: "kubernetes.io/host-path/c9c46b502c40273f3cde89de382eb13b-etc-ca-certificates") pod "kube-controller-manager-jupyterhub-test" (UID: "c9c46b502c40273f3cde89de382eb13b") " pod="kube-system/kube-controller-manager-jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.890469  178160 kubelet_node_status.go:92] "Unable to register node with API server" err="Post "https://YYY.YYY.YYY.YY:6443/api/v1/nodes": dial tcp YYY.YYY.YYY.YY:6443: connect: connection refused" node="jupyterhub-test"
Sep 14 08:20:37 jupyterhub-test kubelet[178160]: ERROR 08:20:37.988241  178160 kubelet.go:2419] "Error getting node" err="node "jupyterhub-test" not found"
Sep 14 08:20:38 jupyterhub-test kubelet[178160]: INFO 08:20:38.019896  178160 scope.go:110] "RemoveContainer" containerID="cc4b38a68d8e34264d83b0859a9b9c84b6d71b07f886ba76a9d18e6e2fa63d81"
Sep 14 08:20:38 jupyterhub-test kubelet[178160]: INFO 08:20:38.019999  178160 scope.go:110] "RemoveContainer" containerID="8ceae598793f115e91fdd87e77b087d5ce5ccede2c719ae78c3c7a3c477bf714"
Sep 14 08:20:38 jupyterhub-test kubelet[178160]: ERROR 08:20:38.088723  178160 kubelet.go:2419] "Error getting node" err="node "jupyterhub-test" not found"
Sep 14 08:20:38 jupyterhub-test kubelet[178160]: ERROR 08:20:38.188858  178160 kubelet.go:2419] "Error getting node" err="node "jupyterhub-test" not found"

I followed the guide to create my cluster on Alibaba Cloud, on two instances with 2 CPUs and 8 GB each.

root@master:~# cat /etc/hosts
10.250.115.210  master
10.250.115.211  slaver

root@master:~# hostname
master

kubeadm init always blocks at the following step:

I0201 00:57:06.271718   29692 waitcontrolplane.go:91] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

I googled it and found the following issues, which seem like what I hit:

https://github.com/cri-o/cri-o/issues/2357
https://github.com/kubernetes/kubeadm/issues/1153
https://github.com/kubernetes/kubeadm/issues/2370
https://github.com/kubernetes/kubernetes/issues/106464

I removed Docker where it existed, and double-checked that the cgroup driver type is the same in CRI-O and kubelet. The error reported is still the kubelet problem below:

Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.198073   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.298393   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.398656   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.499651   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.599724   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.700032   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.800410   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.900674   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:20 master kubelet[29902]: E0201 00:57:20.001051   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"
Feb 01 00:57:20 master kubelet[29902]: E0201 00:57:20.101439   29902 kubelet.go:2422] "Error getting node" err="node "master" not found"

I tried upgrading kubeadm, kubelet, and kubectl to the newest version, 1.23.3; it did not seem to help. Can anyone give some insight on this? Thanks.
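
For the CRI-O case, comparing the two cgroup drivers directly can rule that out; a sketch assuming default file locations (kubelet's driver is set by the cgroupDriver field in the kubeadm.yaml below):

# CRI-O's cgroup manager (defaults to systemd on recent releases):
grep -R cgroup_manager /etc/crio/crio.conf /etc/crio/crio.conf.d/ 2>/dev/null
# kubelet's driver as written by kubeadm:
grep cgroupDriver /var/lib/kubelet/config.yaml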

BTW, below is the kubeadm.yaml used for kubeadm init.

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
  name: master
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.23.3
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 192.168.0.0/16
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
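
For reproduction, this file is fed to kubeadm roughly as follows (assuming it is saved as kubeadm.yaml; the verbosity flag only surfaces more detail while init hangs):

sudo kubeadm reset -f                           # clean up any earlier failed attempt
sudo kubeadm init --config kubeadm.yaml --v=5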

I am using the Magnum stable/mitaka branch. I can instantiate a k8s bay, but I see the following errors in the master kubelet log:

[minion@k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5 ~]$ sudo systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled)
   Active: active (running) since Tue 2016-08-23 17:27:29 UTC; 4min 42s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 1318 (kubelet)
   CGroup: /system.slice/kubelet.service
           ├─1318 /usr/bin/kubelet --logtostderr=true --v=0 --api_servers=http://127.0.0.1:8080 --address=0.0.0.0 --allow_privileged=true --register-node=false --config=/etc/kubernetes/manifests
           └─1341 journalctl -f

Aug 23 17:31:59 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:31:59.160906 1318 kubelet.go:1933] Error updating node status, will retry: error getting node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal": node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal" not found
Aug 23 17:31:59 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:31:59.163727 1318 kubelet.go:1933] Error updating node status, will retry: error getting node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal": node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal" not found
Aug 23 17:31:59 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:31:59.164302 1318 kubelet.go:839] Unable to update node status: update node status exceeds retry count
Aug 23 17:32:06 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:32:06.831733 1318 kubelet.go:1641] error getting node: node k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal not found
Aug 23 17:32:09 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:32:09.171889 1318 kubelet.go:1933] Error updating node status, will retry: error getting node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal": node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal" not found
Aug 23 17:32:09 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:32:09.178053 1318 kubelet.go:1933] Error updating node status, will retry: error getting node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal": node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal" not found
Aug 23 17:32:09 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:32:09.183941 1318 kubelet.go:1933] Error updating node status, will retry: error getting node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal": node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal" not found
Aug 23 17:32:09 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:32:09.188885 1318 kubelet.go:1933] Error updating node status, will retry: error getting node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal": node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal" not found
Aug 23 17:32:09 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:32:09.193694 1318 kubelet.go:1933] Error updating node status, will retry: error getting node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal": node "k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal" not found
Aug 23 17:32:09 k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal kubelet[1318]: E0823 17:32:09.194279 1318 kubelet.go:839] Unable to update node status: update node status exceeds retry count
[minion@k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5 ~]$ date
Tue Aug 23 17:32:13 UTC 2016

However, I can ping the node name:
[minion@k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5 ~]$ sudo cp /usr/bin/ping ~/ping
[minion@k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5 ~]$ sudo ~/ping k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal
PING k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6) 56(84) bytes of data.
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=1 ttl=64 time=0.025 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=2 ttl=64 time=0.029 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=3 ttl=64 time=0.032 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=4 ttl=64 time=0.032 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=5 ttl=64 time=0.031 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=6 ttl=64 time=0.050 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=7 ttl=64 time=0.033 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=8 ttl=64 time=0.033 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=9 ttl=64 time=0.034 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=10 ttl=64 time=0.037 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=11 ttl=64 time=0.027 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=12 ttl=64 time=0.033 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=13 ttl=64 time=0.030 ms
64 bytes from k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal (10.0.0.6): icmp_seq=14 ttl=64 time=0.044 ms
^C

--- k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5.novalocal ping statistics ---
14 packets transmitted, 14 received, 0% packet loss, time 13001ms
rtt min/avg/max/mdev = 0.025/0.033/0.050/0.008 ms

I am using the f-atomic-5 image:
[minion@k8-csc3nktiyz-0-i5bhydwhwz7q-kube-master-jd3fhyh23en5 ~]$ cat /etc/os-release
NAME=Fedora
VERSION="21 (Twenty One)"
ID=fedora
VERSION_ID=21
PRETTY_NAME="Fedora 21 (Twenty One)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:21"
HOME_URL="https://fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=21
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=21

What is the Kubernetes Node Not Ready Error?

A Kubernetes node is a physical or virtual machine participating in a Kubernetes cluster, which can be used to run pods. When a node shuts down or crashes, it enters the NotReady state, meaning it cannot be used to run pods. All stateful pods running on the node then become unavailable.

Common reasons for a Kubernetes node not ready error include lack of resources on the node, a problem with the kubelet (the agent enabling the Kubernetes control plane to access and control the node), or an error related to kube-proxy (the networking agent on the node).

To identify a Kubernetes node not ready error: run the kubectl get nodes command. Nodes that are not ready will appear like this:

NAME                   STATUS    ROLES   AGE      VERSION
master.example.com     Ready     master  5h       v1.17
node1.example.com      NotReady  compute 5h       v1.17
node2.example.com      Ready     compute 5h       v1.17

We’ll provide best practices for diagnosing simple cases of the node not ready error, but more complex cases will require advanced diagnosis and troubleshooting, which is beyond the scope of this article.

The 4 Kubernetes Node States

At any given time, a Kubernetes node can be in one of the following states:

  • Ready—able to run pods.
  • NotReady—not operating due to a problem, and cannot run pods.
  • SchedulingDisabled—the node is healthy but has been marked by the cluster as not schedulable.
  • Unknown—if the node controller cannot communicate with the node, it waits a default of 40 seconds, and then sets the node status to unknown.

If a node is in the NotReady state, it indicates that the kubelet is installed on the node, but Kubernetes has detected a problem on the node that prevents it from running pods.

Troubleshooting Node Not Ready Error

Common Causes and Diagnosis

Here are some common reasons that a Kubernetes node may enter the NotReady state:

Lack of System Resources

Why It Prevents the Node from Running Pods
A node must have enough disk space, memory, and processing power to run Kubernetes workloads.

If non-Kubernetes processes on the node are taking up too many resources, or if there are too many processes running on the node, it can be marked by the control plane as NotReady.

How to Diagnose
Run kubectl describe node and look in the Conditions section to see if resources are missing on the node:

MemoryPressure—node is running out of memory.
DiskPressure—node is running out of disk space.
PIDPressure—node is running too many processes.
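
For illustration, here is roughly what the Conditions section looks like on a healthy node (node name reused from the example above; a node under pressure would show True for the relevant condition):

kubectl describe node node1.example.com
...
Conditions:
  Type             Status   Reason                       Message
  ----             ------   ------                       -------
  MemoryPressure   False    KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False    KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False    KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True     KubeletReady                 kubelet is posting ready status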

kubelet Issue

Why It Prevents the Node from Running Pods
The kubelet must run on each node to enable it to participate in the cluster. If the kubelet crashes or stops on a node, it cannot communicate with the API server and the node goes into a not ready state.

How to Diagnose
Run kubectl describe node [name] and look in the Conditions section—if all the conditions are unknown, this indicates the kubelet is down.
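
As a rough sketch, a node whose kubelet has stopped reporting shows conditions like this (hypothetical node name; the Reason and Message are set by the node controller once the kubelet goes silent):

kubectl describe node node1.example.com
...
Conditions:
  Type             Status    Reason              Message
  ----             ------    ------              -------
  MemoryPressure   Unknown   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   NodeStatusUnknown   Kubelet stopped posting node status.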

kube-proxy Issue

Why It Prevents the Node from Running Pods
kube-proxy runs on every node and is responsible for regulating network traffic between the node and other entities inside and outside the cluster. If kube-proxy stops running for any reason, the node goes into a not ready state.

How to Diagnose
Run kubectl get pods -n kube-system to show pods belonging to the Kubernetes system.
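
A minimal sketch of what to look for (the pod name suffix here is made up; kubeadm-based clusters label kube-proxy pods with k8s-app=kube-proxy):

kubectl get pods -n kube-system -l k8s-app=kube-proxy
NAME               READY   STATUS             RESTARTS   AGE
kube-proxy-x7k2p   0/1     CrashLoopBackOff   5          10m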

Connectivity Issue

Why It Prevents the Node from Running Pods
Even if a node is configured perfectly, Kubernetes treats it as not ready when it has no network connectivity. This could be due to a disconnected network cable, no Internet access, or misconfigured networking on the machine.

How to Diagnose
Run kubectl describe node [name] and look in the Conditions section—if the NetworkUnavailable flag is True, this means the node has a connectivity issue.
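
If you prefer not to scan the full describe output, the same condition can be pulled out directly with a JSONPath filter (hypothetical node name; note that some network plugins never set this condition, in which case the command prints nothing):

kubectl get node node1.example.com -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}'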

Resolving Node Not Ready Issues

Resolving Lack of System Resources

Here are a few ways to resolve a system resource issue on the node:

  • Identify which non-Kubernetes processes are running on the node. If there are any, shut them down or reduce them to a minimum to conserve resources.
  • Run a malware scan—there may be hidden malicious processes taking up system resources.
  • Upgrade the node.
  • Check for hardware issues or misconfigurations and resolve them.
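
As a starting point, a few commands that can help spot resource hogs (kubectl top requires the metrics-server add-on; the other two run on the node itself):

kubectl top node                      # per-node CPU and memory usage
ps aux --sort=-%mem | head -n 10      # top memory consumers on the node
df -h /var/lib                        # disk usage where container data usually lives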

Resolving kubelet Issues

To resolve a kubelet issue, SSH into the node and run the command systemctl status kubelet.

Look at the value of the Active field:

  • active (running) means the kubelet is actually operational; look for the problem elsewhere.
  • active (exited) means the kubelet exited, probably in error. Restart it.
  • inactive (dead) means the kubelet crashed. To identify why, run the command journalctl -u kubelet and examine the kubelet logs.
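
For example, a typical restart-and-inspect sequence on the node would look like this:

sudo systemctl restart kubelet
sudo systemctl status kubelet                  # confirm it is now active (running)
sudo journalctl -u kubelet --no-pager -n 100   # last 100 log lines if it keeps failing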

Resolving kube-proxy Issues

Try looking in the following places to identify the issue with kube-proxy:

  • Run the command kubectl describe pod using the name of the kube-proxy pod that failed, and check the Events section in the output.
  • Run the command kubectl logs [pod-name] -n kube-system to see a full log of the failing kube-proxy pod.
  • Run the command kubectl describe daemonset kube-proxy -n kube-system to see the status of the kube-proxy daemonset, which is responsible for ensuring there is a kube-proxy running on every Kubernetes node.
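
Put together, a diagnosis session might look like the sketch below (the pod name suffix is hypothetical; substitute the actual failing pod from the first command):

kubectl get pods -n kube-system -l k8s-app=kube-proxy    # find the failing pod
kubectl describe pod kube-proxy-x7k2p -n kube-system     # check the Events section
kubectl logs kube-proxy-x7k2p -n kube-system             # full log of the failing pod
kubectl describe daemonset kube-proxy -n kube-system     # daemonset status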

Please note that these procedures can help you gather more information about the problem, but additional steps may be needed to resolve it. If one of the quick fixes above did not work, you’ll need to undertake a more complex, non-linear diagnosis procedure to identify which parts of the Kubernetes environment contribute to the node not ready problem and resolve them.

Solving Kubernetes Node Errors with Komodor

Kubernetes troubleshooting relies on the ability to quickly contextualize the problem with what’s happening in the rest of the cluster. More often than not, you will be conducting your investigation during fires in production. The major challenge is correlating service-level incidents with other events happening in the underlying infrastructure.

Komodor can help with our ‘Node Status’ view, built to pinpoint correlations between service or deployment issues and changes in the underlying node infrastructure. With this view you can rapidly:

  • See service-to-node associations
  • Correlate service and node health issues
  • Gain visibility over node capacity allocations, restrictions, and limitations
  • Identify “noisy neighbors” that use up cluster resources
  • Keep track of changes in managed clusters
  • Get fast access to historical node-level event data


Beyond node error remediations, Komodor can help troubleshoot a variety of Kubernetes errors and issues, acting as a single source of truth (SSOT) for all of your K8s troubleshooting needs. Komodor provides:

  • Change intelligence: Every issue is a result of a change. Within seconds we can help you understand exactly who did what and when.
  • In-depth visibility: A complete activity timeline, showing all code and config changes, deployments, alerts, code diffs, pod logs, and more. All within one pane of glass with easy drill-down options.
  • Insights into service dependencies: An easy way to understand cross-service changes and visualize their ripple effects across your entire system.
  • Seamless notifications: Direct integration with your existing communication channels (e.g., Slack) so you’ll have all the information you need, when you need it.

If you are interested in checking out Komodor, use this link to sign up for a Free Trial.
