XCP-ng

    Kubernetes cluster recipes not seeing nodes

    Xen Orchestra · 43 Posts · 3 Posters · 16.1k Views
    • fred974

      Hi,

      I just used XO to deploy a 1x master, 3x node Kubernetes recipe and I don't think the deployment was successful.

      Looking at the devblog, I was expecting to be asked for a network CIDR, but there was no option for that on my screen.

      When I log in and run kubectl get nodes, I get the following:

      E0323 10:59:45.750544   23576 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
      E0323 10:59:45.752762   23576 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
      E0323 10:59:45.753346   23576 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
      E0323 10:59:45.755437   23576 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
      E0323 10:59:45.757850   23576 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
      The connection to the server localhost:8080 was refused - did you specify the right host or port?
      

      I deployed this recipe in order to learn Kubernetes, so I am really not sure whether what I am seeing is correct, but based on other posts on this forum I was expecting to see something like:

      $ kubectl get nodes
      NAME     STATUS   ROLES    AGE   VERSION
      master   Ready    master   22m   v1.19.2
      node-1   Ready    <none>   15m   v1.19.2
      node-2   Ready    <none>   15m   v1.19.2
      node-3   Ready    <none>   15m   v1.19.2
      

      The Kubernetes Master can ping all the nodes.
      XOA 5.78.0
      xcp-ng 8.2
      XOSTOR SR

      Thank you

      • olivierlambert (Vates 🪐 Co-Founder & CEO)

        Let me ping the right person on this 🙂

        @BenjiReis can you ask Gabriel to come here?

        • GabrielG

          Hi,

          This error can occur when something went wrong during the installation.

          Is there a config file located in "$HOME/.kube/"?
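
          For context: when kubectl cannot find a kubeconfig, it falls back to the anonymous default endpoint localhost:8080, which is exactly the "connection refused" error in your output. A quick way to check what kubectl is actually using (a minimal sketch, run on the master as the debian user):

          ls -la "$HOME/.kube/"            # the recipe normally creates a 'config' file here
          kubectl config view --minify     # shows which cluster and user kubectl is currently pointed at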

          • fred974 @GabrielG

            @GabrielG said in Kubernetes cluster recipes not seeing nodes:

            Is there a config file located in "$HOME/.kube/"?

            I do have a config file on the Kubernetes master node, in /home/debian/.kube/config, but not on any of the worker nodes. Is that normal? Should I copy the config file to the nodes?

            • GabrielG @fred974

              There is no need to copy this file into the worker nodes.

              Were you able to see or capture any error message during the installation?

              • fred974 @GabrielG

                @GabrielG the installation took a very long time, so I left it running. When I came back, only the master was up and running; the nodes were down, so I powered them up manually.

                • fred974

                  @GabrielG Do you have any suggestion on how to fix the cluster?

                  • GabrielG @fred974

                    It's hard to say without knowing what went wrong during the installation.

                     First, I would check whether the config file /home/debian/.kube/config is the same as /etc/kubernetes/admin.conf and whether debian is correctly assigned as the owner of the file.
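
                     Something along these lines should confirm both points quickly (a minimal sketch, assuming the default paths used by the recipe):

                     sudo diff /home/debian/.kube/config /etc/kubernetes/admin.conf   # no output means the two files are identical
                     ls -l /home/debian/.kube/config                                  # should list debian as owner and group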

                    • fred974 @GabrielG

                       @GabrielG the file contents are identical, but the ownership is different: admin.conf is owned by root, not by 'debian'. Should it be debian?

                      debian@master:~/.kube$ pwd
                      /home/debian/.kube
                      
                      debian@master:~/.kube$ ls -la
                      total 20
                      drwxr-xr-x 3 root   root   4096 Mar 21 13:36 .
                      drwxr-xr-x 4 debian debian 4096 Mar 21 13:36 ..
                      drwxr-x--- 4 root   root   4096 Mar 21 13:36 cache
                      -rw------- 1 debian debian 5638 Mar 21 13:36 config
                      
                      debian@master:/etc/kubernetes$ pwd
                      /etc/kubernetes
                      debian@master:/etc/kubernetes$ ls -la
                      total 44
                      drwxr-xr-x  4 root root 4096 Mar 21 13:36 .
                      drwxr-xr-x 77 root root 4096 Mar 27 04:07 ..
                      -rw-------  1 root root 5638 Mar 21 13:36 admin.conf
                      -rw-------  1 root root 5674 Mar 21 13:36 controller-manager.conf
                      -rw-------  1 root root 1962 Mar 21 13:36 kubelet.conf
                      drwxr-xr-x  2 root root 4096 Mar 21 13:36 manifests
                      drwxr-xr-x  3 root root 4096 Mar 21 13:36 pki
                      -rw-------  1 root root 5622 Mar 21 13:36 scheduler.conf
                      
                      • GabrielG @fred974

                        @fred974 said in Kubernetes cluster recipes not seeing nodes:

                        Should it be debian?

                         No, only /home/debian/.kube/config is meant to be owned by the debian user.

                         Are you using kubectl with the debian user or with the root user?
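
                         If you do want to run kubectl as root, the usual kubeadm approach is to point it explicitly at the admin kubeconfig, for example (a minimal sketch):

                         # as root, for the current shell only
                         export KUBECONFIG=/etc/kubernetes/admin.conf
                         kubectl get nodes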

                        • fred974 @GabrielG

                          @GabrielG said in Kubernetes cluster recipes not seeing nodes:

                          Are you using kubectl with debian user or with the root user?

                           I was using the root account 😞 I tried with the debian user and now I get something:

                          debian@master:~$ kubectl get nodes
                          NAME     STATUS   ROLES           AGE     VERSION
                          master   Ready    control-plane   5d23h   v1.26.3
                          node-2   Ready    <none>          5d23h   v1.26.3
                          

                           I created a cluster with 1x master and 3x nodes. Is it normal that the output of the command above only shows 2 nodes?

                          • GabrielG @fred974

                            Yes, you should have something like that:

                            debian@master:~$ kubectl get nodes
                            NAME     STATUS   ROLES           AGE     VERSION
                            master   Ready    control-plane   6m52s   v1.26.3
                            node-1   Ready    <none>          115s    v1.26.3
                            node-2   Ready    <none>          2m47s   v1.26.3
                            node-3   Ready    <none>          2m36s   v1.26.3
                            

                             Are all the worker node VMs started? What's the output of kubectl get events?
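
                             Note that kubectl get events without a namespace flag only looks at the default namespace, so an empty result there is not unusual. To see events across the whole cluster, something like this works (a minimal sketch):

                             kubectl get events -A --sort-by=.lastTimestamp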

                            • fred974 @GabrielG

                              @GabrielG Sorry for the late reply. Here is what I have.

                              debian@master:~$ kubectl get nodes
                              NAME     STATUS   ROLES           AGE     VERSION
                              master   Ready    control-plane   7d22h   v1.26.3
                              node-2   Ready    <none>          7d22h   v1.26.3
                              

                              and

                              debian@master:~$ kubectl get events
                              No resources found in default namespace.
                              
                              • GabrielG @fred974

                                Thank you.

                                Are all VMs started?

                                What's the output of kubectl get pods --all-namespaces?

                                • fred974 @GabrielG

                                  @GabrielG said in Kubernetes cluster recipes not seeing nodes:

                                  Are all VMs started?

                                  Yes, all the VMs are up and running
                                   (screenshot: 8784225f-6d2d-4296-b3be-081c340c06a7-image.png)

                                  @GabrielG said in Kubernetes cluster recipes not seeing nodes:

                                  What's the output of kubectl get pods --all-namespaces?

                                  debian@master:~$ kubectl get pods --all-namespaces
                                  NAMESPACE      NAME                             READY   STATUS    RESTARTS        AGE
                                  kube-flannel   kube-flannel-ds-mj4n6            1/1     Running   2 (3d ago)      8d
                                  kube-flannel   kube-flannel-ds-vtd2k            1/1     Running   2 (6d19h ago)   8d
                                  kube-system    coredns-787d4945fb-85867         1/1     Running   2 (6d19h ago)   8d
                                  kube-system    coredns-787d4945fb-dn96g         1/1     Running   2 (6d19h ago)   8d
                                  kube-system    etcd-master                      1/1     Running   2 (6d19h ago)   8d
                                  kube-system    kube-apiserver-master            1/1     Running   2 (6d19h ago)   8d
                                  kube-system    kube-controller-manager-master   1/1     Running   2 (6d19h ago)   8d
                                  kube-system    kube-proxy-fmjnv                 1/1     Running   2 (6d19h ago)   8d
                                  kube-system    kube-proxy-gxsrs                 1/1     Running   2 (3d ago)      8d
                                  kube-system    kube-scheduler-master            1/1     Running   2 (6d19h ago)   8d
                                  

                                  Thank you very much

                                  • fred974

                                     @GabrielG Do you think I should delete all the VMs and rerun the deploy recipe? Also, is it normal that I no longer have the option to set a network CIDR like before?

                                    • GabrielG @fred974

                                       You can do that, but it won't help us understand what went wrong during the installation of worker nodes 1 and 3.

                                       Can you show me the output of sudo cat /var/log/messages for each node (master and workers)?
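
                                       If it turns out that node-1 and node-3 simply never joined the cluster, one common recovery path with kubeadm (a sketch of the standard procedure, not necessarily what the recipe automates) is to generate a fresh join command on the master and run it on each missing worker:

                                       # on the master: prints a ready-made 'kubeadm join ...' command
                                       kubeadm token create --print-join-command
                                       # on each missing worker: check the kubelet first, then run the printed join command
                                       sudo systemctl status kubelet
                                       sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>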

                                       Concerning the CIDR, we now use Flannel as the Container Network Interface, which allocates a default CIDR (10.244.0.0/16) to the pod network.
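
                                       For reference, 10.244.0.0/16 is Flannel's default range, and in a standard kubeadm setup it is passed at cluster creation time, roughly like this (a minimal sketch, not necessarily the exact command the recipe runs):

                                       kubeadm init --pod-network-cidr=10.244.0.0/16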

                                      • fred974 @GabrielG

                                        @GabrielG said in Kubernetes cluster recipes not seeing nodes:

                                        Can you show me what's the output of sudo cat /var/log/messages for each nodes (master and workers)?

                                        From the master:

                                        debian@master:~$ sudo cat /var/log/messages
                                        Mar 26 00:10:18 master rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="572" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
                                        

                                        From node1:
                                        https://pastebin.com/xrqPd88V

                                        From node2:
                                        https://pastebin.com/aJch3diH

                                        From node3:
                                        https://pastebin.com/Zc1y42NA

                                        • GabrielG @fred974

                                          Thank you, I'll take a look tomorrow.

                                          Is it the whole output for the master?
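
                                           Also, whatever ends up in /var/log/messages, the kubelet itself logs to the systemd journal, so that is worth checking on the master as well (a minimal sketch):

                                           sudo journalctl -u kubelet --no-pager | tail -n 100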

                                          • fred974 @GabrielG

                                            @GabrielG yes, all of it
