Wednesday, January 8, 2020

Kubernetes with vSphere CSI and CPI Part 2

In the last post we covered how to set up the initial cluster and join worker nodes with a vSphere external provider. In this post we'll cover actually installing the CNI (Container Network Interface), CPI (Cloud Provider Interface), and CSI (Container Storage Interface).

Calico Networking

We'll be using Calico for our CNI. You might want to check for the latest version, but installation is as simple as this on your master node:
ubuntu@k8s-master:~$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
While the apply will complete, the Calico pods won't actually start properly because the nodes are still tainted. They need the CPI to be running first.

Configuration Files

I'm going to set up all of the required configurations at once. There are a few required:

CPI Config Map
Hopefully this makes sense. We're going to reference a secret called vsphere-credentials, which we'll create in a moment. The rest says that, under the vCenter with the IP listed, Kubernetes is deployed within the data center Vancouver.
ubuntu@k8s-master:~$ cat vsphere.conf 
[Global]
port = "443"
insecure-flag = "true"
secret-name = "vsphere-credentials"
secret-namespace = "kube-system"

[VirtualCenter "10.9.178.236"]
datacenters = "Vancouver"
ubuntu@k8s-master:~$ kubectl create configmap cloud-config --from-file=vsphere.conf --namespace=kube-system
CPI Secret
Again, this should be pretty simple. I was hoping to use base64 encoding for the username and password, as it gets around any special character problems, but I can't seem to get that working. The best I can come up with is single quotes, so if one of those is in your password or username you'll need to figure that out (or change it). This becomes a bigger problem with the CSI configuration below, so read ahead if you've got lots of special characters in your password.
ubuntu@k8s-master:~$ cat vsphere-credentials.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-credentials
  namespace: kube-system
stringData:
  10.9.178.236.username: 'domain\mengland'
  10.9.178.236.password: 'lJSIuej5IU$'
ubuntu@k8s-master:~$ kubectl create -f vsphere-credentials.yaml
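For anyone who wants to retry the base64 route: Kubernetes expects base64 values under a Secret's data field, while stringData (used above) takes values literally, and a trailing newline picked up during encoding ends up inside the decoded value. The following is a minimal sketch of encoding the two values safely. The helper name b64 is made up for illustration, and whether the CPI then accepts the credentials is an assumption that hasn't been tested here.

```shell
# Hypothetical helper: encode one value for the `data:` field of a Secret.
# printf '%s' avoids the trailing newline that `echo` would add to the value;
# tr strips the line wrapping that some base64 implementations insert.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Single quotes keep the shell from touching the backslash or the dollar sign.
b64 'domain\mengland'; echo
b64 'lJSIuej5IU$'; echo
```

The resulting strings would go under data: (not stringData:) with the same 10.9.178.236.username and 10.9.178.236.password keys.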
CSI Secret
ubuntu@k8s-master:~$ cat csi-vsphere.conf 
[Global]
cluster-id = "k8s-cluster1"
[VirtualCenter "10.9.178.236"]
insecure-flag = "true"
user = "mengland@itlab.domain.com"
password = "lJSIuej5IU$"
port = "443"
datacenters = "Vancouver"
ubuntu@k8s-master:~$ kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=kube-system

Deploy CPI and CSI

Once those are completed, you need to create the roles and deploy the CPI and CSI as follows:
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/master/manifests/controller-manager/cloud-controller-manager-roles.yaml
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/master/manifests/controller-manager/cloud-controller-manager-role-bindings.yaml
ubuntu@k8s-master:~$ kubectl apply -f https://github.com/kubernetes/cloud-provider-vsphere/raw/master/manifests/controller-manager/vsphere-cloud-controller-manager-ds.yaml
The nodes should now no longer be tainted and Calico should start up. If that doesn't happen, have a look at the logs for your vsphere-cloud-controller-manager pod (the last command below); it likely has communication or account problems with vSphere.
ubuntu@k8s-master:~$ kubectl describe nodes | egrep "Taints:|Name:"
Name:               k8s-master
Taints:             node-role.kubernetes.io/master:NoSchedule
Name:               k8s-worker
Taints:             
ubuntu@k8s-master:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-648f4868b8-lbfjk   1/1     Running   0          3m57s
calico-node-d7gkh                          1/1     Running   0          3m57s
calico-node-r8b72                          1/1     Running   0          3m57s
coredns-6955765f44-rsvhh                   1/1     Running   0          20m
coredns-6955765f44-sp7xv                   1/1     Running   0          20m
etcd-k8s-master7                           1/1     Running   0          20m
kube-apiserver-k8s-master7                 1/1     Running   0          20m
kube-controller-manager-k8s-master7        1/1     Running   0          20m
kube-proxy-67cgt                           1/1     Running   0          20m
kube-proxy-nctns                           1/1     Running   0          4m30s
kube-scheduler-k8s-master7                 1/1     Running   0          20m
vsphere-cloud-controller-manager-7b44g     1/1     Running   0          71s
ubuntu@k8s-master:~$ kubectl logs -n kube-system vsphere-cloud-controller-manager-7b44g
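On a cluster with more than a couple of nodes, the taint listing above gets long. As a rough sketch, a small filter can print only the nodes the CPI has not initialized yet. The function name uninitialized_nodes is made up for illustration, and it assumes the same `kubectl describe nodes` layout shown above.

```shell
# Hypothetical helper: print the names of nodes that still carry the
# cloud-provider "uninitialized" taint.
# Reads `kubectl describe nodes` output on stdin.
uninitialized_nodes() {
  awk '
    $1 == "Name:" { node = $2 }   # remember the node currently being described
    /node\.cloudprovider\.kubernetes\.io\/uninitialized/ { print node }
  '
}
```

It would be used as `kubectl describe nodes | uninitialized_nodes`; empty output means every node has been picked up by the CPI.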
And then deploy the CSI driver with these commands:
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/v2.0.0/vsphere-67u3/vanilla/rbac/vsphere-csi-controller-rbac.yaml
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/v2.0.0/vsphere-67u3/vanilla/deploy/vsphere-csi-controller-deployment.yaml
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/v2.0.0/vsphere-67u3/vanilla/deploy/vsphere-csi-node-ds.yaml

Check Status

At this point you should have a running CPI and CSI implementation. Here are some checks to verify:
ubuntu@k8s-master:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-648f4868b8-lbfjk   1/1     Running   0          3d2h
calico-node-d7gkh                          1/1     Running   0          3d2h
calico-node-r8b72                          1/1     Running   0          3d2h
coredns-6955765f44-rsvhh                   1/1     Running   0          3d2h
coredns-6955765f44-sp7xv                   1/1     Running   0          3d2h
etcd-k8s-master7                           1/1     Running   0          3d2h
kube-apiserver-k8s-master7                 1/1     Running   0          3d2h
kube-controller-manager-k8s-master7        1/1     Running   0          3d2h
kube-proxy-67cgt                           1/1     Running   0          3d2h
kube-proxy-nctns                           1/1     Running   0          3d2h
kube-scheduler-k8s-master7                 1/1     Running   0          3d2h
vsphere-cloud-controller-manager-7b44g     1/1     Running   0          3d2h
vsphere-csi-controller-0                   5/5     Running   0          146m
vsphere-csi-node-q6pr4                     3/3     Running   0          13m
Other checks:
ubuntu@k8s-master:~$ kubectl get csidrivers
NAME                     CREATED AT
csi.vsphere.vmware.com   2020-01-04T00:48:57Z
ubuntu@k8s-master:~$ kubectl describe nodes | grep "ProviderID"
ProviderID:                   vsphere://a8f157cf-3607-43a2-8209-60200817677f
ProviderID:                   vsphere://0c723a5d-cb66-4aae-87d2-7fe673728fee

Likely Problems

If some of the pods aren't ready, you've got problems. The biggest one I had was a permission problem when trying to load CSI. The CSI driver uses a secret created from a configuration file instead of straight YAML. I'm not sure if that's part of the problem, but it seems to have trouble with backslashes and potentially other special characters. This means you can't have an account in domain\user format and can't have any backslashes in your password. The error looks like this:
ubuntu@k8s-master:~$ kubectl logs -n kube-system vsphere-csi-controller-0 vsphere-csi-controller
time="2020-01-06T21:45:27Z" level=fatal msg="grpc failed" error="ServerFaultCode: Cannot complete login due to an incorrect user name or password."
I've filed project bug #121, but I'm not sure if everyone will agree it's a bug or if it'll be fixed. For now, my solution is to use double quotes around all strings and limit your special characters to the benign ones (no asterisks, backslashes, single or double quotes, or backticks).
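To catch a problem value before it ends up in csi-vsphere.conf, a quick pre-flight check along these lines may help. It is only a sketch: the function name has_risky_chars is hypothetical, and the character list is simply the one given above, not an authoritative list of what the CSI driver rejects.

```shell
# Hypothetical helper: succeed (exit 0) if the value contains one of the
# characters listed above as troublesome: asterisk, backslash, single quote,
# double quote, or backtick.
has_risky_chars() {
  case "$1" in
    *\**|*\\*|*\'*|*\"*|*\`*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_risky_chars 'domain\mengland' && echo "use user@domain instead"` would flag the backslash form of the account name.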

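If you need to reload your CSI secret file you can do so like this. Note that these commands assume the controller was deployed as a StatefulSet from the 1.14 manifest; if you deployed from the v2.0.0 deployment manifest above, delete and re-apply that one instead: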
ubuntu@k8s-master:~$ kubectl delete secret -n kube-system vsphere-config-secret
ubuntu@k8s-master:~$ kubectl delete statefulset -n kube-system vsphere-csi-controller
And then re-upload with:
ubuntu@k8s-master:~$ kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf -n kube-system
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/1.14/deploy/vsphere-csi-controller-ss.yaml

Using Storage

First off, the fancy Cloud Native Storage for persistent volumes seems to only apply if you have vSAN, so break out your cheque book. For the rest of us, you will need to create a Storage Policy; in my case I did this based on tags. Basically, you create the tag, assign it to whatever datastores you'd like to use, and then create the policy based on that tag.

Once that's done, we still need to define a storage class within the Kubernetes cluster. This is similar to when we added a storage class in my original blog post, but we no longer need to specify a disk format or datastore target. Instead, we just reference the storage policy we created; in my case, I called it k8s-default. The name of the storage class, vsphere-ssd in this example, is what the rest of the Kubernetes cluster will know the storage as. We also define this as the default, so if a user doesn't request a storage class by name, this is what they get.
ubuntu@k8s-master:~$ cat vmware-storage.yaml 
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsphere-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
parameters:
  storagepolicyname: "k8s-default"
provisioner: csi.vsphere.vmware.com
ubuntu@k8s-master:~$ kubectl create -f vmware-storage.yaml 
And then using it is just like any other PVC on any other cluster:
ubuntu@k8s-master:~$ cat pvc_test.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
ubuntu@k8s-master:~$ kubectl create -f pvc_test.yaml
ubuntu@k8s-master:~$ kubectl get pvc
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test   Bound    pvc-bf37afa8-0065-47a9-8c7c-89db7907d538   5Gi        RWO            vsphere-ssd    32m
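The claim above relies on vsphere-ssd being the default class. If you'd rather name the class explicitly, a sketch like the following generates the same kind of claim with storageClassName set. The function name pvc_manifest and the claim name used in the usage line are made up for illustration.

```shell
# Hypothetical helper: print a PVC manifest that names its storage class
# explicitly instead of relying on the cluster default.
# $1 = claim name, $2 = storage class name
pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  storageClassName: $2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
}
```

It could be piped straight into kubectl, for example `pvc_manifest test-explicit vsphere-ssd | kubectl create -f -`.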

Storage Problems

When you try to attach a PV to a pod, you may get a message like this:
Warning  FailedMount 8s (x7 over 40s) kubelet, k8s-worker2 MountVolume.MountDevice failed for volume "pvc-194d886b-e988-48c4-8802-04da2015db4b" : rpc error: code = Internal desc = Error trying to read attached disks: open /dev/disk/by-id: no such file or directory
This is likely because you forgot to set disk.enableUUID on the virtual machine. You can have a look at Node Setup under my earlier blog post. If you have govc set up on your machine, you can also set it with the following command:
govc vm.change -vm k8s-worker2 -e="disk.enableUUID=1"
You'll need to reboot the worker node and recreate any PVCs.
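After the reboot, a quick way to confirm the fix took is to check on the worker that the directory named in the error now exists. This is a small sketch; the function name check_disk_by_id is hypothetical, and the path argument exists only so the check can be pointed somewhere else when experimenting.

```shell
# Hypothetical helper: report whether the by-id device directory exists.
# $1 = directory to check (defaults to the path from the error message)
check_disk_by_id() {
  dir="${1:-/dev/disk/by-id}"
  if [ -d "$dir" ]; then
    echo "ok: $dir exists"
  else
    echo "missing: $dir (is disk.enableUUID set on the VM?)"
    return 1
  fi
}
```

Running `check_disk_by_id` with no arguments on k8s-worker2 checks the default path.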

Monday, January 6, 2020

Kubernetes with vSphere CSI and CPI Part 1

About a year ago I wrote an article outlining steps to follow to get vSphere and Kubernetes working together. At that time I mentioned that cloud providers within Kubernetes had been deprecated but the replacements weren't ready yet. Well, that's changed, so I thought I'd write an updated article outlining the new steps.
Unlike the last time I did this, there's decent documentation out there, and I'd encourage you to have a read through it, but a couple of things bothered me.
  • The document uses outdated APIs. Kubernetes still has a long way to go (in my opinion) and the pace of change is remarkable, so I guess this is to be expected
  • It isn't explained why some of the operations need to be done, so I'm going to try to explain the why with the minimum required steps

Documentation Links


Kubernetes Installation

I'm actually going to assume you have at least two Linux machines available to install Kubernetes on, one master and one worker. If you follow the guide above you should be in a pretty good position to deploy Kubernetes, so to start, we'll need a configuration file. This is required because you can't specify a cloud provider from the command line, so you need to specify everything within a config file. What I'd really like to do is just add a "--cloud-provider" and be done, but no, at least not for now.

The vSphere guide includes a lot of things I don't like. It pins specific etcd, CoreDNS, and Kubernetes versions, none of which I wanted. Here is a minimal configuration. It's using the current apiVersion, v1beta2, although you can check for a later one in the kubeadm package documentation.
ubuntu@k8s-master:~$ cat kubeadminit.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: external
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: "192.168.0.0/16"
We've actually got two configuration sections in this file: an InitConfiguration, which simply tells Kubernetes to use an external cloud provider, and a ClusterConfiguration, which sets the pod subnet because I want to use Calico with 192.168.0.0/16. The official guide also has a bootstrap token, which you can specify if you like (we'll need it later), but I let kubeadm generate one that we can use when joining nodes. It doesn't matter where this file goes; your home folder is just fine.

We then use this config file to initialize the cluster
ubuntu@k8s-master:~$ sudo kubeadm init --config kubeadminit.yaml
You do need to run this as root (or sudo in this case) and need to pay attention to a couple of things in the output:
  • Set up kubectl, which will look like this; do these now, you'll need kubectl for all the other commands:
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

  • The join command, which has our token:
    kubeadm join 10.2.44.53:6443 --token fa5p9m.j4qygsv5t601ug62 \
        --discovery-token-ca-cert-hash sha256:c1653ee75b86dcff36cd006730d5989048ab54e29c30290e8826aeaa752b3428

Note the token that was generated for you (fa5p9m.j4qygsv5t601ug62 in this example); if you specified one, it should be listed here as well.

Normally, we'd just run the cluster join command on all the workers, but because we need to tell them to use an external cloud provider we have a chicken-and-egg problem, as outlined in the Kubernetes Cloud Controller Manager link above. To get around this, we need to export discovery information from the master, which includes address and certificate information, with this command:
ubuntu@k8s-master:~$ kubectl -n kube-public get configmap cluster-info -o jsonpath='{.data.kubeconfig}' > discovery.yaml
This will produce a file that looks something like this:
ubuntu@k8s-master:~$ cat discovery.yaml
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <long_cert_will_be_here>
    server: https://10.2.44.53:6443
  name: ""
contexts: null
current-context: ""
kind: Config
preferences: {}
users: null
Now you'll need to scp that to all worker nodes. Again, it doesn't matter where it goes. It can also technically be on a web server over https if you'd rather.
ubuntu@k8s-master:~$ scp discovery.yaml ubuntu@k8s-worker1:/home/ubuntu/
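Before running the join on a worker, it may be worth confirming that the copied file is intact, since an empty or truncated discovery file only shows up later as a join failure. A minimal sketch, assuming the file layout shown above; the function name check_discovery is hypothetical.

```shell
# Hypothetical helper: check that a discovery file is non-empty and contains
# the two entries kubeadm needs from it (the CA data and the API server URL).
# $1 = path to the discovery file
check_discovery() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file"
    return 1
  fi
  for key in 'certificate-authority-data:' 'server: https://'; do
    if ! grep -qF -- "$key" "$file"; then
      echo "no '$key' entry in $file"
      return 1
    fi
  done
  echo "ok: $file"
}
```

On the worker this would be run as `check_discovery /home/ubuntu/discovery.yaml`.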

Joining Worker Nodes

Like the master, we need to specify an external cloud provider, and because there isn't a command line option, we need a new configuration file. There are three important parts to this file:
  • A path to our discovery file
  • The TLS bootstrap token generated when we initialized the cluster
  • A setting that tells the worker to use an external cloud provider (the point of all of this)
To do that we'll have a file like this:
ubuntu@k8s-master:~$ cat kubeadminitworker.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  file:
    kubeConfigPath: /home/ubuntu/discovery.yaml
  tlsBootstrapToken: fa5p9m.j4qygsv5t601ug62
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: external
And then it's a simple command to join:
ubuntu@k8s-worker1:~$ sudo kubeadm join --config /home/ubuntu/kubeadminitworker.yaml

Node Verification

Back on the master, make sure any new nodes show up and that they have a taint applied:
ubuntu@k8s-master:~$ kubectl describe nodes | egrep "Taints:|Name:"
Name:               k8s-master
Taints:             node-role.kubernetes.io/master:NoSchedule
Name:               k8s-worker1
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
Name:               k8s-worker2
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
The workers aren't quite in a running state yet, but we'll finish things off in part 2 of this post.