Logical Shift: Kubernetes with vSphere CSI and CPI Part 2

In the last post we covered how to set up the initial cluster and join worker nodes with a vSphere external provider. In this post we'll cover actually installing the CNI (Container Network Interface), CPI (Cloud Provider Interface), and CSI (Container Storage Interface).

Calico Networking

We'll be using Calico for our CNI. You might want to check for the latest version, but installation is as simple as this on your master node:

ubuntu@k8s-master:~$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

While the apply will complete, the Calico pods won't actually start properly because the nodes are still tainted. They need the CPI to be running first.

Configuration Files

I'm going to set up all of the required configurations at once. There are a few required:

CPI Config Map
Hopefully this makes sense. The config references a secret called vsphere-credentials, which we'll create in a moment, and says that under the vCenter at the listed IP, Kubernetes is deployed within the datacenter Vancouver.

ubuntu@k8s-master:~$ cat vsphere.conf 
[Global]
port = "443"
insecure-flag = "true"
secret-name = "vsphere-credentials"
secret-namespace = "kube-system"

[VirtualCenter "10.9.178.236"]
datacenters = "Vancouver"
ubuntu@k8s-master:~$ kubectl create configmap cloud-config --from-file=vsphere.conf --namespace=kube-system

CPI Secret
Again, this should be pretty simple. I was hoping to use base64 encoding for the username and password, as it gets around any special character problems, but I can't seem to get that working. The best I can come up with is single quotes, so if one of those is in your password or username you'll need to figure that out (or change it). This becomes a bigger problem with the CSI configuration below, so read ahead if you've got lots of special characters in your password.

ubuntu@k8s-master:~$ cat vsphere-credentials.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-credentials
  namespace: kube-system
stringData:
  10.9.178.236.username: 'domain\mengland'
  10.9.178.236.password: 'lJSIuej5IU$'
ubuntu@k8s-master:~$ kubectl create -f vsphere-credentials.yaml
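
On the base64 question: a Secret's stringData field takes plain text, while the data field expects base64-encoded values, so encoded strings only work under data. The sketch below encodes the same values as the manifest above and prints an equivalent manifest. encode_secret_value is a hypothetical helper name, and this may not be what went wrong in the attempt described above; a trailing newline from echo is another common culprit, which printf '%s' avoids.

```shell
# Hypothetical helper: base64-encode a value for a Secret's data: field.
# printf '%s' avoids the trailing newline that echo would append.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

user_b64=$(encode_secret_value 'domain\mengland')
pass_b64=$(encode_secret_value 'lJSIuej5IU$')

# Same secret as above, with encoded values under data: instead of stringData:
secret_manifest=$(cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-credentials
  namespace: kube-system
data:
  10.9.178.236.username: ${user_b64}
  10.9.178.236.password: ${pass_b64}
EOF
)
printf '%s\n' "$secret_manifest"
```

The printed manifest can be saved to a file and created the same way as above, or piped straight into kubectl apply -f -.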

CSI Secret

ubuntu@k8s-master:~$ cat csi-vsphere.conf 
[Global]
cluster-id = "k8s-cluster1"
[VirtualCenter "10.9.178.236"]
insecure-flag = "true"
user = "mengland@itlab.domain.com"
password = "lJSIuej5IU$"
port = "443"
datacenters = "Vancouver"
ubuntu@k8s-master:~$ kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=kube-system

Deploy CPI and CSI

Once those are completed, you need to create the roles and deploy the CPI and CSI as follows:

ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/master/manifests/controller-manager/cloud-controller-manager-roles.yaml
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-vsphere/master/manifests/controller-manager/cloud-controller-manager-role-bindings.yaml
ubuntu@k8s-master:~$ kubectl apply -f https://github.com/kubernetes/cloud-provider-vsphere/raw/master/manifests/controller-manager/vsphere-cloud-controller-manager-ds.yaml

The nodes should no longer be tainted and Calico should start up. If it doesn't, have a look at the logs for your vsphere-cloud-controller-manager pod; it likely has communication or account problems with vSphere.

ubuntu@k8s-master:~$ kubectl describe nodes | egrep "Taints:|Name:"
Name:               k8s-master
Taints:             node-role.kubernetes.io/master:NoSchedule
Name:               k8s-worker
Taints:             
ubuntu@k8s-master:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-648f4868b8-lbfjk   1/1     Running   0          3m57s
calico-node-d7gkh                          1/1     Running   0          3m57s
calico-node-r8b72                          1/1     Running   0          3m57s
coredns-6955765f44-rsvhh                   1/1     Running   0          20m
coredns-6955765f44-sp7xv                   1/1     Running   0          20m
etcd-k8s-master7                           1/1     Running   0          20m
kube-apiserver-k8s-master7                 1/1     Running   0          20m
kube-controller-manager-k8s-master7        1/1     Running   0          20m
kube-proxy-67cgt                           1/1     Running   0          20m
kube-proxy-nctns                           1/1     Running   0          4m30s
kube-scheduler-k8s-master7                 1/1     Running   0          20m
vsphere-cloud-controller-manager-7b44g     1/1     Running   0          71s
ubuntu@k8s-master:~$ kubectl logs -n kube-system vsphere-cloud-controller-manager-7b44g
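
If Calico stays Pending, it can help to look for the specific taint an external cloud provider is expected to clear. The sketch below assumes that taint is node.cloudprovider.kubernetes.io/uninitialized, the key the kubelet applies when started with --cloud-provider=external. uninitialized_nodes is a hypothetical helper that reads tab-separated "name, taint keys" lines on stdin.

```shell
# Hypothetical helper: print the names of nodes that still carry the
# "uninitialized" cloud-provider taint.
# Input: one line per node, "<name><TAB><space-separated taint keys>".
uninitialized_nodes() {
  awk -F'\t' '$2 ~ /node\.cloudprovider\.kubernetes\.io\/uninitialized/ { print $1 }'
}

# Usage (needs cluster access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}' \
#     | uninitialized_nodes
```

Empty output means the CPI has initialized every node; any names printed are nodes it has not yet processed.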

Then deploy the CSI driver with these commands:

ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/v2.0.0/vsphere-67u3/vanilla/rbac/vsphere-csi-controller-rbac.yaml
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/v2.0.0/vsphere-67u3/vanilla/deploy/vsphere-csi-controller-deployment.yaml
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/v2.0.0/vsphere-67u3/vanilla/deploy/vsphere-csi-node-ds.yaml

Check Status

At this point you should have a running CPI and CSI implementation. Here are some checks to verify:

ubuntu@k8s-master:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-648f4868b8-lbfjk   1/1     Running   0          3d2h
calico-node-d7gkh                          1/1     Running   0          3d2h
calico-node-r8b72                          1/1     Running   0          3d2h
coredns-6955765f44-rsvhh                   1/1     Running   0          3d2h
coredns-6955765f44-sp7xv                   1/1     Running   0          3d2h
etcd-k8s-master7                           1/1     Running   0          3d2h
kube-apiserver-k8s-master7                 1/1     Running   0          3d2h
kube-controller-manager-k8s-master7        1/1     Running   0          3d2h
kube-proxy-67cgt                           1/1     Running   0          3d2h
kube-proxy-nctns                           1/1     Running   0          3d2h
kube-scheduler-k8s-master7                 1/1     Running   0          3d2h
vsphere-cloud-controller-manager-7b44g     1/1     Running   0          3d2h
vsphere-csi-controller-0                   5/5     Running   0          146m
vsphere-csi-node-q6pr4                     3/3     Running   0          13m

Other checks:

ubuntu@k8s-master:~$ kubectl get csidrivers
NAME                     CREATED AT
csi.vsphere.vmware.com   2020-01-04T00:48:57Z
ubuntu@k8s-master:~$ kubectl describe nodes | grep "ProviderID"
ProviderID:                   vsphere://a8f157cf-3607-43a2-8209-60200817677f
ProviderID:                   vsphere://0c723a5d-cb66-4aae-87d2-7fe673728fee
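
With more than a couple of nodes, eyeballing the grep output gets tedious. A minimal sketch that lists only the nodes without a vsphere:// ProviderID; missing_provider_id is a hypothetical helper, and it assumes tab-separated "name, providerID" lines on stdin.

```shell
# Hypothetical helper: print nodes whose providerID is empty or does not
# start with vsphere://.
# Input: one line per node, "<name><TAB><providerID>".
missing_provider_id() {
  awk -F'\t' '$2 !~ /^vsphere:\/\// { print $1 }'
}

# Usage (needs cluster access):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}' \
#     | missing_provider_id
```

Empty output means every node has been matched to a vSphere VM by the CPI.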

Likely Problems

If some of the pods aren't ready, you've got problems. The biggest one I had was a login failure when trying to load the CSI driver. The CSI driver uses a secret created from a configuration file instead of straight YAML. I'm not sure if that's part of the problem, but it seems to have trouble with backslashes and potentially other special characters. This means you can't use an account in domain\user format and can't have any backslashes in your password. The error looks like this:

ubuntu@k8s-master:~$ kubectl logs -n kube-system vsphere-csi-controller-0 vsphere-csi-controller
time="2020-01-06T21:45:27Z" level=fatal msg="grpc failed" error="ServerFaultCode: Cannot complete login due to an incorrect user name or password."

I've filed project bug #121, but I'm not sure if everyone will agree it's a bug or if it'll be fixed. For now, my workaround is to use double quotes around all strings and limit your special characters to the benign ones (no asterisks, backslashes, single or double quotes, or backticks).
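
A quick pre-flight check for those characters might look like the sketch below. has_risky_chars is a hypothetical helper, and the character list simply mirrors the ones named above (reading "tick marks" as backticks) rather than anything documented by the CSI driver.

```shell
# Hypothetical helper: succeed (exit 0) if the value contains a backslash,
# asterisk, single quote, double quote, or backtick.
has_risky_chars() {
  case "$1" in
    *\\*|*\**|*\'*|*\"*|*\`*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the values used in this post before writing them to csi-vsphere.conf.
for value in 'domain\mengland' 'mengland@itlab.domain.com' 'lJSIuej5IU$'; do
  if has_risky_chars "$value"; then
    printf 'risky: %s\n' "$value"
  else
    printf 'ok:    %s\n' "$value"
  fi
done
```

Of the three sample values, only the domain\user form is flagged, which matches the switch to the user@domain form in csi-vsphere.conf.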

If you need to reload your CSI secret file you can do so like this:

ubuntu@k8s-master:~$ kubectl delete secret -n kube-system vsphere-config-secret
ubuntu@k8s-master:~$ kubectl delete statefulset -n kube-system vsphere-csi-controller

And then re-upload with:

ubuntu@k8s-master:~$ kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf -n kube-system
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/1.14/deploy/vsphere-csi-controller-ss.yaml

Using Storage

First off, the fancy Cloud Native Storage for persistent volumes seems to only apply if you have vSAN, so break out your cheque book. For the rest of us, you will need to create a Storage Policy; in my case I based it on tags. Basically, you create the tag, assign it to whatever datastores you'd like to use, and then create the policy based on that tag.

Once that's done, we still need to define a storage class within the Kubernetes cluster. This is similar to when we added a storage class in my original blog post, but we no longer need to specify a disk format or datastore target. Instead, we just reference the storage policy we created, which in my case I called k8s-default. The name of the storage class, vsphere-ssd in this example, is what the rest of the Kubernetes cluster will know the storage as. We also mark it as the default, so if a user doesn't request a storage class by name, this is what they get.

ubuntu@k8s-master:~$ cat vmware-storage.yaml 
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsphere-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
parameters:
  storagepolicyname: "k8s-default"
provisioner: csi.vsphere.vmware.com
ubuntu@k8s-master:~$ kubectl create -f vmware-storage.yaml

Using it is then just like any other PVC on any other cluster:

ubuntu@k8s-master:~$ cat pvc_test.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
ubuntu@k8s-master:~$ kubectl create -f pvc_test.yaml
ubuntu@k8s-master:~$ kubectl get pvc
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test   Bound    pvc-bf37afa8-0065-47a9-8c7c-89db7907d538   5Gi        RWO            vsphere-ssd    32m
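
To actually exercise the volume, a pod has to mount the claim. The sketch below writes a minimal pod manifest that mounts the test PVC. The pod name pvc-test-pod, the busybox image, and the /data mount path are placeholders, not part of the original setup.

```shell
# Write a minimal pod manifest that mounts the "test" PVC created above.
# Pod name, image, and mount path are illustrative placeholders.
cat > pvc_test_pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test-pod
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test
EOF

# Then, with cluster access:
#   kubectl create -f pvc_test_pod.yaml
#   kubectl describe pod pvc-test-pod   # mount errors show up under Events
```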

Storage Problems

If, when you try to attach a PV to a pod, you get a message like this:

Warning  FailedMount 8s (x7 over 40s) kubelet, k8s-worker2 MountVolume.MountDevice failed for volume "pvc-194d886b-e988-48c4-8802-04da2015db4b" : rpc error: code = Internal desc = Error trying to read attached disks: open /dev/disk/by-id: no such file or directory

This is likely because you forgot to set disk.enableUUID on the virtual machine. You can have a look at Node Setup in my earlier blog post. If you have govc set up on your machine, you can also set it with the following command:

govc vm.change -vm k8s-worker2 -e="disk.enableUUID=1"

You'll then need to reboot the worker node and recreate any PVCs.
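
After the reboot, a rough way to confirm the setting took effect is to check on the worker that the directory from the error message now exists and has entries. This is a sketch only: check_disk_by_id is a hypothetical helper, and the optional argument exists just so it can be pointed at another directory for testing.

```shell
# Hypothetical helper: report whether the by-id directory exists and is non-empty.
check_disk_by_id() {
  dir="${1:-/dev/disk/by-id}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir (disk.enableUUID probably not in effect)"
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "empty: $dir"
    return 1
  fi
  echo "ok: $dir"
}

# Run on the worker node after the reboot; "|| true" keeps a failed check
# from aborting the surrounding shell session or script.
check_disk_by_id || true
```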


Wednesday, January 8, 2020
