Tuesday, July 2, 2019

External DNS For Kubernetes Services

A service isn't useful if you can't access it, and while IP addresses are nice, they don't really help deliver user-facing services. What we really want is DNS, but given the dynamic nature of Kubernetes it's impractical to maintain the static DNS configurations of the past. To solve that, we're going to implement ExternalDNS for Kubernetes, which scans services and ingresses and automatically creates and destroys DNS records for the cluster. Of course, nothing is completely simple in Kubernetes, so we'll need a few pieces in place:
  • ExternalDNS - the scanning engine to create and destroy DNS records
  • CoreDNS - a lightweight, Kubernetes-based DNS server to respond to client requests
  • Etcd - a key/value store to hold DNS records

Namespace

The first thing we're going to need is a namespace to put everything in. I normally keep this in the same file as one of the key pieces, but it felt better as a separate file in this case.
$ cat dns-namespace.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: dns

Etcd Cluster Setup

Technically we only need one etcd node, since we don't really need the data to persist; it would just be regenerated on the next scan. Losing that single node would, however, halt all non-cached DNS queries, so I opted to create 3 instances. I didn't want to use an external etcd discovery service, so I needed predictable pod names, and that means a StatefulSet rather than a Deployment. If a pod in the StatefulSet is lost, it won't rejoin the cluster unless it has a persistent volume containing its configuration, which is why each member gets a small PV.

If you're going to change any of the names, make sure the service name "etcd-dns" exactly matches the StatefulSet name. If it doesn't, Kubernetes won't create the internal DNS records and the nodes won't be able to find each other; I'm speaking from experience here.
$ cat etcd.yaml 
apiVersion: v1
kind: Service
metadata:
  name: etcd-dns
  namespace: dns
spec:
  ports:
  - name: etcd-client
    port: 2379
    protocol: TCP
  - name: etcd-peer
    port: 2380
    protocol: TCP
  selector:
    app: etcd-dns
  publishNotReadyAddresses: true
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd-dns
  namespace: dns
  labels:
    app: etcd-dns
spec:
  serviceName: "etcd-dns"
  replicas: 3
  selector:
    matchLabels:
      app: etcd-dns
  template:
    metadata:
      labels:
        app: etcd-dns
    spec:
      containers:
      - name: etcd-dns
        image: quay.io/coreos/etcd:latest
        ports:
        - containerPort: 2379
          name: client
          protocol: TCP
        - containerPort: 2380
          name: peer
          protocol: TCP
        env:
        - name: CLUSTER_SIZE
          value: "3"
        - name: SET_NAME
          value: "etcd-dns"
        volumeMounts:
        - name: datadir
          mountPath: /var/run/etcd
        command:
          - /bin/sh
          - -c
          - |
            IP=$(hostname -i)
            PEERS=""
            for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
                PEERS="${PEERS}${PEERS:+,}${SET_NAME}-${i}=http://${SET_NAME}-${i}.${SET_NAME}:2380"
            done

            exec /usr/local/bin/etcd --name ${HOSTNAME} \
              --listen-peer-urls http://${IP}:2380 \
              --listen-client-urls http://${IP}:2379,http://127.0.0.1:2379 \
              --advertise-client-urls http://${HOSTNAME}.${SET_NAME}:2379 \
              --initial-advertise-peer-urls http://${HOSTNAME}.${SET_NAME}:2380 \
              --initial-cluster-token etcd-cluster-1 \
              --initial-cluster ${PEERS} \
              --initial-cluster-state new \
              --data-dir /var/run/etcd/default.etcd
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
Cluster initialization is the most complicated part of this manifest. We run a few shell commands inside the newly booted pod to fill in the required values, with the PEERS variable looking like this when it's done. Could you hard code it? Sure, but that would complicate things if you change the set name or the number of replicas. You can also do lots of fancy things to remove, add, or rejoin nodes, but we don't really need more than an initial static cluster (three members in this case), so I'll keep things simple. You can check out the links in the notes section for more complicated examples.
etcd-dns-0=http://etcd-dns-0.etcd-dns:2380,etcd-dns-1=http://etcd-dns-1.etcd-dns:2380,etcd-dns-2=http://etcd-dns-2.etcd-dns:2380
If you'd like to enable HTTPS on your etcd cluster, you can do so by adding --auto-tls and --peer-auto-tls to the etcd command, but CoreDNS and external-dns will then fail to connect unless you set up certificates for them as well.
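For reference, here's a rough sketch (not something I've run end to end) of where those flags would land in the StatefulSet command above; note that every URL, including the ones built into the PEERS variable, would need to switch from http to https:
# sketch only: --auto-tls and --peer-auto-tls generate self-signed certs for client and peer traffic
exec /usr/local/bin/etcd --name ${HOSTNAME} \
  --auto-tls \
  --peer-auto-tls \
  --listen-peer-urls https://${IP}:2380 \
  --listen-client-urls https://${IP}:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://${HOSTNAME}.${SET_NAME}:2379 \
  --initial-advertise-peer-urls https://${HOSTNAME}.${SET_NAME}:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster ${PEERS} \
  --initial-cluster-state new \
  --data-dir /var/run/etcd/default.etcd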

CoreDNS Setup

As the endpoint that actually serves client requests, this is another piece we want to keep running, but we don't really care about the data itself since it's all backed by etcd. To handle this we'll use a three-pod Deployment with a front-end Service. The Service is of type LoadBalancer to make it easily reachable by clients, so make sure you have a load balancer available; if you don't, see my previous post on installing and configuring MetalLB.

You might also notice that we open both the TCP and UDP DNS ports on the pods but only expose UDP through the load balancer. This is because a Kubernetes LoadBalancer service can't serve both UDP and TCP at the same time, so feel free to remove the TCP port if you like. I'm hopeful multi-protocol load balancers will become easier to manage at some point, so for now I'm leaving it in.
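If you do want TCP exposed as well, one approach that should work with MetalLB (a sketch, not something this post depends on) is a second, hypothetical coredns-tcp Service that shares the same external IP via MetalLB's allow-shared-ip annotation; the UDP service below would need the same annotation and key for MetalLB to colocate them:
apiVersion: v1
kind: Service
metadata:
  name: coredns-tcp
  namespace: dns
  annotations:
    # both Services must carry the same sharing key to be given the same IP
    metallb.universe.tf/allow-shared-ip: "coredns"
spec:
  ports:
  - name: coredns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  selector:
    app: coredns
  type: LoadBalancer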
$ cat coredns.yaml 
apiVersion: v1
kind: Service
metadata:
  name: coredns
  namespace: dns
spec:
  ports:
  - name: coredns
    port: 53
    protocol: UDP
    targetPort: 53
  selector:
    app: coredns
  type: LoadBalancer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: dns
data:
  Corefile: |
    . {
        errors
        health
        log
        etcd {
           endpoint http://etcd-dns:2379
        }
        cache 30
        prometheus 0.0.0.0:9153
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: dns
  labels:
    app: coredns
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coredns
  template:
    metadata:
      labels:
        app: coredns
        k8s_app: kube-dns
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9153"
        prometheus.io/path: /metrics
    spec:
      containers:
      - name: coredns
        image: coredns/coredns:latest
        imagePullPolicy: IfNotPresent
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
      volumes:
      - name: config-volume
        configMap:
          name: coredns
          items:
          - key: Corefile
            path: Corefile
There are quite a few plugins [https://coredns.io/plugins/] you can enable in your CoreDNS deployment, some of which you might want to play with. The documentation for them is quite good and they're easy to turn on: they go in the ConfigMap alongside the errors and health entries; just add the plugin name and any parameters it takes on its own line and you're good to go. You may want to remove the log entry if your DNS server is really busy or you don't want to see the continual stream of DNS queries.

I'll also make special mention of the . { } block in the config map. This tells CoreDNS to accept queries for any domain, which might not be to your liking. In my opinion this provides the most flexibility, since this shouldn't be your site's primary DNS server; requests for a specific domain or subdomain should be forwarded here from your primary DNS. If you do want to restrict it, simply enter one or more blocks such as example.org { } instead of . { }.
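As a sketch of that restricted form, using the same example.org domain that shows up later in this post, the server block is simply scoped to the zone and CoreDNS will only answer for names under it:
    example.org {
        errors
        health
        log
        etcd {
           endpoint http://etcd-dns:2379
        }
        cache 30
        prometheus 0.0.0.0:9153
    }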

External DNS

Finally, the reason we're here: deploying external-dns to our cluster. A couple of notes: I've chosen to scan the cluster for new or missing services every 15 seconds. This makes the DNS system feel very snappy when creating a service, but it might be too frequent or not frequent enough for your environment. I found the documentation particularly frustrating here; the closest example I could find using CoreDNS leverages minikube, with confusing options and commands to diff a Helm chart, which doesn't feel very complete or intuitive to me.
$ cat external-dns.yaml 
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get","watch","list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get","watch","list"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get","watch","list"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: dns
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: dns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: dns
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: registry.opensource.zalan.do/teapot/external-dns:latest
        args:
        - --source=service
        - --source=ingress
        - --provider=coredns
        - --registry=txt
        - --log-level=info
        - --interval=15s
        env:
          - name: ETCD_URLS 
            value: http://etcd-dns:2379
I've left the log-level entry in even though info is the default, as it's a helpful placeholder when you want or need to change it. The log levels, which I couldn't find documented anywhere and had to dig out of the code, are: panic, debug, info, warning, error, fatal. You'll also notice a reference to our etcd cluster service here, so if you changed that name make sure you change it here too.

Deployment and Cleanup Scripts

As I like to do, here are some quick deployment and cleanup scripts which can be helpful when testing over and over again:
$ cat deploy.sh 
kubectl create -f dns-namespace.yaml
kubectl create -f etcd.yaml
kubectl create -f external-dns.yaml
kubectl create -f coredns.yaml
As a reminder, deleting the namespace will clean up all the persistent volumes too. All of the data will be recreated on the fly, but it does mean a few extra seconds for the system to reclaim and recreate them on the next deploy.
$ cat cleanup.sh 
kubectl delete namespace dns
kubectl delete clusterrole external-dns
kubectl delete clusterrolebinding external-dns-viewer

Success State

I also had trouble finding out what "good" looks like, so here's what you're looking for in the logs:
$ kubectl logs -n dns external-dns-57959dcfd8-fgqpn
time="2019-06-27T01:45:21Z" level=error msg="context deadline exceeded"
time="2019-06-27T01:45:31Z" level=info msg="Add/set key /skydns/org/example/nginx/66eeb21d to Host=10.9.176.196, Text=\"heritage=external-dns,external-dns/owner=default,external-dns/resource=service/default/nginx-frontend\", TTL=0"
The actual pod name will be different for you since we used a Deployment; you can get the exact name with kubectl get pods -n dns. In this example the "context deadline exceeded" line is bad: it means external-dns wasn't able to register the entry with etcd, in this case because that cluster was still booting. The last line shows a successful update into etcd.
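You can also skip looking up the pod name entirely and let kubectl pick one of the deployment's pods for you:
$ kubectl logs -n dns deployment/external-dns --tail=20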

Etcd logs far too much to post here in full, but you'll see entries indicating it can't resolve the other hosts as they boot up, and potentially several MsgVote requests as the services start on all the pods. In the end it should establish a peer connection with all of the nodes and indicate that the API is enabled.
$ kubectl logs -n dns etcd-dns-0
2019-06-27 01:45:15.124897 W | rafthttp: health check for peer c77fa62c6a3a8c7e could not connect: dial tcp: lookup etcd-dns-1.etcd-dns on 10.96.0.10:53: no such host
2019-06-27 01:45:15.128194 W | rafthttp: health check for peer dcb7067c28407ab9 could not connect: dial tcp: lookup etcd-dns-2.etcd-dns on 10.96.0.10:53: no such host

2019-06-27 01:45:15.272084 I | raft: 7300ad5a4b7e21a6 received MsgVoteResp from 7300ad5a4b7e21a6 at term 4
2019-06-27 01:45:15.272096 I | raft: 7300ad5a4b7e21a6 [logterm: 1, index: 3] sent MsgVote request to c77fa62c6a3a8c7e at term 4
2019-06-27 01:45:15.272105 I | raft: 7300ad5a4b7e21a6 [logterm: 1, index: 3] sent MsgVote request to dcb7067c28407ab9 at term 4
2019-06-27 01:45:17.127836 E | etcdserver: publish error: etcdserver: request timed out

2019-06-27 01:45:41.087147 I | rafthttp: peer dcb7067c28407ab9 became active
2019-06-27 01:45:41.087174 I | rafthttp: established a TCP streaming connection with peer dcb7067c28407ab9 (stream Message writer)
2019-06-27 01:45:41.098636 I | rafthttp: established a TCP streaming connection with peer dcb7067c28407ab9 (stream MsgApp v2 writer)
2019-06-27 01:45:42.350041 N | etcdserver/membership: updated the cluster version from 3.0 to 3.3
2019-06-27 01:45:42.350158 I | etcdserver/api: enabled capabilities for version 3.3
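If you'd rather not scroll through raft logs, a quicker sanity check is to ask etcd for its member list; etcdctl ships in the coreos etcd image, so something like this should show all three members once the cluster has formed:
$ kubectl exec -n dns etcd-dns-0 -- etcdctl member list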
If your cluster won't start or ends up in a CrashLoopBackOff, most of the time I found the problem to be host resolution (DNS). You can try changing the PEERS entry from ${SET_NAME}-${i}.${SET_NAME} to just ${SET_NAME}. This won't produce a working cluster, but it should get you far enough to see what's going on inside the pod. I'd also recommend setting the replicas to 1 while troubleshooting.
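In other words, for debugging only, the loop in the StatefulSet command would build the peer URLs against the service name instead of the individual pod names:
for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
    PEERS="${PEERS}${PEERS:+,}${SET_NAME}-${i}=http://${SET_NAME}:2380"
done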

CoreDNS is pretty straightforward. It just logs a startup banner and then client queries, which look like the examples below: the first query, nginx.example.org, returns NOERROR (this is good) and the second, nginx2.example.org, returns NXDOMAIN, meaning the record doesn't exist. Again, if you want to cut down on these messages, remove the log line from the config file as mentioned above.
$ kubectl logs -n dns coredns-6c8d7c7d79-6jm5l
.:53
2019-06-27T01:44:44.570Z [INFO] CoreDNS-1.5.0
2019-06-27T01:44:44.570Z [INFO] linux/amd64, go1.12.2, e3f9a80
CoreDNS-1.5.0
linux/amd64, go1.12.2, e3f9a80
2019-06-27T02:11:43.552Z [INFO] 192.168.215.64:58369 - 10884 "A IN nginx.example.org. udp 35 false 512" NOERROR qr,aa,rd 68 0.002999881s
2019-06-27T02:13:08.448Z [INFO] 192.168.215.64:64219 - 40406 "A IN nginx2.example.org. udp 36 false 512" NXDOMAIN qr,aa,rd 87 0.007469218s

Using External DNS

To actually have a DNS name registered by external-dns, you need to add an annotation to your service. Here's one for nginx that requests an external load balancer and registers its IP under the name nginx.example.org:
$ cat nginx-service.yaml 
apiVersion: v1
kind: Service
metadata:
  name: nginx-frontend
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "nginx.example.org"
spec:
  ports:
  - name: "web"
    port: 80
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
From a Linux or Mac host, you can use nslookup to verify the entry, where 10.9.176.212 is the external IP of my coredns service:
$ kubectl get svc -n dns
NAME       TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)             AGE
coredns    LoadBalancer   10.100.208.145   10.9.176.212   53:31985/UDP        20h
etcd-dns   ClusterIP      10.100.83.154    <none>         2379/TCP,2380/TCP   20h
$ nslookup nginx.example.org 10.9.176.212
Server:  10.9.176.212
Address: 10.9.176.212#53

Name: nginx.example.org
Address: 10.9.176.213
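If you'd rather see the raw records, you can also query etcd directly. Assuming the data was written via the etcd v3 API (which recent CoreDNS and external-dns versions use), something like this should dump the /skydns keys:
$ kubectl exec -n dns etcd-dns-0 -- env ETCDCTL_API=3 etcdctl get --prefix /skydns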

Notes

Kubernetes already comes with an etcd and, in newer releases, CoreDNS, so why not use those? Well, you probably could, but in my opinion those are meant for core cluster functions and we shouldn't be messing around with them; they're also secured with HTTPS, so you'd need to go through the process of getting certificates set up. While I didn't find any links that really suited my needs, here are some that helped me along; maybe they'll help you too.
