Running Cilium in K3S and K3D (lightweight Kubernetes) on Mac OS for development
by Sebastian Kurfürst on 19.12.2020
Every year around Christmas time, I revisit our internal architecture. In the early days of the company, we used classical servers provisioned via Ansible. Then, we started using dokku - a heroku-inspired PaaS based on Docker and containers. And about two years ago, we took the plunge and started running Kubernetes on bare-metal hardware (first manually installed, currently with Rancher 2).
Due to an upcoming server move, I have revisited our options for running Kubernetes - and I'd really love to use k3s. k3s is an ultra-lightweight Kubernetes distribution, built by the folks at Rancher Labs and manageable via Rancher itself.
In addition to that, I've followed the developments around eBPF closely, especially in the Kubernetes space. There, the Cilium project provides a networking plugin for Kubernetes built around eBPF, which provides great observability into traffic flows, including accepted and discarded packets. Having used Calico/Flannel/Canal before, I feel that understanding why a packet is dropped is a huge benefit.
Finally, for testing out everything, I love to use my local Mac OS machine - and k3d is a command-line tool which makes it really easy to run k3s (in docker-for-mac).
It turns out that k3s on Docker for Mac with Cilium has some difficulties - which is why I am summarizing how to make it work and what the pitfalls are.
Prerequisites for following along
- Docker for Mac installed
- k3d installed, e.g. via brew install k3d
- Helm installed, e.g. via brew install helm
1. Start up a Kubernetes instance
- k3d cluster create foo --agents 1 \
- --k3s-server-arg "--disable=servicelb" --k3s-server-arg "--disable=traefik" --no-lb \
- --k3s-server-arg "--disable-network-policy" --k3s-server-arg "--flannel-backend=none"
The important k3s arguments here are --disable-network-policy and --flannel-backend=none. The other arguments are specific to our setup; for example, we want to run nginx ingress instead of traefik.
Using docker ps, you can ensure that two containers were spun up and stay healthy.
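To avoid scanning the full docker ps output by eye, the check can be narrowed to the cluster's node containers. This is only a minimal sketch: the helper name k3d_running_nodes and the DOCKER override variable are made up for illustration, and it assumes the k3d naming scheme (k3d-<cluster>-server-N and k3d-<cluster>-agent-N) that the container names in this post follow.

```shell
#!/bin/sh
# Hypothetical helper (not part of k3d): print the running node containers
# of a k3d cluster, one per line.
# DOCKER can be overridden for a dry run; it defaults to the real docker CLI.
DOCKER="${DOCKER:-docker}"

k3d_running_nodes() {
  cluster="$1"
  # `docker ps` lists only running containers; keep the server/agent nodes
  # that belong to the given cluster (assumed naming scheme, see above).
  "$DOCKER" ps --format '{{.Names}}' |
    grep -E "^k3d-${cluster}-(server|agent)-[0-9]+$" |
    sort
}

# Usage (requires a running cluster named "foo"):
#   k3d_running_nodes foo
```

For the cluster created above, I would expect this to print k3d-foo-agent-0 and k3d-foo-server-0.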
2. Mount the BPF file system in the k3s docker containers
This is the secret sauce which later on makes Cilium work. It turns out that you cannot use a Cilium init container in the k3d/k3s combo, because it tries to run a script in the context of the above containers using the /bin/bash interpreter (but the k3s images are based on busybox, and only have a /bin/sh interpreter available). Instead, we mount the BPF file system manually:
- docker exec -it k3d-foo-agent-0 mount bpffs /sys/fs/bpf -t bpf
- docker exec -it k3d-foo-agent-0 mount --make-shared /sys/fs/bpf
- # this needs to be done for every container (every agent and every server)
- docker exec -it k3d-foo-server-0 mount bpffs /sys/fs/bpf -t bpf
- docker exec -it k3d-foo-server-0 mount --make-shared /sys/fs/bpf
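Since the two mount commands have to be repeated for every node, a small loop saves typing once the cluster has more agents. The following is a sketch under assumptions, not a tested tool: the function names and the DOCKER override variable are hypothetical, and the node containers are found via the k3d-<cluster>-server-N / k3d-<cluster>-agent-N naming used above. I dropped the -it flags, because docker exec refuses to allocate a TTY when it is run from a non-interactive script.

```shell
#!/bin/sh
# Hypothetical helpers: mount the BPF file system in every node container
# of a k3d cluster. DOCKER can be overridden for a dry run.
DOCKER="${DOCKER:-docker}"

mount_bpf_in_node() {
  node="$1"
  # Same two commands as in the manual steps, minus the -it flags.
  "$DOCKER" exec "$node" mount bpffs /sys/fs/bpf -t bpf &&
    "$DOCKER" exec "$node" mount --make-shared /sys/fs/bpf
}

mount_bpf_in_cluster() {
  cluster="$1"
  # Assumed naming scheme: k3d-<cluster>-server-N and k3d-<cluster>-agent-N.
  for node in $("$DOCKER" ps --format '{{.Names}}' |
    grep -E "^k3d-${cluster}-(server|agent)-[0-9]+$"); do
    mount_bpf_in_node "$node" || return 1
  done
}

# Usage (requires a running cluster named "foo"):
#   mount_bpf_in_cluster foo
```

The loop stops at the first node where a mount fails, so a broken node does not go unnoticed.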
3. Everything prepared - install cilium
Now you can install Cilium, as explained in the docs. NOTE: We adapted the installation process from the kind instructions, as these use Helm, while the normal Cilium k3s install guide uses pre-built manifests (which are less flexible).
- helm repo add cilium https://helm.cilium.io/
- helm install cilium cilium/cilium --version 1.9.1 \
- --namespace kube-system \
- --set kubeProxyReplacement=partial \
- --set hostServices.enabled=false \
- --set externalIPs.enabled=true \
- --set nodePort.enabled=true \
- --set hostPort.enabled=true \
- --set bpf.masquerade=false \
- --set image.pullPolicy=IfNotPresent \
- --set ipam.mode=kubernetes
- helm upgrade cilium cilium/cilium --version 1.9.1 \
- --namespace kube-system \
- --reuse-values \
- --set hubble.listenAddress=":4244" \
- --set hubble.relay.enabled=true \
- --set hubble.ui.enabled=true
I do not yet fully understand all the parameters to the Cilium helm chart - but now I can easily create and tear down K3s instances with Cilium on my local machine - and play around with these parameters in detail :-)
You can now watch all the Cilium pods start up properly, using kubectl -n kube-system get pods --watch.
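If you want a scripted check instead of watching the pod list, kubectl rollout status can block until the workloads are ready. This is a sketch under assumptions: the function name and the KUBECTL override variable are hypothetical, and I assume the chart creates a DaemonSet named cilium and a Deployment named cilium-operator in kube-system, so verify the names with kubectl -n kube-system get daemonsets,deployments first.

```shell
#!/bin/sh
# Hypothetical helper: block until the Cilium agent and operator are rolled out.
# KUBECTL can be overridden for a dry run; it defaults to the real kubectl CLI.
KUBECTL="${KUBECTL:-kubectl}"

wait_for_cilium() {
  # Optional first argument: how long to wait per workload (default 300s).
  wait_timeout="${1:-300s}"
  # Assumed resource names; adjust them if your chart version differs.
  "$KUBECTL" -n kube-system rollout status daemonset/cilium \
    --timeout="$wait_timeout" &&
    "$KUBECTL" -n kube-system rollout status deployment/cilium-operator \
      --timeout="$wait_timeout"
}

# Usage (requires a cluster with Cilium installed):
#   wait_for_cilium 120s && echo "Cilium is ready"
```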
Problems and learnings
I learned quite a few things in the process.
First, it is important to remember that Docker-for-Mac runs an ultra-lightweight VM, where all docker containers are scheduled. This means Cilium on k3s inside k3d behaves differently than on "real" servers, because both k3s agents (=docker containers) share the same kernel (in the lightweight docker-for-mac VM), and thus the same eBPF runtime!
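The shared kernel can be made visible by comparing the kernel boot ID that each node container sees: Linux exposes a per-boot random ID in /proc/sys/kernel/random/boot_id, so two containers reporting the same value are running on the same booted kernel. This is a minimal sketch; the function name and the DOCKER override variable are hypothetical, and I assume the file is readable inside the k3s containers.

```shell
#!/bin/sh
# Hypothetical helper: check whether two node containers run on the same
# booted kernel by comparing the boot ID each of them sees.
# DOCKER can be overridden for a dry run.
DOCKER="${DOCKER:-docker}"

nodes_share_kernel() {
  first="$("$DOCKER" exec "$1" cat /proc/sys/kernel/random/boot_id)" || return 2
  second="$("$DOCKER" exec "$2" cat /proc/sys/kernel/random/boot_id)" || return 2
  echo "$1: $first"
  echo "$2: $second"
  # Exit status 0 means both containers see the same kernel boot.
  [ "$first" = "$second" ]
}

# Usage (requires the running cluster "foo" from above):
#   nodes_share_kernel k3d-foo-server-0 k3d-foo-agent-0 && echo "same kernel"
```

On Docker for Mac I would expect both IDs to be identical, whereas nodes on separate physical or virtual machines would report different ones.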
I had the problem that the containers did not boot up properly and produced very weird error messages; in the end, a complete Docker restart fixed the problem. This is because eBPF, as a kernel technology, is stateful; to get back to a safe state, you need to restart the Linux kernel, and thus the entire docker-for-mac VM.