Kubernetes for stateless workloads (updated)


Kubernetes is, in our opinion, the emerging standard for container orchestration and a robust open source basis for building a PaaS. Since fall 2017, we have been using it for all our internal and external services and with many customers. That is why we can wholeheartedly recommend to ADOPT Kubernetes, despite its quite a few moving parts.

Updates in this radar

  • Helm usage: if you want to use Helm, use it without Tiller. The server component (Tiller) effectively circumvents Kubernetes RBAC.
  • Kismatic (which we previously used to install Kubernetes) is not supported anymore. We switched to Rancher 2 and have had great experiences with it.
  • By now, we are using the custom operator pattern based on the Operator SDK. We specifically use the Helm-based and Ansible-based operators because they converge the state automatically.


Pros

  • Provides a unified API for smaller and bigger workloads, with the option of scaling the applications from small to big.
  • kubectl allows developers to self-service and debug (e.g. log into the containers).
  • Can be used on a single host (using minikube or k3s), or on a fleet of many servers.
  • Is the emerging de-facto standard for containerized applications.
  • Consists of very clever concepts/atoms (like Deployment, Pod, Ingress, Service), which can be combined in flexible ways.
  • With Google Kubernetes Engine (formerly Google Container Engine), a professional hosted service exists for critical workloads.
  • Supports Role-Based Access Control (RBAC).
  • Supported by lots of external applications.
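To illustrate how the atoms mentioned above fit together, here is a minimal sketch of a Deployment with a matching Service. All names, labels, and the image (hello-web, nginx:stable) are placeholders chosen for this example and are not taken from our setup.

```yaml
# Hypothetical example: names, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 2                # scale from small to big by changing this number
  selector:
    matchLabels:
      app: hello-web
  template:                  # Pod template: every replica is one Pod
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
        - name: web
          image: nginx:stable
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello-web
spec:
  selector:
    app: hello-web           # routes traffic to all Pods carrying this label
  ports:
    - port: 80
      targetPort: 80
```

Assuming the manifest is saved as hello-web.yaml, it could be applied with `kubectl apply -f hello-web.yaml` and scaled later with `kubectl scale deployment hello-web --replicas=5`.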


Cons

  • The YAML configuration can be quite verbose.
  • Has a learning curve, but there is a great interactive tutorial.
  • The initial setup is nontrivial, and it is hard to figure out how best to install a working cluster.
  • There is quite some churn in the ecosystem.

Kubernetes at Sandstorm

We at Sandstorm use Kubernetes in a slightly unusual way, which is outlined below:

  • (UPDATED) We installed Kubernetes using Rancher 2, which worked well for us. Before, we used Kismatic, but it is not supported anymore, and Rancher 2 has come a long way.
  • As we currently only have two Kubernetes servers, we are not yet running in high-availability mode. Instead, we manually pin pods to the worker nodes for now. Because of this, we can use hostPath persistent volumes via the host-path-provisioner, which are very easy to understand.
  • We're heavily using ingress-nginx, e.g. for things like SSO, and cert-manager for Let's Encrypt SSL certificates. Both are installed using the Rancher (Helm) catalog.
  • As Docker Registry and CI system, we are using GitLab.
  • "Big" stateful services like MySQL/Postgres databases are not hosted by Kubernetes, but colocated on the same infrastructure (installed using apt etc.), for much of the reasoning explained in this blog post.
  • For user authentication, we're using the built-in Rancher features.
  • Backups are done automatically using CronJob and Restic, driven by our own scripts.
  • Monitoring is done using Prometheus and Grafana, using the built-in features in Rancher.
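The combination of pinned pods, hostPath storage, and CronJob-based Restic backups described above could look roughly like the following sketch. It is not our actual backup script: the node name, host directory, Secret, repository URL, and the use of the restic/restic image (assumed here to have the restic binary as its entrypoint) are all assumptions for illustration, and the pod mounts a hostPath directory directly instead of going through the host-path-provisioner.

```yaml
apiVersion: batch/v1            # batch/v1beta1 on clusters older than Kubernetes 1.21
kind: CronJob
metadata:
  name: nightly-backup          # hypothetical name
spec:
  schedule: "0 3 * * *"         # every day at 03:00
  concurrencyPolicy: Forbid     # do not start a new run while one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          nodeSelector:
            kubernetes.io/hostname: worker-1   # hypothetical node; pins the pod to where the data lives
          containers:
            - name: restic
              image: restic/restic             # assumption: entrypoint is the restic binary
              args: ["backup", "/data"]
              env:
                - name: RESTIC_REPOSITORY
                  value: "rest:https://backup.example.com/my-app"   # placeholder repository
                - name: RESTIC_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: restic-credentials   # hypothetical Secret
                      key: password
              volumeMounts:
                - name: data
                  mountPath: /data
                  readOnly: true
          volumes:
            - name: data
              hostPath:
                path: /var/lib/k8s-volumes/my-app   # hypothetical directory on the node
                type: Directory
```

Because the pod is pinned with a nodeSelector, the hostPath directory is always the one on that specific node. This is what makes hostPath volumes workable in a setup without high availability, and also why they stop being an option once pods may move between nodes.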

In the mid term, we plan the following infrastructure additions:

  • Add a third server. Once this is done, we can in principle tolerate one failure.
  • Set up MySQL (Galera) Cluster and Postgres Replication, to have a HA DB cluster.
  • For resilient storage, the following options exist (this blog post structures it nicely):
    • (NEW) Longhorn - we're currently testing it, and it works well so far. However, it only supports ReadWriteOnce.
    • GlusterFS - seems to offer high performance and HA, and supports ReadWriteMany (which would be the main benefit of GlusterFS).
    • OpenEBS - looks interesting, but it only provides block storage; thus it does not seem to support ReadWriteMany mounts (which we need).
    • Rook / CephFS - seems to be much harder to deploy (compared to GlusterFS).
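The access modes compared above are requested per volume claim. ReadWriteOnce means the volume can be mounted read-write by a single node, while ReadWriteMany allows read-write mounts from many nodes at once. A minimal sketch of a claim asking for ReadWriteMany follows; the claim name and the storage class are hypothetical, and the claim only binds if the storage backend behind that class actually supports the mode.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads               # hypothetical name
spec:
  accessModes:
    - ReadWriteMany                  # several nodes may mount this volume read-write
  storageClassName: shared-storage   # hypothetical class backed by e.g. a shared filesystem
  resources:
    requests:
      storage: 5Gi
```

With a backend that only supports ReadWriteOnce, the same claim would have to use that mode instead, and all pods writing to the volume would need to run on the same node.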

(NEW) Important links / good articles