Kubernetes Operators — Automate Complex Stateful Applications
Deploying a stateless web server to Kubernetes is straightforward — a Deployment, a Service, done. But try deploying a PostgreSQL cluster with streaming replication, automatic failover, point-in-time recovery, and backup schedules using just Deployments and StatefulSets. You end up with a mountain of init scripts, sidecar containers, and CronJobs that break every time you upgrade. Operators solve this by encoding all that operational knowledge into software that runs inside the cluster.
What Is an Operator?
An Operator is a Kubernetes controller that uses Custom Resource Definitions (CRDs) to manage applications the way a human operator would. Instead of you writing runbooks for "how to scale PostgreSQL" or "how to upgrade Elasticsearch," the Operator encodes those procedures as code and executes them automatically.
The pattern:
- Define a CRD: a new Kubernetes resource type (e.g., PostgresCluster)
- Build a controller: software that watches instances of your CRD
- Reconciliation loop: the controller continuously compares the desired state (your CRD spec) with the actual state (running pods, PVCs, configs) and takes action to converge
User creates:               Operator creates and manages:
┌─────────────────┐         ┌──────────────────────────┐
│ PostgresCluster │  ────►  │ StatefulSet (primary)    │
│ replicas: 3     │         │ StatefulSet (replicas)   │
│ version: 15     │         │ Service (read-write)     │
│ backups: daily  │         │ Service (read-only)      │
└─────────────────┘         │ CronJob (backups)        │
                            │ ConfigMap (pg_hba, conf) │
                            │ Secret (credentials)     │
                            └──────────────────────────┘
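The reconciliation loop can be sketched in a few lines of plain Go. This is a minimal illustration, not how any real operator is implemented: the `ClusterState` type and the action strings are hypothetical, and a real controller would issue Kubernetes API calls instead of returning text.

```go
package main

import "fmt"

// ClusterState is a hypothetical, simplified view of a PostgresCluster.
// It carries only the fields needed to show the comparison.
type ClusterState struct {
	Replicas int
	Version  int
}

// reconcile compares desired and actual state and returns the actions
// needed to converge. It is idempotent: when the two states already
// match, it returns no actions.
func reconcile(desired, actual ClusterState) []string {
	var actions []string
	if actual.Replicas < desired.Replicas {
		actions = append(actions, fmt.Sprintf("add %d replica(s)", desired.Replicas-actual.Replicas))
	} else if actual.Replicas > desired.Replicas {
		actions = append(actions, fmt.Sprintf("remove %d replica(s)", actual.Replicas-desired.Replicas))
	}
	if actual.Version != desired.Version {
		actions = append(actions, fmt.Sprintf("upgrade from version %d to %d", actual.Version, desired.Version))
	}
	return actions
}

func main() {
	desired := ClusterState{Replicas: 3, Version: 15}
	actual := ClusterState{Replicas: 1, Version: 14}

	for _, a := range reconcile(desired, actual) {
		fmt.Println("action:", a)
	}
	// Once actual matches desired, another pass has nothing left to do.
	fmt.Println("pending after convergence:", len(reconcile(desired, desired)))
}
```

The important property is that `reconcile` looks only at the current difference, never at what changed last. Running it repeatedly is therefore safe, which is what lets a controller recover after a crash or a missed event.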
Custom Resource Definitions (CRDs)
CRDs extend the Kubernetes API with your own resource types. Once a CRD is installed, you can kubectl get, kubectl describe, and kubectl apply your custom resources just like built-in ones.
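# Example: Defining a custom resource for a PostgreSQL cluster
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.database.example.com
spec:
  group: database.example.com
  names:
    kind: PostgresCluster
    listKind: PostgresClusterList
    plural: postgresclusters
    singular: postgrescluster
    shortNames:
      - pg
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                version:
                  type: integer
                replicas:
                  type: integer
                storage:
                  type: string
            status:
              type: object
              properties:
                phase:
                  type: string
      subresources:
        status: {}
      # Extra columns for `kubectl get`; status.phase is assumed to be set by the operator
      additionalPrinterColumns:
        - name: Version
          type: integer
          jsonPath: .spec.version
        - name: Replicas
          type: integer
          jsonPath: .spec.replicas
        - name: Status
          type: string
          jsonPath: .status.phase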
Once the CRD is applied, you can create instances:
apiVersion: database.example.com/v1
kind: PostgresCluster
metadata:
  name: orders-db
  namespace: production
spec:
  version: 15
  replicas: 3
  storage: 100Gi
kubectl get pg -n production
# NAME        VERSION   REPLICAS   STATUS
# orders-db   15        3          Running
Popular Kubernetes Operators
You rarely need to build operators from scratch. The ecosystem already has production-grade operators for most stateful workloads:
| Operator | Manages | Key Features |
|---|---|---|
| Prometheus Operator | Prometheus, Alertmanager | ServiceMonitor CRDs, auto-scrape discovery |
| CloudNativePG | PostgreSQL | Streaming replication, failover, PITR backups |
| Zalando Postgres Operator | PostgreSQL | Patroni-based HA, connection pooling |
| MySQL Operator (Oracle) | MySQL InnoDB Cluster | Group replication, MySQL Router |
| Redis Operator (Spotahome) | Redis with Sentinel | Failover, persistence, scaling |
| Strimzi | Apache Kafka | Kafka clusters, topics, users as CRDs |
| Elastic Cloud on Kubernetes (ECK) | Elasticsearch, Kibana | Rolling upgrades, snapshot/restore |
| Cert-Manager | TLS certificates | Auto-provision from Let's Encrypt, Vault |
| Rook | Ceph storage | Block, file, object storage on K8s |
| Istio Operator | Istio service mesh | Mesh installation and configuration (deprecated upstream in favor of istioctl and Helm) |
Example: Cert-Manager in Action
Cert-Manager is one of the most widely used operators. It automates TLS certificate management:
# Install Cert-Manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
# crds.enabled needs chart v1.15 or newer; older charts use installCRDs=true
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true
# Create a ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
---
# Request a certificate; Cert-Manager handles the rest
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-tls
  namespace: production
spec:
  secretName: myapp-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - myapp.example.com
    - api.myapp.example.com
Cert-Manager will automatically provision the certificate from Let's Encrypt, store it in a Kubernetes Secret, and renew it before expiry. No cron jobs, no manual renewal.
Building Operators with the Operator SDK
When no existing operator fits your use case, the Operator SDK helps you build one. It supports three approaches:
| Approach | Language | Complexity | Best For |
|---|---|---|---|
| Helm-based | Helm charts | Low | Wrapping existing Helm charts with CRD interface |
| Ansible-based | Ansible playbooks | Medium | Teams with Ansible experience |
| Go-based | Go | High | Full control, complex reconciliation logic |
# Initialize a Go-based operator project
operator-sdk init --domain example.com --repo github.com/myorg/my-operator
# Create an API (CRD + controller)
operator-sdk create api --group app --version v1 --kind MyApp --resource --controller
The generated controller has an empty Reconcile function, which is where the core logic goes. A filled-in version that manages a Deployment looks like this:
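import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	appv1 "github.com/myorg/my-operator/api/v1"
)

// Assumes MyAppSpec declares `Replicas int32` and `Image string`.
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Fetch the MyApp instance; if it no longer exists there is nothing to do
	myApp := &appv1.MyApp{}
	if err := r.Get(ctx, req.NamespacedName, myApp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Define the desired Deployment
	labels := map[string]string{"app": myApp.Name}
	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      myApp.Name,
			Namespace: myApp.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &myApp.Spec.Replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "app",
						Image: myApp.Spec.Image,
					}},
				},
			},
		},
	}

	// Set MyApp as the owner so the Deployment is garbage-collected with it
	if err := ctrl.SetControllerReference(myApp, dep, r.Scheme); err != nil {
		return ctrl.Result{}, err
	}

	// Create the Deployment if it does not exist yet
	found := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: dep.Name, Namespace: dep.Namespace}, found)
	if apierrors.IsNotFound(err) {
		logger.Info("Creating Deployment", "name", dep.Name)
		return ctrl.Result{}, r.Create(ctx, dep)
	}
	if err != nil {
		return ctrl.Result{}, err
	}

	// Update only if the fields set above differ from the live object
	// (DeepDerivative ignores fields the desired spec leaves unset)
	if equality.Semantic.DeepDerivative(dep.Spec, found.Spec) {
		return ctrl.Result{}, nil
	}
	found.Spec = dep.Spec
	return ctrl.Result{}, r.Update(ctx, found)
}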
Operator Maturity Levels
The Operator Framework defines five maturity levels:
| Level | Capability | Example |
|---|---|---|
| Level 1 | Basic install | Helm-based operator that deploys the app |
| Level 2 | Seamless upgrades | Handles version upgrades without downtime |
| Level 3 | Full lifecycle | Backup, restore, failure recovery |
| Level 4 | Deep insights | Metrics, alerts, log aggregation built in |
| Level 5 | Auto pilot | Auto-scaling, auto-tuning, self-healing |
Most production operators aim for Level 3 or 4. True Level 5 operators are rare and are typically maintained by the application vendor.
OperatorHub — Discover and Install
OperatorHub.io is the public registry for Kubernetes operators. If you use OLM (Operator Lifecycle Manager), you can install operators directly:
# Install OLM
operator-sdk olm install
# Browse the OperatorHub catalog; to install a package, create an OLM Subscription for it
kubectl get catalogsources -n olm
kubectl get packagemanifests | grep postgres
When to Use Operators vs Helm Charts
| Scenario | Use Helm | Use an Operator |
|---|---|---|
| Stateless app deployment | Yes | Overkill |
| One-time install with config | Yes | Unnecessary |
| Day-2 operations (backup, failover) | Difficult | Built for this |
| Application-specific scaling logic | Cannot do | Yes |
| Self-healing beyond pod restarts | Cannot do | Yes |
| Managing CRDs for end users | No | Yes |
The rule of thumb: if your Helm chart's templates/ directory is full of CronJobs, init containers, and sidecar scripts trying to handle operational tasks — you need an operator.
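As an illustration of the application-specific logic a Helm chart cannot express, the sketch below picks which replica to promote after a primary fails. It is a deliberately simplified assumption of how such a decision might look: the `Replica` type, its fields, and the "least lag wins" rule are hypothetical, and real PostgreSQL operators weigh more factors.

```go
package main

import "fmt"

// Replica is a hypothetical health snapshot an operator might collect.
type Replica struct {
	Name     string
	Healthy  bool
	LagBytes int64 // how far this replica trails the failed primary
}

// pickFailoverCandidate returns the healthy replica with the least
// replication lag. ok is false when no replica is healthy.
func pickFailoverCandidate(replicas []Replica) (best Replica, ok bool) {
	for _, r := range replicas {
		if !r.Healthy {
			continue
		}
		if !ok || r.LagBytes < best.LagBytes {
			best, ok = r, true
		}
	}
	return best, ok
}

func main() {
	replicas := []Replica{
		{Name: "orders-db-1", Healthy: true, LagBytes: 4096},
		{Name: "orders-db-2", Healthy: false, LagBytes: 0},
		{Name: "orders-db-3", Healthy: true, LagBytes: 512},
	}

	if c, ok := pickFailoverCandidate(replicas); ok {
		fmt.Println("promote:", c.Name) // promote: orders-db-3
	} else {
		fmt.Println("no healthy replica; manual intervention needed")
	}
}
```

A pod restart would bring the old primary back eventually, but it cannot decide which replica has the most data or repoint the read-write Service. That kind of decision is what belongs in a controller.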
Wrapping Up
Operators bring the power of automation to the hardest problems in Kubernetes — managing stateful, complex applications that need more than just "keep N pods running." Whether you use a community operator for PostgreSQL or build your own with the Operator SDK, the pattern is the same: encode operational knowledge as code, define your desired state with a CRD, and let the controller handle the rest.
Operators manage individual applications, but what about the communication between them? In the next post, we will compare service meshes — Istio, Linkerd, and Cilium — and decide when you actually need one.
