
Kubernetes Operators — Automate Complex Stateful Applications

· 6 min read
Goel Academy
DevOps & Cloud Learning Hub

Deploying a stateless web server to Kubernetes is straightforward — a Deployment, a Service, done. But try deploying a PostgreSQL cluster with streaming replication, automatic failover, point-in-time recovery, and backup schedules using just Deployments and StatefulSets. You end up with a mountain of init scripts, sidecar containers, and CronJobs that break every time you upgrade. Operators solve this by encoding all that operational knowledge into software that runs inside the cluster.

What Is an Operator?

An Operator is a Kubernetes controller that uses Custom Resource Definitions (CRDs) to manage applications the way a human operator would. Instead of you writing runbooks for "how to scale PostgreSQL" or "how to upgrade Elasticsearch," the Operator encodes those procedures as code and executes them automatically.

The pattern:

  1. Define a CRD — A new Kubernetes resource type (e.g., PostgresCluster)
  2. Build a controller — Software that watches instances of your CRD
  3. Reconciliation loop — The controller continuously compares the desired state (your CRD spec) with the actual state (running pods, PVCs, configs) and takes action to converge
User creates:              Operator creates and manages:

┌─────────────────┐        ┌──────────────────────────┐
│ PostgresCluster │ ────►  │ StatefulSet (primary)    │
│ replicas: 3     │        │ StatefulSet (replicas)   │
│ version: 15     │        │ Service (read-write)     │
│ backups: daily  │        │ Service (read-only)      │
└─────────────────┘        │ CronJob (backups)        │
                           │ ConfigMap (pg_hba, conf) │
                           │ Secret (credentials)     │
                           └──────────────────────────┘
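The reconciliation loop at the heart of this pattern can be sketched in plain Go. This is a hypothetical toy, with a slice of pod names standing in for the live cluster state; real controllers do the same compare-and-converge against the Kubernetes API:

```go
package main

import "fmt"

// Spec is the desired state a user declares (a stand-in for a CRD spec).
type Spec struct {
	Replicas int
}

// Cluster holds the actual state the controller observes (a stand-in for
// the live objects in the cluster).
type Cluster struct {
	Pods []string
}

// reconcile compares desired state with actual state and converges them:
// it creates pods while there are too few and deletes pods while there
// are too many.
func reconcile(spec Spec, c *Cluster) {
	for len(c.Pods) < spec.Replicas {
		name := fmt.Sprintf("pod-%d", len(c.Pods))
		fmt.Println("creating", name)
		c.Pods = append(c.Pods, name)
	}
	for len(c.Pods) > spec.Replicas {
		fmt.Println("deleting", c.Pods[len(c.Pods)-1])
		c.Pods = c.Pods[:len(c.Pods)-1]
	}
}

func main() {
	c := &Cluster{}
	reconcile(Spec{Replicas: 3}, c) // initial create: converge from 0 to 3
	fmt.Println(len(c.Pods), "pods")
	reconcile(Spec{Replicas: 1}, c) // user edits the spec: converge down
	fmt.Println(len(c.Pods), "pods")
}
```

The key property is idempotence: running `reconcile` again when desired and actual state already match does nothing, which is why real controllers can safely re-run it on every event.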

Custom Resource Definitions (CRDs)

CRDs extend the Kubernetes API with your own resource types. Once a CRD is installed, you can kubectl get, kubectl describe, and kubectl apply your custom resources just like built-in ones.

# Example: Defining a custom resource for a PostgreSQL cluster
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.database.example.com
spec:
  group: database.example.com
  names:
    kind: PostgresCluster
    listKind: PostgresClusterList
    plural: postgresclusters
    singular: postgrescluster
    shortNames:
      - pg
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                version:
                  type: integer
                replicas:
                  type: integer
                storage:
                  type: string

Once the CRD is applied, you can create instances:

apiVersion: database.example.com/v1
kind: PostgresCluster
metadata:
  name: orders-db
  namespace: production
spec:
  version: 15
  replicas: 3
  storage: 100Gi

Thanks to the shortNames entry in the CRD, the resource is queryable as pg:

kubectl get pg -n production
# NAME        VERSION   REPLICAS   STATUS
# orders-db   15        3         Running

The Operator Ecosystem

You rarely need to build operators from scratch. The ecosystem already has production-grade operators for most stateful workloads:

| Operator | Manages | Key Features |
| --- | --- | --- |
| Prometheus Operator | Prometheus, AlertManager, Grafana | ServiceMonitor CRDs, auto-scrape discovery |
| CloudNativePG | PostgreSQL | Streaming replication, failover, PITR backups |
| Zalando Postgres Operator | PostgreSQL | Patroni-based HA, connection pooling |
| MySQL Operator (Oracle) | MySQL InnoDB Cluster | Group replication, MySQL Router |
| Redis Operator (Spotahome) | Redis Sentinel / Cluster | Failover, persistence, scaling |
| Strimzi | Apache Kafka | Kafka clusters, topics, users as CRDs |
| Elasticsearch Operator (ECK) | Elasticsearch, Kibana | Rolling upgrades, snapshot/restore |
| Cert-Manager | TLS certificates | Auto-provision from Let's Encrypt, Vault |
| Rook | Ceph storage | Block, file, object storage on K8s |
| Istio Operator | Istio service mesh | Mesh installation and configuration |
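As a taste of how compact these APIs are, a minimal CloudNativePG cluster looks roughly like this (field names per the CloudNativePG v1 API as I understand it; verify against the docs for the version you install):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: orders-db
spec:
  instances: 3
  storage:
    size: 100Gi
```

Those few lines replace the whole right-hand column of the diagram earlier: primary and replica instances, services, credentials, and configuration are all created and managed by the operator.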

Example: Cert-Manager in Action

Cert-Manager is one of the most widely used operators. It automates TLS certificate management:

# Install Cert-Manager
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true

# Create a ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx

---
# Request a certificate — Cert-Manager handles the rest
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-tls
  namespace: production
spec:
  secretName: myapp-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - myapp.example.com
    - api.myapp.example.com

Cert-Manager will automatically provision the certificate from Let's Encrypt, store it in a Kubernetes Secret, and renew it before expiry. No cron jobs, no manual renewal.

Building Operators with the Operator SDK

When no existing operator fits your use case, the Operator SDK helps you build one. It supports three approaches:

| Approach | Language | Complexity | Best For |
| --- | --- | --- | --- |
| Helm-based | Helm charts | Low | Wrapping existing Helm charts with a CRD interface |
| Ansible-based | Ansible playbooks | Medium | Teams with Ansible experience |
| Go-based | Go | High | Full control, complex reconciliation logic |

# Initialize a Go-based operator project
operator-sdk init --domain example.com --repo github.com/myorg/my-operator

# Create an API (CRD + controller)
operator-sdk create api --group app --version v1 --kind MyApp --resource --controller

The generated controller has a Reconcile function — this is the core logic:

func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)

	// Fetch the MyApp instance
	myApp := &appv1.MyApp{}
	if err := r.Get(ctx, req.NamespacedName, myApp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Define the desired Deployment
	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      myApp.Name,
			Namespace: myApp.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &myApp.Spec.Replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": myApp.Name},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{"app": myApp.Name},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "app",
						Image: myApp.Spec.Image,
					}},
				},
			},
		},
	}

	// Set MyApp as the owner so the Deployment is garbage-collected with it
	if err := ctrl.SetControllerReference(myApp, dep, r.Scheme); err != nil {
		return ctrl.Result{}, err
	}

	// Create the Deployment if it does not exist yet
	found := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: dep.Name, Namespace: dep.Namespace}, found)
	if err != nil && errors.IsNotFound(err) {
		log.Info("Creating Deployment", "name", dep.Name)
		return ctrl.Result{}, r.Create(ctx, dep)
	} else if err != nil {
		return ctrl.Result{}, err
	}

	// Update if the spec changed
	found.Spec = dep.Spec
	return ctrl.Result{}, r.Update(ctx, found)
}
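What SetControllerReference buys you can be shown with a toy model of the garbage collector. This is a hypothetical, plain-Go simplification; the real collector walks the ownerReferences recorded on every object and deletes orphans whose controller owner is gone:

```go
package main

import "fmt"

// object is a bare-bones stand-in for a Kubernetes object that carries
// an owner reference ("" means it has no owner).
type object struct {
	name  string
	owner string
}

// garbageCollect deletes every object whose owner no longer exists,
// repeating until nothing changes (owners can themselves be owned,
// so deletions cascade: CR -> Deployment -> Pod).
func garbageCollect(objs map[string]*object) {
	for changed := true; changed; {
		changed = false
		for name, o := range objs {
			if o.owner != "" {
				if _, ok := objs[o.owner]; !ok {
					fmt.Println("garbage-collecting", name)
					delete(objs, name)
					changed = true
				}
			}
		}
	}
}

func main() {
	objs := map[string]*object{
		"myapp":            {name: "myapp"},
		"myapp-deployment": {name: "myapp-deployment", owner: "myapp"},
		"myapp-pod":        {name: "myapp-pod", owner: "myapp-deployment"},
	}
	delete(objs, "myapp") // user deletes the MyApp custom resource
	garbageCollect(objs)
	fmt.Println(len(objs), "objects remain")
}
```

Because the controller set itself as the owner, deleting the MyApp resource cascades through the Deployment and its Pods with no extra cleanup code in Reconcile.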

Operator Maturity Levels

The Operator Framework defines five maturity levels:

| Level | Capability | Example |
| --- | --- | --- |
| Level 1 | Basic install | Helm-based operator that deploys the app |
| Level 2 | Seamless upgrades | Handles version upgrades without downtime |
| Level 3 | Full lifecycle | Backup, restore, failure recovery |
| Level 4 | Deep insights | Metrics, alerts, log aggregation built in |
| Level 5 | Auto pilot | Auto-scaling, auto-tuning, self-healing |

Most production operators aim for Level 3-4. True Level 5 operators are rare and typically managed by the application vendor.

OperatorHub — Discover and Install

OperatorHub.io is the public registry for Kubernetes operators. If you use OLM (Operator Lifecycle Manager), you can install operators directly:

# Install OLM
operator-sdk olm install

# Browse and install from OperatorHub
kubectl get catalogsources -n olm
kubectl get packagemanifests | grep postgres

When to Use Operators vs Helm Charts

| Scenario | Use Helm | Use an Operator |
| --- | --- | --- |
| Stateless app deployment | Yes | Overkill |
| One-time install with config | Yes | Unnecessary |
| Day-2 operations (backup, failover) | Difficult | Built for this |
| Application-specific scaling logic | Cannot do | Yes |
| Self-healing beyond pod restarts | Cannot do | Yes |
| Managing CRDs for end users | No | Yes |

The rule of thumb: if your Helm chart's templates/ directory is full of CronJobs, init containers, and sidecar scripts trying to handle operational tasks — you need an operator.

Wrapping Up

Operators bring the power of automation to the hardest problems in Kubernetes — managing stateful, complex applications that need more than just "keep N pods running." Whether you use a community operator for PostgreSQL or build your own with the Operator SDK, the pattern is the same: encode operational knowledge as code, define your desired state with a CRD, and let the controller handle the rest.

Operators manage individual applications, but what about the communication between them? In the next post, we will compare service meshes — Istio, Linkerd, and Cilium — and decide when you actually need one.