AKS — Running Kubernetes on Azure Like a Pro
You have learned Kubernetes concepts — pods, deployments, services. You have played with Minikube locally. Now you need to run a real production cluster, and the thought of managing etcd backups, control plane upgrades, and certificate rotations makes you want to reconsider your career choices. That is exactly why AKS exists. Azure manages the control plane (free on the Free tier). You manage your workloads.
AKS Architecture
AKS splits Kubernetes into two parts:
- Control Plane (managed by Azure): API server, etcd, scheduler, controller manager. You never see the VMs running these components. Azure patches them, backs them up, and scales them. The control plane itself is free on the Free tier; the Standard tier adds an uptime SLA for a small per-cluster hourly fee.
- Data Plane (managed by you): Worker nodes that run your pods. These are Azure VMs you pay for.
This means you focus on deploying applications, not babysitting the cluster infrastructure. Upgrades to the control plane take a few CLI commands. Node pool upgrades can be done with zero downtime using surge upgrades.
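As a sketch of what those upgrade commands look like — the `<rg>`, `<cluster>`, and `<pool>` placeholders stand in for your own names, and the version number is illustrative:

```shell
# Check which control plane versions are available to upgrade to
az aks get-upgrades \
  --resource-group <rg> \
  --name <cluster> \
  --output table

# Upgrade the control plane only (node pools stay on their current version)
az aks upgrade \
  --resource-group <rg> \
  --name <cluster> \
  --kubernetes-version 1.29.4 \
  --control-plane-only

# Surge-upgrade a node pool: bring up extra nodes, drain and replace old ones
az aks nodepool upgrade \
  --resource-group <rg> \
  --cluster-name <cluster> \
  --name <pool> \
  --kubernetes-version 1.29.4 \
  --max-surge 33%
```

With `--max-surge 33%`, AKS adds roughly a third more nodes during the upgrade so workloads keep running while old nodes are cordoned and drained.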
Creating an AKS Cluster
# Create a resource group
az group create \
--name rg-aks-prod \
--location eastus
# Create an AKS cluster
az aks create \
--resource-group rg-aks-prod \
--name aks-prod-cluster \
--node-count 3 \
--node-vm-size Standard_D4s_v5 \
--network-plugin azure \
--network-policy calico \
--generate-ssh-keys \
--enable-managed-identity \
--enable-addons monitoring \
--kubernetes-version 1.29.4 \
--zones 1 2 3 \
--tags Environment=Production Team=Platform
# Get credentials to interact with the cluster
az aks get-credentials \
--resource-group rg-aks-prod \
--name aks-prod-cluster
# Verify connectivity
kubectl get nodes -o wide
Key flags explained:
- `--network-plugin azure` — use Azure CNI networking (pod IPs come from the VNet)
- `--network-policy calico` — enable Calico network policies for pod-to-pod traffic control
- `--zones 1 2 3` — spread nodes across three availability zones for high availability
- `--enable-managed-identity` — use a managed identity instead of a service principal
Node Pools — System vs User
AKS supports multiple node pools. Every cluster has at least one system node pool, which runs Kubernetes system pods (CoreDNS, metrics-server). User node pools run your application workloads.
# Add a user node pool for application workloads
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--name apppool \
--node-count 5 \
--node-vm-size Standard_D8s_v5 \
--mode User \
--zones 1 2 3 \
--labels workload=app tier=frontend \
--max-pods 50
# Add a spot VM node pool for batch processing (up to 90% cheaper)
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--name spotpool \
--node-count 3 \
--node-vm-size Standard_D8s_v5 \
--mode User \
--priority Spot \
--spot-max-price -1 \
--eviction-policy Delete \
--labels workload=batch

# Note: AKS automatically applies the kubernetes.azure.com/scalesetpriority=spot
# label and the matching NoSchedule taint to spot pools — the kubernetes.azure.com
# prefix is reserved, so don't set them yourself.
# List all node pools
az aks nodepool list \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--output table
Spot VMs can be evicted by Azure with 30 seconds' notice when Azure needs the capacity back. Use them for batch jobs, CI runners, and other workloads that can tolerate interruption — never for stateful production services.
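Because AKS taints spot nodes with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, batch pods need a matching toleration (and ideally a node selector) to land there. A minimal sketch — the Job name and image are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-worker          # placeholder name
spec:
  template:
    spec:
      # Only schedule onto spot nodes
      nodeSelector:
        kubernetes.azure.com/scalesetpriority: spot
      # Tolerate the taint AKS applies to spot pools
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      containers:
        - name: worker
          image: acrprod2025.azurecr.io/batch-worker:v1   # placeholder image
      restartPolicy: Never
```

Without the toleration, the scheduler will keep these pods off the spot pool entirely.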
Azure CNI vs Kubenet Networking
| Feature | Azure CNI | Kubenet |
|---|---|---|
| Pod IPs | From VNet subnet (routable) | Private IPs behind NAT |
| Performance | Better (no overlay) | Slightly slower (NAT hop) |
| IP consumption | High (1 IP per pod) | Low (only node IPs from VNet) |
| Network Policies | Azure + Calico | Calico only |
| Best For | Production, VNet integration | Dev/test, IP-constrained networks |
Azure CNI is the recommended choice for production. Every pod gets a real VNet IP address, which means pods can communicate directly with VNet resources (databases, VMs, storage) without NAT.
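If you bring your own VNet, size the subnet for pods — Azure CNI consumes one IP per pod — and pass the subnet ID at cluster creation. A hedged sketch; the VNet name and address ranges are illustrative:

```shell
# Create a VNet with a subnet large enough for nodes + pods
az network vnet create \
  --resource-group rg-aks-prod \
  --name vnet-aks \
  --address-prefix 10.0.0.0/8 \
  --subnet-name snet-aks \
  --subnet-prefix 10.240.0.0/16

# Point the cluster at that subnet
az aks create \
  --resource-group rg-aks-prod \
  --name aks-prod-cluster \
  --network-plugin azure \
  --vnet-subnet-id $(az network vnet subnet show \
      --resource-group rg-aks-prod \
      --vnet-name vnet-aks \
      --name snet-aks \
      --query id -o tsv)
```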
Ingress Controller
An ingress controller routes external HTTP traffic to your Kubernetes services. The two common choices on AKS are NGINX and Application Gateway Ingress Controller (AGIC).
# Install NGINX Ingress Controller using Helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress \
--create-namespace \
--set controller.replicaCount=2 \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz \
--set controller.nodeSelector."kubernetes\.io/os"=linux
Then create an Ingress resource:
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/use-regex: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- myapp.com
secretName: tls-secret
rules:
- host: myapp.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
- path: /
pathType: Prefix
backend:
service:
name: frontend-service
port:
number: 80
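The `tls-secret` referenced above must exist in the same namespace as the Ingress. One way to create it from an existing certificate and private key (the file names are placeholders):

```shell
kubectl create secret tls tls-secret \
  --cert=myapp.com.crt \
  --key=myapp.com.key
```

In practice, cert-manager is commonly used to issue and rotate this secret automatically.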
Azure Container Registry Integration
ACR stores your container images privately. Attach it to AKS so nodes can pull images without storing credentials:
# Create an Azure Container Registry
az acr create \
--resource-group rg-aks-prod \
--name acrprod2025 \
--sku Standard
# Attach ACR to AKS (grants AcrPull role to cluster identity)
az aks update \
--resource-group rg-aks-prod \
--name aks-prod-cluster \
--attach-acr acrprod2025
# Build and push an image to ACR
az acr build \
--registry acrprod2025 \
--image myapp:v1.0 \
--file Dockerfile .
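Once the registry is attached, pods reference images by the registry's login server — no `imagePullSecrets` required. A minimal deployment sketch using the image built above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          # <registry-name>.azurecr.io/<image>:<tag>
          image: acrprod2025.azurecr.io/myapp:v1.0
          ports:
            - containerPort: 80
```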
Cluster Autoscaler
The cluster autoscaler automatically adds or removes nodes based on pending pods and resource utilization:
# Enable autoscaler on a node pool
az aks nodepool update \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--name apppool \
--enable-cluster-autoscaler \
--min-count 3 \
--max-count 20
# Check autoscaler status
kubectl describe configmap cluster-autoscaler-status -n kube-system
The cluster autoscaler pairs with the Horizontal Pod Autoscaler (HPA): the HPA scales the number of pod replicas, and when pending replicas no longer fit on existing nodes, the cluster autoscaler adds nodes. When demand drops, it removes underutilized nodes.
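A minimal HPA targeting CPU usage, assuming a deployment named `myapp` with CPU requests set (the HPA needs requests to compute utilization):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp          # assumed deployment name
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% of requested CPU
```

If `maxReplicas` demands more capacity than the node pool holds, the pending pods trigger the cluster autoscaler, up to its own `--max-count`.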
Azure Key Vault CSI Driver
Store secrets in Azure Key Vault and mount them directly into pods — no Kubernetes Secrets needed:
# Enable the Key Vault CSI driver addon
az aks enable-addons \
--resource-group rg-aks-prod \
--name aks-prod-cluster \
--addons azure-keyvault-secrets-provider
# Verify the addon is running
kubectl get pods -n kube-system -l app=secrets-store-csi-driver
Then create a SecretProviderClass:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
name: azure-kv-secrets
spec:
provider: azure
parameters:
keyvaultName: "kv-prod-2025"
tenantId: "<tenant-id>"
objects: |
array:
- |
objectName: db-connection-string
objectType: secret
- |
objectName: api-key
objectType: secret
usePodIdentity: "false"
useVMManagedIdentity: "true"
userAssignedIdentityID: "<managed-identity-client-id>"
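A SecretProviderClass does nothing on its own — a pod has to mount it through a CSI volume, at which point each secret appears as a file under the mount path. A sketch, with a placeholder pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp            # placeholder
spec:
  containers:
    - name: myapp
      image: acrprod2025.azurecr.io/myapp:v1.0   # placeholder image
      volumeMounts:
        - name: kv-secrets
          mountPath: /mnt/secrets   # db-connection-string and api-key appear here as files
          readOnly: true
  volumes:
    - name: kv-secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: azure-kv-secrets
```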
AKS Monitoring with Azure Monitor
When you create a cluster with --enable-addons monitoring, AKS deploys the Azure Monitor agent to every node. This collects container logs, performance metrics, and Kubernetes events.
# Query container logs via CLI
az monitor log-analytics query \
--workspace <workspace-id> \
--analytics-query "ContainerLogV2 | where ContainerName == 'myapp' | where LogMessage contains 'error' | top 50 by TimeGenerated" \
--output table
Key tables in Log Analytics for AKS:
- `ContainerLogV2` — stdout/stderr from containers
- `KubeEvents` — Kubernetes events (pod scheduling, failures)
- `KubePodInventory` — pod status across nodes
- `Perf` — node-level CPU, memory, disk, network
- `InsightsMetrics` — container-level resource usage
Cost Optimization
| Strategy | Savings |
|---|---|
| Use Spot VMs for non-critical workloads | Up to 90% |
| Right-size node VM SKUs | 20-40% |
| Enable cluster autoscaler with conservative max | Avoid overprovisioning |
| Use Azure Reservations for baseline nodes | 30-60% |
| Schedule dev clusters to shut down at night | 60% |
| Use the free tier for non-production clusters | Control plane free |
# Stop a dev cluster outside business hours
az aks stop \
--resource-group rg-aks-dev \
--name aks-dev-cluster
# Start it back in the morning
az aks start \
--resource-group rg-aks-dev \
--name aks-dev-cluster
Wrapping Up
AKS takes the undifferentiated heavy lifting out of Kubernetes — free control plane, managed upgrades, integrated monitoring, and native Azure networking. Start with a single node pool, attach ACR for private images, and enable the cluster autoscaler from day one. Add spot node pools for batch workloads and use the Key Vault CSI driver to keep secrets out of your Kubernetes manifests. The goal is to spend your time building applications, not managing cluster infrastructure.
Next up: We will explore Azure Key Vault — centralizing secrets, certificates, and encryption keys so your applications never store credentials in code or configuration files.
