AKS — Running Kubernetes on Azure Like a Pro

7 min read
Goel Academy
DevOps & Cloud Learning Hub

You have learned Kubernetes concepts — pods, deployments, services. You have played with Minikube locally. Now you need to run a real production cluster, and the thought of managing etcd backups, control plane upgrades, and certificate rotations makes you want to reconsider your career choices. That is exactly why AKS exists. Azure manages the control plane for free. You manage your workloads.

AKS Architecture

AKS splits Kubernetes into two parts:

  • Control Plane (managed by Azure): API server, etcd, scheduler, controller manager. You never see the VMs running these components. Azure patches them, backs them up, and scales them. On the Free tier, the control plane costs nothing; the Standard tier adds a financially backed uptime SLA for a per-cluster fee.
  • Data Plane (managed by you): Worker nodes that run your pods. These are Azure VMs you pay for.

This means you focus on deploying applications, not babysitting the cluster infrastructure. Upgrades to the control plane take a few CLI commands. Node pool upgrades can be done with zero downtime using surge upgrades.
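As a sketch of what those upgrades look like in practice (the resource group and cluster names match the examples below; the target version placeholder is yours to fill in):

```shell
# List Kubernetes versions the cluster can upgrade to
az aks get-upgrades \
--resource-group rg-aks-prod \
--name aks-prod-cluster \
--output table

# Upgrade the control plane first; node pools can follow separately
az aks upgrade \
--resource-group rg-aks-prod \
--name aks-prod-cluster \
--kubernetes-version <target-version> \
--control-plane-only

# Surge upgrades: bring up 33% extra nodes while old ones drain
az aks nodepool update \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--name nodepool1 \
--max-surge 33%
```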

Creating an AKS Cluster

# Create a resource group
az group create \
--name rg-aks-prod \
--location eastus

# Create an AKS cluster
az aks create \
--resource-group rg-aks-prod \
--name aks-prod-cluster \
--node-count 3 \
--node-vm-size Standard_D4s_v5 \
--network-plugin azure \
--network-policy calico \
--generate-ssh-keys \
--enable-managed-identity \
--enable-addons monitoring \
--kubernetes-version 1.29.4 \
--zones 1 2 3 \
--tags Environment=Production Team=Platform

# Get credentials to interact with the cluster
az aks get-credentials \
--resource-group rg-aks-prod \
--name aks-prod-cluster

# Verify connectivity
kubectl get nodes -o wide

Key flags explained:

  • --network-plugin azure — Use Azure CNI networking (pod IPs from VNet)
  • --network-policy calico — Enable Calico network policies for pod-to-pod traffic control
  • --zones 1 2 3 — Spread nodes across three availability zones for high availability
  • --enable-managed-identity — Use managed identity instead of service principals

Node Pools — System vs User

AKS supports multiple node pools. The system node pool runs Kubernetes system pods (CoreDNS, metrics-server). User node pools run your application workloads.

# Add a user node pool for application workloads
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--name apppool \
--node-count 5 \
--node-vm-size Standard_D8s_v5 \
--mode User \
--zones 1 2 3 \
--labels workload=app tier=frontend \
--max-pods 50

# Add a spot VM node pool for batch processing (up to 90% cheaper)
az aks nodepool add \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--name spotpool \
--node-count 3 \
--node-vm-size Standard_D8s_v5 \
--mode User \
--priority Spot \
--spot-max-price -1 \
--eviction-policy Delete \
--labels workload=batch kubernetes.azure.com/scalesetpriority=spot \
--node-taints kubernetes.azure.com/scalesetpriority=spot:NoSchedule

# List all node pools
az aks nodepool list \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--output table

Spot VMs can be evicted by Azure with 30 seconds' notice when Azure needs the capacity back. Use them for batch jobs, CI runners, and workloads that can handle interruption — never for stateful production services.
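Note that pods will not land on the spot pool unless they tolerate its taint. A minimal scheduling stanza for a batch workload might look like this (the surrounding Deployment is assumed; only the pod spec fragment is shown):

```yaml
# Pod spec fragment: tolerate the spot taint and pin to spot nodes
spec:
  nodeSelector:
    kubernetes.azure.com/scalesetpriority: spot
  tolerations:
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
```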

Azure CNI vs Kubenet Networking

Feature           | Azure CNI                    | Kubenet
Pod IPs           | From VNet subnet (routable)  | Private IPs behind NAT
Performance       | Better (no overlay)          | Slightly slower (NAT hop)
IP consumption    | High (1 IP per pod)          | Low (only node IPs from VNet)
Network policies  | Azure + Calico               | Calico only
Best for          | Production, VNet integration | Dev/test, IP-constrained networks

Azure CNI is the recommended choice for production. Every pod gets a real VNet IP address, which means pods can communicate directly with VNet resources (databases, VMs, storage) without NAT.
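The trade-off is IP planning: the subnet must hold roughly one address per pod plus one per node. A quick back-of-the-envelope check, using the autoscaler max of 20 nodes and 50 max pods per node from the examples in this post:

```shell
# Rough Azure CNI IP requirement: nodes * (max pods per node + 1)
nodes=20
max_pods=50
echo $(( nodes * (max_pods + 1) ))
# 1020 addresses, so a /22 subnet is already tight; size up front,
# because resizing an AKS subnet later is painful
```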

Ingress Controller

An ingress controller routes external HTTP traffic to your Kubernetes services. The two common choices on AKS are NGINX and Application Gateway Ingress Controller (AGIC).

# Install NGINX Ingress Controller using Helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress \
--create-namespace \
--set controller.replicaCount=2 \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz \
--set controller.nodeSelector."kubernetes\.io/os"=linux

Then create an Ingress resource:

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.com
      secretName: tls-secret
  rules:
    - host: myapp.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80
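Apply the manifest and watch for the ADDRESS column to fill in (it may stay empty for a minute or two while Azure provisions the load balancer's public IP):

```shell
kubectl apply -f ingress.yaml

# ADDRESS shows the public IP of the NGINX controller's service
kubectl get ingress app-ingress
```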

Azure Container Registry Integration

ACR stores your container images privately. Attach it to AKS so nodes can pull images without storing credentials:

# Create an Azure Container Registry
az acr create \
--resource-group rg-aks-prod \
--name acrprod2025 \
--sku Standard

# Attach ACR to AKS (grants AcrPull role to cluster identity)
az aks update \
--resource-group rg-aks-prod \
--name aks-prod-cluster \
--attach-acr acrprod2025

# Build and push an image to ACR
az acr build \
--registry acrprod2025 \
--image myapp:v1.0 \
--file Dockerfile .
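Once attached, pods reference images by the registry's full login server name, with no imagePullSecrets required. A quick sketch (the deployment name is illustrative):

```shell
# Nodes pull from ACR using the cluster's managed identity
kubectl create deployment myapp \
--image acrprod2025.azurecr.io/myapp:v1.0 \
--replicas 2
```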

Cluster Autoscaler

The cluster autoscaler automatically adds or removes nodes based on pending pods and resource utilization:

# Enable autoscaler on a node pool
az aks nodepool update \
--resource-group rg-aks-prod \
--cluster-name aks-prod-cluster \
--name apppool \
--enable-cluster-autoscaler \
--min-count 3 \
--max-count 20

# Check autoscaler status
kubectl describe configmap cluster-autoscaler-status -n kube-system

The cluster autoscaler works hand in hand with the Horizontal Pod Autoscaler (HPA). HPA adds or removes pod replicas based on metrics; when pending replicas no longer fit on existing nodes, the cluster autoscaler adds nodes. When demand drops, it drains and removes underutilized nodes.
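A minimal HPA manifest to pair with the node-pool autoscaler above (the target Deployment name and thresholds here are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```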

Azure Key Vault CSI Driver

Store secrets in Azure Key Vault and mount them directly into pods — no Kubernetes Secrets needed:

# Enable the Key Vault CSI driver addon
az aks enable-addons \
--resource-group rg-aks-prod \
--name aks-prod-cluster \
--addons azure-keyvault-secrets-provider

# Verify the addon is running
kubectl get pods -n kube-system -l app=secrets-store-csi-driver

Then create a SecretProviderClass:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-secrets
spec:
  provider: azure
  parameters:
    keyvaultName: "kv-prod-2025"
    tenantId: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
        - |
          objectName: api-key
          objectType: secret
    usePodIdentity: "false"
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "<managed-identity-client-id>"
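To actually deliver the secrets, the pod mounts a CSI volume that references the SecretProviderClass; each Key Vault object appears as a file under the mount path. A sketch of the pod spec fragment (container name and image are illustrative):

```yaml
# Pod spec fragment: mount Key Vault secrets as read-only files
spec:
  containers:
    - name: myapp
      image: acrprod2025.azurecr.io/myapp:v1.0
      volumeMounts:
        - name: kv-secrets
          mountPath: /mnt/secrets
          readOnly: true
  volumes:
    - name: kv-secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: azure-kv-secrets
```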

AKS Monitoring with Azure Monitor

When you create a cluster with --enable-addons monitoring, AKS deploys the Azure Monitor agent to every node. This collects container logs, performance metrics, and Kubernetes events.

# Query container logs via CLI
az monitor log-analytics query \
--workspace <workspace-id> \
--analytics-query "ContainerLogV2 | where ContainerName == 'myapp' | where LogMessage contains 'error' | top 50 by TimeGenerated" \
--output table

Key tables in Log Analytics for AKS:

  • ContainerLogV2 — stdout/stderr from containers
  • KubeEvents — Kubernetes events (pod scheduling, failures)
  • KubePodInventory — Pod status across nodes
  • Perf — Node-level CPU, memory, disk, network
  • InsightsMetrics — Container-level resource usage
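For example, KubeEvents is the quickest way to spot scheduling failures and image pull errors; a query in the same style as the one above (column names follow the Container insights schema):

```shell
# Recent Warning events: failed scheduling, image pulls, probes
az monitor log-analytics query \
--workspace <workspace-id> \
--analytics-query "KubeEvents | where KubeEventType == 'Warning' | project TimeGenerated, Name, Reason, Message | top 20 by TimeGenerated" \
--output table
```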

Cost Optimization

Strategy                                           | Savings
Use Spot VMs for non-critical workloads            | Up to 90%
Right-size node VM SKUs                            | 20-40%
Enable cluster autoscaler with a conservative max  | Avoids overprovisioning
Use Azure Reservations for baseline nodes          | 30-60%
Schedule dev clusters to shut down at night        | ~60%
Use the Free tier for non-production clusters      | Control plane at no cost

# Stop a dev cluster outside business hours
az aks stop \
--resource-group rg-aks-dev \
--name aks-dev-cluster

# Start it back in the morning
az aks start \
--resource-group rg-aks-dev \
--name aks-dev-cluster

Wrapping Up

AKS takes the undifferentiated heavy lifting out of Kubernetes — free control plane, managed upgrades, integrated monitoring, and native Azure networking. Start with a single node pool, attach ACR for private images, and enable the cluster autoscaler from day one. Add spot node pools for batch workloads and use the Key Vault CSI driver to keep secrets out of your Kubernetes manifests. The goal is to spend your time building applications, not managing cluster infrastructure.


Next up: We will explore Azure Key Vault — centralizing secrets, certificates, and encryption keys so your applications never store credentials in code or configuration files.