Multi-Cloud DevOps — Terraform, K8s, and Cross-Cloud CI/CD
When GitLab suffered its infamous database outage in 2017, companies running exclusively on the platform scrambled. When AWS us-east-1 went down for hours in December 2021, single-cloud shops lost millions. Multi-cloud is no longer a luxury; it is a strategic decision that protects your business. But doing it wrong costs more than doing nothing at all.
Why Multi-Cloud?
The case for multi-cloud goes beyond buzzwords:
Business Drivers for Multi-Cloud:

```
1. Avoid Vendor Lock-in
   └─ Negotiate from strength, not dependency
2. Best-of-Breed Services
   └─ AWS for compute, GCP for ML, Azure for enterprise integration
3. Regulatory Compliance
   └─ Data sovereignty: EU data on EU-region providers
   └─ Government contracts requiring specific clouds
4. Resilience
   └─ Survive single-provider outages
   └─ Geographic redundancy beyond one provider's regions
5. M&A Reality
   └─ Acquired company uses different cloud
   └─ Faster integration than migration
```
Multi-Cloud Challenges (Be Honest About These)
| Challenge | Impact | Mitigation |
|---|---|---|
| Complexity explosion | 3x the networking, IAM, monitoring to manage | Abstraction layers (Terraform, K8s) |
| Cost management | Billing across providers is painful | FinOps tooling (Kubecost, Infracost) |
| Skills gap | Team needs expertise in multiple clouds | Invest in cloud-agnostic tools |
| Lowest common denominator | Using only features available everywhere | Accept some provider-specific code (80/20 rule) |
| Networking complexity | Cross-cloud latency, security, DNS | Dedicated interconnects, service mesh |
| Inconsistent IAM | Different permission models per cloud | Centralized identity (Okta/Azure AD + OIDC) |
Terraform as the Unified IaC Layer
Terraform's provider model makes it the natural choice for multi-cloud infrastructure:
```hcl
# main.tf — Multi-cloud infrastructure with Terraform
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

# --- AWS: Primary compute ---
provider "aws" {
  region = "us-east-1"
  alias  = "primary"
}

resource "aws_eks_cluster" "primary" {
  provider = aws.primary
  name     = "app-primary"
  role_arn = aws_iam_role.eks.arn
  version  = "1.29"

  vpc_config {
    subnet_ids = module.aws_vpc.private_subnets
  }
}

# --- Azure: EU compliance workloads ---
provider "azurerm" {
  features {}
  subscription_id = var.azure_subscription_id
}

resource "azurerm_kubernetes_cluster" "eu" {
  name                = "app-eu"
  location            = "westeurope"
  resource_group_name = azurerm_resource_group.eu.name

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D4s_v3"
  }

  identity {
    type = "SystemAssigned"
  }
}

# --- GCP: ML workloads (TPU access) ---
provider "google" {
  project = var.gcp_project
  region  = "us-central1"
}

resource "google_container_cluster" "ml" {
  name     = "ml-cluster"
  location = "us-central1"

  node_config {
    machine_type = "n2-standard-8"
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }

  initial_node_count = 3
}
```
Multi-Cloud Module Pattern
```hcl
# modules/kubernetes-cluster/main.tf
# Abstract the cloud-specific details behind a unified interface

variable "provider" {
  type        = string
  description = "Cloud provider: aws, azure, gcp"
}

variable "cluster_name" {
  type = string
}

variable "node_count" {
  type    = number
  default = 3
}

variable "node_size" {
  type        = string
  description = "Normalized size: small, medium, large"
}

locals {
  # Normalize instance sizes across providers
  instance_types = {
    aws = {
      small  = "t3.medium"
      medium = "m5.xlarge"
      large  = "m5.2xlarge"
    }
    azure = {
      small  = "Standard_D2s_v3"
      medium = "Standard_D4s_v3"
      large  = "Standard_D8s_v3"
    }
    gcp = {
      small  = "e2-medium"
      medium = "n2-standard-4"
      large  = "n2-standard-8"
    }
  }
}

# Usage:
# module "primary_cluster" {
#   source       = "./modules/kubernetes-cluster"
#   provider     = "aws"
#   cluster_name = "app-primary"
#   node_count   = 5
#   node_size    = "medium"
# }
```
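Inside the module, the normalization map resolves to a concrete machine type with a two-level lookup. A minimal sketch (`node_type` is an illustrative local, not part of the module above):

```hcl
locals {
  # e.g. provider = "azure", node_size = "medium" resolves to "Standard_D4s_v3"
  node_type = local.instance_types[var.provider][var.node_size]
}
```

This is the entire trick: callers speak in normalized sizes, and only this one lookup knows about provider-specific SKU names.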
Kubernetes as the Unified Runtime
Kubernetes provides the application-level abstraction that makes workloads portable:
```yaml
# deployment.yml — Cloud-agnostic application deployment
# This same manifest deploys to EKS, AKS, and GKE
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  labels:
    app: api-service
    cloud: "{{ .Values.cloud }}"  # Helm value: aws|azure|gcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api
          image: ghcr.io/myorg/api-service:v2.4.1
          ports:
            - containerPort: 8080
          env:
            - name: CLOUD_PROVIDER
              value: "{{ .Values.cloud }}"
            - name: DB_CONNECTION
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: connection_string
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
```
Cross-Cloud CI/CD with GitHub Actions
```yaml
# .github/workflows/multi-cloud-deploy.yml
name: Multi-Cloud Deploy

on:
  push:
    branches: [main]

env:
  IMAGE: ghcr.io/myorg/api-service

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # needed to push to ghcr.io
    outputs:
      image_tag: ${{ steps.meta.outputs.tags }}
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push container image
        id: meta
        run: |
          IMAGE_TAG="${IMAGE}:${GITHUB_SHA::8}"
          docker build -t "$IMAGE_TAG" .
          docker push "$IMAGE_TAG"
          echo "tags=$IMAGE_TAG" >> "$GITHUB_OUTPUT"
      - name: Run security scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ steps.meta.outputs.tags }}  # scan the exact tag we pushed

  deploy-aws:
    needs: build
    runs-on: ubuntu-latest
    environment: production-aws
    permissions:
      contents: read
      id-token: write  # OIDC federation to AWS
    steps:
      - uses: actions/checkout@v4  # needed for ./charts/api-service
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Deploy to EKS
        run: |
          aws eks update-kubeconfig --name app-primary
          helm upgrade --install api-service ./charts/api-service \
            --set image.tag=${GITHUB_SHA::8} \
            --set cloud=aws \
            --wait --timeout 300s

  deploy-azure:
    needs: build
    runs-on: ubuntu-latest
    environment: production-azure
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Deploy to AKS
        run: |
          az aks get-credentials --resource-group eu-rg --name app-eu
          helm upgrade --install api-service ./charts/api-service \
            --set image.tag=${GITHUB_SHA::8} \
            --set cloud=azure \
            --wait --timeout 300s

  deploy-gcp:
    needs: build
    runs-on: ubuntu-latest
    environment: production-gcp
    permissions:
      contents: read
      id-token: write  # OIDC via Workload Identity Federation
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.GCP_WIF_PROVIDER }}
          service_account: ${{ secrets.GCP_SA_EMAIL }}
      - name: Deploy to GKE
        run: |
          gcloud container clusters get-credentials ml-cluster \
            --region us-central1
          helm upgrade --install api-service ./charts/api-service \
            --set image.tag=${GITHUB_SHA::8} \
            --set cloud=gcp \
            --wait --timeout 300s
```
Secrets Management Across Clouds
Secrets Management Options for Multi-Cloud:

```
Option 1: HashiCorp Vault (recommended for multi-cloud)
├── Single source of truth for all secrets
├── Dynamic secrets for each cloud (AWS STS, Azure SPN, GCP SA)
├── Unified audit trail
└── K8s integration via Vault Secrets Operator

Option 2: External Secrets Operator (ESO)
├── Syncs secrets FROM cloud-native stores INTO Kubernetes
├── Supports AWS Secrets Manager, Azure Key Vault, GCP Secret Manager
├── Team uses native tools, K8s gets unified Secrets
└── Less operational overhead than Vault

Option 3: Sealed Secrets + GitOps
├── Encrypted secrets stored in Git
├── Decrypted only inside the cluster
└── Works anywhere K8s runs (cloud agnostic)
```
```yaml
# external-secrets.yml — Sync secrets across clouds using ESO
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager  # or azure-keyvault, gcp-secret-manager
    kind: ClusterSecretStore
  target:
    name: db-credentials
  data:
    - secretKey: connection_string
      remoteRef:
        key: production/database
        property: connection_string
```
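The `aws-secrets-manager` store referenced above has to be defined once per cluster. A minimal sketch for the AWS backend, assuming ESO runs under a service account named `external-secrets` that already has IAM access to Secrets Manager (the names and namespace are illustrative):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets      # assumed SA with IRSA permissions
            namespace: external-secrets
```

The Azure Key Vault and GCP Secret Manager equivalents swap only the `provider` block; the ExternalSecret manifests stay identical across clouds.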
Cross-Cloud Networking
Cross-Cloud Networking Options:

```
1. VPN Tunnels (simplest, higher latency)
   AWS VPN Gateway ←──IPSec──→ Azure VPN Gateway
   Cost: ~$0.05/GB + hourly gateway fees
   Latency: +5-15ms

2. Dedicated Interconnects (lowest latency, highest cost)
   AWS Direct Connect ←──→ Equinix ←──→ Azure ExpressRoute
   Cost: ~$0.02/GB + port fees ($200-500/mo)
   Latency: +1-3ms

3. Service Mesh (application-level connectivity)
   Istio/Consul across clusters
   mTLS between services across clouds
   No infrastructure-level connectivity needed

4. Cloud-native peering
   AWS PrivateLink, Azure Private Link, GCP Private Service Connect
   Best for specific service-to-service connections
```
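Using the rough numbers above, the VPN-versus-interconnect decision is simple arithmetic: the interconnect saves about $0.03/GB but adds a fixed port fee, so it pays off once monthly transfer exceeds the port fee divided by the per-GB savings. A quick sketch (the prices are the illustrative figures from this section, not real quotes):

```python
def interconnect_breakeven_gb(vpn_per_gb: float = 0.05,
                              interconnect_per_gb: float = 0.02,
                              port_fee_monthly: float = 300.0) -> float:
    """Monthly GB above which a dedicated interconnect beats a VPN tunnel."""
    savings_per_gb = vpn_per_gb - interconnect_per_gb
    return port_fee_monthly / savings_per_gb

# At a $300/mo port fee and $0.03/GB savings: 10,000 GB/mo breakeven
print(round(interconnect_breakeven_gb()))  # → 10000
```

Below roughly 10 TB/mo of cross-cloud traffic at these rates, the VPN's simplicity usually wins; the latency difference is the other axis to weigh.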
Monitoring Across Clouds
```yaml
# Unified monitoring with Grafana Cloud + Prometheus
# Each cluster ships metrics to a central Grafana Cloud instance
# prometheus-agent.yml (runs on each cluster)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-agent
spec:
  replicas: 1
  remoteWrite:
    - url: "https://prometheus-prod-01-us-east-0.grafana.net/api/prom/push"
      basicAuth:
        username:
          name: grafana-cloud-creds
          key: username
        password:
          name: grafana-cloud-creds
          key: api-key
      writeRelabelConfigs:  # operator field is writeRelabelConfigs
        - sourceLabels: [__name__]
          action: keep
          regex: "container_.*|kube_.*|node_.*|http_.*"
      queueConfig:
        maxSamplesPerSend: 1000
        batchSendDeadline: 30s
  externalLabels:           # set at the Prometheus level, not per remote write
    cluster: "aws-primary"  # or "azure-eu", "gcp-ml"
    cloud: "aws"            # or "azure", "gcp"
    region: "us-east-1"     # cloud-specific region
```
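With `cluster` and `cloud` attached as external labels, one Grafana panel can compare clouds side by side. For example, request rate per cloud (this assumes the application exports a counter such as `http_requests_total`, which is a common convention rather than something defined in this section):

```promql
sum by (cloud) (rate(http_requests_total[5m]))
```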
Cost Management
| Tool | Multi-Cloud Support | Key Feature |
|---|---|---|
| Kubecost | Any K8s cluster | Per-pod cost allocation across clouds |
| Infracost | Terraform-native | Cost estimates in PR comments before deploy |
| CloudHealth (VMware) | AWS, Azure, GCP | FinOps dashboards and recommendations |
| Vantage | All major clouds | Unified billing with Kubernetes cost reports |
| OpenCost | Any K8s cluster | Open-source, CNCF project |
```shell
# Infracost: See cost impact in every Terraform PR
infracost breakdown --path . --format table

# Example output:
# Name                              Monthly Cost
# ─────────────────────────────────────────────
# aws_eks_cluster.primary           $73.00
# azurerm_kubernetes_cluster.eu     $438.00
# google_container_cluster.ml       $621.00
# ─────────────────────────────────────────────
# Total                             $1,132.00/mo
```
When Multi-Cloud Makes Sense vs. When It Does Not
```
Multi-cloud MAKES SENSE when:
  ✓ Regulatory requirements mandate specific providers for specific data
  ✓ You are acquiring companies on different clouds
  ✓ You need genuinely best-of-breed (GCP AI + AWS networking)
  ✓ Your scale justifies the operational overhead
  ✓ You have a platform team to manage the complexity

Multi-cloud DOES NOT make sense when:
  ✗ "Avoiding vendor lock-in" is the only reason
  ✗ Your team is < 20 engineers
  ✗ You are still figuring out one cloud
  ✗ You do not have a platform team
  ✗ The cost of abstraction exceeds the cost of lock-in
```
The 80/20 Rule: run 80% of workloads on your primary cloud, and 20% on secondary clouds where there is a clear advantage. Do NOT split evenly; that maximizes pain for minimal benefit.
Closing Note
Multi-cloud is a spectrum, not a binary choice. Start with Terraform and Kubernetes as your abstraction layers — they buy you portability without requiring immediate multi-cloud deployment. When the business case is clear (compliance, M&A, best-of-breed), you will already have the foundation to expand. The teams that succeed with multi-cloud are the ones that treat it as an engineering capability, not a checkbox on an architecture diagram.
