Multi-Cluster Kubernetes — Federation, Submariner, and Cluster API
Running a single Kubernetes cluster is straightforward until your application needs to survive a regional outage, comply with data sovereignty laws, or serve users on three continents without 200ms latency. At that point, you need multiple clusters — and that changes everything about how you deploy, network, and manage workloads.
Why Multi-Cluster?
Before adding the complexity of multiple clusters, make sure you actually need them. Here are the legitimate reasons:
| Reason | Example | Single-Cluster Alternative? |
|---|---|---|
| Disaster Recovery | Region goes down, app stays up | Multi-AZ (partial protection only) |
| Data Sovereignty | EU data must stay in EU region | Namespace isolation (does not satisfy regulators) |
| Latency | Users in Asia, US, Europe need <50ms | CDN helps for static, not for APIs |
| Blast Radius | Bad deploy kills one cluster, not all | Canary deploys (still single point of failure) |
| Team Isolation | Platform team vs application teams | Namespaces + RBAC (works for smaller orgs) |
| Hybrid/Multi-Cloud | Avoid vendor lock-in, use best-of-breed | Not possible with single cluster |
If your answer is "we just want it for fun" — stop here. Multi-cluster is operationally expensive.
Multi-Cluster Patterns
The pattern you choose dictates your architecture, tooling, and recovery time:
| Pattern | Description | RTO | Complexity | Use Case |
|---|---|---|---|---|
| Active-Active | All clusters serve traffic simultaneously | ~0 | Very High | Global apps, zero downtime |
| Active-Passive | Standby cluster takes over on failure | Minutes | Medium | DR for critical apps |
| Hub-Spoke | Central management cluster, edge workload clusters | Varies | High | Edge computing, retail |
| Federated | Clusters loosely coupled, shared config | Varies | High | Multi-team, multi-region |
KubeFed — Kubernetes Federation v2
KubeFed lets you distribute resources across clusters from a single control plane: you define a FederatedDeployment, and KubeFed pushes it to member clusters. Note that the KubeFed project has since been archived by its maintainers, so treat it as an illustration of the federation pattern rather than a tool to adopt for new builds.
Install KubeFed
```bash
# Add the KubeFed Helm repo
helm repo add kubefed-charts https://raw.githubusercontent.com/kubernetes-sigs/kubefed/master/charts
helm repo update

# Install KubeFed in the host cluster
helm install kubefed kubefed-charts/kubefed \
  --namespace kube-federation-system \
  --create-namespace \
  --set controllermanager.replicaCount=2

# Join member clusters
kubefedctl join cluster-us-east \
  --cluster-context=us-east-ctx \
  --host-cluster-context=hub-ctx \
  --v=2

kubefedctl join cluster-eu-west \
  --cluster-context=eu-west-ctx \
  --host-cluster-context=hub-ctx \
  --v=2

# Verify joined clusters
kubectl get kubefedclusters -n kube-federation-system
```
Federated Deployment
```yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: payment-api
  namespace: production
spec:
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: payment-api
      template:
        metadata:
          labels:
            app: payment-api   # pod labels must match the selector
        spec:
          containers:
          - name: payment-api
            image: myregistry/payment-api:v2.1.0
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
  placement:
    clusters:
    - name: cluster-us-east
    - name: cluster-eu-west
  overrides:
  - clusterName: cluster-eu-west
    clusterOverrides:
    - path: "/spec/replicas"
      value: 5  # More replicas in EU due to higher traffic
```
This creates a Deployment in both clusters, with 3 replicas in US-East and 5 in EU-West.
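Instead of hard-coding per-cluster replica counts in overrides, KubeFed can also split a total replica budget by weight using a ReplicaSchedulingPreference. A sketch, reusing the cluster names from the example above (the weights and total are illustrative values):

```yaml
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: payment-api        # must match the FederatedDeployment name
  namespace: production
spec:
  targetKind: FederatedDeployment
  totalReplicas: 8
  clusters:
    cluster-us-east:
      weight: 3            # receives roughly 3/8 of the replicas
    cluster-eu-west:
      weight: 5            # receives roughly 5/8 of the replicas
```

The controller rebalances replicas across clusters as the weights or total change, which is easier to reason about than maintaining a separate override per cluster.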
Submariner — Cross-Cluster Networking
The hardest part of multi-cluster is networking. Pods in Cluster A cannot reach pods in Cluster B by default — their Pod CIDRs are isolated. Submariner solves this by creating encrypted tunnels between clusters.
Install Submariner
```bash
# Install subctl CLI
curl -Ls https://get.submariner.io | VERSION=v0.18.0 bash

# Deploy the broker (on hub cluster)
subctl deploy-broker --kubeconfig hub-kubeconfig

# Join clusters to the broker
subctl join --kubeconfig us-east-kubeconfig broker-info.subm \
  --clusterid cluster-us-east \
  --natt=false

subctl join --kubeconfig eu-west-kubeconfig broker-info.subm \
  --clusterid cluster-eu-west \
  --natt=false

# Verify connectivity
subctl show connections
subctl verify --kubecontext us-east-ctx --tocontext eu-west-ctx --only connectivity
```
Export a Service Across Clusters
```bash
# In cluster-us-east, export the payment-api service
subctl export service payment-api -n production

# Pods in cluster-eu-west can now reach it via:
#   payment-api.production.svc.clusterset.local
```

From a pod in cluster-eu-west:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-cross-cluster
spec:
  restartPolicy: Never   # one-shot curl; avoids a crash loop once the command exits
  containers:
  - name: curl
    image: curlimages/curl:latest
    command: ["curl", "http://payment-api.production.svc.clusterset.local:8080/health"]
```
Cluster API — Lifecycle Management
Cluster API (CAPI) treats clusters as Kubernetes resources. You declare a cluster in YAML, and CAPI provisions the infrastructure, bootstraps the nodes, and manages the lifecycle — just like a Deployment manages pods.
Bootstrap a Management Cluster
```bash
# Install clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.7.0/clusterctl-linux-amd64 -o clusterctl
chmod +x clusterctl && sudo mv clusterctl /usr/local/bin/

# Initialize the management cluster with AWS provider
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<your-key>
export AWS_SECRET_ACCESS_KEY=<your-secret>
clusterctl init --infrastructure aws

# Generate a workload cluster manifest
clusterctl generate cluster prod-us-east \
  --kubernetes-version v1.29.0 \
  --control-plane-machine-count 3 \
  --worker-machine-count 5 \
  > prod-us-east-cluster.yaml

# Create the cluster
kubectl apply -f prod-us-east-cluster.yaml

# Watch cluster provisioning
kubectl get clusters -w
kubectl get machines
```
Cluster YAML Structure
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: prod-us-east
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: prod-us-east-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: prod-us-east
```
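The Cluster object above only defines the control plane and networking; worker nodes come from a MachineDeployment, which scales and rolls nodes the way a Deployment rolls pods. A sketch of what `clusterctl generate` produces for the 5 workers (the bootstrap and infrastructure template names are assumptions):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: prod-us-east-md-0
  namespace: default
spec:
  clusterName: prod-us-east
  replicas: 5
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: prod-us-east
      version: v1.29.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: prod-us-east-md-0     # hypothetical template name
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: prod-us-east-md-0       # hypothetical template name
```

Scaling workers is then `kubectl scale machinedeployment prod-us-east-md-0 --replicas=10`, and bumping `version` performs a rolling node replacement.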
Multi-Cluster Service Mesh with Istio
Istio supports multi-cluster deployments where services in different clusters can communicate transparently through the mesh.
```bash
# Install Istio on both clusters with a shared mesh ID
istioctl install --context=us-east-ctx \
  --set values.global.meshID=goel-mesh \
  --set values.global.multiCluster.clusterName=cluster-us-east \
  --set values.global.network=network-us

istioctl install --context=eu-west-ctx \
  --set values.global.meshID=goel-mesh \
  --set values.global.multiCluster.clusterName=cluster-eu-west \
  --set values.global.network=network-eu

# Create remote secrets for cross-cluster discovery
istioctl create-remote-secret --context=us-east-ctx --name=cluster-us-east | \
  kubectl apply -f - --context=eu-west-ctx

istioctl create-remote-secret --context=eu-west-ctx --name=cluster-eu-west | \
  kubectl apply -f - --context=us-east-ctx

# Verify cross-cluster mesh
istioctl remote-clusters --context=us-east-ctx
```
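Once the mesh spans both clusters, you usually want traffic to stay local and only fail over cross-cluster when local endpoints are unhealthy. A sketch of a DestinationRule for that (the region values are assumptions and must match the clusters' `topology.kubernetes.io/region` node labels; outlier detection is required for locality failover to take effect):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-api-failover
  namespace: production
spec:
  host: payment-api.production.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
        - from: us-east        # hypothetical region label
          to: eu-west          # hypothetical region label
    outlierDetection:
      consecutive5xxErrors: 3  # eject an endpoint after 3 consecutive 5xx
      interval: 30s
      baseEjectionTime: 60s
```

Without the outlierDetection block, Envoy has no health signal to trigger the failover, so requests keep going to the local (broken) endpoints.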
Liqo — Virtual Nodes and Resource Sharing
Liqo extends your cluster by creating virtual nodes that represent remote clusters. Pods scheduled on a virtual node actually run in the remote cluster — the scheduler does not even know the difference.
```bash
# Install Liqo on both clusters
curl --fail -LS "https://get.liqo.io" | bash

# Peer clusters together
liqoctl peer --remoteconfig remote-cluster-config.yaml

# Check virtual nodes
kubectl get nodes
# NAME                   STATUS  ROLES   AGE  VERSION
# node-1                 Ready   worker  30d  v1.29.0
# node-2                 Ready   worker  30d  v1.29.0
# liqo-cluster-eu-west   Ready   agent   5m   v1.29.0   <- virtual node!

# Enable offloading for a namespace so its pods can land on the virtual node
kubectl label namespace production liqo.io/scheduling-enabled=true
```
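For finer control than a namespace label, Liqo exposes offloading as a resource. A sketch of a NamespaceOffloading object, based on Liqo's offloading API (the strategy values here are assumptions to check against your Liqo version):

```yaml
apiVersion: offloading.liqo.io/v1alpha1
kind: NamespaceOffloading
metadata:
  name: offloading            # Liqo expects this fixed name per namespace
  namespace: production
spec:
  namespaceMappingStrategy: DefaultName
  podOffloadingStrategy: LocalAndRemote  # pods may run locally or on virtual nodes
```

`LocalAndRemote` leaves placement to the scheduler; a `Remote` strategy would force every pod in the namespace onto the peered cluster.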
Admiralty — Multi-Cluster Scheduling
Admiralty provides a clean multi-cluster scheduling layer. You create a "source" pod in one cluster, and Admiralty creates a "delegate" pod in a target cluster.
# Label the source cluster namespace
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
multicluster-scheduler: enabled
---
# ClusterTarget defines where pods can be scheduled
apiVersion: multicluster.admiralty.io/v1alpha1
kind: ClusterTarget
metadata:
name: cluster-eu-west
spec:
kubeconfigSecret:
name: eu-west-kubeconfig
---
# Annotate pods for multi-cluster scheduling
apiVersion: apps/v1
kind: Deployment
metadata:
name: payment-api
spec:
replicas: 6
template:
metadata:
annotations:
multicluster.admiralty.io/elect: "" # Enable multi-cluster scheduling
spec:
containers:
- name: payment-api
image: myregistry/payment-api:v2.1.0
Managing Config Across Clusters with ArgoCD ApplicationSet
ArgoCD ApplicationSet is the most practical way to deploy the same application across multiple clusters. One ApplicationSet generates an Application for each target cluster.
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payment-api
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          environment: production
      values:
        revision: main
  template:
    metadata:
      name: 'payment-api-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/k8s-manifests
        targetRevision: '{{values.revision}}'
        path: apps/payment-api/overlays/{{metadata.labels.region}}
      destination:
        server: '{{server}}'
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```
This generates one ArgoCD Application per cluster labeled environment: production. Each gets the right overlay for its region automatically.
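The cluster generator discovers clusters from the Secrets that register them with ArgoCD, so the labels it matches live on those Secrets. A sketch of such a registration (the server URL and credentials are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cluster-eu-west
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: production   # matched by the ApplicationSet's cluster selector
    region: eu-west           # consumed as {{metadata.labels.region}} in the path
type: Opaque
stringData:
  name: cluster-eu-west
  server: https://eu-west.example.com:6443   # hypothetical API server URL
  config: |
    {
      "bearerToken": "<redacted>",
      "tlsClientConfig": { "insecure": false }
    }
```

Adding a cluster to the fleet is then just creating one labeled Secret; the ApplicationSet picks it up and deploys automatically.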
Challenges and Trade-Offs
Multi-cluster is not free. Here is what you are signing up for:
| Challenge | Impact | Mitigation |
|---|---|---|
| Networking complexity | Cross-cluster DNS, firewall rules, encryption | Submariner, Cilium ClusterMesh |
| Data consistency | Distributed databases, eventual consistency | CockroachDB, Vitess, or single-region DB |
| Observability | Logs and metrics scattered across clusters | Thanos for Prometheus, centralized logging |
| Secret management | Syncing secrets across clusters | External Secrets Operator + Vault |
| Cost | More control planes, cross-region traffic | Start with 2 clusters, not 5 |
| Cognitive load | Engineers must understand multi-cluster context | Good abstractions, platform team |
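The secret-management mitigation in the table typically means each cluster runs External Secrets Operator and pulls from a shared Vault, so nothing is synced cluster-to-cluster. A sketch of the per-cluster resource (the store name and Vault path are assumptions):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-api-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend             # hypothetical ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: payment-api-credentials   # Kubernetes Secret created in each cluster
  data:
  - secretKey: api-key
    remoteRef:
      key: secret/payment-api       # hypothetical Vault path
      property: api-key
```

Because every cluster pulls the same path, rotating the value in Vault propagates everywhere within the refresh interval, with no cross-cluster secret replication to secure.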
Start small. Run two clusters — one primary and one DR — before attempting active-active across three regions. Get your observability, GitOps, and networking right on two clusters first. The jump from two to five is easier than the jump from one to two.
Multi-cluster Kubernetes is where infrastructure engineering gets genuinely hard. There is no single tool that solves everything — you pick the tools that match your specific requirements. Federation for resource distribution, Submariner or Cilium for networking, Cluster API for provisioning, and ArgoCD for deployment. Layer them carefully, test your failovers, and always ask yourself: do we actually need this complexity?
