
Kubernetes Logging — EFK Stack, Loki, and Fluent Bit

· 6 min read
Goel Academy
DevOps & Cloud Learning Hub

A pod crashes at 3 AM, restarts, and by the time you check in the morning, kubectl logs shows only the current container's output — kubectl logs --previous can still reach the crashed container's logs, but only until the pod is deleted or rescheduled. Kubernetes does not persist logs beyond the lifetime of a pod, and on a busy cluster even node-level log files rotate away within hours. If you are not shipping logs to a central store, you are debugging with one eye closed.

Kubernetes Logging Architecture

The container runtime writes container logs to files on each node under /var/log/pods/, with symlinks in /var/log/containers/ that most collectors tail. There are two primary patterns to collect these logs:

Node-level logging (DaemonSet): A log collector runs on every node as a DaemonSet, reads log files from /var/log/containers/, and ships them to a central store. This is the most common approach.

Sidecar logging: A logging container runs alongside your application container in the same pod. Used when you need per-application log processing or when logs are not written to stdout.
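
As a concrete sketch of the sidecar pattern (image tags, names, and paths here are illustrative, not from any particular setup): the application writes to a file on a shared emptyDir volume, and a Fluent Bit sidecar tails that file and forwards it.

```yaml
# Hypothetical sidecar pod: the app logs to a shared emptyDir,
# and a Fluent Bit sidecar tails the file and prints it to stdout.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
    - name: app
      image: my-app:1.0                 # illustrative image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-sidecar
      image: fluent/fluent-bit:2.2
      args: ["-i", "tail", "-p", "path=/var/log/app/*.log", "-o", "stdout"]
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: app-logs
      emptyDir: {}
```

This keeps per-application log processing close to the app, at the cost of one extra container in every pod.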

┌──────────────────────────────────────────────┐
│ Node                                         │
│ ┌──────────┐  ┌──────────┐  ┌────────────┐   │
│ │  Pod A   │  │  Pod B   │  │ Fluent Bit │   │
│ │ (stdout) │  │ (stdout) │  │ DaemonSet  │   │
│ └────┬─────┘  └────┬─────┘  └─────┬──────┘   │
│      │             │              │          │
│      ▼             ▼              │          │
│   /var/log/containers/*.log ◄─────┘          │
│             │                                │
└─────────────┼────────────────────────────────┘
              │
              ▼
    ┌────────────────────┐
    │ Elasticsearch/Loki │
    └────────────────────┘

Option 1: EFK Stack (Elasticsearch + Fluent Bit + Kibana)

The EFK stack is the traditional enterprise-grade logging solution. Elasticsearch indexes and stores logs, Fluent Bit collects and ships them, and Kibana provides a search and visualization UI.

Deploy Elasticsearch

# Add the Elastic Helm repo
helm repo add elastic https://helm.elastic.co
helm repo update

# Install Elasticsearch (3-node cluster)
helm install elasticsearch elastic/elasticsearch \
  --namespace logging \
  --create-namespace \
  --set replicas=3 \
  --set minimumMasterNodes=2 \
  --set resources.requests.memory=2Gi \
  --set resources.limits.memory=4Gi \
  --set volumeClaimTemplate.resources.requests.storage=100Gi

Deploy Fluent Bit as a DaemonSet

Fluent Bit is the lightweight alternative to Fluentd — written in C, uses about 15 MB of memory per node, and handles thousands of events per second.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Log_Level         info
        Daemon            off
        Parsers_File      parsers.conf

    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            cri
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [FILTER]
        Name              kubernetes
        Match             kube.*
        Kube_URL          https://kubernetes.default.svc:443
        Kube_Tag_Prefix   kube.var.log.containers.
        Merge_Log         On
        Merge_Log_Key     log_processed
        Keep_Log          Off
        K8S-Logging.Parser   On
        K8S-Logging.Exclude  Off

    [FILTER]
        Name              grep
        Match             kube.*
        Exclude           $kubernetes['namespace_name'] kube-system

    [OUTPUT]
        Name              es
        Match             kube.*
        Host              elasticsearch-master
        Port              9200
        Logstash_Format   On
        Logstash_Prefix   k8s-logs
        Retry_Limit       3
        Replace_Dots      On

  parsers.conf: |
    [PARSER]
        Name          cri
        Format        regex
        Regex         ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
        Time_Key      time
        Time_Format   %Y-%m-%dT%H:%M:%S.%L%z
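
The ConfigMap alone does nothing — it has to be mounted into a Fluent Bit DaemonSet. A minimal manifest sketch (image tag and service account name are illustrative; in practice the official Fluent Bit Helm chart generates this for you, including the RBAC that the kubernetes filter needs):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit    # needs RBAC to read pod metadata
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2  # illustrative tag
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true            # node log files, read-only
            - name: config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: config
          configMap:
            name: fluent-bit-config     # the ConfigMap defined above
```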

Deploy Kibana

helm install kibana elastic/kibana \
  --namespace logging \
  --set elasticsearchHosts="http://elasticsearch-master:9200"

# Access Kibana
kubectl port-forward svc/kibana-kibana -n logging 5601:5601

Option 2: Grafana Loki (Lightweight Alternative)

Loki is Grafana's answer to Elasticsearch — but instead of indexing the full text of every log line, it only indexes metadata labels (namespace, pod, container). This makes it dramatically cheaper to run and operate.

# Install Loki stack (Loki + Promtail)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install loki grafana/loki-stack \
  --namespace logging \
  --create-namespace \
  --set grafana.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi

Promtail (Loki's log collector) automatically discovers pods and attaches Kubernetes labels. Query logs in Grafana using LogQL:

# All logs from the production namespace
{namespace="production"}

# Error logs from a specific deployment
{namespace="production", app="payment-service"} |= "error"

# Parse JSON logs and filter by status code
{namespace="production"} | json | status_code >= 500

# Count errors per minute
sum(rate({namespace="production"} |= "error" [1m])) by (app)
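
The trade-off behind these queries can be sketched in a few lines of Python (toy data, not Loki's actual implementation): labels are the only index, so a selector like {namespace="production"} is an index lookup, while a line filter like |= "error" is a brute-force scan over just the matching chunks.

```python
# Toy sketch of Loki-style storage: only labels are indexed;
# log lines themselves are scanned, never indexed.
from collections import defaultdict

index = defaultdict(list)  # frozenset of labels -> list of log lines ("chunk")

def push(labels: dict, line: str):
    index[frozenset(labels.items())].append(line)

def query(selector: dict, contains: str = ""):
    """Like {k="v"} |= "substr": match labels first, then scan lines."""
    want = set(selector.items())
    return [
        line
        for labels, chunk in index.items()
        if want <= labels              # label selector: uses the index
        for line in chunk
        if contains in line            # line filter: brute-force scan
    ]

push({"namespace": "production", "app": "payment-service"}, "error: card declined")
push({"namespace": "production", "app": "payment-service"}, "payment ok")
push({"namespace": "staging", "app": "payment-service"}, "error: timeout")

print(query({"namespace": "production"}, "error"))  # ['error: card declined']
```

Because only the small label index is maintained, ingestion is cheap; the cost moves to query time, which is why keeping label cardinality low matters in Loki.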

Comparison: EFK vs Loki vs Cloud-Native

Feature            | EFK Stack                     | Grafana Loki             | CloudWatch / Stackdriver
-------------------|-------------------------------|--------------------------|-------------------------
Full-text indexing | Yes                           | No (labels only)         | Yes
Resource usage     | High (8+ GB RAM)              | Low (512 MB RAM)         | N/A (managed)
Storage cost       | High                          | Low (10-20x cheaper)     | Medium
Query language     | Kibana KQL                    | LogQL                    | Proprietary
Integrates with    | Kibana                        | Grafana                  | Cloud console
Setup complexity   | High                          | Low                      | None
Multi-cluster      | Complex                       | Easy with Grafana Cloud  | Per-account
Best for           | Large enterprises, compliance | Most K8s teams           | Cloud-native shops

For most teams, Loki is the right choice. You likely already have Grafana for metrics — adding Loki gives you logs in the same UI with minimal resource overhead. Choose EFK when you need full-text search across billions of log lines or have compliance requirements that demand Elasticsearch.

Structured Logging Best Practices

The biggest difference between logs you can query and logs that are useless is structure. Always log in JSON format:

{
  "timestamp": "2025-08-23T10:15:32.456Z",
  "level": "error",
  "service": "payment-api",
  "trace_id": "abc123def456",
  "message": "Payment processing failed",
  "error": "insufficient_funds",
  "user_id": "usr_789",
  "amount": 49.99,
  "currency": "USD"
}

In your application, use structured logging libraries:

# Python with structlog
import structlog

logger = structlog.get_logger()
logger.error(
    "payment_failed",
    error="insufficient_funds",
    user_id="usr_789",
    amount=49.99,
    currency="USD",
    trace_id=request.headers.get("X-Trace-ID"),
)
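
If pulling in structlog is not an option, the standard logging module can emit JSON too. A minimal sketch — the "fields" key passed via extra= is my own convention here, not a stdlib one:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Merge structured fields attached via `extra={"fields": {...}}`
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("payment-api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Payment processing failed",
             extra={"fields": {"error": "insufficient_funds", "user_id": "usr_789"}})
```

One JSON object per line is exactly the shape the Fluent Bit Merge_Log option above expects, so these fields land as queryable keys in your backend.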

Multi-Line Log Handling

Stack traces span multiple lines, and by default, each line becomes a separate log entry. Configure Fluent Bit to concatenate them:

# fluent-bit multiline config
[MULTILINE_PARSER]
    name           java-stacktrace
    type           regex
    flush_timeout  2000
    rule      "start_state"  "/^\d{4}-\d{2}-\d{2}/"        "cont"
    rule      "cont"         "/^\s+(at|Caused by|\.{3})/"   "cont"

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/payment*.log
    multiline.parser  java-stacktrace
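
The two rules above form a tiny state machine: a line matching the start pattern opens a new record, and continuation lines are appended to the current one. The same logic, sketched in Python with the patterns copied from the config:

```python
import re

START = re.compile(r"^\d{4}-\d{2}-\d{2}")       # "start_state" rule
CONT = re.compile(r"^\s+(at|Caused by|\.{3})")  # "cont" rule

def concat_multiline(lines):
    """Group stack-trace continuation lines under their starting line."""
    records = []
    for line in lines:
        if START.match(line) or not records:
            records.append(line)           # start pattern: new record
        elif CONT.match(line):
            records[-1] += "\n" + line     # continuation: append to current
        else:
            records.append(line)           # anything else: its own record
    return records

logs = [
    "2025-08-23 10:15:32 ERROR payment failed",
    "\tat com.shop.Pay.charge(Pay.java:42)",
    "\tat com.shop.Api.handle(Api.java:7)",
    "2025-08-23 10:15:33 INFO retrying",
]
print(len(concat_multiline(logs)))  # 2 records instead of 4 lines
```

The flush_timeout in the real config covers the case this sketch ignores: a record that never sees its next start line must still be emitted eventually.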

Log Retention and Rotation

Without retention policies, your logging storage will grow indefinitely. Configure cleanup based on your compliance needs:

# Elasticsearch: Create an Index Lifecycle Policy
curl -X PUT "elasticsearch-master:9200/_ilm/policy/k8s-log-policy" \
  -H 'Content-Type: application/json' -d '{
    "policy": {
      "phases": {
        "hot":    { "min_age": "0ms", "actions": { "rollover": { "max_size": "50gb", "max_age": "1d" }}},
        "warm":   { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 }}},
        "delete": { "min_age": "30d", "actions": { "delete": {} }}
      }
    }
  }'

# Loki: Set retention in values.yaml
loki:
  config:
    table_manager:
      retention_deletes_enabled: true
      retention_period: 720h   # 30 days
    compactor:
      retention_enabled: true

Centralized Logging for Multi-Cluster

When running multiple clusters, ship all logs to a single central store. With Loki, add a cluster label in Promtail:

# promtail config for multi-cluster
config:
  snippets:
    extraRelabelConfigs:
      - target_label: cluster
        replacement: production-us-east-1

Now you can query logs across clusters in Grafana:

{cluster="production-us-east-1", namespace="checkout"} |= "timeout"

Wrapping Up

Your logging stack is only as good as the structure of your logs. Ship JSON, attach Kubernetes metadata automatically with Fluent Bit, and pick the backend that fits your scale — Loki for most teams, EFK for enterprises that need full-text search, and cloud-native solutions when you want zero operational burden.

With monitoring and logging in place, you can see what is happening and read why. But neither will help you prevent security incidents. In the next post, we will lock down Kubernetes with Pod Security Standards, Network Policies, and OPA Gatekeeper.