It's 3:17 AM. Your phone buzzes. "Site is down." You SSH into the server, tail the logs, see nothing obvious, check CPU — it's fine. Memory? Fine. Disk? 100% full. Log files ate the disk three hours ago, and nobody noticed because monitoring wasn't set up. CloudWatch exists so that you don't have to be the monitoring system. It collects metrics, aggregates logs, fires alarms, and pages you before users start tweeting.
18 posts tagged with "Monitoring"
Observability, logging, and monitoring
It is 2 AM and your application is down. The on-call engineer opens the Azure portal, stares at a wall of services, and asks the worst question in operations: "Where do I even start looking?" Azure Monitor is the answer. It collects, analyzes, and acts on telemetry from every layer of your infrastructure — from VM CPU spikes to application exceptions to user click patterns. But only if you set it up properly before that 2 AM call.
The server had been acting weird for three days before it crashed — and the logs told the story all along. Logs are the black box of your servers. If you can read them effectively, you can prevent outages instead of just reacting to them.
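As a taste of what "reading logs effectively" looks like in practice, here is a minimal sketch that buckets error lines by hour so a slow-building problem stands out. The log path and timestamp format are hypothetical — adjust them to your application:

```shell
#!/bin/sh
# Minimal sketch: count ERROR/CRITICAL lines per hour to spot a brewing problem.
# Path and log format ("YYYY-MM-DD HH:MM:SS LEVEL msg") are hypothetical.
LOG=/var/log/myapp/app.log

grep -E 'ERROR|CRITICAL' "$LOG" |
  cut -c1-13 |                  # keep "YYYY-MM-DD HH" as the bucket key
  sort | uniq -c | sort -rn |   # count per hour, busiest hours first
  head
```

A sudden jump in one hour's count is exactly the kind of signal that was sitting in those logs for three days.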
You have read about the Golden Signals and the three pillars of observability. Now it is time to stop theorizing and start measuring. In this post, we will set up a complete monitoring stack — Prometheus for metrics collection, Grafana for visualization, node_exporter for system metrics, and Alertmanager for routing alerts to Slack. All running locally with Docker Compose, all built on production-ready patterns.
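As a rough sketch of how those four services wire together, here is a minimal `docker-compose.yml` skeleton. It assumes default ports, untagged images, and local config files named `prometheus.yml` and `alertmanager.yml` — a starting point, not the full setup:

```yaml
# Minimal sketch — image tags, ports, and config paths are illustrative.
services:
  prometheus:
    image: prom/prometheus
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  node_exporter:
    image: prom/node-exporter
    ports: ["9100:9100"]
  alertmanager:
    image: prom/alertmanager
    ports: ["9093:9093"]
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]
```

Prometheus scrapes node_exporter, evaluates alerting rules, and hands firing alerts to Alertmanager, which routes them to Slack; Grafana reads from Prometheus for dashboards.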
It is 3 AM. Your phone buzzes with a PagerDuty alert: "CPU usage above 90%." You drag yourself out of bed, SSH into the server, and discover the CPU spike was caused by a log rotation cron job that runs every night. It resolved itself two minutes later. This happens three times a week. You start ignoring alerts. Then one night, the database actually fills up and takes down production. Nobody notices for 47 minutes because the team has learned to silence their phones.
Linux Process Management — ps, top, kill, and Beyond
It's 3 AM. Your pager goes off. The production server is crawling. CPU is at 100%. Memory is gone. Something is eating your server alive, and you need to find it and stop it — fast. Knowing how to manage Linux processes isn't optional for a DevOps engineer; it's survival.
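A minimal triage sketch for exactly that 3 AM scenario — find the hungriest processes, then stop the offender gracefully before reaching for force (the column selection is a matter of taste, and the PID is a placeholder):

```shell
#!/bin/sh
# Minimal triage sketch: what is eating the server?
# List the five hungriest processes by CPU (procps ps syntax assumed).
ps -eo pid,ppid,user,%cpu,%mem,comm --sort=-%cpu | head -n 6

# Ask the worst offender to exit cleanly first (SIGTERM), escalating to
# SIGKILL only if it ignores you. 12345 is a placeholder PID.
# kill 12345        # polite: SIGTERM, lets the process clean up
# kill -9 12345     # last resort: SIGKILL, cannot be caught or ignored
```

Swap `--sort=-%cpu` for `--sort=-%mem` when memory, not CPU, is the resource being eaten.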
