
Azure Monitor — Logs, Metrics, Alerts, and Application Insights

Goel Academy · DevOps & Cloud Learning Hub · 7 min read

It is 2 AM and your application is down. The on-call engineer opens the Azure portal, stares at a wall of services, and asks the worst question in operations: "Where do I even start looking?" Azure Monitor is the answer. It collects, analyzes, and acts on telemetry from every layer of your infrastructure — from VM CPU spikes to application exceptions to user click patterns. But only if you set it up properly before that 2 AM call.

Azure Monitor Platform Overview

Azure Monitor is not a single tool — it is the umbrella over everything observability-related in Azure:

  • Metrics — Numeric time-series data (CPU %, memory, request count). Lightweight, near real-time.
  • Logs — Structured event data stored in Log Analytics. Queryable with KQL.
  • Alerts — Rules that fire when metrics cross thresholds or log queries match conditions.
  • Application Insights — APM (Application Performance Monitoring) for web apps.
  • Workbooks — Interactive dashboards combining metrics, logs, and text.

Everything flows into one of two data stores: the Azure Monitor metrics database (fast queries, fixed 93-day retention) or a Log Analytics workspace (flexible KQL queries, with retention configurable up to two years).

Metrics — Platform vs Custom

Platform metrics are collected automatically for every Azure resource. No setup required.

# List available metrics for a VM
az monitor metrics list-definitions \
--resource /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01 \
--output table

# Query CPU percentage over the last hour
az monitor metrics list \
--resource /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01 \
--metric "Percentage CPU" \
--interval PT1M \
--start-time 2025-06-21T10:00:00Z \
--end-time 2025-06-21T11:00:00Z \
--output table
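The CLI returns JSON whose top level is a value array of metrics, each carrying a timeseries of data points. The Python sketch below shows one way to pull the peak average out of that shape; the hand-trimmed sample dict stands in for real output, so verify the field names against your own --output json before relying on them.

```python
import json

# Illustrative shape of `az monitor metrics list -o json` output,
# trimmed to the fields this sketch uses (real responses carry more metadata).
sample = json.loads("""
{
  "value": [
    {
      "name": {"value": "Percentage CPU"},
      "timeseries": [
        {
          "data": [
            {"timeStamp": "2025-06-21T10:00:00Z", "average": 41.2},
            {"timeStamp": "2025-06-21T10:01:00Z", "average": 87.5},
            {"timeStamp": "2025-06-21T10:02:00Z", "average": 92.1}
          ]
        }
      ]
    }
  ]
}
""")

def peak_average(response: dict, metric: str) -> float:
    """Return the highest per-interval average for the named metric."""
    for m in response["value"]:
        if m["name"]["value"] == metric:
            points = [p["average"]
                      for ts in m["timeseries"]
                      for p in ts["data"]
                      if p.get("average") is not None]
            return max(points)
    raise KeyError(metric)

print(peak_average(sample, "Percentage CPU"))  # 92.1
```

The same traversal works for other aggregations (minimum, maximum, total) by swapping the key pulled from each data point.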

Custom metrics let you send your own application-level numbers (queue depth, cache hit ratio, active sessions) using the Application Insights SDK or the custom metrics API.
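As a rough illustration of what a pre-aggregated custom metric submission looks like, the sketch below builds a payload in the shape the custom metrics REST API documents: a time field plus data.baseData carrying the metric name, namespace, dimension names, and a series of min/max/sum/count values. The metric name QueueDepth and the namespace are invented examples, and actually sending the payload (regional endpoint, bearer token) is omitted here.

```python
import json
from datetime import datetime, timezone

def custom_metric_payload(metric, namespace, dim_names, dim_values, values):
    """Build a pre-aggregated custom-metric payload in the general shape
    the Azure Monitor custom metrics REST API expects."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "data": {
            "baseData": {
                "metric": metric,
                "namespace": namespace,
                "dimNames": dim_names,
                "series": [{
                    "dimValues": dim_values,
                    "min": min(values),      # aggregated client-side
                    "max": max(values),
                    "sum": sum(values),
                    "count": len(values),
                }],
            }
        },
    }

# Three queue-depth samples collapsed into one pre-aggregated submission:
payload = custom_metric_payload("QueueDepth", "QueueProcessing",
                                ["QueueName"], ["orders"], [3, 5, 20])
print(json.dumps(payload, indent=2))
```

Pre-aggregating on the client keeps ingestion cheap: one payload can summarize many raw samples.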

Log Analytics Workspace

A Log Analytics workspace is where all your logs land. Think of it as your central data lake for operational data.

# Create a Log Analytics workspace
az monitor log-analytics workspace create \
--resource-group rg-monitoring \
--workspace-name law-prod-central \
--location eastus \
--retention-time 90 \
--sku PerGB2018

# Get the workspace ID (you will need this for diagnostic settings)
az monitor log-analytics workspace show \
--resource-group rg-monitoring \
--workspace-name law-prod-central \
--query customerId \
--output tsv

KQL — Kusto Query Language

KQL is the query language for Log Analytics. If you have used SQL, KQL will feel familiar but with a pipe-based syntax that reads left to right.

// Find the top 10 slowest successful requests in the last 24 hours
requests
| where timestamp > ago(24h)
| where success == true
| top 10 by duration desc
| project timestamp, name, url, duration, resultCode

// Count exceptions by type in the last 7 days
exceptions
| where timestamp > ago(7d)
| summarize count() by type
| order by count_ desc
| render barchart

// Find VMs with CPU above 90% in the last hour
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer
| where AvgCPU > 90
| order by AvgCPU desc

// Track failed logins from Azure AD
SigninLogs
| where ResultType != "0"
| summarize FailedAttempts = count() by UserPrincipalName, IPAddress
| where FailedAttempts > 5
| order by FailedAttempts desc

KQL tip: Start with the table name, then pipe through filters (where), aggregations (summarize), projections (project), and sorting (order by). Each pipe operates on the output of the previous step, so put your most selective where clauses first to keep the later stages cheap.
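The pipe model maps cleanly onto ordinary collection operations. Here is a rough Python analogue of the exception-count query above, run on invented toy records standing in for the exceptions table:

```python
from collections import Counter

# Toy records standing in for the `exceptions` table.
exceptions = [
    {"type": "NullReferenceException"},
    {"type": "TimeoutException"},
    {"type": "NullReferenceException"},
    {"type": "SqlException"},
    {"type": "NullReferenceException"},
]

# exceptions | summarize count() by type | order by count_ desc
counts = Counter(e["type"] for e in exceptions)
for exc_type, count_ in counts.most_common():
    print(exc_type, count_)
```

Reading a KQL pipeline as "a sequence of transformations over a stream of rows" is usually all the mental model you need to start writing your own queries.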

Alert Rules

Azure Monitor supports three types of alert rules:

  • Metric alert — fires when a metric crosses a threshold. Evaluated every 1-15 minutes. Best for CPU, memory, disk, and latency.
  • Log alert — fires when a KQL query returns matching results. Evaluated every 5-15 minutes. Best for error patterns and security events.
  • Activity log alert — fires on Azure resource operations in near real-time. Best for resource deletions and role changes.

Creating a Metric Alert

# Alert when VM CPU exceeds 85% for 5 minutes
az monitor metrics alert create \
--resource-group rg-monitoring \
--name "High-CPU-Alert" \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01 \
--condition "avg Percentage CPU > 85" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--description "VM CPU has been above 85% for 5 minutes" \
--action /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Insights/actionGroups/ag-ops-team
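It can help to see what the --window-size and --evaluation-frequency pair means mechanically. The sketch below simulates a 5-sample sliding average checked at each new minute; the real platform aggregates server-side and supports more aggregation types, so this is only the intuition:

```python
def alert_fires(samples, window=5, threshold=85.0):
    """Slide a window over per-minute CPU samples and return the first
    minute at which the windowed average exceeds the threshold, or None.
    Mirrors an "avg > threshold over a 5-minute window, checked every
    minute" rule in spirit."""
    for i in range(window, len(samples) + 1):
        window_avg = sum(samples[i - window:i]) / window
        if window_avg > threshold:
            return i  # evaluation at this minute fires the alert
    return None

cpu = [70, 80, 88, 90, 92, 95, 96]   # one sample per minute
print(alert_fires(cpu))
```

Note that a single high sample does not fire the rule; the whole window's average has to cross the threshold, which is what makes windowed alerts resistant to one-off spikes.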

Creating a Log Alert

# Alert when more than 10 failed requests in 15 minutes
az monitor scheduled-query create \
--resource-group rg-monitoring \
--name "Failed-Requests-Alert" \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod-central \
--condition "count 'FailedRequests' > 10" \
--condition-query FailedRequests="AppRequests | where Success == false" \
--window-size 15m \
--evaluation-frequency 5m \
--severity 1 \
--action /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Insights/actionGroups/ag-ops-team

Note the table name: when you query from the workspace scope, workspace-based Application Insights data lands in tables like AppRequests and AppExceptions; the classic requests and exceptions names apply when querying the Application Insights resource directly.

Action Groups

Action groups define who gets notified and how when an alert fires.

# Create an action group with email, SMS, and webhook
az monitor action-group create \
--resource-group rg-monitoring \
--name ag-ops-team \
--short-name OpsTeam \
--action email ops-lead ops-lead@company.com \
--action sms on-call-sms 1 5551234567 \
--action webhook pagerduty https://events.pagerduty.com/integration/abc123/enqueue

# Add a Logic App action for auto-remediation
az monitor action-group update \
--resource-group rg-monitoring \
--name ag-ops-team \
--add-action logicapp auto-restart /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Logic/workflows/restart-vm https://prod-01.eastus.logic.azure.com:443/workflows/abc123

Application Insights

Application Insights is Azure Monitor's APM solution. It automatically detects performance anomalies and tracks requests, dependencies, exceptions, and user behavior.

# Create an Application Insights resource
az monitor app-insights component create \
--resource-group rg-monitoring \
--app appi-webapp-prod \
--location eastus \
--kind web \
--application-type web \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod-central

What Application Insights Tracks

  • Requests — incoming HTTP requests (e.g., GET /api/users took 245ms).
  • Dependencies — outbound calls (e.g., a SQL query took 1200ms).
  • Exceptions — unhandled errors (e.g., NullReferenceException at line 42).
  • Page Views — client-side navigation (e.g., a user loaded /dashboard).
  • Custom Events — your business events (e.g., a user clicked "Purchase").
  • Live Metrics — real-time stream (current requests/sec, failure rate).

Smart Detection

Application Insights automatically detects anomalies without any configuration:

  • Sudden spike in failure rates
  • Abnormal response time increases
  • Memory leak patterns
  • Dependency failure anomalies
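Smart Detection's models are proprietary and far more sophisticated than anything shown here, but the underlying idea is baseline versus deviation. A minimal sketch of that intuition, using a z-score over an invented failure-rate history:

```python
from statistics import mean, stdev

def is_spike(history, latest, z_threshold=3.0):
    """Flag `latest` as anomalous if it sits more than `z_threshold`
    standard deviations above the historical mean. Illustrative only:
    real anomaly detection also handles trends and seasonality."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest > mu
    return (latest - mu) / sigma > z_threshold

failure_rates = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]   # percent failed per interval
print(is_spike(failure_rates, 1.3))  # within normal variation
print(is_spike(failure_rates, 9.0))  # clear spike
```

The value of this approach is that it needs no hand-tuned threshold: what counts as "abnormal" adapts to each application's own baseline.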

Diagnostic Settings

Diagnostic settings connect Azure resources to Log Analytics. Without them, platform logs never reach your workspace.

# Enable diagnostic settings for an App Service
az monitor diagnostic-settings create \
--resource /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Web/sites/myapp-prod \
--name "send-to-law" \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod-central \
--logs '[{"categoryGroup": "allLogs", "enabled": true}]' \
--metrics '[{"category": "AllMetrics", "enabled": true}]'

# Enable diagnostic settings for a Key Vault
az monitor diagnostic-settings create \
--resource /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.KeyVault/vaults/kv-prod-2025 \
--name "send-to-law" \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod-central \
--logs '[{"categoryGroup": "audit", "enabled": true}]' \
--metrics '[{"category": "AllMetrics", "enabled": true}]'

Cost Management for Logs

Log Analytics charges per GB ingested. Costs can spiral quickly if you log everything at debug level. Here is how to control it:

  • Filter out noisy logs with data collection rules — reduces ingestion volume.
  • Use Basic Logs for high-volume, low-query tables — roughly 50% cheaper ingestion, with limited query capability.
  • Set retention to 30-90 days for most tables — reduces storage costs.
  • Archive compliance logs to a Storage Account — the cheapest long-term storage.
  • Use a daily cap as a safety net — prevents runaway costs.

# Set a daily cap on workspace ingestion
az monitor log-analytics workspace update \
--resource-group rg-monitoring \
--workspace-name law-prod-central \
--quota 5

The --quota flag sets a daily cap in GB. When the cap is reached, ingestion pauses until the next day. Use this as a safety net, not a primary cost control — you do not want critical logs dropping during an incident.
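For budgeting, a back-of-envelope calculation is often enough. The sketch below multiplies daily ingestion by a per-GB price; the 2.30 figure is an assumption for the example only, since actual Pay-As-You-Go rates vary by region and commitment tier, so check the Azure pricing page for real numbers.

```python
def monthly_log_cost(gb_per_day, price_per_gb=2.30, free_gb_per_day=0.0,
                     days=30):
    """Back-of-envelope Log Analytics ingestion cost. price_per_gb is
    an illustrative placeholder, not a quoted Azure rate."""
    billable = max(gb_per_day - free_gb_per_day, 0.0)
    return billable * price_per_gb * days

# A workspace ingesting 5 GB/day at the illustrative rate:
print(f"${monthly_log_cost(5):.2f}/month")
```

Even a rough number like this makes it obvious why debug-level logging from a chatty service can dominate a monitoring bill.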

Wrapping Up

Observability is not optional — it is the foundation of reliable operations. Set up a central Log Analytics workspace, enable diagnostic settings on every resource, and create alerts for the conditions that actually matter (not everything). Use Application Insights for application-level visibility, and learn KQL well enough to investigate incidents quickly. The goal is not to collect every possible metric — it is to have the right data available when things go wrong.


Next up: We will explore AKS (Azure Kubernetes Service) — running production Kubernetes clusters on Azure with node pools, networking, and integrated monitoring.