Azure Monitor — Logs, Metrics, Alerts, and Application Insights
It is 2 AM and your application is down. The on-call engineer opens the Azure portal, stares at a wall of services, and asks the worst question in operations: "Where do I even start looking?" Azure Monitor is the answer. It collects, analyzes, and acts on telemetry from every layer of your infrastructure — from VM CPU spikes to application exceptions to user click patterns. But only if you set it up properly before that 2 AM call.
Azure Monitor Platform Overview
Azure Monitor is not a single tool — it is the umbrella over everything observability-related in Azure:
- Metrics — Numeric time-series data (CPU %, memory, request count). Lightweight, near real-time.
- Logs — Structured event data stored in Log Analytics. Queryable with KQL.
- Alerts — Rules that fire when metrics cross thresholds or log queries match conditions.
- Application Insights — APM (Application Performance Monitoring) for web apps.
- Workbooks — Interactive dashboards combining metrics, logs, and text.
Everything flows into one of two data stores: the Metrics database (fast, 93-day retention) or a Log Analytics workspace (flexible, up to 2 years of interactive retention, longer with archive).
Metrics — Platform vs Custom
Platform metrics are collected automatically for every Azure resource. No setup required.
# List available metrics for a VM
az monitor metrics list-definitions \
--resource /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01 \
--output table
# Query CPU percentage over the last hour
az monitor metrics list \
--resource /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01 \
--metric "Percentage CPU" \
--interval PT1M \
--start-time 2025-06-21T10:00:00Z \
--end-time 2025-06-21T11:00:00Z \
--output table
Custom metrics let you send your own application-level numbers (queue depth, cache hit ratio, active sessions) using the Application Insights SDK or the custom metrics API.
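Once a custom metric is flowing, it lands in the customMetrics table in Application Insights and can be queried like any other telemetry. A sketch, using a hypothetical CacheHitRatio metric (substitute your own metric name):

```kql
// Average of a hypothetical custom metric, bucketed into 5-minute bins
customMetrics
| where timestamp > ago(24h)
| where name == "CacheHitRatio"   // illustrative name — replace with yours
| summarize avg(value) by bin(timestamp, 5m)
| render timechart
```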
Log Analytics Workspace
A Log Analytics workspace is where all your logs land. Think of it as your central data lake for operational data.
# Create a Log Analytics workspace
az monitor log-analytics workspace create \
--resource-group rg-monitoring \
--workspace-name law-prod-central \
--location eastus \
--retention-time 90 \
--sku PerGB2018
# Get the workspace ID (you will need this for diagnostic settings)
az monitor log-analytics workspace show \
--resource-group rg-monitoring \
--workspace-name law-prod-central \
--query customerId \
--output tsv
KQL — Kusto Query Language
KQL is the query language for Log Analytics. If you have used SQL, KQL will feel familiar but with a pipe-based syntax that reads left to right.
// Find the top 10 slowest successful requests in the last 24 hours
requests
| where timestamp > ago(24h)
| where success == true
| top 10 by duration desc
| project timestamp, name, url, duration, resultCode
// Count exceptions by type in the last 7 days
exceptions
| where timestamp > ago(7d)
| summarize count() by type
| order by count_ desc
| render barchart
// Find VMs with CPU above 90% in the last hour
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer
| where AvgCPU > 90
| order by AvgCPU desc
// Track failed logins from Azure AD
SigninLogs
| where ResultType != "0"
| summarize FailedAttempts = count() by UserPrincipalName, IPAddress
| where FailedAttempts > 5
| order by FailedAttempts desc
KQL tip: Start with the table name, then pipe through filters (where), aggregations (summarize), projections (project), and sorting (order by). Each pipe feeds the output of one operator into the next, so queries read as a left-to-right pipeline.
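Once single-table queries feel natural, the join operator lets you correlate tables. As a sketch (assuming the standard Application Insights schema), this ties failed requests to the exceptions raised in the same operation via the shared operation_Id:

```kql
// Correlate failed requests with exceptions from the same operation
requests
| where timestamp > ago(1h) and success == false
| join kind=inner (
    exceptions
    | where timestamp > ago(1h)
  ) on operation_Id
| project timestamp, name, resultCode, type, outerMessage
```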
Alert Rules
Azure Monitor supports three types of alert rules:
| Alert Type | Triggers On | Evaluation Frequency | Best For |
|---|---|---|---|
| Metric alert | Metric crosses threshold | 1 minute to 1 hour | CPU, memory, disk, latency |
| Log alert | KQL query returns results | 1 minute to 1 day | Error patterns, security events |
| Activity log alert | Azure resource operations | Real-time | Resource deletions, role changes |
Creating a Metric Alert
# Alert when VM CPU exceeds 85% for 5 minutes
az monitor metrics alert create \
--resource-group rg-monitoring \
--name "High-CPU-Alert" \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Compute/virtualMachines/vm-web-01 \
--condition "avg Percentage CPU > 85" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--description "VM CPU has been above 85% for 5 minutes" \
--action /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Insights/actionGroups/ag-ops-team
Creating a Log Alert
# Alert when more than 10 failed requests in 15 minutes
az monitor scheduled-query create \
--resource-group rg-monitoring \
--name "Failed-Requests-Alert" \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod-central \
--condition "count > 10" \
--condition-query "requests | where success == false | summarize count()" \
--window-size 15m \
--evaluation-frequency 5m \
--severity 1 \
--action /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Insights/actionGroups/ag-ops-team
Action Groups
Action groups define who gets notified and how when an alert fires.
# Create an action group with email, SMS, and webhook
az monitor action-group create \
--resource-group rg-monitoring \
--name ag-ops-team \
--short-name OpsTeam \
--action email ops-lead ops-lead@company.com \
--action sms on-call-sms 1 5551234567 \
--action webhook pagerduty https://events.pagerduty.com/integration/abc123/enqueue
# Add a Logic App action for auto-remediation
az monitor action-group update \
--resource-group rg-monitoring \
--name ag-ops-team \
--add-action logicapp auto-restart /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Logic/workflows/restart-vm https://prod-01.eastus.logic.azure.com:443/workflows/abc123
Application Insights
Application Insights is Azure Monitor's APM solution. It automatically detects performance anomalies and tracks requests, dependencies, exceptions, and user behavior.
# Create an Application Insights resource
az monitor app-insights component create \
--resource-group rg-monitoring \
--app appi-webapp-prod \
--location eastus \
--kind web \
--application-type web \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod-central
What Application Insights Tracks
| Telemetry Type | What It Captures | Example |
|---|---|---|
| Requests | Incoming HTTP requests | GET /api/users took 245ms |
| Dependencies | Outbound calls | SQL query took 1200ms |
| Exceptions | Unhandled errors | NullReferenceException at line 42 |
| Page Views | Client-side navigation | User loaded /dashboard |
| Custom Events | Your business events | User clicked "Purchase" |
| Live Metrics | Real-time stream | Current requests/sec, failure rate |
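Dependency telemetry is often the fastest route to a root cause: a slow page is usually a slow downstream call. A query sketch over the standard dependencies table:

```kql
// Rank outbound dependencies by average duration over the last 24 hours
dependencies
| where timestamp > ago(24h)
| summarize AvgDurationMs = avg(duration), Calls = count() by target, type
| order by AvgDurationMs desc
```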
Smart Detection
Application Insights automatically detects anomalies without any configuration:
- Sudden spike in failure rates
- Abnormal response time increases
- Memory leak patterns
- Dependency failure anomalies
Diagnostic Settings
Diagnostic settings connect Azure resources to Log Analytics. Without them, platform logs never reach your workspace.
# Enable diagnostic settings for an App Service
az monitor diagnostic-settings create \
--resource /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.Web/sites/myapp-prod \
--name "send-to-law" \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod-central \
--logs '[{"categoryGroup": "allLogs", "enabled": true}]' \
--metrics '[{"category": "AllMetrics", "enabled": true}]'
# Enable diagnostic settings for a Key Vault
az monitor diagnostic-settings create \
--resource /subscriptions/<sub-id>/resourceGroups/rg-prod/providers/Microsoft.KeyVault/vaults/kv-prod-2025 \
--name "send-to-law" \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-prod-central \
--logs '[{"categoryGroup": "audit", "enabled": true}]' \
--metrics '[{"category": "AllMetrics", "enabled": true}]'
Cost Management for Logs
Log Analytics charges per GB ingested. Costs can spiral quickly if you log everything at debug level. Here is how to control it:
| Strategy | Impact |
|---|---|
| Set data collection rules to filter out noisy logs | Reduce ingestion volume |
| Use Basic Logs for high-volume, low-query tables | Much cheaper ingestion; limited KQL and shorter interactive retention |
| Set retention to 30-90 days for most tables | Reduce storage costs |
| Archive to Storage Account for compliance logs | Cheapest long-term storage |
| Use daily cap as a safety net | Prevent runaway costs |
# Set a daily cap on workspace ingestion
az monitor log-analytics workspace update \
--resource-group rg-monitoring \
--workspace-name law-prod-central \
--quota 5
The --quota flag sets a daily cap in GB. When the cap is reached, ingestion pauses until the next day. Use this as a safety net, not a primary cost control — you do not want critical logs dropping during an incident.
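Before reaching for a cap, find out where your gigabytes are actually going. The built-in Usage table records billable ingestion per data type (Quantity is reported in MB):

```kql
// Billable ingestion per table over the last 31 days, in GB
Usage
| where TimeGenerated > ago(31d)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000. by DataType
| order by IngestedGB desc
```

The noisiest tables at the top of this list are where data collection rules and Basic Logs will pay off first.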
Wrapping Up
Observability is not optional — it is the foundation of reliable operations. Set up a central Log Analytics workspace, enable diagnostic settings on every resource, and create alerts for the conditions that actually matter (not everything). Use Application Insights for application-level visibility, and learn KQL well enough to investigate incidents quickly. The goal is not to collect every possible metric — it is to have the right data available when things go wrong.
Next up: We will explore AKS (Azure Kubernetes Service) — running production Kubernetes clusters on Azure with node pools, networking, and integrated monitoring.
