Project 4: Disaster Recovery Solution
Overview
Implement a comprehensive disaster recovery solution using Azure Backup and Azure Site Recovery. This project covers VM backup, file-level recovery, and cross-region replication for business continuity.
Difficulty: Intermediate
Duration: 3-4 hours
Cost: ~$50-100/month (ASR, backup storage)
Architecture Diagram
+----------------------------------------------------------------+
| PRIMARY REGION (East US)                                       |
| +------------------------------------------------------------+ |
| | VNet: vnet-primary (10.0.0.0/16)                           | |
| | Subnet: snet-workloads (10.0.1.0/24)                       | |
| | +----------------+  +----------------+  +----------------+ | |
| | | vm-web-01      |  | vm-app-01      |  | vm-sql-01      | | |
| | | Web Server     |  | App Server     |  | SQL Server     | | |
| | | Azure Backup   |  | Azure Backup   |  | Azure Backup   | | |
| | | Agent          |  | Agent          |  | + SQL Backup   | | |
| | +----------------+  +----------------+  +----------------+ | |
| +------------------------------------------------------------+ |
|                                |                               |
|                                v                               |
| +------------------------------------------------------------+ |
| | RECOVERY SERVICES VAULT (rsv-primary-eastus)               | |
| | Backup Policies:                                           | |
| |   - Daily backup at 2:00 AM UTC                            | |
| |   - Retain daily backups: 30 days                          | |
| |   - Retain weekly backups: 12 weeks                        | |
| |   - Retain monthly backups: 12 months                      | |
| |   - Retain yearly backups: 3 years                         | |
| | Azure Site Recovery (Replication to DR Region)             | |
| |   - RPO: 15 minutes                                        | |
| |   - RTO: < 2 hours                                         | |
| +------------------------------------------------------------+ |
+----------------------------------------------------------------+
                                 |
                                 |  Continuous Replication
                                 |  (Azure Site Recovery)
                                 v
+----------------------------------------------------------------+
| SECONDARY REGION (West US 2)                                   |
| +------------------------------------------------------------+ |
| | VNet: vnet-dr (10.1.0.0/16)                                | |
| | Subnet: snet-dr (10.1.1.0/24)                              | |
| | +----------------+  +----------------+  +----------------+ | |
| | | vm-web-01      |  | vm-app-01      |  | vm-sql-01      | | |
| | | (Replica)      |  | (Replica)      |  | (Replica)      | | |
| | | STANDBY        |  | STANDBY        |  | STANDBY        | | |
| | | Powered off    |  | Powered off    |  | Powered off    | | |
| | | until failover |  | until failover |  | until failover | | |
| | +----------------+  +----------------+  +----------------+ | |
| +------------------------------------------------------------+ |
|                                                                |
| +------------------------------------------------------------+ |
| | GRS Storage (Backup Data Replication)                      | |
| |   - Automatic geo-replication of backup data               | |
| |   - Cross-region restore capability                        | |
| +------------------------------------------------------------+ |
+----------------------------------------------------------------+
Disaster Recovery Flow:
+----------+    +----------+    +----------+    +--------------+
| Disaster |--->| Detect   |--->| Failover |--->| VMs Active   |
| Event    |    | & Alert  |    | to DR    |    | in DR Region |
+----------+    +----------+    +----------+    +--------------+
What You'll Learn
- Configure Azure Backup for VMs
- Create and manage backup policies
- Perform file-level and full VM restore
- Set up Azure Site Recovery (ASR)
- Execute test failover and planned failover
- Implement recovery plans with automation
Prerequisites
- Azure subscription
- Azure CLI installed
- Two Azure regions available
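The prerequisites above can be checked from the shell before starting Phase 1. The following is a minimal sketch: `require_cmds` and `regions_differ` are hypothetical helper names (not Azure CLI features), and the region names are the ones used throughout this project.

```shell
#!/usr/bin/env bash
# Pre-flight check (sketch). Helper names are illustrative only.

# Return 0 if every named command is on PATH; report each missing one.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 only if both region names are set and differ from each other.
regions_differ() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

if require_cmds az; then
  echo "Azure CLI found"
else
  echo "Install the Azure CLI before continuing"
fi

if regions_differ "eastus" "westus2"; then
  echo "Primary and DR regions are distinct"
fi
```

Running `az account show` afterwards is a quick way to confirm that the CLI is also logged in to the intended subscription.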
Phase 1: Set Up Primary Infrastructure
Step 1.1: Create Resource Groups
bash
# Set variables
PRIMARY_LOCATION="eastus"
DR_LOCATION="westus2"
RG_PRIMARY="rg-dr-lab-eastus"
RG_DR="rg-dr-lab-westus2"
# Create primary resource group
az group create \
--name $RG_PRIMARY \
--location $PRIMARY_LOCATION \
--tags Project=DisasterRecovery Environment=Lab Role=Primary
# Create DR resource group
az group create \
--name $RG_DR \
--location $DR_LOCATION \
--tags Project=DisasterRecovery Environment=Lab Role=DR
echo "Resource groups created in both regions"
Step 1.2: Create Primary VNet
bash
# Create primary VNet
az network vnet create \
--resource-group $RG_PRIMARY \
--name vnet-primary \
--address-prefix 10.0.0.0/16 \
--subnet-name snet-workloads \
--subnet-prefix 10.0.1.0/24 \
--location $PRIMARY_LOCATION
# Add Azure Bastion subnet
az network vnet subnet create \
--resource-group $RG_PRIMARY \
--vnet-name vnet-primary \
--name AzureBastionSubnet \
--address-prefix 10.0.2.0/27
Step 1.3: Create DR VNet
bash
# Create DR VNet (separate, non-overlapping address space; used as the failover target)
az network vnet create \
--resource-group $RG_DR \
--name vnet-dr \
--address-prefix 10.1.0.0/16 \
--subnet-name snet-dr \
--subnet-prefix 10.1.1.0/24 \
--location $DR_LOCATION
Step 1.4: Deploy Primary VMs
bash
ADMIN_USER="azureadmin"
# Lab-only credential. Single quotes prevent bash history expansion on "!".
ADMIN_PASSWORD='P@ssw0rd123!Complex'
# Create Web VM
az vm create \
--resource-group $RG_PRIMARY \
--name vm-web-01 \
--vnet-name vnet-primary \
--subnet snet-workloads \
--image Win2022Datacenter \
--size Standard_D2s_v3 \
--admin-username $ADMIN_USER \
--admin-password $ADMIN_PASSWORD \
--public-ip-address "" \
--no-wait
# Create App VM
az vm create \
--resource-group $RG_PRIMARY \
--name vm-app-01 \
--vnet-name vnet-primary \
--subnet snet-workloads \
--image Ubuntu2204 \
--size Standard_D2s_v3 \
--admin-username $ADMIN_USER \
--generate-ssh-keys \
--public-ip-address "" \
--no-wait
# Create SQL VM
az vm create \
--resource-group $RG_PRIMARY \
--name vm-sql-01 \
--vnet-name vnet-primary \
--subnet snet-workloads \
--image MicrosoftSQLServer:sql2022-ws2022:standard-gen2:latest \
--size Standard_D4s_v3 \
--admin-username $ADMIN_USER \
--admin-password $ADMIN_PASSWORD \
--public-ip-address "" \
--no-wait
# Wait for VMs
echo "Waiting for VMs to be created..."
az vm wait --resource-group $RG_PRIMARY --name vm-web-01 --created
az vm wait --resource-group $RG_PRIMARY --name vm-app-01 --created
az vm wait --resource-group $RG_PRIMARY --name vm-sql-01 --created
echo "All VMs created"
Phase 2: Configure Azure Backup
Step 2.1: Create Recovery Services Vault
bash
# Create Recovery Services Vault
az backup vault create \
--resource-group $RG_PRIMARY \
--name rsv-primary-eastus \
--location $PRIMARY_LOCATION
echo "Recovery Services Vault created"
Step 2.2: Configure Vault Settings
bash
# Set vault storage redundancy to Geo-Redundant (GRS)
az backup vault backup-properties set \
--resource-group $RG_PRIMARY \
--name rsv-primary-eastus \
--backup-storage-redundancy GeoRedundant
# Enable cross-region restore (the flag takes True or False)
az backup vault backup-properties set \
--resource-group $RG_PRIMARY \
--name rsv-primary-eastus \
--cross-region-restore-flag True
echo "Vault configured with GRS and cross-region restore"
Step 2.3: Create Backup Policy
bash
# Create enhanced backup policy JSON
cat > backup-policy.json << 'EOF'
{
"properties": {
"backupManagementType": "AzureIaasVM",
"schedulePolicy": {
"schedulePolicyType": "SimpleSchedulePolicy",
"scheduleRunFrequency": "Daily",
"scheduleRunTimes": ["2024-01-01T02:00:00Z"],
"scheduleWeeklyFrequency": 0
},
"retentionPolicy": {
"retentionPolicyType": "LongTermRetentionPolicy",
"dailySchedule": {
"retentionTimes": ["2024-01-01T02:00:00Z"],
"retentionDuration": {
"count": 30,
"durationType": "Days"
}
},
"weeklySchedule": {
"daysOfTheWeek": ["Sunday"],
"retentionTimes": ["2024-01-01T02:00:00Z"],
"retentionDuration": {
"count": 12,
"durationType": "Weeks"
}
},
"monthlySchedule": {
"retentionScheduleFormatType": "Weekly",
"retentionScheduleWeekly": {
"daysOfTheWeek": ["Sunday"],
"weeksOfTheMonth": ["First"]
},
"retentionTimes": ["2024-01-01T02:00:00Z"],
"retentionDuration": {
"count": 12,
"durationType": "Months"
}
},
"yearlySchedule": {
"retentionScheduleFormatType": "Weekly",
"monthsOfYear": ["January"],
"retentionScheduleWeekly": {
"daysOfTheWeek": ["Sunday"],
"weeksOfTheMonth": ["First"]
},
"retentionTimes": ["2024-01-01T02:00:00Z"],
"retentionDuration": {
"count": 3,
"durationType": "Years"
}
}
},
"instantRpRetentionRangeInDays": 5,
"timeZone": "UTC"
}
}
EOF
# Create the custom policy from the JSON file.
# If the CLI rejects the definition, create the policy in the portal instead.
az backup policy create \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--name EnhancedVMPolicy \
--backup-management-type AzureIaasVM \
--policy backup-policy.json || echo "Policy creation failed; use the portal to create the custom policy"
Step 2.4: Enable Backup for VMs
bash
# Enable backup for Web VM
az backup protection enable-for-vm \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--vm vm-web-01 \
--policy-name DefaultPolicy
# Enable backup for App VM
az backup protection enable-for-vm \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--vm vm-app-01 \
--policy-name DefaultPolicy
# Enable backup for SQL VM
az backup protection enable-for-vm \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--vm vm-sql-01 \
--policy-name DefaultPolicy
echo "Backup enabled for all VMs"
Step 2.5: Trigger Initial Backup
bash
# Get container names
WEB_CONTAINER=$(az backup container list \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--backup-management-type AzureIaasVM \
--query "[?contains(name, 'vm-web-01')].name" -o tsv)
# Trigger backup for Web VM
az backup protection backup-now \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--container-name $WEB_CONTAINER \
--item-name vm-web-01 \
--retain-until $(date -u -d "+30 days" +%d-%m-%Y) \
--backup-management-type AzureIaasVM
echo "Initial backup triggered"
Phase 3: Configure Azure Site Recovery
Step 3.1: Create Cache Storage Account
bash
# Create cache storage account for ASR
CACHE_STORAGE="asrcache$(date +%s | tail -c 8)"
az storage account create \
--resource-group $RG_PRIMARY \
--name $CACHE_STORAGE \
--location $PRIMARY_LOCATION \
--sku Standard_LRS \
--kind StorageV2
echo "Cache storage account created: $CACHE_STORAGE"
Step 3.2: Configure Site Recovery (Azure Portal)
Portal Configuration Required
Some ASR configurations are easier through the Azure Portal. Follow these steps:
Navigate to Recovery Services Vault:
- Go to rsv-primary-eastus
- Click "Site Recovery" → "Prepare Infrastructure"
Configure Protection Goal:
yaml
Where are your machines located: Azure
Where do you want to replicate: To Azure
Configure Source Settings:
yaml
Source: East US
Subscription: Your subscription
Resource group: rg-dr-lab-eastus
Deployment model: Resource Manager
Configure Target Settings:
yaml
Target region: West US 2
Target subscription: Same
Target resource group: rg-dr-lab-westus2
Target virtual network: vnet-dr
Cache storage account: asrcache[timestamp]
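Because the target virtual network is vnet-dr, it can be worth confirming that its address space (10.1.0.0/16) does not overlap with vnet-primary (10.0.0.0/16), which matters if the two networks are ever peered or connected. The sketch below does the comparison in plain shell arithmetic; `ip_to_int`, `cidr_range`, and `cidrs_overlap` are hypothetical helper names, not Azure CLI features.

```shell
#!/usr/bin/env bash
# Sketch: check whether two IPv4 CIDR blocks overlap.
# All function names are illustrative helpers.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( ($a << 24) | ($b << 16) | ($c << 8) | $d ))
}

# Print the first and last address of a CIDR block as two integers.
cidr_range() {
  local ip bits base mask first last
  ip=${1%/*}
  bits=${1#*/}
  base=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  first=$(( $base & $mask ))
  last=$(( $first | ($mask ^ 0xFFFFFFFF) ))
  echo "$first $last"
}

# Return 0 if the two CIDR blocks share at least one address.
cidrs_overlap() {
  local r1 r2
  r1=$(cidr_range "$1")
  r2=$(cidr_range "$2")
  [ "${r1% *}" -le "${r2#* }" ] && [ "${r2% *}" -le "${r1#* }" ]
}

# Address spaces used in this project
if cidrs_overlap "10.0.0.0/16" "10.1.0.0/16"; then
  echo "WARNING: vnet-primary and vnet-dr overlap"
else
  echo "OK: address spaces do not overlap"
fi
```

Two ranges overlap exactly when each one starts at or before the other ends, which is the test `cidrs_overlap` applies.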
Step 3.3: Enable Replication via CLI
bash
# Get source VM details
WEB_VM_ID=$(az vm show -g $RG_PRIMARY -n vm-web-01 --query id -o tsv)
APP_VM_ID=$(az vm show -g $RG_PRIMARY -n vm-app-01 --query id -o tsv)
# Get target subnet ID
DR_SUBNET_ID=$(az network vnet subnet show \
--resource-group $RG_DR \
--vnet-name vnet-dr \
--name snet-dr \
--query id -o tsv)
# Enable replication for Web VM (using portal is recommended for first time)
echo "Enable replication through Portal:"
echo "1. Recovery Services Vault → Site Recovery → Replicated items"
echo "2. Click + Replicate"
echo "3. Select VMs: vm-web-01, vm-app-01, vm-sql-01"
echo "4. Configure target settings"
echo "5. Enable replication"
Step 3.4: Monitor Replication Status
bash
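# Requires the site-recovery extension (az extension add --name site-recovery).
# Set FABRIC_NAME and PROTECTION_CONTAINER to the source-region values reported by
# "az site-recovery fabric list" and "az site-recovery protection-container list".
# List replicated items
az site-recovery protected-item list \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--fabric-name $FABRIC_NAME \
--protection-container $PROTECTION_CONTAINER \
--output table
# Check replication health (the item name may be an ASR-generated ID rather than the VM name)
az site-recovery protected-item show \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--fabric-name $FABRIC_NAME \
--protection-container $PROTECTION_CONTAINER \
--replicated-protected-item-name vm-web-01 \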
--query "properties.replicationHealth" -o tsv
Phase 4: Create Recovery Plan
Step 4.1: Create Recovery Plan (Portal)
- Navigate to Recovery Services Vault → Recovery Plans
- Click "+ Recovery plan"
- Configure:
yaml
Name: rp-webapp-dr
Source: East US
Target: West US 2
Allow items with deployment model: Resource Manager
Select items:
Group 1: vm-sql-01 (Database - start first)
Group 2: vm-app-01 (Application tier)
Group 3: vm-web-01 (Web tier - start last)
Step 4.2: Add Pre/Post Actions
Add automation runbooks to recovery plan:
Pre-failover script (sample):
powershell
# pre-failover.ps1
param(
[Parameter(Mandatory=$true)]
[string]$RecoveryPlanContext
)
# Parse context
$context = ConvertFrom-Json $RecoveryPlanContext
# Send notification
$webhook = "https://your-webhook-url"
$body = @{
text = "DR Failover initiated for $($context.RecoveryPlanName)"
} | ConvertTo-Json
Invoke-RestMethod -Uri $webhook -Method Post -Body $body -ContentType "application/json"
Post-failover script (sample):
powershell
# post-failover.ps1
param(
[Parameter(Mandatory=$true)]
[string]$RecoveryPlanContext
)
$context = ConvertFrom-Json $RecoveryPlanContext
# Update DNS records
# Connect to DNS provider API and update records
# Verify services are running
# Test application endpoints
Write-Output "Post-failover tasks completed"
Phase 5: Testing Disaster Recovery
Step 5.1: Test Failover (Non-disruptive)
bash
# Perform test failover via CLI
az site-recovery recovery-plan test-failover \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--name rp-webapp-dr \
--direction PrimaryToRecovery \
--network-type VmNetworkAsInput
Or via Portal:
- Go to Recovery Plans → rp-webapp-dr
- Click "Test failover"
- Select recovery point (Latest processed recommended)
- Select test VNet (creates isolated environment)
- Click OK
Step 5.2: Validate Test Failover
bash
# List VMs in DR region (test VMs will have -test suffix)
az vm list --resource-group $RG_DR --output table
# Verify test VMs are running
az vm get-instance-view \
--resource-group $RG_DR \
--name vm-web-01-test \
--query "instanceView.statuses[1].displayStatus" -o tsv
Validation Checklist:
- [ ] VMs are powered on
- [ ] Network connectivity works
- [ ] Applications respond correctly
- [ ] Data is consistent
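The connectivity items on this checklist can be partly scripted. The sketch below probes TCP ports using bash's `/dev/tcp` device and the coreutils `timeout` command; `check_tcp` and `run_checks` are hypothetical helper names, and the IP addresses in the commented usage are placeholders for whatever addresses the test-failover VMs receive.

```shell
#!/usr/bin/env bash
# Sketch: scripted connectivity checks for a test failover.
# Helper names and example addresses are illustrative only.

# Return 0 if a TCP connection to host:port succeeds within the timeout.
# $1 = host, $2 = port, $3 = timeout in seconds (default 3)
check_tcp() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" </dev/null 2>/dev/null
}

# Read "label host port" lines from stdin and print PASS or FAIL for each.
# Returns 1 if any check failed.
run_checks() {
  local failures=0 label host port
  while read -r label host port; do
    [ -z "$label" ] && continue
    if check_tcp "$host" "$port"; then
      echo "PASS $label"
    else
      echo "FAIL $label"
      failures=$((failures + 1))
    fi
  done
  return "$(( failures > 0 ))"
}

# Example usage (placeholder addresses; run from a host inside the test VNet):
# run_checks <<'CHECKS'
# web-http 10.1.1.4 80
# app-ssh  10.1.1.5 22
# sql      10.1.1.6 1433
# CHECKS
```

An open port only shows that a service is listening. Application responses and data consistency still need a check against the application itself.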
Step 5.3: Cleanup Test Failover
bash
# Cleanup test failover
az site-recovery recovery-plan test-failover-cleanup \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--name rp-webapp-dr
echo "Test failover cleanup initiated"
Phase 6: File-Level Recovery
Step 6.1: Generate Recovery Script
bash
# Get latest recovery point
az backup recoverypoint list \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--container-name $WEB_CONTAINER \
--item-name vm-web-01 \
--output table
# Note the recovery point name (e.g., "12345678901234")
Step 6.2: Mount Recovery Volume (Portal)
- Go to Recovery Services Vault → Backup Items → Azure Virtual Machine
- Select vm-web-01
- Click "File Recovery"
- Select recovery point
- Download and run the executable on a Windows machine
- Mounted disks appear as additional drives
- Browse and copy needed files
- Click "Unmount Disks" when done
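As an alternative to the portal steps, the Azure CLI provides `az backup restore files mount-rp` and `az backup restore files unmount-rp`. The sketch below wraps them in hypothetical helpers (`run`, `mount_recovery_point`, `unmount_recovery_point`) that print the command by default rather than executing it; set `DRY_RUN=0` to run for real. Check `az backup restore files --help` for the options your CLI version supports.

```shell
#!/usr/bin/env bash
# Sketch: file-level recovery through the CLI, dry-run by default.
# Helper names are illustrative; only the az subcommands come from the Azure CLI.

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Request the mount script for a recovery point.
# Arguments: resource group, vault, container, item, recovery point name
mount_recovery_point() {
  run az backup restore files mount-rp \
    --resource-group "$1" --vault-name "$2" \
    --container-name "$3" --item-name "$4" --rp-name "$5"
}

# Close access to the recovery point once the files are copied.
unmount_recovery_point() {
  run az backup restore files unmount-rp \
    --resource-group "$1" --vault-name "$2" \
    --container-name "$3" --item-name "$4" --rp-name "$5"
}

# Placeholders are used when the variables from earlier steps are not set.
mount_recovery_point "rg-dr-lab-eastus" "rsv-primary-eastus" \
  "${WEB_CONTAINER:-<container-name>}" "vm-web-01" \
  "${RECOVERY_POINT:-<recovery-point-name>}"
```

Defaulting to dry-run lets the commands be reviewed before anything touches the vault.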
Phase 7: Full VM Restore
Step 7.1: Restore VM to New Location
bash
# Get recovery point
RECOVERY_POINT=$(az backup recoverypoint list \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--container-name $WEB_CONTAINER \
--item-name vm-web-01 \
--query "[0].name" -o tsv)
# Restore the VM's disks (a VM can then be created from the restored disks)
az backup restore restore-disks \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--container-name $WEB_CONTAINER \
--item-name vm-web-01 \
--rp-name $RECOVERY_POINT \
--storage-account $CACHE_STORAGE \
--target-resource-group $RG_PRIMARY
echo "Disk restore initiated. Monitor progress in portal."
Step 7.2: Cross-Region Restore
bash
# Restore to the secondary region (requires cross-region restore enabled).
# Cross-region restore targets the vault's Azure paired region (West US for an
# East US vault), so the staging storage account must exist in that region.
az backup restore restore-disks \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--container-name $WEB_CONTAINER \
--item-name vm-web-01 \
--rp-name $RECOVERY_POINT \
--storage-account "<storage-account-in-paired-region>" \
--target-resource-group $RG_DR \
--use-secondary-region
Phase 8: Monitoring and Alerts
Step 8.1: Configure Backup Alerts
bash
# Create action group
az monitor action-group create \
--resource-group $RG_PRIMARY \
--name ag-backup-alerts \
--short-name BackupAlrt \
--action email admin admin@contoso.com
# Configure backup alerts in vault
# This is done via Portal:
# Recovery Services Vault → Monitoring → Alerts → Manage alert rules
Step 8.2: View Backup Reports
bash
# Configure diagnostic settings
VAULT_ID=$(az backup vault show \
--resource-group $RG_PRIMARY \
--name rsv-primary-eastus \
--query id -o tsv)
# Create Log Analytics workspace
az monitor log-analytics workspace create \
--resource-group $RG_PRIMARY \
--workspace-name law-backup
LAW_ID=$(az monitor log-analytics workspace show \
--resource-group $RG_PRIMARY \
--workspace-name law-backup \
--query id -o tsv)
# Enable diagnostics
az monitor diagnostic-settings create \
--resource $VAULT_ID \
--name "backup-diagnostics" \
--workspace $LAW_ID \
--logs '[
{"category": "AzureBackupReport", "enabled": true},
{"category": "CoreAzureBackup", "enabled": true},
{"category": "AddonAzureBackupJobs", "enabled": true},
{"category": "AddonAzureBackupAlerts", "enabled": true},
{"category": "AddonAzureBackupPolicy", "enabled": true},
{"category": "AddonAzureBackupStorage", "enabled": true},
{"category": "AddonAzureBackupProtectedInstance", "enabled": true}
]'
DR Metrics Summary
| Metric | Target | Configuration |
|---|---|---|
| RPO (Recovery Point Objective) | 15 minutes | ASR continuous replication |
| RTO (Recovery Time Objective) | < 2 hours | Recovery plan automation |
| Backup Frequency | Daily | 2:00 AM UTC |
| Retention - Daily | 30 days | |
| Retention - Weekly | 12 weeks | |
| Retention - Monthly | 12 months | |
| Retention - Yearly | 3 years | |
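The RPO target in this table can be turned into a simple check: compare the age of the most recent recovery point with the 15-minute target. The sketch below uses hypothetical helper names (`rp_age_minutes`, `rpo_status`) and an example timestamp; how the latest recovery point time is obtained from ASR is left out and would need to be supplied.

```shell
#!/usr/bin/env bash
# Sketch: compare a recovery point's age with the RPO target.
# Helper names are illustrative; timestamps are Unix epoch seconds.

RPO_TARGET_MINUTES=15   # from the DR metrics table

# Print the age of a recovery point in whole minutes.
# $1 = recovery point time, $2 = current time (both epoch seconds)
rp_age_minutes() {
  echo $(( ($2 - $1) / 60 ))
}

# Print OK or BREACH depending on whether the age is within the target.
# $1 = age in minutes, $2 = target in minutes
rpo_status() {
  if [ "$1" -le "$2" ]; then
    echo "OK"
  else
    echo "BREACH"
  fi
}

now=$(date +%s)
last_rp=$(( now - 600 ))   # example value: a recovery point taken 10 minutes ago
age=$(rp_age_minutes "$last_rp" "$now")
echo "Recovery point age: ${age} min -> $(rpo_status "$age" "$RPO_TARGET_MINUTES")"
```

The same comparison works for RTO by timing a test failover and comparing the elapsed minutes with the 120-minute target.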
Cleanup
bash
# Disable replication first
# Portal: Replicated items β Select VMs β Disable replication
# Stop backup protection
az backup protection disable \
--resource-group $RG_PRIMARY \
--vault-name rsv-primary-eastus \
--container-name $WEB_CONTAINER \
--item-name vm-web-01 \
--delete-backup-data true \
--yes
# Delete resource groups
az group delete --name $RG_PRIMARY --yes --no-wait
az group delete --name $RG_DR --yes --no-wait
echo "Cleanup initiated"
Key Takeaways
- RPO vs RTO: Understand business requirements
- GRS Storage: Automatic geo-replication of backups
- ASR: Low RPO (15-minute target in this project) through continuous replication
- Recovery Plans: Orchestrated failover with automation
- Test Failovers: Regular DR testing is essential
- Cross-Region Restore: Restore backups in the vault's paired secondary region when the primary region is unavailable
Next Steps
- Implement Azure Backup for Azure SQL
- Configure backup for Azure Files
- Set up Azure Automation runbooks for DR
- Implement Azure Traffic Manager for DNS failover