🔥 Palo Alto Networks: Restore Failed Active Firewall with RMA Device
When the active firewall fails in an Active/Passive HA pair, the replacement procedure is different from a standard restore. Because the passive firewall has already taken over as active, your goal is to rebuild the failed unit so it can rejoin the pair as the new passive member — without causing network disruption or split-brain scenarios.
📋 Table of Contents
- Scenario Overview
- Critical Pre-Work
- Phase 1: Prepare Replacement Unit
- Phase 2: Obtain & Import Device State
- Phase 3: Critical HA Safety Configuration
- Phase 4: Connect & Synchronize
- Phase 5: Finalize & Verify
- Panorama-Specific Steps
- Troubleshooting
- Quick Reference
🔄 Scenario Overview
  BEFORE FAILURE                       AFTER FAILURE
┌─────────────────┐                 ┌─────────────────┐
│   ACTIVE (A)    │◄── FAILS        │   ACTIVE (A)    │──► FAILED
│  Priority: 100  │                 │  Priority: 100  │    (Needs RMA)
│  192.168.1.1    │                 │  192.168.1.1    │
└────────┬────────┘                 └─────────────────┘
         │
         │ HA Links
         │
┌────────┴────────┐                 ┌─────────────────┐
│   PASSIVE (B)   │                 │   ACTIVE (B)    │◄── PROMOTED
│  Priority: 200  │                 │  Priority: 200  │
│  192.168.1.2    │                 │  192.168.1.2    │
└─────────────────┘                 └────────┬────────┘
                                             │
                                             │ HA Links (Disconnected)
                                             │
                                             ▼
                                    ┌─────────────────┐
                                    │   REPLACEMENT   │◄── NEW RMA
                                    │  (New Serial)   │    Goal: Passive
                                    │  Temp IP: 1.3   │
                                    └─────────────────┘
🔑 Key Principle: Use the Device State file exported from the currently active (formerly passive) firewall. This preserves HA peer runtime and sync info that a simple named configuration backup would lose.
⚠️ Critical Pre-Work Before Starting
1. Verify Surviving Peer Stability
- Confirm the passive firewall has successfully taken over active duties
- Verify traffic is flowing normally through the current active unit
- Check that the Dashboard > High Availability widget shows a healthy status
2. Document Current Settings
Take screenshots of the following from the current active firewall:
| Setting | Location |
|---|---|
| HA General Settings | Device > High Availability > General |
| Device Priority & Preemption | Device > High Availability > General > Election Settings |
| Management Interface IP | Device > Setup > Management |
| Hostname of Failed Unit | Device > Setup > Management > Hostname |
⚠️ Warning: Do NOT power off or disrupt the current active firewall during this process. It is the only source of truth for your HA configuration.
🔧 Phase 1: Prepare Replacement Unit
Step 1: Register & License New Device
- Log into Palo Alto Support Portal
- Transfer licenses from old (failed) serial number to new (replacement) serial number
- On the new device:
Device > Licenses > Retrieve license keys from license server
Step 2: Configure Temporary Network Access
# Use a TEMPORARY unique IP initially to avoid conflicts
# Example: If failed unit was 192.168.1.1, use 192.168.1.3 temporarily
configure
set deviceconfig system ip-address 192.168.1.3 netmask 255.255.255.0
set deviceconfig system default-gateway 192.168.1.254
set deviceconfig system dns-setting servers primary 8.8.8.8
commit
✅ Verify: Test internet connectivity:
ping source 192.168.1.3 host updates.paloaltonetworks.com
Step 3: Match Software Versions
| Component | Command / Path | Must Match Peer? |
|---|---|---|
| PAN-OS Version | Device > Software | YES - Critical |
| Applications Database | Device > Dynamic Updates > Applications | YES - Critical |
| Threat Database | Device > Dynamic Updates > Threats | YES - Critical |
| Antivirus Database | Device > Dynamic Updates > Anti-Virus | Recommended |
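# CLI: download, then install, the latest content updates
# (if the peer is not on the latest release, install the version that matches the peer instead)
request content upgrade download latest
request content upgrade install version latest
request anti-virus upgrade download latest
request anti-virus upgrade install version latest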
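Before moving on, it can help to compare the two units component by component. The following is a minimal sketch, assuming you have copied the version strings by hand from each firewall; the dictionary keys and the version values are illustrative placeholders, not output from a real device.

```python
# Components from the table above, grouped by how strictly they must match.
CRITICAL = ("panos", "applications", "threats")
RECOMMENDED = ("antivirus",)


def compare_versions(peer: dict, replacement: dict) -> dict:
    """Return the components whose versions differ, grouped by severity."""
    result = {"critical": [], "recommended": []}
    for severity, keys in (("critical", CRITICAL), ("recommended", RECOMMENDED)):
        for key in keys:
            if peer.get(key) != replacement.get(key):
                result[severity].append(key)
    return result


# Hypothetical version strings, for illustration only.
peer = {"panos": "10.2.4", "applications": "8700-7991",
        "threats": "8700-7991", "antivirus": "4400-4917"}
replacement = {"panos": "10.2.4", "applications": "8699-7985",
               "threats": "8700-7991", "antivirus": "4399-4916"}

mismatches = compare_versions(peer, replacement)
print(mismatches)
```

An empty `critical` list is the condition to reach before importing the Device State; entries under `recommended` are worth fixing but are not treated as blockers in this sketch.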
Step 4: Match Special Settings
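- Multi-VSYS mode: Device > Setup > Management > General Settings (Multi Virtual System Capability)
- Jumbo Frames: Device > Setup > Session > Session Settings
- FIPS-CC Mode: set from the Maintenance Recovery Tool (maintenance mode), not from the web interface; it must match the peer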
📥 Phase 2: Obtain & Import Device State
Step 1: Export Device State from Current Active
┌─────────────────────────────────────────┐
│ Current Active Firewall (Formerly       │
│ Passive) - Source of Truth              │
│                                         │
│ GUI: Device > Setup > Operations        │
│        ↓                                │
│ [Export Device State]                   │
│        ↓                                │
│ Save: device_state_active.tgz           │
└─────────────────────────────────────────┘
- Log into the web interface of the currently active (formerly passive) firewall
- Navigate to Device > Setup > Operations
- Click Export Device State and save the .tgz file
💡 Why Device State? Unlike a named configuration backup, Device State includes:
- HA peer runtime and sync information
- IPSec key material
- Certificate private keys
- Master key configuration
Step 2: Import to Replacement Unit
# On the NEW replacement firewall:
Device > Setup > Operations > Import Device State
Select file: device_state_active.tgz
🛑 CRITICAL: Do NOT commit yet! The imported Device State was exported from the current active firewall, so it carries that firewall's management IP and hostname. Committing now would cause an IP conflict with the current active firewall.
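To illustrate the conflict, here is a minimal sketch that reads the management identity out of a configuration XML fragment and compares it with the active firewall's IP. The XML fragment is a heavily trimmed illustration (a real configuration contains far more), and the hostname `FW-B` and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real configuration has many more elements.
SAMPLE_CONFIG = """
<config>
  <devices>
    <entry name="localhost.localdomain">
      <deviceconfig>
        <system>
          <hostname>FW-B</hostname>
          <ip-address>192.168.1.2</ip-address>
        </system>
      </deviceconfig>
    </entry>
  </devices>
</config>
"""


def management_identity(config_xml: str) -> tuple:
    """Return (hostname, management IP) found in the configuration XML."""
    root = ET.fromstring(config_xml)
    system = root.find("./devices/entry/deviceconfig/system")
    return system.findtext("hostname"), system.findtext("ip-address")


def conflicts_with_active(config_xml: str, active_ip: str) -> bool:
    """True if committing this configuration would reuse the active unit's IP."""
    _, mgmt_ip = management_identity(config_xml)
    return mgmt_ip == active_ip


hostname, mgmt_ip = management_identity(SAMPLE_CONFIG)
print(hostname, mgmt_ip)
print(conflicts_with_active(SAMPLE_CONFIG, "192.168.1.2"))
```

In this example the check returns `True`, which is exactly the situation the warning describes: the identity settings need to be changed before any commit.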
🛡️ Phase 3: Critical HA Safety Configuration
Before connecting to the network, you must force the replacement to stay passive to prevent split-brain or election conflicts.
Step 1: Disable Config Sync
configure
set deviceconfig high-availability group configuration-synchronization enabled no
Step 2: Disable Preemption
set deviceconfig high-availability group election-option preemptive no
Step 3: Set Highest Device Priority
# Lower number wins election, so HIGHER number = stays passive
# Current active is 200 in the diagram above; set replacement to 255
set deviceconfig high-availability group election-option device-priority 255
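The reasoning behind disabling preemption and choosing a high priority value can be modeled roughly as follows. This is a deliberately simplified sketch of the election behavior described in this phase, not PAN-OS's actual algorithm; the function name and return labels are made up for illustration.

```python
def active_after_join(active_priority: int, joining_priority: int,
                      preemption_enabled: bool) -> str:
    """Return which unit holds the active role once the joining unit is up.

    Simplified model: the unit that is already active keeps the role unless
    preemption is enabled and the joining unit has the lower (better) value.
    """
    if preemption_enabled and joining_priority < active_priority:
        return "joining"
    return "current"


# Replacement configured as in this phase: priority 255, preemption off.
print(active_after_join(200, 255, preemption_enabled=False))  # current
# The risky combination this phase avoids: lower value plus preemption.
print(active_after_join(200, 100, preemption_enabled=True))   # joining
```

Either safeguard alone keeps the current unit active in this model; the procedure applies both so that a single misconfiguration does not trigger a failover.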
Step 4: Restore Identity Settings
Change these to match the failed unit's original settings (from your screenshots):
# Restore original management IP (was temporary during setup)
set deviceconfig system ip-address 192.168.1.1 netmask 255.255.255.0
# Restore original hostname
set deviceconfig system hostname FW-Active-Original
# Restore HA peer IP (IP of current active unit)
set deviceconfig high-availability group peer-ip 192.168.1.2
Step 5: Force Commit
# Force commit to apply all changes
commit force
⚠️ Verify Before Proceeding: Confirm the replacement unit now shows:
- Hostname: Original failed unit's hostname
- Management IP: Original failed unit's IP
- HA Priority: 255 (or higher than current active)
- Config Sync: Disabled
- Preemption: Disabled
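The checklist above can be expressed as a small pre-connection check. This is a sketch under the assumption that you record each unit's settings by hand; the `HaSettings` class, its field names, and the hostname `FW-B` are hypothetical and do not correspond to any PAN-OS API.

```python
from dataclasses import dataclass


@dataclass
class HaSettings:
    hostname: str
    mgmt_ip: str
    device_priority: int
    config_sync: bool
    preemptive: bool


def safety_problems(replacement: HaSettings, active: HaSettings,
                    expected_hostname: str, expected_ip: str) -> list:
    """Return human-readable problems; an empty list means safe to connect."""
    problems = []
    if replacement.hostname != expected_hostname:
        problems.append("hostname does not match the failed unit")
    if replacement.mgmt_ip != expected_ip:
        problems.append("management IP does not match the failed unit")
    if replacement.mgmt_ip == active.mgmt_ip:
        problems.append("management IP conflicts with the active unit")
    if replacement.device_priority <= active.device_priority:
        problems.append("device priority value is not higher than the active unit")
    if replacement.config_sync:
        problems.append("config sync is still enabled")
    if replacement.preemptive:
        problems.append("preemption is still enabled")
    return problems


# Values taken from the examples in this article; FW-B is a placeholder name.
active = HaSettings("FW-B", "192.168.1.2", 200, True, False)
replacement = HaSettings("FW-Active-Original", "192.168.1.1", 255, False, False)
print(safety_problems(replacement, active, "FW-Active-Original", "192.168.1.1"))
```

An empty list corresponds to every item in the checklist being satisfied; anything else should be resolved before a cable is connected.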
🔗 Phase 4: Connect & Synchronize
Connection Sequence (Critical!)
┌─────────────────────────────────────────────────────────────┐
│               CONNECTION ORDER (DO NOT SKIP!)               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Step 1: Connect HA1 Control Link ONLY                       │
│   ├─ Verify HA control plane communication                  │
│   └─ Check: show high-availability state                    │
│                                                             │
│ Step 2: Verify Config Sync Status                           │
│   ├─ Dashboard > HA widget                                  │
│   └─ CLI: show high-availability all                        │
│                                                             │
│ Step 3: Connect HA2 Data Link                               │
│   ├─ Enables session synchronization                        │
│   └─ Verify: show high-availability state-synchronization   │
│                                                             │
│ Step 4: Enable Config Sync on Replacement                   │
│   ├─ Device > HA > General > Enable Config Sync             │
│   └─ Commit                                                 │
│                                                             │
│ Step 5: Connect Data Plane Interfaces                       │
│   ├─ Only after confirming sync is healthy                  │
│   └─ Monitor for traffic disruption                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Step 1: Connect HA1 Control Link Only
# Verify HA communication
show high-availability state
# Expected output:
HA State:
  State: passive (HA is functional but not active)
  Peer State: active
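If you capture the command output to a text file, the check can be automated. The sketch below parses only the simplified output shape shown in this article; real `show high-availability state` output is longer and laid out differently, so treat the sample text and the regular expression as assumptions to adapt.

```python
import re

# Simplified sample modeled on the expected output above (an assumption,
# not a capture from a real device).
SAMPLE_OUTPUT = """\
HA State:
  State: passive (HA is functional but not active)
  Peer State: active
"""

# Matches "State: x" and "Peer State: x" at the start of a line.
STATE_LINE = re.compile(r"^[ \t]*(Peer )?State:[ \t]*([\w-]+)", re.MULTILINE)


def parse_ha_states(output: str) -> dict:
    """Return {'local': ..., 'peer': ...} for the states found in the text."""
    states = {}
    for is_peer, value in STATE_LINE.findall(output):
        states["peer" if is_peer else "local"] = value.lower()
    return states


def safe_to_continue(states: dict) -> bool:
    """Proceed only if the replacement is passive and the peer is active."""
    return states.get("local") == "passive" and states.get("peer") == "active"


states = parse_ha_states(SAMPLE_OUTPUT)
print(states)
print(safe_to_continue(states))
```

If both sides report `active`, the function returns `False`; that combination is the split-brain case covered in the troubleshooting section.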
Step 2: Verify Configuration Synchronization
# Check running config sync status
show high-availability all | match "Running Configuration"
# Verify no idmgr differences
debug device-server dump idmgr high-availability state
Step 3: Enable Config Sync & Synchronize
configure
set deviceconfig high-availability group configuration-synchronization enabled yes
commit
# From current active unit, push config to peer if needed
request high-availability sync-to-remote running-config
Step 4: Connect Remaining Links
- Connect HA2 link for session synchronization
- Verify session sync:
show high-availability state-synchronization
- Connect data plane interfaces only after confirming sync is healthy
- Monitor traffic:
show counter global filter delta yes
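Because the order of these steps matters, a runbook script can refuse to advance until the previous step has been verified. The following is a minimal sketch of that idea; the step names and the `ConnectionPlan` class are hypothetical, and the verification itself is still something you do by hand with the commands above.

```python
class ConnectionPlan:
    """Tracks the Phase 4 steps and refuses to run them out of order."""

    def __init__(self, steps):
        self._steps = list(steps)
        self._completed = 0

    def next_step(self):
        """Return the name of the next pending step, or None when finished."""
        if self._completed < len(self._steps):
            return self._steps[self._completed]
        return None

    def complete(self, step: str, verified: bool) -> None:
        """Mark a step done only if it is the next one and was verified."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        if not verified:
            raise RuntimeError(f"{step!r} has not been verified; stop here")
        self._completed += 1


# Hypothetical step names following the numbered steps in this phase.
PHASE_4_STEPS = [
    "connect_ha1",
    "verify_config_sync",
    "enable_config_sync",
    "connect_ha2",
    "connect_data_plane",
]

plan = ConnectionPlan(PHASE_4_STEPS)
plan.complete("connect_ha1", verified=True)
print(plan.next_step())  # verify_config_sync
```

Raising an exception rather than logging a warning is a deliberate choice in this sketch: connecting the data plane early is the kind of mistake that should stop the runbook outright.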
✅ Phase 5: Finalize & Verify
Step 1: Restore Original HA Design
# If your design requires preemption, re-enable it
configure
set deviceconfig high-availability group election-option preemptive yes
# Adjust priority back to original (e.g., 100 if this should be active eventually)
# Note: with preemption enabled on both peers, a lower value than the peer's triggers a failover back to this unit
set deviceconfig high-availability group election-option device-priority 100
commit
Step 2: Final Verification Checklist
| Check | Command / Location | Expected Result |
|---|---|---|
| HA Status | show high-availability state | Passive, Peer = Active |
| Config Sync | show high-availability all | "Running Configuration: synchronized" |
| Session Sync | show high-availability state-synchronization | Sessions syncing to peer |
| Traffic Flow | show counter global | Counters incrementing |
| VPN Tunnels | show vpn ipsec-sa | Tunnels established (if applicable) |
🌐 Panorama-Specific Steps
If your firewalls are managed by Panorama, you must update the serial number mapping before the replacement connects.
Replace Serial Number in Panorama
# Connect to Panorama CLI
ssh admin@panorama.company.com
# Replace old failed serial with new replacement serial
replace device old 007200000123 new 007200000456
# Commit changes
commit
⏰ Timing: Do this before connecting the replacement firewall to the network. If Panorama still references the old serial, it will reject the new device's connection attempts.
Verify Panorama Registration
# On Panorama
show devices all
# Verify new serial appears with correct hostname
🛠️ Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| "HA peer not detected" | HA1 link down or wrong port | Verify HA1 cable and port mapping matches original setup |
| "Config sync failed" | PAN-OS version mismatch | Upgrade/downgrade replacement to exact peer version |
| "Commit failed - Master Key" | Original master key changed from default | Configure the same master key on the replacement before import (Device > Master Key and Diagnostics) |
| "Panorama not connecting" | Serial number not updated in Panorama | Run replace device old <SN> new <SN> on Panorama |
| "Split-brain detected" | Both units think they are active | Immediately disable HA on replacement, check priority/preemption |
| "IP conflict" alarms | Management IP not changed before commit | Use temporary IP during setup, restore original before final commit |
Emergency: Break Split-Brain
# On replacement unit (if it incorrectly became active):
configure
set deviceconfig high-availability enabled no
commit
# Fix priority/settings, then re-enable HA
set deviceconfig high-availability enabled yes
set deviceconfig high-availability group election-option device-priority 255
commit
📖 Quick Command Reference
| Task | CLI Command |
|---|---|
| Check HA state | show high-availability state |
| Check HA all details | show high-availability all |
| Sync config to peer | request high-availability sync-to-remote running-config |
| Suspend HA (maintenance) | request high-availability state suspend |
| Resume HA | request high-availability state functional |
| Export device state | scp export device-state to user@host:path (or GUI) |
| Force commit | commit force |
| Replace device in Panorama | replace device old <SN> new <SN> |
| Send gratuitous ARP | test arp gratuitous ip <ip> interface <iface> |
🔗 Reference
- Palo Alto HA Administration Guide
- KB: How to Replace a Failed Firewall in an HA Pair
- Panorama CLI Device Management
✅ Success Criteria: After completing this procedure, you should have:
- Current active firewall: Still active, processing traffic (zero downtime)
- Replacement firewall: Joined as passive, fully synchronized
- HA pair: Healthy with config sync and session sync operational
- Panorama: Updated with new serial number (if applicable)
💡 Pro Tip: Schedule a non-disruptive failover test 24-48 hours after restoration to verify the replacement unit can successfully take over as active when needed.