Palo Alto Networks: Restore Failed Active Firewall with RMA Device

 


When the active firewall fails in an Active/Passive HA pair, the replacement procedure is different from a standard restore. Because the passive firewall has already taken over as active, your goal is to rebuild the failed unit so it can rejoin the pair as the new passive member — without causing network disruption or split-brain scenarios.


📋 Table of Contents

  1. Scenario Overview
  2. Critical Pre-Work
  3. Phase 1: Prepare Replacement Unit
  4. Phase 2: Obtain & Import Device State
  5. Phase 3: Critical HA Safety Configuration
  6. Phase 4: Connect & Synchronize
  7. Phase 5: Finalize & Verify
  8. Panorama-Specific Steps
  9. Troubleshooting
  10. Quick Reference

🔄 Scenario Overview

        BEFORE FAILURE                    AFTER FAILURE
    ┌─────────────────┐              ┌─────────────────┐
    │   ACTIVE (A)    │◄── FAILS     │   ACTIVE (A)    │──► FAILED
    │  Priority: 100  │              │  Priority: 100  │    (Needs RMA)
    │  192.168.1.1    │              │  192.168.1.1    │
    └────────┬────────┘              └─────────────────┘
             │                               
             │ HA Links                      
             │                               
    ┌────────┴────────┐              ┌─────────────────┐
    │   PASSIVE (B)   │              │   ACTIVE (B)    │◄── PROMOTED
    │  Priority: 200  │              │  Priority: 200  │
    │  192.168.1.2    │              │  192.168.1.2    │
    └─────────────────┘              └────────┬────────┘
                                              │
                                              │ HA Links (Disconnected)
                                              │
                                              ▼
                                    ┌─────────────────┐
                                    │  REPLACEMENT    │◄── NEW RMA
                                    │  (New Serial)   │    Goal: Passive
                                    │  Temp IP: 1.3   │
                                    └─────────────────┘
🔑 Key Principle: Use the Device State file exported from the currently active (formerly passive) firewall. This preserves HA peer runtime and sync info that a simple named configuration backup would lose.

⚠️ Critical Pre-Work Before Starting

1. Verify Surviving Peer Stability

  • Confirm the passive firewall has successfully taken over active duties
  • Verify traffic is flowing normally through the current active unit
  • Check Dashboard > High Availability widget shows healthy status

2. Document Current Settings

Take screenshots of the following from the current active firewall:

Setting                        Location
-----------------------------  ----------------------------------------------
HA General Settings            Device > High Availability > General
Device Priority & Preemption   Device > High Availability > Election Settings
Management Interface IP        Device > Setup > Management
Hostname of Failed Unit        Device > Setup > Management > Hostname

⚠️ Warning: Do NOT power off or disrupt the current active firewall during this process. It is the only source of truth for your HA configuration.

🔧 Phase 1: Prepare Replacement Unit

Step 1: Register & License New Device

  1. Log into Palo Alto Support Portal
  2. Transfer licenses from old (failed) serial number to new (replacement) serial number
  3. On the new device: Device > Licenses > Retrieve license keys from license server

Step 2: Configure Temporary Network Access

# Use a TEMPORARY unique IP initially to avoid conflicts
# Example: If failed unit was 192.168.1.1, use 192.168.1.3 temporarily

configure
set deviceconfig system ip-address 192.168.1.3 netmask 255.255.255.0
set deviceconfig system default-gateway 192.168.1.254
set deviceconfig system dns-setting servers primary 8.8.8.8
commit
✅ Verify: Test internet connectivity: ping source 192.168.1.3 host updates.paloaltonetworks.com
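
Before committing the temporary address, it can help to sanity-check it against the addresses already in use. The following is a minimal Python sketch, not part of any Palo Alto tooling: the function name `check_temp_ip` and the `in_use` inventory are hypothetical, and the addresses are the example values used in this article.

```python
import ipaddress

def check_temp_ip(temp_ip, netmask, gateway, in_use):
    """Return a list of problems with a proposed temporary management IP."""
    # strict=False lets us pass a host address plus netmask and get the network back
    network = ipaddress.ip_network(f"{temp_ip}/{netmask}", strict=False)
    candidate = ipaddress.ip_address(temp_ip)
    problems = []
    if ipaddress.ip_address(gateway) not in network:
        problems.append(f"gateway {gateway} is outside {network}")
    if candidate in (network.network_address, network.broadcast_address):
        problems.append(f"{temp_ip} is the network or broadcast address")
    for label, used in in_use.items():
        if ipaddress.ip_address(used) == candidate:
            problems.append(f"{temp_ip} collides with {label}")
    return problems

# Hypothetical inventory based on the example addressing in this article
in_use = {
    "failed unit": "192.168.1.1",
    "current active": "192.168.1.2",
    "gateway": "192.168.1.254",
}
print(check_temp_ip("192.168.1.3", "255.255.255.0", "192.168.1.254", in_use))  # []
print(check_temp_ip("192.168.1.2", "255.255.255.0", "192.168.1.254", in_use))
```

The second call reports a collision with the current active firewall, which is exactly the situation the temporary IP is meant to avoid.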

Step 3: Match Software Versions

Component              GUI Path                                  Must Match Peer?
---------------------  ----------------------------------------  ----------------
PAN-OS Version         Device > Software                         YES - Critical
Applications Database  Device > Dynamic Updates > Applications   YES - Critical
Threat Database        Device > Dynamic Updates > Threats        YES - Critical
Antivirus Database     Device > Dynamic Updates > Anti-Virus     Recommended

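# CLI: download, then install, content updates (the result must match the peer's versions)
# Wait for each download job to finish before installing (check with: show jobs all)
request content upgrade download latest
request content upgrade install version latest
request anti-virus upgrade download latest
request anti-virus upgrade install version latest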
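
To make the version comparison less error-prone, the values read from each firewall can be compared side by side. This is a small Python sketch under stated assumptions: the version strings are made-up placeholders typed in by hand, and `compare_versions` is a hypothetical helper, not a Palo Alto API.

```python
# Components from the table above, split by how strictly they must match
CRITICAL = ("pan-os", "applications", "threats")
RECOMMENDED = ("antivirus",)

def compare_versions(peer, replacement):
    """Return (severity, component, peer_version, replacement_version) for each mismatch."""
    findings = []
    for component in CRITICAL + RECOMMENDED:
        peer_version = peer.get(component)
        replacement_version = replacement.get(component)
        if peer_version != replacement_version:
            severity = "CRITICAL" if component in CRITICAL else "recommended"
            findings.append((severity, component, peer_version, replacement_version))
    return findings

# Placeholder values only; read the real ones from each firewall
peer = {"pan-os": "X.Y.Z", "applications": "1111-1", "threats": "1111-1", "antivirus": "2222-1"}
replacement = {"pan-os": "X.Y.Z", "applications": "1110-9", "threats": "1111-1", "antivirus": "2222-1"}

for finding in compare_versions(peer, replacement):
    print(finding)
```

An empty result means every listed component matches; any CRITICAL entry should be resolved before moving on to the device state import.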

Step 4: Match Special Settings

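  • Multi-VSYS mode: Device > Setup > Management > General Settings (Multi Virtual System Capability)
  • Jumbo Frames: Device > Setup > Session > Session Settings
  • FIPS-CC mode: enabled from maintenance mode rather than the web interface; match the peer before importing anything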

📥 Phase 2: Obtain & Import Device State

Step 1: Export Device State from Current Active

┌─────────────────────────────────────────┐
│  Current Active Firewall (Formerly      │
│  Passive) - Source of Truth             │
│                                         │
│  GUI: Device > Setup > Operations       │
│       ↓                                 │
│  [Export Device State]                  │
│       ↓                                 │
│  Save: device_state_active.tgz          │
└─────────────────────────────────────────┘
  1. Log into the web interface of the currently active (formerly passive) firewall
  2. Navigate to Device > Setup > Operations
  3. Click Export Device State and save the .tgz file
💡 Why Device State? Unlike a named configuration backup, Device State includes:
  • HA peer runtime and sync information
  • IPSec key material
  • Certificate private keys
  • Master key configuration

Step 2: Import to Replacement Unit

# On the NEW replacement firewall:
Device > Setup > Operations > Import Device State
Select file: device_state_active.tgz
🛑 CRITICAL: Do NOT commit yet! The imported Device State carries the management IP, hostname, and HA settings of the firewall it was exported from (the current active unit). Committing now would cause an IP conflict with that firewall.

🛡️ Phase 3: Critical HA Safety Configuration

Before connecting the HA links or data interfaces, you must force the replacement to stay passive to prevent split-brain or election conflicts.

Step 1: Disable Config Sync

configure
# On some older PAN-OS releases the group ID follows "group" (for example "group 1 ...")
set deviceconfig high-availability group configuration-synchronization enabled no

Step 2: Disable Preemption

set deviceconfig high-availability group election-option preemptive no

Step 3: Set the Highest Device Priority Value

# Lower number wins election, so a HIGHER number keeps this unit passive
# The current active is 200 in the example above, so set the replacement to 255

set deviceconfig high-availability group election-option device-priority 255
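
The interaction between priority and preemption is easy to get backwards, so here is a deliberately simplified Python model of it. It assumes only two rules: a lower priority value is preferred, and preemption only happens when it is enabled on both peers. It ignores hold timers, link and path monitoring, and the initial election when both units boot together. The function name is hypothetical.

```python
def replacement_takes_over(replacement_priority, active_priority,
                           replacement_preempt, active_preempt):
    """Simplified model: would a healthy replacement preempt the current active?"""
    # Preemption only applies when BOTH peers have it enabled
    both_preempt = replacement_preempt and active_preempt
    # A numerically lower priority value is the preferred unit
    return both_preempt and replacement_priority < active_priority

# Safety configuration from this phase: priority 255, preemption off
print(replacement_takes_over(255, 200, False, True))   # False
# Original design restored: priority 100 with preemption on both peers
print(replacement_takes_over(100, 200, True, True))    # True
```

The first call shows why the safety settings keep the replacement passive; the second shows that restoring the original priority with preemption enabled is effectively a planned failover.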

Step 4: Restore Identity Settings

Change these to match the failed unit's original settings (from your screenshots):

# Restore original management IP (was temporary during setup)
set deviceconfig system ip-address 192.168.1.1 netmask 255.255.255.0

# Restore original hostname
set deviceconfig system hostname FW-Active-Original

# Restore HA peer IP (IP of current active unit)
set deviceconfig high-availability group peer-ip 192.168.1.2

Step 5: Force Commit

# Force commit to apply all changes
commit force
⚠️ Verify Before Proceeding: Confirm the replacement unit now shows:
  • Hostname: Original failed unit's hostname
  • Management IP: Original failed unit's IP
  • HA Priority: 255 (or higher than current active)
  • Config Sync: Disabled
  • Preemption: Disabled
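
The checklist above can also be expressed as a small pre-connection gate. This Python sketch is illustrative only: `HaSettings` and `pre_connect_problems` are hypothetical names, and the values are entered by hand from what each firewall displays rather than fetched from a device.

```python
from dataclasses import dataclass

@dataclass
class HaSettings:
    hostname: str
    mgmt_ip: str
    device_priority: int
    config_sync: bool
    preemptive: bool

def pre_connect_problems(replacement, active, expected_hostname, expected_ip):
    """Return reasons why the replacement is not yet safe to cable into the HA pair."""
    problems = []
    if replacement.hostname != expected_hostname:
        problems.append("hostname does not match the failed unit")
    if replacement.mgmt_ip != expected_ip:
        problems.append("management IP does not match the failed unit")
    if replacement.mgmt_ip == active.mgmt_ip:
        problems.append("management IP collides with the current active")
    if replacement.device_priority <= active.device_priority:
        problems.append("device priority value must be higher than the current active")
    if replacement.config_sync:
        problems.append("config sync is still enabled")
    if replacement.preemptive:
        problems.append("preemption is still enabled")
    return problems

# Example values taken from this article's scenario
active = HaSettings("FW-B", "192.168.1.2", 200, True, False)
replacement = HaSettings("FW-Active-Original", "192.168.1.1", 255, False, False)
print(pre_connect_problems(replacement, active, "FW-Active-Original", "192.168.1.1"))  # []
```

An empty list corresponds to every item in the checklist being satisfied; anything else should be fixed before moving to Phase 4.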

🔗 Phase 4: Connect & Synchronize

Connection Sequence (Critical!)

┌─────────────────────────────────────────────────────────────┐
│               CONNECTION ORDER (DO NOT SKIP!)               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Step 1: Connect HA1 Control Link ONLY                      │
│          ├─ Verify HA control plane communication           │
│          └─ Check: show high-availability state             │
│                                                             │
│  Step 2: Verify Config Sync Status                          │
│          ├─ Dashboard > HA widget                           │
│          └─ CLI: show high-availability all                 │
│                                                             │
│  Step 3: Connect HA2 Data Link                              │
│          ├─ Enables session synchronization                 │
│          └─ show high-availability state-synchronization    │
│                                                             │
│  Step 4: Enable Config Sync on Replacement                  │
│          ├─ Device > HA > General > Enable Config Sync      │
│          └─ Commit                                          │
│                                                             │
│  Step 5: Connect Data Plane Interfaces                      │
│          ├─ Only after confirming sync is healthy           │
│          └─ Monitor for traffic disruption                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Step 1: Connect HA1 Control Link Only

# Verify HA communication
show high-availability state

# Expected output (abbreviated and illustrative; the real layout varies by PAN-OS version):
HA State:
  State: passive (HA is functional but not active)
  Peer State: active
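
If you prefer to poll the HA state from a script rather than the CLI, the same operational command can be sent through the PAN-OS XML API. The sketch below only builds the request URL and parses a response; it does not contact a firewall. The sample XML is hand-written and heavily abbreviated, and the element paths (`result/group/local-info/state` and `result/group/peer-info/state`) are assumptions to confirm against a real response from your PAN-OS version. The host name and API key are placeholders.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML form of the operational command "show high-availability state"
HA_STATE_CMD = "<show><high-availability><state></state></high-availability></show>"

def ha_state_url(host, api_key):
    """Build the XML API URL for the HA state operational command."""
    query = urllib.parse.urlencode({"type": "op", "cmd": HA_STATE_CMD, "key": api_key})
    return f"https://{host}/api/?{query}"

def parse_ha_states(xml_text):
    """Return (local_state, peer_state) from an HA state response (paths are assumed)."""
    root = ET.fromstring(xml_text)
    local_state = root.findtext("./result/group/local-info/state")
    peer_state = root.findtext("./result/group/peer-info/state")
    return local_state, peer_state

# Hand-written, abbreviated sample; a real response contains many more fields
SAMPLE = """<response status="success"><result><enabled>yes</enabled><group>
<local-info><state>passive</state></local-info>
<peer-info><state>active</state></peer-info>
</group></result></response>"""

print(ha_state_url("replacement-fw.example.com", "PLACEHOLDER_KEY"))
print(parse_ha_states(SAMPLE))
```

Parsing the two states separately makes it straightforward to alert if the replacement ever reports anything other than passive while the peer is active.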

Step 2: Verify Configuration Synchronization

# Check running config sync status
show high-availability all | match "Running Configuration"

# Verify no idmgr differences
debug device-server dump idmgr high-availability state

Step 3: Enable Config Sync & Synchronize

configure
set deviceconfig high-availability group configuration-synchronization enabled yes
commit

# From current active unit, push config to peer if needed
request high-availability sync-to-remote running-config

Step 4: Connect Remaining Links

  1. Connect HA2 link for session synchronization
  2. Verify session sync: show high-availability state-synchronization
  3. Connect data plane interfaces only after confirming sync is healthy
  4. Monitor traffic: show counter global filter delta yes

✅ Phase 5: Finalize & Verify

Step 1: Restore Original HA Design

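# If your design requires preemption, re-enable it (it only takes effect when enabled on BOTH peers)
configure
set deviceconfig high-availability group election-option preemptive yes

# Adjust priority back to original (e.g., 100 if this should be active eventually).
# With preemption enabled on both peers, a value lower than the current active's
# (200 in the example) triggers a failover to this unit, so do this in a maintenance window.
set deviceconfig high-availability group election-option device-priority 100
commit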

Step 2: Final Verification Checklist

Check         Command / Location                            Expected Result
------------  --------------------------------------------  -------------------------------------
HA Status     show high-availability state                  Passive, Peer = Active
Config Sync   show high-availability all                    "Running Configuration: synchronized"
Session Sync  show high-availability state-synchronization  Sessions syncing with peer
Traffic Flow  show counter global                           Counters incrementing
VPN Tunnels   show vpn ipsec-sa                             Tunnels established (if applicable)

🌐 Panorama-Specific Steps

If your firewalls are managed by Panorama, you must update the serial number mapping before the replacement connects.

Replace Serial Number in Panorama

# Connect to Panorama CLI
ssh admin@panorama.company.com

# Replace old failed serial with new replacement serial (operational mode)
replace device old 007200000123 new 007200000456

# Commit changes (from configuration mode)
configure
commit
⏰ Timing: Do this before connecting the replacement firewall to the network. If Panorama still references the old serial, it will reject the new device's connection attempts.

Verify Panorama Registration

# On Panorama
show devices all

# Verify new serial appears with correct hostname

🛠️ Troubleshooting

"HA peer not detected"
  Cause: HA1 link down or wrong port
  Solution: Verify that the HA1 cable and port mapping match the original setup

"Config sync failed"
  Cause: PAN-OS version mismatch
  Solution: Upgrade/downgrade the replacement to the exact peer version

"Commit failed - Master Key"
  Cause: Original master key changed from default
  Solution: Configure the same master key on the replacement before importing (Device > Master Key and Diagnostics)

"Panorama not connecting"
  Cause: Serial number not updated in Panorama
  Solution: Run replace device old <SN> new <SN> on Panorama

"Split-brain detected"
  Cause: Both units think they are active
  Solution: Immediately disable HA on the replacement, then check priority/preemption

"IP conflict" alarms
  Cause: Management IP not changed before commit
  Solution: Use a temporary IP during setup, restore the original before the final commit

Emergency: Break Split-Brain

# On replacement unit (if it incorrectly became active):
configure
set deviceconfig high-availability enabled no
commit

# Fix priority/settings, then re-enable HA (still in configuration mode)
set deviceconfig high-availability group election-option device-priority 255
set deviceconfig high-availability enabled yes
commit

📖 Quick Command Reference

Task                         CLI Command
---------------------------  -------------------------------------------------------
Check HA state               show high-availability state
Check HA all details         show high-availability all
Sync config to peer          request high-availability sync-to-remote running-config
Suspend HA (maintenance)     request high-availability state suspend
Resume HA                    request high-availability state functional
Export device state          scp export device-state to user@host:path (or GUI)
Force commit                 commit force
Replace device in Panorama   replace device old <SN> new <SN>
Send gratuitous ARP          test arp gratuitous ip <ip> interface <iface>

🔗 Reference

✅ Success Criteria: After completing this procedure, you should have:
  • Current active firewall: Still active, processing traffic (zero downtime)
  • Replacement firewall: Joined as passive, fully synchronized
  • HA pair: Healthy with config sync and session sync operational
  • Panorama: Updated with new serial number (if applicable)
💡 Pro Tip: Schedule a non-disruptive failover test 24-48 hours after restoration to verify the replacement unit can successfully take over as active when needed.
