FortiGate Firewall HA

Commands:

get system status | grep HA

get system ha status
diagnose sys ha status
diag sys ha dump-by group  << (see the uptime)
diag sys ha reset-uptime       << Trigger failover when override is disabled (default)

exe ha manage [0 | 1] admin  << Connect to standby unit

Another way to trigger a manual failover:
      exe ha failover status
      exe ha failover unset 1
      exe ha failover set 1


A cluster is formed from 2 to 4 FortiGate devices.

A cluster includes one device that acts as the primary FortiGate (also called the active FortiGate). The primary synchronizes its configuration, session information, FIB entries, FortiGuard definitions, and other operation-related information to the secondary devices, which are also known as standby devices.

The cluster shares one or more heartbeat interfaces among all devices (also known as members) for synchronizing data and monitoring the health of each member.

In either of A-P (active-passive) or A-A (active-active) HA operation modes, the operation information (configuration, sessions, FIB entries, and so on) of the primary FortiGate is synchronized with secondary devices. 

In A-P mode, the primary FortiGate is the only FortiGate that actively processes traffic. Secondary FortiGate devices remain in passive mode, monitoring the status of the primary device.
If a problem is detected on the primary FortiGate, one of the secondary devices takes over the primary role.

Like A-P HA, in A-A HA, the operation-related data of the primary FortiGate is synchronized to the secondary FortiGate devices. Also, if a problem is detected on the primary device, one of the secondary devices takes over the role of the primary, to process the traffic.
However, one of the main differences from active-passive mode is that in active-active mode, all cluster members can process traffic. That is, based on the HA settings and traffic type, the primary FortiGate can distribute sessions to the secondary devices.

FortiGate HA offers several solutions for adding redundancy in the case where a failure occurs on the FortiGate, or is detected by the FortiGate through monitored links, routes, and other health checks.

FortiGate Clustering Protocol (FGCP)

All FortiGates in the cluster must be the same model, run the same firmware version, and have the same hardware configuration (such as the same number of hard disks).

Critical cluster components

The following are critical components in an HA cluster:

  •  Heartbeat connections: members will use this to communicate with each other. In general, a two-member cluster is most common. We recommend double back-to-back heartbeat connections.
  •  Identical connections for internal and external interfaces: as demonstrated in the topology, we recommend similar connections from each member to the switches for the cluster to function properly.

The following are best practices for general cluster operation:

  • Ensure that heartbeat communication is present.
  • Enable the session synchronization option in daily operation.
  •  Monitor traffic flowing in and out of the interfaces.

FortiGate Session Life Support Protocol (FGSP)

  External load balancers or routers can distribute sessions among the FortiGates, and FGSP performs session synchronization.


VRRP

FortiGates can function as primary or backup Virtual Router Redundancy Protocol (VRRP) routers. The FortiGates can quickly and easily integrate into a network that has already deployed VRRP.


Failover

FGCP provides failover protection in the following scenarios:

  •  The active device loses power.
  •  A monitored interface loses a connection.


Synchronizing the configuration

The following settings are not synchronized between cluster units:
  • The FortiGate host name
  • GUI Dashboard widgets
  • HA override
  • HA device priority
  • The virtual cluster priority
  • The HA priority setting for a ping server (or dead gateway detection) configuration
  • The system interface settings of the HA reserved management interface
  • The HA default route for the reserved management interface, set using the ha-mgmt-interface-gateway option of the config system ha command
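When comparing two members by hand, it helps to ignore the settings that are expected to differ. The following is a minimal Python sketch of that idea. The dictionary keys are illustrative labels chosen for this example (not parsed from FortiOS), and only three of the non-synchronized settings are modelled.

```python
# Settings the list above names as NOT synchronized.
# The key names are labels chosen for this sketch, not FortiOS output.
NOT_SYNCED = {"hostname", "override", "priority"}


def unexpected_differences(primary: dict, secondary: dict) -> dict:
    """Return settings that differ although they should be synchronized."""
    diffs = {}
    for key in sorted(primary.keys() | secondary.keys()):
        if key in NOT_SYNCED:
            continue  # expected to differ per unit
        if primary.get(key) != secondary.get(key):
            diffs[key] = (primary.get(key), secondary.get(key))
    return diffs


unit1 = {"hostname": "Forti-HA-1", "priority": 200, "group-name": "FORTI_HA", "mode": "a-p"}
unit2 = {"hostname": "Forti-HA-2", "priority": 100, "group-name": "FORTI_HA", "mode": "a-p"}
print(unexpected_differences(unit1, unit2))
```

An empty result means the only differences are ones the cluster is not expected to synchronize.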


Active/Passive HA

2 to 4 devices
1 HA heartbeat link (2 preferred)



Summary:

A-P HA has only one IP address for each interface pair, active on the primary unit. When failover occurs, the IP and virtual MAC addresses move to the new primary unit.
The primary is the active (master, work) firewall; the secondary is the standby firewall.

The heartbeat (HB) link runs between the two units.

By default, the device with the higher HA uptime (by more than 5 minutes) becomes the primary.

To prefer one unit as the primary whenever possible, enable Override and set a higher Device Priority on that unit.
With HA Override enabled, Device Priority is checked before HA uptime (Override is preempt). This is not recommended. HA uptime is the difference in uptime between the units from when they joined the cluster, so one unit always shows 0, and the other unit's value is only taken into account when it is bigger than 300 seconds. HA uptime is NOT system uptime.

HA Override is disabled by default. When two new units boot at almost the same time to build a cluster, Device Priority determines the primary unit. When one unit in a cluster fails and a replacement unit joins the cluster, the existing unit normally has the longer uptime, so the replacement unit joins as the secondary.

In A-A HA, only sessions that are subject to proxy inspection are distributed to secondary devices. To force the distribution of sessions that are subject to flow inspection or no inspection at all, you must enable the load-balance-all setting under the HA configuration. This setting is disabled by default.
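The distribution rule above can be written as a one-line predicate. This is only a sketch of the rule as stated in these notes; the inspection labels "proxy", "flow" and "none" are names chosen for the example.

```python
VALID_INSPECTION = {"proxy", "flow", "none"}  # labels chosen for this sketch


def can_be_distributed(inspection: str, load_balance_all: bool = False) -> bool:
    """Return True if an A-A primary may hand the session to a secondary."""
    if inspection not in VALID_INSPECTION:
        raise ValueError(f"unknown inspection mode: {inspection!r}")
    # Proxy-inspected sessions are always eligible; everything else only
    # when load-balance-all is enabled (it is disabled by default).
    return inspection == "proxy" or load_balance_all


for mode in ("proxy", "flow", "none"):
    print(mode, can_be_distributed(mode), can_be_distributed(mode, load_balance_all=True))
```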



Check HA uptime:

diag sys ha dump-by group
or
diag sys ha dump-by vcluster

'FGVMEVB0DJ6EXX57': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=0/0
'FGVMEVWNBN-LQQ11': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=7/0

The unit showing 0 is the device with the lowest HA uptime. The other unit in this example has an HA uptime 7 seconds longer. This does not change unless the HA uptime is reset or a unit reboots.

If a monitored interface fails, or a member reboots, the HA uptime for that member is reset to 0.
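To pull the uptime/reset_cnt pair out of saved dump-by output, a small regular expression is enough. This is a minimal sketch that assumes the line layout shown in the sample above; other firmware versions may print the fields differently.

```python
import re

# Matches: 'SERIAL': ... uptime/reset_cnt=<uptime>/<resets>
LINE_RE = re.compile(r"'([^']+)':.*?uptime/reset_cnt=(\d+)/(\d+)")

SAMPLE = """\
'FGVMEVB0DJ6EXX57': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=0/0
'FGVMEVWNBN-LQQ11': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=7/0
"""


def parse_ha_uptime(text: str) -> dict:
    """Map serial number -> (ha_uptime, reset_count) for every member line."""
    return {
        serial: (int(uptime), int(resets))
        for serial, uptime, resets in LINE_RE.findall(text)
    }


for serial, (uptime, resets) in parse_ha_uptime(SAMPLE).items():
    print(serial, uptime, resets)
```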


Running the following command on the primary firewall resets its HA uptime and triggers a manual failover (only when the uptime is more than 300 seconds).

diag sys ha reset-uptime


Running reset-uptime on the unit whose HA uptime is 0 makes the other unit's HA uptime increase.


Check system uptime:

get system performance status

===CLI======

1. Configure the first unit

FortiGate-VM64-KVM # config system ha
FortiGate-VM64-KVM (ha) # set group-id 10
FortiGate-VM64-KVM (ha) # set group-name FORTI_HA
FortiGate-VM64-KVM (ha) # set mode a-p
FortiGate-VM64-KVM (ha) # set password 123456
FortiGate-VM64-KVM (ha) # set hbdev port4 0
FortiGate-VM64-KVM (ha) # set priority 200        !!highest wins
FortiGate-VM64-KVM (ha) # set monitor port4
FortiGate-VM64-KVM (ha) # end

FortiGate-VM64-KVM # config system interface
FortiGate-VM64-KVM (interface) # edit port1
FortiGate-VM64-KVM (port1) # set mode static
FortiGate-VM64-KVM (port1) # set ip 192.168.2.101/24
FortiGate-VM64-KVM (port1) # set allowaccess http
FortiGate-VM64-KVM (port1) # end
FortiGate-VM64-KVM # config router static
FortiGate-VM64-KVM (static) # edit 1
FortiGate-VM64-KVM (1) # set dst 0.0.0.0/0
FortiGate-VM64-KVM (1) # set gateway 192.168.2.1
FortiGate-VM64-KVM (1) # set device port1
FortiGate-VM64-KVM (1) # end
FortiGate-VM64-KVM # config system global
FortiGate-VM64-KVM (global) # set hostname Forti-HA-1
FortiGate-VM64-KVM (global) # end
Forti-HA-1 # 


2. Configure the 2nd unit
FortiGate-VM64-KVM # config system ha
FortiGate-VM64-KVM (ha) # set group-id 10
FortiGate-VM64-KVM (ha) # set group-name FORTI_HA
FortiGate-VM64-KVM (ha) # set mode a-p
FortiGate-VM64-KVM (ha) # set password 123456
FortiGate-VM64-KVM (ha) # set hbdev port4 0
FortiGate-VM64-KVM (ha) # set priority 100
FortiGate-VM64-KVM (ha) # set monitor port4
FortiGate-VM64-KVM (ha) # end
FortiGate-VM64-KVM # config system global
FortiGate-VM64-KVM (global) # set hostname Forti-HA-2
FortiGate-VM64-KVM (global) # end
Forti-HA-2 # 

Wait about one minute!!

===============CLI + GUI==========

1. Config IP on an interface for GUI access
FortiGate-VM64-KVM # config system interface
FortiGate-VM64-KVM (interface) # edit port2
FortiGate-VM64-KVM (port2) # set mode static
FortiGate-VM64-KVM (port2) # set ip 10.10.0.1/24
FortiGate-VM64-KVM (port2) # set allowaccess http https ssh ping
FortiGate-VM64-KVM (port2) # end

2. Log in to the GUI and change the hostname

3. Set the HA mode to Active-Passive, increase the device priority from the default 128 to 250, set the group name and password, enable session pickup, and specify the heartbeat interfaces and their priorities. Change the Group ID from the CLI if needed.

4. On unit-2, configure a temporary IP on an interface for GUI access
FortiGate-VM64-KVM # config system interface
FortiGate-VM64-KVM (interface) # edit port2
FortiGate-VM64-KVM (port2) # set mode static
FortiGate-VM64-KVM (port2) # set ip 10.10.0.2/24
FortiGate-VM64-KVM (port2) # set allowaccess http https ssh ping
FortiGate-VM64-KVM (port2) # end

5. Change 2nd unit hostname

6. Repeat step 3 on unit-2, except change the Device priority from the default 128 to 50. The connection to unit-2 will be lost because its IP is overwritten by unit-1.

7. Connect to the primary unit to verify the HA status. Wait a few minutes for the HA status to stabilize.

Add the HA widget if it is not on the dashboard.

8. To connect to the secondary unit's CLI, run this on the primary unit:
Unit-1 # exe ha manage [0 | 1] admin 
Warning: Permanently added '169.254.0.1' (ED25519) to the list of known hosts.
admin@169.254.0.1's password: 
Unit-2 # 
Unit-2 # exit
Connection to 169.254.0.1 closed.

Unit-1# 

================

Verification

diagnose sys ha status
diag sys ha dump-by group  << (see the uptime)
get system ha status


Forti-HA-1 # diagnose sys ha status
HA information
Statistics
        traffic.local = s:0 p:572 b:79778
        traffic.total = s:0 p:572 b:79778
        activity.ha_id_changes = 2
        activity.fdb  = c:0 q:0

Model=80001, Mode=2 Group=10 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_primary=1.
FGVMEVX455AXV9CD:      Primary, serialno_prio=0, usr_priority=200, hostname=Forti-HA-1
FGVMEVNWQAHVSW68:    Secondary, serialno_prio=1, usr_priority=100, hostname=Forti-HA-2

[Kernel HA information]
vcluster 1, state=work, primary_ip=169.254.0.1, primary_id=0:
FGVMEVX455AXV9CD:      Primary, ha_prio/o_ha_prio=0/0
FGVMEVNWQAHVSW68:    Secondary, ha_prio/o_ha_prio=1/1

Note:
The IP address assigned to a heartbeat interface depends on the serial number priority of the member. The member with the higher serial number has serialno_prio=0 and therefore gets the first IP, 169.254.0.1. When a FortiGate joins or leaves the cluster, the IP may change.
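As a rough illustration of the note above, the sketch below ranks members by serial number and hands out heartbeat addresses starting at 169.254.0.1. Two assumptions are made here that the notes do not confirm: "higher serial number" is modelled as plain string comparison, and later members get consecutive addresses.

```python
import ipaddress

FIRST_HB_IP = ipaddress.IPv4Address("169.254.0.1")


def heartbeat_ips(serials):
    """Assign heartbeat IPs: highest serial (serialno_prio=0) gets the first IP."""
    ranked = sorted(serials, reverse=True)  # assumption: string ordering
    # Assumption: addresses are handed out consecutively after the first one.
    return {serial: str(FIRST_HB_IP + rank) for rank, serial in enumerate(ranked)}


for serial, ip in heartbeat_ips(["FGVMEVNWQAHVSW68", "FGVMEVX455AXV9CD"]).items():
    print(serial, ip)
```

With the two serial numbers from the output above, FGVMEVX455AXV9CD ranks first, which matches its serialno_prio=0.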


Forti-HA-2 # diagnose sys ha status
HA information
Statistics
        traffic.local = s:0 p:472 b:52695
        traffic.total = s:0 p:472 b:52695
        activity.ha_id_changes = 1
        activity.fdb  = c:0 q:0

Model=80001, Mode=2 Group=10 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_primary=0.
FGVMEVNWQAHVSW68:    Secondary, serialno_prio=1, usr_priority=100, hostname=Forti-HA-2
FGVMEVX455AXV9CD:      Primary, serialno_prio=0, usr_priority=200, hostname=Forti-HA-1

[Kernel HA information]
vcluster 1, state=standby, primary_ip=169.254.0.1, primary_id=0:
FGVMEVNWQAHVSW68:    Secondary, ha_prio/o_ha_prio=1/1
FGVMEVX455AXV9CD:      Primary, ha_prio/o_ha_prio=0/0

diag sys ha checksum show



Unit-1# get system ha status 
HA Health Status: OK
Model: FortiGate-VM64-KVM
Mode: HA A-P
Group: 0
Debug: 0
Cluster Uptime: 0 days 4:34:22
Cluster state change time: 2021-09-05 08:43:00
Primary selected using:
    <2021/09/05 08:43:00> FGVMEVN03HJ2E443 is selected as the primary because it has the largest value of uptime.
****in case manual failover
<2021/09/05 13:33:39> FGVMEVN03HJ2E443 is selected as the primary because it has EXE_FAIL_OVER flag set.
****
    <2021/09/05 08:32:31> FGVMEVN03HJ2E443 is selected as the primary because it's the only member in the cluster.
ses_pickup: disable
override: disable
Configuration Status:
    FGVMEVN03HJ2E443(updated 1 seconds ago): in-sync
    FGVMEVQDPJRLEY84(updated 3 seconds ago): in-sync
System Usage stats:
    FGVMEVN03HJ2E443(updated 1 seconds ago):
        sessions=22, average-cpu-user/nice/system/idle=1%/0%/0%/98%, memory=66%
    FGVMEVQDPJRLEY84(updated 3 seconds ago):
        sessions=1, average-cpu-user/nice/system/idle=0%/0%/1%/98%, memory=65%
HBDEV stats:
    FGVMEVN03HJ2E443(updated 1 seconds ago):
        port3: physical/10000full, up, rx-bytes/packets/dropped/errors=32938300/97349/0/0, tx=40363497/100154/0/0
        port4: physical/10000full, up, rx-bytes/packets/dropped/errors=29966288/79069/0/0, tx=31342655/82309/0/0
    FGVMEVQDPJRLEY84(updated 3 seconds ago):
        port3: physical/10000full, up, rx-bytes/packets/dropped/errors=39829733/98733/0/0, tx=32933781/97337/0/0
        port4: physical/10000full, up, rx-bytes/packets/dropped/errors=30808797/80887/0/0, tx=29962119/79058/0/0
Primary     : Unit-1         , FGVMEVN03HJ2E443, HA cluster index = 1
Secondary   : Unit-2       , FGVMEVQDPJRLEY84, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Primary: FGVMEVN03HJ2E443, HA operating index = 0
Secondary: FGVMEVQDPJRLEY84, HA operating index = 1
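For a quick check of the Configuration Status block in saved output, the lines can be parsed with a regular expression. This is a minimal sketch based only on the line format shown above; any state other than "in-sync" is simply reported, without assuming what FortiOS calls it.

```python
import re

# Matches lines like: FGVMEVN03HJ2E443(updated 1 seconds ago): in-sync
STATUS_RE = re.compile(
    r"^[ \t]*(\S+?)\(updated \d+ seconds ago\):[ \t]*(\S+)[ \t]*$", re.MULTILINE
)

SAMPLE = """\
Configuration Status:
    FGVMEVN03HJ2E443(updated 1 seconds ago): in-sync
    FGVMEVQDPJRLEY84(updated 3 seconds ago): in-sync
System Usage stats:
    FGVMEVN03HJ2E443(updated 1 seconds ago):
        sessions=22, average-cpu-user/nice/system/idle=1%/0%/0%/98%, memory=66%
"""


def config_status(text: str) -> dict:
    """Map serial number -> configuration state string."""
    return dict(STATUS_RE.findall(text))


def not_in_sync(text: str) -> list:
    """Return serial numbers whose state is anything other than in-sync."""
    return [sn for sn, state in config_status(text).items() if state != "in-sync"]


print(config_status(SAMPLE))
print(not_in_sync(SAMPLE))
```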




4. When the 1st unit is down
The 2nd unit becomes the primary unit (also visible in the GUI).

Forti-HA-2 # diagnose sys ha status
HA information
Statistics
        traffic.local = s:0 p:557 b:62150
        traffic.total = s:0 p:557 b:62150
        activity.ha_id_changes = 2
        activity.fdb  = c:0 q:0

Model=80001, Mode=2 Group=10 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_primary=1.
FGVMEVNWQAHVSW68:      Primary, serialno_prio=0, usr_priority=100, hostname=Forti-HA-2

[Kernel HA information]
vcluster 1, state=work, primary_ip=169.254.0.1, primary_id=0:
FGVMEVNWQAHVSW68:      Primary, ha_prio/o_ha_prio=0/0

Secondary # get sys ha status
HA Health Status: 
    ERROR: FGVMEVN03HJ2E443 is lost @ 2021/09/05 13:15:24
Model: FortiGate-VM64-KVM
Mode: HA A-P
Group: 0
Debug: 0
Cluster Uptime: 0 days 4:44:54
Cluster state change time: 2021-09-05 13:15:24
Primary selected using:
    <2021/09/05 13:15:24> FGVMEVQDPJRLEY84 is selected as the primary because it's the only member in the cluster.
    <2021/09/05 08:43:00> FGVMEVN03HJ2E443 is selected as the primary because it has the largest value of uptime.
ses_pickup: disable
override: disable
System Usage stats:
    FGVMEVQDPJRLEY84(updated 0 seconds ago):
        sessions=34, average-cpu-user/nice/system/idle=0%/0%/0%/96%, memory=66%
HBDEV stats:
    FGVMEVQDPJRLEY84(updated 0 seconds ago):
        port3: physical/10000full, up, rx-bytes/packets/dropped/errors=41279467/102130/0/0, tx=34251138/101291/0/0
        port4: physical/10000full, up, rx-bytes/packets/dropped/errors=31817304/83534/0/0, tx=31167979/82233/0/0
Primary     : Secondary       , FGVMEVQDPJRLEY84, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.1
Primary: FGVMEVQDPJRLEY84, HA operating index = 0





5. When the 1st unit comes back, the HA status doesn't change (no preempt).

6. Force (manual) failover
    6.1 Check whether the failover flag is set on the current unit
       exe ha failover status
    6.2 If the failover flag is set, use the unset command to trigger failover
       exe ha failover unset 1
    6.3 If the failover flag is unset, use the set command to trigger failover
       exe ha failover set 1

     This command can also be used when override is disabled and HA uptime > 300
       diag sys ha reset-uptime
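The decision in step 6 can be summarised in a few lines of Python. This is only a sketch of the rule written above: the function name and arguments are made up for the example, and it just returns the command strings from these notes rather than talking to a device.

```python
from __future__ import annotations


def manual_failover_commands(flag_set: bool, override: bool, ha_uptime_s: int) -> list[str]:
    """List the commands from step 6 that apply to the current state."""
    # 6.2 / 6.3: toggle the failover flag, whichever way it currently is.
    commands = ["exe ha failover unset 1" if flag_set else "exe ha failover set 1"]
    # Alternative: reset-uptime, only with override disabled and HA uptime > 300.
    if not override and ha_uptime_s > 300:
        commands.append("diag sys ha reset-uptime")
    return commands


print(manual_failover_commands(flag_set=False, override=False, ha_uptime_s=900))
```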


7. HA dedicated management interface (out of band)

 Example: Port 3 for HA dedicated management port

1. on active FG-A console

config system ha
    set ha-mgmt-status enable
    config ha-mgmt-interfaces
        edit 1
            set interface "port3"
            set gateway 192.168.100.1
        next
    end
end


or in the GUI if you have access


2. On the standby FG-B console or management session, verify that the above configuration is synchronized

3. on FG-A  console

config system interface
    edit "port3"
        set ip 192.168.100.101 255.255.255.0
        set allowaccess ping https ssh http
        set alias HA-MGMT
        end    

4. on standby FG-B console or management session

config system interface
    edit "port3"
        set ip 192.168.100.102 255.255.255.0
        set allowaccess ping https ssh http
        set alias HA-MGMT
        end   


8. Run command on another unit
    exe ha manage

9. Upgrade
    upload new firmware to primary only.

10. Show failover history from HA Widget

11. Session failover (session-pickup)
config system ha
set session-pickup enable
end


12. In-band management 

    config system interface
        edit <port name>
            set management-ip <ip/mask>
        next
    end




==============

Primary Election


1. When Override is Enabled
config system ha
set override enable
end

Change HA priority to force a failover

Number of active monitored ports > Priority > HA Uptime > Serial Number


2. When Override is Disabled (Default)
Force a failover: diag sys ha reset-uptime

The uptime difference must be more than 5 minutes to be considered in the election

Number of active monitored ports > HA Uptime > Priority > Serial Number
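The two orderings can be compared side by side with a small Python model. This is a sketch of the rules as written in these notes, limited to a two-member cluster. The Member class and its field names are made up for the example, and "higher serial number wins" is modelled as plain string comparison (an assumption).

```python
from dataclasses import dataclass

UPTIME_MARGIN_S = 300  # differences of 5 minutes or less are ignored


@dataclass
class Member:
    serial: str
    monitored_ports_up: int
    priority: int
    ha_uptime_s: int


def _sign(value: int) -> int:
    return (value > 0) - (value < 0)


def elect_primary(a: Member, b: Member, override: bool = False) -> Member:
    """Pick the primary of a two-member cluster using the orderings above."""
    uptime_diff = a.ha_uptime_s - b.ha_uptime_s
    if abs(uptime_diff) <= UPTIME_MARGIN_S:
        uptime_diff = 0  # too close to count in the election
    ports = _sign(a.monitored_ports_up - b.monitored_ports_up)
    priority = _sign(a.priority - b.priority)
    uptime = _sign(uptime_diff)
    serial = (a.serial > b.serial) - (a.serial < b.serial)  # assumption: string order
    # Override enabled checks priority before uptime; disabled is the reverse.
    order = [ports, priority, uptime, serial] if override else [ports, uptime, priority, serial]
    for result in order:
        if result:
            return a if result > 0 else b
    return a  # identical on every criterion


ha1 = Member("FGVMEVX455AXV9CD", 1, 200, 0)
ha2 = Member("FGVMEVNWQAHVSW68", 1, 100, 1000)
print(elect_primary(ha1, ha2, override=False).serial)
print(elect_primary(ha1, ha2, override=True).serial)
```

In this example the lower-priority unit has been up more than 300 seconds longer, so it stays primary with override disabled, while enabling override hands the role to the higher-priority unit.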




=================

HA Firmware Updates

1. Upload the new firmware to the primary.
2. The cluster upgrades all secondary units.
3. A new primary is elected.
4. The cluster upgrades the former primary.



========Loopback for HA unit Management====

1. config system interface
    edit MGMT
    set type loopback
    set ip x.x.x.x/y
    set vdom root
    next
    end

    config system ha
     set ha-mgmt-status enable
     config ha-mgmt-interfaces
        edit 1
        set interface MGMT
        next
     end
    end

2. In the GUI interface configuration, enable [Dedicate Management Port].
    The same IP shows up on both units.


=======

FW with dedicated MGMT port


config system dedicated-mgmt
show
get  (disabled by default, which means MP and DP can access each other)

to enable it:
config system dedicated-mgmt
  set status enable
  set interface mgmt
  end

The MGMT port is then removed from the interface list and moved to a new vdom, "dmgmt-vdom".


When MGMT is used as the dedicated management port of an HA unit, there is no need to enable dedicated-mgmt. To configure it for HA:

config system ha
    config ha-mgmt-interfaces
        edit 1
            set interface "mgmt"
            set gateway x.x.x.x
        next
    end
end

HA unit with dedicated MGMT port (out-of-band) configuration

VM lab, version 7.2.0

FG-C and FG-D have port3 dedicated to HA unit management, which suits a network with a dedicated MGMT VLAN.
port4 and port5 are used for heartbeat.

From FG-C console:

config system global
    set hostname "FG-C"
end

config system ha
    set group-id 11
    set group-name FG-HA
    set mode a-p
    set password 123456
    set hbdev port4 0 port5 1
    set session-pickup enable
    set priority 200
    set ha-mgmt-status enable
    config ha-mgmt-interfaces
        edit 1
            set interface port3
            set gateway 192.168.100.1
        end
   end

    

config system interface
    edit "port3"
        set ip 192.168.100.103 255.255.255.0
        set alias FG-C-MGMT
        set allowaccess ping https ssh http
     end


verify FG-C GUI is accessible


From FG-D console:

config system global
    set hostname FG-D
    end

config system ha
    set group-id 11
    set group-name FG-HA
    set mode a-p
    set password 123456
    set hbdev port4 0 port5 1
    set session-pickup enable
    set priority 100

    config ha-mgmt-interfaces
        edit 1
            set gateway 192.168.100.1
        end
    end

config system interface
    edit port3
        set ip 192.168.100.104 255.255.255.0
        set alias FG-D-MGMT
        set allowaccess ping https ssh http
     end

verify FG-D GUI is accessible



HA unit with dedicated MGMT IP (in-band) configuration

VM lab, version 7.2.0

FG-A and FG-B have the LAN on port1. The floating IP is 10.1.1.1, which is the LAN default gateway, and each unit has its own dedicated management IP added to port1. This suits a network without a dedicated MGMT VLAN, where the firewalls are managed from the internal LAN.
port4 and port5 are used for heartbeat.

From FG-A console:

config system global
    set hostname "FG-A"
end

config system ha
    set group-id 10
    set group-name "FG-Group10"
    set mode a-p
    set password 123456
    set hbdev "port4" 50 "port5" 50
    set session-pickup enable
    set priority 200
end

config system interface
    edit "port2"
        set management-ip 10.1.1.11 255.255.255.0
        set ip 10.1.1.1 255.255.255.0
        set allowaccess ping https ssh http
    next
end



From FG-B console:

config system global
    set hostname "FG-B"
 end

config system ha
    set group-id 10
    set group-name "FG-Group10"
    set mode a-p
    set password 123456
    set hbdev "port4" 50 "port5" 50
    set session-pickup enable
    set priority 100
end

config system interface
    edit "port2"
        set management-ip 10.1.1.12 255.255.255.0
      next
end
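Since the two units differ only in hostname, priority and management IP, the per-unit configuration can be generated from one template. The following Python sketch does that with the values from the FG-A and FG-B listings above. It is a simplification: the password line and FG-A's shared interface IP are left out, and the UNITS table and render function are names made up for the example.

```python
from string import Template

# Common HA settings copied from the listings above; only $-fields vary per unit.
HA_TEMPLATE = Template("""\
config system global
    set hostname "$hostname"
end
config system ha
    set group-id 10
    set group-name "FG-Group10"
    set mode a-p
    set hbdev "port4" 50 "port5" 50
    set session-pickup enable
    set priority $priority
end
config system interface
    edit "port2"
        set management-ip $mgmt_ip 255.255.255.0
    next
end
""")

# hostname -> (device priority, in-band management IP)
UNITS = {"FG-A": (200, "10.1.1.11"), "FG-B": (100, "10.1.1.12")}


def render(hostname: str) -> str:
    """Return the CLI text for one unit."""
    priority, mgmt_ip = UNITS[hostname]
    return HA_TEMPLATE.substitute(hostname=hostname, priority=priority, mgmt_ip=mgmt_ip)


print(render("FG-B"))
```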

