FortiGate Firewall HA

Commands:

get system status | grep HA

get system ha status
diagnose sys ha status
diag sys ha dump-by group  << (see the uptime)
diag sys ha reset-uptime       << Triggers failover when override is disabled (default)

exe ha manage [0 | 1] admin  << Connect to standby unit

Another way to trigger a manual failover:
      exe ha failover status
      exe ha failover unset 1
      exe ha failover set 1


A cluster is formed from 2 to 4 FortiGate devices.

A cluster includes one device that acts as the primary FortiGate (also called the active FortiGate). The primary synchronizes its configuration, session information, FIB entries, FortiGuard definitions, and other operation-related information to the secondary devices, which are also known as standby devices.

The cluster shares one or more heartbeat interfaces among all devices—also known as members—for
synchronizing data and monitoring the health of each member.

In both A-P (active-passive) and A-A (active-active) HA operation modes, the operation information (configuration, sessions, FIB entries, and so on) of the primary FortiGate is synchronized with the secondary devices.

In A-P mode, the primary FortiGate is the only FortiGate that actively processes traffic. Secondary FortiGate devices remain in passive mode, monitoring the status of the primary device.
If a problem is detected on the primary FortiGate, one of the secondary devices takes over the primary role.

Like A-P HA, in A-A HA, the operation-related data of the primary FortiGate is synchronized to the secondary FortiGate devices. Also, if a problem is detected on the primary device, one of the secondary devices takes over the role of the primary, to process the traffic.
However, one of the main differences from active-passive mode is that in active-active mode, all cluster
members can process traffic. That is, based on the HA settings and traffic type, the primary FortiGate can distribute sessions to the secondary devices.

FortiGate HA offers several solutions for adding redundancy in the case where a failure occurs on the FortiGate, or is detected by the FortiGate through monitored links, routes, and other health checks.

FortiGate Clustering Protocol (FGCP)

All FortiGates in the cluster must be the same model, run the same firmware version, and have the same hardware configuration (such as the same number of hard disks).

Critical cluster components

The following are critical components in an HA cluster:

  •  Heartbeat connections: members will use this to communicate with each other. In general, a two-member cluster is most common. We recommend double back-to-back heartbeat connections.
  •  Identical connections for internal and external interfaces: as demonstrated in the topology, we recommend similar connections from each member to the switches for the cluster to function properly.

The following are best practices for general cluster operation:

  • Ensure that heartbeat communication is present.
  • Enable the session synchronization option in daily operation.
  •  Monitor traffic flowing in and out of the interfaces.

FortiGate Session Life Support Protocol (FGSP)

  External load balancers or routers distribute sessions among the FortiGates, and FGSP synchronizes the sessions between them.


VRRP

FortiGates can function as primary or backup Virtual Router Redundancy Protocol (VRRP) routers. The FortiGates can quickly and easily integrate into a network that has already deployed VRRP.


Failover

FGCP provides failover protection in the following scenarios:

  •  The active device loses power.
  •  A monitored interface loses a connection.


Synchronizing the configuration

The following settings are not synchronized between cluster units:
  • The FortiGate host name
  • GUI Dashboard widgets
  • HA override
  • HA device priority
  • The virtual cluster priority
  • The HA priority setting for a ping server (or dead gateway detection) configuration
  • The system interface settings of the HA reserved management interface
  • The HA default route for the reserved management interface, set using the ha-mgmt-interface-gateway option of the config system ha command


Active/Passive HA

2 to 4 devices
1 heartbeat (HA) link; 2 preferred



Summary:

A-P HA has only one IP address for each interface pair, and it is active on the primary unit. When failover occurs, the IP and virtual MAC addresses move to the new primary unit.
The primary is the active (master, work) firewall; the secondary is the standby firewall.

The heartbeat (HB) link runs between the two units.

By default, the device with the higher HA uptime (by more than 5 minutes) becomes primary. To have Device Priority checked before uptime, enable HA Override and set Device Priority; this is not recommended (Override acts as preempt). HA uptime is reported as the difference between the units that joined the cluster, so one unit always shows 0, and the other unit's value is only taken into account when it is greater than 300 seconds. HA uptime is NOT system uptime.

With HA Override disabled (the default), when two new units boot at almost the same time to build a cluster, Device Priority determines the primary unit. When one unit in the cluster fails and a replacement unit joins, the existing unit normally has the longer uptime, so the replacement unit joins as secondary.

In A-A HA, only sessions that are subject to proxy inspection are distributed to secondary devices. To force the distribution of sessions that are subject to flow inspection or to no inspection at all, enable the load-balance-all setting under the HA configuration; this setting is disabled by default.
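
The distribution rule above can be sketched as a small decision function. This is an illustrative Python model, not FortiOS code: the function name may_offload_to_secondary and the inspection labels are hypothetical, and only the load-balance-all behaviour is taken from these notes.

```python
def may_offload_to_secondary(inspection: str, load_balance_all: bool = False) -> bool:
    """Return True if the primary may hand this session to a secondary in A-A mode.

    inspection: "proxy", "flow", or "none" (hypothetical labels for this sketch).
    load_balance_all: mirrors the HA load-balance-all setting (disabled by default).
    """
    if inspection not in ("proxy", "flow", "none"):
        raise ValueError(f"unknown inspection type: {inspection!r}")
    # Proxy-inspected sessions are always eligible; everything else is
    # eligible only when load-balance-all is enabled.
    return inspection == "proxy" or load_balance_all
```

With the default setting, may_offload_to_secondary("flow") is False; passing load_balance_all=True makes flow-inspected and uninspected sessions eligible as well.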



Check HA uptime:

diag sys ha dump-by group
or
diag sys ha dump-by vcluster

'FGVMEVB0DJ6EXX57': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=0/0
        'FGVMEVWNBN-LQQ11': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=7/0

The device showing 0 has the lowest HA uptime. The other device in this example has an HA uptime that is 7 seconds longer.

If a monitored interface fails, or a member reboots, the HA uptime for that member is reset to 0.
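
To compare members without reading the output by eye, the uptime/reset_cnt field can be pulled out of the two sample lines above. This is a minimal Python sketch; the helper name parse_ha_uptime is hypothetical, and the regular expression assumes the line layout shown in this example, which may differ between FortiOS versions. The value is returned exactly as printed; these notes read it as seconds, and the sketch does not verify that unit.

```python
import re

# Matches one member line of `diag sys ha dump-by group` as shown above.
MEMBER_RE = re.compile(
    r"'(?P<serial>[^']+)':.*?uptime/reset_cnt=(?P<uptime>\d+)/(?P<resets>\d+)"
)

def parse_ha_uptime(output: str) -> dict[str, tuple[int, int]]:
    """Map serial number -> (HA uptime value as printed, reset count)."""
    members = {}
    for line in output.splitlines():
        match = MEMBER_RE.search(line)
        if match:
            members[match["serial"]] = (int(match["uptime"]), int(match["resets"]))
    return members

# The two lines from the example output above.
sample = """\
'FGVMEVB0DJ6EXX57': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=0/0
        'FGVMEVWNBN-LQQ11': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=7/0
"""
uptimes = parse_ha_uptime(sample)
```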


Running this command on the primary firewall resets its HA uptime and triggers a manual failover:

diag sys ha reset-uptime


Check system uptime:

get system performance status

===CLI======

1. Configure the first unit

FortiGate-VM64-KVM # config system ha
FortiGate-VM64-KVM (ha) # set group-id 10
FortiGate-VM64-KVM (ha) # set group-name FORTI_HA
FortiGate-VM64-KVM (ha) # set mode a-p
FortiGate-VM64-KVM (ha) # set password 123456
FortiGate-VM64-KVM (ha) # set hbdev port4 0
FortiGate-VM64-KVM (ha) # set priority 200
FortiGate-VM64-KVM (ha) # set monitor port4
FortiGate-VM64-KVM (ha) # end

FortiGate-VM64-KVM # config system interface
FortiGate-VM64-KVM (interface) # edit port1
FortiGate-VM64-KVM (port1) # set mode static
FortiGate-VM64-KVM (port1) # set ip 192.168.2.101/24
FortiGate-VM64-KVM (port1) # set allowaccess http
FortiGate-VM64-KVM (port1) # end
FortiGate-VM64-KVM # config router static
FortiGate-VM64-KVM (static) # edit 1
FortiGate-VM64-KVM (1) # set dst 0.0.0.0/0
FortiGate-VM64-KVM (1) # set gateway 192.168.2.1
FortiGate-VM64-KVM (1) # set device port1
FortiGate-VM64-KVM (1) # end
FortiGate-VM64-KVM # config system global
FortiGate-VM64-KVM (global) # set hostname Forti-HA-1
FortiGate-VM64-KVM (global) # end
Forti-HA-1 # 


2. Configure the 2nd unit
FortiGate-VM64-KVM # config system ha
FortiGate-VM64-KVM (ha) # set group-id 10
FortiGate-VM64-KVM (ha) # set group-name FORTI_HA
FortiGate-VM64-KVM (ha) # set mode a-p
FortiGate-VM64-KVM (ha) # set password 123456
FortiGate-VM64-KVM (ha) # set hbdev port4 0
FortiGate-VM64-KVM (ha) # set priority 100
FortiGate-VM64-KVM (ha) # set monitor port4
FortiGate-VM64-KVM (ha) # end
FortiGate-VM64-KVM # config system global
FortiGate-VM64-KVM (global) # set hostname Forti-HA-2
FortiGate-VM64-KVM (global) # end
Forti-HA-2 # 

Wait about one minute for the cluster to form.

===============CLI + GUI==========

1. Configure an IP address on an interface for GUI access
FortiGate-VM64-KVM # config system interface
FortiGate-VM64-KVM (interface) # edit port2
FortiGate-VM64-KVM (port2) # set mode static
FortiGate-VM64-KVM (port2) # set ip 10.10.0.1/24
FortiGate-VM64-KVM (port2) # set allowaccess http https ssh ping
FortiGate-VM64-KVM (port2) # end

2. Log in to the GUI and change the hostname.

3. Set the HA mode to Active-Passive, increase the device priority from the default 128 to 250, set the group name and password, enable session pickup, and specify the heartbeat interfaces and their priorities. Change the Group ID from the CLI if needed.

4. On unit-2, configure a temporary IP address on an interface for GUI access
FortiGate-VM64-KVM # config system interface
FortiGate-VM64-KVM (interface) # edit port2
FortiGate-VM64-KVM (port2) # set mode static
FortiGate-VM64-KVM (port2) # set ip 10.10.0.2/24
FortiGate-VM64-KVM (port2) # set allowaccess http https ssh ping
FortiGate-VM64-KVM (port2) # end

5. Change the 2nd unit's hostname.

6. Repeat step 3 on unit-2, except change the device priority from the default 128 to 50. The connection to unit-2 will be lost because its IP is overwritten by unit-1's configuration.

7. Connect to the primary unit and verify the HA status; allow a few minutes for the cluster to settle.

Add the HA widget if it is not on the dashboard.

8. To connect to the secondary unit's CLI, run this on the primary unit:
Unit-1 # exe ha manage [0 | 1] admin 
Warning: Permanently added '169.254.0.1' (ED25519) to the list of known hosts.
admin@169.254.0.1's password: 
Unit-2 # 
Unit-2 # exit
Connection to 169.254.0.1 closed.

Unit-1# 

================

Verification

diagnose sys ha status
diag sys ha dump-by group  << (see the uptime)
get system ha status


Forti-HA-1 # diagnose sys ha status
HA information
Statistics
        traffic.local = s:0 p:572 b:79778
        traffic.total = s:0 p:572 b:79778
        activity.ha_id_changes = 2
        activity.fdb  = c:0 q:0

Model=80001, Mode=2 Group=10 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_primary=1.
FGVMEVX455AXV9CD:      Primary, serialno_prio=0, usr_priority=200, hostname=Forti-HA-1
FGVMEVNWQAHVSW68:    Secondary, serialno_prio=1, usr_priority=100, hostname=Forti-HA-2

[Kernel HA information]
vcluster 1, state=work, primary_ip=169.254.0.1, primary_id=0:
FGVMEVX455AXV9CD:      Primary, ha_prio/o_ha_prio=0/0
FGVMEVNWQAHVSW68:    Secondary, ha_prio/o_ha_prio=1/1

Note:
The IP address assigned to a heartbeat interface depends on the serial number priority of the member. The member with the higher serial number has serialno_prio=0 and therefore gets the first IP, 169.254.0.1. When a FortiGate joins or leaves the cluster, the IP may change.
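
The note above can be illustrated with a short Python model. This is a sketch under stated assumptions, not FortiOS behaviour verified in code: the function name heartbeat_ips is hypothetical, serial numbers are compared as plain strings, and addresses are assumed to be handed out sequentially from 169.254.0.1.

```python
def heartbeat_ips(serials: list[str]) -> dict[str, tuple[int, str]]:
    """Map serial -> (serialno_prio, heartbeat IP), following the note above.

    Assumption: the highest serial number gets serialno_prio 0 and the
    first address; the remaining members follow in descending order.
    """
    ordered = sorted(serials, reverse=True)  # highest serial first
    return {
        serial: (index, f"169.254.0.{index + 1}")
        for index, serial in enumerate(ordered)
    }

# Both units from the output above, then the second unit on its own.
cluster = heartbeat_ips(["FGVMEVX455AXV9CD", "FGVMEVNWQAHVSW68"])
alone = heartbeat_ips(["FGVMEVNWQAHVSW68"])
```

Here cluster gives FGVMEVX455AXV9CD serialno_prio 0 with 169.254.0.1, matching the diagnose output, and alone shows how the remaining unit moves to 169.254.0.1 once its peer leaves.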


Forti-HA-2 # diagnose sys ha status
HA information
Statistics
        traffic.local = s:0 p:472 b:52695
        traffic.total = s:0 p:472 b:52695
        activity.ha_id_changes = 1
        activity.fdb  = c:0 q:0

Model=80001, Mode=2 Group=10 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_primary=0.
FGVMEVNWQAHVSW68:    Secondary, serialno_prio=1, usr_priority=100, hostname=Forti-HA-2
FGVMEVX455AXV9CD:      Primary, serialno_prio=0, usr_priority=200, hostname=Forti-HA-1

[Kernel HA information]
vcluster 1, state=standby, primary_ip=169.254.0.1, primary_id=0:
FGVMEVNWQAHVSW68:    Secondary, ha_prio/o_ha_prio=1/1
FGVMEVX455AXV9CD:      Primary, ha_prio/o_ha_prio=0/0

diag sys ha checksum show



Unit-1# get system ha status 
HA Health Status: OK
Model: FortiGate-VM64-KVM
Mode: HA A-P
Group: 0
Debug: 0
Cluster Uptime: 0 days 4:34:22
Cluster state change time: 2021-09-05 08:43:00
Primary selected using:
    <2021/09/05 08:43:00> FGVMEVN03HJ2E443 is selected as the primary because it has the largest value of uptime.
**** In case of a manual failover, an entry like this appears:
<2021/09/05 13:33:39> FGVMEVN03HJ2E443 is selected as the primary because it has EXE_FAIL_OVER flag set.
****
    <2021/09/05 08:32:31> FGVMEVN03HJ2E443 is selected as the primary because it's the only member in the cluster.
ses_pickup: disable
override: disable
Configuration Status:
    FGVMEVN03HJ2E443(updated 1 seconds ago): in-sync
    FGVMEVQDPJRLEY84(updated 3 seconds ago): in-sync
System Usage stats:
    FGVMEVN03HJ2E443(updated 1 seconds ago):
        sessions=22, average-cpu-user/nice/system/idle=1%/0%/0%/98%, memory=66%
    FGVMEVQDPJRLEY84(updated 3 seconds ago):
        sessions=1, average-cpu-user/nice/system/idle=0%/0%/1%/98%, memory=65%
HBDEV stats:
    FGVMEVN03HJ2E443(updated 1 seconds ago):
        port3: physical/10000full, up, rx-bytes/packets/dropped/errors=32938300/97349/0/0, tx=40363497/100154/0/0
        port4: physical/10000full, up, rx-bytes/packets/dropped/errors=29966288/79069/0/0, tx=31342655/82309/0/0
    FGVMEVQDPJRLEY84(updated 3 seconds ago):
        port3: physical/10000full, up, rx-bytes/packets/dropped/errors=39829733/98733/0/0, tx=32933781/97337/0/0
        port4: physical/10000full, up, rx-bytes/packets/dropped/errors=30808797/80887/0/0, tx=29962119/79058/0/0
Primary     : Unit-1         , FGVMEVN03HJ2E443, HA cluster index = 1
Secondary   : Unit-2       , FGVMEVQDPJRLEY84, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Primary: FGVMEVN03HJ2E443, HA operating index = 0
Secondary: FGVMEVQDPJRLEY84, HA operating index = 1




4. When the 1st unit is down
The 2nd unit becomes the primary unit (also visible in the GUI).

Forti-HA-2 # diagnose sys ha status
HA information
Statistics
        traffic.local = s:0 p:557 b:62150
        traffic.total = s:0 p:557 b:62150
        activity.ha_id_changes = 2
        activity.fdb  = c:0 q:0

Model=80001, Mode=2 Group=10 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_primary=1.
FGVMEVNWQAHVSW68:      Primary, serialno_prio=0, usr_priority=100, hostname=Forti-HA-2

[Kernel HA information]
vcluster 1, state=work, primary_ip=169.254.0.1, primary_id=0:
FGVMEVNWQAHVSW68:      Primary, ha_prio/o_ha_prio=0/0

Secondary # get sys ha status
HA Health Status: 
    ERROR: FGVMEVN03HJ2E443 is lost @ 2021/09/05 13:15:24
Model: FortiGate-VM64-KVM
Mode: HA A-P
Group: 0
Debug: 0
Cluster Uptime: 0 days 4:44:54
Cluster state change time: 2021-09-05 13:15:24
Primary selected using:
    <2021/09/05 13:15:24> FGVMEVQDPJRLEY84 is selected as the primary because it's the only member in the cluster.
    <2021/09/05 08:43:00> FGVMEVN03HJ2E443 is selected as the primary because it has the largest value of uptime.
ses_pickup: disable
override: disable
System Usage stats:
    FGVMEVQDPJRLEY84(updated 0 seconds ago):
        sessions=34, average-cpu-user/nice/system/idle=0%/0%/0%/96%, memory=66%
HBDEV stats:
    FGVMEVQDPJRLEY84(updated 0 seconds ago):
        port3: physical/10000full, up, rx-bytes/packets/dropped/errors=41279467/102130/0/0, tx=34251138/101291/0/0
        port4: physical/10000full, up, rx-bytes/packets/dropped/errors=31817304/83534/0/0, tx=31167979/82233/0/0
Primary     : Secondary       , FGVMEVQDPJRLEY84, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.1
Primary: FGVMEVQDPJRLEY84, HA operating index = 0





5. When the 1st unit comes back, the HA status doesn't change: there is no preemption.

6. Force (manual) failover
    6.1 Check whether the failover flag is set on the current unit
       exe ha failover status
    6.2 If the failover flag is set, use the unset command to trigger failover
       exe ha failover unset 1
    6.3 If the failover flag is unset, use the set command to trigger failover
       exe ha failover set 1

    This command can also be used when override is disabled
       diag sys ha reset-uptime


7. HA dedicated management interface (out of band)

  Example: port4 as the HA dedicated management port
    7.1 Set the port4 alias to MGMT
    7.2 Go to System > HA

    7.3 Go to the port4 settings and configure the interface IP 10.1.2.101/24; this IP won't be synchronized to the peer.
    7.4 Go to the peer CLI to configure the peer's MGMT interface IP
           exe ha manage 1 admin
           config system interface
           edit port4
               set ip 10.1.2.102/24
               set allowaccess http https ssh ping
           end

8. Run command on another unit
    exe ha manage

9. Upgrade
    Upload the new firmware to the primary only.

10. Show the failover history from the HA widget

11. Session failover (session-pickup)
config system ha
set session-pickup enable
end


12. In-band management 

    config system interface
    edit <port name>
    set management-ip <ip/mask>
    end




==============

Primary Election


1. When Override is Enabled
config system ha
set override enable
end

To force a failover, change the HA priority.

Election order: Number of active monitored ports > Priority > HA Uptime > Serial Number


2. When Override is Disabled (Default)
To force a failover: diag sys ha reset-uptime

The HA uptime difference must be more than 5 minutes to be considered in the election.

Election order: Number of active monitored ports > HA Uptime > Priority > Serial Number
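
The two election orders can be compared side by side in a small Python model. This is only a sketch of the rules listed in this section for a two-member cluster: the Member class and elect_primary function are hypothetical names, HA uptime is assumed to be in seconds, and the serial-number tie-break assumes the higher serial wins, as the heartbeat IP note in these notes suggests.

```python
from dataclasses import dataclass

UPTIME_THRESHOLD = 300  # seconds; smaller HA uptime differences are ignored

@dataclass(frozen=True)
class Member:
    serial: str
    monitored_ports_up: int
    priority: int
    ha_uptime: int  # seconds (assumed unit)

def elect_primary(a: Member, b: Member, override: bool = False) -> Member:
    """Pick the primary of a two-member cluster using the orders listed above."""
    def by_ports() -> int:
        return a.monitored_ports_up - b.monitored_ports_up

    def by_priority() -> int:
        return a.priority - b.priority

    def by_uptime() -> int:
        diff = a.ha_uptime - b.ha_uptime
        return diff if abs(diff) > UPTIME_THRESHOLD else 0

    def by_serial() -> int:
        return (a.serial > b.serial) - (a.serial < b.serial)

    if override:
        checks = [by_ports, by_priority, by_uptime, by_serial]
    else:
        checks = [by_ports, by_uptime, by_priority, by_serial]
    for check in checks:
        result = check()
        if result != 0:  # first criterion that differs decides
            return a if result > 0 else b
    return a  # only reachable if both members are identical

unit1 = Member("FGVMEVX455AXV9CD", monitored_ports_up=1, priority=200, ha_uptime=0)
unit2 = Member("FGVMEVNWQAHVSW68", monitored_ports_up=1, priority=100, ha_uptime=900)
```

With these sample values, elect_primary(unit1, unit2) returns unit2 because its uptime lead exceeds the threshold, while elect_primary(unit1, unit2, override=True) returns unit1 because priority is checked first.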




=================

HA Firmware Updates

1. Upload the new firmware to the primary.
2. The cluster upgrades all secondary units.
3. A new primary is elected.
4. The cluster upgrades the former primary.



========Notes====

1. config system interface
    edit MGMT
    set type loopback
    set ip x.x.x.x/y
    set vdom root
    end

    config system ha
     set ha-mgmt-status enable
     config ha-mgmt-interfaces
        edit 1
        set interface MGMT
        end
     end

2. In the GUI interface configuration, enable [Dedicated Management Port];
    the same IP shows up on both units.


=======
FW with dedicated MGMT port

config system dedicated-mgmt
show
get  (disabled by default, which means the management plane (MP) and data plane (DP) can access each other)

to enable it:
config system dedicated-mgmt
  set status enable
  set interface mgmt
  end

The MGMT port is then removed from the interface list and moved to a new VDOM, "dmgmt-vdom".









