Palo Alto Networks: Active/Active High Availability

Scenario
A pair of PA5050 firewalls sits at the edge of the network. Downstream of the PA5050 pair are a pair of Cisco Catalyst 6506 switches and a pair of Cisco Catalyst 4506 switches. The topology is shown in the diagram below.

palo alto and vss

The pair of Cisco Catalyst 6506 switches is configured as a Virtual Switching System (VSS), which unifies the pair into one logical layer 3 switch. Cisco Nonstop Forwarding (NSF) is used together with Stateful Switchover (SSO), so layer 3 routing redundancy is taken care of by Cisco redundancy technology 😉 HSRP version 2 is also used so that the downstream trusted hosts have redundancy.

Layer2 link aggregation (802.3ad) is used for all downstream (towards Cisco Catalyst 4500) and upstream (towards PA5050) links.

Layer 3 is terminated at the pair of Cisco Catalyst 6506 switches, and the PA5050 HA pair divides the network into trusted and untrusted zones. The firewall policy will then be implemented on the zones.

Before the firewall policy is implemented, the resiliency of PA5050 active/active redundancy is in question. You are tasked to perform a proof of concept (PoC) to demonstrate the effectiveness of the pair of PAN firewalls.

Cisco Virtual Switching System VSS

The configuration is described in this post. The VSS configuration used for this scenario is identical to the one in that post, hence there is no need to repeat the steps here.

The concept of VSS is similar to StackWise technology. However, to build a VSS you need a 10 Gigabit Ethernet link for the virtual switch link (VSL). The VSS unifies the pair of Cisco Catalyst 6500 switches into one logical switch. One switch is the standby, where you cannot apply any configuration or issue any commands; the other is the active switch, where you issue commands to configure and verify.

The unification of the pair of Cisco Catalyst 6500 will look like this:
VSS

From the perspective of the PA5050 and the Cisco Catalyst 4500 there is only one switch. The Palo Alto Networks Design Guide Revision A calls this type of setup Active/Active Layer3 High Availability with Multi-chassis link aggregation topology.

High Availability links of PAN firewall in general

There are two built-in HA interfaces on the PA5050, namely HA1 and HA2. The physical locations of the HA interfaces are designed in such a way that their roles are easily understood at a glance.

The HA1 interface sits together with the console and management interfaces, which tells you that HA1 is the control link.

The HA2 interface sits together with the data interfaces, which tells you that HA2 is the data link.

HA1 and HA2 are sufficient for Active/Passive redundancy. You need HA3 if you want Active/Active redundancy.

Concise notes about Control Link HA1
1. This is the only layer 3 HA link; in other words, this is the only HA link that requires an IP address.

2. HA1 is for the HA agents (the PA5050 active/active firewalls) to communicate with each other.

3. HA1 acts as a keepalive between the HA agents; it senses a power cycle, reboot or power down of the peer HA agent.

4. TCP port 28769 is used for clear text communication.

5. TCP port 49969 is used for SSH encrypted communication. You need to import the public key manually to make encryption work.

6. The default monitor hold time is 3000 ms.

7. HA1 is also used to monitor HA status such as configuration synchronization (for active/passive redundancy; it is not necessary for active/active) and management plane synchronization.

The conclusion is HA1 monitors the HA state.
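
To make the hold-time behaviour concrete, here is a minimal Python sketch of a keepalive monitor. It only illustrates the idea and is not how PAN-OS implements HA1: the class name PeerMonitor and its methods are made up for this example, and I assume the peer is declared down as soon as more than the hold time passes without a keepalive.

```python
from dataclasses import dataclass


@dataclass
class PeerMonitor:
    """Tracks the last keepalive seen from the HA peer (illustrative only)."""

    hold_time_ms: int = 3000  # default monitor hold time from the notes above
    last_seen_ms: int = 0     # timestamp of the most recent keepalive

    def keepalive(self, now_ms: int) -> None:
        # Record that a keepalive arrived on HA1 at time now_ms.
        self.last_seen_ms = now_ms

    def peer_is_down(self, now_ms: int) -> bool:
        # Assumption: the peer is down once the silence exceeds the hold time.
        return now_ms - self.last_seen_ms > self.hold_time_ms


monitor = PeerMonitor()
monitor.keepalive(now_ms=1000)
print(monitor.peer_is_down(now_ms=3500))  # False: only 2500 ms of silence
print(monitor.peer_is_down(now_ms=4001))  # True: 3001 ms of silence
```

A shorter hold time detects a dead peer sooner, at the cost of being more sensitive to a briefly congested HA1 link.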

The HA configuration is located under the Device tab; select High Availability from the left side menu.

Control link configuration window.

Concise notes about Data Link HA2

1. HA2 is a layer 2 link; in other words, no IP address is required, although you can also specify layer 3 information in the web GUI.
datalink ha2

2. HA2 is used to synchronize HA states, routing information, IPsec security associations, the ARP table and traffic sessions.

3. When IP is chosen as the transport, HA2 uses IP protocol number 99; when UDP is used for layer 4 transport, it uses UDP port 29281. If neither IP nor UDP is chosen then the default is ethernet.
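
If HA2 ever has to cross something that filters traffic, the three transport choices translate into different things to allow. The small Python lookup below just restates note 3 in code form; the dictionary and function names are my own, chosen for illustration.

```python
# Transport options for HA2 as listed in the notes above.
HA2_TRANSPORTS = {
    "ethernet": "layer 2 Ethernet frames (default, no IP address needed)",
    "ip": "IP protocol number 99",
    "udp": "UDP port 29281",
}


def describe_ha2_transport(transport: str = "ethernet") -> str:
    """Return a short description of how HA2 traffic is carried."""
    if transport not in HA2_TRANSPORTS:
        raise ValueError(f"unknown HA2 transport: {transport!r}")
    return HA2_TRANSPORTS[transport]


def needs_ip_address(transport: str = "ethernet") -> bool:
    """Only the ip and udp transports need layer 3 addressing on HA2."""
    return transport in ("ip", "udp")


print(describe_ha2_transport())        # layer 2 Ethernet frames (default, no IP address needed)
print(describe_ha2_transport("udp"))   # UDP port 29281
print(needs_ip_address("ethernet"))    # False
```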

Concise notes about HA3

1. HA3 is a layer 2 link using MAC-in-MAC encapsulation.

2. You have to choose a data interface and set its interface type to HA before it can be used as HA3.

3. HA3 is for packet forwarding between the session owner and the session setup firewall.

4. A link-aggregated data interface can be used for HA3 if its interface type is set to HA.

Network tab > Interfaces, choose one interface and set to HA interface type.
Device tab > High Availability and choose Active/Active Config tab

Session setup options

The session setup load can be shared between the HA agents only when IP Modulo or IP Hash is selected. With Primary Device, the Active-Primary agent always sets up the session, so there is no distribution of the setup load between the HA agents in the HA cluster.

session setup options

IP Modulo – The session setup load is distributed between the HA agents in the HA cluster based on the parity of the source IP address.

IP Hash – The session setup load is distributed between the HA agents within the HA cluster based on a hash of either the source IP address or the combination of source and destination IP addresses.

Primary Device – The HA agent in the Active-Primary state always sets up the session.
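
The three session setup options can be sketched in a few lines of Python. This is only a model of the descriptions above: I assume even source addresses map to device 0 and odd ones to device 1, I use CRC32 as a stand-in because the hash the firewall really uses is not stated here, and I assume device 0 currently holds the Active-Primary state. All function names are hypothetical.

```python
import ipaddress
import zlib
from typing import Optional


def setup_by_ip_modulo(src_ip: str) -> int:
    """Pick a device ID from the parity of the source address.

    Assumption: even addresses go to device 0, odd addresses to device 1.
    """
    return int(ipaddress.ip_address(src_ip)) % 2


def setup_by_ip_hash(src_ip: str, dst_ip: Optional[str] = None) -> int:
    """Pick a device ID from a hash of the source (and optionally destination).

    CRC32 is a stand-in hash chosen for this sketch only.
    """
    key = ipaddress.ip_address(src_ip).packed
    if dst_ip is not None:
        key += ipaddress.ip_address(dst_ip).packed
    return zlib.crc32(key) % 2


def setup_by_primary_device(primary_device: int = 0) -> int:
    """The Active-Primary agent sets up every session, so nothing is shared."""
    return primary_device


print(setup_by_ip_modulo("192.168.50.10"))  # 0 (even address)
print(setup_by_ip_modulo("192.168.50.11"))  # 1 (odd address)
```

With IP Modulo, two neighbouring hosts always land on different firewalls, whereas IP Hash spreads sessions according to whatever the hash yields for each address or address pair.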

There are two session owner options.

Primary Device – Active-Primary HA agent will always be the session owner.

First Packet – The HA agent (either Active-Primary or Active-Secondary) that receives the first packet of a session becomes the session owner.
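
Here is a small Python sketch of how the session owner choice interacts with the session setup firewall and HA3. It is my own simplification: device IDs are plain integers, device 0 is assumed to be Active-Primary, and the function names are invented for this example.

```python
def session_owner(option: str, first_packet_device: int, primary_device: int = 0) -> int:
    """Return the device ID that owns the session."""
    if option == "primary-device":
        return primary_device
    if option == "first-packet":
        # The agent that handled the first packet of the session owns it.
        return first_packet_device
    raise ValueError(f"unknown session owner option: {option!r}")


def crosses_ha3(owner: int, setup_device: int) -> bool:
    """Packets are forwarded over HA3 when owner and setup device differ."""
    return owner != setup_device


owner = session_owner("first-packet", first_packet_device=1)
print(owner)                               # 1
print(crosses_ha3(owner, setup_device=0))  # True
print(crosses_ha3(owner, setup_device=1))  # False
```

The point of the sketch is that HA3 only carries traffic when the two roles end up on different firewalls.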

Monitoring High Availability

The default dashboard does not include the High Availability widget; you have to add it yourself.

Widgets > System > High Availability

Running config synchronization is not needed for Active/Active redundancy. Running config sync is only applicable for Active/Passive redundancy.

WARNING: You should not synchronize the running configuration between the HA agents for active/active redundancy; running configuration synchronization is only applicable to active/passive HA. In active/passive HA only one HA agent is active, and the standby HA agent does not do any data forwarding, routing or zoning. If you have used Cisco supervisor engine stateful switchover before, this concept is not hard to grasp. The running configuration is synchronized so that when the active agent goes down the standby agent can take over the role without causing a blackhole or an outage.

Active/Active HA configuration steps

Configure Active-Primary
This section shows how to configure Active-Primary HA agent in CLI and web GUI.

It is not clear in the web GUI, but the device-id actually determines which HA agent is active-primary and which is active-secondary. If the device-id is 0 the agent is active-primary; if it is 1 the agent is active-secondary.

The first step is to configure the HA setup. The peer-ip address is the address of the peer active-secondary HA agent.

The CLI example also sets ethernet as the transport method for HA2. In the web UI there is no need to configure the transport method for HA2 if you do not use IP or UDP.

configure
edit deviceconfig high-availability
set group 1 peer-ip 172.16.0.2 mode active-active device-id 0 session-owner-selection first-packet session-setup ip-modulo
set group 1 mode active-active packet-forwarding yes network-configuration sync virtual-router yes qos no
set group 1 state-synchronization enabled yes transport ethernet
up
set high-availability enabled yes
top

From the web interface click on Device tab, select High Availability from the left side menu. Then from the Setup section click the icon button that looks like a gear.

Device tab > High Availability and choose Active/Active Config tab

The end result of the configuration looks like this

Configure HA1 control link

configure
edit deviceconfig
set high-availability interface ha1 ip-address 172.16.0.1 netmask 255.255.255.252 link-speed 1000 link-duplex full monitor-hold-time 3000
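
Before committing, it is worth checking that the HA1 address and the peer-ip really sit in the same /30. Here is a quick sanity check using Python's standard ipaddress module, with the addresses taken from the configuration in this post; it runs on a workstation, not on the firewall.

```python
import ipaddress

# HA1 address and netmask from the Active-Primary configuration above.
local = ipaddress.ip_interface("172.16.0.1/255.255.255.252")
peer = ipaddress.ip_address("172.16.0.2")  # peer-ip used earlier in this post

# A /30 network has exactly two usable host addresses.
usable = list(local.network.hosts())

print(local.network)          # 172.16.0.0/30
print(usable)                 # [IPv4Address('172.16.0.1'), IPv4Address('172.16.0.2')]
print(peer in local.network)  # True
```

A /30 leaves room for exactly the two HA agents, which is all a point-to-point control link needs.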

Go to the Control Link (HA1) section then click on primary.
active primary ha configuration

The end result of the HA1 and HA2 configuration looks like this. HA2 defaults to ethernet transport, hence no configuration is needed since we are not using IP or UDP.

Assign one interface as HA then include this interface as HA3.

configure
set network interface ethernet ethernet1/13 ha
edit deviceconfig high-availability
set interface ha3 port ethernet1/13

From the Web interface click Network tab, click on the interface you want to assign as HA interface type.

Network tab > Interfaces, choose one interface and set to HA interface type.

Click on Device tab, select High Availability, select Active/Active Config tab.

Device tab > High Availability and choose Active/Active Config tab

The end result of the configuration looks like this

You do not actually need a virtual address for this setup, because the Cisco Catalyst 6506 handles the layer 3 redundancy for the downstream trusted hosts using HSRP. The virtual address concept is very similar to VRRP and HSRP.

To configure a virtual address from the command line:

configure
set deviceconfig high-availability group 1 mode active-active virtual-address ae1.100 ip 192.168.50.1 floating device-priority device-0 1 device-1 10 failover-on-link-down yes

A device priority of 0 is the highest priority and 255, the largest value, is the lowest priority. Confusing eh? Perhaps the problem is the way I articulate it 😉 I chose the link-aggregated subinterface ae1.100 because I want device-0 to be the primary router for VLAN 100; however, for this setup I have not connected the link-aggregated link.
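
Since the "lower number means higher priority" rule is easy to get backwards, here is a minimal Python sketch of how I picture the floating address ownership with failover-on-link-down. It is a mental model, not PAN-OS code; the function name and the dictionaries are invented for this example.

```python
from typing import Dict, Optional


def floating_ip_owner(priorities: Dict[int, int], link_up: Dict[int, bool]) -> Optional[int]:
    """Return the device ID that should own the floating address.

    The lowest priority number wins among devices whose link is up.
    """
    candidates = [dev for dev in priorities if link_up.get(dev, False)]
    if not candidates:
        return None  # nobody can own the address
    return min(candidates, key=lambda dev: priorities[dev])


# Priorities from the CLI example: device-0 has 1, device-1 has 10.
priorities = {0: 1, 1: 10}
print(floating_ip_owner(priorities, {0: True, 1: True}))   # 0
print(floating_ip_owner(priorities, {0: False, 1: True}))  # 1
```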

Click Add then select L3 interface.
Simply click Add in the Virtual Address section, select the L3 interface, then choose the priority for device-0 and device-1 and assign the virtual IP address and netmask.

Configure Active-Secondary HA agent
The configuration steps repeat those of the Active-Primary HA agent. Take note that the device-id should be 1 and the peer IP address should be that of the active-primary HA agent.

The peer ip address is the address of active-primary HA agent

If you have configured a virtual address alongside the HA3 link, the address is identical to the one in the active-primary HA agent’s configuration.
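
Because only the device-id, the local HA1 address and the peer-ip differ between the two agents, the commands can be generated from one template. The Python sketch below simply re-renders the CLI lines shown earlier in this post for each agent; the helper name ha_commands is mine, the 172.16.0.2 local address for the secondary is inferred from the peer-ip used on the primary, and I have not verified the output against a device.

```python
from typing import List

NETMASK = "255.255.255.252"  # HA1 netmask used in this post


def ha_commands(device_id: int, local_ha1_ip: str, peer_ha1_ip: str) -> List[str]:
    """Render the HA setup and HA1 commands for one HA agent."""
    if device_id not in (0, 1):
        raise ValueError("device-id must be 0 (active-primary) or 1 (active-secondary)")
    return [
        "configure",
        "edit deviceconfig high-availability",
        f"set group 1 peer-ip {peer_ha1_ip} mode active-active device-id {device_id} "
        "session-owner-selection first-packet session-setup ip-modulo",
        "up",
        f"set high-availability interface ha1 ip-address {local_ha1_ip} netmask {NETMASK}",
    ]


primary = ha_commands(0, "172.16.0.1", "172.16.0.2")
secondary = ha_commands(1, "172.16.0.2", "172.16.0.1")
print("\n".join(secondary))
```

Generating both sides from the same function makes it harder to forget to swap the peer-ip on the second firewall.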

Verifying active active HA states

The state of active-primary HA agent.

Active-secondary HA agent state.

About Election Settings
You DO NOT configure Election Settings for active/active HA; this is only for active/passive HA, where the election is required to determine which HA agent is active and which is passive.

I was wrong: the preemptive option actually still influences the Active-Primary role election. When preemptive is enabled, the previously downed Active-Primary resumes the Active-Primary role once it has finished initialization with its Active/Active peer. Both firewalls have to enable preemptive to make this work. Of course, if you do not want the previous Active-Primary firewall to resume its original role after it comes back up, you can ignore this option.

The promotion hold time ensures that the Active-Secondary remains Active-Secondary for the period specified by this option. However, if the Active-Primary goes down, the Active-Secondary takes over the Active-Primary role immediately, regardless of the promotion hold time you have configured.

The lowest device priority wins the active role

The HA agent with the lowest device priority value wins the active HA agent role! The default priority is 100.
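
The preemption behaviour described above can be modelled in a few lines of Python. This is a simplified sketch of my observations, not the real election algorithm: the Agent class and the function name are hypothetical, and the promotion hold time is left out.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    priority: int = 100       # default device priority mentioned above
    preemptive: bool = False  # preemptive option in Election Settings


def active_primary_after_recovery(current: Agent, returning: Agent) -> Agent:
    """Decide who is Active-Primary once `returning` has finished initializing.

    The returning agent takes the role back only if preemptive is enabled on
    both agents and its priority value is lower than the current primary's.
    """
    both_preempt = current.preemptive and returning.preemptive
    if both_preempt and returning.priority < current.priority:
        return returning
    return current


fw_a = Agent("fw-a", priority=50, preemptive=True)   # was Active-Primary, rebooted
fw_b = Agent("fw-b", priority=100, preemptive=True)  # took over while fw-a was down

print(active_primary_after_recovery(current=fw_b, returning=fw_a).name)  # fw-a

fw_b.preemptive = False
print(active_primary_after_recovery(current=fw_b, returning=fw_a).name)  # fw-b
```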

How fast is the recovery?

When I broke the link-aggregated link, it took 1 second to recover.
It took only 5 seconds to recover when I rebooted the active-primary PA firewall.

The downtime when a link breaks and when the active-primary PA firewall is rebooted is very good and acceptable for a data network, taking into account that the HA agent pair has to synchronize the routing table, ARP table and firewall state tables.
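
If you want to repeat the measurement, one simple approach is to run a continuous ping through the firewalls and count the longest run of lost replies. The Python sketch below assumes one probe per second and a list of True/False results that you have collected yourself; the function name and the sample data are made up for illustration and are not my actual test output.

```python
from typing import Sequence


def longest_outage_seconds(replies: Sequence[bool], interval_s: float = 1.0) -> float:
    """Return the longest run of consecutive lost probes, in seconds."""
    longest = 0
    current = 0
    for ok in replies:
        # Reset the counter on a reply, extend it on a loss.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval_s


# Hypothetical sample: 3 replies, 5 losses, then 4 replies.
sample = [True] * 3 + [False] * 5 + [True] * 4
print(longest_outage_seconds(sample))  # 5.0
```

The resolution is only as good as the probe interval, so a 1 second result really means "at most about one probe was lost".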

This setup uses OSPF as the dynamic routing protocol between the Cisco VSS and the PA5050 HA agents. Unfortunately the PA5050 does not support IS-IS, which is more scalable than OSPF in larger networks. IS-IS can actually support 100 routers within a single area, and it has inherent support for IPv6, so there is no need to define an extra IS-IS process for an IPv4 to IPv6 migration.

3 thoughts on “Palo Alto Networks: Active/Active High Availability”

  1. The heartbeat backup option is very useful when setting up HA. When heartbeat backup is enabled, the MGT interface can act as a backup for HA1. I always enable heartbeat backup when both MGT interfaces are in the same subnet.
