
    Configuring High Availability for the Contrail OVSDB TOR Agent

    This topic describes how high availability can be configured for the Contrail TOR agent.

    Overview: High Availability for a TOR Switch

    Starting with Contrail Release 2.20, high availability can be configured for the Contrail TOR agent.

    When a top-of-rack (TOR) switch is managed through the Open vSwitch Database Management Protocol (OVSDB) by a TOR agent in Contrail, a high availability configuration is needed to provide TOR agent redundancy. With TOR agent redundancy, if the TOR agent responsible for a TOR switch can no longer act as the vRouter agent for that switch, because of a failure in the network or on the node, another TOR agent takes over and manages the TOR switch.

    TOR agent redundancy (high availability) in Contrail Release 2.20 and later is achieved using HAProxy. HAProxy is an open source, reliable solution that offers high availability and proxy service for TCP applications. The solution uses HAProxy to initiate an SSL connection from the TOR switch to the TOR agent. This configuration ensures that the TOR switch is connected to exactly one active TOR agent at any given time.
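    A minimal HAProxy stanza for this arrangement might look like the following sketch. The listen name, port, server names, and IP addresses are placeholders, and the port and balancing policy should be adapted to the deployment:

    ```
    # Hedged example haproxy.cfg fragment; names, IPs, and ports are placeholders.
    listen contrail-tor-agent-1
        bind :6632                           # port the TOR switch connects to (assumed)
        mode tcp                             # proxy the OVSDB/SSL session at the TCP layer
        balance leastconn                    # prefer the TOR agent with fewer connections
        server tsn-a 192.0.2.11:6632 check   # TOR Agent 1 on TSN A
        server tsn-b 192.0.2.12:6632 check   # TOR Agent 2 on TSN B
    ```

    The `check` keyword makes HAProxy health-check each TOR agent, which is what allows it to detect a failed agent and move the connection.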

    High Availability Solution for Contrail TOR Agent

    The following figure illustrates the method for achieving high availability for the TOR agent in Contrail.

    Figure 1: High Availability Solution for Contrail TOR Agent


    The following describes the events shown in the figure:

    • TOR agent redundancy is achieved using HAProxy.
    • Two TOR agents are provisioned on different TSN nodes, to manage the same TOR switch.
    • Both TOR agents created in the cluster are active and get the same information from the control node.
    • HAProxy monitors these TOR agents.
    • An SSL connection is established from the TOR switch to the TOR agent, via HAProxy.
    • HAProxy selects one TOR agent to establish the SSL connection (e.g., TOR Agent 1 running on TSN A).
    • Upon connection establishment, this TOR agent adds the ff:ff:ff:ff:ff:ff broadcast MAC to the OVSDB with its own TSN IP address.
    • The TOR agent sends the MAC addresses of the bare metal servers, learned by the TOR switch, to the control node using XMPP.
    • The control node reflects the addresses to other TOR agents and vRouter agents.

    Failover Methodology Description

    The TOR switch connects to HAProxy, which is configured to use one of the TOR agents on the two TOR services nodes (TSNs). An SSL connection is established from the TOR switch to the TOR agent, making that agent the active TOR agent. The active TOR agent is responsible for managing the OVSDB on the TOR switch: it configures the OVSDB tables based on the configuration, advertises the MAC routes learned on the TOR switch as Ethernet VPN (EVPN) routes to the Contrail controller, and programs any routes learned by means of EVPN over XMPP southbound into the OVSDB on the TOR switch.

    The active TOR agent also advertises the multicast route (ff:ff:ff:ff:ff:ff) to the TOR switch, ensuring that there is only one multicast route in OVSDB pointing to the active TSN.

    Both the TOR agents, active and standby, receive the same configuration from the control node, and all routes are synchronized by means of BGP.

    After the SSL connection is established, keepalive messages are exchanged between the TOR switch and the TOR agent. The messages can be sent from either end and are responded to from the other end. When any message exchange is seen on the connection, the keepalive message is skipped for that interval. When the TOR switch sees that keepalive has failed, it closes the current SSL session and attempts to reconnect. When the TOR agent side sees that keepalive has failed, it closes the SSL session and retracts the routes it exported to the control node.
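    The keepalive-skipping rule can be illustrated with a small sketch; the interval value and function name here are illustrative, not part of the Contrail or OVSDB code:

    ```python
    ECHO_INTERVAL = 5.0  # assumed keepalive interval in seconds; the real value is configurable

    def keepalive_due(last_activity: float, now: float,
                      interval: float = ECHO_INTERVAL) -> bool:
        """Return True if the connection has been idle for a full interval,
        so an explicit keepalive must be sent. Any other message exchange
        on the connection resets last_activity, letting the keepalive be
        skipped for that interval."""
        return (now - last_activity) >= interval
    ```

    Either side closes the SSL session when the peer stops responding; the TOR agent additionally retracts the routes it exported, as described above.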

    Failure Scenarios

    Whenever HAProxy cannot communicate with the active TOR agent, a new SSL connection is established from the TOR switch to the other TOR agent.

    HAProxy communication failures can occur under several scenarios, including:

    • The node on which the TOR agent is running goes down or fails.
    • The TOR agent crashes.
    • A network or other issue prevents or interrupts HAProxy communication with the TOR agent.

    Figure 2: Failure Scenarios


    When a connection is established to the other TOR agent, the new TOR agent does the following:

    • Updates the multicast route in OVSDB to point to the new TSN.
    • Gets all of the OVSDB entries.
    • Audits the data with the configurations available.
    • Updates the database.
    • Exports entries from the OVSDB local table to the control node.
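    The takeover steps above can be sketched as an ordered sequence; every class, method, and function name below is an illustrative stand-in, not Contrail's actual internal API:

    ```python
    # Hedged sketch of the takeover sequence; all names are hypothetical.
    BCAST_MAC = 'ff:ff:ff:ff:ff:ff'

    def take_over(ovsdb, control_node, tsn_ip):
        """Steps the newly active TOR agent performs after the SSL
        connection moves to it."""
        ovsdb.set_multicast_route(BCAST_MAC, tsn_ip)   # 1. point the broadcast route at the new TSN
        entries = ovsdb.dump()                         # 2. get all of the OVSDB entries
        reconciled = control_node.audit(entries)       # 3. audit against the available configuration
        ovsdb.write(reconciled)                        # 4. update the database
        control_node.export(ovsdb.local_table())       # 5. export local-table entries over XMPP
    ```

    The ordering matters: the multicast route is moved first so broadcast traffic flows through the new TSN as early as possible.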

    Because the configuration and routes from the control node are already synchronized to the new TOR Services Node (TSN), the new TSN can immediately act on the broadcast traffic from the TOR switch. Service is affected only for the time needed to set up the SSL connection and to program the multicast and unicast routes in the OVSDB.

    When the SSL connection goes down, the TOR agent retracts the routes it exported. Likewise, if the Extensible Messaging and Presence Protocol (XMPP) connection between the TOR agent and the control node goes down, the control node removes the routes exported by that TOR agent. In both scenarios, the entries from the OVSDB local table are retracted and then added back by the new TOR agent.

    Redundancy for HAProxy

    In a high availability configuration, multiple HAProxy nodes are configured, with the Virtual Router Redundancy Protocol (VRRP) running between them. The TOR agents are configured to use the virtual IP address of the HAProxy nodes to make the SSL connection to the controller. The active TCP connections go to the virtual IP master node, which proxies them to the chosen TOR agent. A TOR agent is chosen based on the number of connections from HAProxy to that node (the node with the lower number of connections gets the new connection), and the choice can be controlled through the HAProxy configuration.
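    The VRRP side of this setup is typically handled by a daemon such as keepalived; a hedged fragment follows, in which the interface name, router ID, priority, and virtual IP are all placeholders:

    ```
    # Hypothetical keepalived.conf fragment for the HAProxy virtual IP.
    vrrp_instance haproxy_vip {
        state MASTER            # the peer HAProxy node would use state BACKUP
        interface eth0          # assumed interface carrying the virtual IP
        virtual_router_id 51    # must match on both HAProxy nodes
        priority 200            # the higher priority wins the master election
        advert_int 1            # VRRP advertisement interval in seconds
        virtual_ipaddress {
            192.0.2.10          # assumed virtual IP that the TOR switches connect to
        }
    }
    ```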

    Figure 3: Redundancy for HAProxy


    If the HAProxy node fails, a standby node becomes the virtual IP master and sets up the connections to the TOR agents. The SSL connections are reestablished following the same methods discussed earlier.

    Configuration for TOR Agent High Availability

    To get the required configuration downloaded from the control node to the TSN agent and to the TOR agent, the physical router node must be linked to the virtual router nodes that represent the two TOR agents and the two TSNs.

    The Contrail Web user interface can be used to configure this. Go to Configure > Physical Devices > Physical Routers and create an entry for the TOR switch, providing the TOR switch IP address and the virtual tunnel endpoint (VTEP) address. The router name should match the hostname of the TOR switch. Both TOR agents and their respective TSN nodes can be configured here.

    Testbed.py and Provisioning for High Availability

    The same testbed configuration used for provisioning the TSN and TOR agents is used to provision high availability. The redundant TOR agents must have the same tor_name and tor_ovs_port in their respective stanzas to be treated as a pair.
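    A hedged sketch of what the paired stanzas might look like follows. Host names, IP addresses, and port values are placeholders, and only fields relevant to pairing are shown; consult the provisioning documentation for the full set of fields:

    ```python
    # Hypothetical testbed.py fragment; all values are placeholders.
    host2 = 'root@192.0.2.2'   # TSN A (assumed)
    host3 = 'root@192.0.2.3'   # TSN B (assumed)

    tor_agent = {
        host2: [{
            'tor_name': 'qfx-tor-1',    # same on both agents -> redundant pair
            'tor_ovs_port': '9999',     # same on both agents -> redundant pair
            'tor_ip': '198.51.100.10',
            'tor_tsn_ip': '192.0.2.2',
            'tor_agent_id': '1',
        }],
        host3: [{
            'tor_name': 'qfx-tor-1',
            'tor_ovs_port': '9999',
            'tor_ip': '198.51.100.10',
            'tor_tsn_ip': '192.0.2.3',
            'tor_agent_id': '2',
        }],
    }
    ```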

    Modified: 2015-09-02