
Understanding the Software as a Service Solution


Overview

Web Services providers typically offer large-scale, distributed Web applications, consumer applications, Software as a Service (SaaS), and Infrastructure as a Service (IaaS) to their customers. To offer these services effectively, these providers need to keep network costs under control. Automation, programmability, and zero touch deployments maximize the speed at which new services can be offered, minimize staffing levels, and take advantage of employee software expertise in Python, Ruby, and other programming languages in a Linux operational environment.

The solution described in this guide will help you understand the requirements for an SaaS network, the architecture required to build the network, how to configure each layer, and how to verify its operational state. Because of the cutting-edge nature of the SaaS solution, this architecture will also appeal to system integrators, infrastructure vendors, Web hosting companies, and multisystem operators (MSOs). The solution was built using the following key component features:

  • BGP / Layer 3 routing—An SaaS network needs Layer 3 and specifically BGP to provide the proper scaling and performance required by this solution. Networks that rely on Layer 2 switching are restricted based on the number of VLANs that can be supported by the equipment in the network. However, BGP was designed to handle the scale of the global Internet routing table and can be repurposed to support the needs of a top-tier cloud provider. As shown in Figure 1, you assign an autonomous system number to each device in the Clos-based IP fabric network. The devices peer with each other using external BGP and allow you to grow and scale your SaaS network efficiently.

    Figure 1: BGP Autonomous Systems in a Clos-Based IP Fabric
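    As a rough illustration of this peering model, the following sketch shows how one leaf device from Figure 1 might be configured in Junos OS. The AS numbers, addresses, and the EXPORT-LO0 policy name are hypothetical placeholders, not values from the tested topology:

    /* Hypothetical leaf device: local AS 65103, two spine uplinks */
    routing-options {
        autonomous-system 65103;
    }
    policy-options {
        /* Advertise the loopback address into the fabric underlay */
        policy-statement EXPORT-LO0 {
            term loopback {
                from {
                    protocol direct;
                    route-filter 10.0.0.0/24 orlonger;
                }
                then accept;
            }
        }
    }
    protocols {
        bgp {
            group underlay {
                type external;
                export EXPORT-LO0;
                /* One EBGP session per spine uplink */
                neighbor 172.16.0.0 {
                    peer-as 65001;
                }
                neighbor 172.16.0.2 {
                    peer-as 65002;
                }
                /* Load-balance across spines even though their AS numbers differ */
                multipath {
                    multiple-as;
                }
            }
        }
    }

    Each neighbor statement corresponds to one spine uplink, and multipath multiple-as permits ECMP across peers that sit in different autonomous systems.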
  • Five-Layer IP fabric—An SaaS network uses a robust set of layers to direct traffic and to offer multiple paths to the same destination. Instead of high availability options such as multichassis link aggregation groups (MC-LAGs) or Virtual Chassis, an SaaS solution uses multiple links at each layer and device. Such an architecture shifts some of the need for resiliency to distributed applications in the network, reducing the amount of resiliency and redundancy that the physical network infrastructure itself must provide. As shown in Figure 2, the five IP fabric layers include a core routing layer, a fabric layer, a spine layer, a leaf layer, and a compute layer.

    Figure 2: Five-Layer IP Fabric

    When deployed, the five layers look like the network shown in Figure 3. The core routing layer connects into a Data Center Interconnect (DCI), a virtual private network (VPN), or both, to carry data center traffic from the fabric layer securely to other Web Services provider locations or data centers. The fabric layer joins the core routing layer to the spine layer, aggregates traffic from the spine layer, and provides connectivity between point of delivery (POD) modules that contain spine and leaf layer devices. The spine layer sits between the fabric and leaf layers to aggregate traffic and provide intra-POD connectivity, and the leaf layer connects servers to the rest of the IP fabric. The multiple paths available from each device at each layer provide robust connectivity and no single point of failure. Additionally, the fabric layer can connect to multiple PODs containing a spine and leaf layer.

    Figure 3: SaaS Solution—Clos-Based IP Fabric Topology
  • Multiple server deployment options—As shown in Figure 4, the SaaS solution offers three choices for server connectivity:

    • Unicast—The servers connect to the leaf layer at Layer 2 through a server VLAN, and the server VLAN is routed at Layer 3 through an integrated routing and bridging (IRB) interface on the leaf layer devices. The spine layer receives the aggregated prefixes from the leaf layer IRB interfaces.

    • Anycast—The servers run BGP natively and connect to the leaf layer devices at Layer 3. The servers share their address prefixes directly with the leaf layer devices, and the spine layer devices receive routes from the leaf layer devices. By configuring a single anycast IP address on each server, the Clos-based network offers multiple ways for the traffic to flow, multiple servers that can process the requests when they arrive, and multiple services that can be handled simultaneously.

    • Hybrid—This option offers a blend of the unicast and anycast choices by including an IRB interface at the leaf layer. BGP runs natively on the servers, and the leaf layer devices share both the IP fabric routes and the IRB prefixes with the spine layer devices.

    Figure 4: Server Models for the SaaS Solution
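    As a rough sketch of the unicast option described above, a leaf switch might place servers in a Layer 2 VLAN, route the VLAN through an IRB interface, and summarize the server subnets with an aggregate route that an export policy then advertises into BGP toward the spine layer. The VLAN ID, names, and addresses below are hypothetical:

    /* Hypothetical server VLAN on a leaf switch (unicast model) */
    vlans {
        servers-100 {
            vlan-id 100;
            l3-interface irb.100;
        }
    }
    interfaces {
        irb {
            unit 100 {
                family inet {
                    /* Default gateway for the attached servers */
                    address 10.1.100.1/24;
                }
            }
        }
    }
    routing-options {
        /* Summary advertised to the spine layer in place of per-VLAN routes */
        aggregate {
            route 10.1.0.0/16;
        }
    }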
  • Automation—The SaaS network configuration shown in this solution guide was partially created using OpenClos Python scripts. OpenClos is an open-source method to dynamically generate device configurations and is embedded within Network Director. With the SaaS solution contained in this guide, you can use either Network Director or native Python scripts to build an initial configuration. The solution also offers Junos® OS support for other programming languages, zero touch provisioning (ZTP), and REST APIs. Such tools allow the SaaS network to be built and supported effectively and efficiently by a lean operational staff.

Requirements

Web Services companies interested in deploying an SaaS solution have certain requirements that must be met in order to make the solution viable for their needs. The main categories into which these requirements fall include device capabilities, network services, automation, monitoring, class of service, security, and performance and scaling. Table 1 explores these services and requirements.

Table 1: SaaS Solution Descriptions and Requirements


Network devices

The physical hardware required for networking infrastructure in the SaaS solution includes MX Series routers and QFX Series switches as follows:

  • Core routing layer—You can select any of the MX Series routers that meet your needs (MX80, MX104, MX240, MX480, MX960, MX2010, or MX2020). The MX480 router was used in the testing of this solution.

  • Fabric layer—You can select any of the currently supported QFX Series switches based on your budget and scaling needs (QFX5100 or QFX10002). The QFX5100-24Q switch was used in the testing of this solution.

  • Spine layer—Both the QFX10002-72Q and QFX5100-24Q switches were used in the testing of this solution, but we recommend the QFX10002-72Q switches at this layer for their expanded capabilities, port density, and processing power.

  • Leaf layer—The QFX5100-48S, QFX5100-48T, and OCX1100 switches were used in the testing of this solution as leaf devices.

  • Switches deployed at the fabric, spine, and leaf layers must be able to boot in 10 seconds and reboot in 30 seconds.

  • Software can be upgraded in 90 seconds without in-service software upgrade (ISSU).

  • Switches must support 10-Gigabit Ethernet and 40-Gigabit Ethernet third-party optical transceivers.

Network services

  • IP fabric—The SaaS solution must be able to support up to a five-stage IP fabric. Such fabrics take advantage of Layer 3 and BGP to support native cloud applications and services.

  • Multiple traffic models—The network must support unicast, anycast, and hybrid (both unicast and anycast) traffic. This allows you to choose the traffic model that works best for your network.

  • IPv4 and IPv6 addressing—To allow for expansion and growth, the solution requires support for both IPv4 and IPv6.

  • Multispeed interfaces—As your network evolves, you need to migrate servers and access interfaces to higher speeds. To do this, the leaf layer must support Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps), and 10-Gigabit Ethernet (10 Gbps) using either copper or fiber cabling to connect to servers, and 40-Gigabit Ethernet (40 Gbps) uplink interfaces to connect to the rest of the IP fabric.

  • Traffic management—To provide maximum resiliency and redundancy, the solution provides variable oversubscription, 64-way equal-cost multipath (ECMP) routing, user-defined ECMP, resilient hashing, and traffic profiles.
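The following minimal sketch shows how the ECMP and resilient hashing pieces of this requirement are commonly enabled in Junos OS. The ECMP-LB policy name is a hypothetical placeholder, and the resilient hashing statement shown is the QFX Series form:

    policy-options {
        /* Install all equal-cost BGP next hops in the forwarding table */
        policy-statement ECMP-LB {
            then {
                /* Despite the name, this hashes per flow on these platforms */
                load-balance per-packet;
            }
        }
    }
    routing-options {
        forwarding-table {
            export ECMP-LB;
        }
    }
    forwarding-options {
        enhanced-hash-key {
            /* Keep existing flows pinned to their links when an ECMP member fails */
            ecmp-resilient-hash;
        }
    }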

BGP

  • All switches in the IP fabric must use external BGP with 32-bit autonomous system (AS) numbers.

  • BGP must be the only routing protocol used in the IP fabric.

  • Servers must run BGP, and up to 48 servers can have the same anycast address.

  • Access switches must automatically accept trusted BGP connections from new servers by default, without additional configuration. (A configuration sketch follows this list.)

  • BGP anycast must be supported across points of delivery (PODs).
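To meet the automatic peering requirement above, a leaf switch can use the Junos OS BGP allow statement, which accepts sessions from any host in a trusted subnet without per-server configuration. The subnet, AS number, prefixes, and policy name in this sketch are hypothetical:

    protocols {
        bgp {
            group servers {
                type external;
                /* Accept a session from any server in this subnet, no per-peer config */
                allow 10.1.100.0/24;
                peer-as 65200;
                import SERVERS-IN;
            }
        }
    }
    policy-options {
        /* Accept only the expected anycast prefixes from the servers */
        policy-statement SERVERS-IN {
            term anycast {
                from {
                    route-filter 10.200.0.0/24 orlonger;
                }
                then accept;
            }
            term reject-other {
                then reject;
            }
        }
    }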

Five-stage IP fabric

  • The solution must use a five-stage IP fabric topology.

  • Leaf switches in a five-stage topology must aggregate all access switch IRB interfaces to the spine layer and support a minimum of 48-way ECMP routing.

  • The IP fabric must be configured to support a dual stack with IPv4 and IPv6 addresses.

Advertisements, connectivity, and filtering

  • All switches must advertise the lo0 loopback address into BGP.

  • All leaf layer access switches must advertise an IRB aggregate into BGP.

  • All spine and leaf devices must be able to reach every other spine and leaf device in the network using only loopback addressing.

  • Resilient hashing must be enabled to support stateful services.

Other protocols

  • Bidirectional Forwarding Detection (BFD) must be configured on all point-to-point (PTP) interfaces with 250 ms intervals and a multiplier of 3. (A sketch follows this list.)

  • All switches must support LLDP to identify remote devices and ports.

  • The network must support jumbo frames up to 9000 bytes end-to-end in the IP fabric.

  • The network must support unicast, anycast, and hybrid traffic flows.
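A minimal sketch of these protocol requirements in Junos OS appears below. The BGP group and interface names are hypothetical, and the 9216-byte physical MTU is one common way to leave headroom for 9000-byte payloads:

    protocols {
        bgp {
            group underlay {
                /* 250 ms x 3: a neighbor is declared down after roughly 750 ms */
                bfd-liveness-detection {
                    minimum-interval 250;
                    multiplier 3;
                }
            }
        }
        /* Identify remote devices and ports on every interface */
        lldp {
            interface all;
        }
    }
    interfaces {
        et-0/0/48 {
            /* Physical MTU sized for jumbo frames */
            mtu 9216;
        }
    }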

Automation

  • Zero touch provisioning (ZTP)—When you physically connect a switch to the network and boot it with a default configuration, ZTP attempts to upgrade the Junos OS software automatically and autoinstall a configuration file from the network. This allows your network to be provisioned quickly over Layer 3 using revenue ports.

  • Indefinite retries—This enables the network to keep trying to establish connectivity until all required components are added to the network.

  • Programming language support—Junos OS software offers native programming languages, such as Python, to enable you to automate common operational and configuration tasks in your network.

  • OpenClos—A set of Python scripts, created by Juniper Networks staff and the open-source scripting community, designed to autoprovision an IP fabric. Many of the OpenClos scripts are also integrated with Network Director.

    Note: The SaaS solution can implement OpenClos either through the capabilities built into Network Director or through the native OpenClos Python scripts. For more information on OpenClos, see Configuring an IP Fabric using Junos Space Network Director or OpenClos.

  • The IP fabric must be generated using OpenClos and ZTP.

  • Each switch must support the ability to execute Python scripts.

  • Trusted servers must use BGP to peer automatically with the leaf switches, without requiring manual configuration changes on the leaf switches when new servers are connected.

  • The solution must be able to use the REST API to make Packet Forwarding Engine-level and configuration-level queries and changes.
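As a sketch of the on-box automation hooks, the following Junos OS configuration enables Python script execution and the REST API on releases that support them; the port shown reflects the usual default for the REST service:

    system {
        scripts {
            /* Allow op, commit, and event scripts written in Python */
            language python;
        }
        services {
            rest {
                /* HTTP access to the configuration and operational REST APIs */
                http {
                    port 3000;
                }
                /* Optional web-based API browser */
                enable-explorer;
            }
        }
    }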

Monitoring and management

  • Port mirroring—Allows a switch to send copies of packets to either a local interface for local monitoring or to a remote server for remote monitoring.

  • Monitoring for all BGP paths—This includes BGP Monitoring Protocol version 3 (BMPv3), timestamping, and peer down notifications.

  • SNMPv3—Provides support for SNMP, which is used widely in the networking industry.

  • Remote logging—Offers system log message support, firewall filters, and control plane/data plane thresholds.

  • Digital optical monitoring (DOM)—Provides support for temperature and power thresholds and reporting.

  • The solution must have the ability to mirror traffic locally on a switch, or use GRE encapsulation to send the mirrored traffic to a remote Linux server.

  • Each switch must be configured with BMPv3 and send information to a centralized server for BGP reporting. (We recommend using the BMP process available here: https://github.com/garberg/bmpd.)

  • Each switch must be configured with SNMPv3 and use the authentication and encryption options.

  • All SNMP data must be exported to a centralized server for reporting purposes.

  • DOM must be part of the information captured by SNMPv3, including detection and reporting of unacceptable power levels and temperatures.
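The sketch below outlines the SNMPv3 and remote port mirroring portions of these requirements. The user name, passwords, interface, and collector address are hypothetical, and the SNMPv3 access and trap-target configuration is omitted for brevity:

    snmp {
        v3 {
            usm {
                local-engine {
                    /* Hypothetical user with both authentication and encryption */
                    user saas-monitor {
                        authentication-sha {
                            authentication-password "<auth-password>";
                        }
                        privacy-aes128 {
                            privacy-password "<privacy-password>";
                        }
                    }
                }
            }
        }
    }
    forwarding-options {
        analyzer {
            /* Mirror a server-facing port to a remote collector over GRE */
            REMOTE-MIRROR {
                input {
                    ingress {
                        interface xe-0/0/10.0;
                    }
                }
                output {
                    ip-address 192.0.2.50;
                }
            }
        }
    }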

Security

  • Control plane security—Must include the ability to use firewall filters and policers, and discard and log traffic.

  • Storm control—Must be provided for broadcast, unknown unicast, and multicast (BUM) traffic.

  • Firewall filter support—Must be able to filter on IPv4, IPv6, Layer 2 fields, and Layer 3 fields.

  • Each switch must be configured to deny all IP traffic destined to the control plane that is not essential to the operation of the switch.

  • Any denied control plane traffic must be counted and logged to a remote server.

  • SSH traffic sent to the switch must be policed at a rate of 5 Mbps.

  • Storm control must be configured on every leaf layer access switch and triggered at 50 percent of bandwidth capacity.

  • Leaf layer access switches must support the following to allow and block traffic:

    • IPv4 and IPv6 port-based firewall filters

    • IPv4 and IPv6 VLAN-based firewall filters

    • IPv4 and IPv6 IRB-based firewall filters
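A condensed sketch of the control plane protection, SSH policing, and storm control requirements follows. The filter, policer, and profile names are hypothetical, and the terms that permit BGP, BFD, and other essential traffic are omitted for brevity:

    firewall {
        /* Requirement: police SSH traffic to the switch at 5 Mbps */
        policer SSH-POLICER {
            if-exceeding {
                bandwidth-limit 5m;
                burst-size-limit 625k;
            }
            then discard;
        }
        family inet {
            filter PROTECT-RE {
                term ssh {
                    from {
                        protocol tcp;
                        destination-port ssh;
                    }
                    then {
                        policer SSH-POLICER;
                        accept;
                    }
                }
                /* Terms accepting BGP, BFD, and management traffic go here */
                term deny-rest {
                    then {
                        count denied-control-plane;
                        syslog;
                        discard;
                    }
                }
            }
        }
    }
    interfaces {
        lo0 {
            unit 0 {
                family inet {
                    /* Filter traffic destined to the control plane */
                    filter {
                        input PROTECT-RE;
                    }
                }
            }
        }
        xe-0/0/10 {
            unit 0 {
                family ethernet-switching {
                    /* Apply storm control on a server-facing access port */
                    storm-control sc-50;
                }
            }
        }
    }
    forwarding-options {
        /* Trigger storm control at 50 percent of port bandwidth */
        storm-control-profiles sc-50 {
            all {
                bandwidth-percentage 50;
            }
        }
    }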

The SaaS solution also has the following performance and scale requirements:

  • 3:1 oversubscription within a POD (48x10G downstream and 4x40G upstream)

  • 32 BFD sessions with a 250 ms interval and a multiplier of 3 (32x40GE spine layer)

  • 52 BGP sessions on each leaf layer device (48 servers + 4 uplinks)

  • 48-way ECMP in a leaf layer device (48 servers + anycast)

  • 4-way 802.3ad on a leaf layer device

  • Routing table and forwarding table requirements

    • 2048 IPv4 or IPv6 loopback addresses

    • 1024 IPv4 or IPv6 PTP networks

    • 32 IPv4 or IPv6 IRB networks

Design Considerations

There are two primary design concerns when implementing an SaaS network:

  • IBGP or EBGP Clos-based IP fabric—The first decision to make in an SaaS environment is whether to use internal BGP (IBGP) or external BGP (EBGP). The very nature of an IP fabric requires having multiple, equal-cost paths. The design must consider how IBGP and EBGP handle the equal-cost multipath (ECMP) feature. By default, EBGP supports ECMP without enabling additional features. Conversely, IBGP requires the use of a BGP route reflector and the AddPath feature to fully support ECMP.

    EBGP offers a simpler and more elegant way to design an IP fabric. EBGP also facilitates traffic engineering by using local preference and autonomous system (AS) path prepending (padding) techniques. As shown in Figure 5, each device in the IP fabric uses a different AS number, and each leaf device must peer with every spine device in the IP fabric.

    Figure 5: AS Number Assignment in an IP Fabric

    Designing IBGP in an IP fabric is a bit more complicated because IBGP requires that all devices peer with every other device in a full mesh. To make the peering requirements easier to manage, you can use inline BGP route reflectors in the spine layer of the network. However, standard BGP route reflection reflects only the best prefix and does not work well with ECMP. To enable full ECMP, you need to configure the BGP AddPath feature, which provides additional ECMP paths in the BGP advertisements between the route reflector and its clients.

    Note

    For more information about the BGP AddPath feature, see Understanding the Advertisement of Multiple Paths to a Single Destination in BGP.
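    A minimal sketch of that arrangement on a spine device acting as an inline route reflector, with hypothetical addresses and a path count chosen purely for illustration, might look like this:

    protocols {
        bgp {
            group rr-clients {
                type internal;
                local-address 10.0.0.1;
                /* Make this spine device a route reflector for its leaf clients */
                cluster 10.0.0.1;
                family inet {
                    unicast {
                        /* Reflect several paths per prefix, not only the best one */
                        add-path {
                            receive;
                            send {
                                path-count 4;
                            }
                        }
                    }
                }
            }
        }
    }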

    Because EBGP supports ECMP in a more straightforward fashion and IBGP is more complicated, this solution guide focuses on the configuration and validation of an EBGP-based IP fabric.

  • Server reachability options—In an EBGP-based IP fabric, there are three ways to connect from the leaf layer to the servers:

    • Anycast—The server-to-leaf connection is pure Layer 3, where you configure BGP on the physical interface of the leaf layer device.

    • Hybrid—The server-to-leaf connection happens at Layer 2 through use of VLANs. The BGP session is established between the integrated routing and bridging (IRB) interface and the Layer 3 connection at the server.

    • Unicast—The server-to-leaf connection happens at Layer 2 through the use of VLANs with the leaf layer device IRB interface used as a default gateway. You do not need to configure BGP between the servers and leaf layer devices.

    Because the requirements of most Web Services providers focus on BGP and Layer 3 running at the server layer, this solution guide focuses on the configuration and validation of anycast in the IP fabric. However, for the sake of completeness, we also show the unicast and hybrid options.

Implementation

The following hardware equipment and software features were used to create the SaaS solution described in the example:

Core Routing

  • The two core routers are MX480 routers.

  • Configure both Router R1 and Router R2 as MPLS provider edge (PE) routers to connect upstream to the provider core.

  • Configure all fabric layer-facing interfaces as customer edge (CE) links inside a VRF routing instance.

  • Configure EBGP with 2-byte autonomous system (AS) numbers as the PE-to-CE routing protocol to advertise the anycast routes to the core.
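As a rough sketch of this PE role, each core router might carry its fabric-facing links in a VRF such as the following; the instance name, route distinguisher, AS numbers, and addresses are hypothetical:

    routing-instances {
        SAAS-VRF {
            instance-type vrf;
            /* Fabric layer-facing CE link */
            interface xe-1/0/0.0;
            route-distinguisher 10.255.0.1:100;
            vrf-target target:64500:100;
            protocols {
                bgp {
                    /* 2-byte AS EBGP session toward the fabric layer */
                    group fabric-ce {
                        type external;
                        peer-as 64600;
                        neighbor 172.16.10.1;
                    }
                }
            }
        }
    }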

Fabric

  • There are four QFX5100-24Q switches in the fabric layer.

  • Use EBGP peering with 4-byte AS numbers to reach the upstream routers through a routing instance.

  • Use EBGP peering with 4-byte AS numbers to connect with the downstream spine devices.

  • Anycast routes received from the spine devices are advertised to the routers by way of EBGP.

  • Configure EBGP multipath and per-packet load balancing on all devices.

  • Enable resilient hashing.

  • Configure Bidirectional Forwarding Detection (BFD) for all BGP sessions.

Spine

  • There are four QFX10002-72Q switches in the spine layer.

  • Use EBGP peering with 4-byte AS numbers to connect with both the upstream fabric devices and the downstream leaf devices.

  • Anycast routes received from the leaf devices are advertised to the fabric devices by way of EBGP.

  • Configure EBGP multipath and per-packet load balancing on all devices.

  • Enable resilient hashing.

  • Configure BFD for all BGP sessions.

Leaf

  • There are two QFX5100-48S, two QFX5100-48T, and two OCX1100 switches in the leaf layer to provide flexibility for fiber and copper cabled networks.

  • Use EBGP peering with 4-byte AS numbers to connect with both the upstream spine devices and the downstream servers.

  • Anycast routes received from the servers are advertised to the spine devices by way of EBGP.

  • Configure EBGP multipath and per-packet load balancing on all devices.

  • Enable resilient hashing.

  • Configure BFD for all BGP sessions.

Compute

  • Configure IBM Flex blade servers with the VMware ESXi operating system.

  • A traffic generator was used to simulate the BGP sessions to the leaf devices, as well as server-based traffic.

Now that we have completed our overview of the SaaS solution, it is time to view the configuration and verification sections of the solution.