Midsize Enterprise Campus Architecture Design Considerations
With the proliferation of mobile devices and ubiquitous Internet availability, employees, partners, contractors, and guests all want to connect to the campus network not only while they are on campus but also when they are outside the traditional campus boundary. They also want to connect with corporate devices as well as their own devices (known as bring your own device, or BYOD). Offering the flexibility to connect anytime, anywhere, with any device increases productivity and user satisfaction but creates significant manageability and security risks for campus resources. Users expect the same access experience no matter where, with what device, and how they connect to the campus.
These requirements demand role-based policy orchestration. When a user connects to the network, the policy orchestration engine must be able to:
Identify the user and the role of the user
Authenticate and authorize the user
Identify whether or not the client device of the user is company owned or BYOD
Identify the type of OS running on the client device (Mac OS X, Windows, or other)
Quarantine the device if necessary
Detect the location of the entry point
Detect traffic encryption requirements
Provide accountability of user access (report number of attempts and success rate)
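The checklist above can be sketched as a single decision function. This is a minimal Python sketch, not any product's policy engine: the `Session` fields, the role-to-VLAN table, the `byod` and `remediation` VLAN names, and the rule that wireless entry points require encryption are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical attributes a policy orchestration engine might gather at connect time.
@dataclass
class Session:
    user: str
    role: str               # identified role, e.g. "employee", "contractor", "guest"
    authenticated: bool     # outcome of authentication/authorization
    corporate_device: bool  # False for BYOD
    os_type: str            # e.g. "macos", "windows", "other"
    posture_ok: bool        # hypothetical endpoint health check result
    location: str           # entry point, e.g. "wired-bldg1" or "wlan-lobby"

# Hypothetical role-to-VLAN mapping; real deployments define their own.
ROLE_VLANS = {"employee": "corp", "contractor": "partner", "guest": "guest"}

def decide(s: Session) -> dict:
    if not s.authenticated:
        return {"action": "deny"}
    if not s.posture_ok:
        # Quarantine: only the remediation service is reachable.
        return {"action": "quarantine", "vlan": "remediation"}
    vlan = ROLE_VLANS.get(s.role, "guest")
    if vlan == "corp" and not s.corporate_device:
        vlan = "byod"  # BYOD devices get a restricted variant of corporate access
    # Assumption: wireless entry points require encryption.
    return {"action": "permit", "vlan": vlan, "encrypt": s.location.startswith("wlan")}

def authorize(s: Session, log: list) -> dict:
    """Apply the decision and record it for accountability reporting."""
    decision = decide(s)
    log.append((s.user, s.os_type, decision["action"]))
    return decision

def success_rate(log: list) -> float:
    """Fraction of recorded attempts that were permitted."""
    if not log:
        return 0.0
    return sum(1 for _, _, action in log if action == "permit") / len(log)

log: list = []
print(authorize(Session("alice", "employee", True, False, "macos", True, "wlan-lobby"), log))
print(authorize(Session("bob", "guest", False, False, "other", True, "wired-bldg1"), log))
print(success_rate(log))
```

Keeping the decision pure (`decide`) and the audit trail separate (`authorize`) makes the access rules easy to test independently of the reporting.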
Policy orchestration and access control are two of the more critical elements in delivering a secure infrastructure for the midsize enterprise campus solution. These functions allow for a comprehensive suite of features for device connectivity and security.
When a user connects to a network, access control must provide:
Guest access control
Layer 2 access control (802.1X, MAC authentication)
MAC authorization and device profiling
Protection against MAC spoofing
Monitoring and containment of unauthorized connections
Role-based access control
Identity-aware networking (Network Access Control (NAC) and Identity and Access Management (IAM))
The Midsize Enterprise Campus solution supports access control methods based on user parameters as well as device MAC addresses. Consider scale when planning the mix of user-based and MAC-based authentication methods. A dedicated LDAP back-end server can also increase scale.
Guest access allows the network to distinguish corporate users from guest users. Guest users are users who enter the network without a standard-issue corporate device or supplicant, for example vendors or contractors. The DUA (Device/User/Application) profiles are determined by the number of devices (with MAC association) and the number of users, including guests. For this reference architecture, it is assumed that as many as 10 percent of total users could be guest access users.
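The 10 percent guest assumption translates directly into sizing numbers. The short Python sketch below applies it, rounding up, to the 10,000 users and 40,000 devices this architecture was tested with; the `dua_sizing` helper is hypothetical and only illustrates the arithmetic.

```python
GUEST_PERCENT = 10  # "as many as 10 percent of total users" per the text

def dua_sizing(total_users: int, total_devices: int) -> dict:
    # Integer ceiling of total_users * 10 / 100 avoids floating-point rounding surprises.
    guests = (total_users * GUEST_PERCENT + 99) // 100
    return {
        "guest_users": guests,
        "corporate_users": total_users - guests,
        "devices_per_user": total_devices / total_users,
    }

print(dua_sizing(10_000, 40_000))
```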
Together, policy orchestration and access control determine the security profiles and policies for users. The DUA profiles and policies determine what access users have to corporate resources and the Internet. Figure 1 illustrates how policy orchestration and access control work together to help enforce entry onto the campus network. These functions can be consolidated into a single device, provided that the device can support these capabilities at scale.
The Interface for Metadata Access Point (IF-MAP) protocol should be used to transfer session information to the secure access server in real time. IF-MAP is an open standard protocol that communicates information about sessions, roles, access zones, and other elements between clients and the server as a federation.
The solution should be highly available, supporting either an active/passive or an active/active configuration. An active/passive configuration can limit performance, because only one node handles traffic at a time; however, load balancers can be used to scale nonclustered nodes for better performance.
Additional services or service modules can also be used in the same chassis to provide remote access services and network access control (NAC) policy services, but careful consideration must be given to the overall scalability and availability of the campus network. NAC services can be treated as part of an overall security strategy that may be managed by the security management server.
Network management can be broken down into five basic services: fault, configuration, accounting, performance, and security (FCAPS). To support these services, the Midsize Enterprise Campus solution relies on Network Director for the wired infrastructure and RingMaster for the wireless infrastructure. Because the solution was tested with Network Director 1.5, the management domain is, at this writing, split into these two distinct platforms (one for wired network devices and one for wireless network devices).
Robust security is important to the campus environment. This includes perimeter security, which must provide stateful firewall protection for traffic entering and leaving the campus network as well as protect all traffic within the various silos of the campus network. Part of the security posture for the solution is also to provide role-based access control (RBAC) to the network, including AAA in conjunction with 802.1X, which provides an endpoint access authentication model.
Additional device security should be applied to headless network devices, such as printers and video surveillance cameras, which cannot provide traditional AAA credentials, to prevent MAC spoofing attempts against these devices.
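MAC authentication alone trusts whatever MAC address a device presents, so spoofing protection for headless devices usually pairs the MAC with a second signal. The Python sketch below assumes a hypothetical registry of expected device profiles and an `observed_profile` produced by some fingerprinting step (for example, DHCP options or traffic behavior) that is not shown; the port-move rule is likewise an illustrative assumption.

```python
# Hypothetical registry of headless devices: MAC -> expected device profile.
HEADLESS_REGISTRY = {
    "00:11:22:33:44:55": "printer",
    "66:77:88:99:aa:bb": "ip-camera",
}

def check_headless(mac: str, observed_profile: str, port: str, first_seen: dict) -> str:
    """Return "permit", "deny", "block-spoof", or "review" for a MAC-authenticated device."""
    mac = mac.lower()
    expected = HEADLESS_REGISTRY.get(mac)
    if expected is None:
        return "deny"         # MAC not registered
    if observed_profile != expected:
        return "block-spoof"  # registered MAC, but the fingerprint does not match
    # Remember the first port this MAC appeared on; a later move is flagged for review.
    if first_seen.setdefault(mac, port) != port:
        return "review"
    return "permit"

ports: dict = {}
print(check_headless("00:11:22:33:44:55", "printer", "ge-0/0/1", ports))     # permit
print(check_headless("00:11:22:33:44:55", "windows-pc", "ge-0/0/2", ports))  # block-spoof
```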
The access security posture for the Campus solution should allow authenticated endpoints to be assigned to different VLANs dynamically. Access switches should support activating firewall filters and VLAN assignments based on a policy provided by an authentication server. If the authentication server cannot be reached, switches apply an authorization-failed policy, in which devices are placed in a non-authenticated state. Authenticated ports remain authenticated for the duration of the connected session, until the device is disconnected (either physically or logically) or the policy times out. The switch ports also provide a way to grant trusted access to resources while denying non-authenticated devices or giving them only limited access to a remediation service.
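The port behavior described above can be modeled as a small decision function. This Python sketch is an assumption-laden model, not switch code: the `PortState` fields, the `remediation` VLAN name, and the policy dictionary keys `vlan` and `filter` stand in for whatever attributes the authentication server actually returns.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PortState:
    state: str                             # "authenticated", "unauthenticated", or "remediation"
    vlan: Optional[str] = None             # VLAN pushed by the authentication server
    firewall_filter: Optional[str] = None  # filter pushed by the authentication server

def authorize_port(server_reachable: bool, auth_ok: bool, policy: Optional[dict] = None) -> PortState:
    if not server_reachable:
        # Authorization-failed policy: the device stays non-authenticated.
        return PortState("unauthenticated")
    if not auth_ok:
        # Limited access to a remediation service only.
        return PortState("remediation", vlan="remediation")
    # Dynamic VLAN and firewall filter taken from the server-supplied policy.
    policy = policy or {}
    return PortState("authenticated", vlan=policy.get("vlan"), firewall_filter=policy.get("filter"))

def still_authorized(port: PortState, link_up: bool, elapsed_s: int, timeout_s: int) -> bool:
    """An authenticated port stays authorized until disconnect or policy timeout."""
    return port.state == "authenticated" and link_up and elapsed_s < timeout_s

port = authorize_port(True, True, {"vlan": "corp", "filter": "corp-in"})
print(port)
```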
Quality of Service
Quality of service (QoS) is an essential design category for maintaining application and user real-time performance monitoring (RPM) targets and ensuring consistent network performance. Although the Midsize Enterprise Campus solution reference architecture is designed for high-bandwidth services with Gigabit Ethernet or 10GE links, QoS should be considered mandatory for any campus deployment, regardless of bandwidth, on any interface or access point with the potential for congestion or contention for resources.
QoS policies are implemented as per-hop behaviors (PHBs), meaning that each device must be configured consistently to ensure end-to-end policy enforcement. Although each device applies its own PHB, QoS should be designed end to end and flow through the entire campus to correctly adhere to the RPMs of the specific applications and campus policies.
QoS policies are first established by setting the trust boundaries and defining how traffic is marked in the campus network. For this reference architecture, trusted relationships (trusted inter-switch policies) are established at the aggregation and core layers of the network. In a trusted relationship, the classifications and markings of the traffic do not require a rewrite or an inspection. However, queuing and policing policies should be considered at the ingress and egress of all inter-switch links.
WAN policies (both to the corporate WAN and to the Internet) can be constrained by lower-bandwidth access links (less than 100 Mbps) and thus require a QoS policy with queuing and policing applied to maintain RPMs. Depending on the inbound and outbound QoS policy for the campus, WAN links can have different levels of trust associated with the interface.
The access layer should be considered untrusted. At the access layer, QoS access policies include queuing, policing, classification, marking, and rewriting for ingress traffic. Based on the campus QoS policy, some devices may be considered trusted, such as an IP phone, which would receive its marking policy from the corporate IP PBX. WLAN QoS policies are configured at the WLAN controller, which gives the campus administrator the ability to trust the client DSCP through the wireless connection.
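The trust model reduces to one rule: keep markings on trusted links, and classify and rewrite on untrusted ones. In the Python sketch below, the traffic-type labels and the DSCP values chosen for video (34, AF41) and mission-critical data (26, AF31) are assumptions; only EF (46) for voice follows from the text. WAN interfaces, whose trust level varies by policy, are not modeled.

```python
# Hypothetical marking table applied at untrusted access ports.
ACCESS_MARKING = {"voice": 46, "video": 34, "mission-critical": 26, "best-effort": 0}

def ingress_dscp(layer: str, received_dscp: int, traffic_type: str,
                 trusted_device: bool = False) -> int:
    """Return the DSCP value a packet carries after ingress processing."""
    if layer in ("aggregation", "core"):
        return received_dscp  # trusted inter-switch link: no rewrite or inspection
    if layer == "access":
        if trusted_device:
            return received_dscp  # e.g. an IP phone marked per the IP PBX policy
        # Untrusted: classify the traffic and rewrite whatever the host marked.
        return ACCESS_MARKING.get(traffic_type, 0)
    raise ValueError(f"unknown layer: {layer}")

print(ingress_dscp("access", 46, "best-effort"))  # 0: a PC cannot claim EF
print(ingress_dscp("core", 46, "voice"))          # 46: marking preserved
```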
Figure 2 illustrates the QoS classification used in the validated reference architecture.
QoS policies are typically based on application. Each application will have specific RPM attributes that must be considered when determining the QoS policies for the campus.
Voice traffic profile RPM attributes:
Low latency < 150 ms.
Low jitter < 20 ms.
Low loss < 1%.
Bandwidth for VoIP applications varies based on the codec used. Traffic is typically smooth and predictable, with little variability in packet size.
Video traffic profile RPM attributes:
Low latency < 150 ms.
Low jitter < 20 ms.
Low loss < 1%.
Bandwidth for video applications varies based on the codec and resolution used. A single 1080p video stream can reach as high as 5 Mbps per screen. Traffic is typically highly variable and bursty.
Data traffic profile RPM attributes:
Data traffic will vary greatly based on the specific application. Data traffic is classified based on the importance of the application to the enterprise.
Mission-critical application data traffic will be given higher priority in the queuing and policing policies over traffic such as Internet or Web traffic.
Note that the above profile attributes are general guidelines. Each application should be carefully evaluated before setting QoS policies. The RPMs above should be used primarily as a baseline, adjusted to the requirements of each actual application.
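A measured path can be compared against these baselines programmatically. In this Python sketch, the jitter calculation is a simplified mean of consecutive delay differences (not the smoothed RFC 3550 estimator), and the sample delays and loss figure are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RpmBaseline:
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float

# Baselines taken from the voice and video profiles above.
BASELINES = {
    "voice": RpmBaseline(150, 20, 1.0),
    "video": RpmBaseline(150, 20, 1.0),
}

def mean_jitter(delays_ms: list) -> float:
    """Simplified jitter: mean absolute difference between consecutive one-way delays."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0

def violations(app: str, latency_ms: float, jitter_ms: float, loss_pct: float) -> list:
    """List the baseline limits a measurement fails (limits are strict "<" bounds)."""
    b = BASELINES[app]
    failed = []
    if latency_ms >= b.max_latency_ms:
        failed.append("latency")
    if jitter_ms >= b.max_jitter_ms:
        failed.append("jitter")
    if loss_pct >= b.max_loss_pct:
        failed.append("loss")
    return failed

delays = [100, 110, 105]
print(mean_jitter(delays))                                         # 7.5
print(violations("voice", max(delays), mean_jitter(delays), 0.2))  # []
```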
For the solution reference architecture, it is assumed that DSCP is used for the classification and marking policy. Although some campus environments put voice and video into the same queue, the Midsize Enterprise Campus solution allocates them to different policies because of the bursty nature of video. Voice should be allocated to an Expedited Forwarding (EF) queue, and video traffic should be marked and handled according to a pre-established RPM. Data traffic is classified into different queues (a mission-critical queue and a best-effort data queue), which should be marked and handled according to the RPM determined for the application profile.
This solution reference architecture follows a five-class model:
Expedited Forwarding (EF)—for voice applications
Assured Forwarding (AF)—for video applications
Some campus networks may choose a more granular approach to classifying traffic; however, the number of classes here should be considered sufficient for the application levels tested as part of the Midsize Enterprise Campus solution.
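A DSCP-based classifier for a five-class model can be sketched as a lookup table. The list above names only the EF and AF classes; in this Python sketch, the mission-critical and best-effort data classes come from the surrounding text, while the network-control class, the AF41/AF31/CS6 code points, and the queue numbers are assumptions for illustration only.

```python
# Hypothetical DSCP-to-forwarding-class table for a five-class model.
CLASSIFIER = {
    46: "voice",             # EF
    34: "video",             # AF41 (assumed AF code point)
    26: "mission-critical",  # AF31 (assumed)
    48: "network-control",   # CS6 (assumed)
    0: "best-effort",        # default PHB
}
# Hypothetical queue numbers; only the voice queue is treated as strict priority here.
QUEUES = {"best-effort": 0, "mission-critical": 1, "video": 2, "network-control": 3, "voice": 5}
STRICT_PRIORITY = {"voice"}

def dscp_from_tos(tos: int) -> int:
    """DSCP is the upper six bits of the IPv4 ToS / IPv6 Traffic Class byte."""
    return (tos >> 2) & 0x3F

def classify(dscp: int) -> tuple:
    if not 0 <= dscp <= 63:
        raise ValueError(f"invalid DSCP {dscp}")
    fc = CLASSIFIER.get(dscp, "best-effort")  # unknown markings fall back to best effort
    return fc, QUEUES[fc], fc in STRICT_PRIORITY

print(classify(dscp_from_tos(0xB8)))  # ToS 0xB8 carries DSCP 46 (EF)
```

Keeping voice and video in separate classes, as the text recommends, means a video burst cannot consume the strict-priority queue that voice depends on.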
High availability (HA) and resiliency are essential for maintaining connectivity and avoiding service disruption. The expectation in this Midsize Enterprise Campus Solution Reference Architecture is uninterrupted access (sub-second recovery), including during voice and video sessions, in the event of hardware or software failure in the network. Additionally, it is assumed that downtime during planned outages can be minimized through features such as nonstop routing (NSR), nonstop bridging (NSB), and nonstop software upgrade (NSSU).
High Availability at Layer 2
In the campus architecture, each access switch is connected to two aggregation switches for redundancy, which provides reliability and HA. The aggregation switches are interconnected at Layer 2. As shown in Figure 3, this can create a Layer 2 loop. Ethernet has no inherent way to track looping frames: unlike IP packets, Ethernet frames carry no time-to-live (TTL) field, so a looping frame can circulate indefinitely.
The following sections describe techniques for addressing Layer 2 loops.
Spanning Tree Protocol (STP)
Spanning Tree Protocol (STP) is the industry standard for preventing Layer 2 loops. STP calculates the best path through a switched network that contains redundant paths. Switches exchange STP information in bridge protocol data unit (BPDU) frames. STP uses the information in the BPDUs to elect a root bridge, identify the root port on each switch, identify the designated port for each physical LAN segment, and block specific redundant links to create a loop-free tree topology. The resulting tree topology provides a single active Layer 2 data path between any two endpoints.
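The election and pruning steps can be reproduced offline. The Python sketch below computes the same tree STP converges to (root = lowest bridge ID, then each bridge's lowest-cost path to the root, ties broken by upstream bridge ID), but centrally, with no BPDUs, timers, or port-ID tie-breaks, and it assumes no parallel links. The bridge names, priorities, and MACs are made up; the port costs use the IEEE 802.1D-2004 defaults of 20,000 for 1 Gbps and 2,000 for 10 Gbps.

```python
import heapq

def bridge_id(priority: int, mac: str) -> tuple:
    """Bridge ID = priority, then MAC address; lower wins."""
    return (priority, int(mac.replace(":", ""), 16))

def spanning_tree(bridges: dict, links: list):
    """bridges: name -> (priority, mac); links: (a, b, port_cost) with no parallel links.

    Returns the root bridge, each bridge's root path cost, and the links left blocking.
    """
    ids = {name: bridge_id(*v) for name, v in bridges.items()}
    root = min(ids, key=ids.get)  # root election: lowest bridge ID
    adj = {name: [] for name in bridges}
    for a, b, cost in links:
        adj[a].append((b, cost))
        adj[b].append((a, cost))
    best = {root: (0, None)}  # name -> (root path cost, upstream bridge)
    heap = [(0, root)]
    while heap:
        cost_here, node = heapq.heappop(heap)
        if cost_here > best[node][0]:
            continue  # stale heap entry
        for nbr, port_cost in adj[node]:
            if nbr == root:
                continue
            cand = (cost_here + port_cost, ids[node])  # tie-break on upstream bridge ID
            cur = best.get(nbr)
            if cur is None or cand < (cur[0], ids[cur[1]]):
                best[nbr] = (cand[0], node)
                heapq.heappush(heap, (cand[0], nbr))
    # Each non-root bridge forwards on its root port; every other link blocks at one end.
    active = {frozenset((n, up)) for n, (_, up) in best.items() if up is not None}
    blocked = [(a, b) for a, b, _ in links if frozenset((a, b)) not in active]
    return root, {n: c for n, (c, _) in best.items()}, blocked

bridges = {
    "agg-a": (4096, "00:00:00:00:00:0a"),
    "agg-b": (8192, "00:00:00:00:00:0b"),
    "access-1": (32768, "00:00:00:00:00:0c"),
}
links = [("agg-a", "agg-b", 2000), ("agg-a", "access-1", 20000), ("agg-b", "access-1", 20000)]
print(spanning_tree(bridges, links))
```

For this dual-homed access switch, one of its two uplinks ends up blocking, which is exactly the idle capacity discussed next.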
Figure 4 shows how STP creates a single path to any endpoint.
Although STP can prevent Layer 2 loops, it is not an efficient protocol. While STP determines the loop-free data path, a Layer 2 switch port can be in the listening, learning, forwarding, or blocking state. Many improvements have been made to STP, such as Rapid Spanning Tree Protocol (RSTP), to reduce its convergence time from as much as 50 seconds. However, STP does not deliver the consistent sub-second convergence that most Layer 3 protocols can. Another disadvantage of STP is that it blocks all but one of the redundant paths. As a result, network resources can be underutilized, because occupied switch ports sit in a blocking state and carry no traffic. Real-time applications suffer most when STP is used alone in a campus environment. To avoid the performance limitations of STP, this reference architecture uses virtual chassis and MC-LAG.
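Both drawbacks reduce to simple arithmetic. This Python sketch assumes the IEEE 802.1D default timers (max age 20 s, forward delay 15 s), which account for the 50-second figure, and a hypothetical access switch with two 10-Gbps uplinks. The `usable_uplink_gbps` helper ignores hashing imbalance, so the LAG figure is an upper bound.

```python
# IEEE 802.1D default timers (seconds).
MAX_AGE_S = 20
FORWARD_DELAY_S = 15

def stp_worst_case_convergence(max_age: int = MAX_AGE_S, forward_delay: int = FORWARD_DELAY_S) -> int:
    # Stale root information ages out, then the port spends one forward delay
    # in listening and one in learning before it forwards.
    return max_age + 2 * forward_delay

def usable_uplink_gbps(uplinks: int, speed_gbps: int, mode: str) -> int:
    # STP leaves one uplink forwarding; a LAG (virtual chassis or MC-LAG) uses all of them.
    if mode == "stp":
        return speed_gbps
    if mode == "lag":
        return uplinks * speed_gbps
    raise ValueError(mode)

print(stp_worst_case_convergence())      # 50
print(usable_uplink_gbps(2, 10, "stp"))  # 10
print(usable_uplink_gbps(2, 10, "lag"))  # 20
```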
Virtual chassis is a technique that avoids Layer 2 looping altogether. As shown in Figure 5, multiple switches are brought under a single management and control plane, creating a virtual device that consists of two or more physical devices. A client access device (depicted as an EX4300 in Figure 5) can use link aggregation when it has two or more links connected to the virtual chassis.
In a virtual chassis, the member devices are stacked together to act as a single logical device. In a virtual chassis connection, a client device, such as a server or switch, has more than one physical link into the virtual chassis. This client device does not need any virtualization configured. On the other side of the connection is the virtual chassis. Each virtual chassis member has one or more physical links connected to the client device. All of the connections to the client device are placed into a link aggregation group (LAG). The LAG member links behave as a single path between the virtual chassis and the client, so no link is blocked.
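No link is blocked because a LAG distributes traffic per flow instead of electing a single path. The Python sketch below uses SHA-256 over the 5-tuple purely for illustration; switches use their own hardware hash functions and configurable hash fields, and the interface names are hypothetical.

```python
import hashlib

def member_link(flow: tuple, members: list) -> str:
    """Pick a LAG member for a flow by hashing its 5-tuple.

    Every packet of a flow maps to the same link (no reordering), while different
    flows spread across all active members.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return members[digest % len(members)]

members = ["xe-0/1/0", "xe-1/1/0"]  # hypothetical: one uplink to each virtual chassis member
flow = ("10.1.1.10", "10.2.2.20", 6, 51514, 443)  # src IP, dst IP, protocol, src port, dst port
print(member_link(flow, members))
# If one member fails, flows are rehashed across the survivors.
print(member_link(flow, members[:1]))
```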
Multichassis Link Aggregation Group (MC-LAG)
Multichassis link aggregation group (MC-LAG) is another form of virtualization. However, MC-LAG does not require the creation of a single management and control plane. Instead, MC-LAG peers use Interchassis Control Protocol (ICCP) to exchange control information and coordinate between devices to ensure that data traffic is forwarded properly. The connecting device interprets this as being connected to a single device through link aggregation. MC-LAG provides redundancy and load balancing between the two MC-LAG peers, multihoming support, and a loop-free Layer 2 network. Refer to Figure 6 to see how the devices are connected.
On one end of MC-LAG, there is an MC-LAG client device, such as a server or switch, that has more than one physical link in a LAG. (This client device does not need MC-LAG configured.) On the other side of the MC-LAG, there are two MC-LAG peers. Each of the MC-LAG peers has one or more physical links connected to the same client device.
Link Aggregation Control Protocol (LACP) is a subcomponent of the IEEE 802.3ad standard. LACP must be configured on all member links for MC-LAG or LAG to work correctly. LACP is used to discover multiple links from a client device that is connected to an MC-LAG or LAG.
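LACP is also what lets the client treat two MC-LAG peers as one device. The Python sketch below models the rule in simplified form: a client bundles links that report the same partner system ID and key, so both MC-LAG peers must advertise identical values. Real LACP also weighs system priority, port priority, and link state; the interface names and values here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class LacpPartner:
    system_id: str  # partner system ID advertised in received LACPDUs (simplified to a MAC)
    key: int        # partner operational key

def bundles(links: dict) -> dict:
    """Group local links by the partner they hear; each group can form one LAG."""
    groups = defaultdict(list)
    for link, partner in links.items():
        groups[partner].append(link)
    return dict(groups)

# Both MC-LAG peers are configured to advertise the same (hypothetical) system ID and key.
mc_lag = {
    "ge-0/0/48": LacpPartner("00:00:5e:00:01:01", 10),  # link to peer 1
    "ge-0/0/49": LacpPartner("00:00:5e:00:01:01", 10),  # link to peer 2
}
print(len(bundles(mc_lag)))  # 1: the client sees a single partner
```

If the peers advertised different system IDs, the same function would return two groups, meaning the client would not bundle the links into one LAG.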
High Availability at Layer 3
Every network element in the Campus solution ultimately converges into the core switch, which functions as the Layer 3 core of the campus network. As such, the core switches of the campus must support high port speed and density, high availability and resiliency, a robust feature set, and scalability.
OSPF is used on the core switch as the interior gateway protocol in this reference architecture (some campus networks might elect to use IS-IS). In addition, the core switch enables the OSPF loop-free alternate (LFA) feature to converge faster during link and node failures. To detect forwarding failures and enable faster end-to-end convergence, the Bidirectional Forwarding Detection (BFD) protocol should be enabled on all point-to-point and broadcast interfaces.
The recommended BFD values for routing convergence in this reference architecture are:
Minimum interval of 50 ms
Multiplier of 3
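With these values, BFD declares a neighbor down once three intervals pass without a packet, about 150 ms. The Python sketch below applies the asynchronous-mode detection-time rule from RFC 5880; it assumes both ends use the recommended values, and the second call shows how a neighbor that transmits more slowly stretches the detection time.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode detection time (RFC 5880, section 6.8.4)."""
    # The agreed interval is the slower of what we can receive and what the peer wants to send.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms

print(bfd_detection_time_ms(50, 50, 3))   # 150 ms, well under one second
print(bfd_detection_time_ms(50, 100, 3))  # 300 ms if the neighbor transmits more slowly
```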
Neighbor authentication using either MD5 or SHA-1 should be enabled between each node in the topology; the intent is to prevent accidental OSPF adjacencies if new equipment is installed in the future. Various physical, logical, and software components need to be configured to support redundancy between the core switches:
Redundant power supplies
Redundant routing engines
Redundant switching fabric
Redundant line cards
Routing Engine redundancy:
Graceful Routing Engine switchover (GRES)
Nonstop routing (NSR)
Nonstop bridging (NSB)
Nonstop software upgrade (NSSU)
BFD enabled on all links
LFA enabled on all links
QoS properly configured to give network control adequate bandwidth during times of congestion
Expanded Core and Aggregation Versus Collapsed Core and Aggregation in the Campus
Separate core and aggregation blocks are suitable for a large campus deployment where multiple aggregation blocks are required to provide the necessary access port density.
Separate core and aggregation layers can have the following disadvantages:
Increased number of devices to manage
Added CapEx and OpEx cost
A collapsed core and aggregation architecture can provide the necessary access port density for an enterprise campus environment that services up to 15,000 devices.
As shown in Figure 7, this solution reference architecture includes a virtual chassis access block and one separate aggregation block, built from EX4300, EX4200, and EX3300 devices. Each virtual chassis block provides 480 10/100/1000BASE-T access ports to end-user devices. The total number of access ports supported by this design is 15,360.
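The port figures imply the number of access blocks. The Python arithmetic below shows that 15,360 ports correspond to 32 virtual chassis blocks of 480 ports each; the 48-port member switch in the last step is an assumption used only to illustrate how a 480-port block might be composed.

```python
PORTS_PER_VC_BLOCK = 480     # 10/100/1000BASE-T ports per virtual chassis access block
TOTAL_ACCESS_PORTS = 15_360  # total access ports in this design

blocks, leftover = divmod(TOTAL_ACCESS_PORTS, PORTS_PER_VC_BLOCK)
print(blocks, leftover)  # 32 0

# Assumption: blocks are built from 48-port access switches.
PORTS_PER_SWITCH = 48
print(PORTS_PER_VC_BLOCK // PORTS_PER_SWITCH)  # 10 members per block
```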
One separate aggregation block is deployed to represent environments where separate aggregation blocks are more practical because of geographical distance or other restrictions.
The Midsize Enterprise Campus solution reference architecture provides a solid foundation for baseline campus design for the medium to large campus. The architecture was validated against application RPMs and tested at scale, covering Layer 2 and Layer 3 design, security, QoS, HA, and scale-up to 10,000 users and 40,000 devices (both real and simulated). In addition, the solution includes wireless access as well as the policy and identity capabilities necessary for BYOD strategies for today’s mobile enterprise.