Solution Architecture

Before presenting the recommended production-grade architecture, for completeness, we first describe an approach you can use if security is not a concern and faster results are preferred.

Proof of Concept Non-Production-Grade DHCP Server Integration Approach

In this approach, the DHCP server is attached locally to the fabric’s service block function. For redundancy, an ESI-LAG is configured on the fabric side and a normal LAG on the server side. All possible 4,000+ VLANs are configured and exposed on this link towards the DHCP server. The fabric itself then needs no extra DHCP relay configuration because the Layer 2 broadcast domains of all access VLANs are stretched to the DHCP server. By configuring matching sub-interfaces to listen on, the DHCP server can then assign leases by listening to the Layer 2 broadcast packets the clients send. This works the old-fashioned way, and many DHCP servers (including Junos OS) can respond to this traffic.
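As a rough sketch of the fabric-side attachment in such a PoC (the interface name, ESI value, and LACP system ID below are hypothetical placeholders, and the exact hierarchy may vary by Junos release):

```
/* Hypothetical PoC sketch: ESI-LAG member towards the DHCP server, */
/* trunking all access VLANs. All names and values are placeholders. */
interfaces {
    ae10 {
        esi {
            00:11:22:33:44:55:66:77:88:99;
            all-active;
        }
        aggregated-ether-options {
            lacp {
                active;
                system-id 00:00:00:11:11:11;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
            }
        }
    }
}
```

The same ESI and LACP system ID would be configured on the second service block switch so the server sees a single LAG.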

Note:

This is clearly not recommended in production-grade designs and rollouts. It may help get the network up faster, but there are severe security risks that come with such a design!

The reason for not recommending this in production-grade designs is that you bypass a basic security function of the fabric design. Normally, all fabric VRFs are isolated from each other. This forces all traffic between VLANs in different VRFs to transit the WAN router, where security functions such as firewalls can be implemented. By placing all VLANs on the link to the DHCP server in the service block, we bypass the WAN router and thus the security functionality. If an attacker finds a security hole such as an open administrator login, or something is misconfigured, the attacker can jump between all VLANs and possibly also bypass any WAN router-based screening between VRFs. Please be aware of this trade-off.

Figure 1: EVPN Multihoming with 2 Collapsed Cores

Virtual Gateway Fabric Versus Anycast Fabric

Depending on the fabric type, the overlay VLANs (where the client traffic is located) may need additional IP addresses for internal purposes; this is the case for virtual gateway fabrics. The Juniper Mist campus fabric supports the following fabric types:

Fabric Type         Virtual Gateway Fabric   Anycast Fabric
EVPN Multihoming    Yes                      ---
CRB                 Yes                      ---
ERB                 ---                      Yes
IP Clos Fabric      ---                      Yes

In a virtual gateway fabric, you typically have a very limited number of VRFs, located on the core or collapsed core switches. The maximum number of core or collapsed core switches supported in a Juniper Mist campus fabric is four. This means a given VRF can be duplicated to each redundant core or collapsed core switch, for a maximum of four copies in the fabric. Anycast fabrics, as opposed to virtual gateway fabrics, are appropriate for more scaled designs; hence, the VRFs are located either on the distribution switches (ERB) or on the access switches (IP Clos fabric). The nature of virtual gateway fabrics is that the system requires an additional static IP address, unique per VRF, for every VLAN located in the fabric. Hence, in addition to the shared gateway IP address for each VLAN, up to four additional unique IP addresses in that subnet are required.

Why such a design? There are benefits for certain traffic on the fabric such as DHCP relay. For DHCP relay, the system uses the static IP address instead of the gateway IP address when forwarding the DHCP client requests. This behavior ensures that the DHCP response packet will be sent back to the correct VRF since the static IP address is unique to the VLAN/core switch.
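As an illustration of this split between the shared gateway IP and the per-switch static IP (the VLAN ID and addresses below are hypothetical), the IRB configuration on one core switch could look roughly like this:

```
/* Hypothetical example: VLAN 1099 IRB on core switch 1.              */
/* 10.99.0.1 is the shared virtual gateway address on every core;     */
/* 10.99.0.11 is the unique static IP used, e.g., as DHCP relay source. */
interfaces {
    irb {
        unit 1099 {
            family inet {
                address 10.99.0.11/24 {
                    virtual-gateway-address 10.99.0.1;
                }
            }
        }
    }
}
```

A second core switch would carry the same `virtual-gateway-address` but a different unique address, such as 10.99.0.12/24, so DHCP responses can be routed back to the exact switch that relayed the request.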

Another way to think about a virtual gateway fabric is if you compare it with traditional Layer 2 gateway failover designs such as VRRP. There you always have a VIP which floats between the gateways (that are our VRFs) and each gateway needs an additional unique static IP for each VLAN. In a Juniper Mist campus fabric, the VRRP protocol is not needed as the EVPN control plane takes over for it.

Anycast fabrics eliminate the small sacrifice of carving out those additional static IP addresses in each subnet. This matters because, with the larger number of distribution or access switches where the VRFs are installed, reserving a unique address per switch would force you to plan your future growth carefully when creating VLANs. In exchange, system services such as DHCP relay work a bit differently in anycast fabrics and are internally more complex.

Figure 2: Virtual Gateway Versus Anycast Fabric Types

Which IP Address to Choose as Reported Gateway IP Address?

You typically choose one of the following approaches to determine which gateway IP address is embedded in the packets forwarded by the DHCP relay function in the fabric:

  • For EVPN multihoming fabrics, the UI does not provide any choice, so you always use virtual gateway IP addresses as the gateway IP. This enables the DHCP server to identify the VLAN where the request originates by analyzing only the gateway IP address embedded in the forwarded packets.
  • For CRB fabrics, you can select between two designs: leave the “Loopback per-VRF subnet” field empty to use the virtual gateway static IP addresses, or populate an IP prefix in the “Loopback per-VRF subnet” field of the campus fabric dialogue to assign an overlay loopback IP address to each VRF in the fabric.
  • For larger fabrics such as ERB and IP Clos, we recommend entering an IP prefix in the “Loopback per-VRF subnet” as part of the campus fabric configuration. By doing so, the fabric will automatically assign unique overlay loopback IPs out of this pool range to each VRF in the fabric. In this case, we also highly recommend leveraging a routing protocol such as OSPF or BGP for WAN router integration towards the fabric as the usage of these overlay loopback IPs can make it difficult to predict how each gateway IP can be reached from the WAN router.

While it is technically possible to leave the "Loopback per-VRF subnet" field empty in these fabrics, it is not recommended. If left blank, the anycast gateway IP will be the reported gateway IP embedded in the forwarded packets. This doesn’t cause issues on the way to the DHCP server. However, when the DHCP server's response returns, because the anycast IP is shared by multiple switches within the fabric, the response packet might be routed to a switch that did not originate the request and could be in a different PoD or building. If this happens, when the packet arrives at a switch that didn’t initiate the request, the DHCP relay function will decapsulate the response and determine, based on the client's MAC address, that the packet must be forwarded to a remote switch. To do so, it will use a VXLAN tunnel to resend the packet east-west to the switch where the client is actually connected. This creates an inefficient design.

Where to Locate the DHCP Server

The approach we recommend is local integration as part of the fabric itself.


In the case of a campus fabric, the service leaf is called a service block function. In the following example, it is a pair of physical switches north of the DHCP server; the server attaches to them and becomes an integral piece of the fabric.

The DHCP server can also manage leases for clients that are not in the same VRF as itself. However, as there is always VRF-to-VRF isolation inside the fabric, the packets will have to be exchanged through a WAN router.

When such a local integration is not possible, one can also try to integrate the DHCP server as an external element towards the fabric. This is what you see in the upper-left corner in the following figure:

What Needs to be Considered When Using External DHCP Servers?

When integrating external DHCP servers into the fabric, there are two critical points to keep in mind for this approach to succeed:

  • Keep the latency between the fabric and DHCP servers as low as possible. Do not consider operating the DHCP server in a public cloud environment if you do not know the latency impact. Some DHCP clients can be very aggressive when requesting a lease, which can cause unanswered DHCP lease requests to stack up in a high-latency environment, overloading the DHCP server. Since the DHCP client behavior can hardly be influenced at the fabric level, we recommend testing the design with a focus on the entire round-trip latency before putting the fabric into production.
  • No form of network address translation (NAT) is supported between the fabric and the DHCP server. The reason for this is explained in the following excerpt from IETF RFC 2131, which describes how a DHCP server must respond to client requests. Remember, “giaddr” is the embedded gateway IP address:

    “If the ‘giaddr’ field in a DHCP message from a client is non-zero, the server sends any return messages to the ‘DHCP server’ port on the BOOTP relay agent whose address appears in ‘giaddr’.”

You may not be aware of what this RFC requirement implies, so we’ve provided an example of traffic using a Microsoft DHCP server, which, in accordance with the RFC, does not respond to SNATed DHCP requests.

Here is the original source IP and port from the access switch created for the relay packet:

When the WAN router applies the SNAT to the forwarded discover message, it changes the source IP and port as follows:

The DHCP server’s response, however, answers back to the original embedded gateway IP on port 67, which was the original source IP inside the fabric before the SNAT was applied:

This packet will never arrive back at the SNAT firewall.
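To make the failure concrete, here is a hypothetical reconstruction of the three captures above (all addresses and ports are invented for illustration; 192.0.2.0/24 and 198.51.100.0/24 are documentation ranges):

```
1. Relay packet created by the access switch:
   src 10.11.0.2:67      ->  dst 198.51.100.10:67   giaddr 10.11.0.2

2. After SNAT on the WAN router:
   src 192.0.2.1:52311   ->  dst 198.51.100.10:67   giaddr 10.11.0.2 (unchanged)

3. DHCP server response, sent to giaddr per RFC 2131:
   src 198.51.100.10:67  ->  dst 10.11.0.2:67
```

Because the response targets the pre-NAT giaddr rather than the translated source, it never matches the SNAT session on the firewall and is lost.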

Optimizations in Junos OS to Help Your DHCP Relay Design

Through the Junos OS configuration, a couple of statements help to optimize your design. We present the critical ones here in case you want to know why the fabric is automatically configuring those:

In the Junos OS configuration, a “forward-only” statement is needed to prevent the device from monitoring uncontrolled DHCP traffic.

In the Junos OS configuration, the “relay-option-82 circuit-id vlan-id-only” statement synchronizes the behavior of QFX and EX switches when forwarding option 82 DHCP traffic (by default, they use different attributes). Also, this attribute then no longer adds unwanted interface information such as "IRB-irb.1099:ae10.0" or "IRB-irb.1099:vtep.32769". With this configuration added, only the VLAN ID is reported, easing the string parsing in this field, which we leverage for the Linux KEA DHCP server to assign the right lease.
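As a hedged sketch of how a Kea DHCP server might act on this simplified circuit ID (the class name, subnet, and pool below are hypothetical), Kea’s client classification can read option 82 sub-option 1 via the `relay4[1].hex` expression:

```json
{
  "Dhcp4": {
    "client-classes": [
      {
        "name": "vlan-1099",
        "test": "relay4[1].hex == '1099'"
      }
    ],
    "subnet4": [
      {
        "subnet": "10.99.0.0/24",
        "client-class": "vlan-1099",
        "pools": [ { "pool": "10.99.0.100 - 10.99.0.200" } ]
      }
    ]
  }
}
```

With `vlan-id-only` configured on the fabric, the circuit ID carries just the VLAN ID string, so a simple equality test like the one above is enough; without it, the expression would have to parse strings such as "IRB-irb.1099:ae10.0".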

The Junos OS configuration statement “relay-option-82 server-id override” is described here and is needed in our environment. It helps the Microsoft DHCP server, using sub-option 5, determine the source of the packet and choose the right pool to assign.

The Junos OS configuration statement “route-suppression destination” is needed when using loopback IPs as gateway IPs. It is not needed when virtual gateway static IP addresses are used as the gateway IPs.

In the Junos OS configuration, the fabric now configures all IRB interfaces with the “no-dhcp-flood” option. This limits the flooding of the client’s broadcast so that the DHCP relay devices do not all receive duplicates of the same request. Without this option in place, because all client requests are broadcast packets in a VLAN, the original request is duplicated through VXLAN and sent to all fabric nodes that have that VRF configured, each of which then performs DHCP relay towards the WAN router.
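Pulled together, the relay statements discussed above look roughly like the following in set format (a sketch only; the exact hierarchy and statement spelling can vary by Junos release, so verify against your version):

```
set forwarding-options dhcp-relay forward-only
set forwarding-options dhcp-relay relay-option-82 circuit-id vlan-id-only
set forwarding-options dhcp-relay relay-option-82 server-id override
set forwarding-options dhcp-relay route-suppression destination
```

The “no-dhcp-flood” option is applied per IRB interface by the fabric; its exact placement is platform-dependent, so it is omitted from this sketch.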

Note:

Should you use the vJunos-switch as a virtual switch instance in a lab, be aware that while the “no-dhcp-flood” option can be configured on a virtual switch, it is not currently implemented there. Hence, you may observe flooding and the wrong gateway IP address being used. However, this should not affect your ability to test; it is just a known limitation of the vJunos-switch.