Understanding How to Mitigate Fate Sharing on a QFabric System Interconnect Device by Remapping Traffic Flows (Forwarding Classes)

On a QFabric system, traffic either is switched locally on a Node device (traffic enters and exits the same Node device without crossing the Interconnect device), or is switched remotely, across the Interconnect device. Traffic flows that belong to the same forwarding class are mapped to the same output queue and share the output queue resources. If congestion occurs on one of these flows, the congestion can affect the uncongested flows in the forwarding class when the flows use the same ingress interface.

For example, if a congested flow is paused to prevent packet loss, uncongested flows that use the same ingress interface are also paused because they share the same forwarding class and output queue. When a congested flow affects an uncongested flow, the flows share the same fate—this is known as fate sharing.

Fate sharing happens because pausing traffic is based on forwarding class. When a flow experiences congestion, the output queue sends a pause message to the input queue on which the flow arrived. On that input queue, the pause message affects all traffic in the forwarding class that is mapped to the congested output queue. So all traffic in that forwarding class is paused on the input queue, not just the flow that is experiencing the congestion. This is how uncongested flows can share fate with a congested flow.

Traffic from many QFabric system Node devices crosses the Interconnect device, so flows within a given forwarding class are aggregated on the Interconnect device. The aggregated flows use the same output queue on the Interconnect device and are subject to fate sharing if the flows also use the same ingress interface.

In addition to the external physical interfaces that connect the Interconnect device to Node devices, the Interconnect device has internal Clos interfaces. The Interconnect device automatically selects the best path through its internal Clos interfaces. Path selection through the internal Clos interfaces is not configurable, so you cannot control which flows enter a particular internal ingress Clos interface, and fate sharing can therefore occur on the Interconnect device. (On Node devices, by contrast, you control which traffic is connected to an ingress interface.)

However, to mitigate fate sharing on the Interconnect device, you can use firewall filters to separate the traffic in one forwarding class and split it into different forwarding classes. Remapping the flows into different forwarding classes means the flows use different output queues on the Interconnect device. If the flows use the same ingress interface on the Interconnect device, they do not experience fate sharing because only the flows mapped to the congested queue are paused, while the flows remapped to other forwarding classes (remapped to different queues) are not paused.

Mitigating fate sharing is often useful for lossless flows such as storage traffic, but is not limited to lossless flows. You can remap forwarding classes to mitigate fate sharing on the Interconnect device to separate flows that belong to any application (for example, iSCSI, NAS, FCoE, and so on), even when the flows are in the same VLAN.

Fate Sharing on the Interconnect Device

Fate sharing can occur when multiple flows use the same output queue (the flows are mapped to the same forwarding class) and the same ingress interface. When one congested flow is paused, the uncongested flows in the same forwarding class are also paused at the shared ingress interface—the uncongested flows share fate with the congested flow. On an Interconnect device, the flows from many Node devices are aggregated, so the number of flows assigned to a given forwarding class and forwarded through a particular egress interface can be much greater on an Interconnect device than on a single Node device.

Note:

The possibility of fate sharing cannot be avoided on Node device ingress interfaces. If two servers access a Node device on the same ingress interface, and both servers send traffic flows that are classified into the same forwarding class (for example, if both flows are FCoE traffic and are classified into the fcoe forwarding class), then even if the flows are in different VLANs, congestion on one flow affects the other flow. The congested flow affects the uncongested flow because both flows share the same forwarding class (and therefore the same output queue and IEEE 802.1p code point), and priority-based flow control (PFC) is applied to the ingress interface, not to the VLAN. So when PFC pauses the congested flow on the ingress interface, the uncongested flow that uses the same code point is also paused.

An example of fate sharing is when two Fibre Channel over Ethernet (FCoE) flows are in the same forwarding class (so they use the same output queue) and use the same Interconnect device ingress interface. If one of those flows experiences congestion and the other does not, the congested flow can affect the uncongested flow. If backpressure forces the ingress interface to pause the congested FCoE flow, the uncongested FCoE flow is also paused, because the two flows use the same forwarding class (output queue) and all traffic in that forwarding class is paused on the ingress interface.

Remapping flows that belong to the same forwarding class into different forwarding classes for transport across the Interconnect device mitigates fate sharing by separating the flows onto different output queues. Because the remapped flows are in different forwarding classes, they use different output queues, so when a flow on one queue is paused, it does not affect flows that have been remapped onto other queues. The congestion affects only traffic on the paused queue, so it affects only the congested forwarding class on the ingress interface.

After the traffic crosses the Interconnect device, the Node device on which the traffic egresses the QFabric system must map the traffic back to its original forwarding class before forwarding the traffic toward its destination, because the original forwarding class contains similar traffic and is configured to support the CoS that the traffic type requires and the destination device expects.

For example, traffic destined for different targets in the same storage area network (SAN) normally should be in the same forwarding class, because a SAN uses one IEEE 802.1p code point (priority) to identify all traffic of a particular type, such as FCoE traffic. So when traffic destined for the SAN leaves the QFabric system, all of it must be mapped to the same forwarding class so that it uses the same IEEE 802.1p code point and is identified and classified the same way when it enters the SAN. This is why the QFabric system must map the traffic back into its original forwarding class after the traffic crosses the Interconnect device.

The QFabric system uses a firewall filter to remap traffic to a different forwarding class before it crosses the Interconnect device, and then to map the traffic back to its original forwarding class after it exits the Interconnect device.

The firewall filter must remap forwarding classes in each direction the traffic flows through the Interconnect device. For example, filter terms must remap traffic when it travels from a server to a target, and also when traffic travels from a target to a server. For each direction of traffic, you configure a filter term that maps traffic into a different forwarding class when it enters the Interconnect device, and a filter term that maps traffic back into its original forwarding class after it exits the Interconnect device.

As with all firewall filters, there is an implicit discard action at the end of each filter, so if you do not want to discard all traffic that is not explicitly permitted, you should add a final term to accept traffic that is not affected by the other terms. This is especially important when you are not remapping all of the traffic in a VLAN.

The QFabric system supports up to six lossless traffic classes called fabric forwarding class sets (fabric fc-sets) on the Interconnect device. (You can also configure up to six lossless forwarding classes on a Node device.) Each fabric fc-set maps to a different output queue on the Interconnect device.

Note:

Fabric fc-sets on the Interconnect device are analogous to forwarding classes on Node devices, in that both fabric fc-sets and forwarding classes map to output queues on their respective devices. However, more than one forwarding class can map to a fabric fc-set, so a fabric fc-set can aggregate forwarding classes on the Interconnect device.

Fabric fc-set names are not user-configurable, and you cannot configure new fabric fc-sets. You can configure forwarding class to fabric fc-set mapping, so each fabric fc-set transports the traffic that you want it to transport.

The six lossless fabric fc-sets enable you to separate traffic from Node devices into as many as six lossless traffic classes on the Interconnect device. Each fabric fc-set uses a different output queue, so the flows (forwarding classes) mapped to one fabric fc-set do not share fate with flows in other fabric fc-sets, even when the flows use the same ingress interface.

In addition, there are four multidestination fabric fc-sets on the Interconnect device to handle multicast, broadcast, and destination lookup fail traffic.

Note:

The flows (forwarding classes) within a fabric fc-set can share fate if they use the same ingress interface (the shared ingress interface could be an external 40-Gigabit interface or an internal Clos interface), because all flows in a fabric fc-set use the same output queue. However, the ability to separate flows into different forwarding classes enables you to spread the traffic among multiple output queues, and thus to mitigate the possibility of fate sharing, because only traffic that belongs to the paused forwarding class (output queue) is paused on the ingress interface. Traffic remapped into other forwarding classes is not paused.

Scenario 1: How Fate Sharing Can Occur on a QFabric System Interconnect Device

An example of traffic that might share fate across the Interconnect device is storage traffic. This scenario uses FCoE as an example.

Note:

Any type of traffic that shares the same forwarding class (output queue) and Interconnect device ingress interface can experience fate sharing.

QFabric system Node devices aggregate FCoE traffic from connected ENodes. Because the FCoE traffic requires the same treatment across the network, in this scenario the FCoE traffic uses the same forwarding class on all of the Node devices (the default fcoe forwarding class), and is mapped to the same output queue on all of the Node devices. Because the Fibre Channel (FC) SAN usually expects traffic to have a priority value of 3 (IEEE 802.1p code point 011), priority 3 identifies all of the FCoE traffic.

All of the FCoE traffic that is not locally switched on the Node devices is remotely switched across the Interconnect device. A large amount of FCoE traffic might be switched across the Interconnect device, and all of that traffic uses the same egress queue. Whenever the FCoE flows use the same Interconnect device ingress interface, fate sharing can occur, as shown in Figure 1.

Figure 1: Fate Sharing Scenario: FCoE Traffic Shares Fate on the Interconnect Device

In Figure 1, FCoE traffic flows from two FCoE hosts (ENode E1 and ENode E2) through the QFabric system and an FCoE-FC gateway switch to two storage target devices in the SAN (target T1 and target T2). Target device T1 is experiencing congestion, as shown by the red “X”. Target device T2 is not experiencing congestion.

The dotted line shows the path that FCoE traffic from ENode E1 takes, entering the QFabric system at ingress Node device N1, flowing through the Interconnect device to the egress Node device N3, exiting the QFabric system to FCoE-FC gateway switch GW1, entering the SAN, then finally reaching target T1.

The solid line shows the path that FCoE traffic from ENode E2 takes, entering the QFabric system at ingress Node device N2, flowing through the Interconnect device to the egress Node device N4, exiting the QFabric system to FCoE-FC gateway switch GW2, entering the SAN, then finally reaching target T2.

When FCoE traffic from hosts ENode E1 and ENode E2 crosses the Interconnect device, the flows from the two hosts might use the same ingress interface at any of the Interconnect device interface stages (external 40-Gigabit interfaces or internal Clos interfaces). If the flows use the same ingress interface at any point, the paths of the flows converge on the same input queue at that interface instead of remaining separate. (The dotted line and the solid line intersect wherever they share a common Interconnect device ingress interface.) When traffic flows assigned to the same forwarding class use the same ingress interface, fate sharing can occur because the flows use the same output queue.

In this scenario, the flows from hosts E1 and E2 share an ingress interface as they cross the Interconnect device. When target T1 experiences congestion, it sends a pause message to temporarily stop the incoming flow until the congestion clears, in order to prevent packet loss due to queue overfill. The pause message propagates back through the data path. Eventually, host E1 will receive a pause message and temporarily stop transmitting.

However, when the pause message reaches the Interconnect device ingress interface that the FCoE flows from hosts E1 and E2 share, not only is the flow originating from host E1 paused, but the flow originating from host E2 is also paused, even though the E2 host flow is not experiencing congestion. Both flows are paused because both flows belong to the same forwarding class, and therefore use the same output queue and the same ingress interface. When the message pauses the E1-to-T1 flow, it also pauses the E2-to-T2 flow, because all flows in the forwarding class are paused on the shared ingress interface, regardless of whether an individual flow in that forwarding class is experiencing congestion.

In this scenario, the uncongested FCoE flow from E2-to-T2 shares the same fate as the congested FCoE flow from E1-to-T1.

Note:

This FCoE traffic scenario is one example of fate sharing. Fate sharing can occur on any flows that are mapped to the same forwarding class (output queue) and use the same Interconnect device ingress interface.

Scenario 2: How Forwarding Class Remapping Mitigates Fate Sharing on a QFabric System Interconnect Device

Fate sharing occurs when traffic flows are assigned to the same forwarding class (and therefore use the same output queue) and also use the same ingress interface. To mitigate the effects of fate sharing, you can do one of two things: either ensure that flows assigned to the same forwarding class use different ingress interfaces, or ensure that flows use different forwarding classes, so that even if they use the same ingress interface, they use different output queues.

Ensuring that flows assigned to the same forwarding class use different ingress interfaces is not possible because the Interconnect device automatically selects the best path through its internal Clos interfaces. You cannot configure the Interconnect device to route traffic along a particular path within the device. However, you can separate the traffic assigned to one forwarding class into multiple forwarding classes for the journey across the Interconnect device. Remapping the flows into different forwarding classes means the flows use different output queues, so if the flows use the same ingress interface, they will not experience fate sharing.

Figure 2: Fate Sharing Mitigation Scenario: FCoE Traffic Avoids Fate Sharing on the Interconnect Device

Figure 2 is similar to Figure 1, with one exception. There are still two FCoE traffic flows, one from host ENode E1 to SAN target T1, and one from host ENode E2 to SAN target T2. Target T1 is experiencing congestion, and target T2 is not experiencing congestion.

The difference is that the path from Node device N2 to the Interconnect device and from the Interconnect device to Node device N4 (yellow in color display, light gray in black and white display) indicates that the forwarding class has been remapped from the default fcoe forwarding class into a different forwarding class.

As in the fate sharing scenario, the flows from hosts E1 and E2 share an ingress interface as they cross the Interconnect device. Also as in the fate sharing scenario, when target T1 experiences congestion, it sends a pause message to temporarily stop the incoming flow until the congestion clears, and the pause message propagates back through the data path.

However, unlike the flows in the fate sharing scenario, these flows use different output queues because the flows have been remapped into different forwarding classes for transit across the Interconnect device. Since the flows use different forwarding classes, they do not share fate on the shared ingress interface, and the uncongested flow from E2-to-T2 does not share the fate of the congested E1-to-T1 flow.

You configure a firewall filter to control how the forwarding classes are remapped before traffic exits the ingress Node device and crosses the Interconnect device. In the same firewall filter, you also configure terms to control how the remapped forwarding class is mapped back to its original forwarding class when traffic enters the egress Node device after the traffic crosses the Interconnect device. The firewall filter requires terms for remapping forwarding class in both directions of flow. For example, the filters must remap the forwarding class not only in the E1-to-T1 direction, but also in the T1-to-E1 direction.

Note:

If an ENode (FCoE device on the Ethernet network) is directly connected to a QFabric system Node device, and that Node device is directly connected to the FCoE-FC gateway by a LAG interface, then using firewall filters to mitigate fate sharing by remapping forwarding classes is not supported, so that traffic is not remapped.

On Node devices that have directly connected ENodes and that also connect directly to an FCoE-FC gateway using a LAG interface, configure the Node device interfaces in a different VLAN than the interfaces on which you want to apply firewall filters to mitigate fate sharing.

If the interface between the Node device and the FCoE-FC gateway is not a LAG interface, then forwarding class remapping works when ENodes are directly connected to the Node device. The fate sharing mitigation feature fails only in the case where ENodes are directly connected to the Node device and the connection between the Node device and the FCoE-FC gateway is a LAG interface.

Fate Sharing Mitigation Process

The following sequence summarizes the packet flow and the QFabric system operations for mitigating fate sharing:

  1. A packet enters a QFabric system ingress Node device. The ingress Node device classifies the packet into a forwarding class, usually based on its IEEE 802.1p code point (priority).

  2. The Node device switching lookup determines that the packet needs to traverse the Interconnect device.

  3. On the Node device, a firewall filter remaps the packet from its original forwarding class into a different forwarding class.

  4. The packet exits the Node device and enters the Interconnect device, using the new (remapped) forwarding class. On the Interconnect device, the new forwarding class is mapped to a different fabric fc-set, and therefore to a different output queue, than the original forwarding class, so it does not share fate with traffic in the original forwarding class on ingress interfaces. (Each fabric fc-set maps to a different output queue by default, so placing traffic in a different fabric fc-set allows that traffic to use different output queue bandwidth resources than traffic that is mapped to other fabric fc-sets.) The packet crosses the Interconnect device, and then exits.

  5. The packet arrives at the egress Node device. At the Node device ingress interface, the same firewall filter remaps the packet from the new forwarding class back into the original forwarding class.

    This process remaps the traffic into a different forwarding class for the journey across the Interconnect device, and then maps the traffic back into its original forwarding class to continue the journey to its destination.

  6. The egress Node device forwards the packet toward its destination. Because the packet has been mapped back to its original forwarding class, it once again receives the same CoS treatment as similar traffic that was not remapped across the Interconnect device. It is important to map traffic back to its original forwarding class before forwarding traffic toward its destination because the original forwarding class contains similar traffic, and is configured to support the CoS that the traffic type requires and the destination device expects.

Note:

If you configure non-default forwarding classes and use non-default fabric fc-sets, you must also configure queue scheduling for the new forwarding classes on the Node device and for the non-default fabric fc-sets on the Interconnect device.

Mitigating fate sharing consists of configuration steps that create the necessary forwarding classes and firewall filters, apply the firewall filters to traffic, map the forwarding classes to fabric fc-sets on the Interconnect device, and schedule port bandwidth resources (if needed) for the forwarding classes and fabric fc-sets.

Forwarding Classes (Node Devices)

If you have only a few flows that you want to separate for transit across the Interconnect device, and the default forwarding classes provide enough separation to avoid fate sharing, you do not need to configure new forwarding classes. There are five default forwarding classes on a QFabric system:

  • fcoe—Guaranteed delivery for Fibre Channel over Ethernet (FCoE) traffic.

  • no-loss—Guaranteed delivery for TCP lossless traffic.

  • best-effort—Provides best-effort delivery without a service profile. Loss priority is typically not carried in a class-of-service (CoS) value.

  • network-control—Supports protocol control and is typically high priority.

  • mcast—Provides service for multidestination (multicast, broadcast, and destination lookup fail) packets.

For example, if you want to separate FCoE traffic into two separate flows on the Interconnect device, and you are not using the no-loss forwarding class for other traffic, you can remap some of the FCoE traffic to the no-loss forwarding class and leave the rest in the fcoe forwarding class. If this provides sufficient separation of flows, you do not need to create new forwarding classes.
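
If you remap traffic to the default no-loss class this way, the filter action simply names the existing default class. As a minimal sketch, with hypothetical filter and term names (the match conditions for the term are described in Firewall Filter Construction (Node Devices)):

  set firewall family ethernet-switching filter fcoe-remap-filter term fcoe-to-fabric then forwarding-class no-loss
  set firewall family ethernet-switching filter fcoe-remap-filter term fcoe-to-fabric then loss-priority low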

Using the existing default forwarding classes has two more time-saving advantages:

  1. You do not need to map the forwarding class to a fabric fc-set on the Interconnect device, because each default forwarding class is already mapped to a default fabric fc-set.

  2. You do not need to schedule port bandwidth resources for a new forwarding class on the Node device or for the fabric fc-set on the Interconnect device, because each default forwarding class and fabric fc-set already has a default port bandwidth allocation.

However, if the default forwarding classes are not sufficient, you can configure up to eight unicast forwarding classes (including the four default unicast forwarding classes) and up to four multidestination forwarding classes (including the default mcast forwarding class). You can configure up to six of the unicast forwarding classes as lossless forwarding classes. Lossless transport is not supported on multidestination queues.
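
For example, the following sketch creates a user-defined lossless unicast forwarding class on a Node device to carry remapped FCoE traffic. The class name (fcoe-remap) and queue number (5) are assumptions for illustration only, not defaults:

  set class-of-service forwarding-classes class fcoe-remap queue-num 5 no-loss

You can verify the forwarding-class-to-queue mapping with the show class-of-service forwarding-class operational command.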

For more information about forwarding classes, see Understanding CoS Forwarding Classes. For an example of how to configure forwarding classes, see Defining CoS Forwarding Classes.

Note:

Configuring a new forwarding class includes mapping an output queue to that forwarding class. When you configure a new forwarding class, you also need to configure scheduling resources (output queue bandwidth) for the new forwarding class. If the fate sharing mitigation firewall filter assigns some or all of the traffic flows in a VLAN to the new forwarding class, but the output queue mapped to the new forwarding class does not receive bandwidth, that traffic cannot be forwarded. For more information about scheduling on Node devices, see Understanding CoS Output Queue Schedulers, Understanding CoS Priority Group Scheduling, and Understanding CoS Hierarchical Port Scheduling (ETS).
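
As a sketch of that scheduling configuration on a Node device, using the hypothetical fcoe-remap class from the previous example (all other names and the percentages are also illustrative assumptions; the statements follow the QFX-style hierarchical ETS model described in the topics referenced above):

  # Queue scheduling for the new forwarding class.
  set class-of-service schedulers fcoe-remap-sched transmit-rate percent 20
  set class-of-service scheduler-maps remap-smap forwarding-class fcoe-remap scheduler fcoe-remap-sched
  # Priority group (forwarding-class-set) scheduling and interface binding.
  set class-of-service forwarding-class-sets remap-pg class fcoe-remap
  set class-of-service traffic-control-profiles remap-tcp scheduler-map remap-smap
  set class-of-service traffic-control-profiles remap-tcp guaranteed-rate percent 20
  set class-of-service interfaces xe-0/0/10 forwarding-class-set remap-pg output-traffic-control-profile remap-tcp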

Firewall Filter Construction (Node Devices)

Fate sharing mitigation uses firewall filters to separate traffic before the traffic crosses the Interconnect device, and to bring that traffic back together after it exits the Interconnect device. The QFabric system uses firewall filters to identify (match) traffic and remap forwarding classes because firewall filter match conditions are granular enough to easily identify and separate (filter) particular traffic flows within a VLAN.

Note:

You can configure firewall filters for fate sharing mitigation only in the firewall family ethernet-switching hierarchy. You cannot configure firewall filters to mitigate fate sharing in the inet (IPv4) or inet6 (IPv6) firewall family hierarchies.

You bind firewall filters for fate sharing mitigation to ingress VLANs as input filters (later in this document is an explanation of why ingress VLANs are the filter bind point). Each firewall filter consists of terms that contain match conditions to identify traffic, and actions to perform on the matched traffic. For more information about firewall filters, see Overview of Firewall Filters (QFX Series).

The firewall filter terms:

  1. Remap some or all of the traffic in one forwarding class into another forwarding class before that traffic exits an ingress Node device to go to the Interconnect device. This separates traffic flows before they traverse the Interconnect device, so the traffic uses different output queues and does not experience fate sharing if the traffic in the different forwarding classes uses the same ingress interface.

  2. Map the remapped traffic back into its original forwarding class after it exits the Interconnect device, when the traffic enters the egress Node device. This brings the traffic flows back into their original forwarding class and classification before the traffic is forwarded toward its destination.

Note:

Each firewall filter requires terms to remap the forwarding class in both directions of flow through the Interconnect device. For example, the forwarding class needs to be remapped on the Interconnect device as traffic flows from a source server to a destination target, and the forwarding class also needs to be remapped on the Interconnect device as traffic flows back from the target to the server.

Firewall filter terms contain match conditions (from statement) to identify traffic, and actions (then statement) to tell the system what to do with the identified traffic.

Each forwarding class remapping firewall filter uses match conditions to identify a particular traffic flow to remap, and match conditions to identify the direction of flow on the Interconnect device. Each fate sharing mitigation firewall filter includes terms that:

  1. Identify and remap traffic flowing from the server to the target before it enters the Interconnect device.

  2. Identify and remap traffic flowing from the server to the target after it exits the Interconnect device.

  3. Identify and remap traffic flowing from the target to the server before it enters the Interconnect device.

  4. Identify and remap traffic flowing from the target to the server after it exits the Interconnect device.

  5. Accept other traffic. Because firewall filters have an implicit default discard terminating action, include a final accept term so that traffic that does not match the filter is not dropped.

You can use the following match conditions in the filter term from statement to identify traffic that you want to remap as it crosses the Interconnect device:

  • Client-side MAC address (for example, an FCF MAC address for FCoE traffic) (destination-mac-address mac-address) or (source-mac-address mac-address)

  • Server-side MAC address (for example, an ENode MAC address for FCoE traffic) (destination-mac-address mac-address) or (source-mac-address mac-address)

  • EtherType (ether-type value)

    Note:

    If you remap an FCoE flow using EtherType as a match condition, you need to include two terms in the filter in each direction of flow to identify the traffic, one term to identify FCoE traffic (EtherType 0x8906), and one term to identify FIP traffic (EtherType 0x8914).

  • VLAN (vlan (vlan-name | vlan-id))

  • .1q user priority (dot1q-user-priority value)

These five match conditions select the traffic from within a VLAN that you want to map to a different forwarding class. The match conditions enable you to identify traffic in VLANs that carry a mix of traffic types—for example, you can identify a flow within a VLAN based on EtherType or .1q value. For more information about match conditions, see Firewall Filter Match Conditions and Actions (QFX5100, QFX5110, QFX5120, QFX5200, EX4600, EX4650).

Best Practice:

For FCoE traffic, we recommend that you use the FCF MAC address (instead of the ENode MAC address) as the source or destination address when you configure a firewall filter, because an ENode might be able to reach more than one FCF. Using the FCF MAC is the most specific way to identify the correct path for the traffic.

Note:

You cannot match on multicast addresses based on prefix. You must use a specific multicast address as the source or destination address.

In the same filter term from statement, you specify a match condition to determine whether you are identifying traffic that is flowing from a Node device into the Interconnect device, or traffic that is flowing from the Interconnect device to a Node device:

  • to-fabric <except>—This condition matches traffic that flows from a Node device to an Interconnect device (traffic that is exiting a Node device and entering the Interconnect device). Traffic that matches the to-fabric condition is remapped before it exits the ingress Node device and enters the Interconnect device.

    The except option remaps forwarding classes for traffic that is locally switched. For example, if a target device is directly connected to a Node device, the traffic destined for the directly connected target is remapped to the new forwarding class. When you specify the except option, traffic that is remotely switched is not remapped to a new forwarding class before it crosses the Interconnect device.

  • from-fabric—This condition matches traffic that flows from the Interconnect device to a Node device (traffic that is exiting the Interconnect device and entering the egress Node device). Traffic that matches the from-fabric condition is mapped back to its original forwarding class after it exits the Interconnect device, when it enters the egress Node device.

Best Practice:

In a firewall filter configuration, if you use a to-fabric except match condition, place it before the from-fabric term in the sequence of terms in the filter.

After you configure match conditions in a filter term, you configure an action to take on the identified (matched) traffic in the same term. Because the goal is to remap traffic in one forwarding class into a different forwarding class, the action is usually to place the matched traffic into a forwarding class.

Use the following actions (then statement) to control into which forwarding class the matched traffic is remapped in a given term:

  • forwarding-class forwarding-class-name—Specify a default or a user-defined forwarding class into which matching traffic is mapped.

  • loss-priority level—If you specify a forwarding class for matching traffic, you must also specify the packet loss priority (PLP) level for the forwarding class. The PLP level can be low, medium-high, or high.

  • count counter-name—Optionally, you can configure an action to count the number of packets affected by each term.

    Note:

    You can use the match conditions to identify a traffic flow, and then count the packets without remapping the forwarding class. To do that, in the then statement, do not include the forwarding-class and loss-priority actions; include only the count action.
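
Putting the match conditions and actions together, the following sketch shows a complete fate sharing mitigation filter for one FCoE flow. All names here are hypothetical: the filter is fcoe-remap-filter, the remapped class is the user-defined fcoe-remap class from the earlier sketches, and 00:05:73:aa:bb:01 stands in for the FCF MAC address. A production filter would also need parallel terms that match the FIP EtherType (0x8914):

  # To-fabric terms first: remap the flow before it enters the Interconnect device.
  # Server-to-target traffic (the FCF MAC is the destination address).
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-in from destination-mac-address 00:05:73:aa:bb:01
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-in from ether-type 0x8906
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-in from to-fabric
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-in then forwarding-class fcoe-remap
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-in then loss-priority low
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-in then count fcoe-e2t-pkts
  # Target-to-server traffic (the FCF MAC is the source address).
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-in from source-mac-address 00:05:73:aa:bb:01
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-in from ether-type 0x8906
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-in from to-fabric
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-in then forwarding-class fcoe-remap
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-in then loss-priority low
  # From-fabric terms: map the flow back to the original fcoe class after it exits.
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-out from destination-mac-address 00:05:73:aa:bb:01
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-out from ether-type 0x8906
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-out from from-fabric
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-out then forwarding-class fcoe
  set firewall family ethernet-switching filter fcoe-remap-filter term e2t-out then loss-priority low
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-out from source-mac-address 00:05:73:aa:bb:01
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-out from ether-type 0x8906
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-out from from-fabric
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-out then forwarding-class fcoe
  set firewall family ethernet-switching filter fcoe-remap-filter term t2e-out then loss-priority low
  # Final term: accept all other traffic so the implicit discard does not drop it.
  set firewall family ethernet-switching filter fcoe-remap-filter term accept-rest then accept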

Applying Firewall Filters to Traffic (Node Devices)

You apply (bind) firewall filters for fate sharing mitigation to ingress VLANs, not to ports. (Firewall filters for mitigating fate sharing do not apply to VLANs on the egress side.) Applying the firewall filter to an ingress VLAN has advantages compared to applying the firewall filter to a port:

  • The filter affects all of the matched traffic on all interfaces that are members of the VLAN, on all Node devices on the QFabric system. Instead of applying the firewall filter to individual ports or ranges of ports on each Node device, you only have to apply the firewall filter once to the VLAN.

  • VLANs usually carry similar types of traffic.

You bind firewall filters to ingress VLANs as input filters using the set vlans vlan-name filter input filter-name configuration statement. See Configuring Firewall Filters for more information about configuring and applying firewall filters.
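
For example, continuing with the hypothetical names used in the preceding sections, you might bind the filter to an FCoE VLAN named fcoe_vlan:

  set vlans fcoe_vlan filter input fcoe-remap-filter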

Best Practice:

Place traffic of one type in one VLAN (use separate VLANs for each different type of traffic). We recommend that you do not mix different types of traffic in the same VLAN. The QFabric system requires that a VLAN that carries FCoE traffic must carry only FCoE traffic. However, it is a good practice to do the same thing with other types of traffic. For example, if your network carries both iSCSI and NAS traffic, we recommend that you dedicate one VLAN to iSCSI traffic, and one VLAN to NAS traffic (and so on). You can configure separate firewall filters to mitigate fate sharing for each type of traffic.

Note:

Because firewall filters for mitigating fate sharing are applied to VLANs, and not to ports, there are several behaviors you should be aware of:

  • If more than one VLAN uses a port, the firewall filter applies only to the traffic in the VLAN on which you applied the firewall filter. Traffic in other VLANs might be exposed to fate sharing on the Interconnect device.

  • The ports on which the firewall filter is applied depend on VLAN membership. If ports on multiple Node devices are members of the VLAN, then the firewall filter remaps traffic on the VLAN member ports of all of those Node devices. If you want to remap traffic on only one Node device, then the VLAN member interfaces should all be on that Node device, and not on other Node devices. (Configuring a VLAN that includes member interfaces from only one Node device enables you to remap traffic on that Node device independently from other Node devices.)

  • Although firewall filters mitigate fate sharing on the Interconnect device, they do not mitigate fate sharing on a Node device. This is because PFC is applied to specified queues on a port, not to a VLAN. (Recall that forwarding classes are mapped to queues, so all traffic in the same forwarding class uses the same queue, regardless of VLAN membership.)

    An example scenario is two VLANs that contain FCoE traffic that is classified into the fcoe forwarding class and use an ingress interface on the same Node device. The fcoe forwarding class is classified to IEEE 802.1p code point 011 (priority 3) to identify the FCoE traffic on both VLANs (because all of the FCoE traffic requires the same CoS treatment and all of the traffic is destined for the same SAN), and so both VLANs use the same output queue.

    If FCoE traffic in one of the VLANs experiences congestion, PFC is enabled on the flow, and the flow is paused until the congestion clears. Because the FCoE traffic in the other VLAN uses the same output queue (forwarding class), when the congested FCoE flow is paused on the ingress interface, all FCoE traffic that uses that ingress interface is also paused. In this way, the congested FCoE flow affects the uncongested FCoE flow, and the two flows share the same fate.

    So if two servers on the same Node device ingress port send traffic that belongs to the same forwarding class (in this example, fcoe), they can experience fate sharing on the Node device.

Warning:

Do not apply firewall filters that remap forwarding classes while traffic that the filters affect is flowing!

For forwarding class remapping to work properly, traffic must be mapped from its original forwarding class to a new forwarding class before it enters the Interconnect device, and then mapped back to the original forwarding class after it exits the Interconnect. If traffic is not mapped back into its original forwarding class after crossing the Interconnect device, traffic is classified into the wrong forwarding class and is not delivered as expected. Because of this, the QFabric system must program the filters on the ingress Node device and the egress Node device when affected traffic is not flowing.

If traffic is flowing when you apply the filters to a VLAN, and the ingress Node device filter is programmed before the egress Node device filter is programmed, traffic is not remapped back into its original forwarding class until the egress Node device filter is applied. For this reason, apply filters only when affected traffic is not flowing through the QFabric system.

Mapping Forwarding Classes to Fabric Forwarding Class Sets (Interconnect Device)

The five default forwarding classes (best-effort, fcoe, no-loss, network-control, and mcast; see Forwarding Classes (Node Devices)) are mapped by default to fabric fc-sets on the Interconnect device. If you are using only default forwarding classes on the Node devices, then you do not need to map forwarding classes to fabric fc-sets; you can use the default mapping.

If you create new (user-defined) forwarding classes on a Node device, you must map the new forwarding classes to fabric fc-sets on the Interconnect device. (If you do not map a new forwarding class to a fabric fc-set on the Interconnect device, the traffic that belongs to the new forwarding class receives very little bandwidth on the Interconnect device.)

Each fabric fc-set maps to a different output queue on the Interconnect device by default, much like each forwarding class maps to a different output queue by default on Node devices. Mapping a new forwarding class to a non-default (unused) fabric fc-set causes the traffic assigned to that forwarding class to use a different output queue on the Interconnect device. (The traffic in the new forwarding class uses a different output queue than traffic mapped to other fabric fc-sets.)

Also similar to forwarding classes on Node devices, there are five default fabric fc-sets on the Interconnect device, and twelve fabric fc-sets in total: eight unicast fabric fc-sets and four multidestination fabric fc-sets. Each default forwarding class has a default mapping to one of the default fabric fc-sets. The non-default fabric fc-sets are hidden until you map forwarding classes to them, but are available for use.

The five default forwarding classes are mapped to the five default fabric fc-sets as shown in Table 1 (you can reconfigure the mapping of default forwarding classes to default fabric fc-sets if you want):

Table 1: Default Fabric Forwarding Class Sets

  • fabric_fcset_be—Transports best-effort unicast traffic across the fabric.

  • fabric_fcset_strict_high—Transports unicast traffic that has been configured with strict-high priority, and traffic in the network-control forwarding class, across the fabric.

    Note: This fabric fc-set receives as much bandwidth across the fabric as it needs to service the traffic in the group, up to the entire fabric interface bandwidth. For this reason, exercise caution when mapping traffic to this fabric fc-set to avoid starving other traffic.

  • fabric_fcset_noloss1—Transports unicast traffic in the default fcoe forwarding class across the fabric.

  • fabric_fcset_noloss2—Transports unicast traffic in the default no-loss forwarding class across the fabric.

  • fabric_fcset_multicast1—Transports multidestination traffic in the mcast forwarding class across the fabric. This fabric fc-set is valid only for multidestination forwarding classes.

The remaining four unicast fabric fc-sets (fabric_fcset_noloss3, fabric_fcset_noloss4, fabric_fcset_noloss5, and fabric_fcset_noloss6) can carry lossless traffic and are available for mapping or remapping forwarding classes on the Interconnect device. The remaining three multidestination fabric fc-sets (fabric_fcset_multicast2, fabric_fcset_multicast3, and fabric_fcset_multicast4) are available for remapping multidestination forwarding classes.
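
For example, the following sketch maps the hypothetical user-defined fcoe-remap forwarding class from the earlier examples to the first unused lossless fabric fc-set (the mapping statement shown here mirrors the forwarding-class-sets statement used on Node devices):

  set class-of-service forwarding-class-sets fabric_fcset_noloss3 class fcoe-remap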

The total of six lossless and four multidestination fabric fc-sets enable you to separate traffic from Node devices into up to ten classes on the Interconnect device, not including the best-effort and strict high-priority fabric fc-sets. Because each fabric fc-set uses a different output queue on egress interfaces, the flows (forwarding classes) mapped to each fabric fc-set do not share fate with flows in other fabric fc-sets on ingress interfaces.

The total number of unique flows on a QFabric system is vastly greater than the number of fabric fc-sets, so fabric fc-sets still aggregate flows—each fabric fc-set will carry a group of flows that require similar CoS treatment. However, the fabric fc-sets enable you to spread the flows across multiple output queues, and thus mitigate the effects of fate sharing.

Note:

The forwarding class flows within a fabric fc-set share fate on ingress interfaces because they use the same output queue. However, the ability to separate flows into different classes that use different output queues enables you to control how much traffic is mapped to a given output queue, and in that way to mitigate the possibility of fate sharing.

For more information about fabric fc-sets, see Understanding CoS Fabric Forwarding Class Sets.

Scheduling Bandwidth for Fabric Forwarding Class Sets (Interconnect Device)

The five default fabric fc-sets (fabric_fcset_be, fabric_fcset_strict_high, fabric_fcset_noloss1, fabric_fcset_noloss2, and fabric_fcset_multicast1, see Mapping Forwarding Classes to Fabric Forwarding Class Sets (Interconnect Device)) receive scheduling resources on Interconnect device output queues by default. If you are using only default fabric fc-sets, then you can use the default scheduling. However, you can change scheduling parameters, such as the amount of bandwidth allocated to a default fabric fc-set, if you want to adjust the default scheduling.

If you configure a new forwarding class on a Node device, you must map the new forwarding class to a fabric fc-set so that the traffic classified into the forwarding class receives queue bandwidth resources. If you map a new forwarding class to one of the default fabric fc-sets on the Interconnect device, then the default bandwidth scheduled for that fabric fc-set is shared among the forwarding classes assigned to the fabric fc-set by default, and also with the new forwarding class.

If you map a new forwarding class to one of the non-default fabric fc-sets, you must schedule queue bandwidth resources for that fabric fc-set, or else the traffic mapped to the fabric fc-set receives only a small amount of bandwidth.

Note:

You apply queue (forwarding class) scheduling to interfaces. The Interconnect device interfaces consist of the ingress and egress 40-Gbps (fte) interfaces that connect to QFabric system Node devices, and internal Clos fabric (bfte) interfaces. You need to apply the appropriate scheduler to each fte interface in the traffic path. All traffic traverses the internal Clos fabric interfaces, so you also need to apply the appropriate scheduler to the Clos fabric bfte interfaces. (You configure one scheduler that applies to all of the internal Clos fabric interfaces. It is not possible or desirable to attach a scheduler to a particular internal Clos fabric interface.)

Because one scheduler applies to all of the Clos fabric interfaces, you either use the default scheduler on all Clos interfaces, or you use your custom configured scheduler on all Clos interfaces.
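
The following sketch illustrates such a scheduling configuration for a non-default fabric fc-set, again with hypothetical names and rates and a placeholder fte interface name; verify the exact statements against the scheduling topics referenced below:

  # Scheduler and scheduler map for the remapped forwarding class.
  set class-of-service schedulers noloss3-sched transmit-rate percent 15
  set class-of-service scheduler-maps noloss3-smap forwarding-class fcoe-remap scheduler noloss3-sched
  # Traffic control profile that schedules bandwidth for the non-default fabric fc-set.
  set class-of-service traffic-control-profiles noloss3-tcp scheduler-map noloss3-smap
  set class-of-service traffic-control-profiles noloss3-tcp guaranteed-rate percent 15
  # Apply to each fte interface in the traffic path (one placeholder interface shown);
  # the single Clos scheduler applies to all internal bfte interfaces.
  set class-of-service interfaces fte-0/1/0 forwarding-class-set fabric_fcset_noloss3 output-traffic-control-profile noloss3-tcp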

For conceptual information about configuring CoS scheduling on an Interconnect device and across the entire QFabric system, see Understanding CoS Scheduling Across the QFabric System. For information about default CoS scheduling on the Interconnect device, see Understanding Default CoS Scheduling on QFabric System Interconnect Devices (Junos OS Release 13.1 and Later Releases). For an example of how to configure scheduling on an Interconnect device and across the entire QFabric system, see Example: Configuring CoS Scheduling Across the QFabric System.

Multidestination Traffic (FCoE Initialization Protocol Traffic)

Multidestination (multicast, broadcast, and destination lookup fail) traffic that is not switched locally on a QFabric system Node device is switched across the Interconnect device. On the Node device, by default, multidestination traffic uses the mcast forwarding class. On the Interconnect device, by default the multidestination traffic from the Node devices uses the fabric_fcset_multicast1 fabric fc-set. The output queue for the fabric_fcset_multicast1 fabric fc-set receives up to 20 percent of the available egress port bandwidth.

FCoE devices on the Ethernet network use FCoE Initialization Protocol (FIP) to establish a virtual point-to-point link with the FCF. The FCF sends periodic multicast discovery advertisements (MDAs) to advertise its presence on the network to ENodes. When an ENode comes online, it sends a multicast discovery solicitation (MDS) to search the network for FCFs.

The FIP MDA and MDS messages use the default multicast queue on the Interconnect device. If the amount of multidestination traffic that crosses the Interconnect device causes congestion on the multidestination queue, that congestion can affect FIP discovery traffic. (Fate sharing can occur because the FIP advertisements share the same fabric fc-set, and therefore the same output queue, as the rest of the multidestination traffic on the Interconnect device. Multidestination traffic that uses the same ingress interface at any point on the Interconnect device can experience fate sharing if the output queue becomes congested.)

Note:

If the amount of multidestination traffic on the Interconnect device is not enough to cause congestion, you do not have to remap multicast FIP traffic into a separate forwarding class to avoid fate sharing.

Note:

Although multicast FIP traffic uses the mcast queue and the fabric_fcset_multicast1 fabric fc-set by default, unicast FCoE and FIP traffic uses the fcoe forwarding class and the fabric_fcset_noloss1 fabric fc-set by default.

If the amount of multidestination traffic that traverses the Interconnect device can cause congestion, then you can remap the FIP multicast traffic into a new forwarding class on the Node device and a new fabric fc-set on the Interconnect device to mitigate fate sharing. The process is similar to mitigating fate sharing on unicast traffic, but there are a few differences:

  1. Configure a new multidestination forwarding class for the FIP multicast traffic on the Node device. (By default, multicast FIP traffic is classified into the default mcast forwarding class.)

  2. Configure queue and priority group scheduling (hierarchical scheduling) for the new multidestination forwarding class.

  3. Configure a firewall filter to remap the FIP multicast traffic into the new forwarding class. To match FIP multicast traffic, specify two match conditions: the ALL-FCF-MAC address (01:10:18:01:00:02) as the source or destination MAC address (depending on the direction of flow), and the FIP EtherType (0x8914). (A sketch follows this list.)

  4. Bind the firewall filter to the appropriate VLAN.

  5. Map the new multidestination forwarding class that you created on the Node device to an unused multicast fabric fc-set on the Interconnect device.

  6. Configure scheduling for the multicast fabric fc-set on the Interconnect device.
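
A condensed sketch of steps 1, 3, and 5 follows. The class and filter names are hypothetical; the ALL-FCF-MAC address and FIP EtherType come from step 3, and the reverse-direction terms, from-fabric terms, VLAN binding, and scheduling follow the same patterns as the unicast examples in the earlier sections:

  # Step 1: a new multidestination forwarding class (the queue number is an assumption).
  set class-of-service forwarding-classes class fip-mcast queue-num 9
  # Step 3: remap multicast FIP traffic before it enters the Interconnect device.
  set firewall family ethernet-switching filter fip-remap term fip-to-fabric from destination-mac-address 01:10:18:01:00:02
  set firewall family ethernet-switching filter fip-remap term fip-to-fabric from ether-type 0x8914
  set firewall family ethernet-switching filter fip-remap term fip-to-fabric from to-fabric
  set firewall family ethernet-switching filter fip-remap term fip-to-fabric then forwarding-class fip-mcast
  set firewall family ethernet-switching filter fip-remap term fip-to-fabric then loss-priority low
  set firewall family ethernet-switching filter fip-remap term accept-rest then accept
  # Step 5: map the new class to an unused multicast fabric fc-set.
  set class-of-service forwarding-class-sets fabric_fcset_multicast2 class fip-mcast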

Note:

When configuring firewall filter match conditions, you cannot match on multicast addresses based on prefix. You must use a specific multicast address as the source or destination address.

Best Practices

The previous sections include some best practices for mitigating fate sharing. This section aggregates those best practices, along with a few other tips.

VLANs

Place traffic of one type in one VLAN (use separate VLANs for each different type of traffic). We recommend that you do not mix different types of traffic in the same VLAN. The QFabric system requires that a VLAN that carries FCoE traffic must carry only FCoE traffic. However, it is a good practice to do the same thing with other types of traffic. For example, if your network carries both iSCSI and NAS traffic, we recommend that you dedicate one VLAN to iSCSI traffic, and one VLAN to NAS traffic (and so on). You can configure separate firewall filters to mitigate fate sharing for each type of traffic.

Source/Destination MAC Address for FCoE Traffic

For FCoE traffic, we recommend that you use the FCF MAC address (instead of the ENode MAC address) as the source or destination address when you configure a firewall filter, because an ENode might be able to reach more than one FCF. Using the FCF MAC is the most specific way to identify the correct path for the traffic.

Firewall Filter Term Sequence

In most cases, the sequence of terms in a fate sharing firewall filter does not matter (with the exception of the final accept term), so in most cases, it does not matter if a from-fabric term is placed before a to-fabric term in the firewall filter.

However, we recommend that if you use the except option with to-fabric (to-fabric except), you place the to-fabric except term before the from-fabric term in the firewall filter.

In general, we recommend that in a filter, you configure the to-fabric terms first, then configure the from-fabric terms, and end the filter with an accept term (unless you want to drop traffic that does not match the filter).

Limitations and Notes on Behavior

There are a number of limitations and behaviors that you should understand about how to mitigate fate sharing across an Interconnect device. Some of those limitations and behaviors have been discussed in the previous sections, and are repeated here for your convenience.

Limitations

  • You can configure firewall filters for fate sharing mitigation only in the firewall family ethernet-switching hierarchy. You cannot configure firewall filters to mitigate fate sharing in the inet (IPv4) or inet6 (IPv6) firewall family hierarchies.

  • Interconnect device fabric fc-sets are not user-configurable (you cannot rename them or configure new fabric fc-sets). You can map the default forwarding classes and the forwarding classes that you define on Node devices to fabric fc-sets to control the traffic that is mapped to each fabric fc-set.

  • The possibility of fate sharing cannot be avoided on Node device ingress interfaces. If two servers access a Node device on the same ingress interface, and both servers send traffic flows that are classified into the same forwarding class (for example, if both flows are FCoE traffic and are classified into the fcoe forwarding class), then even if the flows are in different VLANs, congestion on one flow affects the other flow. The congested flow affects the uncongested flow because both flows share the same forwarding class (and therefore the same output queue and IEEE 802.1p code point), and priority-based flow control (PFC) is applied to the ingress interface, not to the VLAN. So when PFC pauses the congested flow on the ingress interface, the uncongested flow that uses the same code point is also paused.

  • The Interconnect device supports a maximum of six lossless unicast flow groups (six lossless unicast fabric fc-sets). In practice, a QFabric system has many more than six flows, so you cannot map each individual flow to a dedicated fabric fc-set. However, you can group flows into six separate sets by mapping groups of flows to different fabric fc-sets. Each fabric fc-set uses a different output queue, so the flows in one fabric fc-set do not share fate with the flows in the other fabric fc-sets when the flows traverse the same ingress interface. The ability to separate flows into six different fabric fc-sets spreads the flows among six different output queues, thus mitigating fate sharing.

  • The Interconnect device supports a maximum of four multidestination flow groups (four multicast fabric fc-sets).

  • The flows (forwarding classes) within a fabric fc-set share fate when they use the same ingress interface because they use the same output queue. (However, the ability to separate flows into different classes that use different output queues enables you to control how much traffic is mapped to a given output queue, and to mitigate the possibility of fate sharing.)

  • Do not apply firewall filters that remap forwarding classes while traffic that the filters affect is flowing!

    For forwarding class remapping to work properly, traffic must be mapped from its original forwarding class to a new forwarding class before it enters the Interconnect device, and then mapped back to the original forwarding class after it exits the Interconnect. If traffic is not mapped back into its original forwarding class after crossing the Interconnect device, traffic is classified into the wrong forwarding class and is not delivered as expected. Because of this, the QFabric system must program the filters on the ingress Node device and the egress Node device when affected traffic is not flowing.

    If traffic is flowing when you apply the filters to a VLAN, and the ingress Node device filter is programmed before the egress Node device filter is programmed, traffic is not remapped back into its original forwarding class until the egress Node device filter is applied. For this reason, apply filters only when affected traffic is not flowing through the QFabric system.

  • If an ENode (FCoE device on the Ethernet network) is directly connected to a QFabric system Node device, and that Node device is directly connected to the FCoE-FC gateway by a LAG interface, then using firewall filters to mitigate fate sharing by remapping forwarding classes is not supported, so that traffic is not remapped.

    On Node devices that have directly connected ENodes and that also connect directly to an FCoE-FC gateway using a LAG interface, configure the Node device interfaces in a different VLAN than the interfaces on which you want to apply firewall filters to mitigate fate sharing.

    If the interface between the Node device and the FCoE-FC gateway is not a LAG interface, then forwarding class remapping works when ENodes are directly connected to the Node device. The fate sharing mitigation feature fails only in the case where ENodes are directly connected to the Node device and the connection between the Node device and the FCoE-FC gateway is a LAG interface.

  • When configuring firewall filter match conditions, you cannot match on multicast addresses based on prefix. You must use a specific multicast address as the source or destination address.

Notes on Behavior

  • You bind (apply) firewall filters for mitigating fate sharing to ingress VLANs only, not to ports. The filter affects all matched traffic on all Node device ingress interfaces that are members of the VLAN. So if ports on multiple Node devices are members of the VLAN, then the firewall filter remaps traffic on the VLAN member ports of all of those Node devices. If you want to remap traffic on only one Node device, then the VLAN member interfaces should all be on that Node device, and not on other Node devices.

  • Although firewall filters mitigate fate sharing on the Interconnect device, they do not mitigate fate sharing on a Node device. This is because PFC is applied to specified queues on a port, not to a VLAN. (Recall that forwarding classes are mapped to queues, so all traffic in the same forwarding class uses the same queue, regardless of VLAN membership.)

    An example scenario is two VLANs that contain FCoE traffic that is classified into the fcoe forwarding class and use an ingress interface on the same Node device. The fcoe forwarding class is classified to IEEE 802.1p code point 011 (priority 3) to identify the FCoE traffic on both VLANs (because all of the FCoE traffic requires the same CoS treatment and all of the traffic is destined for the same SAN), and so both VLANs use the same output queue.

    If FCoE traffic in one of the VLANs experiences congestion, PFC is enabled on the flow, and the flow is paused until the congestion clears. Because the FCoE traffic in the other VLAN uses the same output queue (forwarding class), when the congested FCoE flow is paused on the ingress interface, all FCoE traffic that uses that ingress interface is also paused. In this way, the congested FCoE flow affects the uncongested FCoE flow, and the two flows share the same fate.

    So if two servers on the same Node device ingress port send traffic that belongs to the same forwarding class (in this example, fcoe), they can experience fate sharing on the Node device.

  • When you configure a firewall filter, by default, there is an implicit discard action at the end of the filter. (This is standard default behavior and is not unique to fate sharing mitigation filters.) To avoid dropping traffic that does not match the filter conditions for forwarding class remapping, add a final term with accept as the action. This is especially important when you are not remapping all of the traffic in a VLAN.

  • If you remap FCoE flows based on EtherType, include separate filter terms to match both the FCoE EtherType (0x8906) and the FIP EtherType (0x8914).

  • You must configure filter terms that remap the forwarding classes in both directions of flow. You need to configure terms for to-fabric and from-fabric for the flow from the originating device to the target, and also for the return flow from the target to the originating device. For example, for an FCoE flow, you configure a to-fabric and a from-fabric term for the traffic flowing from the ENode to the FC SAN, and a to-fabric and a from-fabric term for traffic flowing from the FC SAN to the ENode.