Configuring CoS Hierarchical Schedulers

In metro Ethernet environments, a VLAN typically corresponds to a customer premises equipment (CPE) device and the VLANs are identified by an inner VLAN tag on Ethernet frames (called the customer VLAN, or C-VLAN, tag). A set of VLANs can be grouped at the DSL access multiplexer (DSLAM) and identified by using the same outer VLAN tag (called the service VLAN, or S-VLAN, tag). The service VLANs are typically gathered at the Broadband Remote Access Server (BRAS) level, which can be (among other devices) an SRX Series device. On SRX5600 and SRX5800 devices, hierarchical schedulers let you provide shaping and scheduling at the service VLAN level as well as other levels, such as the physical interface. In other words, you can group a set of logical interfaces and then apply scheduling and shaping parameters to the logical interface set as well as to other levels.

This basic architecture is shown in Figure 7. You can apply class-of-service (CoS) parameters at the premises on the CPE, on the customer or service VLANs, at the BRAS level, or at all levels.

Figure 7: An SRX Series Device in a Hierarchical Scheduler Architecture

Image g016864.gif

On SRX5600 and SRX5800 devices, you can apply CoS shaping and scheduling at one of four different levels, including the VLAN set level.

The supported scheduler hierarchy is as follows:

- Level 1: the physical interface
- Level 2: the interface set
- Level 3: the logical interface (unit)
- Level 4: the queues

You can specify a traffic control profile (output-traffic-control-profile) that can specify a shaping rate, a guaranteed rate, and a scheduler map with transmit rate and buffer delay. The scheduler map contains the mapping of queues (forwarding classes) to their respective schedulers (schedulers define the properties for the queue). Queue properties can specify a transmit rate and buffer management parameters such as buffer size and drop profile. For more information, see Defining Schedulers.

To configure CoS hierarchical schedulers, include the following statements at the [edit class-of-service interfaces] and [edit interfaces] hierarchy levels:

[edit class-of-service interfaces]
interface-set interface-set-name {
    excess-bandwidth-share (proportional value | equal);
    internal-node;
    output-traffic-control-profile profile-name;
    output-traffic-control-profile-remaining profile-name;
}

[edit interfaces]
hierarchical-scheduler;
interface-set interface-set-name {
    ethernet-interface-name {
        (interface-parameters);
    }
}

Hierarchical Scheduler Terminology

Hierarchical schedulers introduce some new terms into a discussion of CoS capabilities. They also use some familiar terms in different contexts. This section presents a complete overview of the terms used with hierarchical schedulers.

The following terms are important for hierarchical schedulers:

These terms are especially important when applied to a scheduler hierarchy. Scheduler hierarchies are composed of nodes and queues, and queues always terminate the hierarchy. Nodes can be root nodes, leaf nodes, or internal (non-leaf) nodes. Internal nodes are nodes that have other nodes as “children” in the hierarchy. For example, if an interface-set statement is configured with a logical interface (such as unit 0) and a queue, then the interface set is an internal node at level 2 of the hierarchy. However, if no traffic control profiles are configured on the logical interfaces, then the interface set is at level 3 of the hierarchy.

Table 46 shows how the configuration of an interface set or logical interface affects the terminology of hierarchical scheduler nodes.

Table 46: Hierarchical Scheduler Nodes

Root Node (Level 1)    Level 2          Level 3              Queue (Level 4)
Physical interface     Interface set    Logical interfaces   One or more queues
Physical interface     -                Interface set        One or more queues
Physical interface     -                Logical interfaces   One or more queues

SRX3400 and SRX3600 Device Hardware Capabilities and Limitations

The following list describes the hardware capabilities and limitations for the SRX3400 and SRX3600 devices:

Configuring an Interface Set

To configure an interface set, include the following statement at the [edit class-of-service interfaces] hierarchy level of the configuration:

[edit class-of-service interfaces]
interface-set interface-set-name {
    (interface-cos-parameters);
}

To apply the interface set to interfaces, include the following statements at the [edit interfaces] hierarchy level of the configuration:

interface-set interface-set-name {
    ethernet-interface-name {
        (interface-cos-parameters);
    }
}

Interface sets can be defined as a list of logical interfaces (unit 100, unit 200, and so on). Service providers can use these statements to group interfaces to apply scheduling parameters such as guaranteed rate and shaping rate to the traffic in the groups.

All traffic heading downstream must be gathered into an interface set with the interface-set statement at the [edit class-of-service interfaces] hierarchy level.

Interface sets are currently only used by CoS, but they are applied at the [edit interfaces] hierarchy level so that they might be available to other services.

[edit interfaces]
interface-set interface-set-name {
    ethernet-interface-name {
        unit unit-number {
            ...
        }
    }
}

The logical interface form of the statement lists units under an Ethernet interface:

[edit interfaces]
interface-set unit-set-ge-0 {
    ge-0/0/0 {
        unit 0;
        unit 1;
        ...
    }
}

Note: Ranges are not supported; you must list each logical interface separately.

Applying an Interface Set

Although the interface set is applied at the [edit interfaces] hierarchy level, the CoS parameters for the interface set are defined at the [edit class-of-service interfaces] hierarchy level, usually with the output-traffic-control-profile profile-name statement.

This example applies a traffic control profile called tcp-set1 to an interface set called set-ge-0:

[edit class-of-service interfaces]
interface-set set-ge-0 {
    output-traffic-control-profile tcp-set1;
}

Interface Set Caveats

An interface set cannot mix the logical interface, S-VLAN, and outer VLAN tag list forms of the interface-set statement.

A logical interface can only belong to one interface set. If you try to add the same logical interface to different interface sets, the commit will fail.

This example will generate a commit error:

[edit interfaces]
interface-set set-one {
    ge-2/0/0 {
        unit 0;
        unit 2;
    }
}
interface-set set-two {
    ge-2/0/0 {
        unit 1;
        unit 3;
        unit 0; # COMMIT ERROR! Unit 0 already belongs to set-one.
    }
}

Members of an interface set cannot span multiple physical interfaces. Only one physical interface is allowed to appear in an interface set.

This configuration is not supported:

[edit interfaces]
interface-set set-group {
    ge-0/0/1 {
        unit 0;
        unit 1;
    }
    ge-0/0/2 { # NOT supported: a second physical interface in the same interface set
        unit 0;
        unit 1;
    }
}

Introduction to Hierarchical Schedulers

When used, the interface set level (level 2) of the hierarchy falls between the physical interface level (level 1) and the logical interface level (level 3). Queues are always level 4 of the hierarchy.

Hierarchical schedulers add CoS parameters to the new interface set level of the configuration. They use traffic control profiles to set values for parameters such as shaping rate (the peak information rate [PIR]), guaranteed rate (the committed information rate [CIR] on these interfaces), scheduler maps (assigning queues and resources to traffic), and so on.

The following CoS configuration places shaping rates, guaranteed rates, delay buffer rates, and a scheduler map in traffic control profiles at three levels: the physical port (level 1), the interface set (level 2), and the logical interface (level 3).

In this case, the traffic control profiles look like this:

[edit class-of-service traffic-control-profiles]
tcp-port-level1 { # This is the physical port level
    shaping-rate 100m;
    delay-buffer-rate 100m;
}
tcp-interface-level2 { # This is the interface set level
    shaping-rate 60m;
    guaranteed-rate 40m;
}
tcp-unit-level3 { # This is the logical interface level
    shaping-rate 50m;
    guaranteed-rate 30m;
    scheduler-map smap1;
    delay-buffer-rate 40m;
}

Once configured, the traffic control profiles must be applied to the proper places in the CoS interfaces hierarchy.

[edit class-of-service interfaces]
interface-set level-2 {
    output-traffic-control-profile tcp-interface-level2;
}
ge-0/1/0 {
    output-traffic-control-profile tcp-port-level1;
    unit 0 {
        output-traffic-control-profile tcp-unit-level3;
    }
}

In all cases, the properties for level 4 of the hierarchical schedulers are determined by the scheduler map.

Scheduler Hierarchy Example

This section provides a more complete example of building a 4-level hierarchy of schedulers. The configuration parameters are shown in Figure 8. The queues are shown at the top of the figure with the other three levels of the hierarchy below.

Figure 8: Building a Scheduler Hierarchy

Image g016856.gif

The figure's PIR values are configured as shaping rates, and the CIRs are configured as guaranteed rates on the Ethernet interface ge-1/0/0. The PIR can be oversubscribed (that is, the sum of the children's PIRs can exceed the parent's, as in svlan 1, where 200 + 200 + 100 exceeds the parent rate of 400). However, the sum of the children's CIRs must never exceed the parent node's CIR, as shown in all the service VLANs (otherwise, the guaranteed rate could not be provided in all cases).
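The two rate rules above can be checked mechanically. Here is a minimal Python sketch (the rates_valid helper is a hypothetical illustration, not part of any Juniper tooling), applied to svlan 1 from the figure:

```python
def rates_valid(parent_pir, parent_cir, children):
    """Check the hierarchical rate rule for one parent node.

    children is a list of (pir, cir) tuples in bps. PIRs (shaping rates)
    may be oversubscribed, so they are not checked against the parent.
    The sum of the children's CIRs (guaranteed rates) must not exceed the
    parent's CIR, or the guarantee could not be met in all cases.
    """
    return sum(cir for _, cir in children) <= parent_cir

# svlan 1 from the figure: parent PIR 400m / CIR 300m, three customer VLANs.
svlan1 = [(200e6, 100e6), (200e6, 150e6), (100e6, 50e6)]
print(rates_valid(400e6, 300e6, svlan1))      # True: CIRs sum to exactly 300m
print(sum(pir for pir, _ in svlan1) > 400e6)  # True: PIRs (500m) oversubscribe 400m
```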

This configuration example presents all details of the CoS configuration for the interface in the figure (ge-1/0/0), including the interface sets, the logical interfaces, the traffic control profiles, the schedulers, the drop profiles, the scheduler maps, and the application of the traffic control profiles.

Interface Sets for the Hierarchical Example

[edit interfaces]
interface-set svlan0 {
    interface ge-1/0/0 {
        unit 0;
        unit 1;
    }
}
interface-set svlan1 {
    interface ge-1/0/0 {
        unit 2;
        unit 3;
        unit 4;
    }
}

Interfaces for the Hierarchical Example

The hierarchical-scheduler statement that enables hierarchical schedulers is configured at the physical interface level, as are VLAN tagging and the VLAN IDs. In this example, the interface sets are defined by logical interfaces (units) and not outer VLAN tags. All VLAN tags in this example are customer VLAN tags.

[edit interfaces ge-1/0/0]
hierarchical-scheduler;
vlan-tagging;
unit 0 {
    vlan-id 100;
}
unit 1 {
    vlan-id 101;
}
unit 2 {
    vlan-id 102;
}
unit 3 {
    vlan-id 103;
}
unit 4 {
    vlan-id 104;
}

Traffic Control Profiles for the Hierarchical Example

The traffic control profiles hold parameters for levels above the queue level of the scheduler hierarchy. This section defines traffic control profiles for both the service VLAN level (interface sets) and the customer VLAN level (logical interfaces).

[edit class-of-service traffic-control-profiles]
tcp-500m-shaping-rate {
    shaping-rate 500m;
}
tcp-svlan0 {
    shaping-rate 200m;
    guaranteed-rate 100m;
    delay-buffer-rate 300m; # This parameter is not shown in the figure
}
tcp-svlan1 {
    shaping-rate 400m;
    guaranteed-rate 300m;
    delay-buffer-rate 100m; # This parameter is not shown in the figure
}
tcp-cvlan0 {
    shaping-rate 100m;
    guaranteed-rate 60m;
    scheduler-map tcp-map-cvlan0; # This example applies scheduler maps to customer VLANs
}
tcp-cvlan1 {
    shaping-rate 100m;
    guaranteed-rate 40m;
    scheduler-map tcp-map-cvlan1;
}
tcp-cvlan2 {
    shaping-rate 200m;
    guaranteed-rate 100m;
    scheduler-map tcp-map-cvlanx;
}
tcp-cvlan3 {
    shaping-rate 200m;
    guaranteed-rate 150m;
    scheduler-map tcp-map-cvlanx;
}
tcp-cvlan4 {
    shaping-rate 100m;
    guaranteed-rate 50m;
    scheduler-map tcp-map-cvlanx;
}

Schedulers for the Hierarchical Example

The schedulers hold the information about the queues, the last level of the hierarchy. Note the consistent naming schemes applied to repetitive elements in all parts of this example.

[edit class-of-service schedulers]
sched-cvlan0-qx {
    priority low;
    transmit-rate 20m;
    buffer-size temporal 100ms;
    drop-profile-map loss-priority low dp-low;
    drop-profile-map loss-priority high dp-high;
}
sched-cvlan1-q0 {
    priority high;
    transmit-rate 20m;
    buffer-size percent 40;
    drop-profile-map loss-priority low dp-low;
    drop-profile-map loss-priority high dp-high;
}
sched-cvlanx-qx {
    transmit-rate percent 30;
    buffer-size percent 30;
    drop-profile-map loss-priority low dp-low;
    drop-profile-map loss-priority high dp-high;
}
sched-cvlan1-qx {
    transmit-rate 10m;
    buffer-size temporal 100ms;
    drop-profile-map loss-priority low dp-low;
    drop-profile-map loss-priority high dp-high;
}

Drop Profiles for the Hierarchical Example

This section configures the drop profiles for the example. For more information about drop profiles, see Configuring RED Drop Profiles for Congestion Control.

[edit class-of-service drop-profiles]
dp-low {
    interpolate fill-level 80 drop-probability 80;
    interpolate fill-level 100 drop-probability 100;
}
dp-high {
    interpolate fill-level 60 drop-probability 80;
    interpolate fill-level 80 drop-probability 100;
}

Scheduler Maps for the Hierarchical Example

This section configures the scheduler maps for the example. Each one references a scheduler configured in Schedulers for the Hierarchical Example.

[edit class-of-service scheduler-maps]
tcp-map-cvlan0 {
    forwarding-class voice scheduler sched-cvlan0-qx;
    forwarding-class video scheduler sched-cvlan0-qx;
    forwarding-class data scheduler sched-cvlan0-qx;
}
tcp-map-cvlan1 {
    forwarding-class voice scheduler sched-cvlan1-q0;
    forwarding-class video scheduler sched-cvlan1-qx;
    forwarding-class data scheduler sched-cvlan1-qx;
}
tcp-map-cvlanx {
    forwarding-class voice scheduler sched-cvlanx-qx;
    forwarding-class video scheduler sched-cvlanx-qx;
    forwarding-class data scheduler sched-cvlanx-qx;
}

Applying Traffic Control Profiles for the Hierarchical Example

This section applies the traffic control profiles to the proper levels of the hierarchy.

Note: Although a shaping rate can be applied directly to the physical interface, hierarchical schedulers must use a traffic control profile to hold this parameter, as shown in Controlling Remaining Traffic.

[edit class-of-service interfaces]
ge-1/0/0 {
    output-traffic-control-profile tcp-500m-shaping-rate;
    unit 0 {
        output-traffic-control-profile tcp-cvlan0;
    }
    unit 1 {
        output-traffic-control-profile tcp-cvlan1;
    }
    unit 2 {
        output-traffic-control-profile tcp-cvlan2;
    }
    unit 3 {
        output-traffic-control-profile tcp-cvlan3;
    }
    unit 4 {
        output-traffic-control-profile tcp-cvlan4;
    }
}
interface-set svlan0 {
    output-traffic-control-profile tcp-svlan0;
}
interface-set svlan1 {
    output-traffic-control-profile tcp-svlan1;
}

Controlling Remaining Traffic

You can configure many logical interfaces under an interface. However, only a subset of them might have a traffic control profile attached. For example, you can configure three logical interfaces (units) over the same service VLAN, but you can apply a traffic control profile specifying best-effort and voice queues to only one of the logical interface units. Traffic from the two remaining logical interfaces is considered remaining traffic. To configure transmit rate guarantees for the remaining traffic, you configure the output-traffic-control-profile-remaining statement specifying a guaranteed rate for the remaining traffic. Without this statement, the remaining traffic gets a default, minimal bandwidth. In the same way, the shaping-rate and delay-buffer-rate statements can be specified in the traffic control profile referenced with the output-traffic-control-profile-remaining statement in order to shape and provide buffering for remaining traffic.

Consider the interface shown in Figure 9. Customer VLANs 3 and 4 have no explicit traffic control profile. However, the service provider might want to establish a shaping and guaranteed transmit rate for aggregate traffic heading for those customer VLANs. The solution is to configure and apply a traffic control profile for all remaining traffic on the interface.

Figure 9: Handling Remaining Traffic

Image g016857.gif

This example considers the case where customer VLANs 3 and 4 have no explicit traffic control profile but still need a shaping rate and a guaranteed transmit rate for traffic heading to them. The solution is to add a traffic control profile for remaining traffic to the svlan1 interface set. This example builds on the example in Scheduler Hierarchy Example, so it repeats only the configuration details at the service VLAN level.

[edit class-of-service interfaces]
interface-set svlan0 {
    output-traffic-control-profile tcp-svlan0;
}
interface-set svlan1 {
    output-traffic-control-profile tcp-svlan1;
    output-traffic-control-profile-remaining tcp-svlan1-remaining; # For all remaining traffic
}

[edit class-of-service traffic-control-profiles]
tcp-svlan1 {
    shaping-rate 400m;
    guaranteed-rate 300m;
}
tcp-svlan1-remaining {
    shaping-rate 300m;
    guaranteed-rate 200m;
    scheduler-map smap-remainder; # This scheduler map is not shown in detail
}

Next, consider the example shown in Figure 10.

Figure 10: Another Example of Handling Remaining Traffic

Image g016865.gif

In this example, ge-1/0/0 has five logical interfaces (cvlan 0, 1, 2, 3, and 4), and the interface set svlan0 covers a subset of them.

This example does not include the [edit interfaces] configuration.

[edit class-of-service interfaces]
interface-set svlan0 {
    output-traffic-control-profile tcp-svlan0; # Guarantee and shaper for svlan0
}
ge-1/0/0 {
    output-traffic-control-profile-remaining tcp-svlan0-rem; # Units 3 and 4 are not explicitly configured, but are captured by "remaining"
    unit 1 {
        output-traffic-control-profile tcp-ifl1; # Unit 1 best-effort and expedited-forwarding queues
    }
}

Here is how the traffic control profiles for this example are configured:

[edit class-of-service traffic-control-profiles]
tcp-svlan0 {
    shaping-rate 200m;
    guaranteed-rate 100m;
}
tcp-svlan0-rem {
    shaping-rate 300m;
    guaranteed-rate 200m;
    scheduler-map smap-svlan0-rem; # This specifies queues for remaining traffic
}
tcp-ifl1 {
    scheduler-map smap-ifl1;
}

Finally, here are the scheduler maps and queues for the example:

[edit class-of-service scheduler-maps]
smap-svlan0-rem {
    forwarding-class best-effort scheduler sched-foo;
}
smap-ifl1 {
    forwarding-class best-effort scheduler sched-bar;
    forwarding-class assured-forwarding scheduler sched-baz;
}

The configuration for the referenced schedulers is not given for this example.

Internal Scheduler Nodes

A node in the hierarchy is considered internal if either of the following conditions apply:

- The node has other nodes as children below it in the hierarchy (for example, an interface set whose logical interfaces have traffic control profiles configured).
- The internal-node statement is configured for the node.

Why would it be important to make a certain node internal? Generally, more resources are available at the logical interface (unit) level than at the interface set level. Also, it might be desirable to configure all resources at a single level rather than spread them over several levels. The internal-node statement provides this flexibility. This technique is helpful when interface-set queuing without logical interfaces is used exclusively on the interface.

The internal-node statement raises an interface set without children to the same level as the configured interface sets that have children, allowing them to compete for the same set of resources. In summary, the internal-node statement allows interface sets with or without children to be scheduled at the same level.

The following example makes the interfaces sets if-set-1 and if-set-2 internal:

[edit class-of-service interfaces]
interface-set if-set-1 {
    internal-node;
    output-traffic-control-profile tcp-200m-no-smap;
}
interface-set if-set-2 {
    internal-node;
    output-traffic-control-profile tcp-100m-no-smap;
}

If an interface set has logical interfaces configured with a traffic control profile, then the use of the internal-node statement has no effect.

Internal nodes can specify a traffic-control-profile-remaining statement.

PIR-only and CIR Mode

The actual behavior of many CoS parameters, especially the shaping rate and guaranteed rate, depend on whether the physical interface is operating in PIR-only (peak information rate) or CIR (committed information rate) mode.

In PIR-only mode, one or more nodes perform shaping. The physical interface is in PIR-only mode if no child (or grandchild) node under the port has a guaranteed rate configured.

The mode of the port is important because in PIR-only mode, the scheduling across the child nodes is in proportion to their shaping rates (PIRs) and not the guaranteed rates (CIRs). This can be important if the observed behavior is not what is anticipated.

In CIR mode, one or more nodes applies a guaranteed rate and might perform shaping. A physical interface is in CIR mode if at least one child (or grandchild) node has a guaranteed rate configured. In addition, any child or grandchild node under the physical interface can have a shaping rate configured.

Only the guaranteed rate determines the mode. In CIR mode, nodes that have no guaranteed rate configured are assumed to have a very small guaranteed rate (queuing weight).
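The mode-selection rule can be sketched as follows (port_mode is a hypothetical illustration, assuming the guaranteed rate of each descendant node is given as a number, or None when unconfigured):

```python
def port_mode(descendant_guaranteed_rates):
    """Decide the operating mode of a physical interface.

    The port is in CIR mode if any child (or grandchild) node has a
    guaranteed rate configured (a value rather than None); otherwise it
    is in PIR-only mode, where children are scheduled in proportion to
    their shaping rates instead of their guaranteed rates.
    """
    if any(rate is not None for rate in descendant_guaranteed_rates):
        return "CIR"
    return "PIR-only"

print(port_mode([None, None, None]))  # PIR-only: only shaping rates configured
print(port_mode([None, 40e6]))        # CIR: one node has a guaranteed rate
```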

Priority Propagation

SRX5600 and SRX5800 devices with input/output cards (IOCs) perform priority propagation. Priority propagation is useful in mixed traffic environments when, for example, you want to make sure that the voice traffic of one customer does not suffer because of the data traffic of another customer. Nodes and queues are always serviced in order of their priority. The priority of a queue is set by the priority statement in its scheduler configuration (the default priority is low). However, not all elements of hierarchical schedulers have directly configured priorities. Internal nodes, for example, must determine their priority in other ways.

The priority of any internal node is decided by:

- The configured priority of its highest-priority active child node
- Whether the node is above or below its guaranteed rate (in CIR mode)

Each queue has a configured priority and a hardware priority. The usual mapping between the configured priority and the hardware priority is shown in Table 48.

Table 48: Queue Priority

Configured Priority    Hardware Priority
Strict-high            0
High                   0
Medium-high            1
Medium-low             1
Low                    2

In CIR mode, the priority for each internal node depends on whether the highest active child node is above or below the guaranteed rate. The mapping between the highest active child's priority and the hardware priority below and above the guaranteed rate is shown in Table 49.

Table 49: Internal Node Queue Priority for CIR Mode

Configured Priority of       Hardware Priority Below    Hardware Priority Above
Highest Active Child Node    Guaranteed Rate            Guaranteed Rate
Strict-high                  0                          0
High                         0                          3
Medium-high                  1                          3
Medium-low                   1                          3
Low                          2                          3
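The CIR-mode mapping in Table 49 can be expressed as a small function (a hypothetical sketch for illustration, not Junos code):

```python
def internal_node_priority(child_priority, below_guaranteed_rate):
    """Hardware priority of an internal node in CIR mode, per Table 49.

    The node takes its priority from its highest-priority active child.
    Below the guaranteed rate the normal queue mapping applies; above it,
    every priority except strict-high falls to hardware priority 3.
    """
    below = {"strict-high": 0, "high": 0, "medium-high": 1,
             "medium-low": 1, "low": 2}
    if child_priority == "strict-high":
        return 0  # strict-high keeps priority 0 regardless of the CIR state
    return below[child_priority] if below_guaranteed_rate else 3

print(internal_node_priority("high", True))   # 0: below the guaranteed rate
print(internal_node_priority("high", False))  # 3: above the guaranteed rate
```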

In PIR-only mode, nodes cannot send if they are above the configured shaping rate. The mapping between the configured priority and the hardware priority for PIR-only mode is shown in Table 50.

Table 50: Internal Node Queue Priority for PIR-Only Mode

Configured Priority    Hardware Priority
Strict-high            0
High                   0
Medium-high            1
Medium-low             1
Low                    2

A physical interface with hierarchical schedulers configured is shown in Figure 11. The configured priorities are shown for each queue at the top of the figure. The hardware priorities for each node are shown in parentheses. Each node also shows any configured shaping rate (PIR) or guaranteed rate (CIR) and whether or not the queues are above or below the CIR. The nodes are shown in one of three states: above the CIR (clear), below the CIR (dark), or in a condition where the CIR does not matter (gray).

Figure 11: Hierarchical Schedulers and Priorities

Image g016858.gif

In the figure, the strict-high queue for customer VLAN 0 (cvlan 0) receives service first, even though the customer VLAN is above the configured CIR (see Table 49 for the reason: strict-high always has hardware priority 0 regardless of CIR state). Once that queue has been drained, the priority of the node becomes 3 instead of 0 (due to the lack of strict-high traffic), and the system moves on to the medium queues next (cvlan 1 and cvlan 3), draining them in round-robin fashion (empty queues lose their hardware priority). The low queue on cvlan 4 (priority 2) is sent next, because that node is below the CIR. Then the high queues on cvlan 0 and cvlan 2 (both now with priority 3) are drained in round-robin fashion, and finally the low queue on cvlan 0 is drained (because svlan 0 has a priority of 3).

IOC Hardware Properties

On SRX5600 and SRX5800 devices, schedulers and queues can be configured on two IOCs: the 40x1GE IOC and the 4x10GE IOC. You can configure 15 VLAN sets per Gigabit Ethernet (40x1GE IOC) port and 255 VLAN sets per 10-Gigabit Ethernet (4x10GE IOC) port. The IOC performs priority propagation from one hierarchy level to another, and drop statistics are available on the IOC per color per queue instead of just per queue.

SRX5600 and SRX5800 devices with IOCs have Packet Forwarding Engines that can support up to 512 MB of frame memory, and packets are stored in 512-byte frames. Table 51 compares the major properties of the Packet Forwarding Engine within the IOC.

Table 51: Forwarding Engine Properties within 40x1GE IOC and 4x10GE IOC

Feature                               PFE Within 40x1GE IOC and 4x10GE IOC
Number of usable queues               16,000
Number of shaped logical interfaces   2,000 with 8 queues each, or 4,000 with 4 queues each
Number of hardware priorities         4
Priority propagation                  Yes
Dynamic mapping                       Yes; schedulers per port are not fixed
Drop statistics                       Per queue per color (PLP high, low)

Additionally, the IOC supports hierarchical weighted random early detection (WRED).

The IOC supports the following hierarchical scheduler characteristics:

The IOC supports the following features for scalability:

Note: The exact option for a transmit-rate (transmit-rate rate exact) is not supported on the IOCs on SRX Series devices.

The manner in which the IOC maps a queue to a scheduler depends on whether 8 queues or 4 queues are configured. By default, a scheduler at level 3 has 4 queues. Level 3 scheduler X controls queue X*4 to X*4+3, so that scheduler 100 (for example) controls queues 400 to 403. However, when 8 queues per scheduler are enabled, the odd-numbered schedulers are disabled, allowing twice the number of queues per subscriber as before. With 8 queues, level 3 scheduler X controls queue X*4 to X*4+7, so that scheduler 100 (for example) now controls queues 400 to 407.
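The queue-numbering arithmetic above can be sketched as follows (illustrative Python, assuming exactly the numbering described in the text):

```python
def queues_for_scheduler(x, queues_per_scheduler):
    """Queues controlled by level 3 scheduler x on the IOC.

    With 4 queues, scheduler x controls queues 4x through 4x+3. With
    8 queues, the odd-numbered schedulers are disabled and an (even)
    scheduler x controls queues 4x through 4x+7.
    """
    if queues_per_scheduler == 8 and x % 2 == 1:
        raise ValueError("odd-numbered schedulers are disabled in 8-queue mode")
    return list(range(x * 4, x * 4 + queues_per_scheduler))

print(queues_for_scheduler(100, 4))  # [400, 401, 402, 403]
print(queues_for_scheduler(100, 8))  # [400, 401, ..., 407]
```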

You configure the max-queues-per-interface statement to set the number of queues at 4 or 8 at the FPC level of the hierarchy. Changing this statement will result in a restart of the FPC. For more information about the max-queues-per-interface statement, see Example: Configuring Up to Eight Forwarding Classes and the JUNOS Software CLI Reference.

The IOC maps level 3 (customer VLAN) schedulers in groups to level 2 (service VLAN) schedulers. Sixteen contiguous level 3 schedulers are mapped to level 2 when 4 queues are enabled, and 8 contiguous level 3 schedulers are mapped to level 2 when 8 queues are enabled. All the schedulers in a group should use the same queue priority mapping. For example, if the queue priorities of one scheduler are high, medium, low, and low, all members of the group should use those same queue priorities.

Level 3 groups can be mapped to level 2 schedulers at any time. However, a level 3 group can be unmapped from a level 2 scheduler only if all the schedulers in the group are free. Once unmapped, a level 3 group can be remapped to any level 2 scheduler. There is no restriction on the number of level 3 groups that can be mapped to a particular level 2 scheduler. There can be 256 level 3 groups, but fragmentation of the scheduler space can reduce the number of schedulers available. In other words, some scheduler allocation patterns might fail even though free schedulers remain.

In contrast to the level 3 to level 2 mapping, the IOC maps level 2 (service VLAN) schedulers in a fixed manner to level 1 (physical interface) schedulers. On the 40-port Gigabit Ethernet IOC, there are 16 level 1 schedulers, and 10 of these are used for the physical interfaces. There are 256 level 2 schedulers, or 16 per level 1 scheduler. A level 1 scheduler X uses level 2 schedulers X*16 through X*16+15. Therefore, level 1 scheduler 0 uses level 2 schedulers 0 through 15, level 1 scheduler 1 uses level 2 schedulers 16 through 31, and so on. On the 4-port 10 Gigabit Ethernet IOC, there is one level 1 scheduler for the physical interface, and all 256 level 2 schedulers are mapped to that single level 1 scheduler.
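The fixed level 2 to level 1 mapping works out as follows (a hypothetical sketch of the arithmetic in the text):

```python
def level2_schedulers(x):
    """Level 2 schedulers owned by level 1 scheduler x (fixed mapping).

    Each level 1 scheduler owns 16 of the 256 level 2 schedulers:
    x*16 through x*16+15.
    """
    return list(range(x * 16, x * 16 + 16))

print(level2_schedulers(0))  # schedulers 0 through 15
print(level2_schedulers(1))  # schedulers 16 through 31
```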

The maximum number of level 3 (customer VLAN) schedulers that can be used is 4076 (4 queues) or 2028 (8 queues) for the 10-port Gigabit Ethernet Packet Forwarding Engine and 4094 (4 queues) or 2046 (8 queues) for the 10 Gigabit Ethernet Packet Forwarding Engine.

WRED on the IOC

Shaping to drop out-of-profile traffic is done on the IOC at all levels except the queue level. At the queue level, weighted random early detection (WRED) accomplishes much the same result. With WRED, the decision to drop or send the packet is made before the packet is placed in the queue.

WRED on the IOC uses two fill levels that establish a probabilistic drop region between a minimum and a maximum queue depth. Below the minimum queue depth, the drop probability is 0 (the packet is always sent). Above the maximum queue depth, the drop probability is 100 (the packet is always dropped).

There are four drop profiles associated with each queue. These correspond to each of four loss priorities (low, medium-low, medium-high, and high). Sixty-four sets of four drop profiles are available (32 for ingress and 32 for egress). In addition, there are eight WRED scaling profiles in each direction.

An IOC drop profile for expedited forwarding traffic might look like this:

[edit class-of-service drop-profiles]
drop-ef {
    fill-level 20 drop-probability 0; # Minimum queue depth
    fill-level 100 drop-probability 100; # Maximum queue depth
}

Note that only two fill levels can be specified for the IOC. You can configure the interpolate statement, but only two fill levels are used. The delay-buffer-rate statement in the traffic control profile determines the maximum queue size. The delay buffer rate is converted to a number of packet delay buffers, where one buffer equals 512 bytes. For example, at 10 Mbps, the IOC allocates 610 delay buffers when the delay buffer rate is set to 250 milliseconds. The WRED threshold values are specified in terms of absolute buffer values.
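The 610-buffer figure follows from the conversion described above, sketched here in Python (delay_buffers is a hypothetical helper, not a Junos command):

```python
def delay_buffers(rate_bps, delay_seconds, buffer_bytes=512):
    """Convert a delay-buffer-rate into a count of 512-byte delay buffers.

    The rate (bps) times the delay (seconds) gives bits of buffering;
    dividing by 8 gives bytes, and dividing by 512 gives whole buffers.
    """
    return int(rate_bps * delay_seconds / 8 // buffer_bytes)

print(delay_buffers(10_000_000, 0.250))  # 610, matching the text
```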

The WRED scaling factor multiplies all WRED thresholds (both minimum and maximum) by the value specified. There are eight possible values: 1, 2, 4, 8, 16, 32, 64, and 128. The WRED scaling factor is chosen to best match the user-configured drop profiles, because the hardware supports only certain threshold values (all values must be a multiple of 16). For example, if the configured value of a threshold is 500, the multiple of 16 is 256 and the scaling factor applied is 2, making the value 512, which accommodates the configured value of 500. If the configured value of a threshold is 1500, the multiple of 16 is 752 and the scaling factor applied is 2, making the value 1504, which accommodates the configured value of 1500.

Hierarchical RED is used to support the oversubscription of the delay buffers (WRED is configured only at the queue, physical interface, and PIC level). Hierarchical RED works with WRED as follows:

However, these rules might lead to the accepting of packets under loaded conditions that might otherwise have been dropped. In other words, the logical interface will accept packets if the physical interface is not congested.

Due to the limits placed on shaping thresholds used in the hierarchy, there is a granularity associated with the IOCs. The shaper accuracies differ at various levels of the hierarchy, with shapers at the logical interface level (level 3) being more accurate than shapers at the interface set level (level 2) or the port level (level 1). Table 52 shows the accuracy of the logical interface shaper at various speeds for Ethernet ports operating at 1 Gbps.

Table 52: Shaper Accuracy of 1-Gbps Ethernet at the Logical Interface Level

Range of Logical Interface Shaper    Step Granularity
Up to 4.096 Mbps                     16 Kbps
4.096 to 8.192 Mbps                  32 Kbps
8.192 to 16.384 Mbps                 64 Kbps
16.384 to 32.768 Mbps                128 Kbps
32.768 to 65.536 Mbps                256 Kbps
65.536 to 131.072 Mbps               512 Kbps
131.072 to 262.144 Mbps              1024 Kbps
262.144 Mbps to 1 Gbps               4096 Kbps
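To illustrate how Table 52 is read, this sketch (hypothetical helper names; the table values are transcribed from above) looks up the step granularity that applies to a configured shaping rate:

```python
# Transcription of Table 52 as (upper bound in bps, step in bps) pairs.
STEPS_1G_LOGICAL = [
    (4.096e6, 16e3), (8.192e6, 32e3), (16.384e6, 64e3), (32.768e6, 128e3),
    (65.536e6, 256e3), (131.072e6, 512e3), (262.144e6, 1024e3), (1e9, 4096e3),
]

def shaper_step(rate_bps, table=STEPS_1G_LOGICAL):
    """Return the step granularity that applies to a configured shaping rate."""
    for upper_bound, step in table:
        if rate_bps <= upper_bound:
            return step
    raise ValueError("rate exceeds the port speed")

print(shaper_step(5e6))  # a 5-Mbps shaper moves in 32-Kbps steps
```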

Table 53 shows the accuracy of the logical interface shaper at various speeds for Ethernet ports operating at 10 Gbps.

Table 53: Shaper Accuracy of 10-Gbps Ethernet at the Logical Interface Level

Range of Logical Interface Shaper    Step Granularity
Up to 10.24 Mbps                     40 Kbps
10.24 to 20.48 Mbps                  80 Kbps
20.48 to 40.96 Mbps                  160 Kbps
40.96 to 81.92 Mbps                  320 Kbps
81.92 to 163.84 Mbps                 640 Kbps
163.84 to 327.68 Mbps                1280 Kbps
327.68 to 655.36 Mbps                2560 Kbps
655.36 to 2611.2 Mbps                10240 Kbps
2611.2 to 5222.4 Mbps                20480 Kbps
5222.4 Mbps to 10 Gbps               40960 Kbps

Table 54 shows the accuracy of the interface set shaper at various speeds for Ethernet ports operating at 1 Gbps.

Table 54: Shaper Accuracy of 1-Gbps Ethernet at the Interface Set Level

Range of Interface Set Shaper    Step Granularity
Up to 20.48 Mbps                 80 Kbps
20.48 to 81.92 Mbps              320 Kbps
81.92 to 327.68 Mbps             1.28 Mbps
327.68 Mbps to 1 Gbps            20.48 Mbps

Table 55 shows the accuracy of the interface set shaper at various speeds for Ethernet ports operating at 10 Gbps.

Table 55: Shaper Accuracy of 10-Gbps Ethernet at the Interface Set Level

Range of Interface Set Shaper    Step Granularity
Up to 128 Mbps                   500 Kbps
128 to 512 Mbps                  2 Mbps
512 Mbps to 2.048 Gbps           8 Mbps
2.048 to 10 Gbps                 128 Mbps

Table 56 shows the accuracy of the physical port shaper at various speeds for Ethernet ports operating at 1 Gbps.

Table 56: Shaper Accuracy of 1-Gbps Ethernet at the Physical Port Level

Range of Physical Port Shaper    Step Granularity
Up to 64 Mbps                    250 Kbps
64 to 256 Mbps                   1 Mbps
256 Mbps to 1 Gbps               4 Mbps

Table 57 shows the accuracy of the physical port shaper at various speeds for Ethernet ports operating at 10 Gbps.

Table 57: Shaper Accuracy of 10-Gbps Ethernet at the Physical Port Level

Range of Physical Port Shaper    Step Granularity
Up to 640 Mbps                   2.5 Mbps
640 Mbps to 2.56 Gbps            10 Mbps
2.56 to 10 Gbps                  40 Mbps

For more information about configuring RED drop profiles, see Configuring RED Drop Profiles for Congestion Control.

MDRR on the IOC

The guaranteed rate (CIR) at the interface set level is implemented by using modified deficit round-robin (MDRR). The IOC hardware provides four levels of strict priority, with no restriction on the number of queues at each priority; MDRR is used among queues of the same priority. Each queue has one priority when it is under the guaranteed rate and another priority when it is over the guaranteed rate but still under the shaping rate (PIR). The IOC hardware implements the priorities with 256 service profiles. Each service profile assigns a priority to each of eight queues: one set of priorities applies to logical interfaces under the guaranteed rate, and another set applies to logical interfaces over the guaranteed rate but under the shaping rate. Each service profile is associated with a group of 16 level 3 schedulers, so a unique service profile is available for each of the 256 groups at level 3, giving 4,096 logical interfaces.

JUNOS Software provides three configurable priorities for traffic under the guaranteed rate, plus one reserved priority for traffic over the guaranteed rate that is not configurable. When no guaranteed rate is configured on any logical interface, JUNOS Software provides three priorities.

The relationship between JUNOS Software priorities and the IOC hardware priorities below and above the guaranteed rate (CIR) is shown in Table 58.

Table 58: JUNOS Priorities Mapped to IOC Hardware Priorities

JUNOS Software Priority    IOC Hardware Priority Below Guaranteed Rate    IOC Hardware Priority Above Guaranteed Rate
Strict-high                High                                           High
High                       High                                           Low
Medium-high                Medium-high                                    Low
Medium-low                 Medium-high                                    Low
Low                        Medium-low                                     Low

The JUNOS Software parameters are set in the scheduler map:

[edit class-of-service schedulers]
best-effort-scheduler {
    transmit-rate percent 30; # if no shaping rate
    buffer-size percent 30;
    priority high;
}
expedited-forwarding-scheduler {
    transmit-rate percent 40; # if no shaping rate
    buffer-size percent 40;
    priority strict-high;
}

Note: The use of both shaping rate and a guaranteed rate at the interface set level (level 2) is not supported.

MDRR is provided at three levels of the scheduler hierarchy of the IOC with a granularity of 1 through 255. There are 64 MDRR profiles at the queue level, 16 at the interface set level, and 32 at the physical interface level.

Queue transmit rates are used for queue-level MDRR profile weight calculation. The queue MDRR weight is calculated differently based on the mode set for sharing excess bandwidth. If you configure the equal option for excess bandwidth, then the queue MDRR weight is calculated as:

Queue weight = (255 * Transmit-rate-percentage) / 100

If you configure the proportional option for excess bandwidth, which is the default, then the queue MDRR weight is calculated as:

Queue weight = Queue-transmit-rate / Queue-base-rate, where

Queue-transmit-rate = (Logical-interface-rate * Transmit-rate-percentage) / 100, and

Queue-base-rate = Excess-bandwidth-proportional-rate / 255
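As a hedged illustration of these two formulas (plain arithmetic, not SRX code), the weight calculation can be sketched in Python. The truncation to an integer is an assumption; the IOC then rounds the result to one of its supported hardware weights, so the value actually programmed may differ slightly from this raw calculation:

```python
def queue_weight_equal(transmit_rate_percentage):
    """Equal excess-bandwidth mode: weight depends only on the
    configured transmit-rate percentage."""
    return int((255 * transmit_rate_percentage) / 100)

def queue_weight_proportional(logical_interface_rate_kbps,
                              transmit_rate_percentage,
                              excess_bandwidth_proportional_rate_kbps=32640):
    """Proportional (default) mode: weight scales with the logical
    interface rate. The default excess-bandwidth proportional rate
    is 128 Kbps x 255 = 32,640 Kbps (32.64 Mbps)."""
    queue_transmit_rate = (logical_interface_rate_kbps * transmit_rate_percentage) / 100
    queue_base_rate = excess_bandwidth_proportional_rate_kbps / 255  # 128 Kbps by default
    return int(queue_transmit_rate / queue_base_rate)

# A queue with a 30% transmit rate in equal mode:
print(queue_weight_equal(30))                  # 76
# The same queue on a 10-Mbps logical interface in proportional mode:
print(queue_weight_proportional(10000, 30))    # 23
```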

To configure the way that the IOC should handle excess bandwidth, configure the excess-bandwidth-share statement at the [edit interface-set interface-set-name] hierarchy level. By default, the excess bandwidth is set to proportional with a default value of 32.64 Mbps. In this mode, the excess bandwidth is shared in the ratio of the logical interface shaping rates. If set to equal, the excess bandwidth is shared equally among the logical interfaces.

This example sets the excess bandwidth sharing to proportional at a rate of 100 Mbps with a shaping rate of 80 Mbps.

[edit interface-set example-interface-set]
excess-bandwidth-share proportional 100m;
output-traffic-control-profile PIR-80Mbps;

Shaping rates established at the logical interface level are used to calculate the MDRR weights used at the interface set level. The 16 MDRR profiles are set to initial values, and the closest profile with rounded values is chosen. By default, the physical port MDRR weights are preset to the full bandwidth on the interface.

Configuring Excess Bandwidth Sharing

When using the IOC (40x1GE IOC or 4x10GE IOC) on an SRX Series device, there are circumstances when you should configure excess bandwidth sharing and minimum logical interface shaping. This section details some of the guidelines for configuring excess bandwidth sharing.

Excess Bandwidth Sharing and Minimum Logical Interface Shaping

The default excess bandwidth sharing proportional rate is 32.64 Mbps (128 Kbps x 255). To achieve better weighted fair queuing (WFQ) accuracy among queues, the configured shaping rate should be larger than the excess bandwidth sharing proportional rate. Some examples are shown in Table 59.

Table 59: Shaping Rates and WFQ Weights

Shaping Rate    Configured Queue Transmit Rate    WFQ Weight           Total Weights
10 Mbps         (30, 40, 25, 5)                   (22, 30, 20, 4)      76
33 Mbps         (30, 40, 25, 5)                   (76, 104, 64, 13)    257
40 Mbps         (30, 40, 25, 5)                   (76, 104, 64, 13)    257

With a 10-Mbps shaping rate, the total weight is 76, divided among the four queues according to the configured transmit rates. Note that when the shaping rate is larger than the excess bandwidth sharing proportional rate of 32.64 Mbps, the total weight on the logical interface is 257 and the WFQ accuracy is the same.

Selecting Excess Bandwidth Sharing Proportional Rates

To determine a good excess bandwidth sharing proportional rate, choose the largest CIR (guaranteed rate) among all the logical interfaces (units). If the logical units have only PIRs (shaping rates), choose the largest PIR. However, this is not ideal if a single logical interface has a large WRR rate, because it can skew the distribution of traffic across the queues of the other logical interfaces. To avoid this issue, set the excess bandwidth sharing proportional rate to a lower value, closer to where the WRR rates of the other logical interfaces are concentrated. This improves the bandwidth sharing accuracy among the queues on the same logical interface; however, the excess bandwidth sharing for the logical interface with the larger WRR rate is no longer proportional.
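The basic selection guideline can be sketched as a small helper. This is illustrative only; the unit records and field names are assumptions, not a Junos API, and the helper does not apply the "lower value" adjustment discussed above:

```python
def pick_proportional_rate(units):
    """Pick an excess-bandwidth proportional rate per the guideline:
    use the largest CIR among the logical interfaces; if no unit has
    a CIR, fall back to the largest PIR."""
    cirs = [u["cir"] for u in units if u.get("cir")]
    if cirs:
        return max(cirs)
    return max(u["pir"] for u in units if u.get("pir"))

# Units with only PIRs configured (rates in Mbps):
units = [{"pir": 10}, {"pir": 40}, {"pir": 200}]
print(pick_proportional_rate(units))   # 200

# As soon as any unit has a CIR, the largest CIR wins:
units = [{"pir": 100}, {"cir": 20}, {"pir": 40, "cir": 5}]
print(pick_proportional_rate(units))   # 20
```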

As an example, consider five logical interfaces on the same physical port, each with four queues, all with only PIRs configured and no CIRs. The WRR rate is the same as the PIR for the logical interface. The excess bandwidth is shared proportionally with a rate of 40 Mbps. The traffic control profiles for the logical interfaces are shown in Table 60.

Table 60: Example Shaping Rates and WFQ Weights

Shaping Rate         Configured Queue Transmit Rate    WFQ Weight           Total Weights
(Unit 0) 10 Mbps     (95, 0, 0, 5)                     (60, 0, 0, 3)        63
(Unit 1) 20 Mbps     (25, 25, 25, 25)                  (32, 32, 32, 32)     128
(Unit 2) 40 Mbps     (40, 30, 20, 10)                  (102, 77, 51, 26)    255
(Unit 3) 200 Mbps    (70, 10, 10, 10)                  (179, 26, 26, 26)    255
(Unit 4) 2 Mbps      (25, 25, 25, 25)                  (5, 5, 5, 5)         20

Even though the shaping rate of logical interface unit 3 is 200 Mbps, the excess bandwidth sharing proportional rate is kept at a much lower value (40 Mbps). Within a logical interface, this method provides a more accurate distribution of weights across queues. However, the excess bandwidth is now shared equally between unit 2 and unit 3 (total weights = 255).

Mapping Calculated Weights to Hardware Weights

The calculated weight in a traffic control profile is mapped to a hardware weight, but the hardware supports only a limited set of WFQ profiles. The weights are rounded to the nearest hardware weight according to the values in Table 61.

Table 61: Rounding Configured Weights to Hardware Weights

Traffic Control Profile Number    Number of Traffic Control Profiles    Weights                     Maximum Error
1–16                              16                                    1–16 (interval of 1)        50.00%
17–29                             13                                    18–42 (interval of 2)       6.25%
30–35                             6                                     45–60 (interval of 3)       1.35%
36–43                             8                                     64–92 (interval of 4)       2.25%
44–49                             6                                     98–128 (interval of 6)      3.06%
50–56                             7                                     136–184 (interval of 8)     3.13%
57–62                             6                                     194–244 (interval of 10)    2.71%
63                                1                                     255 (interval of 11)        2.05%

From the table, as an example, a calculated weight of 18.9 is mapped to a hardware weight of 18, because 18 is closer to 18.9 than 20 is (an interval of 2 applies in the range 18–42).
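The rounding rule can be made concrete by enumerating the 63 supported hardware weights from Table 61 and taking the nearest one. This is a sketch of the table's arithmetic, not SRX code:

```python
def hardware_weights():
    """Enumerate the supported hardware weights listed in Table 61."""
    weights = list(range(1, 17))          # 1-16, interval of 1
    weights += list(range(18, 43, 2))     # 18-42, interval of 2
    weights += list(range(45, 61, 3))     # 45-60, interval of 3
    weights += list(range(64, 93, 4))     # 64-92, interval of 4
    weights += list(range(98, 129, 6))    # 98-128, interval of 6
    weights += list(range(136, 185, 8))   # 136-184, interval of 8
    weights += list(range(194, 245, 10))  # 194-244, interval of 10
    weights.append(255)
    return weights                        # 63 values in total

def nearest_hardware_weight(calculated):
    """Round a calculated weight to the nearest supported hardware weight."""
    return min(hardware_weights(), key=lambda w: abs(w - calculated))

print(nearest_hardware_weight(18.9))   # 18
```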

Allocating Weight with Only Shaping Rates or Unshaped Logical Interfaces

Logical interfaces (units) with only shaping rates (PIRs), and unshaped logical interfaces, are given a weight of 10. A logical interface with a small guaranteed rate (CIR) might otherwise get an overall weight of less than 10. To allocate a higher share of the excess bandwidth to logical interfaces with a small guaranteed rate, compared to logical interfaces with only shaping rates configured, a minimum weight of 20 is given to logical interfaces with guaranteed rates configured.

For example, consider a logical interface configuration with five units, as shown in Table 62.

Table 62: Allocating Weights with PIR and CIR on Logical Interfaces

Logical Interface (Unit)    Traffic Control Profile     WRR Percentages    Weights
Unit 1                      PIR 100 Mbps                95, 0, 0, 5        10, 1, 1, 1
Unit 2                      CIR 20 Mbps                 25, 25, 25, 25     64, 64, 64, 64
Unit 3                      PIR 40 Mbps, CIR 20 Mbps    50, 30, 15, 5      128, 76, 38, 13
Unit 4                      Unshaped                    95, 0, 0, 5        10, 1, 1, 1
Unit 5                      CIR 1 Mbps                  95, 0, 0, 5        10, 1, 1, 1

The weights for these units are calculated as follows:

Sharing Bandwidth Among Logical Interfaces

As a simple example showing how bandwidth is shared among the logical interfaces, assume that all traffic is sent on queue 0. Assume also that there is a 40-Mbps load on all of the logical interfaces. Configuration details are shown in Table 63.

Table 63: Sharing Bandwidth Among Logical Interfaces

Logical Interface (Unit)    Traffic Control Profile     WRR Percentages    Weights
Unit 1                      PIR 100 Mbps                95, 0, 0, 5        10, 1, 1, 1
Unit 2                      CIR 20 Mbps                 25, 25, 25, 25     64, 64, 64, 64
Unit 3                      PIR 40 Mbps, CIR 20 Mbps    50, 30, 15, 5      128, 76, 38, 13
Unit 4                      Unshaped                    95, 0, 0, 5        10, 1, 1, 1

  1. When the port is shaped at 40 Mbps, because units 2 and 3 have a guaranteed rate (CIR) configured, both units 2 and 3 get 20 Mbps of shared bandwidth.
  2. When the port is shaped at 100 Mbps, because units 2 and 3 have a guaranteed rate (CIR) configured, each of them can transmit 20 Mbps. Among units 1, 2, 3, and 4, the 60 Mbps of excess bandwidth is shared according to the values shown in Table 64.

    Table 64: First Example of Bandwidth Sharing

    Logical Interface (Unit)    Calculation                       Bandwidth
    1                           10 / (10+64+128+10) x 60 Mbps     2.83 Mbps
    2                           64 / (10+64+128+10) x 60 Mbps     18.11 Mbps
    3                           128 / (10+64+128+10) x 60 Mbps    36.22 Mbps
    4                           10 / (10+64+128+10) x 60 Mbps     2.83 Mbps

However, unit 3 has only 20 Mbps extra (PIR minus CIR) configured. This means that the leftover bandwidth of 16.22 Mbps (36.22 Mbps – 20 Mbps) is shared among units 1, 2, and 4, in the ratio of their weights (10, 64, and 10). This is shown in Table 65.

Table 65: Second Example of Bandwidth Sharing

Logical Interface (Unit)    Calculation                      Bandwidth
1                           10 / (10+64+10) x 16.22 Mbps     1.93 Mbps
2                           64 / (10+64+10) x 16.22 Mbps     12.36 Mbps
4                           10 / (10+64+10) x 16.22 Mbps     1.93 Mbps

Finally, Table 66 shows the resulting allocation of bandwidth among the logical interfaces when the port is configured with a 100-Mbps shaping rate.

Table 66: Final Example of Bandwidth Sharing

Logical Interface (Unit)    Calculation                          Bandwidth
1                           2.83 Mbps + 1.93 Mbps                4.76 Mbps
2                           20 Mbps + 18.11 Mbps + 12.36 Mbps    50.47 Mbps
3                           20 Mbps + 20 Mbps                    40 Mbps
4                           2.83 Mbps + 1.93 Mbps                4.76 Mbps
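The two-pass sharing worked through in Tables 64 through 66 can be reproduced with straightforward arithmetic. This sketch is illustrative only (the data structures are assumptions, not a Junos API), and small differences in the last digit versus the tables come from rounding intermediate values:

```python
def share_excess(units, excess):
    """Share excess bandwidth by weight, capping each unit at its own
    headroom (PIR - CIR) and redistributing any leftover to the rest."""
    alloc = {name: 0.0 for name in units}
    active = dict(units)  # name -> (weight, headroom); headroom None = no cap hit here
    while excess > 1e-9 and active:
        total_weight = sum(w for w, _ in active.values())
        next_active, leftover = {}, 0.0
        for name, (weight, headroom) in active.items():
            share = weight / total_weight * excess
            if headroom is not None and alloc[name] + share >= headroom:
                leftover += alloc[name] + share - headroom
                alloc[name] = headroom        # unit is capped at PIR - CIR
            else:
                alloc[name] += share
                next_active[name] = (weight, headroom)
        active, excess = next_active, leftover
    return alloc

# Units 1-4 from Table 63: (weight of queue 0, PIR - CIR headroom in Mbps).
# Units 1 and 4 never reach their limits in this example, so their
# headroom is omitted for simplicity.
units = {
    "unit1": (10, None),   # PIR only, weight 10
    "unit2": (64, None),   # CIR 20 Mbps, weight 64
    "unit3": (128, 20.0),  # PIR 40 / CIR 20, so at most 20 Mbps extra
    "unit4": (10, None),   # unshaped, weight 10
}
extra = share_excess(units, 60.0)  # 100-Mbps port minus 2 x 20 Mbps of CIR
final = {
    "unit1": extra["unit1"],
    "unit2": 20.0 + extra["unit2"],  # CIR + excess share
    "unit3": 20.0 + extra["unit3"],
    "unit4": extra["unit4"],
}
for name, bw in final.items():
    print(f"{name}: {bw:.2f} Mbps")
```

The loop's second pass redistributes unit 3's leftover 16.23 Mbps among units 1, 2, and 4 by their weights (10, 64, 10), matching Table 65, and the final allocations match Table 66 to within rounding.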