Configuring CoS Hierarchical Schedulers
In metro Ethernet environments, a VLAN typically corresponds to a customer premises equipment (CPE) device and the VLANs are identified by an inner VLAN tag on Ethernet frames (called the customer VLAN, or C-VLAN, tag). A set of VLANs can be grouped at the DSL access multiplexer (DSLAM) and identified by using the same outer VLAN tag (called the service VLAN, or S-VLAN, tag). The service VLANs are typically gathered at the Broadband Remote Access Server (BRAS) level, which can be (among other devices) an SRX Series device. On SRX5600 and SRX5800 devices, hierarchical schedulers let you provide shaping and scheduling at the service VLAN level as well as other levels, such as the physical interface. In other words, you can group a set of logical interfaces and then apply scheduling and shaping parameters to the logical interface set as well as to other levels.
This basic architecture is shown in Figure 7. You can apply class-of-service (CoS) parameters at the premises on the CPE, on the customer or service VLANs, at the BRAS level, or at all levels.
Figure 7: An SRX Series Device in a Hierarchical Scheduler Architecture

On SRX5600 and SRX5800 devices, you can apply CoS shaping and scheduling at any of four different levels, including the VLAN set level.
The supported scheduler hierarchy is as follows:
- The physical interface (level 1)
- The service VLAN (level 2), which is unique to SRX Series devices
- The logical interface or customer VLAN (level 3)
- The queue (level 4)
You can specify a traffic control profile (output-traffic-control-profile) that can specify a shaping rate, a guaranteed rate, and a scheduler map with transmit rate and buffer delay. The scheduler map contains the mapping of queues (forwarding classes) to their respective schedulers (schedulers define the properties for the queue). Queue properties can specify a transmit rate and buffer management parameters such as buffer size and drop profile. For more information, see Defining Schedulers.
To configure CoS hierarchical schedulers, include the following statements at the [edit class-of-service interfaces] and [edit interfaces] hierarchy levels:
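A minimal sketch of the relevant statements, using an illustrative interface name (ge-1/0/0) and profile name (tcp-example):

```
[edit interfaces]
ge-1/0/0 {
    hierarchical-scheduler;    # enable hierarchical scheduling on the port
}
[edit class-of-service interfaces]
ge-1/0/0 {
    output-traffic-control-profile tcp-example;
}
```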
Hierarchical Scheduler Terminology
Hierarchical schedulers introduce some new terms into a discussion of CoS capabilities. They also use some familiar terms in different contexts. This section presents a complete overview of the terms used with hierarchical schedulers.
The following terms are important for hierarchical schedulers:
- Customer VLAN (C-VLAN)—A C-VLAN, defined by IEEE 802.1ad, is identified by the inner tag of a stacked VLAN. A stacked VLAN contains an outer tag corresponding to the S-VLAN and an inner tag corresponding to the C-VLAN. A C-VLAN often corresponds to CPE. Scheduling and shaping are often used on a C-VLAN to establish minimum and maximum bandwidth limits for a customer. See also S-VLAN.
- Interface set—A logical group of interfaces that describes the characteristics of a set of service VLANs, logical interfaces, or customer VLANs. The interface-set statement establishes the group and gives it a name to which traffic control profiles can be applied. See also Service VLAN.
- Scheduler—A scheduler defines the scheduling and queuing characteristics of a queue. Transmit rate, scheduler priority, and buffer size can be specified. In addition, a drop profile may be referenced to describe the WRED congestion control aspects of the queue. See also Scheduler map.
- Scheduler map—A scheduler map is referenced by traffic control profiles to define queues. The scheduler map establishes the queues that comprise a scheduler node and associates a forwarding class with a scheduler. See also Scheduler.
- Stacked VLAN—An encapsulation on an S-VLAN with an outer tag corresponding to the S-VLAN, and an inner tag corresponding to the C-VLAN. See also Service VLAN and Customer VLAN.
- Service VLAN (S-VLAN)—An S-VLAN, defined by IEEE 802.1ad, often corresponds to a network aggregation device such as a DSLAM. Scheduling and shaping are often established for an S-VLAN to provide CoS for downstream devices with little buffering and simple schedulers. See also Customer VLAN.
- Traffic control profile—Defines the characteristics of a scheduler node. Traffic control profiles are used at several levels of the CLI, including the physical interface, interface set, and logical interface levels. Scheduling and queuing characteristics can be defined for the scheduler node using the shaping-rate, guaranteed-rate, and delay-buffer-rate statements. Queues over these scheduler nodes are defined by referencing a scheduler map. See also Scheduler and Scheduler map.
- VLAN—Virtual LAN, defined on an Ethernet logical interface.
These terms are especially important when applied to a scheduler hierarchy. Scheduler hierarchies are composed of nodes and queues. Queues terminate the scheduler hierarchy. Nodes can be root nodes, leaf nodes, or internal (non-leaf) nodes. Internal nodes are nodes that have other nodes as “children” in the hierarchy. For example, if an interface-set statement is configured with a logical interface (such as unit 0) and a queue, then the interface set is an internal node at level 2 of the hierarchy. However, if no traffic control profiles are configured on the logical interfaces, then the interface set is at level 3 of the hierarchy.
Table 46 shows how the configuration of an interface set or logical interface affects the terminology of hierarchical scheduler nodes.
Table 46: Hierarchical Scheduler Nodes
| Root Node (Level 1) | Level 2 | Level 3 | Queue (Level 4) |
|---|---|---|---|
| Physical interface | Interface set | Logical interfaces | One or more queues |
| Physical interface | | Interface set | One or more queues |
| Physical interface | | Logical interfaces | One or more queues |
SRX3400 and SRX3600 Device Hardware Capabilities and Limitations
The following list describes the hardware capabilities and limitations for the SRX3400 and SRX3600 devices:
- For SRX3400 and SRX3600 devices, each Input/Output Card (IOC) Flexible PIC Concentrator (FPC) or IOC slot has only one Physical Interface Card (PIC), which contains either two 10-Gigabit or sixteen 1-Gigabit Ethernet ports. Table 47 shows the maximum number of cards and ports allowed in an SRX3400 and SRX3600 device.
Table 47: Available NPCs and IO Ports for SRX3400 and SRX3600 Devices
| System | IOCs | IO Ports | NPCs |
|---|---|---|---|
| SRX3600 | 7 | 108 (16 x 6 + 12) | 3 |
| SRX3400 | 5 | 76 (16 x 4 + 12) | 2 |

Note: The number of ports that the Network Processing Unit (NPU) must handle may differ from the fixed 10:1 port-to-NPU ratio for the 1G IOC, or the 1:1 ratio for the 10G IOC, that is required on the SRX5600 and SRX5800 devices, leading to oversubscription on the SRX3400 and SRX3600 devices.
- SRX3400 and SRX3600 devices allow you to install up to three Network Processing Cards (NPCs). In a single-NPC configuration, the NPC must process all of the packets to and from each IOC. When more than one NPC is available, however, an IOC exchanges packets only with its pre-assigned NPC. You can use the set chassis ioc-npc-connectivity CLI statement to configure the IOC-to-NPC mapping, as shown in the example below. By default, the mapping is assigned so that the load is shared equally among all NPCs. When the mapping changes (for example, when an IOC or NPC is removed, or when you map a specific NPC to an IOC), the device must be restarted. For more information, see the JUNOS Software Administration Guide for Security Devices.
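For example, a sketch mapping the IOC in slot 1 to the NPC in slot 7 (the slot numbers are illustrative):

```
user@host# set chassis ioc-npc-connectivity ioc 1 npc 7
```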
- For SRX3400 and SRX3600 devices, the IOC supports the following hierarchical scheduler characteristics:
  - Level 1 - Shaping at the physical interface (ifd)
  - Level 2 - Shaping and scheduling at the logical interface level (ifl)
  - Level 3 - Scheduling at the queue level

Note: Interface set (iflset) is not supported for the SRX3400 and SRX3600 devices.
- Shaping at the port level—In SRX5600 and SRX5800 devices, an NPC supports 32 port-level shaping profiles at level 1, such that each front port can have its own shaping profile.
In SRX3400 and SRX3600 devices, an NPC supports only 16 port-level shaping profiles in the hardware, including two profiles that are predefined for 10-Gbps and 1-Gbps shaping rates. The user can configure up to 14 different levels of shaping rates. If more levels are configured, then the closest match found in the 16 profiles will be used instead.
For example, assume that a system is already configured with the following rates for ifds:
10Mbps, 20Mbps, 40Mbps, 60Mbps, 80Mbps, 100Mbps, 200Mbps, 300Mbps, 400Mbps, 500Mbps, 600Mbps, 700Mbps, 800Mbps, 900Mbps, 1Gbps (predefined), 10Gbps (predefined)
Each of these 16 rates is programmed into one of the 16 profiles in the hardware. Now consider the following two scenarios:
- If the user changes one port’s shaping rate from 1Gbps to 100Mbps, which is already programmed in one of the 16 profiles, the profile with 100Mbps shaping rate will be used by the port.
- If the user changes another port’s shaping rate from 1Gbps to 50Mbps, which is not in the shaping profiles, the closest matching profile with 60Mbps shaping rate will be used instead.
When scenario 2 occurs, not all of the user-configured rates can be supported by the hardware. If more than 14 different rates are specified, only 14 will be programmed in the hardware. Which 14 rates are programmed depends on many factors. For this reason, we recommend that you plan carefully and use no more than 14 levels of port-level shaping rates.
- Weighted Random Early Detection (WRED) at the port level—Each NPU has 512 MB of frame memory, and 10-Gigabit Ethernet ports get more buffers than 1-Gigabit Ethernet ports. Buffer availability depends on how much bandwidth (number of NPCs, ports, 1-Gigabit or 10-Gigabit, and so forth) the device has to support: the more bandwidth the device must support, the less buffer space is available. When two NPCs are available, the amount of frame buffer available is doubled.
Configuring an Interface Set
To configure an interface set, include the following statement at the [edit class-of-service interfaces] hierarchy level of the configuration:
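In general form, this is:

```
[edit class-of-service interfaces]
interface-set interface-set-name {
    output-traffic-control-profile profile-name;
}
```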
To apply the interface set to interfaces, include the following statements at the [edit interfaces] hierarchy level of the configuration:
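In general form, this is:

```
[edit interfaces]
interface-set interface-set-name {
    interface interface-name {
        unit logical-unit-number;
    }
}
```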
Interface sets can be defined as a list of logical interfaces (unit 100, unit 200, and so on). Service providers can use these statements to group interfaces to apply scheduling parameters such as guaranteed rate and shaping rate to the traffic in the groups.
All traffic heading downstream must be gathered into an interface set with the interface-set statement at the [edit class-of-service interfaces] hierarchy level.
Interface sets are currently only used by CoS, but they are applied at the [edit interfaces] hierarchy level so that they might be available to other services.
The logical interface naming option lists the logical interfaces (units) of an Ethernet interface, as shown in the example after the following note:
![]() | Note: Ranges are not supported; you must list each logical interface separately. |
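For example, a minimal sketch that lists three logical units of a Gigabit Ethernet interface separately (the interface name and unit numbers are illustrative):

```
[edit interfaces]
interface-set set-ge-0 {
    interface ge-0/0/0 {
        unit 0;
        unit 1;
        unit 2;
    }
}
```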
Applying an Interface Set
Although the interface set is applied at the [edit interfaces] hierarchy level, the CoS parameters for the interface set are defined at the [edit class-of-service interfaces] hierarchy level, usually with the output-traffic-control-profile profile-name statement.
This example applies a traffic control profile called tcp-set1 to an interface set called set-ge-0:
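A sketch of this configuration:

```
[edit class-of-service interfaces]
interface-set set-ge-0 {
    output-traffic-control-profile tcp-set1;
}
```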
Interface Set Caveats
You cannot specify an interface set mixing the logical interface, S-VLAN, or VLAN outer tag list forms of the interface-set statement.
A logical interface can only belong to one interface set. If you try to add the same logical interface to different interface sets, the commit will fail.
This example will generate a commit error:
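For instance, the following sketch, in which unit 0 of ge-0/0/0 appears in two sets (the names are illustrative), fails at commit:

```
[edit interfaces]
interface-set set-a {
    interface ge-0/0/0 {
        unit 0;
    }
}
interface-set set-b {
    interface ge-0/0/0 {
        unit 0;    # same logical interface in a second set: commit error
    }
}
```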
Members of an interface set cannot span multiple physical interfaces. Only one physical interface is allowed to appear in an interface set.
This configuration is not supported:
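For example, the following sketch (illustrative names) spans two physical interfaces in one set and is therefore rejected:

```
[edit interfaces]
interface-set set-mixed {
    interface ge-0/0/0 {
        unit 0;
    }
    interface ge-0/0/1 {    # second physical interface in the same set: not supported
        unit 0;
    }
}
```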
Introduction to Hierarchical Schedulers
When used, the interface set level of the hierarchy falls between the physical interface level (level 1) and the logical interface level (level 3). Queues are always at level 4 of the hierarchy.
Hierarchical schedulers add CoS parameters to the new interface set level of the configuration. They use traffic control profiles to set values for parameters such as shaping rate (the peak information rate [PIR]), guaranteed rate (the committed information rate [CIR] on these interfaces), scheduler maps (assigning queues and resources to traffic), and so on.
The following CoS configuration places these parameters in traffic control profiles at various levels:
- Traffic control profile at the port level (tcp-port-level1):
- A shaping rate (PIR) of 100 Mbps
- A delay buffer rate of 100 Mbps
- Traffic control profile at the interface set level (tcp-interface-level2):
- A shaping rate (PIR) of 60 Mbps
- A guaranteed rate (CIR) of 40 Mbps
- Traffic control profile at the logical interface level
(tcp-unit-level3):
- A shaping rate (PIR) of 50 Mbps
- A guaranteed rate (CIR) of 30 Mbps
- A scheduler map called smap1 to hold various queue properties (level 4)
- A delay buffer rate of 40 Mbps
In this case, the traffic control profiles look like this:
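A sketch of these profiles, reflecting the parameters listed above:

```
[edit class-of-service]
traffic-control-profiles {
    tcp-port-level1 {
        shaping-rate 100m;
        delay-buffer-rate 100m;
    }
    tcp-interface-level2 {
        shaping-rate 60m;
        guaranteed-rate 40m;
    }
    tcp-unit-level3 {
        shaping-rate 50m;
        guaranteed-rate 30m;
        scheduler-map smap1;
        delay-buffer-rate 40m;
    }
}
```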
Once configured, the traffic control profiles must be applied to the proper places in the CoS interfaces hierarchy.
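One way to apply them, assuming a port ge-1/0/0 and an interface set named example-set (both illustrative):

```
[edit class-of-service interfaces]
ge-1/0/0 {
    output-traffic-control-profile tcp-port-level1;      # level 1 (port)
    unit 0 {
        output-traffic-control-profile tcp-unit-level3;  # level 3 (logical interface)
    }
}
interface-set example-set {
    output-traffic-control-profile tcp-interface-level2; # level 2 (interface set)
}
```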
In all cases, the properties for level 4 of the hierarchical schedulers are determined by the scheduler map.
Scheduler Hierarchy Example
This section provides a more complete example of building a 4-level hierarchy of schedulers. The configuration parameters are shown in Figure 8. The queues are shown at the top of the figure with the other three levels of the hierarchy below.
Figure 8: Building a Scheduler Hierarchy

The figure's PIR values will be configured as the shaping rates, and the CIRs will be configured as the guaranteed rate on the Ethernet interface ge-1/0/0. The PIR can be oversubscribed (that is, the sum of the children PIRs can exceed the parent's, as in svlan 1, where 200 + 200 + 100 exceeds the parent rate of 400). However, the sum of the children node level's CIRs must never exceed the parent node's CIR, as shown in all the service VLANs (otherwise, the guaranteed rate could never be provided in all cases).
This configuration example will present all details of the CoS configuration for the interface in the figure (ge-1/0/0), including:
- Interface Sets for the Hierarchical Example
- Interfaces for the Hierarchical Example
- Traffic Control Profiles for the Hierarchical Example
- Schedulers for the Hierarchical Example
- Drop Profiles for the Hierarchical Example
- Scheduler Maps for the Hierarchical Example
- Applying Traffic Control Profiles for the Hierarchical Example
Interface Sets for the Hierarchical Example
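The interface sets group the customer VLAN logical interfaces under each service VLAN. A sketch, assuming two service VLAN sets on ge-1/0/0 (the unit numbers are illustrative):

```
[edit interfaces]
interface-set svlan0 {
    interface ge-1/0/0 {
        unit 0;
        unit 1;
        unit 2;
    }
}
interface-set svlan1 {
    interface ge-1/0/0 {
        unit 3;
        unit 4;
        unit 5;
    }
}
```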
Interfaces for the Hierarchical Example
The keyword to configure hierarchical schedulers is at the physical interface level, as are VLAN tagging and the VLAN IDs. In this example, the interface sets are defined by logical interfaces (units) and not outer VLAN tags. All VLAN tags in this example are customer VLAN tags.
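A sketch of the interface configuration; the hierarchical-scheduler statement enables the four-level hierarchy on the port, and the VLAN IDs are illustrative:

```
[edit interfaces]
ge-1/0/0 {
    hierarchical-scheduler;
    vlan-tagging;
    unit 0 {
        vlan-id 100;
    }
    unit 1 {
        vlan-id 101;
    }
}
```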
Traffic Control Profiles for the Hierarchical Example
The traffic control profiles hold parameters for levels above the queue level of the scheduler hierarchy. This section defines traffic control profiles for both the service VLAN level (logical interfaces) and the customer VLAN (VLAN tag) level.
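A representative sketch with one profile at each level; the 400-Mbps and 200-Mbps shaping rates follow the svlan 1 figures cited earlier, and the other names and values are illustrative:

```
[edit class-of-service]
traffic-control-profiles {
    tcp-svlan1 {                  # service VLAN (interface set) level
        shaping-rate 400m;        # PIR from the figure
        guaranteed-rate 300m;     # illustrative CIR
    }
    tcp-cvlan3 {                  # customer VLAN (logical interface) level
        shaping-rate 200m;        # PIR from the figure
        scheduler-map smap-cvlan3;
        delay-buffer-rate 100m;   # illustrative
    }
}
```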
Schedulers for the Hierarchical Example
The schedulers hold the information about the queues, the last level of the hierarchy. Note the consistent naming schemes applied to repetitive elements in all parts of this example.
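A sketch of one scheduler following that naming scheme (the values are illustrative):

```
[edit class-of-service]
schedulers {
    sched-be-cvlan3 {
        transmit-rate percent 40;
        priority low;
        buffer-size percent 40;
        drop-profile-map loss-priority low protocol any drop-profile dp-be-low;
    }
}
```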
Drop Profiles for the Hierarchical Example
This section configures the drop profiles for the example. For more information about drop profiles, see Configuring RED Drop Profiles for Congestion Control .
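A sketch of the drop profile referenced above (the fill levels and drop probabilities are illustrative):

```
[edit class-of-service]
drop-profiles {
    dp-be-low {
        interpolate {
            fill-level [ 40 100 ];
            drop-probability [ 0 100 ];
        }
    }
}
```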
Scheduler Maps for the Hierarchical Example
This section configures the scheduler maps for the example. Each one references a scheduler configured in Schedulers for the Hierarchical Example.
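A sketch of one such map; it assumes a companion expedited-forwarding scheduler, sched-ef-cvlan3, configured along the same lines:

```
[edit class-of-service]
scheduler-maps {
    smap-cvlan3 {
        forwarding-class best-effort scheduler sched-be-cvlan3;
        forwarding-class expedited-forwarding scheduler sched-ef-cvlan3;
    }
}
```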
Applying Traffic Control Profiles for the Hierarchical Example
This section applies the traffic control profiles to the proper levels of the hierarchy.
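A sketch of the application, assuming the profile names used above and a port-level profile named tcp-port (illustrative):

```
[edit class-of-service interfaces]
ge-1/0/0 {
    output-traffic-control-profile tcp-port;        # physical interface (level 1)
    unit 3 {
        output-traffic-control-profile tcp-cvlan3;  # customer VLAN (level 3)
    }
}
interface-set svlan1 {
    output-traffic-control-profile tcp-svlan1;      # service VLAN (level 2)
}
```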
![]() | Note: Although a shaping rate can be applied directly to the physical interface, hierarchical schedulers must use a traffic control profile to hold this parameter, as shown in Controlling Remaining Traffic. |
Controlling Remaining Traffic
You can configure many logical interfaces under an interface. However, only a subset of them might have a traffic control profile attached. For example, you can configure three logical interfaces (units) over the same service VLAN, but you can apply a traffic control profile specifying best-effort and voice queues to only one of the logical interface units. Traffic from the two remaining logical interfaces is considered remaining traffic. To configure transmit rate guarantees for the remaining traffic, you configure the output-traffic-control-profile-remaining statement specifying a guaranteed rate for the remaining traffic. Without this statement, the remaining traffic gets a default, minimal bandwidth. In the same way, the shaping-rate and delay-buffer-rate statements can be specified in the traffic control profile referenced with the output-traffic-control-profile-remaining statement in order to shape and provide buffering for remaining traffic.
Consider the interface shown in Figure 9. Customer VLANs 3 and 4 have no explicit traffic control profile. However, the service provider might want to establish a shaping and guaranteed transmit rate for aggregate traffic heading for those customer VLANs. The solution is to configure and apply a traffic control profile for all remaining traffic on the interface.
Figure 9: Handling Remaining Traffic

This example considers the case where customer VLANs 3 and 4 have no explicit traffic control profile, yet need to establish a shaping and guaranteed transmit rate for traffic heading for those customer VLANs. The solution is to add a traffic control profile to the svlan1 interface set. This example builds on the example used in Scheduler Hierarchy Example and so this does not repeat all configuration details, only those at the service VLAN level.
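A sketch at the service VLAN level (the profile name is illustrative); tcp-svlan1 would carry the shaping and guaranteed rates for the aggregate traffic of customer VLANs 3 and 4:

```
[edit class-of-service interfaces]
interface-set svlan1 {
    output-traffic-control-profile tcp-svlan1;
}
```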
Next, consider the example shown in Figure 10.
Figure 10: Another Example of Handling Remaining Traffic

In this example, ge-1/0/0 has five logical interfaces (cvlan 0 through cvlan 4) and an interface set, svlan0, that covers cvlan 0, cvlan 1, and cvlan 2:
- Scheduling for the interface set svlan0 is specified by referencing an output-traffic-control-profile statement, which specifies the guaranteed-rate, shaping-rate, and delay-buffer-rate statement values for the interface set. In this example, the output traffic control profile called tcp-svlan0 guarantees 100 Mbps and shapes the interface set svlan0 to 200 Mbps.
- Scheduling and queuing for remaining traffic of svlan0 is specified by referencing an output-traffic-control-profile-remaining statement, which references a scheduler-map statement that establishes queues for the remaining traffic. The specified traffic control profile can also configure guaranteed, shaping, and delay-buffer rates for the remaining traffic. In this example, output-traffic-control-profile-remaining tcp-svlan0-rem references scheduler-map smap-svlan0-rem, which calls for a best-effort queue for remaining traffic (that is, traffic on unit 3 and unit 4, which is not classified by the svlan0 interface set). The example also specifies a guaranteed-rate of 200 Mbps and a shaping-rate of 300 Mbps for all remaining traffic.
- Scheduling and queuing for logical interface ge-1/0/0 unit 1 is configured “traditionally” and uses an output-traffic-control-profile specified for that unit. In this example, output-traffic-control-profile tcp-ifl1 specifies scheduling and queuing for ge-1/0/0 unit 1.
This example does not include the [edit interfaces] configuration.
Here is how the traffic control profiles for this example are configured:
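A sketch consistent with the rates described above (the tcp-ifl1 contents and the smap-ifl1 name are illustrative):

```
[edit class-of-service]
traffic-control-profiles {
    tcp-svlan0 {
        guaranteed-rate 100m;
        shaping-rate 200m;
    }
    tcp-svlan0-rem {
        guaranteed-rate 200m;
        shaping-rate 300m;
        scheduler-map smap-svlan0-rem;
    }
    tcp-ifl1 {
        shaping-rate 50m;           # illustrative
        scheduler-map smap-ifl1;    # illustrative
    }
}
```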
Finally, here are the scheduler maps and queues for the example:
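A sketch of the maps (the scheduler names are illustrative):

```
[edit class-of-service]
scheduler-maps {
    smap-svlan0-rem {
        forwarding-class best-effort scheduler sched-be-rem;
    }
    smap-ifl1 {
        forwarding-class best-effort scheduler sched-be-ifl1;
        forwarding-class expedited-forwarding scheduler sched-ef-ifl1;
    }
}
```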
The configuration for the referenced schedulers is not given for this example.
Internal Scheduler Nodes
A node in the hierarchy is considered internal if either of the following conditions apply:
- Any one of its children nodes has a traffic control profile configured and applied.
- You configure the internal-node statement.
Why would it be important to make a certain node internal? Generally, there are more resources available at the logical interface (unit) level than at the interface set level. Also, it might be desirable to configure all resources at a single level, rather than spread over several levels. The internal-node statement provides this flexibility. It can be a helpful configuration technique when interface-set queuing without logical interfaces is used exclusively on the interface.
The internal-node statement raises an interface set without children to the same level as the configured interface sets that have children, allowing them all to compete for the same set of resources.
In summary, the internal-node statement allows interface sets, with or without children, to be scheduled at the same level.
The following example makes the interface sets if-set-1 and if-set-2 internal:
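A sketch:

```
[edit class-of-service interfaces]
interface-set if-set-1 {
    internal-node;
}
interface-set if-set-2 {
    internal-node;
}
```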
If an interface set has logical interfaces configured with a traffic control profile, then the use of the internal-node statement has no effect.
Internal nodes can specify an output-traffic-control-profile-remaining statement.
PIR-only and CIR Mode
The actual behavior of many CoS parameters, especially the shaping rate and guaranteed rate, depends on whether the physical interface is operating in PIR-only (peak information rate) or CIR (committed information rate) mode.
In PIR-only mode, one or more nodes perform shaping. The physical interface is in the PIR-only mode if no child (or grandchild) node under the port has a guaranteed rate configured.
The mode of the port is important because in PIR-only mode, the scheduling across the child nodes is in proportion to their shaping rates (PIRs) and not the guaranteed rates (CIRs). This can be important if the observed behavior is not what is anticipated.
In CIR mode, one or more nodes apply a guaranteed rate and might perform shaping. A physical interface is in CIR mode if at least one child (or grandchild) node has a guaranteed rate configured. In addition, any child or grandchild node under the physical interface can have a shaping rate configured.
Only the guaranteed rate determines the mode; shaping rates do not. In CIR mode, nodes that do not have a guaranteed rate configured are assumed to have a very small guaranteed rate (queuing weight).
Priority Propagation
SRX5600 and SRX5800 devices with input/output cards (IOCs) perform priority propagation. Priority propagation is useful for mixed traffic environments when, for example, you want to make sure that the voice traffic of one customer does not suffer due to the data traffic of another customer. Nodes and queues are always serviced in the order of their priority. The priority of a queue is decided by configuration (the default priority is low) in the scheduler. However, not all elements of hierarchical schedulers have direct priorities configured. Internal nodes, for example, must determine their priority in other ways.
The priority of any internal node is decided by:
- The highest priority of an active child (interface sets only take the highest priority of their active children)
- Whether the node is above its configured guaranteed rate (CIR) or not (this is relevant only if the physical interface is in CIR mode)
Each queue has a configured priority and a hardware priority. The usual mapping between the configured priority and the hardware priority is shown in Table 48.
Table 48: Queue Priority
Configured Priority | Hardware Priority |
|---|---|
Strict-high | 0 |
High | 0 |
Medium-high | 1 |
Medium-low | 1 |
Low | 2 |
In CIR mode, the priority for each internal node depends on whether the highest active child node is above or below the guaranteed rate. The mapping between the highest active child's priority and the hardware priority below and above the guaranteed rate is shown in Table 49.
Table 49: Internal Node Queue Priority for CIR Mode
Configured Priority of Highest Active Child Node | Hardware Priority Below Guaranteed Rate | Hardware Priority Above Guaranteed Rate |
|---|---|---|
Strict-high | 0 | 0 |
High | 0 | 3 |
Medium-high | 1 | 3 |
Medium-low | 1 | 3 |
Low | 2 | 3 |
In PIR-only mode, nodes cannot send if they are above the configured shaping rate. The mapping between the configured priority and the hardware priority for PIR-only mode is shown in Table 50.
Table 50: Internal Node Queue Priority for PIR-Only Mode
Configured Priority | Hardware Priority |
|---|---|
Strict-high | 0 |
High | 0 |
Medium-high | 1 |
Medium-low | 1 |
Low | 2 |
A physical interface with hierarchical schedulers configured is shown in Figure 11. The configured priorities are shown for each queue at the top of the figure. The hardware priorities for each node are shown in parentheses. Each node also shows any configured shaping rate (PIR) or guaranteed rate (CIR) and whether or not the queues are above or below the CIR. The nodes are shown in one of three states: above the CIR (clear), below the CIR (dark), or in a condition where the CIR does not matter (gray).
Figure 11: Hierarchical Schedulers and Priorities

In the figure, the strict-high queue for customer VLAN 0 (cvlan 0) receives service first, even though the customer VLAN is above the configured CIR (see Table 49 for the reason: strict-high always has hardware priority 0 regardless of CIR state). Once that queue has been drained, and the priority of the node has become 3 instead of 0 (due to the lack of strict-high traffic), the system moves on to the medium queues next (cvlan 1 and cvlan 3), draining them in a round-robin fashion (empty queues lose their hardware priority). The low queue on cvlan 4 (priority 2) is sent next, because that node is below the CIR. Then the high queues on cvlan 0 and cvlan 2 (both now with priority 3) are drained in a round-robin fashion, and finally the low queue on cvlan 0 is drained (because svlan 0 has a priority of 3).
IOC Hardware Properties
On SRX5600 and SRX5800 devices, two IOCs (40x1GE IOC and 4x10GE IOC) are supported on which you can configure schedulers and queues. You can configure 15 VLAN sets per Gigabit Ethernet (40x1GE IOC) port and 255 VLAN sets per 10 Gigabit Ethernet (4x10GE IOC) port. The IOC performs priority propagation from one hierarchy level to another, and drop statistics are available on the IOC per color per queue instead of just per queue.
SRX5600 and SRX5800 devices with IOCs have Packet Forwarding Engines that can support up to 512 MB of frame memory, and packets are stored in 512-byte frames. Table 51 compares the major properties of the Packet Forwarding Engine within the IOC.
Table 51: Forwarding Engine Properties within 40x1GE IOC and 4x10GE IOC
Feature | PFE Within 40x1GE IOC and 4x10GE IOC |
|---|---|
Number of usable queues | 16,000 |
Number of shaped logical interfaces | 2,000 with 8 queues each, or 4,000 with 4 queues each. |
Number of hardware priorities | 4 |
Priority propagation | Yes |
Dynamic mapping | Yes: schedulers/port are not fixed. |
Drop statistics | Per queue per color (PLP high, low) |
Additionally, the IOC supports hierarchical weighted random early detection (WRED).
The IOC supports the following hierarchical scheduler characteristics:
- Shaping at the physical interface level
- Shaping and scheduling at the service VLAN interface set level
- Shaping and scheduling at the customer VLAN logical interface level
- Scheduling at the queue level
The IOC supports the following features for scalability:
- 16,000 queues per PFE
- 4 Packet Forwarding Engines per IOC
- 4000 schedulers at logical interface level (level 3) with 4 queues each
- 2000 schedulers at logical interface level (level 3) with 8 queues each
- 255 schedulers at the interface set level (level 2) per 1-port PFE on a 10-Gigabit Ethernet IOC (4x10GE IOC)
- 15 schedulers at the interface set level (level 2) per 10-port PFE on a 1-Gigabit Ethernet IOC (40x1GE IOC)
- About 400 milliseconds of buffer delay (this varies by packet size and whether large buffers are enabled)
- 4 levels of priority (strict-high, high, medium, and low)
![]() | Note: The exact option for a transmit-rate (transmit-rate rate exact) is not supported on the IOCs on SRX Series devices. |
The manner in which the IOC maps a queue to a scheduler depends on whether 8 queues or 4 queues are configured. By default, a scheduler at level 3 has 4 queues. Level 3 scheduler X controls queue X*4 to X*4+3, so that scheduler 100 (for example) controls queues 400 to 403. However, when 8 queues per scheduler are enabled, the odd-numbered schedulers are disabled, allowing twice the number of queues per subscriber as before. With 8 queues, level 3 scheduler X controls queue X*4 to X*4+7, so that scheduler 100 (for example) now controls queues 400 to 407.
You configure the max-queues-per-interface statement to set the number of queues at 4 or 8 at the FPC level of the hierarchy. Changing this statement will result in a restart of the FPC. For more information about the max-queues-per-interface statement, see Example: Configuring Up to Eight Forwarding Classes and the JUNOS Software CLI Reference.
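For example, a sketch enabling 8 queues, assuming the statement is configured under [edit chassis fpc slot-number pic pic-number] as on other JUNOS platforms (the slot numbers are illustrative):

```
user@host# set chassis fpc 2 pic 0 max-queues-per-interface 8
```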
The IOC maps level 3 (customer VLAN) schedulers in groups to level 2 (service VLAN) schedulers. Sixteen contiguous level 3 schedulers are mapped to level 2 when 4 queues are enabled, and 8 contiguous level 3 schedulers are mapped to level 2 when 8 queues are enabled. All the schedulers in the group should use the same queue priority mapping. For example, if the queue priorities of one scheduler are high, medium, low, and low, all members of the group should have the same queue priority.
Groups at level 3 to level 2 can be mapped at any time. However, a group at level 3 can only be unmapped from a level 2 scheduler, and only if all the schedulers in the group are free. Once unmapped, a level 3 group can be remapped to any level 2 scheduler. There is no restriction on the number of level 3 groups that can be mapped to a particular level 2 scheduler. There can be 256 level 3 groups, but fragmentation of the scheduler space can reduce the number of schedulers available. In other words, there are scheduler allocation patterns that might fail even though there are free schedulers.
In contrast to level 3 to level 2 mapping, the IOC maps level 2 (service VLAN) schedulers in a fixed mode to level 1 (physical interface) schedulers. On 40-port Gigabit Ethernet IOCs, there are 16 level 1 schedulers, and 10 of these are used for the physical interfaces. There are 256 level 2 schedulers, or 16 per level 1 scheduler. A level 1 scheduler X uses level 2 schedulers X*16 through X*16+15. Therefore level 1 scheduler 0 uses level 2 schedulers 0 through 15, level 1 scheduler 1 uses level 2 schedulers 16 through 31, and so on. On 4-port 10 Gigabit Ethernet PICs, there is one level 1 scheduler for the physical interface, and 256 level 2 schedulers are mapped to the single level 1 scheduler.
The maximum number of level 3 (customer VLAN) schedulers that can be used is 4076 (4 queues) or 2028 (8 queues) for the 10-port Gigabit Ethernet Packet Forwarding Engine and 4094 (4 queues) or 2046 (8 queues) for the 10 Gigabit Ethernet Packet Forwarding Engine.
WRED on the IOC
Shaping to drop out-of-profile traffic is done on the IOC at all levels except the queue level. However, weighted random early detection (WRED) is done at the queue level with much the same result. With WRED, the decision to drop or send the packet is made before the packet is placed in the queue.
WRED on the IOC involves two fill levels. The probabilistic drop region is bounded by a minimum and a maximum queue depth. Below the minimum queue depth, the drop probability is 0 (the packet is always sent). Above the maximum level, the drop probability is 100 (the packet is always dropped).
There are four drop profiles associated with each queue. These correspond to each of four loss priorities (low, medium-low, medium-high, and high). Sixty-four sets of four drop profiles are available (32 for ingress and 32 for egress). In addition, there are eight WRED scaling profiles in each direction.
An IOC drop profile for expedited forwarding traffic might look like this:
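A sketch of such a drop profile (the name and fill levels are illustrative):

```
[edit class-of-service]
drop-profiles {
    dp-ef {
        interpolate {
            fill-level [ 75 100 ];
            drop-probability [ 0 100 ];
        }
    }
}
```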
Note that only two fill levels can be specified for the IOC. You can configure the interpolate statement, but only two fill levels are used. The delay-buffer-rate statement in the traffic control profile determines the maximum queue size. This delay buffer rate is converted to a number of packet delay buffers, where one buffer is equal to 512 bytes. For example, at 10 Mbps, the IOC allocates 610 delay buffers when the delay buffer rate is set to 250 milliseconds. The WRED threshold values are specified in terms of absolute buffer values.
The WRED scaling factor multiplies all WRED thresholds (both minimum and maximum) by the value specified. There are eight values in all: 1, 2, 4, 8, 16, 32, 64, and 128. The WRED scaling factor is chosen to best match the user-configured drop profiles. This is done because the hardware supports only certain threshold values (all values must be a multiple of 16). So if the configured value of a threshold is 500 (for example), the hardware uses a base threshold of 256 (a multiple of 16) with a scaling factor of 2, making the value 512, which allows the value of 500 to be used. If the configured value of a threshold is 1500, the hardware uses a base threshold of 752 with a scaling factor of 2, making the value 1504, which allows the value of 1500 to be used.
Hierarchical RED is used to support the oversubscription of the delay buffers (WRED is configured only at the queue, physical interface, and PIC level). Hierarchical RED works with WRED as follows:
- If any level accepts the packet (the queue depth is less than the minimum buffer level), this level accepts the packet.
- If any level probabilistically drops the packet, then this level drops the packet.
However, these rules might lead to the accepting of packets under loaded conditions that might otherwise have been dropped. In other words, the logical interface will accept packets if the physical interface is not congested.
Due to the limits placed on shaping thresholds used in the hierarchy, there is a granularity associated with the IOCs. The shaper accuracies differ at various levels of the hierarchy, with shapers at the logical interface level (level 3) being more accurate than shapers at the interface set level (level 2) or the port level (level 1). Table 52 shows the accuracy of the logical interface shaper at various speeds for Ethernet ports operating at 1 Gbps.
Table 52: Shaper Accuracy of 1-Gbps Ethernet at the Logical Interface Level
Range of Logical Interface Shaper | Step Granularity |
|---|---|
Up to 4.096 Mbps | 16 Kbps |
4.096 to 8.192 Mbps | 32 Kbps |
8.192 to 16.384 Mbps | 64 Kbps |
16.384 to 32.768 Mbps | 128 Kbps |
32.768 to 65.535 Mbps | 256 Kbps |
65.535 to 131.072 Mbps | 512 Kbps |
131.072 to 262.144 Mbps | 1024 Kbps |
262.144 to 1 Gbps | 4096 Kbps |
Table 53 shows the accuracy of the logical interface shaper at various speeds for Ethernet ports operating at 10 Gbps.
Table 53: Shaper Accuracy of 10-Gbps Ethernet at the Logical Interface Level
Range of Logical Interface Shaper | Step Granularity |
|---|---|
Up to 10.24 Mbps | 40 Kbps |
10.24 to 20.48 Mbps | 80 Kbps |
20.48 to 40.96 Mbps | 160 Kbps |
40.96 to 81.92 Mbps | 320 Kbps |
81.92 to 163.84 Mbps | 640 Kbps |
163.84 to 327.68 Mbps | 1280 Kbps |
327.68 to 655.36 Mbps | 2560 Kbps |
655.36 to 2611.2 Mbps | 10240 Kbps |
2611.2 to 5222.4 Mbps | 20480 Kbps |
5222.4 to 10 Gbps | 40960 Kbps |
Table 54 shows the accuracy of the interface set shaper at various speeds for Ethernet ports operating at 1 Gbps.
Table 54: Shaper Accuracy of 1-Gbps Ethernet at the Interface Set Level
Range of Interface Set Shaper | Step Granularity |
|---|---|
Up to 20.48 Mbps | 80 Kbps |
20.48 Mbps to 81.92 Mbps | 320 Kbps |
81.92 Mbps to 327.68 Mbps | 1.28 Mbps |
327.68 Mbps to 1 Gbps | 20.48 Mbps |
Table 55 shows the accuracy of the interface set shaper at various speeds for Ethernet ports operating at 10 Gbps.
Table 55: Shaper Accuracy of 10-Gbps Ethernet at the Interface Set Level
Range of Interface Set Shaper | Step Granularity |
|---|---|
Up to 128 Mbps | 500 Kbps |
128 Mbps to 512 Mbps | 2 Mbps |
512 Mbps to 2.048 Gbps | 8 Mbps |
2.048 Gbps to 10 Gbps | 128 Mbps |
Table 56 shows the accuracy of the physical port shaper at various speeds for Ethernet ports operating at 1 Gbps.
Table 56: Shaper Accuracy of 1-Gbps Ethernet at the Physical Port Level
Range of Physical Port Shaper | Step Granularity |
|---|---|
Up to 64 Mbps | 250 Kbps |
64 Mbps to 256 Mbps | 1 Mbps |
256 Mbps to 1 Gbps | 4 Mbps |
Table 57 shows the accuracy of the physical port shaper at various speeds for Ethernet ports operating at 10 Gbps.
Table 57: Shaper Accuracy of 10-Gbps Ethernet at the Physical Port Level
Range of Physical Port Shaper | Step Granularity |
|---|---|
Up to 640 Mbps | 2.5 Mbps |
640 Mbps to 2.56 Gbps | 10 Mbps |
2.56 Gbps to 10 Gbps | 40 Mbps |
For more information about configuring RED drop profiles, see Configuring RED Drop Profiles for Congestion Control .
MDRR on the IOC
The guaranteed rate (CIR) at the interface set level is implemented by using modified deficit round-robin (MDRR). The IOC hardware provides four levels of strict priority. There is no restriction on the number of queues for each priority. MDRR is used among queues of the same priority. Each queue has one priority when it is under the guaranteed rate and another priority when it is over the guaranteed rate but still under the shaping rate (PIR). The IOC hardware implements the priorities with 256 service profiles. Each service profile assigns eight priorities for eight queues. One set is for logical interfaces under the guaranteed rate and another set is for logical interfaces over the guaranteed rate but under the shaping rate. Each service profile is associated with a group of 16 level 3 schedulers, so there is a unique service profile available for all 256 groups at level 3, giving 4,096 logical interfaces.
JUNOS Software provides three priorities for traffic under the guaranteed rate and one reserved, nonconfigurable priority for traffic over the guaranteed rate. JUNOS Software provides three priorities when there is no guaranteed rate configured on any logical interface.
The relationship between JUNOS Software priorities and the IOC hardware priorities below and above the guaranteed rate (CIR) is shown in Table 58.
Table 58: JUNOS Priorities Mapped to IOC Hardware Priorities
JUNOS Software Priority | IOC Hardware Priority Below Guaranteed Rate | IOC Hardware Priority Above Guaranteed Rate |
|---|---|---|
Strict-high | High | High |
High | High | Low |
Medium-high | Medium-high | Low |
Medium-low | Medium-high | Low |
Low | Medium-low | Low |
The JUNOS Software parameters are set in the scheduler map:
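A sketch (the names and percentages are illustrative) showing how the priorities in Table 58 are assigned via schedulers referenced from a scheduler map:

```
[edit class-of-service]
schedulers {
    sched-voice {
        transmit-rate percent 20;
        priority strict-high;
    }
    sched-data {
        transmit-rate percent 80;
        priority low;
    }
}
scheduler-maps {
    smap-example {
        forwarding-class expedited-forwarding scheduler sched-voice;
        forwarding-class best-effort scheduler sched-data;
    }
}
```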
![]() | Note: The use of both shaping rate and a guaranteed rate at the interface set level (level 2) is not supported. |
MDRR is provided at three levels of the scheduler hierarchy of the IOC with a granularity of 1 through 255. There are 64 MDRR profiles at the queue level, 16 at the interface set level, and 32 at the physical interface level.
Queue transmit rates are used for queue-level MDRR profile weight calculation. The queue MDRR weight is calculated differently based on the mode set for sharing excess bandwidth. If you configure the equal option for excess bandwidth, then the queue MDRR weight is calculated as:
Queue weight = (255 * Transmit-rate-percentage) / 100
If you configure the proportional option for excess bandwidth, which is the default, then the queue MDRR weight is calculated as:
Queue weight = Queue-transmit-rate / Queue-base-rate, where
Queue-transmit-rate = (Logical-interface-rate * Transmit-rate-percentage) / 100, and
Queue-base-rate = Excess-bandwidth-proportional-rate / 255
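As a worked example, with the default proportional rate of 32.64 Mbps, the queue base rate is 32.64 Mbps / 255 = 128 Kbps. For a logical interface shaped at 33 Mbps with a 40 percent transmit rate, the queue transmit rate is 13.2 Mbps, giving a weight of 13.2 Mbps / 128 Kbps, or about 103, which rounds to the hardware weight of 104 (see Table 59 and Table 61).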
To configure the way that the IOC should handle excess bandwidth, configure the excess-bandwidth-share statement at the [edit class-of-service interfaces interface-set interface-set-name] hierarchy level. By default, the excess bandwidth is set to proportional with a default value of 32.64 Mbps. In this mode, the excess bandwidth is shared in the ratio of the logical interface shaping rates. If set to equal, the excess bandwidth is shared equally among the logical interfaces.
This example sets the excess bandwidth sharing to proportional at a rate of 100 Mbps with a shaping rate of 80 Mbps.
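A sketch (the set and profile names are illustrative; the 80-Mbps shaping rate is carried in the referenced traffic control profile):

```
[edit class-of-service]
traffic-control-profiles {
    tcp-80m {
        shaping-rate 80m;
    }
}
[edit class-of-service interfaces]
interface-set svlan0 {
    excess-bandwidth-share proportional 100m;
    output-traffic-control-profile tcp-80m;
}
```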
Shaping rates established at the logical interface level are used to calculate the MDRR weights used at the interface set level. The 16 MDRR profiles are set to initial values, and the closest profile with rounded values is chosen. By default, the physical port MDRR weights are preset to the full bandwidth on the interface.
Configuring Excess Bandwidth Sharing
When using the IOC (40x1GE IOC or 4x10GE IOC) on an SRX Series device, there are circumstances when you should configure excess bandwidth sharing and minimum logical interface shaping. This section details some of the guidelines for configuring excess bandwidth sharing.
- Excess Bandwidth Sharing and Minimum Logical Interface Shaping
- Selecting Excess Bandwidth Sharing Proportional Rates
- Mapping Calculated Weights to Hardware Weights
- Allocating Weight with Only Shaping Rates or Unshaped Logical Interfaces
- Sharing Bandwidth Among Logical Interfaces
Excess Bandwidth Sharing and Minimum Logical Interface Shaping
The default excess bandwidth sharing proportional rate is 32.64 Mbps (128 Kbps x 255). In order to have better weighted fair queuing (WFQ) accuracy among queues, the configured shaping rate should be larger than the excess bandwidth sharing proportional rate. Some examples are shown in Table 59.
Table 59: Shaping Rates and WFQ Weights
Shaping Rate | Configured Queue Transmit Rate | WFQ Weight | Total Weights |
|---|---|---|---|
10 Mbps | (30, 40, 25, 5) | (22, 30, 20, 4) | 76 |
33 Mbps | (30, 40, 25, 5) | (76, 104, 64, 13) | 257 |
40 Mbps | (30, 40, 25, 5) | (76, 104, 64, 13) | 257 |
With a 10-Mbps shaping rate, the total of the weights is 76, divided among the four queues according to the configured transmit rates. When the shaping rate is larger than the excess bandwidth sharing proportional rate of 32.64 Mbps, the total weight on the logical interface is 257 and the WFQ accuracy is the same.
Selecting Excess Bandwidth Sharing Proportional Rates
To determine a good excess bandwidth-sharing proportional rate to configure, choose the largest CIR (guaranteed rate) among all the logical interfaces (units). If the logical units have PIRs (shaping rates) only, then choose the largest PIR rate. However, this is not ideal if a single logical interface has a large WRR rate. This method can skew the distribution of traffic across the queues of the other logical interfaces. To avoid this issue, set the excess bandwidth-sharing proportional rate to a lower value on the logical interfaces where the WRR rates are concentrated. This improves the bandwidth sharing accuracy among the queues on the same logical interface. However, the excess bandwidth sharing for the logical interface with the larger WRR rate is no longer proportional.
As an example, consider five logical interfaces on the same physical port, each with four queues, all with only PIRs configured and no CIRs. The WRR rate is the same as the PIR for the logical interface. The excess bandwidth is shared proportionally with a rate of 40 Mbps. The traffic control profiles for the logical interfaces are shown in Table 60.
Table 60: Example Shaping Rates and WFQ Weights
Shaping Rate | Configured Queue Transmit Rate | WFQ Weight | Total Weights |
|---|---|---|---|
(Unit 0) 10 Mbps | (95, 0, 0, 5) | (60, 0, 0, 3) | 63 |
(Unit 1) 20 Mbps | (25, 25, 25, 25) | (32, 32, 32, 32) | 128 |
(Unit 2) 40 Mbps | (40, 30, 20, 10) | (102, 77, 51, 26) | 255 |
(Unit 3) 200 Mbps | (70, 10, 10, 10) | (179, 26, 26, 26) | 255 |
(Unit 4) 2 Mbps | (25, 25, 25, 25) | (5, 5, 5, 5) | 20 |
Even though the maximum transmit rate for the queue on logical interface unit 3 is 200 Mbps, the excess bandwidth-sharing proportional rate is kept at a much lower value. Within a logical interface, this method provides a more accurate distribution of weights across queues. However, the excess bandwidth is now shared equally between unit 2 and unit 3 (total weights = 255).
Mapping Calculated Weights to Hardware Weights
The calculated weight in a traffic control profile is mapped to a hardware weight, but the hardware supports only a limited set of WFQ weights. The weights are rounded to the nearest hardware weight according to the values in Table 61.
Table 61: Rounding Configured Weights to Hardware Weights
Traffic Control Profile Number | Number of Traffic Control Profiles | Weights | Maximum Error |
|---|---|---|---|
1–16 | 16 | 1–16 (interval of 1) | 50.00% |
17–29 | 13 | 18–42 (interval of 2) | 6.25% |
30–35 | 6 | 45–60 (interval of 3) | 1.35% |
36–43 | 8 | 64–92 (interval of 4) | 2.25% |
44–49 | 6 | 98–128 (interval of 6) | 3.06% |
50–56 | 7 | 136–184 (interval of 8) | 3.13% |
57–62 | 6 | 194–244 (interval of 10) | 2.71% |
63–63 | 1 | 255–255 (interval of 11) | 2.05% |
From the table, as an example, the calculated weight of 18.9 is mapped to a hardware weight of 18, because 18 is closer to 18.9 than 20 (an interval of 2 applies in the range 18–42).
Allocating Weight with Only Shaping Rates or Unshaped Logical Interfaces
Logical interfaces with only shaping rates (PIRs) or unshaped logical interfaces (units) are given a weight of 10. A logical interface with a small guaranteed rate (CIR) might get an overall weight less than 10. In order to allocate a higher share of the excess bandwidth to logical interfaces with a small guaranteed rate in comparison to the logical interfaces with only shaping rates configured, a minimum weight of 20 is given to the logical interfaces with guaranteed rates configured.
For example, consider a logical interface configuration with five units, as shown in Table 62.
Table 62: Allocating Weights with PIR and CIR on Logical Interfaces
Logical Interface (Unit) | Traffic Control Profile | WRR Percentages | Weights |
|---|---|---|---|
Unit 1 | PIR 100 Mbps | 95, 0, 0, 5 | 10, 1, 1, 1 |
Unit 2 | CIR 20 Mbps | 25, 25, 25, 25 | 64, 64, 64, 64 |
Unit 3 | PIR 40 Mbps, CIR 20 Mbps | 50, 30, 15, 5 | 128, 76, 38, 13 |
Unit 4 | Unshaped | 95, 0, 0, 5 | 10, 1, 1, 1 |
Unit 5 | CIR 1 Mbps | 95, 0, 0, 5 | 10, 1, 1, 1 |
The weights for these units are calculated as follows:
- Select the excess bandwidth-sharing proportional rate to be the maximum CIR among all the logical interfaces: 20 Mbps (unit 2).
- Unit 1 has a PIR and unit 4 is unshaped. The weight for these units is 10.
- The weight for unit 1 queue 0 is 9.5 (10 x 95%), which translates to a hardware weight of 10.
- The weight for unit 1 queue 1 is 0 (10 x 0%); although the weight is zero, a weight of 1 is assigned to give minimal bandwidth to queues with zero WRR.
- Unit 5 has a very small CIR (1 Mbps), and a weight of 20 is assigned to units with a small CIR.
- The weight for unit 5 queue 0 is 19 (20 x 95%), which translates to a hardware weight of 18.
- Unit 3 has a CIR of 20 Mbps, which is the same as the excess bandwidth-sharing proportional rate, so it has a total weight of 255.
- The weight of unit 3 queue 0 is 127.5 (255 x 50%), which translates to a hardware weight of 128.
Sharing Bandwidth Among Logical Interfaces
As a simple example showing how bandwidth is shared among the logical interfaces, assume that all traffic is sent on queue 0. Assume also that there is a 40-Mbps load on all of the logical interfaces. Configuration details are shown in Table 63.
Table 63: Sharing Bandwidth Among Logical Interfaces
Logical Interface (Unit) | Traffic Control Profile | WRR Percentages | Weights |
|---|---|---|---|
Unit 1 | PIR 100 Mbps | 95, 0, 0, 5 | 10, 1, 1, 1 |
Unit 2 | CIR 20 Mbps | 25, 25, 25, 25 | 64, 64, 64, 64 |
Unit 3 | PIR 40 Mbps, CIR 20 Mbps | 50, 30, 15, 5 | 128, 76, 38, 13 |
Unit 4 | Unshaped | 95, 0, 0, 5 | 10, 1, 1, 1 |
- When the port is shaped at 40 Mbps, because units 2 and 3 have a guaranteed rate (CIR) configured, both units 2 and 3 get 20 Mbps of shared bandwidth.
- When the port is shaped at 100 Mbps, because units 2 and 3 have a guaranteed rate (CIR) configured, each of them can transmit 20 Mbps. On units 1, 2, 3, and 4, the 60 Mbps of excess bandwidth is shared according to the values shown in Table 64.
Table 64: First Example of Bandwidth Sharing
Logical Interface (Unit) | Calculation | Bandwidth |
|---|---|---|
1 | 10 / (10+64+128+10) x 60 Mbps | 2.83 Mbps |
2 | 64 / (10+64+128+10) x 60 Mbps | 18.11 Mbps |
3 | 128 / (10+64+128+10) x 60 Mbps | 36.22 Mbps |
4 | 10 / (10+64+128+10) x 60 Mbps | 2.83 Mbps |
However, unit 3 only has 20 Mbps extra (PIR minus CIR) configured. This means that the leftover bandwidth of 16.22 Mbps (36.22 Mbps – 20 Mbps) is shared among units 1, 2, and 4. This is shown in Table 65.
Table 65: Second Example of Bandwidth Sharing
Logical Interface (Unit) | Calculation | Bandwidth |
|---|---|---|
1 | 10 / (10+64+10) x 16.22 Mbps | 1.93 Mbps |
2 | 64 / (10+64+10) x 16.22 Mbps | 12.36 Mbps |
4 | 10 / (10+64+10) x 16.22 Mbps | 1.93 Mbps |
Finally, Table 66 shows the resulting allocation of bandwidth among the logical interfaces when the port is configured with a 100-Mbps shaping rate.
Table 66: Final Example of Bandwidth Sharing
Logical Interface (Unit) | Calculation | Bandwidth |
|---|---|---|
1 | 2.83 Mbps + 1.93 Mbps | 4.76 Mbps |
2 | 20 Mbps + 18.11 Mbps + 12.36 Mbps | 50.47 Mbps |
3 | 20 Mbps + 20 Mbps | 40 Mbps |
4 | 2.83 Mbps + 1.93 Mbps | 4.76 Mbps |