Resource Monitoring for Subscriber Management and Services

Resource Monitoring for Subscriber Management and Services Overview

Junos OS supports a resource monitoring capability using both the CLI and SNMP MIB queries. You can employ this utility to provision sufficient headroom (memory space limits for the application or virtual router) to ensure system stability, especially the health and operating efficiency of I-chip-based line cards and Trio-based FPCs on MX Series routers.

When memory utilization, either of the ukernel memory or of the ASIC memory, reaches a certain threshold, the health and traffic-handling stability of the line card can be compromised. Such degradation in system performance can be detrimental to live traffic and protocols.

In addition to configuring a threshold that raises error logs when resource utilization exceeds a specified value, you can monitor the threshold values and resource utilization using SNMP MIB queries.

The following sections describe the types of resource monitoring available with Junos OS:

Using Watermarks for Line-Card Resource Monitoring
Throttling Subscriber Load Based on CoS Resource Capacity
Examining the Utilization of Memory Resource Regions Using show Commands
Load Throttling to Reduce Processing Delays
Limiting Subscribers with Resource Monitor
Change History on Resource Monitoring for Subscriber Management and Services
Platform-Specific Resource Monitoring for Subscriber Management and Services Behavior

Using Watermarks for Line-Card Resource Monitoring

You can configure watermark or checkpoint values for the line-card resources, such as ukern memory (heap), next-hop (NH) memory, and firewall or filter memory, to be uniform for both Trio-based and I-chip-based line cards. The NH memory watermark is applicable only for encapsulation memory (output WAN static RAM memory). Encapsulation memory is specific to I-chips and not applicable for Trio-based chips. When the configured watermark is exceeded, error logs are triggered. When a resource is used above a certain threshold, warning system log messages are generated to notify you that the threshold has been exceeded. Based on your network needs, you can then decide whether to terminate existing subscribers and services to prevent the system from becoming overloaded and failing.

This feature gathers input from each of the line cards and transfers these statistics to the Routing Engine process using a well-known internal port. The daemon on the Routing Engine scans this information and, using the shared memory space built into the session database, generates warning messages when thresholds are exceeded.

You can configure the following parameters at the [edit system services] hierarchy level to specify the high threshold value that is common for all the memory spaces or regions and the watermark values for the different memory blocks on DPCs and MPCs:

High threshold value, exceeding which warnings or error logs are generated, for all the regions of memory, such as heap or ukernel, next hop and encapsulation, and firewall filter memory, by using the resource-monitor high-threshold value statement.
Percentage of free memory space used for next hops to be monitored with a watermark value by using the resource-monitor free-nh-memory-watermark percentage statement.
Percentage of free memory space used for ukernel or heap memory to be monitored with a watermark value by using the resource-monitor free-heap-memory-watermark percentage statement.
Percentage of free memory space used for firewall and filter memory to be monitored with a watermark value by using the resource-monitor free-fw-memory-watermark percentage statement.

This feature is enabled by default and you cannot disable it manually. The default value and the configured value of the watermark for the percentage of free next-hop memory also apply to encapsulation memory.

The default watermark values for the percentage of free ukernel or heap memory, next-hop memory, and firewall filter memory are as follows:

free-heap-memory-watermark—20
free-nh-memory-watermark—20
free-fw-memory-watermark—20
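
As a rough illustration of how these defaults might be applied, the following Python sketch compares measured free-memory percentages against the watermark values. The function name, the input dictionary layout, and the rule that a region is flagged when its free memory falls below the watermark are assumptions made for illustration; they are not taken from Junos OS.

```python
# Default watermark values (percent of free memory) from the list above.
DEFAULT_WATERMARKS = {
    "free-heap-memory-watermark": 20,
    "free-nh-memory-watermark": 20,
    "free-fw-memory-watermark": 20,
}


def breached_watermarks(free_pct, watermarks=DEFAULT_WATERMARKS):
    """Return the names of watermarks whose free-memory percentage is too low.

    Assumption: a watermark is breached when the measured free percentage
    drops below the configured value. `free_pct` is a hypothetical input that
    maps a watermark name to the measured free percentage for that region.
    Regions with no measurement are treated as fully free.
    """
    return sorted(
        name
        for name, limit in watermarks.items()
        if free_pct.get(name, 100) < limit
    )


# Hypothetical measurements: only heap memory is below its watermark.
sample = {
    "free-heap-memory-watermark": 15,
    "free-nh-memory-watermark": 40,
    "free-fw-memory-watermark": 20,
}
print(breached_watermarks(sample))
```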

Throttling Subscriber Load Based on CoS Resource Capacity

Class of service (CoS) criteria are incorporated into the throttling decision for subscriber access. Information about the availability of CoS resources, namely queue capacity, is collected from the line cards. At subscriber login, assuming that the subscriber requires CoS resources, the line cards report CoS queue utilization as the percentage of resources that are bound to a scheduling hierarchy and are therefore not free to be bound to a new one. You can set the high-cos-queue-threshold statement at the [edit system services] hierarchy level to a value from 0 percent through 90 percent, separately for each FPC slot. When CoS queue utilization on a given FPC reaches that FPC's configured threshold, further subscriber logins on that FPC are not allowed. This mechanism provides adjustable safety margins so that you can avoid completely exhausting each FPC's available CoS queue resources. See high-cos-queue-threshold.

This feature is available only when you enable subscriber management. For more information on enabling subscriber management, see Configuring Junos OS Enhanced Subscriber Management.
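
The admission decision described above can be sketched in Python as follows. This is a simplified model, not the Junos OS implementation: the function name and the per-slot dictionary are hypothetical, and treating a threshold of 0 as "queue-based throttling disabled" follows the 19.4R1 entry in the change history on this page.

```python
def cos_login_allowed(queue_utilization_pct, threshold_pct):
    """Decide whether a subscriber that needs CoS queues may log in on an FPC.

    Assumptions: `queue_utilization_pct` is the percentage of queues already
    bound to a scheduling hierarchy, and a threshold of 0 disables
    queue-based throttling.
    """
    if not 0 <= threshold_pct <= 90:
        raise ValueError("threshold must be from 0 through 90 percent")
    if threshold_pct == 0:
        return True
    # Logins stop once utilization reaches the configured threshold.
    return queue_utilization_pct < threshold_pct


# Hypothetical thresholds, configured separately for each FPC slot.
thresholds_by_slot = {0: 80, 1: 90, 2: 0}
print(cos_login_allowed(85, thresholds_by_slot[0]))  # slot 0 is over its threshold
```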

Examining the Utilization of Memory Resource Regions Using show Commands

You can use the show system resource-monitor fpc command to monitor the utilization of memory resources on the Packet Forwarding Engines of an FPC. The filter memory denotes the filter counter memory used for firewall filter counters. An asterisk (*) displayed next to a memory region indicates that the configured threshold for that region is currently being exceeded. Resource monitoring commands display the configured watermark values for the memory regions of the different line-card applications being monitored. The displayed statistics are computed from the current memory utilization of the individual line cards. The ukern memory is generic across the different types of line cards and signifies the heap memory buffers. Because a line card or an FPC in a particular slot can contain multiple Packet Forwarding Engine complexes, the memory utilized on the application-specific integrated circuits (ASICs) is specific to a particular PFE complex. Because the supported line-card variants use different architecture models, the ASIC-specific memory (next-hop and firewall or filter memory) utilization percentage can be interpreted differently.

Load Throttling to Reduce Processing Delays

The Routing Engine can use resource monitoring to assess and reduce the processing load on a line card’s Packet Forwarding Engine. It is possible for the Routing Engine to send work at a higher rate than the Packet Forwarding Engine can process. This is sometimes called overdriving the line card or Packet Forwarding Engine. When the work load on the Packet Forwarding Engine is too high, it can cause noticeable delays in packet processing.

Resource monitoring enables the Routing Engine to assess the load by evaluating the round-trip delay for packets that it sends to the Packet Forwarding Engine. A longer round-trip time indicates a higher load and therefore a greater chance of processing delays on the Packet Forwarding Engine. When appropriate, the Routing Engine reduces the percentage of subscriber sessions (client and service) that are allowed to complete.

This capability is called load throttling or round-trip time load throttling. Throttling prevents the Routing Engine from over-driving line cards to the point that processing delays become visible to operators and back-office systems. It works like this:

To monitor delays, the Routing Engine sends an echo request message every second to the Packet Forwarding Engine on the line card. The echo request includes both a timestamp for when it is sent and a running sequence number. The message priority is best effort, to simulate the worst-case processing delay on the line card.
The Packet Forwarding Engine processes the echo request and responds with an echo reply. The message priority is high to minimize jitter when the Routing Engine processes the returned packet.
When the Routing Engine receives the echo reply, it calculates the round trip time as the time difference between the echo request timestamp and the time it receives the echo reply for that particular sequence number.
The Routing Engine compares the round-trip delay time to a default round-trip threshold value of 1 second. If the measured delay is longer than the threshold for three consecutive trips, the Routing Engine denies logins for a percentage of new subscribers, reducing the number of new client and service sessions that are established. This reduction is called throttling.

An internal algorithm derives the throttling percentage based on the threshold and the round-trip time. This percentage varies based on the round-trip delay at that point in time.

The Routing Engine increases the throttle—denies more subscriber logins—for each successive set of three delay measurements that all exceed the threshold.
When the measured delay is less than the threshold for three consecutive trips, the Routing Engine removes the throttle. This allows subscribers to log in freely.
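
The echo request and reply measurement in the steps above can be modeled roughly as follows. This Python sketch only illustrates matching replies to requests by sequence number; the class and method names are hypothetical, and timestamps are passed in explicitly rather than read from a clock.

```python
class EchoTracker:
    """Track outstanding echo requests and compute round-trip times."""

    def __init__(self):
        self._next_seq = 0
        self._sent = {}  # sequence number -> send timestamp in seconds

    def send_request(self, now):
        """Record a new echo request and return its running sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._sent[seq] = now
        return seq

    def receive_reply(self, seq, now):
        """Return the round-trip time in milliseconds for `seq`.

        Returns None when no request with that sequence number is
        outstanding (for example, a duplicate reply).
        """
        sent_at = self._sent.pop(seq, None)
        if sent_at is None:
            return None
        return round((now - sent_at) * 1000)


tracker = EchoTracker()
seq = tracker.send_request(now=10.0)
print(tracker.receive_reply(seq, now=10.5))
```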

Note:

RTT load throttling applies on a per-line-card basis for Ethernet interfaces (ge, xe) and pseudowire interfaces (ps) as follows:

For aggregated Ethernet interfaces, it applies to the set of line cards associated with the aggregated Ethernet bundle.
For pseudowire interfaces with redundant logical tunnel (RLT), it applies to the set of line cards that are associated with the anchor point.

In both cases, the Routing Engine considers the delay value that determines throttling to be the longest round-trip delay of all the line cards in the set.
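
For a set of line cards, the rule above reduces to taking the maximum measured delay. A minimal Python sketch, with a hypothetical function name and hypothetical slot labels:

```python
def effective_delay_ms(delay_by_line_card):
    """Return the delay that drives throttling for a set of line cards.

    `delay_by_line_card` maps a line-card label to its most recent
    round-trip delay in milliseconds. The longest delay in the set is used.
    """
    if not delay_by_line_card:
        raise ValueError("at least one line card is required")
    return max(delay_by_line_card.values())


# Hypothetical aggregated Ethernet bundle spread across two line cards.
print(effective_delay_ms({"fpc1": 870, "fpc3": 1130}))
```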

Table 1 shows how subscriber sessions are throttled on a line card over a period of 12 seconds when the round-trip delay is greater than the internal threshold. This example has the following assumptions:

The internal delay threshold is 1 second.
Delay measurements occur every second.
The session creation rate is reduced by 10 percent after 3 consecutive round-trip delay measurements that are above the round-trip delay threshold. For as long as the threshold is exceeded, the throttling is increased every 3 measurements.
If the measured delay drops and remains below the threshold for 3 consecutive round-trip delay measurements, the session rate returns to 100 percent.

Note:

This example is simplified. Remember that the exact throttling percentage is determined dynamically and can vary second to second.

Table 1: Example Load Throttling Due to Round-trip Delay Time
Time	Round-trip Delay (ms)	Threshold Exceeded	Percentage of Sessions Allowed
1	850	No	100
2	900	No	100
3	995	No	100
4	1021	Yes Threshold exceeded count #1	100
5	1130	Yes Threshold exceeded count #2	100
6	1158	Yes Threshold exceeded count #3	90 Session rate reduced by 10 %
7	1127	Yes Threshold exceeded count #1	90
8	1135	Yes Threshold exceeded count #2	90
9	1126	Yes Threshold exceeded count #3	80 Session rate reduced by 10 %
10	1000	No Threshold not exceeded count #1	80
11	991	No Threshold not exceeded count #2	80
12	998	No Threshold not exceeded count #3	100 Throttling removed
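
The behavior in Table 1 can be reproduced with a small state machine. The Python sketch below follows only the simplified assumptions of this example (a fixed 10 percent step and a window of 3 measurements); the actual throttling percentage is derived by an internal algorithm that is not described here, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RttThrottle:
    threshold_ms: int = 1000  # internal delay threshold of 1 second
    window: int = 3           # consecutive measurements needed to act
    step_pct: int = 10        # simplified fixed reduction per window
    allowed_pct: int = 100    # percentage of sessions currently allowed
    over: int = 0             # consecutive measurements above threshold
    under: int = 0            # consecutive measurements at or below threshold

    def observe(self, rtt_ms):
        """Process one delay measurement and return the allowed percentage."""
        if rtt_ms > self.threshold_ms:
            self.over += 1
            self.under = 0
            if self.over == self.window:
                # Increase the throttle and start counting a new window.
                self.allowed_pct = max(0, self.allowed_pct - self.step_pct)
                self.over = 0
        else:
            self.under += 1
            self.over = 0
            if self.under == self.window:
                # Remove the throttle entirely.
                self.allowed_pct = 100
                self.under = 0
        return self.allowed_pct


# Round-trip delays from Table 1, one measurement per second.
delays = [850, 900, 995, 1021, 1130, 1158, 1127, 1135, 1126, 1000, 991, 998]
throttle = RttThrottle()
allowed = [throttle.observe(d) for d in delays]
print(allowed)
```

Note that a delay of exactly 1000 ms does not count as exceeding the threshold in this model, which matches the row for second 10 in the table.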

Resource load monitoring and round-trip time throttling are enabled by default. You can use either of the following statements to disable this feature:

no-load-throttle at the [edit system services resource-monitor] hierarchy level
no-throttle at the [edit system services resource-monitor] hierarchy level

If you disable the feature and the Packet Forwarding Engine becomes too busy, new subscribers can log in and go active, but no traffic flows for a period of time. This delay in traffic processing might become noticeable.

You can use the following command to confirm whether the load throttling feature is enabled and see various aspects of the feature in action. The bolded fields are particularly useful.

Limiting Subscribers with Resource Monitor

Starting in Junos OS Release 17.3R1, you can also use resource monitoring to directly limit the number of subscribers supported per hardware element. You can specify the maximum number of subscribers that can be logged in per chassis, line card (MPC), MIC, or port. You can apply the limit to subscribers of only one client type (DHCP, L2TP, or PPPoE) or to subscribers of any client type.

This feature ensures that the number of subscribers logged in per hardware element does not exceed the number that your network can serve with stability at the desired service bandwidth. When the limit is reached for a hardware element, new subscriber logins are denied on that element until the number of subscribers drops below the configured limit. New subscribers over the limit can connect to another hardware element in the same broadcast domain. When you configure the limit on one or more legs of an aggregated Ethernet interface, login is denied if the subscriber count exceeds the value on any of the legs.

Limiting subscribers this way distributes the load among hardware elements, but it does not provide any sort of load balancing. This feature can also help you map capacity in your network and determine what hardware resources you need to expand that capacity. For example, if you provide a service at a particular bandwidth and know how many subscribers you can service with a given set of hardware, you can determine how much bandwidth you need. Or if you want to add a service with more bandwidth per subscriber, you can calculate the additional bandwidth that you need, compare it to your available bandwidth, and determine whether you need to provision new ports, MICs, MPCs, or routers to handle the new service.

Change History on Resource Monitoring for Subscriber Management and Services

Feature support is determined by the platform and release you are using. Use Feature Explorer to determine if a feature is supported on your platform.

Table 2: Change History on Resource Monitoring for Subscriber Management and Services
Release	Description
17.3	Starting in Junos OS Release 17.3R1, you can also use resource monitoring to directly limit the number of subscribers supported per hardware element.
17.4	Starting in Junos OS Release 17.4R1, class of service (CoS) criteria are incorporated into the throttling decision for subscriber access.
19.4	Starting in Junos OS Release 19.4R1, you can specify a value of 0 to prevent any subscriber from being throttled by queue-based throttling.

Platform-Specific Resource Monitoring for Subscriber Management and Services Behavior

Platform	Difference
MX240, MX480, and MX960 routers with MPC2E legacy, MPC2E-NG, MPC3E-NG, MPC5E, and MPC7E line cards	The CoS resource monitoring feature, which bases admission decisions only on queues, is supported on this hardware. Other CoS resources are not part of the criteria. This feature does not support throttling for subscribers arriving on pseudowire, logical tunnel, or redundant logical tunnel devices.
MX80 and MX104 routers	Support resource monitoring configuration.
MX240, MX480, MX960, MX2010, and MX2020 routers	The following line cards support resource monitoring on MX240, MX480, MX960, MX2010, and MX2020 routers: MX-MPC1-3D, MX-MPC1-3D-Q, MX-MPC2-3D, MX-MPC2-3D-Q, MX-MPC2-3D-EQ, MPC-3D-16XGE-SFPP, MPC3E, MPC3E-3D-NG, MPC4E-3D-2CGE-8XGE, MPC4E-3D-32XGE, MPC5EQ-40G10G, MPC5EQ-100G10G, MPC5E-100G10G, MPC5E-40G10G, MPC10E-10C-MRATE, MPC10E-15C-MRATE, MX2K-MPC6E, MX2K-MPC11E, DPCE, MS-DPC, MX Series Flexible PIC Concentrators (MX-FPCs), NG-MPC3E

Limiting Subscribers by Client Type and Hardware Element with Resource Monitor

In addition to using resource monitoring to monitor and manage system memory usage, you can use it to directly limit the number of subscribers supported per hardware element: chassis, line card (MPC), MIC, and port. You can specify the maximum number of subscribers that can be logged in to each of those elements. You can apply the limit to subscribers of only one client type (DHCP, L2TP, or PPPoE) or to subscribers of any of these client types. In the latter case, the limit applies to the sum of sessions for all three client types.

Subscriber limiting can ensure that the number of subscribers logged in per hardware element does not exceed the number that your network can serve with stability at the desired service bandwidth. When the limit is reached for a hardware element, new subscriber logins are denied on that element until the number of subscribers drops below the configured limit. New subscribers over the limit connect to another hardware element in the same broadcast domain. When you configure the limit on one or more legs of an aggregated Ethernet interface, login is denied if the subscriber count exceeds the value on any of the legs.
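
The limit check described above can be sketched in Python as follows. This is an illustrative model only: the function name and the element labels are hypothetical, the counts are assumed to be for whichever client type the limit applies to, and a login is modeled as denied once the count on any traversed element has reached its limit.

```python
def login_allowed(counts, limits, elements):
    """Check a new login against the limit on every element it would use.

    `counts` maps an element label to its current subscriber count and
    `limits` maps an element label to its configured maximum. Elements
    without a configured limit are unrestricted. For an aggregated Ethernet
    interface, pass the elements of every leg, so that a full leg denies
    the login.
    """
    return all(
        counts.get(element, 0) < limits[element]
        for element in elements
        if element in limits
    )


# Hypothetical limits and counts for one client type.
limits = {"chassis": 1000, "fpc1": 2}
counts = {"chassis": 10, "fpc1": 2, "fpc2": 5}

print(login_allowed(counts, limits, ["chassis", "fpc1"]))  # fpc1 is full
print(login_allowed(counts, limits, ["chassis", "fpc2"]))  # fpc2 has no limit
```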

Limiting subscribers this way distributes the load among hardware elements, but it does not provide any sort of load balancing. This feature can also help you map capacity in your network and determine what hardware resources you need to expand that capacity. For example, if you provide a service at a particular bandwidth and know how many subscribers you can service with a given set of hardware, you can determine how much bandwidth you need. Or if you want to add a service with more bandwidth per subscriber, you can calculate the additional bandwidth that you need, compare it to your available bandwidth, and determine whether you need to provision new ports, MICs, MPCs, or routers to handle the new service.
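
The capacity arithmetic in this paragraph can be worked through with a short Python sketch. All numbers and function names are hypothetical and chosen only to illustrate the calculation.

```python
def required_bandwidth_mbps(subscribers, per_subscriber_mbps):
    """Total bandwidth needed to serve every subscriber at the given rate."""
    return subscribers * per_subscriber_mbps


def bandwidth_shortfall_mbps(subscribers, old_rate, new_rate, available):
    """Bandwidth still missing after moving subscribers to a faster service.

    `available` is the spare bandwidth on the existing hardware. A result of
    0 means the existing hardware can absorb the new service.
    """
    additional = subscribers * (new_rate - old_rate)
    return max(0, additional - available)


# Hypothetical example: 8000 subscribers limited per MPC at 25 Mbps each.
print(required_bandwidth_mbps(8000, 25))
# Upgrading them to 50 Mbps with 150000 Mbps of spare capacity.
print(bandwidth_shortfall_mbps(8000, 25, 50, 150000))
```

A nonzero shortfall suggests that additional ports, MICs, MPCs, or routers would be needed before offering the new service.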

Note:

The CLI uses the terms fpc and pic. For this feature, fpc corresponds to MPC and pic corresponds to MIC.

To place a limit on the maximum number of subscribers allowed for a hardware element:

Configure the client type for the subscribers.

(Optional) Configure a subscriber limit on the chassis.

(Optional) Configure a subscriber limit on an MPC.

(Optional) Configure a subscriber limit on a MIC.

(Optional) Configure a subscriber limit on a port.

For example, the following configuration sets chassis and MPC limits for PPPoE subscribers:

