Monitoring and Troubleshooting

This section describes the network monitoring and troubleshooting features of Junos OS.

Ping Hosts

Purpose

Use the CLI ping command to verify that a host can be reached over the network. This command is useful for diagnosing host and network connectivity problems. The device sends a series of Internet Control Message Protocol (ICMP) echo (ping) requests to a specified host and receives ICMP echo responses.

Action

To send four ping requests (a ping count of 4) to host3, enter the ping host3 count 4 command from operational mode.

Meaning

The ping results show the following information:
- Size of the ping response packet (in bytes).
- IP address of the host from which the response was sent.
- Sequence number of the ping response packet. You can use this value to match the ping response to the corresponding ping request.
- Time-to-live (ttl) hop-count value of the ping response packet.
- Total time between the sending of the ping request packet and the receiving of the ping response packet, in milliseconds. This value is also called round-trip time.
- Number of ping requests (probes) sent to the host.
- Number of ping responses received from the host.
- Packet loss percentage.
- Round-trip time statistics: minimum, average, maximum, and standard deviation of the round-trip time.
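
The summary values in the list above (probes sent, responses received, packet loss percentage, and round-trip time statistics) can be reproduced from the individual round-trip times. The following Python sketch is illustrative only: the function name summarize_ping and the sample values are hypothetical, and the use of the population standard deviation is an assumption, because this section does not state which form the device reports.

```python
import math


def summarize_ping(sent, rtts_ms):
    """Summarize one ping run.

    sent    -- number of echo requests (probes) sent
    rtts_ms -- round-trip times, in milliseconds, of the responses received
    """
    if sent <= 0:
        raise ValueError("sent must be a positive number of probes")
    received = len(rtts_ms)
    summary = {
        "sent": sent,
        "received": received,
        "loss_pct": 100.0 * (sent - received) / sent,
    }
    if received:
        avg = sum(rtts_ms) / received
        # Population standard deviation (an assumption, see the lead-in).
        variance = sum((t - avg) ** 2 for t in rtts_ms) / received
        summary.update(
            min=min(rtts_ms),
            avg=avg,
            max=max(rtts_ms),
            stddev=math.sqrt(variance),
        )
    return summary
```

For example, summarize_ping(4, [1.0, 2.0, 3.0]) describes a run in which four probes were sent and one response was lost, so the packet loss is 25 percent.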

Monitor Traffic Through the Router or Switch

To diagnose a problem, display real-time statistics about the traffic passing through physical interfaces on the router or switch.

To display real-time statistics about physical interfaces, perform these tasks:

- Display Real-Time Statistics About All Interfaces on the Router or Switch
- Display Real-Time Statistics About an Interface on the Router or Switch

Display Real-Time Statistics About All Interfaces on the Router or Switch

Purpose

Display real-time statistics about traffic passing through all interfaces on the router or switch.

Action

To display real-time statistics about traffic passing through all interfaces on the router or switch, enter the monitor interface traffic command from operational mode.

Meaning

The sample output displays traffic data for active interfaces and the amount that each field has changed since the command started or since the counters were cleared by using the C key. In this example, the monitor interface command has been running for 15 seconds since the command was issued or since the counters last returned to zero.
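
The relationship between the cumulative counters and the delta values can be sketched as follows. This is a minimal Python model, not the device implementation: the class name DeltaView and the counter name input_packets are hypothetical. It assumes that a delta is the current counter value minus a baseline captured when the command starts, and that clearing the deltas only moves the baseline.

```python
class DeltaView:
    """Track how much each counter has changed since a baseline snapshot."""

    def __init__(self, counters):
        # Baseline taken when monitoring starts.
        self._baseline = dict(counters)

    def delta(self, counters):
        """Return the change of each counter since the baseline."""
        return {
            name: value - self._baseline.get(name, 0)
            for name, value in counters.items()
        }

    def clear(self, counters):
        """Zero the deltas by moving the baseline; the counters themselves are untouched."""
        self._baseline = dict(counters)
```

In this model, clearing never changes the underlying counter values, which mirrors the statement that only the delta values return to zero.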

Display Real-Time Statistics About an Interface on the Router or Switch

Purpose

Display real-time statistics about traffic passing through an interface on the router or switch.

Action

To display traffic passing through an interface on the router or switch, use the Junos OS CLI operational mode command monitor interface interface-name.

Meaning

The sample output shows the input and output packets for a particular SONET interface (so-0/0/1). The information can include common interface failures, such as SONET/SDH and T3 alarms, loopbacks detected, and increases in framing errors. For more information, see Checklist for Tracking Error Conditions.

To control the output of the command while it is running, use the keys shown in Table 1.

Table 1: Output Control Keys for the monitor interface Command
Action	Key
Display information about the next interface. The `monitor interface` command scrolls through the physical or logical interfaces in the same order that they are displayed by the `show interfaces terse` command.	`N`
Display information about a different interface. The command prompts you for the name of a specific interface.	`I`
Freeze the display, halting the display of updated statistics.	`F`
Thaw the display, resuming the display of updated statistics.	`T`
Clear (zero) the current delta counters accumulated since `monitor interface` was started. This does not clear the cumulative counters.	`C`
Stop the `monitor interface` command.	`Q`

See the CLI Explorer for details on using match conditions with the monitor traffic command.

Dynamic Ternary Content Addressable Memory Overview

In ACX Series routers, Ternary Content Addressable Memory (TCAM) is used by various applications such as firewall, connectivity fault management, PTPoE, and RFC 2544. The Packet Forwarding Engine (PFE) in ACX Series routers uses TCAM with defined TCAM space limits. The allocation of TCAM resources for various filter applications is statically distributed. This static allocation leads to inefficient utilization of TCAM resources, because not all filter applications use their TCAM resources at the same time.

The dynamic allocation of TCAM space in ACX routers efficiently allocates the available TCAM resources for various filter applications. In the dynamic TCAM model, various filter applications (such as inet-firewall, bridge-firewall, and cfm-filters) can use the available TCAM resources as and when required. Dynamic TCAM resource allocation is usage driven: resources are allocated to filter applications as needed. When a filter application no longer uses the TCAM space, the resource is freed and becomes available to other applications. This dynamic TCAM model supports a higher scale of TCAM resource utilization, based on each application's demand.
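
The difference from a static split can be illustrated with a toy model of a shared pool. The Python sketch below is a simplification under stated assumptions: the class TcamPool, its methods, and the entry counts are hypothetical, and real TCAM allocation works on hardware slices and stages that this model ignores.

```python
class TcamPool:
    """Toy model of a shared TCAM pool that applications draw from on demand."""

    def __init__(self, total_entries):
        self.total = total_entries
        self.used = {}  # application name -> entries currently held

    @property
    def free(self):
        return self.total - sum(self.used.values())

    def allocate(self, app, entries):
        """Give entries to app if the pool has room; return True on success."""
        if entries > self.free:
            return False  # resource shortage for this request
        self.used[app] = self.used.get(app, 0) + entries
        return True

    def release(self, app):
        """Free everything held by app so that other applications can use it."""
        self.used.pop(app, None)
```

In this model, a request from bridge-firewall that fails while inet-firewall holds most of the pool succeeds once inet-firewall releases its entries, which is the behavior the dynamic model aims for.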

Applications using Dynamic TCAM Infrastructure

The following filter application categories use the dynamic TCAM infrastructure:

- Firewall filter: All firewall configurations.
- Implicit filter: Routing Engine (RE) daemons that use filters to achieve their functionality. For example, connectivity fault management, IP MAC validation, and so on.
- Dynamic filters: Applications that use filters to achieve their functionality at the PFE level. For example, logical interface level fixed classifier, RFC 2544, and so on. RE daemons are not aware of these filters.
- System-init filters: Filters that require entries at the system level or a fixed set of entries during the router's boot sequence. For example, Layer 2 and Layer 3 control protocol trap, default ARP policer, and so on.

Note:
The system-init filter, which includes the applications for Layer 2 and Layer 3 control protocol traps, is essential for overall system functionality. The applications in this control group consume a fixed and minimal amount of the overall TCAM space. The system-init filter does not use the dynamic TCAM infrastructure and is created when the router is initialized during the boot sequence.

Features Using TCAM Resource

An application that uses the TCAM resource is termed a tcam-app in this document. For example, inet-firewall, bridge-firewall, connectivity fault management, and link fault management are all different tcam-apps.

Table 2 describes the list of tcam-apps that use TCAM resources.

Table 2: Features Using TCAM Resource
TCAM Apps/TCAM Users	Feature/Functionality	TCAM Stage
bd-dtag-validate	Bridge domain dual-tagged validate Note: This feature is not supported on ACX5048 and ACX5096 routers.	Egress
bd-tpid-swap	Bridge domain vlan-map with swap tpid operation	Egress
cfm-bd-filter	Connectivity fault management implicit bridge-domain filters	Ingress
cfm-filter	Connectivity fault management implicit filters	Ingress
cfm-vpls-filter	Connectivity fault management implicit vpls filters Note: This feature is supported only on ACX5048 and ACX5096 routers.	Ingress
cfm-vpls-ifl-filter	Connectivity fault management implicit vpls logical interface filters Note: This feature is supported only on ACX5048 and ACX5096 routers.	Ingress
cos-fc	Logical interface level fixed classifier	Pre-ingress
fw-ccc-in	Circuit cross-connect family ingress firewall	Ingress
fw-family-out	Family level egress firewall	Egress
fw-fbf	Firewall filter-based forwarding	Pre-ingress
fw-fbf-inet6	Firewall filter-based forwarding for inet6 family	Pre-ingress
fw-ifl-in	Logical interface level ingress firewall	Ingress
fw-ifl-out	Logical interface level egress firewall	Egress
fw-inet-ftf	Inet family ingress firewall on a forwarding-table	Ingress
fw-inet6-ftf	Inet6 family ingress firewall on a forwarding-table	Ingress
fw-inet-in	Inet family ingress firewall	Ingress
fw-inet-rpf	Inet family ingress firewall on RPF fail check	Ingress
fw-inet6-in	Inet6 family ingress firewall	Ingress
fw-inet6-family-out	Inet6 Family level egress firewall	Egress
fw-inet6-rpf	Inet6 family ingress firewall on a RPF fail check	Ingress
fw-inet-pm	Inet family firewall with port-mirror action Note: This feature is not supported on ACX5048 and ACX5096 routers.	Ingress
fw-l2-in	Bridge family ingress firewall on Layer 2 interface	Ingress
fw-mpls-in	MPLS family ingress firewall	Ingress
fw-semantics	Firewall sharing semantics for CLI configured firewall	Pre-ingress
fw-vpls-in	VPLS family ingress firewall on VPLS interface	Ingress
ifd-src-mac-fil	Physical interface level source MAC filter	Pre-ingress
ifl-statistics-in	Logical level interface statistics at ingress	Ingress
ifl-statistics-out	Logical level interface statistics at egress	Egress
ing-out-iff	Ingress application on behalf of egress family filter for log and syslog	Ingress
ip-mac-val	IP MAC validation	Pre-ingress
ip-mac-val-bcast	IP MAC validation for broadcast	Pre-ingress
ipsec-reverse-fil	Reverse filters for IPsec service Note: This feature is not supported on ACX5048 and ACX5096 routers.	Ingress
irb-cos-rw	IRB CoS rewrite	Egress
lfm-802.3ah-in	Link fault management (IEEE 802.3ah) at ingress Note: This feature is not supported on ACX5048 and ACX5096 routers.	Ingress
lfm-802.3ah-out	Link fault management (IEEE 802.3ah) at egress	Egress
lo0-inet-fil	Loopback interface inet filter	Ingress
lo0-inet6-fil	Loopback interface inet6 filter	Ingress
mac-drop-cnt	Statistics for drops by MAC validate and source MAC filters	Ingress
mrouter-port-in	Multicast router port for snooping	Ingress
napt-reverse-fil	Reverse filters for network address port translation (NAPT) service Note: This feature is not supported on ACX5048 and ACX5096 routers.	Ingress
no-local-switching	Bridge no-local-switching	Ingress
ptpoe	Point-to-Point-Over-the-Ethernet traps Note: This feature is not supported on ACX5048 and ACX5096 routers.	Ingress
ptpoe-cos-rw	CoS rewrite for PTPoE Note: This feature is not supported on ACX5048 and ACX5096 routers.	Egress
rfc2544-layer2-in	RFC2544 for Layer 2 service at ingress	Pre-ingress
rfc2544-layer2-out	RFC2544 for Layer 2 service at egress Note: This feature is not supported on ACX5048 and ACX5096 routers.	Egress
service-filter-in	Service filter at ingress Note: This feature is not supported on ACX5048 and ACX5096 routers.	Ingress

Monitoring TCAM Resource Usage

You can use the show and clear commands to monitor and troubleshoot dynamic TCAM resource usage.

Table 3 summarizes the command-line interface (CLI) commands you can use to monitor and troubleshoot dynamic TCAM resource usage.

Table 3: Show and Clear Commands to Monitor and Troubleshoot Dynamic TCAM
Task	Command
Display the shared and the related applications for a particular application	show pfe tcam app
Display the TCAM resource usage for an application and stages (egress, ingress, and pre-ingress)	show pfe tcam usage; on ACX5448 routers, show pfe filter hw summary
Display the TCAM resource usage errors for applications and stages (egress, ingress, and pre-ingress)	show pfe tcam errors
Clear the TCAM resource usage error statistics for applications and stages (egress, ingress, and pre-ingress)	clear pfe tcam-errors

Example: Monitoring and Troubleshooting the TCAM Resource

This section describes a use case where you can monitor and troubleshoot TCAM resources using show commands. In this use case scenario, you have configured Layer 2 services and the Layer 2 service-related applications are using TCAM resources. The dynamic approach, as shown in this example, gives you the complete flexibility to manage TCAM resources on a need basis.

The service requirement is as follows:

Each bridge domain has one UNI and one NNI interface.
Each UNI interface has:
- One logical interface level policer to police the traffic at 10 Mbps.
- A multifield classifier with four terms to assign forwarding class and loss-priority.
Each UNI interface configures a CFM UP MEP at level 4.
Each NNI interface configures a CFM DOWN MEP at level 2.

Consider a scenario in which 100 services are configured on the router. At this scale, all the applications are configured successfully and show an OK status.

Viewing TCAM resource usage for all stages.

To view the TCAM resource usage for all stages (egress, ingress, and pre-ingress), use the show pfe tcam usage all-tcam-stages detail command. On ACX5448 routers, use the show pfe filter hw summary command to view the TCAM resource usage.

Configure additional Layer 2 services on the router.

For example, add 20 more services on the router, thereby increasing the total number of services to 120. After adding more services, you can check the status of the configuration by verifying either the syslog message using the command show log messages, or by running the show pfe tcam errors command.

When you run the show log messages CLI command, the syslog output reports the TCAM resource shortage for Ethernet-switching family filters for the newer configurations.

You can also use the show pfe tcam errors all-tcam-stages detail CLI command to verify the status of the configuration.

The output indicates that the fw-l2-in application is running out of TCAM resources and moves into a FAILED state. Although there are two TCAM slices available at the ingress stage, the fw-l2-in application is not able to use the available TCAM space due to its mode (DOUBLE), resulting in resource shortage failure.

Fixing the applications that have failed due to the shortage of TCAM resources.

The fw-l2-in application failed because the additional services caused a shortage of TCAM resources. Although the other applications seem to work fine, we recommend that you deactivate or remove the newly added services so that the fw-l2-in application moves to an OK state. After removing or deactivating the newly added services, run the show pfe tcam usage and show pfe tcam errors commands to verify that no applications remain in a failed state.

To view the TCAM resource usage for all stages (egress, ingress, and pre-ingress), use the show pfe tcam usage all-tcam-stages detail command. For ACX5448 routers, use the show pfe filter hw summary command to view the TCAM resource usage.

To view TCAM resource usage errors for all stages (egress, ingress, and pre-ingress), use the show pfe tcam errors all-tcam-stages command.

The output shows that all the applications using TCAM resources are in the OK state, which indicates that the hardware has been successfully configured.

Note:

As shown in the example, you will need to run the show pfe tcam errors and show pfe tcam usage commands at each step to ensure that your configurations are valid and that the applications using TCAM resource are in OK state. For ACX5448 routers, use the show pfe filter hw summary command to view the TCAM resource usage.

Monitoring and Troubleshooting TCAM Resource in ACX Series Routers

The dynamic allocation of Ternary Content Addressable Memory (TCAM) space in ACX Series routers efficiently allocates the available TCAM resources for various filter applications. In the dynamic TCAM model, various filter applications (such as inet-firewall, bridge-firewall, and cfm-filters) can use the available TCAM resources as and when required. Dynamic TCAM resource allocation is usage driven: resources are allocated to filter applications as needed. When a filter application no longer uses the TCAM space, the resource is freed and becomes available to other applications. This dynamic TCAM model supports a higher scale of TCAM resource utilization, based on each application's demand. You can use the show and clear commands to monitor and troubleshoot dynamic TCAM resource usage in ACX Series routers.

Note:

An application that uses the TCAM resource is termed a tcam-app in this document.

Table 4 shows the tasks and the commands to monitor and troubleshoot TCAM resources in ACX Series routers.

Table 4: Commands to Monitor and Troubleshoot TCAM Resource in ACX Series
How to	Command
View the shared and the related applications for a particular application.	`show pfe tcam app (list-shared-apps | list-related-apps)`
View the number of applications across all TCAM stages.	`show pfe tcam usage all-tcam-stages`
View the number of applications using the TCAM resource at a specified stage.	`show pfe tcam usage tcam-stage (ingress | egress | pre-ingress)`
View the TCAM resource used by an application in detail.	`show pfe tcam usage app <application-name> detail`
View the TCAM resource used by an application at a specified stage.	`show pfe tcam usage tcam-stage (ingress | egress | pre-ingress) app <application-name>`
View the number of TCAM resources consumed by a tcam-app.	`show pfe tcam usage app <application-name>`
View the TCAM resource usage errors for all stages.	`show pfe tcam errors all-tcam-stages detail`
View the TCAM resource usage errors for a stage.	`show pfe tcam errors tcam-stage (ingress | egress | pre-ingress)`
View the TCAM resource usage errors for an application.	`show pfe tcam errors app <application-name>`
View the TCAM resource usage errors for an application along with its other shared applications.	`show pfe tcam errors app <application-name> shared-usage`
Clear the TCAM resource usage error statistics for all stages.	`clear pfe tcam-errors all-tcam-stages`
Clear the TCAM resource usage error statistics for a specified stage.	`clear pfe tcam-errors tcam-stage (ingress | egress | pre-ingress)`
Clear the TCAM resource usage error statistics for an application.	`clear pfe tcam-errors app <application-name>`
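
If you generate these commands from a script, the pattern in Table 4 can be captured in a small helper. The Python sketch below is hypothetical: the function tcam_command and its parameters are not part of Junos OS, it only assembles command strings from the keywords listed in the table, and it does not check that the device accepts every combination it can produce.

```python
PREFIXES = {
    "usage": "show pfe tcam usage",
    "errors": "show pfe tcam errors",
    "clear-errors": "clear pfe tcam-errors",
}


def tcam_command(kind, stage=None, app=None, detail=False):
    """Build a TCAM monitoring command string from the pieces shown in Table 4."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown command kind: {kind}")
    if detail and kind == "clear-errors":
        raise ValueError("Table 4 lists no detail option for clear commands")
    parts = [PREFIXES[kind]]
    if stage is not None:
        parts += ["tcam-stage", stage]
    elif app is None:
        # Neither a stage nor an application was given: cover all stages.
        parts.append("all-tcam-stages")
    if app is not None:
        parts += ["app", app]
    if detail:
        parts.append("detail")
    return " ".join(parts)
```

For example, tcam_command("errors", detail=True) builds the command for viewing errors across all stages in detail, and tcam_command("clear-errors", app="fw-l2-in") builds the command for clearing the error statistics of one application.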

To know more about dynamic TCAM in ACX Series, see Dynamic Ternary Content Addressable Memory Overview.

Service Scaling on ACX5048 and ACX5096 Routers

On ACX5048 and ACX5096 routers, a typical deployed service (such as ELINE, ELAN, or IP VPN) might require applications (such as policers, firewall filters, connectivity fault management IEEE 802.1ag, and RFC 2544) that use the dynamic TCAM infrastructure.

Note:

Service applications that use TCAM resources are limited by TCAM resource availability. Therefore, the scale of the service depends on how much of the TCAM resource such applications consume.

A sample use case for monitoring and troubleshooting service scale in ACX5048 and ACX5096 routers can be found at the Dynamic Ternary Content Addressable Memory Overview section.

Troubleshooting DNS Name Resolution in Logical System Security Policies (Primary Administrators Only)

Problem

Description

The address of a hostname in an address book entry that is used in a security policy might fail to resolve correctly.

Cause

Normally, address book entries that contain dynamic hostnames refresh automatically for SRX Series Firewalls. The TTL field associated with a DNS entry indicates the time after which the entry should be refreshed in the policy cache. Once the TTL value expires, the SRX Series Firewall automatically refreshes the DNS entry for an address book entry.

However, if the SRX Series Firewall is unable to obtain a response from the DNS server (for example, the DNS request or response packet is lost in the network or the DNS server cannot send a response), the address of a hostname in an address book entry might fail to resolve correctly. This can cause traffic to drop as no security policy or session match is found.
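
The refresh cycle described above can be modeled as a TTL-driven cache entry. The Python sketch below is illustrative only: the class AddressBookEntry, the resolve callback, and the sample hostname and address are hypothetical, and the sketch does not model what the device does with an entry whose refresh failed.

```python
class AddressBookEntry:
    """Toy model of a hostname entry that is refreshed when its DNS TTL expires."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.address = None
        self.expires_at = 0.0  # time (in seconds) at which the entry is due for refresh

    def needs_refresh(self, now):
        return now >= self.expires_at

    def refresh(self, resolve, now):
        """Try to refresh the entry.

        resolve(hostname) returns (address, ttl_seconds), or None when no
        DNS response arrives (lost packet, unreachable server).
        """
        answer = resolve(self.hostname)
        if answer is None:
            return False  # refresh failed; the entry remains due for refresh
        self.address, ttl = answer
        self.expires_at = now + ttl
        return True
```

A failed refresh leaves the entry overdue, which corresponds to the situation in which the hostname might not resolve correctly until a later refresh succeeds or the cache is cleared manually.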

Solution

The primary administrator can use the show security dns-cache command to display DNS cache information on the SRX Series Firewall. If the DNS cache information needs to be refreshed, the primary administrator can use the clear security dns-cache command.

Note:

These commands are available only to the primary administrator on devices that are configured for logical systems. They are not available in user logical systems or on devices that are not configured for logical systems.

Troubleshooting the Link Services Interface

To solve configuration problems on a link services interface:

- Determine Which CoS Components Are Applied to the Constituent Links
- Determine What Causes Jitter and Latency on the Multilink Bundle
- Determine If LFI and Load Balancing Are Working Correctly
- Determine Why Packets Are Dropped on a PVC Between a Juniper Networks Device and a Third-Party Device

Determine Which CoS Components Are Applied to the Constituent Links

Problem

Description

You are configuring a multilink bundle, but you also have traffic without MLPPP encapsulation passing through constituent links of the multilink bundle. Do you apply all CoS components to the constituent links, or is applying them to the multilink bundle enough?

Solution

You can apply a scheduler map to the multilink bundle and its constituent links. Although you can apply several CoS components with the scheduler map, configure only the ones that are required. We recommend that you keep the configuration on the constituent links simple to avoid unnecessary delay in transmission.

Table 5 shows the CoS components to be applied on a multilink bundle and its constituent links.

Table 5: CoS Components Applied on Multilink Bundles and Constituent Links
CoS Component	Multilink Bundle	Constituent Links	Explanation
Classifier	Yes	No	CoS classification takes place on the incoming side of the interface, not on the transmitting side, so no classifiers are needed on constituent links.
Forwarding class	Yes	No	Forwarding class is associated with a queue, and the queue is applied to the interface by a scheduler map. The queue assignment is predetermined on the constituent links. All packets from Q2 of the multilink bundle are assigned to Q2 of the constituent link, and packets from all the other queues are queued to Q0 of the constituent link.
Scheduler map	Yes	Yes	Apply scheduler maps on the multilink bundle and the constituent link as follows: Transmit rate—Make sure that the relative order of the transmit rate configured on Q0 and Q2 is the same on the constituent links as on the multilink bundle. Scheduler priority—Make sure that the relative order of the scheduler priority configured on Q0 and Q2 is the same on the constituent links as on the multilink bundle. Buffer size—Because all non-LFI packets from the multilink bundle transit on Q0 of the constituent links, make sure that the buffer size on Q0 of the constituent links is large enough. RED drop profile—Configure a RED drop profile on the multilink bundle only. Configuring the RED drop profile on the constituent links applies a back pressure mechanism that changes the buffer size and introduces variation. Because this behavior might cause fragment drops on the constituent links, make sure to leave the RED drop profile at the default settings on the constituent links.
Shaping rate for a per-unit scheduler or an interface-level scheduler	No	Yes	Because per-unit scheduling is applied only at the end point, apply this shaping rate to the constituent links only. Any configuration applied earlier is overwritten by the constituent link configuration.
Transmit-rate exact or queue-level shaping	Yes	No	The interface-level shaping applied on the constituent links overrides any shaping on the queue. Thus apply transmit-rate exact shaping on the multilink bundle only.
Rewrite rules	Yes	No	Rewrite bits are copied from the packet into the fragments automatically during fragmentation. Thus what you configure on the multilink bundle is carried on the fragments to the constituent links.
Virtual channel group	Yes	No	Virtual channel groups are identified through firewall filter rules that are applied on packets only before the multilink bundle. Thus you do not need to apply the virtual channel group configuration to the constituent links.

Determine What Causes Jitter and Latency on the Multilink Bundle

Problem

Description

To test jitter and latency, you send three streams of IP packets. All packets have the same IP precedence settings. After configuring LFI and CRTP, the latency increased even over a noncongested link. How can you reduce jitter and latency?

Solution

To reduce jitter and latency, do the following:

- Make sure that you have configured a shaping rate on each constituent link.
- Make sure that you have not configured a shaping rate on the link services interface.
- Make sure that the configured shaping rate value is equal to the physical interface bandwidth.
- If shaping rates are configured correctly, and jitter still persists, contact the Juniper Networks Technical Assistance Center (JTAC).

Determine If LFI and Load Balancing Are Working Correctly

Problem

Description

In this case, you have a single network that supports multiple services. The network transmits data and delay-sensitive voice traffic. After configuring MLPPP and LFI, make sure that voice packets are transmitted across the network with very little delay and jitter. How can you find out if voice packets are being treated as LFI packets and load balancing is performed correctly?

Solution

When LFI is enabled, data (non-LFI) packets are encapsulated with an MLPPP header and fragmented to packets of a specified size. The delay-sensitive, voice (LFI) packets are PPP-encapsulated and interleaved between data packet fragments. Queuing and load balancing are performed differently for LFI and non-LFI packets.

To verify that LFI is performed correctly, determine that packets are fragmented and encapsulated as configured. After you know whether a packet is treated as an LFI packet or a non-LFI packet, you can confirm whether the load balancing is performed correctly.

Solution Scenario—Suppose two Juniper Networks devices, R0 and R1, are connected by a multilink bundle lsq-0/0/0.0 that aggregates two serial links, se-1/0/0 and se-1/0/1. On R0 and R1, MLPPP and LFI are enabled on the link services interface and the fragmentation threshold is set to 128 bytes.

In this example, we used a packet generator to generate voice and data streams. You can use the packet capture feature to capture and analyze the packets on the incoming interface.

The following two data streams were sent on the multilink bundle:

- 100 data packets of 200 bytes (larger than the fragmentation threshold)
- 500 data packets of 60 bytes (smaller than the fragmentation threshold)

The following two voice streams were sent on the multilink bundle:

- 100 voice packets of 200 bytes from source port 100
- 300 voice packets of 200 bytes from source port 200

To confirm that LFI and load balancing are performed correctly:

Note:

Only the significant portions of command output are displayed and described in this example.

Verify packet fragmentation. From operational mode, enter the show interfaces lsq-0/0/0 command to check that large packets are fragmented correctly.

Meaning—The output shows a summary of packets transiting the device on the multilink bundle. Verify the following information on the multilink bundle:

The total number of transiting packets = 1000
The total number of transiting fragments = 1100
The number of data packets that were fragmented = 100

The total number of packets sent (600 + 400) on the multilink bundle matches the number of transiting packets (1000), indicating that no packets were dropped.

The number of transiting fragments exceeds the number of transiting packets by 100, indicating that 100 large data packets were correctly fragmented.

Corrective Action—If the packets are not fragmented correctly, check your fragmentation threshold configuration. Packets smaller than the specified fragmentation threshold are not fragmented.
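
The packet and fragment totals in this step can be checked with simple arithmetic. The Python sketch below is a hedged model rather than the device algorithm: the function expected_counts is hypothetical, and it assumes that LFI packets are never fragmented, that a packet no larger than the threshold counts as one fragment, and that a larger non-LFI packet is split into fragments of at most the threshold size.

```python
import math


def expected_counts(streams, threshold):
    """Return (packets, fragments) expected on the multilink bundle.

    streams   -- iterable of (packet_count, packet_size_bytes, is_lfi)
    threshold -- fragmentation threshold in bytes
    """
    packets = 0
    fragments = 0
    for count, size, is_lfi in streams:
        packets += count
        if is_lfi or size <= threshold:
            fragments += count  # sent whole, counted once
        else:
            fragments += count * math.ceil(size / threshold)
    return packets, fragments


# The four streams used in this example, with a 128-byte threshold.
STREAMS = [
    (100, 200, False),  # large data packets
    (500, 60, False),   # small data packets
    (100, 200, True),   # voice packets from source port 100
    (300, 200, True),   # voice packets from source port 200
]
```

Under these assumptions, expected_counts(STREAMS, 128) gives 1000 packets and 1100 fragments, which agrees with the values described for the multilink bundle.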

Verify packet encapsulation. To find out whether a packet is treated as an LFI or non-LFI packet, determine its encapsulation type. LFI packets are PPP encapsulated, and non-LFI packets are encapsulated with both PPP and MLPPP. PPP and MLPPP encapsulations have different overheads resulting in different-sized packets. You can compare packet sizes to determine the encapsulation type.

A small unfragmented data packet contains a PPP header and a single MLPPP header. In a large fragmented data packet, the first fragment contains a PPP header and an MLPPP header, but the consecutive fragments contain only an MLPPP header.

PPP and MLPPP encapsulations add the following number of bytes to a packet:

- PPP encapsulation adds 7 bytes: 4 bytes of header + 2 bytes of frame check sequence (FCS) + 1 byte that is idle or contains a flag.
- MLPPP encapsulation adds between 6 and 8 bytes: 4 bytes of PPP header + 2 to 4 bytes of multilink header.

Figure 1 shows the overhead added to PPP and MLPPP headers.

Figure 1: PPP and MLPPP Headers

For CRTP packets, the encapsulation overhead and packet size are even smaller than for an LFI packet. For more information, see Example: Configuring the Compressed Real-Time Transport Protocol.

Table 6 shows the encapsulation overhead for a data packet and a voice packet of 70 bytes each. After encapsulation, the size of the data packet is larger than the size of the voice packet.

Table 6: PPP and MLPPP Encapsulation Overhead
Packet Type	Encapsulation	Initial Packet Size	Encapsulation Overhead	Packet Size after Encapsulation
Voice packet (LFI)	PPP	70 bytes	4 + 2 + 1 = 7 bytes	77 bytes
Data fragment (non-LFI) with short sequence	MLPPP	70 bytes	4 + 2 + 1 + 4 + 2 = 13 bytes	83 bytes
Data fragment (non-LFI) with long sequence	MLPPP	70 bytes	4 + 2 + 1 + 4 + 4 = 15 bytes	85 bytes

From operational mode, enter the show interfaces queue command to display the size of the transmitted packets on each queue. Divide the number of bytes transmitted by the number of packets to obtain the packet size and determine the encapsulation type.
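
The division described here can be combined with the overhead values from Table 6 to label a queue. The Python sketch below is a rough aid under stated assumptions: the function classify_queue is hypothetical, it assumes that every packet on the queue had the same known size before encapsulation, and it ignores the smaller overhead of the second and later fragments of a fragmented packet.

```python
# Encapsulation overhead in bytes, taken from Table 6.
OVERHEAD_TO_ENCAPSULATION = {
    7: "PPP (LFI)",
    13: "MLPPP, short sequence (non-LFI)",
    15: "MLPPP, long sequence (non-LFI)",
}


def classify_queue(bytes_transmitted, packets_transmitted, payload_size):
    """Guess the encapsulation on a queue from its byte and packet counters."""
    if packets_transmitted == 0:
        return None  # nothing was transmitted, so there is nothing to classify
    average_size = bytes_transmitted / packets_transmitted
    overhead = round(average_size - payload_size)
    return OVERHEAD_TO_ENCAPSULATION.get(overhead, "unknown")
```

For example, 100 packets of 70 bytes that account for 7700 transmitted bytes average 77 bytes each, an overhead of 7 bytes, which matches PPP encapsulation.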

Verify load balancing. From operational mode, enter the show interfaces queue command on the multilink bundle and its constituent links to confirm whether load balancing is performed correctly on the packets.

Meaning—The output from these commands shows the packets transmitted and queued on each queue of the link services interface and its constituent links. Table 7 shows a summary of these values. (Because the number of transmitted packets equaled the number of queued packets on all the links, this table shows only the queued packets.)

Table 7: Number of Packets Transmitted on a Queue
Packets Queued	Bundle lsq-0/0/0.0	Constituent Link se-1/0/0	Constituent Link se-1/0/1	Explanation
Packets on Q0	600	350	350	The total number of packets transiting the constituent links (350+350 = 700) exceeded the number of packets queued (600) on the multilink bundle.
Packets on Q2	400	100	300	The total number of packets transiting the constituent links equaled the number of packets on the bundle.
Packets on Q3	0	19	18	The packets transiting Q3 of the constituent links are for keepalive messages exchanged between constituent links. Thus no packets were counted on Q3 of the bundle.

On the multilink bundle, verify the following:

The number of packets queued matches the number transmitted. If the numbers match, no packets were dropped. If more packets were queued than were transmitted, packets were dropped because the buffer was too small. The buffer size on the constituent links controls congestion at the output stage. To correct this problem, increase the buffer size on the constituent links.
The number of packets transiting Q0 (600) matches the number of large and small data packets received (100+500) on the multilink bundle. If the numbers match, all data packets correctly transited Q0.
The number of packets transiting Q2 on the multilink bundle (400) matches the number of voice packets received on the multilink bundle. If the numbers match, all voice LFI packets correctly transited Q2.

On the constituent links, verify the following:

The total number of packets transiting Q0 (350+350) matches the number of data packets and data fragments (500+200). If the numbers match, all the data packets after fragmentation correctly transited Q0 of the constituent links.

Packets transited both constituent links, indicating that load balancing was correctly performed on non-LFI packets.
The total number of packets transiting Q2 (300+100) on constituent links matches the number of voice packets received (400) on the multilink bundle. If the numbers match, all voice LFI packets correctly transited Q2.

LFI packets from source port 100 transited se-1/0/0, and LFI packets from source port 200 transited se-1/0/1. Thus all LFI (Q2) packets were hashed based on the source port and correctly transited both constituent links.
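
The comparisons above can be expressed as a short Python sketch that uses the packet counts from Table 7. The function names are hypothetical and the script is not a Junos OS tool. It only restates the arithmetic: sum the constituent-link counters, compare the total with the bundle counter, and check that each link carried traffic.

```python
from typing import Sequence

# Packet counts from Table 7: (bundle, link se-1/0/0, link se-1/0/1).
TABLE_7 = {
    "Q0": (600, 350, 350),
    "Q2": (400, 100, 300),
    "Q3": (0, 19, 18),
}


def compare_queue(bundle: int, links: Sequence[int]) -> str:
    """Classify how the constituent-link total relates to the bundle count."""
    total = sum(links)
    if total == bundle:
        return "equal"
    return "links exceed bundle" if total > bundle else "bundle exceeds links"


def is_load_balanced(links: Sequence[int]) -> bool:
    """Return True when every constituent link carried at least one packet."""
    return all(count > 0 for count in links)


for queue, (bundle, *links) in TABLE_7.items():
    print(queue, compare_queue(bundle, links), is_load_balanced(links))
```

For Q0 the link total is larger than the bundle count because data packets are fragmented before they reach the constituent links. For Q3 it is larger because keepalive messages are counted only on the links.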

Corrective Action—If the packets transited only one link, take the following steps to resolve the problem:

1. Determine whether the physical link is up (operational) or down (unavailable). An unavailable link indicates a problem with the PIM, interface port, or physical connection (link-layer errors). If the link is operational, move to the next step.
2. Verify that the classifiers are correctly defined for non-LFI packets. Make sure that non-LFI packets are not configured to be queued to Q2. All packets queued to Q2 are treated as LFI packets.
3. Verify that at least one of the following values is different in the LFI packets: source address, destination address, IP protocol, source port, or destination port. If the same values are configured for all LFI packets, the packets are all hashed to the same flow and transit the same link.
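
The following Python sketch illustrates why identical values send every LFI packet over the same link. It is a generic model of flow hashing, not the algorithm the device uses, which this section does not describe. The function name, the SHA-256 hash, and the addresses are assumptions made for illustration.

```python
import hashlib

LINKS = ["se-1/0/0", "se-1/0/1"]  # constituent links named in this section


def pick_link(src_addr, dst_addr, protocol, src_port, dst_port, links=LINKS):
    """Map a flow's five-tuple to one link using a generic, illustrative hash."""
    key = f"{src_addr}|{dst_addr}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the first four bytes of the digest to a link index.
    return links[int.from_bytes(digest[:4], "big") % len(links)]


# Hypothetical flow: every packet with this five-tuple maps to the same link.
flow = ("10.0.0.1", "10.0.0.2", 17, 100, 5004)
print(pick_link(*flow), pick_link(*flow))
```

Because the link is a pure function of the five-tuple, traffic can spread across links only when at least one of the five values differs between flows.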

Use the results to verify load balancing.

Determine Why Packets Are Dropped on a PVC Between a Juniper Networks Device and a Third-Party Device

Problem

Description

You are configuring a permanent virtual circuit (PVC) between T1, E1, T3, or E3 interfaces on a Juniper Networks device and a third-party device, and packets are being dropped and ping fails.

Solution

If the third-party device does not have the same FRF.12 support as the Juniper Networks device or supports FRF.12 in a different way, the Juniper Networks device interface on the PVC might discard a fragmented packet containing FRF.12 headers and count it as a "Policed Discard."

As a workaround, configure multilink bundles on both peers, and configure fragmentation thresholds on the multilink bundles.

Troubleshooting Security Policies

Synchronizing Policies Between Routing Engine and Packet Forwarding Engine
Checking a Security Policy Commit Failure
Verifying a Security Policy Commit
Debugging Policy Lookup

Synchronizing Policies Between Routing Engine and Packet Forwarding Engine

Problem

Description

Security policies are stored in both the Routing Engine and the Packet Forwarding Engine. They are pushed from the Routing Engine to the Packet Forwarding Engine when you commit a configuration. If the security policies on the Routing Engine are out of sync with those on the Packet Forwarding Engine, the commit fails. Core dump files might be generated if the commit is attempted repeatedly. The policies can fall out of sync for the following reasons:

- A policy message from the Routing Engine to the Packet Forwarding Engine is lost in transit.
- An error occurs on the Routing Engine, such as a reused policy UID.

Environment

The policies in the Routing Engine and Packet Forwarding Engine must be in sync for the configuration to be committed. However, under certain circumstances, policies in the Routing Engine and the Packet Forwarding Engine might be out of sync, which causes the commit to fail.

Symptoms

When the policy configurations are modified and the policies are out of sync, the following error message is displayed:

error: Warning: policy might be out of sync between RE and PFE <SPU-name(s)> Please request security policies check/resync.

Solution

Use the show security policies checksum command to display the security policy checksum value. If the security policies are out of sync, use the request security policies resync command to synchronize the security policy configuration between the Routing Engine and the Packet Forwarding Engine.

Checking a Security Policy Commit Failure

Problem

Description

Most policy configuration failures occur during a commit or at runtime.

Commit failures are reported directly on the CLI when you run the commit check command in configuration mode. These failures are configuration errors, and you cannot commit the configuration until you fix them.

Solution

To fix these errors, do the following:

Review your configuration data.
Open the file /var/log/nsd_chk_only. This file is overwritten each time you perform a commit check and contains detailed failure information.
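
If you copy the log file off the device, a small script can pull out the lines of interest. The following Python sketch is a minimal example under stated assumptions: the function name is hypothetical, and filtering on the keyword "error" is a guess, because this section does not describe the format of the file.

```python
from pathlib import Path


def find_failure_lines(log_path: str, keyword: str = "error") -> list:
    """Return the lines of a log file that contain the keyword (case-insensitive)."""
    path = Path(log_path)
    if not path.exists():
        return []
    # Replace undecodable bytes instead of failing on them.
    text = path.read_text(errors="replace")
    return [line for line in text.splitlines() if keyword.lower() in line.lower()]


if __name__ == "__main__":
    # Path named in this section; adjust it if you copied the file elsewhere.
    for line in find_failure_lines("/var/log/nsd_chk_only"):
        print(line)
```

Because the file is overwritten on every commit check, run the script before you repeat the check if you want to keep the earlier results.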

Verifying a Security Policy Commit

Problem

Description

If the system behaves incorrectly after you commit a policy configuration, use the following steps to troubleshoot the problem:

Solution

Operational show commands: Run the operational commands for security policies and verify that the output matches what you expect. If it does not, change the configuration accordingly.
Traceoptions: Configure the traceoptions statement in your policy configuration. Select the flags under this hierarchy based on your analysis of the show command output. If you cannot determine which flag to use, specify the flag option all to capture all trace logs.

You can also configure an optional filename to capture the logs.

If you specified a filename in the trace options, check /var/log/<filename> to see whether any errors were reported. (If you did not specify a filename, the default filename is eventd.) The error messages indicate where the failure occurred and why.

After configuring the trace options, you must recommit the configuration change that caused the incorrect system behavior.

Debugging Policy Lookup

Problem

Description

When the configuration is correct but some traffic is incorrectly dropped or permitted, you can enable the lookup flag in the security policies traceoptions. The lookup flag logs lookup-related traces in the trace file.

Solution

Log Error Messages used for Troubleshooting ISSU-Related Problems

The following problems might occur during an ISSU. You can identify the errors by using the details in the logs. For detailed information about specific system log messages, see System Log Explorer.

Chassisd Process Errors
Understanding Common Error Handling for ISSU
ISSU Support-Related Errors
Initial Validation Checks Failure
Installation-Related Errors
Redundancy Group Failover Errors
Kernel State Synchronization Errors

Chassisd Process Errors

Problem

Description

Errors related to chassisd.

Solution

Use the error messages to understand the issues related to chassisd.

When ISSU starts, a request is sent to chassisd to check whether there are any problems related to the ISSU from a chassis perspective. If there is a problem, a log message is created.

Understanding Common Error Handling for ISSU

Problem

Description

You might encounter some problems in the course of an ISSU. This section provides details on how to handle them.

Solution

Any errors encountered during an ISSU result in the creation of log messages, and ISSU continues to function without impact to traffic. If reverting to previous versions is required, the event is either logged or the ISSU is halted, so as not to create mismatched versions on the two nodes of the chassis cluster. Table 8 lists some common error conditions and their workarounds. The sample messages in Table 8 are from the SRX1500 device and also apply to all supported SRX Series Firewalls.

Table 8: ISSU-Related Errors and Solutions
Error Conditions	Solutions
Attempt to initiate an ISSU when a previous instance of an ISSU is already in progress	The following message is displayed: `warning: ISSU in progress` You can abort the current ISSU process by using the `request chassis cluster in-service-upgrade abort` command, and then initiate the ISSU again.
Reboot failure on the secondary node	No service downtime occurs, because the primary node continues to provide required services. Detailed console messages are displayed requesting that you manually clear existing ISSU states and restore the chassis cluster. error: [Oct 6 12:30:16]: Reboot secondary node failed (error-code: 4.1) error: [Oct 6 12:30:16]: ISSU Aborted! Backup node maybe in inconsistent state, Please restore backup node [Oct 6 12:30:16]: ISSU aborted. But, both nodes are in ISSU window. Please do the following: 1. Rollback the node with the newer image using rollback command Note: use the 'node' option in the rollback command otherwise, images on both nodes will be rolled back 2. Make sure that both nodes (will) have the same image 3. Ensure the node with older image is primary for all RGs 4. Abort ISSU on both nodes 5. Reboot the rolled back node Starting with Junos OS Release 17.4R1, the hold timer for the initial reboot of the secondary node during the ISSU process is extended from 15 minutes (900 seconds) to 45 minutes (2700 seconds) in chassis clusters on SRX1500, SRX4100, SRX4200, and SRX4600 devices.
Secondary node failed to complete the cold synchronization	The primary node times out if the secondary node fails to complete the cold synchronization. Detailed console messages are displayed requesting that you manually clear existing ISSU states and restore the chassis cluster. No service downtime occurs in this scenario. [Oct 3 14:00:46]: timeout waiting for secondary node node1 to sync(error-code: 6.1) Chassis control process started, pid 36707 error: [Oct 3 14:00:46]: ISSU Aborted! Backup node has been upgraded, Please restore backup node [Oct 3 14:00:46]: ISSU aborted. But, both nodes are in ISSU window. Please do the following: 1. Rollback the node with the newer image using rollback command Note: use the 'node' option in the rollback command otherwise, images on both nodes will be rolled back 2. Make sure that both nodes (will) have the same image 3. Ensure the node with older image is primary for all RGs 4. Abort ISSU on both nodes 5. Reboot the rolled back node
Failover of newly upgraded secondary failed	No service downtime occurs, because the primary node continues to provide required services. Detailed console messages are displayed requesting that you manually clear existing ISSU states and restore the chassis cluster. [Aug 27 15:28:17]: Secondary node0 ready for failover. [Aug 27 15:28:17]: Failing over all redundancy-groups to node0 ISSU: Preparing for Switchover error: remote rg1 priority zero, abort failover. [Aug 27 15:28:17]: failover all RGs to node node0 failed (error-code: 7.1) error: [Aug 27 15:28:17]: ISSU Aborted! [Aug 27 15:28:17]: ISSU aborted. But, both nodes are in ISSU window. Please do the following: 1. Rollback the node with the newer image using rollback command Note: use the 'node' option in the rollback command otherwise, images on both nodes will be rolled back 2. Make sure that both nodes (will) have the same image 3. Ensure the node with older image is primary for all RGs 4. Abort ISSU on both nodes 5. Reboot the rolled back node {primary:node1}
Upgrade failure on primary	No service downtime occurs, because the secondary node fails over as primary and continues to provide required services.
Reboot failure on primary node	Before the primary node reboots, the devices are out of the ISSU setup, so no ISSU-related error messages are displayed. A reboot error message is displayed if any other failure is detected.
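
When you review saved console output, the error code in each abort message identifies the matching row of Table 8. The following Python sketch shows one way to extract it. The function name and the regular expression are assumptions, and the mapping covers only the three codes that appear in the sample messages above. Other codes might exist.

```python
import re
from typing import Optional

# Error codes that appear in the sample messages in Table 8, mapped to
# the error condition of the row in which each one appears.
KNOWN_CODES = {
    "4.1": "Reboot failure on the secondary node",
    "6.1": "Secondary node failed to complete the cold synchronization",
    "7.1": "Failover of newly upgraded secondary failed",
}

# Matches "(error-code: 4.1)" with or without a space after the colon.
ERROR_CODE = re.compile(r"\(error-code:\s*(\d+(?:\.\d+)?)\)")


def classify(line: str) -> Optional[str]:
    """Return the Table 8 condition for a log line, or None if it has no error code."""
    match = ERROR_CODE.search(line)
    if match is None:
        return None
    code = match.group(1)
    return KNOWN_CODES.get(code, "unknown error code " + code)


sample = "error: [Oct 6 12:30:16]: Reboot secondary node failed (error-code: 4.1)"
print(classify(sample))
```

Codes that are not in the mapping are reported as unknown instead of being matched to the nearest row, so an unfamiliar failure is not misclassified.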

ISSU Support-Related Errors

Problem

Description

Installation failure occurs because of unsupported software and unsupported feature configuration.

Solution

Use the following error messages to understand the compatibility-related problems:

Initial Validation Checks Failure

Problem

Description

The initial validation checks fail.

Solution

The validation checks fail if the image is not present or if the image file is corrupt. When the image is not present, the initial validation checks fail, the ISSU is aborted, and the following error messages are displayed:

When Image Is Not Present

When Image File Is Corrupted

If the image file is corrupted, the following output displays:

The primary node validates the device configuration to ensure that it can be committed using the new software version. If anything goes wrong, the ISSU aborts and error messages are displayed.

Installation-Related Errors

Problem

Description

The install image file does not exist or the remote site is inaccessible.

Solution

Use the following error messages to understand the installation-related problems:

ISSU downloads the install image that is specified as an argument to the ISSU command. The image file can be a local file or a file at a remote site. If the file does not exist or the remote site is inaccessible, an error is reported.

Redundancy Group Failover Errors

Problem

Description

A problem occurs with automatic redundancy group (RG) failover.

Solution

Use the following error messages to understand the problem:

Kernel State Synchronization Errors

Problem

Description

Errors related to ksyncd.

Solution

Use the following error messages to understand the issues related to ksyncd:

ISSU checks whether there are any ksyncd errors on the secondary node (node 1). If it finds any problems, it displays the error message and aborts the upgrade.

Change History Table

Feature support is determined by the platform and release you are using. Use Feature Explorer to determine if a feature is supported on your platform.

Release	Description
17.4R1	Starting with Junos OS Release 17.4R1, the hold timer for the initial reboot of the secondary node during the ISSU process is extended from 15 minutes (900 seconds) to 45 minutes (2700 seconds) in chassis clusters on SRX1500, SRX4100, SRX4200, and SRX4600 devices.
