Known Issues
This section lists the known issues in Juniper Routing Director.
Device Life-Cycle Management
-
In a scaled environment, network implementation plan provisioning might sporadically fail if multiple provisioning jobs are executed consecutively at a high rate. You might see the following error in the workflow GUI:
Provisioning is failing at upload-network-resources due to the error "numbered is in use and cannot be deleted; app.ErrorCode:3"
Workaround: Retry publishing the failed network implementation plan, and space out the provisioning jobs to avoid this issue.
-
In clusters with inter-node delays in the tens of milliseconds, the workflow execution time increases. As a result, the time required to onboard devices and instantiate services also increases.
Workaround: None.
-
Changing the router ID of a device after onboarding might create a duplicate node in the topology.
Workaround: If you need to change the router ID, you must offboard the device, update the router ID in the configuration, and then onboard the device again.
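For example, updating the router ID from the device CLI before onboarding the device again (a minimal Junos sketch; 192.0.2.1 is a placeholder address):
configure
set routing-options router-id 192.0.2.1
commit and-quit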
-
If you upgrade a network implementation plan to the latest service design version and publish it without making changes, the service design is upgraded again instead of the network implementation plan being published.
Workaround: Instead of directly publishing a network implementation plan after service design upgrade, ensure that you edit the network implementation plan before publishing.
-
After you upgrade a service design, you must manually upgrade the service design that is used in the existing network implementation plans.
To upgrade the infrastructure service design used by an implementation plan:
1. On the Network Implementation Plan page (Inventory > Device Onboarding > Network Implementation Plan), select the plan and click More > Upgrade Service Design.
The Upgrade Service Design page appears.
2. On the Upgrade Service Design page, select the service design version to which you want to upgrade the network implementation plan from the drop-down list, and click Yes.
A message indicating that the service design upgrade is in progress is displayed. After a successful upgrade, the Service Design column displays the new version of the infrastructure service design used by the network implementation plan.
-
If a network implementation plan includes multiple devices, device onboarding might not start automatically for some devices. However, the device status might be displayed as Connected on the Inventory page (Inventory > Devices > Network Inventory).
Workaround: Manually trigger the onboarding process for such devices:
1. On the Network Implementation Plan page (Inventory > Device Onboarding > Network Implementation Plan), select the network implementation plan and click More > Trigger Onboarding.
The Trigger Onboarding page appears.
2. Click the Show All devices option and then select the device for which you want to restart onboarding.
3. Click OK.
A message indicating that onboarding has started appears.
-
When a vMX is deployed using I2C ID 161, all commit operations will fail after any subscriber is created.
Workaround: Delete the subscribers and then commit the configuration.
-
Routing Director applies the configuration templates included in a device profile and an interface profile only during the initial onboarding of the device. You cannot use these configuration templates to apply additional configuration on a device after the device is onboarded.
Workaround: To apply additional configuration on a device after the device is onboarded, apply the configuration manually by using the CLI or by executing the configuration templates through the Routing Director GUI.
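For example, applying an additional statement manually from the device CLI (a minimal Junos sketch; the statement shown is only an illustration):
configure
set system services netconf ssh
commit and-quit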
Observability
-
Upgrading from release 2.4.0 to release 2.5.0 fails for all components in routing observability.
Workaround: Delete all databases related to routing observability and reinstall routing observability components afresh.
-
Upgrading from release 2.4.0 to release 2.5.0 fails for routing observability BGP components due to implementation changes meant to improve ingest performance.
Workaround: Delete all BGP related databases and install release 2.5.0 BGP components.
-
Under scale ingest, the service bgp-measurements-compute may restart continuously after crashing. This affects the calculation of aggregates such as the total routes graph in Observability > Routing > Routing Explorer > Routing Status. Other related dashboards and tables are unaffected.
Workaround: None.
-
After you upgrade to Release 2.5.0, the following pages might take more than 10 minutes to display device-related information:
Inventory page (Observability > Troubleshoot Devices > Device-Name)
Devices tab on the Topology page (Observability > Topology)
Workaround: After you upgrade, restart services by running the following command:
kubectl rollout restart -n streams deployment/papi-mon-v2
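To confirm that the restart has completed, you can optionally run the following standard kubectl check (a hedged follow-up, not part of the documented workaround):
kubectl rollout status -n streams deployment/papi-mon-v2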
-
The Alert icon and the alert message are out of sync during the first event change on the Custom KPI Collection page (Observability > Health > Custom KPI Collection).
Workaround: Wait for the next data refresh cycle, which automatically corrects the alert icon display issue.
-
On the LLM Connector (Routing Director chatbot), you might not be able to continue the conversation if the session expires or if you open a new conversation.
Workaround: Execute the following commands on the Linux root shell of a primary node:
# kubectl get svc -n common | grep opensearch-cluster-master
# curl -X POST <opensearch-service-IP>:9200/ask-paragon-chat-sessions/_rollover/
Here, <opensearch-service-IP> is the cluster IP returned by the first command.
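A one-shot variant of the same workaround (a hedged sketch; it assumes the opensearch-cluster-master service name shown above and plain-HTTP access to OpenSearch from the node):
# OS_IP=$(kubectl -n common get svc opensearch-cluster-master -o jsonpath='{.spec.clusterIP}')
# curl -X POST "http://${OS_IP}:9200/ask-paragon-chat-sessions/_rollover/"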
-
After a device is onboarded, Routing Director continuously monitors the KPIs related to device health. For each KPI, Routing Director forecasts the expected range and detects any anomalies that occur. If a KPI value changes, the forecasted range takes approximately two hours to stabilize.
Workaround: None.
-
During heavy ingest scenarios, such as onboarding routers for the first time or router maintenance windows, it takes some time for the total number of routes to be reflected on the Routing Status graph (Observability > Routing > Routing Explorer > Routing Status tab).
If there are any events in the network, the Routing Status graph or the Routing Updates table (Observability > Routing > Route Explorer > Routing Updates) might display the data with substantial latency. We expect the latency to be reasonable during steady-state operation of the network.
Also, the statistics in the Device tab (Observability > Routing > Route Explorer > Routing Status) and in the Adjacencies tab (Observability > Routing > Route Explorer) are updated with low latency (1 to 5 minutes).
Workaround: None.
-
If you create an LSP by using the REST API and reuse an existing LSP name, the REST API server does not return an error.
Workaround: None.
-
While adding a device profile for a network implementation plan, if you enable Routing Protocol Analytics, routing data is collected for the devices listed in the device profile. When you publish the network implementation plan, the onboarding workflow might appear to be successful even though there are errors related to the collection of routing data for these devices. Because of these errors, the devices are not configured to send data to Routing Director, and the routing data is not displayed on the Route Explorer page of the Routing Director GUI. This issue also occurs while offboarding devices, where the offboarded devices continue to send data to Routing Director.
This issue also occurs when you have not configured an ASN or router ID on the devices, or when you have locked the device configuration for exclusive editing.
Workaround: To fix this issue:
1. Do one of the following:
- Check the service logs by running the request paragon debug logs namespace routingbot app routingbot service routingbot-apiserver shell command. Take the necessary action based on the error messages listed in Table 1.
- Examine the device configuration to check whether the device shows an unexpected absence or presence of the configuration. For example, you can:
- View the configuration present under set groups paragon-routing-bgp-analytics routing-options bmp.
- Check the device configuration in the JTIMON pod.
2. After resolving these issues, edit the device profile of the network implementation plan that you applied to the device. Based on whether you are onboarding or offboarding devices, enable or disable the Routing Protocol Analytics option in the device profile.
3. Publish the network implementation plan.
4. Verify that the required results are seen based on the data displayed on the Route Explorer page of the Routing Director GUI.
Table 1: Error Messages
Error message: Failed to get device profile info for dev_id {dev_id}: {res.status_code} - {res.text}, or Failed to get device info for dev_id {dev['dev_id']}. Skipping device.
Issue: The API call to PAPI to get the device information has failed.
Error message: No results found in the response for dev_id {dev_id}, or Failed to get device info for dev_id {dev['dev_id']}. Skipping device.
Issue: The API call to PAPI returns a response with no data.
Error message: Complete device info not found in the response for dev_id {dev_id} : {device_info}
Issue: The API call to PAPI returns a response with incomplete data.
Error message: No data found for dev_id {dev_id} from PF
Issue: The API call to Pathfinder to get the device information has failed.
Error message: Required data not found for dev_id {dev_id} from PF data:{node_data}
Issue: The API call to Pathfinder to get device information returns a response with incomplete data.
Error message: EMS config failed with error, for config: {cfg_data} or EMS Config push error {res} {res.text} | try: {retries}. Failed to configure BMP on device {mac_id}
Issue: BGP configuration has failed.
Error message: Invalid format for major, minor, or release version : {os_version}
Issue: The device's OS version is not supported.
Error message: Error POST {self.config_server_path}/api/v2/config/device/{dev_id}/ {data} {res.json()}
Issue: Playbook application has failed.
Error message: Error PUT:{self.config_server_path}/api/v2/config/device/{dev_id}/ {data} {res_put.json()}
Issue: Playbook removal has failed.
Error message: Error PUT:{self.config_server_path}/api/v2/config/device/{dev_id}/ {data} {res_put.json()}
Issue: Device or playbook application to device-group has failed.
Error message: Error PUT {self.config_server_path}/api/v2/config/device-group/{site_id}/ {data} {res_put.json()}
Issue: Device or playbook removal from device-group has failed.
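For example, to check from the device CLI whether the BMP configuration group referenced above is present (a minimal sketch using a standard Junos show command):
show configuration groups paragon-routing-bgp-analytics routing-options bmp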
-
On the Interfaces accordion, FEC uncorrected errors charts are available only on interfaces that support speeds of 100 Gbps or higher.
-
After you apply a new configuration to a device, the Active Configuration for Device-Name page (Observability > Troubleshoot Devices > Device-Name > Configuration accordion > View active config link) does not display the latest configuration immediately. It takes several minutes for the latest changes to be reflected on the page.
Workaround: Verify whether the new configuration is applied to the device by logging in to the device CLI.
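For example, from the device CLI (a hedged sketch using standard Junos commands; the rollback number assumes the new configuration was the most recent commit):
show system commit
show configuration | compare rollback 1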
-
The number of unhealthy devices listed on the Troubleshoot Devices page does not match the number listed on the Health Dashboard page (Observability > Health).
Workaround: None.
-
You cannot delete unwanted nodes and links from the Routing Director GUI.
Workaround: Use the following REST APIs to delete nodes and links:
-
REST API to delete a link:
[DELETE] https://{{server_ip}}/topology/api/v1/orgs/{{org_id}}/{{topo_id}}/links/{{link_id}}
Note: You can follow the steps described below to get the actual URL.
For example:
URL: https://10.56.3.16/topology/api/v1/orgs/f9e9235b-37f1-43e7-9153-e88350ed1e15/10/links/15
Curl:
curl --location --request DELETE 'https://10.56.3.16:443/topology/api/v1/orgs/f9e9235b-37f1-43e7-9153-e88350ed1e15/10/links/15' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic dGVzdDFAdGVzdC5jb206RW1iZTFtcGxz'
-
REST API to delete a node:
[DELETE] https://{{server_ip}}/topology/api/v1/orgs/{{org_id}}/{{topo_id}}/nodes/{{node_id}}
Note: You can follow the steps described below to get the actual URL.
For example:
URL: https://10.56.3.16/topology/api/v1/orgs/f9e9235b-37f1-43e7-9153-e88350ed1e15/10/nodes/11
Curl:
curl --location --request DELETE 'https://10.56.3.16:443/topology/api/v1/orgs/f9e9235b-37f1-43e7-9153-e88350ed1e15/10/nodes/11' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic dGVzdDFAdGVzdC5jb206RW1iZTFtcGxz'
Use the following procedure to get the actual URL that you use in the curl command for deleting a link or a node:
1. Navigate to the Topology page (Observability > Topology).
2. Open the developer tools in the browser by pressing Ctrl+Shift+I.
3. In the developer tools, select Network and select the XHR filter option.
4. Identify the link index number or the node number:
a. On the Topology page of the Routing Director GUI, double-click the link or the node that you want to delete. The Link Link-Name page or the Node Node-Name page appears.
b. Navigate to the Details tab and note the link index number or the node number that is displayed.
5. In the developer tools, click the row that corresponds to the link or the node that you want to delete.
6. Copy the URL that you need to use in the curl command to delete the link or the node.
-
Not all optics modules support all the optics-related KPIs. See Table 2 for more information.
Workaround: None.
Table 2: KPIs Supported for Optics Modules
Module | Rx Loss of Signal KPI | Tx Loss of Signal KPI | Laser Disabled KPI
SFP optics | No | No | No
CFP optics | Yes | No | No
CFP_LH_ACO optics | Yes | No | No
QSFP optics | Yes | Yes | Yes
CXP optics | Yes | Yes | No
XFP optics | No | No | No
-
For PTX10002 devices, the following issues are observed on the Interfaces accordion (Observability > Health > Troubleshoot Devices > Device-Name > Overview):
-
On the Pluggables Details for Device-Name page (Interfaces accordion > Pluggables data-link), the Optical Tx Power and Optical Rx Power graphs do not display any data.
-
On the Input Traffic Details for Device-Name page (Interfaces accordion > Input Traffic data-link), the Signal Functionality graph does not display any data.
Service Orchestration
-
In a scaled environment, when you provision multiple VPN instances in parallel, some of the instances might fail with the following error in the workflow:
"Failed to lock service instance for uploading: Service order is locked. Please try again later."
Workaround: Retry provisioning for the failed VPN instances.
-
The following error message is displayed even though the service design upgrade is successful:
Service Design upgrade failed for 1 instance(s)
The incorrect error message is displayed only when you upgrade the service design in a VPWS instance that was originally provisioned in Release 2.4.1 and is now upgraded to Release 2.5.0.
Workaround: None.
-
On a scaled setup, the Resource Instance page takes a long time to load due to slow responses from the instances and /service-designs REST API calls.
Workaround: None.
-
Before you upgrade an L2VPN service instance that was created in a release earlier than Release 2.5.0, ensure that you update the vpn-resources instance to Release 2.5.0.
Workaround: None.
-
In a scaled setup, the View Network Resources page (Orchestration > Service > Resource Instances > More) might be unresponsive.
Workaround: Do one of the following:
-
Select the network resources and click the edit option to view and download the details in JSON format.
-
Use the /service-orchestration/api/v1/orgs/{org-id}/placement/network-elements REST API to view the details.
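A hedged example of calling this endpoint with curl (the server address, org ID, and credentials are placeholders; the authentication method may differ in your deployment):
curl -k -u '<user>:<password>' 'https://<server-ip>/service-orchestration/api/v1/orgs/<org-id>/placement/network-elements'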
-
When you create an l2-addr resource instance, adding only the LACP Admin Key without a System ID results in a failure.
Workaround: When you create an l2-addr resource instance with LACP Admin Key, also create an entry in the System ID table to avoid failure.
-
If more than 2600 L3VPN services are provisioned, the placement service intermittently suffers a break in service, which may cause service placement to fail. The placement service automatically recovers from this failure.
Workaround: Re-provision any services that failed placement due to this issue.
-
The Service Designs page (Orchestration > Service Catalog) might list certain internal service files. You can ignore these files.
Workaround: None.
-
The Add Access Parameter page displays all options for the Tag Type field (Untagged, dot1q, or qinq), irrespective of whether you have chosen tagged or untagged interfaces for a customer edge (CE) device.
Workaround: None.
-
When you modify the tag type from dot1q to qinq for existing CE devices (that is, the ce or ces option of the CE Spec field) on the Access page (Orchestration > Resource Instances > Modify Resource-Instance-Name > Resource-Instance-Name > + icon above the Access Interfaces table), the VLAN information displays resources for both tag types (single-tagged and dual-tagged).
Workaround: Create a topology resource based on the tag type, and do not modify the topology resources based on the tag type for existing CE devices.
-
If different L3VPN services are running on the same IFD using different MTU values, then service provisioning fails.
Workaround: Ensure that the MTU values are the same for L3VPN services that share the same IFD.
-
The following accordions on the Passive Assurance tab (Orchestration > Instances > Service-Order-Name Details) display incorrect or no data:
-
BGP accordion—The VPN State column displays incorrect data for customer edge (CE) or provider edge (PE) devices with IPv4 or IPv6 neighbors.
-
OSPF accordion—There are no IPv6 entries in the Neighbor Address column for CE or PE devices with IPv6 neighbors.
-
L3VPN accordion—The VPN State column displays incorrect data for OSPF and BGP protocols. The Neighbor Session and VPN State columns are blank for CE or PE devices with static IPv4 or IPv6 addresses.
This issue occurs only for an L3VPN service.
Workaround: None.
-
The device name is not displayed when you hover over the View Details hyperlink in the Relevant Events section of the L3VPN accordion (Orchestration > Instances > Service Instances > Service-Instance-Name hyperlink > Service-Instance-Name Details > Passive Assurance tab).
Workaround: None.
-
For an MX240 device, the OSPF-related data is not populated on the Passive Assurance tab (Orchestration > Instances > Service-Order-Name Details).
Workaround: Configure OSPF on the customer edge (CE) device.
-
When you click the Refresh icon on the Service-Instance-Name Details page (Orchestration > Instances > Service-Instance-Name), you may not see the latest events in the Relevant Events section.
Workaround: To view the latest events, instead of using the Refresh icon, go to the Service Instances page (Orchestration > Instances) and select the service instance for which you want to see the latest events.
-
The Order History tab on the L3VPN-Name Details page (Orchestration > Instances > Service-Instance-Name hyperlink) lists the entire order history if you deprovision a service instance and later provision a service using the same details as the deprovisioned service.
Workaround: None.
-
In a scaled setup, you cannot upgrade service designs in bulk.
Workaround: We recommend that you upgrade only one service design at a time.
Active Assurance
-
After you perform the Undelete operation, the commit operation for the Test Agent fails. This issue occurs regardless of the interface involved.
Workaround: Delete the orphan VLAN interfaces. After deletion, the commit functionality is restored for the affected Test Agent.
-
When you create a Monitor with 600 streams, you might encounter a Monitor Creation Timeout error, and the Monitor might stop automatically.
Workaround: Restart the Monitor from the Monitor-Name page (Observability > Active Assurance > Monitors > Monitor-Name) by clicking More > Start in the Routing Director GUI.
-
When you click the Distribution tab on the Application-Name page (Observability > Health > Health Dashboard > Active Assurance (Tab) > Applications (Accordion) > View Details), the page hangs and you might not be able to see metrics and site-related data for a Measurement.
Workaround: None.
-
The status of a Test Agent is shown as offline after the device's Routing Engine switches over from the primary Routing Engine to the backup Routing Engine, or vice versa. This issue occurs only if you are using a Junos OS version that is older than 23.4R2.
Workaround: Reinstall Test Agent after the Routing Engine switchover.
-
When you add a new host to an existing Monitor, the new measurements are not reflected in the Active Assurance tab of the Health Dashboard (Observability > Health).
Workaround: None.
Network Optimization
-
You cannot configure OSPF TE Metric A and OSPF TE Metric Z parameters for a link that has OSPF enabled on it.
Workaround: None.
-
Due to the Kafka message size limitation, you can delete only 200 LSPs at a time.
Workaround: None.
-
The link utilization reroute threshold feature works only if the same threshold values are specified on both nodes of a link.
Workaround: Ensure that you specify the same values for the Util Reroute Threshold AZ and Util Reroute Threshold ZA fields on the Edit Link page (Observability > Network > Topology > Link tab > Link-Name > Edit).
-
If the configuration database is locked exclusively by a root terminal session, then tunnel provisioning fails and the status is displayed as Unknown.
Workaround: Use the Edit LSP option in the GUI and re-provision the tunnel.
-
If there are multiple ECMP diverse paths and if you have enabled periodic re-optimization, then the diverse LSPs might switch back and forth between two routing paths.
Workaround: If you do not want this behavior, set the Path Type to Preferred on the Modify LSP page.
-
Sometimes, LSP provisioning might fail, and you might see the PCC_Pending error displayed in the tunnels table on the Topology page (Observability > Topology).
Workaround: Restart the PCEP session on head-end routers by deactivating and activating the protocols and PCE-related statements in the Junos OS configuration.
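For example, on the head-end router (a minimal Junos CLI sketch; this assumes PCEP is configured under protocols pcep, and deactivates and reactivates it with a commit in between):
configure
deactivate protocols pcep
commit
activate protocols pcep
commit and-quit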
-
When you update the AS number on the Dynamic Topology tab of the Topology Settings Options page (Observability > Topology > Settings icon), the updated AS number is not reflected in the containerized routing protocol daemon (cRPD).
Workaround: In addition to updating the AS number using the Routing Director GUI, you must log in to the cRPD CLI and update the AS number.
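A minimal sketch of updating the AS number in the cRPD CLI (the AS number 65001 is a placeholder):
configure
set routing-options autonomous-system 65001
commit and-quit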
-
In rare cases, the topology-related information might be lost or incomplete. This issue occurs due to the inaccessibility of databases.
Workaround: Restart the toposerver to restore the topology information. To restart the toposerver, connect to the primary node of the Routing Director cluster and run the following command:
kubectl -n $(kubectl get namespaces -o jsonpath='{.items}' | jq -r '.[] | select(.metadata.name| startswith("pf-"))|.metadata.name') rollout restart deployment toposerver
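To watch the restart complete, you can optionally follow it with the same namespace lookup (a hedged follow-up, not part of the documented workaround):
kubectl -n $(kubectl get namespaces -o jsonpath='{.items}' | jq -r '.[] | select(.metadata.name| startswith("pf-"))|.metadata.name') rollout status deployment toposerver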
-
If broadcast links exist in the network, Segment Routing (SR) LSPs might not be created.
Workaround: Change broadcast links to point-to-point links in the router configuration.
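For example, for an OSPF-enabled interface on a Junos device (a minimal sketch; the interface name and area are placeholders):
configure
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 interface-type p2p
commit and-quit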
Network Planner
-
Planner reports are deleted based on the retention policy settings. However, reports older than the retention period are deleted only when the cleanup scripts run. The cleanup scripts are scheduled to run at midnight every day.
Workaround: None.
Trust
There are no known issues in this release.
Administration
-
LDAP authentication may not work for users who are not included in the CN=Users container.
Workaround: Add users to the CN=Users container.
Installation and Upgrade
-
Sometimes when you take a backup, the OpenSearch backup status is Not Available while the other statuses are successful. For example:
Backup (config data) status for backup ID 20250709-100554:
Paragon configuration : SUCCESS
Backup (telemetry data) status for backup ID 20250709-100554:
Timescale job status : SUCCESS
Opensearch job status : NOT AVAILABLE
Insights Victoriametrics job status : SUCCESS
Pathfinder Victoriametrics job status : SUCCESS
This is due to the status flag not being updated.
Workaround: Validate the OpenSearch backup status using the following commands:
1. Determine the backup-server service IP address from the Linux root shell.
# kubectl -n common get svc backup-server -o jsonpath='{.spec.clusterIP}'
2. Retrieve the status of the OpenSearch backup.
# curl -s http://<backup-server-service-IP>:9000/snapshot/telemetry/opensearch/common/<backup-ID> | jq '.message'
The status should be displayed as SUCCESS.
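A one-shot variant of the same check (a hedged sketch that captures the service IP in a shell variable; replace <backup-ID> with your backup ID):
# BK_IP=$(kubectl -n common get svc backup-server -o jsonpath='{.spec.clusterIP}')
# curl -s "http://${BK_IP}:9000/snapshot/telemetry/opensearch/common/<backup-ID>" | jq '.message'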
-
After restoring from a backup, the Airflow installation sometimes fails when you try to reinstall all application services by using the request paragon service start command. It is possible that the Airflow pods are still shutting down.
Workaround: Perform the following steps:
1. Stop all running services.
> request paragon service destroy
2. Run the following command from the Linux root shell to remove all Airflow db-pooler pods.
# kubectl -n airflow scale deploy --replicas=0 atom-db-pooler
3. Wait until all the pods terminate. Use the following command to ensure that all Airflow db-pooler pods are removed and no pods are in Running status. The command output must be empty. (See the optional wait-loop sketch after this procedure.)
# kubectl -n airflow get pods -l application=db-connection-pooler
4. Perform the restore operation as usual.
5. After the restore is complete, if the db-pooler pods don't come up automatically, start them manually.
# kubectl -n airflow scale deploy --replicas=2 atom-db-pooler
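The wait in step 3 can also be scripted (a hedged sketch that polls the same namespace and label until no pods remain):
# while kubectl -n airflow get pods -l application=db-connection-pooler --no-headers 2>/dev/null | grep -q .; do sleep 5; done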
-
You might encounter the following error while upgrading or redeploying Routing Director:
This host is not master node 1, where this Paragon cluster was initially installed from!
This issue occurs because /root/epic/host.ip is empty.
Workaround: Manually re-populate /root/epic/host.ip with the local IP address before upgrading or redeploying Routing Director. You can also optionally make the /root/epic/host.ip file immutable to prevent overwrites.
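For example, from the Linux root shell of the node (a minimal sketch; <local-node-ip> is a placeholder for the node's IP address, and the chattr command is the optional immutability step):
# echo "<local-node-ip>" > /root/epic/host.ip
# chattr +i /root/epic/host.ip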