Known Issues

AWS Spoke

  • The AWS device activation process takes up to 30 minutes. If the process does not complete in 30 minutes, a timeout might occur and you must retry the process. You do not need to download the CloudFormation template again.

    To retry the process:

    1. Log in to Customer Portal.
    2. Access the Activate Device page, enter the activation code, and click Next.
    3. After the CREATE_COMPLETE message is displayed on the AWS server, click Next on the Activate Device page to proceed with device activation.
  • For an AWS spoke, during the activation process, the device status on the Activate Device page is displayed as Detected even though the device is down.

    Workaround: None.

    Bug Tracking Number: CXU-19779.

CSO HA

  • In a CSO HA environment, two RabbitMQ nodes are clustered together, but the third RabbitMQ node does not join the cluster. This might occur just after the initial installation, if a virtual machine reboots, or if a virtual machine is powered off and then powered on.

    Workaround: Do the following:

    1. Log in to the RabbitMQ dashboard for the central microservices VM (http://central-microservices-vip:15672) and the regional microservices VM (http://regional-microservices-vip:15672).
    2. Check the RabbitMQ overview in the dashboards to see if all the available infrastructure nodes are present in the cluster.
    3. If an infrastructure node is not present in the cluster, do the following:
      1. Log in to the VM of that infrastructure node.
      2. Open a shell prompt and execute the following commands sequentially:

        rabbitmqctl stop_app

        service rabbitmq-server stop

        rm -rf /var/lib/rabbitmq/mnesia/

        service rabbitmq-server start

        rabbitmqctl start_app

    4. In the RabbitMQ dashboards for the central and regional microservices VMs, confirm that all the available infrastructure nodes are present in the cluster.
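
    For reference, the reset sequence in step 3 can be scripted. The following is a minimal shell sketch; it assumes passwordless SSH access as root, and infra-vm-address is a placeholder for the affected infrastructure node:

        # Reset RabbitMQ on the node that failed to join the cluster.
        # Wiping /var/lib/rabbitmq/mnesia/ discards the node's stale
        # cluster state so that it rejoins cleanly on restart.
        ssh root@infra-vm-address '
          rabbitmqctl stop_app
          service rabbitmq-server stop
          rm -rf /var/lib/rabbitmq/mnesia/
          service rabbitmq-server start
          rabbitmqctl start_app
        '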

    Bug Tracking Number: CXU-12107

  • In an HA setup, the time configured for the CAN VMs might not be synchronized with the time configured for the other VMs in the setup. This can cause issues in the throughput graphs.

    Workaround:

    1. Log in to can-vm1 as the root user.
    2. Modify the /etc/ntp.conf file to point to the desired NTP server.
    3. Restart the NTP process.

    After the NTP process restarts successfully, can-vm2 and can-vm3 automatically resynchronize their times with can-vm1.
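
    The following is a minimal sketch of steps 2 and 3, assuming an Ubuntu-based CAN VM where NTP runs as the ntp service; ntp.example.com is a placeholder for your NTP server:

        # On can-vm1: point ntpd at the desired server and restart it.
        sed -i 's/^server /#server /' /etc/ntp.conf          # comment out the existing servers
        echo 'server ntp.example.com iburst' >> /etc/ntp.conf
        service ntp restart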

    Bug Tracking Number: CXU-15681.

  • In an HA setup, after the central or regional microservices server goes down, policy deployments are stuck in the In Progress state.

    Workaround: Contact Juniper Networks Technical Support.

    Bug Tracking Number: CXU-20099.

  • In an HA setup, when a failed infrastructure VM recovers, it might not join the ArangoDB cluster.

    Workaround:

    1. Log in to the centralinfravm3 VM.
    2. Execute the service arangodb3.cluster stop command.
    3. Log in to the centralinfravm2 VM.
    4. Execute the service arangodb3.cluster stop command.
    5. Log in to the centralinfravm1 VM.
    6. Execute the service arangodb3.cluster stop command.
    7. On the centralinfravm1 VM, execute the service arangodb3.cluster start command and wait for 20 seconds for the command to finish executing.
    8. On the centralinfravm2 VM, execute the service arangodb3.cluster start command and wait for 20 seconds for the command to finish executing.
    9. On the centralinfravm3 VM, execute the service arangodb3.cluster start command and wait for 20 seconds for the command to finish executing.
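
    Steps 1 through 9 can be condensed into a small script. This is a sketch only, assuming passwordless SSH as root to the three central infrastructure VMs under the host names used above:

        # Stop the ArangoDB cluster service on all three nodes (3, 2, 1),
        # then start it again (1, 2, 3), pausing 20 seconds after each
        # start as in the manual procedure.
        for vm in centralinfravm3 centralinfravm2 centralinfravm1; do
          ssh root@"$vm" 'service arangodb3.cluster stop'
        done
        for vm in centralinfravm1 centralinfravm2 centralinfravm3; do
          ssh root@"$vm" 'service arangodb3.cluster start'
          sleep 20
        done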

    Bug Tracking Number: CXU-20430.

  • In some cases, when the power fails, the ArangoDB cluster does not form.

    Workaround: Use the workaround specified for CXU-20430.

    Bug Tracking Number: CXU-20346.

  • When an HA setup comes back up after a power outage, MariaDB instances do not come back up on the VMs.

    Workaround:

    You can recover the MariaDB instances by executing the recovery.sh script that is packaged with the CSO installation package:

    1. Log in to the installer VM.
    2. Navigate to the current deployment directory for CSO; for example, /root/Contrail_Service_Orchestration_3.3/.
    3. Execute the ./recovery.sh command and follow the instructions.
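
    That is, from a shell on the installer VM (the deployment directory name varies with your CSO release):

        cd /root/Contrail_Service_Orchestration_3.3/
        ./recovery.sh        # interactive; follow the on-screen instructions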

    Bug Tracking Number: CXU-20260.

SD-WAN

  • In CSO Release 3.3.0, the LTE link can be used only as a backup link. Therefore, the SLA metrics are not applicable, and default values of zero might be displayed on the Application SLA Performance page; you can ignore these values.

    Workaround: None.

    Bug Tracking Number: CXU-19943.

  • In a dual CPE spoke, non-cacheable applications do not work when the initial path is on CPE0 and the APBR path selected is on CPE1.

    Workaround: None.

    Bug Tracking Number: PR1340331.

  • In an SRX Series dual CPE site, when the application traffic takes the Z-mode path, the application throughput reported in the Administration Portal GUI is lower than the actual data throughput.

    Workaround: None.

    Bug Tracking Number: PR1347723.

  • If all the active links, including OAM connectivity to CSO, are down and the LTE link is used for traffic, and if the DHCP addresses change to a new subnet, the traffic is dropped because CSO is unable to reconfigure the device.

    Workaround: None.

    Bug Tracking Number: CXU-19080.

  • On the Site SLA Performance page, applications with different SLA scores are plotted at the same coordinate on the x-axis.

    Workaround: None.

    Bug Tracking Number: CXU-19768.

  • When all local breakout links are down, site-to-Internet traffic fails even though there is an active overlay to the hub.

    Workaround: None.

    Bug Tracking Number: CXU-19807.

  • When the CPE device is not able to reach CSO, DHCP address changes on WAN interfaces might not be detected and reconfigured.

    Workaround: None.

    Bug Tracking Number: CXU-19856.

  • When the OAM link is down, the communication between the CPE devices and CSO does not work even though CSO can be reached over other WAN links. There is no impact to the traffic.

    Workaround: None.

    Bug Tracking Number: CXU-19881.

  • In a full mesh topology, the GRE IPsec down alarms are not created for some overlays during link failure.

    Workaround: None.

    Bug Tracking Number: CXU-20403.

  • If you specify an MPLS link without local breakout capability as the backup link, then Internet breakout traffic is dropped because the overlay link to the hub is not used for Internet traffic if local breakout is enabled for the site.

    Workaround: Configure an Internet or an LTE link as the backup link.

    Bug Tracking Number: CXU-20447.

  • If you define an SLA profile for a static SD-WAN policy but do not remove the default values for the SLA parameters and deploy the policy, the policy is deployed as a dynamic SD-WAN policy.

    Workaround: When you define the SLA profile for a static SD-WAN policy, ensure that you remove the default values for the SLA parameters.

    Bug Tracking Number: CXU-20499.

  • If you modify the path preference of an existing SLA profile that has already been deployed and redeploy the SD-WAN policy, the path of the SLA profile is not updated on the CPE device.

    Workaround: Modify the path preference in an SLA profile that is not yet deployed.

    Bug Tracking Number: CXU-20540.

  • For non-cacheable applications in a hub-and-spoke topology, in some cases on link switchover, the traffic between the hub and spoke might take an incorrect physical path because the existing session flow is not updated. However, there is no traffic loss.

    Workaround: None.

    Bug Tracking Number: PR1341274.

  • In the bandwidth-optimized SD-WAN mode, when the same SLA is used in the SD-WAN policy for different departments and an SLA violation occurs, two link switch events are displayed that appear identical because the department name is missing from the event details.

    Workaround: None.

    Bug Tracking Number: CXU-20529.

  • When you configure a high delay and loss on the OAM link, the link switch might be delayed or might not occur.

    Workaround: None.

    Bug Tracking Number: CXU-20562.

  • For a tenant with bandwidth-optimized SD-WAN mode, the SLA performance for the site is always displayed as 0/100.

    Workaround: None.

    Bug Tracking Number: CXU-20563.

Security Management

  • If you create a firewall policy with more than 10 firewall policy intents and deploy the firewall policy on a tenant with 45 or more sites, the policy deployment fails.

    Workaround: None.

    Bug Tracking Number: CXU-20292.

  • If you create a NAT pool, specify the Translation as Port/Range, configure the port as a range, and enter an incorrect starting port number, then you cannot enter the ending port number and the NAT pool is created with a single port value instead of a range.

    Workaround: When you create a NAT pool with a port range, ensure that the starting port number is between 1024 and 65,535, and then enter the corresponding ending port number between 1024 and 65,535.

    Bug Tracking Number: CXU-20366.

  • On the Active Database page in Customer Portal, the wrong installed device count is displayed. The count displayed is for all tenants and not for a specific tenant.

    Workaround: None.

    Bug Tracking Number: CXU-20531.

Site and Tenant Workflow

  • The tenant delete operation fails when CSO is installed with an external Keystone.

    Workaround: You must manually delete the tenant from the Contrail OpenStack user interface.

    Bug Tracking Number: CXU-9070.

  • When both the OAM and data interfaces are untagged, ZTP fails when an NFX Series platform is used as the CPE.

    Workaround: Use tagged interfaces for both OAM and data.

    Bug Tracking Number: CXU-15084.

  • The tenant creation job might fail if connectivity from CSO to the VRR is lost during job execution.

    Workaround: If the tenant creation job fails and the tenant is created in CSO, delete the tenant and retrigger the tenant creation.

    Bug Tracking Number: CXU-16884.

  • If the tenant name exceeds 16 characters, the activation of the SRX Series hub device fails.

    Workaround: Delete the tenant, re-create the tenant with a name that has fewer than 16 characters, and retry the activation workflow.

    Bug Tracking Number: PR1344369.

  • For tenants with a large number of spoke sites, the tenant deletion job fails because of token expiry.

    Workaround: Retry the tenant delete operation.

    Bug Tracking Number: CXU-19990.

  • In some cases, on the Monitor Overview page (Monitoring > Overview) for a site, the ZTP status is displayed incorrectly when you hover over the site.

    Workaround: None.

    Bug Tracking Number: CXU-20226.

  • In some cases, if automatic license installation is enabled in the device profile, after ZTP is complete, the license might not be installed on the CPE device even though the license key is configured successfully.

    Workaround: Reinstall the license on the CPE device by using the Licenses page on the Administration Portal.

    Bug Tracking Number: PR1350302.

  • In the scenario where the redirect service from Juniper (redirect.juniper.net) is not being used, after you upgrade an NFX device to Junos OS Release 15.1X53-D472, the device is unable to connect to the regional server because the phone home server certificate (phd-ca.crt) is reverted to the factory default.

    Workaround: Manually copy the regional certificate to the NFX device.

    Bug Tracking Number: PR1350492.

  • LAN segments with overlapping IP prefixes are not supported across tenants or sites.

    Workaround: Create LAN segments with unique IP prefixes across tenants and sites.

    Bug Tracking Number: CXU-20347.

  • In a hub-and-spoke topology with multi-tenancy (network segmentation) enabled, the reverse traffic from the hub to the originating spoke might not take the same path as the traffic in the forward direction. There is no traffic loss.

    Workaround: None.

    Bug Tracking Number: CXU-20494.

  • In the Configure Site workflow for a full mesh topology with multitenancy enabled, the option to connect the CPEs only to the hub is not supported; that is, if you specify false for the used_for_meshing parameter, this option is ignored.

    Workaround: None.

    Bug Tracking Number: CXU-20495.

  • For hybrid WAN tenants, during site creation, all the VIMs in the system are displayed even though a specific VIM was already assigned during tenant creation.

    Workaround: None.

    Bug Tracking Number: CXU-20371.

  • When you use DHCP for the activation of a dual CPE device, ZTP might fail because the device takes longer than expected to connect to the Device Connectivity Service (DCS).

    Workaround: Retry the failed ZTP job.

    Bug Tracking Number: CXU-20467.

  • During site addition, if you create a department but do not assign a LAN segment to it, the firewall policy deployment fails during site activation.

    Workaround: Do one of the following:

    • Go to the Site-Name page, and on the LAN tab, add a new LAN segment to the department that did not have any LAN segments assigned during site creation.

    • Alternatively, during site addition, when you create a department, ensure that you assign at least one LAN segment to that department.

    Bug Tracking Number: CXU-20502.

  • When the primary and backup interfaces of the CPE device use the same WAN interface of the hub, the backup underlay might be used for Internet or site-to-site traffic even though the primary links are available.

    Workaround: Ensure that you connect the WAN links of each CPE device to unique WAN links of the hub.

    Bug Tracking Number: CXU-20564.

Topology

  • When a spoke is recalled, the configuration remains on the hub. When the spoke is reprovisioned, the activation fails and an error message indicating that the source and destination addresses of the tunnel cannot be the same is displayed in the logs.

    Workaround: Clean up the configuration of the recalled spoke in the hub and reprovision the spoke with a new name.

    Bug Tracking Number: CXU-20441.

User Interface

  • When you bring down or bring up an AWS availability zone, there might be a momentary slowdown in the response time of the Administration Portal GUI and some in-progress jobs might be affected.

    Workaround: Wait for five minutes and retry the failed jobs.

    Bug Tracking Number: CXU-20463.

General

  • If you create VNF instances in the Contrail cloud by using Heat Version 2.0 APIs, a timeout error occurs after 120 instances are created.

    Workaround: Contact Juniper Networks Technical Support.

    Bug Tracking Number: CXU-15033.

  • When you upgrade the gateway router by using the CSO GUI, after the upgrade completes and the gateway router reboots, the gateway router configuration reverts to the base configuration and loses the IPsec configuration added during Zero Touch Provisioning (ZTP).

    Workaround: Before you upgrade the gateway router by using the CSO GUI, ensure that you do the following:

    1. Log in to the Juniper Device Manager (JDM) CLI of the NFX Series device.
    2. Execute the virsh list command to obtain the name of the gateway router (GWR_NAME).
    3. Execute the request virtual-network-functions GWR_NAME restart command, where GWR_NAME is the name of the gateway router obtained in the preceding step.
    4. Wait a few minutes for the gateway router to come back up.
    5. Log out of the JDM CLI.
    6. Proceed with the upgrade of the gateway router by using the CSO GUI.
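
    The following transcript-style sketch summarizes steps 1 through 4; GWR_NAME stands for whatever name virsh list reports for the gateway router on your device:

        # From the JDM CLI of the NFX Series device:
        virsh list                                          # note the gateway router name (GWR_NAME)
        request virtual-network-functions GWR_NAME restart  # restart the gateway router
        # Wait a few minutes for the gateway router to come back up,
        # then log out of the JDM CLI and start the upgrade from the CSO GUI.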

    Bug Tracking Number: CXU-11823.

  • CSO might not come up after a power failure.

    Workaround:

    1. Log in to the installer VM.
    2. Navigate to the /root/Contrail_Service_Orchestration_3.3/ directory.
    3. Run the reinitialize_pods.py script as follows:
      ./python.sh recovery/components/reinitialize_pods.py
    4. SSH to the VRR by using the VRR IP address to check if you are able to access the VRR.

      If there is an error in connecting (port 22: Connection refused), you must recover the VRR by following steps 5 through 21.

    5. Log in to the physical server hosting the VRR.
    6. Execute the virsh destroy vrr command to destroy the VRR.

      Warning: Do not execute the virsh undefine vrr command because doing so will cause the VRR configuration to be lost and the configuration cannot be recovered.

    7. Delete the VRR image located at /root/ubuntu_vm/vrr/vrr-15.1R6.7.qcow2.
    8. Copy the fresh VRR image from /root/disks/vrr-15.1R6.7.qcow2 to /root/ubuntu_vm/vrr/vrr-15.1R6.7.qcow2.
    9. Execute the virsh start vrr command and wait for approximately 5 minutes for the command to finish executing.
    10. Execute the virsh list --all command to check whether the VRR is running.

      If the VRR is not running, check that the copied image is not corrupted and retry the procedure from step 7.

    11. If the VRR is running, navigate to the /root/ubuntu_vm/vrr/ directory.
    12. Run the ./vrr.exp command to push the base configuration to the VRR.
    13. Check if the VRR is reachable from the regional microservices VM. If the VRR is reachable, proceed to step 14. If the VRR is not reachable:
      1. Log in to the VRR.
      2. Check if the base configuration was pushed properly:
        • If the base configuration was pushed properly, re-check if the VRR is reachable from the regional microservices VM. If the VRR is reachable, proceed to step 14.

        • If the base configuration was not pushed properly:

          1. Add the necessary routes to reach CSO.
          2. Re-check if the VRR is reachable from the regional microservices VM. If the VRR is reachable, proceed to step 14.
    14. Import the POP by using the URL https://central-ms-ip:4443/tssm/import-pop, where central-ms-ip is the IP address of the central microservices VM.
    15. Use Postman to import the VRR. (If you prefer the command line, a curl alternative is sketched after this list.)

      Note: Do not import the VRR until the VRR is reachable from the regional microservices VM.

      The following is the JSON format for the VRR. (In the JSON below, <vrr-ip-address> is the IP address of the VRR and <vrr-password> is the password that was configured for the VRR.)

      {
        "input": {
          "job_name_prefix": "ImportPop",
          "pop": [{
            "dc_name": "regional",
            "device": [{
              "name": "vrr-<vrr-ip-address>",
              "family": "VRR",
              "device_ip": "<vrr-ip-address>",
              "assigned_device_profile": "VRR_Advanced_SDWAN_option_1",
              "authentication": {
                "password_based": {
                  "username": "root",
                  "password": "<vrr-password>"
                }
              },
              "management_state": "managed",
              "pnf_package": "null"
            }],
            "name": "regional"
          }]
        }
      }
    16. Verify whether the VRR is imported properly:
      1. Log in to the CSO Administration Portal.
      2. Click Resources > POPs > Import POPs > Import History and confirm that the ImportPop job is running and that it has completed successfully.
    17. On the Tenants page, add a tenant named recovery.
    18. After the tenant is successfully created, log in to the VRR and access the Junos OS CLI.
    19. Execute the show configuration | display set command and verify that the tenant configuration (for the previously configured tenants) is recovered.
    20. Execute the show bgp summary command and check that the BGP sessions to the hub and spokes are in the Established state.
    21. If a session is not in the Established state, add the routes for the OAM traffic of the hub and spokes to the VRR and recheck the status.
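
    A curl invocation along the following lines can substitute for Postman in step 15. This is a sketch: save the JSON above as import-pop.json, and note that CSO API calls normally require an authentication token, shown here as the placeholder $TOKEN:

        # POST the import-pop payload to the central microservices VM.
        curl -k -X POST "https://central-ms-ip:4443/tssm/import-pop" \
          -H "Content-Type: application/json" \
          -H "X-Auth-Token: $TOKEN" \
          -d @import-pop.json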

    Bug Tracking Number: CXU-16530.

  • If you run the script to revert the upgraded setup to CSO Release 3.2.1, in some cases, the ArangoDB cluster becomes unhealthy.

    Workaround:

    1. Log in to the centralinfravm3 VM.
    2. Execute the service arangodb3 stop command and wait for 30 seconds.
      • If the command executes successfully, proceed to step 3.

      • If there is no progress after 30 seconds:

        1. Press Ctrl+c to abort the command.
        2. Execute the kill -9 `ps -ef | grep arangod | grep -v grep | awk '{print $2}'` command.
    3. Log in to the centralinfravm2 VM.
    4. Execute the service arangodb3 stop command and wait for 30 seconds.
      • If the command executes successfully, proceed to step 5.

      • If there is no progress after 30 seconds:

        1. Press Ctrl+c to abort the command.
        2. Execute the kill -9 `ps -ef | grep arangod | grep -v grep | awk '{print $2}'` command.
    5. Log in to the centralinfravm1 VM.
    6. Execute the service arangodb3 stop command and wait for 30 seconds.
      • If the command executes successfully, proceed to step 7.

      • If there is no progress after 30 seconds:

        1. Press Ctrl+c to abort the command.
        2. Execute the kill -9 `ps -ef | grep arangod | grep -v grep | awk '{print $2}'` command.
    7. On the centralinfravm3 VM, execute the service arangodb3 start command and wait for 20 seconds for the command to finish executing.
    8. On the centralinfravm2 VM, execute the service arangodb3 start command and wait for 20 seconds for the command to finish executing.
    9. On the centralinfravm1 VM, execute the service arangodb3 start command and wait for 20 seconds for the command to finish executing.
    10. Execute the netstat -tuplen | grep arangod command on all three central infrastructure VMs to check the status of the ArangoDB cluster. If the port binding is successful for all the central infrastructure VMs, then the ArangoDB cluster is healthy.

      The following is a sample output.

          tcp6 0 0 :::8528 :::* LISTEN 0 54213 9220/arangodb
          tcp6 0 0 :::8529 :::* LISTEN 0 44018 9327/arangod
          tcp6 0 0 :::8530 :::* LISTEN 0 91216 9289/arangod
          tcp6 0 0 :::8531 :::* LISTEN 0 42530 9232/arangod 
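
      For reference, the stop-with-fallback logic in steps 2, 4, and 6 can be expressed as a few lines of shell, run locally on each central infrastructure VM in turn (a sketch; timeout is the GNU coreutils utility):

          # Attempt a clean stop; if arangodb3 does not stop within
          # 30 seconds, kill the remaining arangod processes.
          if ! timeout 30 service arangodb3 stop; then
            kill -9 $(ps -ef | grep arangod | grep -v grep | awk '{print $2}')
          fi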

      Bug Tracking Number: CXU-20397.

  • On a CPE device configured with an LTE backup link, LTE link flaps are observed when the CPE device has been running for a long period.

    Workaround: None.

    Bug Tracking Number: PR1349613.

  • In an HA setup, when you upgrade from CSO Release 3.2.1 to CSO Release 3.3.0, the Kubernetes system pods for the central or regional load balancer VM are in the Terminating state. This causes the load balancer VM to be in the Not Ready state, which causes the health check to fail during the upgrade.

    Workaround:

    1. On the installer VM:
      • If the central load balancer VM is in the Not Ready state, execute the salt 'csp-central-lbvm*' cmd.run 'reboot' command.

      • If the regional load balancer VM is in the Not Ready state, execute the salt 'csp-regional-lbvm*' cmd.run 'reboot' command.

    2. Wait for some time until the nodes are in the Ready state.
    3. Rerun the upgrade.sh script to continue with the upgrade.
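
    In other words, from a shell on the installer VM (a sketch using the Salt target patterns shown above):

        # Reboot only the load balancer VMs that are in the Not Ready state.
        salt 'csp-central-lbvm*' cmd.run 'reboot'       # if the central LB VM is Not Ready
        salt 'csp-regional-lbvm*' cmd.run 'reboot'      # if the regional LB VM is Not Ready
        # After the nodes report Ready, rerun upgrade.sh to continue the upgrade.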

    Bug Tracking Number: CXU-20271.

  • The provisioning of CPE devices fails if all VRRs within a redundancy group are unavailable.

    Workaround: Recover the VRR that is down and retry the provisioning job.

    Bug Tracking Number: CXU-19063.

  • In the centralized deployment, after you import a POP, the CPU, memory, and storage allocation are displayed as zero.

    Workaround: Refresh the UI, and the correct information is displayed.

    Bug Tracking Number: CXU-19105.

  • The CSO health check displays the following error message: ERROR: ONE OR MORE KUBE-SYSTEM PODS ARE NOT RUNNING

    Workaround:

    1. Log in to the central microservices VM.
    2. Execute the kubectl get pods --namespace=kube-system command.
    3. If the kube-proxy process is not in the Running state, execute the kubectl apply -f /etc/kubernetes/manifests/kube-proxy.yaml command.
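
    As a sketch, the check in step 2 and the fix in step 3 can be combined on the central microservices VM:

        # Reapply the kube-proxy manifest only if the pod is not Running.
        if ! kubectl get pods --namespace=kube-system | grep kube-proxy | grep -q Running; then
          kubectl apply -f /etc/kubernetes/manifests/kube-proxy.yaml
        fi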

      Bug Tracking Number: CXU-20275.

  • In a department, if there are two LAN segments with DHCP enabled, only one DHCP server setting is deployed on the device.

    Workaround: Enable DHCP only for one LAN segment in a department.

    Bug Tracking Number: CXU-20519.

  • The Grant RMA operation fails for a multihomed hub device.

    Workaround: None.

    Bug Tracking Number: CXU-20457.

  • After the upgrade, the health check on the standalone Contrail Analytics Node (CAN) fails.

    Workaround:

    1. Log in to the CAN VM.
    2. Execute the docker exec analyticsdb service contrail-database-nodemgr restart command.
    3. Execute the docker exec analyticsdb service cassandra restart command.
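
    Equivalently, from a shell on the CAN VM (a sketch based on the two commands above):

        # Restart the database node manager and Cassandra inside the
        # analyticsdb container.
        for svc in contrail-database-nodemgr cassandra; do
          docker exec analyticsdb service "$svc" restart
        done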

    Bug Tracking Number: CXU-20470.

  • When the LTE modem is disconnected or disabled in the NFX250 CPE device, an alarm is triggered. However, the underlay link status on the Sites page might not display the alarm.

    Workaround: None.

    Bug Tracking Number: CXU-20492.

  • For a vSRX CPE, the auto-deployment of license fails with an error message indicating that no license is found even though a license exists on the vSRX instance.

    Workaround: Manually deploy the license by using the Push License workflow from the CSO GUI.

    Bug Tracking Number: CXU-20558.

  • The load services data operation or health check of the infrastructure components might fail if the data in the Salt server cache is lost because of an error.

    Workaround: If you encounter a Salt server-related error, do the following:

    1. Log in to the installer VM.
    2. Execute the salt '*' deployutils.get_role_ips 'cassandra' command to confirm whether one or more Salt minions have lost the cache.
      • If the output returns the IP address for all the Salt minions, this means that the Salt server cache is fine; proceed to step 7.

      • If the IP address for some minions is not present in the output, this means that the Salt server has lost its cache for those minions and the cache must be rebuilt as explained starting with step 3.

    3. Navigate to the current deployment directory for CSO; for example, /root/Contrail_Service_Orchestration_3.3/.
    4. Redeploy the central infrastructure services (up to the NTP step):
      1. Execute the DEPLOYMENT_ENV=central ./deploy_infra_services.sh command.
      2. Press Ctrl+c when you see the following message on the console:
        2018-04-10 17:17:03 INFO utils.core Deploying roles set(['ntp']) to servers ['csp-central-msvm', 'csp-contrailanalytics-1', 'csp-central-k8mastervm', 'csp-central-infravm']
    5. Redeploy the regional infrastructure services (up to the NTP step):
      1. Execute the DEPLOYMENT_ENV=regional ./deploy_infra_services.sh command.
      2. Press Ctrl+c when you see a message similar to the one for the central infrastructure services.
    6. Execute the salt '*' deployutils.get_role_ips 'cassandra' command and confirm that the output displays the IP addresses of all the Salt minions.
    7. Re-run the load services data operation or the health component check that had previously failed.
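
    The cache check in steps 2 and 6 amounts to comparing the minions that the Salt server knows about with those that return an IP address for the cassandra role. A quick sketch (salt-key is the standard Salt key-management utility):

        # Minions known to the Salt server:
        salt-key --list accepted
        # Minions for which the cassandra role resolves; any minion missing
        # here has lost its cache and needs the redeploy in steps 3 through 5.
        salt '*' deployutils.get_role_ips 'cassandra'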

    Bug Tracking Number: CXU-20815.

Modified: 2018-07-29