The AWS device activation process takes up to 30 minutes. If the process does not complete within 30 minutes, a timeout might occur and you must retry the process. You do not need to download the CloudFormation template again.
To retry the process:
Bug Tracking Number: CXU-19102.
For an AWS spoke, during the activation process, the device status on the Activate Device page is displayed as Detected even though the device is down.
Workaround: None.
Bug Tracking Number: CXU-19779.
In a CSO HA environment, two RabbitMQ nodes are clustered together, but the third RabbitMQ node does not join the cluster. This might occur just after the initial installation, if a virtual machine reboots, or if a virtual machine is powered off and then powered on.
Workaround: On the RabbitMQ node that did not join the cluster, do the following:
Stop the RabbitMQ application:
rabbitmqctl stop_app
Stop the RabbitMQ server:
service rabbitmq-server stop
Remove the stale Mnesia database:
rm -rf /var/lib/rabbitmq/mnesia/
Start the RabbitMQ server:
service rabbitmq-server start
Start the RabbitMQ application so that the node rejoins the cluster:
rabbitmqctl start_app
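After the node rejoins, you can confirm cluster membership with rabbitmqctl cluster_status. As a minimal sketch, the running-node count can be parsed from the status output; the transcript below is an assumed, abridged example of the classic Erlang-term output format, not captured from a live system:

```shell
# Assumed, abridged `rabbitmqctl cluster_status` transcript; on a live node,
# capture the real output with: status=$(rabbitmqctl cluster_status)
status='{running_nodes,[rabbit@infravm1,rabbit@infravm2,rabbit@infravm3]}'

# Count the nodes reported as running; all three should appear after recovery.
running=$(printf '%s' "$status" | grep -o 'rabbit@[A-Za-z0-9_-]*' | wc -l | tr -d ' ')
echo "running nodes: $running"   # prints: running nodes: 3
```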
Bug Tracking Number: CXU-12107
In an HA setup, the time configured for the CAN VMs might not be synchronized with the time configured for the other VMs in the setup. This can cause issues in the throughput graphs.
Workaround: On can-vm1, edit the /etc/ntp.conf file to point to the desired NTP server, and then restart the NTP process. After the NTP process restarts successfully, can-vm2 and can-vm3 automatically resynchronize their times with can-vm1.
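A minimal sketch of that change, assuming an Ubuntu-style ntp service; ntp.example.com is a placeholder server name, not one from this release:

```
# On can-vm1, as root; ntp.example.com is a placeholder NTP server
echo "server ntp.example.com iburst" >> /etc/ntp.conf
service ntp restart
ntpq -p    # confirm that the new server is listed and being polled
```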
Bug Tracking Number: CXU-15681.
In some cases, when power fails, the ArangoDB cluster does not form.
Workaround:
Bug Tracking Number: CXU-20346.
When a high availability (HA) setup comes back up after a power outage, MariaDB instances do not come back up on the VMs.
Workaround: You can recover the MariaDB instances by executing the recovery.sh script that is packaged with the CSO installation package in the /root/Contrail_Service_Orchestration_3.3/ directory.
Bug Tracking Number: CXU-20260.
In an HA setup, the operation to roll back to CSO Release 3.2.1 might fail because, during the health check, the Redis component is reported as unhealthy.
Workaround: Run the redis-trib.rb check command on a central infrastructure VM to verify the state of the Redis cluster. The following is a sample command and its output:
root@csp-central-infravm3:~# redis-trib.rb check 192.0.2.12:6379
>>> Performing Cluster Check (using node 192.0.2.12:6379)
M: f197f6be870b9a84c075d967ab23ce1657c2b864 192.0.2.12:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 2d9f94ea80e7c17b1b13c38213ae20f57157b9be 192.0.2.10:6380
   slots: (0 slots) slave
   replicates 38f840d6acb17808d85a79b22e22fdbd2642535b
M: 38f840d6acb17808d85a79b22e22fdbd2642535b 192.0.2.11:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 94e2b955552ff162690d7b8c41b44dad7aa904cb 192.0.2.10:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 0328dc8a9bc4be9411d9d01adc1ce3d15b106b93 192.0.2.11:6380
   slots: (0 slots) slave
   replicates f197f6be870b9a84c075d967ab23ce1657c2b864
S: 0fb89166609886e7093efafbf9713759def4b066 192.0.2.12:6380
   slots: (0 slots) slave
   replicates 94e2b955552ff162690d7b8c41b44dad7aa904cb
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
[WARNING] Node 192.0.2.12:6379 has slots in importing state (6256,9093,9259).
[WARNING] The following slots are open: 6256,9093,9259
>>> Check slots coverage...
[OK] All 16384 slots covered.
If the output reports errors (for example, [ERR] Nodes don't agree about configuration!), restart the Redis services on each infrastructure VM:
service redis-master stop
service redis-slave stop
service redis-master start
service redis-slave start
If the cluster configuration is fine, the message [OK] All nodes agree about slots configuration is displayed in the output. The following is a sample command and its output:
root@csp-central-infravm3:~# redis-trib.rb check 192.0.2.12:6379
>>> Performing Cluster Check (using node 192.0.2.12:6379)
S: f197f6be870b9a84c075d967ab23ce1657c2b864 192.0.2.12:6379
   slots: (0 slots) slave
   replicates 0328dc8a9bc4be9411d9d01adc1ce3d15b106b93
M: 94e2b955552ff162690d7b8c41b44dad7aa904cb 192.0.2.10:6379
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
M: 38f840d6acb17808d85a79b22e22fdbd2642535b 192.0.2.11:6379
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 2d9f94ea80e7c17b1b13c38213ae20f57157b9be 192.0.2.10:6380
   slots: (0 slots) slave
   replicates 38f840d6acb17808d85a79b22e22fdbd2642535b
M: 0328dc8a9bc4be9411d9d01adc1ce3d15b106b93 192.0.2.11:6380
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 0fb89166609886e7093efafbf9713759def4b066 192.0.2.12:6380
   slots: (0 slots) slave
   replicates 94e2b955552ff162690d7b8c41b44dad7aa904cb
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
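When scripting this check, the decision to restart the Redis services can be keyed off the [ERR] line in the redis-trib.rb output. A minimal sketch, run against an abridged copy of the unhealthy sample output rather than a live cluster:

```shell
# Abridged `redis-trib.rb check` output for the unhealthy case; on a live VM,
# use: check_output=$(redis-trib.rb check 192.0.2.12:6379)
check_output="[WARNING] The following slots are open: 6256,9093,9259
[ERR] Nodes don't agree about configuration!"

# An [ERR] line means the cluster needs the restart procedure above.
if printf '%s\n' "$check_output" | grep -q '^\[ERR\]'; then
  echo "cluster unhealthy: restart redis-master and redis-slave on each infra VM"
else
  echo "cluster healthy"
fi
```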
Bug Tracking Number: CXU-21302.
In an HA setup, if you shut down all the CSO servers, MariaDB and ArangoDB fail to form their respective clusters after the servers are restarted successfully.
Workaround:
To recover the MariaDB cluster, perform the following steps:
To recover the ArangoDB cluster, perform the following steps:
Bug Tracking Number: CXU-21819.
The connection of the Celery worker to RabbitMQ might be lost in some cases, such as when an infrastructure VM is shut down. After the broken connection is detected, Celery is restarted. However, if an exception occurs during the restart, the Celery worker loses the connection to RabbitMQ.
Workaround: To manually restart the connection between the Celery worker and RabbitMQ, perform the following steps:
The connection of the Celery worker to RabbitMQ is then established.
Bug Tracking Number: CXU-21823.
In an HA setup, suppose that you onboard devices and deploy policies on those devices. If a microservices or infrastructure node goes down while one of the policy deployments is in progress, the deployment job is stuck in the In Progress state for about 90 minutes (the default timeout value), and you cannot perform deploy operations for the tenant during that time.
Workaround: Wait for the job to fail and then redeploy the policy.
Bug Tracking Number: CXU-21922.
The LTE link can only be a backup link. Therefore, the SLA metrics are not applicable, and default values of zero might be displayed on the Application SLA Performance page; these values can be ignored.
Workaround: None.
Bug Tracking Number: CXU-19943
In an SRX Series dual CPE site, when the application traffic takes the Z-mode path, the application throughput reported in the Administration Portal GUI is lower than the actual data throughput.
Workaround: None.
Bug Tracking Number: PR1347723.
If all the active links, including OAM connectivity to CSO, are down and the LTE link is used for traffic, and if the DHCP addresses change to a new subnet, the traffic is dropped because CSO is unable to reconfigure the device.
Workaround: None.
Bug Tracking Number: CXU-19080.
On the Site SLA Performance page, applications with different SLA scores are plotted at the same coordinate on the x-axis.
Workaround: None.
Bug Tracking Number: CXU-19768.
When all local breakout links are down, site-to-Internet traffic fails even though there is an active overlay to the hub.
Workaround: None.
Bug Tracking Number: CXU-19807
When the CPE device is not able to reach CSO, DHCP address changes on WAN interfaces might not be detected and reconfigured.
Workaround: None.
Bug Tracking Number: CXU-19856
When the OAM link is down, the communication between the CPE devices and CSO does not work even though CSO can be reached over other WAN links. There is no impact to the traffic.
Workaround: None.
Bug Tracking Number: CXU-19881.
In the bandwidth-optimized SD-WAN mode, when the same SLA profile is used in the SD-WAN policy for different departments and an SLA violation occurs, two link switch events that appear identical are displayed, because the department name is missing from the event details.
Workaround: None.
Bug Tracking Number: CXU-20529.
In a Cloud hub multihoming topology, after a link switch, the GRE tunnel links on the secondary hub might be displayed in red in the CSO GUI, even though the GRE tunnels are up.
Workaround: Wait for approximately 10 minutes and the links are displayed in green, indicating that the GRE tunnels are up.
Bug Tracking Number: CXU-20550.
On the SD-WAN Events page, for link switch events, if you mouse over the Reason field, the values displayed for the SLA metrics are those recorded when the system logs are sent from the device, not the values for which the SLA violation was detected.
Workaround: None.
Bug Tracking Number: CXU-21461.
In a tenant with real-time-optimized SD-WAN, the duration of the link switch violation (on the WAN tab of the Site-Name page) might be displayed incorrectly.
Workaround: None.
Bug Tracking Number: CXU-21590.
On the SD-WAN Policy page, if you click the up or down arrow to reorder a policy intent, the policy intent moves to the top of the list instead of moving one position up or down, respectively. In addition, when you click Deploy, the changes are not deployed to the device.
Workaround:
There is no workaround for reordering the policy intent.
To deploy the policy, modify at least one policy intent and then redeploy the policy.
Bug Tracking Number: CXU-20861.
If you try to delete the SLA profile associated with an SD-WAN policy immediately after deleting the SD-WAN policy, an error message might be displayed and the SLA profile is not deleted.
Workaround: Wait for approximately three minutes after deleting the SD-WAN policy, and then trigger the deletion of the associated SLA profile.
Bug Tracking Number: CXU-22168.
On the Active Database page in Customer Portal, an incorrect installed device count is displayed. The count displayed is for all tenants, not for the specific tenant.
Workaround: None.
Bug Tracking Number: CXU-20531.
If you restart a central microservices VM or the csp.secmgt-sm Kubernetes pod on a central microservices VM when the deployment of a firewall policy or NAT policy is in progress, the deployment job fails.
In addition, after the restart is completed, if you modify the firewall or NAT policy, the changes fail to deploy.
Workaround: After the restart of the central microservices VM or the csp.secmgt-sm Kubernetes pod is completed and the pod is up, perform the following steps:
Bug Tracking Number: CXU-21106.
A user with the Tenant Administrator role cannot install application signatures on the devices belonging to a tenant.
Workaround: Have a user with the MSP Administrator role install application signatures on the devices for a tenant.
Bug Tracking Number: CXU-22064.
The tenant delete operation fails when CSO is installed with an external Keystone.
Workaround: You must manually delete the tenant from the Contrail OpenStack user interface.
Bug Tracking Number: CXU-9070
For tenants with a large number of spoke sites, the tenant deletion job fails because of token expiry.
Workaround: Retry the tenant delete operation.
Bug Tracking Number: CXU-19990.
In some cases, if automatic license installation is enabled in the device profile, after ZTP is complete, the license might not be installed on the CPE device even though the license key is configured successfully.
Workaround: Reinstall the license on the CPE device by using the Licenses page on the Administration Portal.
Bug Tracking Number: PR1350302.
LAN segments with overlapping IP prefixes are not supported across tenants or sites.
Workaround: Create LAN segments with unique IP prefixes across tenants and sites.
Bug Tracking Number: CXU-20347.
On the Monitor > Overview page, if you select a cloud hub site and access the WAN tab, an error message is displayed.
Workaround: None.
Bug Tracking Number: CXU-20353.
When the primary and backup interfaces of the CPE device use the same WAN interface of the hub, the backup underlay might be used for Internet or site-to-site traffic even though the primary links are available.
Workaround: Ensure that you connect the WAN links of each CPE device to unique WAN links of the hub.
Bug Tracking Number: CXU-20564.
After you configure a site, you cannot modify the configuration either before or after activation.
Workaround: None.
Bug Tracking Number: CXU-21165
If you initiate the RMA workflow on an NFX Series device that was successfully onboarded and provisioned with stage-2 templates, the device RMA operation might get stuck in the device activation stage if the stage-2 configuration templates have interdependencies.
Workaround: Ensure that the stage-2 templates that are deployed on the device do not have interdependencies before initiating the device RMA workflow.
Bug Tracking Number: CXU-21464.
On the Monitor > Overview page, if you click a site indicating that a major alarm was triggered (site icon color turns orange), and in the subsequent popup, click the link for major alarms in the Alerts & Alarms section, you are taken to the Alarms page. However, no alarm for the device is displayed.
Workaround: None.
Bug Tracking Number: CXU-21828.
If you create VNF instances in the Contrail cloud by using Heat Version 2.0 APIs, a timeout error occurs after 120 instances are created.
Workaround: Contact Juniper Networks Technical Support.
Bug Tracking Number: CXU-15033
When you upgrade the gateway router by using the CSO GUI, after the upgrade completes and the gateway router reboots, the gateway router configuration reverts to the base configuration and loses the IPsec configuration added during Zero Touch Provisioning (ZTP).
Workaround: Before you upgrade the gateway router by using the CSO GUI, ensure that you do the following:
Bug Tracking Number: CXU-11823.
CSO might not come up after a power failure.
Workaround:
Navigate to the /root/Contrail_Service_Orchestration_3.3/ directory and run the following command:
./python.sh recovery/components/reinitialize_pods.py
If there is an error in connecting (port 22: Connection refused), then you must recover the VRR by following step 5 through 21.
Warning: Do not execute the virsh undefine vrr command, because doing so causes the VRR configuration to be lost, and the configuration cannot be recovered.
Copy the vrr-15.1R6.7.qcow2 image from the /root/disks/ directory to the /root/ubuntu_vm/vrr/ directory.
If the VRR is not running, check that the copied image is not corrupted and retry the procedure from step 7.
If the base configuration was pushed properly, re-check if the VRR is reachable from the regional microservices VM. If the VRR is reachable, proceed to step 14.
If the base configuration was not pushed properly:
Note: Do not import the VRR until the VRR is reachable from the regional microservices VM.
The following is the JSON format for the VRR. (In the JSON below, <vrr-ip-address> is the IP address of the VRR, and <vrr-password> is the password that was configured for the VRR.)
{
  "input": {
    "job_name_prefix": "ImportPop",
    "pop": [
      {
        "dc_name": "regional",
        "device": [
          {
            "name": "vrr-<vrr-ip-address>",
            "family": "VRR",
            "device_ip": "<vrr-ip-address>",
            "assigned_device_profile": "VRR_Advanced_SDWAN_option_1",
            "authentication": {
              "password_based": {
                "username": "root",
                "password": "<vrr-password>"
              }
            },
            "management_state": "managed",
            "pnf_package": "null"
          }
        ],
        "name": "regional"
      }
    ]
  }
}
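As a sketch, the payload can be generated from the JSON above by substituting the placeholders and validating the result before use. The IP address and password below are made-up example values:

```shell
# Write the payload template (the same JSON shown above) to a file.
cat > vrr_import_template.json <<'EOF'
{
  "input": {
    "job_name_prefix": "ImportPop",
    "pop": [
      {
        "dc_name": "regional",
        "device": [
          {
            "name": "vrr-<vrr-ip-address>",
            "family": "VRR",
            "device_ip": "<vrr-ip-address>",
            "assigned_device_profile": "VRR_Advanced_SDWAN_option_1",
            "authentication": {
              "password_based": {"username": "root", "password": "<vrr-password>"}
            },
            "management_state": "managed",
            "pnf_package": "null"
          }
        ],
        "name": "regional"
      }
    ]
  }
}
EOF

# Example values only; substitute the real VRR address and password.
sed -e 's/<vrr-ip-address>/192.0.2.50/g' \
    -e 's/<vrr-password>/Example-Passw0rd/g' \
    vrr_import_template.json > vrr_import.json

# Fail early if the result is not well-formed JSON.
python3 -m json.tool vrr_import.json > /dev/null && echo "payload OK"
```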
Bug Tracking Number: CXU-16530
If you run the script to revert the upgraded setup to CSO Release 3.2.1, in some cases, the ArangoDB cluster becomes unhealthy.
Workaround:
If the command executes successfully, proceed to Step 3.
If there is no progress after 30 seconds:
If the command executes successfully, proceed to Step 5.
If there is no progress after 30 seconds:
If the command executes successfully, proceed to Step 7.
If there is no progress after 30 seconds:
The following is a sample output.
tcp6       0      0 :::8528      :::*      LISTEN      0      54213      9220/arangodb
tcp6       0      0 :::8529      :::*      LISTEN      0      44018      9327/arangod
tcp6       0      0 :::8530      :::*      LISTEN      0      91216      9289/arangod
tcp6       0      0 :::8531      :::*      LISTEN      0      42530      9232/arangod
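When scripting this verification, you can count the ArangoDB listeners (the starter on port 8528 and the arangod processes on ports 8529 through 8531). A minimal sketch against an embedded copy of the sample output; on a live VM you would pipe netstat -tlnp instead:

```shell
# Sample `netstat -tlnp` lines for the ArangoDB ports (copied from above).
netstat_out='tcp6 0 0 :::8528 :::* LISTEN 0 54213 9220/arangodb
tcp6 0 0 :::8529 :::* LISTEN 0 44018 9327/arangod
tcp6 0 0 :::8530 :::* LISTEN 0 91216 9289/arangod
tcp6 0 0 :::8531 :::* LISTEN 0 42530 9232/arangod'

# All four ports (8528-8531) should be in the LISTEN state.
listeners=$(printf '%s\n' "$netstat_out" | grep -cE ':(8528|8529|8530|8531) .*LISTEN')
if [ "$listeners" -eq 4 ]; then
  echo "ArangoDB listeners OK"
else
  echo "ArangoDB unhealthy: only $listeners of 4 ports listening"
fi
```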
Bug Tracking Number: CXU-20397.
The provisioning of CPE devices fails if all VRRs within a redundancy group are unavailable.
Workaround: Recover the VRR that is down and retry the provisioning job.
Bug Tracking Number: CXU-19063
The CSO health check displays the following error message: ERROR: ONE OR MORE KUBE-SYSTEM PODS ARE NOT RUNNING
Workaround:
Bug Tracking Number: CXU-20275.
After the upgrade, the health check on the standalone Contrail Analytics Node (CAN) fails.
Workaround:
Bug Tracking Number: CXU-20470.
The class-of-service scheduler configuration does not take effect on the CPE device.
Workaround:
set class-of-service interfaces interface-name unit * scheduler-map scheduler-map-name
set interfaces interface-name per-unit-scheduler
Where interface-name is the name of the physical interface (for example, ge-0/0/4), and scheduler-map-name is the name of the scheduler map.
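For example, assuming the physical interface ge-0/0/4 and a scheduler map named sdwan-scheduler-map (both placeholder names), the resulting configuration would be:

```
set class-of-service interfaces ge-0/0/4 unit * scheduler-map sdwan-scheduler-map
set interfaces ge-0/0/4 per-unit-scheduler
```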
Bug Tracking Number: CXU-20708.
The load services data operation or health check of the infrastructure components might fail if the data in the Salt server cache is lost because of an error.
Workaround: If you encounter a Salt server-related error, do the following:
If the output returns the IP address for all the Salt minions, this means that the Salt server cache is fine; proceed to step 7.
If the IP address for some minions is not present in the output, this means that the Salt server has lost its cache for those minions and must be rebuilt as explained from step 3.
Navigate to the /root/Contrail_Service_Orchestration_3.3.1/ directory.
The following is a sample log message that is displayed:
2018-04-10 17:17:03 INFO utils.core Deploying roles set(['ntp']) to servers ['csp-central-msvm', 'csp-contrailanalytics-1', 'csp-central-k8mastervm', 'csp-central-infravm']
Bug Tracking Number: CXU-20815.
In some cases, high values of round-trip time (RTT) and jitter are displayed in the CSO GUI because of high values reported in the device system log.
Workaround: None.
Bug Tracking Number: CXU-21434.
On an NFX Series CPE device, if you try to upgrade a vSRX gateway router, the upgrade might fail due to a lack of storage space on the VM.
Workaround:
Before triggering the upgrade of the vSRX gateway router on an NFX Series device, perform the following steps:
Trigger the upgrade of the vSRX gateway router by using the CSO GUI.
Bug Tracking Number: CXU-21440.
In some cases, when the infrastructure VMs in the CSO setup are unhealthy and you initiate the upgrade, the upgrade process fails to perform a health check before starting the upgrade.
Workaround: Recover the infrastructure VMs manually before proceeding with the upgrade.
Bug Tracking Number: CXU-21536.
For an MX Series cloud hub device, if you have configured the Internet link type as OAM_and_DATA, the reverse traffic fails to reach the spoke device if you do not configure additional parameters by using the Junos OS CLI on the MX Series device.
Workaround:
The name of the service set is in the format sset<tenant-name>_DefaultVPN-<tenant-name>, where <tenant-name> is the name of the tenant.
The following is an example of the command and output:
show configuration | display set | grep outside-service-interface
set groups mx-hub-Acme-Acme_DefaultVPN-vpn-routing-config services service-set ssetAcme_DefaultVPN-Acme next-hop-service outside-service-interface ms-1/0/0.4008
In this example, the tenant name is Acme and the multiservices interface used is ms-1/0/0.4008.
where ms-interface is the name of the multiservices interface obtained in the preceding step.
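The interface name can also be extracted in one step. A minimal sketch that parses the sample display set line shown above (on the device, you would pipe the output of show configuration | display set | grep outside-service-interface instead):

```shell
# Sample `display set` line from the example above.
line='set groups mx-hub-Acme-Acme_DefaultVPN-vpn-routing-config services service-set ssetAcme_DefaultVPN-Acme next-hop-service outside-service-interface ms-1/0/0.4008'

# The multiservices interface is the last field on the line.
ms_interface=$(printf '%s\n' "$line" | awk '{print $NF}')
echo "$ms_interface"   # prints: ms-1/0/0.4008
```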
Bug Tracking Number: CXU-21818.
In a full mesh topology, the simultaneous deletion of LAN segments on all sites is not supported.
Workaround: Delete LAN segments on one site at a time.
Bug Tracking Number: CXU-21936.
On a CSO setup that was upgraded from Release 3.2.1 to Release 3.3.0, if you start upgrading to Release 3.3.1, the ArangoDB storage engine upgrade might fail because of an issue with the Salt server synchronization.
Workaround:
Navigate to the /root/Contrail_Service_Orchestration_3.3.1/ directory.
The following is a sample log message that is displayed:
2018-04-10 17:17:03 INFO utils.core Deploying roles set(['ntp']) to servers ['csp-central-msvm', 'csp-contrailanalytics-1', 'csp-central-k8mastervm', 'csp-central-infravm']
Rerun the upgrade script (upgrade.sh).
Bug Tracking Number: CXU-22066.
When an SRX Series device with factory configuration is activated by using ZTP with a redirect server, the device activation fails because the learned phone home server is deleted during the activation process.
Workaround: Configure the phone home server IP address on the SRX Series device and retry the ZTP workflow.
Bug Tracking Number: CXU-22154.