Known Behavior
This section lists known limitations with this release.
Known Behavior in Contrail Networking Release 21.4.L4
- OpenStack
- Config
- vRouter (Kernel, DPDK), vRouter Agent
- DPDK and SR-IOV
- General Routing
- Kubernetes
- Contrail Fabric Management
OpenStack
-
CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:
1. On one of the controllers, edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep cluster address, as shown here:
wsrep_cluster_address = gcomm://
#wsrep_cluster_address = gcomm://10.x.x.8:4567,10.x.x.10:4567,10.x.x.11:4567
Note: If all the controllers are shut down at the same time in the managed scenario, you must select the controller that was shut down last.
2. Start the mariadb container (docker start mariadb) on the controller where you edited the file.
3. Wait a couple of minutes, make sure that the mariadb container is not restarting, and then start the mariadb container on the remaining controllers.
4. Revert the changes to the /etc/kolla/mariadb/galera.cnf file and restart the mariadb container on the previously selected controller.
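The galera.cnf edit for CEM-17991 can be scripted. The following is a minimal sketch, assuming the file layout shown above (one active wsrep_cluster_address line) and GNU sed; the function names galera_bootstrap_edit and galera_bootstrap_restore are hypothetical and are not part of Kolla. The sketch keeps a copy of the original file so that reverting the change at the end is a simple restore.

```shell
#!/bin/sh
# Hypothetical helper: on the selected controller, replace the cluster
# address with an empty gcomm:// so that mariadb bootstraps a new cluster.
# Assumes GNU sed (for -i and \n in the replacement) and a single active
# "wsrep_cluster_address = ..." line in the file. Run it only once.
galera_bootstrap_edit() {
    cnf="${1:-/etc/kolla/mariadb/galera.cnf}"
    # Keep an untouched copy so the change can be reverted later.
    cp "$cnf" "$cnf.orig" || return 1
    # Add the empty address and comment out the original one below it.
    sed -i 's|^\(wsrep_cluster_address[[:space:]]*=.*\)$|wsrep_cluster_address = gcomm://\n#\1|' "$cnf"
}

# Hypothetical helper: put the original file back once the cluster is up.
galera_bootstrap_restore() {
    cnf="${1:-/etc/kolla/mariadb/galera.cnf}"
    mv "$cnf.orig" "$cnf"
}
```

For example, run galera_bootstrap_edit on the selected controller before starting mariadb there, and galera_bootstrap_restore when reverting the file at the end of the procedure.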
Config
-
CEM-31838 Live migration of virtual machines between DPDK compute nodes fails in RHOSP 17 in the following scenarios:
- Another virtual machine is running on the destination compute node.
- Another virtual machine on the destination compute node is shut down.
vRouter (Kernel, DPDK), vRouter Agent
-
CEM-31818 UDP traffic with a packet size of 1500 bytes (the default) or larger is dropped, whereas TCP and ping traffic work as expected.
-
CEM-31836 IF drops occur with TCP IPv6 traffic on DPDK computes. To observe the drops, bring up VM1 on the compute1 node and VM2 on the compute2 node, and then send IPv6 traffic through iperf at a rate of 1 MBps.
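The CEM-31836 traffic test can be sketched with iperf3. This assumes iperf3 is installed in both VMs and that VM2 is already running iperf3 -s; the function name iperf_ipv6_repro and the address 2001:db8::2 (a documentation prefix) are hypothetical. iperf3 -b takes bits per second, so the default of 8M assumes that 1 MBps means one megabyte per second; pass 1M if one megabit was meant. With DRY_RUN=1 (the default) the command is only printed.

```shell
#!/bin/sh
# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper, run on VM1 (compute1). TCP is the iperf3 default.
# Arguments: VM2's IPv6 address, optional rate in bits/s (8M ~ 1 MByte/s).
iperf_ipv6_repro() {
    server_addr="$1"
    rate="${2:-8M}"
    # -6: IPv6 only, -b: target bitrate, -t: test duration in seconds.
    run iperf3 -6 -c "$server_addr" -b "$rate" -t 60
}
```

Usage: DRY_RUN=0 iperf_ipv6_repro 2001:db8::2, while watching the interface drop counters on the DPDK computes.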
DPDK and SR-IOV
-
CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.
-
JCB-177787 In DPDK vRouter use cases that require netns, such as SNAT and LBaaS, jumbo MTU cannot be set. The maximum allowed MTU is 1500.
General Routing
-
CEM-27502 To enable QoS on Intel XXV710 NICs, you must first perform the following procedure on the compute node:
1. Reboot the compute node.
2. Enter the systemctl start lldpad command.
3. Enter the systemctl status lldpad and dcbtool sc ens1f0 dcb on commands.
You should now be able to enable QoS on the NIC. If you are unable to enable QoS on the NIC after performing step 3, proceed to step 4.
4. Enter the dcbtool sc dcbx v:force-cee command.
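The CEM-27502 commands that follow the reboot can be collected into one helper. This is a sketch only: the function name enable_xxv710_dcb and the FORCE_CEE switch are hypothetical, ens1f0 is the example interface from the procedure, and the force-cee command is copied exactly as given in the release note. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: run on the compute node after it has been rebooted.
enable_xxv710_dcb() {
    nic="${1:-ens1f0}"
    run systemctl start lldpad
    run systemctl status lldpad
    run dcbtool sc "$nic" dcb on
    # Last step, only if QoS still cannot be enabled on the NIC.
    if [ "${FORCE_CEE:-0}" = 1 ]; then
        run dcbtool sc dcbx v:force-cee
    fi
}
```

Try the helper without FORCE_CEE first, and set FORCE_CEE=1 only if QoS still cannot be enabled afterwards.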
Kubernetes
-
JCB-187287 High Availability provisioning of Kubernetes master is not supported.
Contrail Fabric Management
-
CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. That is, if more than one BMS is associated with a VPG and one of them needs to be moved, the whole VPG must be deleted and reconfigured according to the new association.
-
CEM-4358 In Contrail fabric deployments, configuring QFX5110 as a spine (CRB-Gateway) does not work.
-
CEM-8149 BMS LCM is not supported on a fabric with enterprise_style=True. By default, enterprise_style is set to False. Do not set enterprise_style=True if the fabric object onboards the BMS LCM instance.
-
CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.
-
CEM-20794 In Contrail fabric management with the ML2 plugin, configuring LAG in VMs associated with SR-IOV VFs does not work.
-
CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.
Known Behavior in Contrail Networking Release 21.4.L3
- OpenStack
- Deployment
- Config
- vRouter (Kernel, DPDK), vRouter Agent
- DPDK and SR-IOV
- SmartNIC
- General Routing
- Contrail Fabric Management
- Kubernetes
- General
OpenStack
-
CEM-30003 When a project that has resources is renamed in OpenStack, the new project name is not updated in Contrail Networking Release 21.4.L2. As a result, the project name in OpenStack differs from the project name in Contrail Networking Release 21.4.L2. However, if the project has no resources, then when a resource is later created in the project, the project name in Contrail Networking Release 21.4.L2 matches the name in OpenStack.
-
CEM-29229 An IPU upgrade from Contrail Networking Release 21.4 to Contrail Networking Release 21.4.L1 may fail with the rabbitmq container stuck in the removing state.
Workaround: Complete the following procedure:
1. Enter the pcs resource cleanup command on all OpenStack nodes and ensure that all resources are up.
2. Increase the timeout to 600 seconds by running the pcs resource op defaults update timeout=600s command.
3. Enter the openstack overcloud update converge command with all parameters.
4. After the update converge completes, you may have to clean up some converge failures.
5. To clean up these converge failures, re-enter the openstack overcloud update converge command with all parameters.
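The CEM-29229 workaround can be sketched as two helpers, because the pcs commands run on the OpenStack nodes while the converge command runs wherever you normally run overcloud commands. The function names are hypothetical; the deployment arguments are not shown in the release note, so they are simply forwarded unchanged. The pcs status check and the retry-on-failure behavior are assumptions about how to "ensure all resources are up" and "re-enter" converge. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: run on each OpenStack node.
cem29229_pcs_prep() {
    run pcs resource cleanup
    # Review this output and confirm that all resources are up.
    run pcs status
    # Cluster-wide default; setting the same value again is harmless.
    run pcs resource op defaults update timeout=600s
}

# Hypothetical helper: pass the same arguments used for the deployment.
cem29229_converge() {
    # The second converge run happens only if the first one fails.
    run openstack overcloud update converge "$@" ||
        run openstack overcloud update converge "$@"
}
```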
-
CEM-29059 A fast forward upgrade (FFU) from Contrail Networking Release 1912 to Contrail Networking Release 21.4.L1 is not supported. The upgrade attempt fails while upgrading the controller.
-
CEM-26599 In deployments using remote compute, creating a virtual machine (VM) on a remote site can take a long time, in some cases more than 6 minutes.
-
CEM-27083 During deployment, a core error sometimes occurs in the Contrail Collector service. The service restarts automatically, and no functional impact is observed.
-
CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.
-
CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:
1. On one of the controllers, edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep cluster address, as shown here:
wsrep_cluster_address = gcomm://
#wsrep_cluster_address = gcomm://10.x.x.8:4567,10.x.x.10:4567,10.x.x.11:4567
Note: If all the controllers are shut down at the same time in the managed scenario, you must select the controller that was shut down last.
2. Start the mariadb container (docker start mariadb) on the controller where you edited the file.
3. Wait a couple of minutes, make sure that the mariadb container is not restarting, and then start the mariadb container on the remaining controllers.
4. Revert the changes to the /etc/kolla/mariadb/galera.cnf file and restart the mariadb container on the previously selected controller.
-
CEM-15764 With the Octavia load balancer, traffic destined to the floating IP address of the load balancer VM is not directed to the backend VMs. Traffic destined to the actual IP address of the load balancer VM works as expected.
-
CEM-9979 During an upgrade of DPDK computes deployed with TripleO (OOO) Heat templates in a RHOSP environment, vRouter core dumps are observed. They are caused by the order in which the services start during the upgrade and have no impact on cluster operation.
Deployment
-
CEM-31309 After Contrail Networking is deployed, the Contrail Command user interface is not accessible. The user interface becomes accessible only after the firewall service is stopped.
Workaround: Starting with Contrail Networking Release 21.4.L3, perform the following steps to access the Contrail Command user interface:
1. Install sshpass on the nodes before deploying Contrail Command.
2. After deploying Contrail Command, manually disable firewalld on the node where Contrail Command is installed.
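The two CEM-31309 steps map to standard package and systemd commands. A minimal sketch, assuming yum-based (RHEL or CentOS) hosts like the ones in the Docker upgrade script elsewhere in these notes; the function names are hypothetical. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper, step 1: on each node, before deploying Contrail Command.
prepare_command_node() {
    run yum install -y sshpass
}

# Hypothetical helper, step 2: on the Contrail Command node after deployment.
# Stop firewalld now and keep it from starting again at boot.
open_command_ui() {
    run systemctl stop firewalld
    run systemctl disable firewalld
}
```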
-
CEM-30799 During the Zero Impact Upgrade (ZIU) procedure, after the ZIU playbook upgrades the controllers, the contrail-controller-config-schema and contrail-controller-config-svcmonitor services restart continuously. Because Keystone is down, the Contrail config API remains in the initializing state.
Workaround: In Contrail Networking Release 21.4.L3, before performing any upgrade (ZIU or ISSU), upgrade Docker on the container hosts serially, one host at a time. You can, however, upgrade Docker on the computes in parallel by using a script. After upgrading Docker on each host, verify the status of the Contrail services. Do not proceed to the next host until contrail-status reports that all services are running properly.
Use the following script to stop the running containers, upgrade Docker, and start the containers again:
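# Record the names of the containers that are currently running.
sudo docker ps --format '{{.Names}}' > running_containers
# Stop those containers.
for CONTAINER in $(cat running_containers); do sudo docker stop $CONTAINER; done
# Upgrade Docker to version 20.10.9.
sudo yum install -y docker-ce-20.10.9 docker-ce-cli-20.10.9 docker-ce-rootless-extras-20.10.9
# Start the same containers again.
for CONTAINER in $(cat running_containers); do sudo docker start $CONTAINER; done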
-
CEM-28044 After upgrading Contrail Networking Release 2011.L3 to Release 21.4.L1, the config API server is stuck in the initializing state.
Workaround: To keep the config API server from getting stuck, perform the following procedure:
1. Download the commandutil binary from https://support.juniper.net/support/downloads/?p=contrail#sw. The following commands assume that it is saved as /tmp/commandutil.
2. Make the binary executable:
chmod u+x /tmp/commandutil
3. Copy the binary into the contrail_command container:
docker cp /tmp/commandutil contrail_command:/usr/bin/commandutil_21.4.L2
4. Run the utility (commandutil_21.4.L2) to export the database to a YAML file (db.yml):
docker exec contrail_command commandutil_21.4.L2 convert --intype rdbms --outtype yaml --out /etc/contrail/db.yml -c /etc/contrail/command-app-server.yml
5. Create a backup directory and move the database file into it:
mkdir -p ~/backups
mv /etc/contrail/db.yml ~/backups/
Config
-
CEM-30772 The Ansible deployer of Contrail Networking Release 21.4.L2 introduces a link loop in the /var/log/contrail directory on the Contrail config nodes. This happens every time the Release 21.4.L2 Ansible deployer is started, and re-running the Ansible deployer playbooks fails because of this recursion. This issue is resolved in Contrail Networking Release 21.4.L3. For Contrail Networking Release 21.4.L2, manual intervention is required.
Workaround: Manually remove the incorrect symlink from all Contrail config nodes:
sudo unlink /var/log/contrail/config-database-rabbitmq/config-database-rabbitmq
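Because the CEM-30772 cleanup has to be repeated on every config node, a guarded version can help make re-runs harmless. This is a sketch: the function name remove_rabbitmq_log_loop is hypothetical, it assumes it runs as root (or under sudo), and it only calls unlink when the path really is a symbolic link.

```shell
#!/bin/sh
# Hypothetical helper: remove the self-referencing rabbitmq log link.
# Optional argument: base log directory (default /var/log/contrail).
remove_rabbitmq_log_loop() {
    base="${1:-/var/log/contrail}"
    link="$base/config-database-rabbitmq/config-database-rabbitmq"
    if [ -L "$link" ]; then
        unlink "$link"
    else
        # Nothing to do: the link is absent or is not a symlink.
        echo "no loop link at $link" >&2
    fi
}
```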
-
CEM-31301 On Contrail nodes that run the Cassandra service, install disks of the same speed. The Cassandra service is sensitive to timing, and all instances of the service must stay in sync with each other. Disks with different speeds can cause Cassandra to fall out of sync.
vRouter (Kernel, DPDK), vRouter Agent
-
CEM-31080 Restricting the virtual machine MTU to less than the vhost MTU would have to be done from the config node of Contrail Networking. However, in the current architecture of Contrail Networking, the config node is not aware of the vhost MTU, so Contrail Networking does not support this option.
DPDK and SR-IOV
-
CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, intermittent performance drops are observed along with checksum errors. Functionality is not impacted.
-
CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.
-
CEM-18922 On a DPDK compute, the memory of the VMs is mapped to only one NUMA node. If a VM is launched with a flavor that sets hw:mem_page_size='any', VM creation fails after the hugepages in that NUMA node are exhausted. As a workaround, use a flavor with hw:mem_page_size='large' instead.
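For CEM-18922, the page-size property can be set on a flavor with the OpenStack client. A minimal sketch; the flavor name m1.dpdk in the usage example and the function name use_large_pages are hypothetical. Nova copies the flavor into an instance when the instance is created, so the change affects newly launched VMs. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: switch a flavor to large pages and show the result.
use_large_pages() {
    flavor="$1"
    run openstack flavor set --property hw:mem_page_size=large "$flavor"
    run openstack flavor show "$flavor" -c properties
}
```

Usage: DRY_RUN=0 use_large_pages m1.dpdk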
CEM-26008 When creating a bond interface, you can use any word as the interface name, but we recommend using "bond".
-
CEM-18163 On a DPDK compute, if contrail-vrouter-agent crashes, or if contrail-vrouter-agent is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop and then start the instances whose interfaces are down.
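The CEM-18163 stop/start workaround can be sketched with the OpenStack client. Because a stop request completes asynchronously, the sketch waits for the SHUTOFF status before starting the instance again. The function names, the 30 x 5 second polling budget, and the instance name vm-1 are assumptions for illustration. With DRY_RUN=1 (the default) the commands are only printed and no polling is done.

```shell
#!/bin/sh
# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: poll until the server reports the wanted status.
wait_status() {
    server="$1"; want="$2"; tries="${3:-30}"
    [ "${DRY_RUN:-1}" = 1 ] && return 0
    while [ "$tries" -gt 0 ]; do
        status=$(openstack server show "$server" -f value -c status)
        [ "$status" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep 5
    done
    return 1
}

# Hypothetical helper: stop an instance, wait for SHUTOFF, then start it.
restart_instance() {
    run openstack server stop "$1"
    wait_status "$1" SHUTOFF || return 1
    run openstack server start "$1"
}
```

Usage: for vm in vm-1 vm-2; do DRY_RUN=0 restart_instance "$vm"; done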
-
CEM-15561 vRouter offload with Mellanox NIC cards does not work. However, DPDK on Mellanox NICs without offload is supported.
-
CEM-13685 DPDK vRouter with MLNX CX5 takes about 10 minutes, and an lcore crash is seen. This happens once, during the initial installation.
-
JCB-177787 In DPDK vRouter use cases that require netns, such as SNAT and LBaaS, jumbo MTU cannot be set. The maximum allowed MTU is 1500.
SmartNIC
-
CEM-11163 On Fortville X710 NICs, performance degrades when TX and RX buffers are configured, because the mbufs get exhausted.
General Routing
-
CEM-30058 In environments using managed physical network function (PNF) devices with an SRX device, the PNF service chain is not pushing the configuration to the SRX device. The PNF service chain, therefore, does not work.
Workaround: No known workaround in environments using managed PNF devices. The issue is not experienced in environments using unmanaged PNF devices.
-
CEM-30023 In remote compute environments using BGP, routes are not getting populated for the control node and for the BGP primary and secondary peers.
-
CEM-30002 The Contrail Collector is unexpectedly failing.
Workaround: No known workaround. The Contrail Collector services are restarted and return online automatically after the failure. This issue has no impact on traffic forwarding performance.
-
CEM-29723 In environments deployed using Ansible, the Contrail Collector is unexpectedly failing.
Workaround: No known workaround. The Contrail Collector services are restarted and return online automatically after the failure. This issue has no impact on traffic forwarding performance.
-
CEM-28889 BGP as a Service (BGPaaS) does not work across multiple pods in the same VM due to BGPaaS sessions flapping.
Workaround: Always configure BGPaaS to be deployed with a single pod in a single VM.
-
CEM-29152 A configuration push for a logical router interconnect is failing when a routing policy is configured.
-
CEM-28914 The Contrail vRouter is dropping packets from a virtual machine (VM) to a bare metal server (BMS) on an external network that is extended to a Juniper Networks MX series router.
Workaround: Disable Reverse Path Forwarding (RPF).
-
CEM-27502 To enable QoS on Intel XXV710 NICs, you must first perform the following procedure on the compute node:
1. Reboot the compute node.
2. Enter the systemctl start lldpad command.
3. Enter the systemctl status lldpad and dcbtool sc ens1f0 dcb on commands.
You should now be able to enable QoS on the NIC. If you are unable to enable QoS on the NIC after performing step 3, proceed to step 4.
4. Enter the dcbtool sc dcbx v:force-cee command.
-
CEM-25388 Updating the DNS server attribute of a VN does not work. To update, add, or remove DNS server details in a VN, delete the VN and re-create it with the desired DNS server details.
-
CEM-20477 In an EVPN service chain, a service chain with a transparent firewall does not establish a connection between the left and right VNs.
-
CEM-21835 An allowed address pair (AAP) with a prefix shorter than /32 must not be used if a Bare Metal Server (BMS) route (VXLAN route) is within the AAP CIDR.
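The CEM-21835 rule can be checked before an AAP is configured. The following IPv4-only sketch treats an AAP as a conflict when its prefix is shorter than /32 and the BMS address falls inside it; the function names are hypothetical, and addresses are assumed to be plain dotted quads without leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: pack a dotted-quad IPv4 address into one integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: returns 0 (conflict) when the AAP prefix is shorter
# than /32 and the BMS address lies inside the AAP CIDR, 1 otherwise.
# Usage: aap_conflicts_with_bms AAP_CIDR BMS_IP
aap_conflicts_with_bms() {
    aap_net="${1%/*}"
    aap_len="${1#*/}"
    [ "$aap_len" -lt 32 ] || return 1
    # Network mask for the prefix length, limited to 32 bits.
    mask=$(( aap_len == 0 ? 0 : (0xFFFFFFFF << (32 - aap_len)) & 0xFFFFFFFF ))
    net=$(ip_to_int "$aap_net")
    ip=$(ip_to_int "$2")
    [ $(( net & mask )) -eq $(( ip & mask )) ]
}
```

For example, aap_conflicts_with_bms 10.1.2.0/24 10.1.2.50 succeeds (the /24 AAP must not be used), while a /32 AAP never reports a conflict.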
-
CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.
-
CEM-21570 The AS_PATH retain does not work with a 4-Byte autonomous system number (ASN) enabled cluster.
-
CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port mirroring when Juniper headers are enabled. The Juniper header is not supported in port mirroring because VXLAN is not the tunnel type used for the dynamic next hop in this case.
-
CEM-20419 A BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and the VN is associated with a BFD health check whose target-ip is set to "all".
-
CEM-5043 Updating the VNI on an LR does not update the route table. As a workaround, delete the logical router and create a new logical router with the new VNI.
-
CEM-3245 Multicast traffic originating from QFX devices that do not support EVPN Type 6 routes is duplicated by vRouters.
-
CEM-22632 When an MX Series router acts as the data center gateway and MPLSoUDP is used between the controllers and the MX, the floating IP use case does not work. Use MPLSoGRE or VXLAN instead.
Contrail Fabric Management
-
CEM-29418 Hitless software upgrade attempts are failing with a "MODULE FAILURE\nSee stdout/stderr for the exact error" error.
Workaround: QFX Series switches can be upgraded manually by logging in to the console of the device. You can achieve a hitless upgrade by putting the QFX Series switch that is the target of the upgrade into maintenance mode.
-
CEM-29154 The Zero Touch Provisioning (ZTP) process fails to onboard an MX Series router due to an IP address not getting assigned to an FXP interface.
-
CEM-28971 An attempt to onboard a fabric using Zero Touch Provisioning (ZTP) does not complete. The control and collector processes remain in the initializing state, and the RabbitMQ process restarts continuously on one control node.
-
CEM-28941 Configurations are not getting pushed to the fabric device.
Workaround: Restart RabbitMQ and the DM container.
-
CEM-23931 Provisioning Contrail Fabric Manager using Ansible deployer intermittently fails while installing swift-client. If this condition occurs, preinstall swift-client on the controllers and then rerun the provisioning.
-
CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.
-
CEM-18381 QFX5120 cannot be used in the border leaf role with the CRB role in an SP-style fabric.
-
CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.
-
CEM-14679 In the fabric unmanaged PNF use case with a CRB gateway, DM pushes some bogus static routes under the LR VRF on the spines.
As a workaround, change the value of the dummy_ip variable inside the device_manager Docker container. The line number in the following link is based on the Release 2008 code base:
https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594
After changing the value to the desired subnet and saving the file, restart the DM Docker container for the change to take effect. Perform this step at the beginning, before fabric onboarding.
-
CEM-14264 In Release 2003, the Virtual Port Group create workflow does not pre-populate the VLAN-ID with the existing value that was defined in the first VPG for a given virtual network. Unlike in previous releases, the field is editable. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance check box enabled.
-
CEM-13767 Although Contrail Fabric Manager lets you use custom image names for fabric devices, for vmhost-based platforms such as QFX10000-60C the image that you upload to CFM must be named in the junos-vmhost-install-x.tgz format.
-
CEM-8701 Onboarding multiple BMS in parallel on an SP-style fabric does not work. When bringing up a BMS using the Life Cycle Management workflow, sometimes on faster servers the re-image does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes between the BMS port and the TFTP service running on the Contrail nodes have converged. As a workaround, reboot the servers or configure the server BIOS to delay booting.
-
CEM-8149 BMS LCM is not supported on a fabric with enterprise_style=True. By default, enterprise_style is set to False. Do not set enterprise_style=True if the fabric object onboards the BMS LCM instance.
-
CEM-4358 In Contrail fabric deployments, configuring QFX5110 as a spine (CRB-Gateway) does not work.
-
CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. That is, if more than one BMS is associated with a VPG and one of them needs to be moved, the whole VPG must be deleted and reconfigured according to the new association.
-
CEM-20794 In Contrail fabric management with the ML2 plugin, configuring LAG in VMs associated with SR-IOV VFs does not work.
-
CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.
-
CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between the fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also be assigned the RR (route reflector) RB role. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric's leaf devices. Therefore, for L2 DCI mode, we recommend that you assign the physical router device both the DCI Gateway RB role and the RR role.
-
CEM-19802 Security Groups cannot be used on QFX10K interfaces.
Kubernetes
-
CEM-29109 In Kubernetes environments using the Tungsten Fabric operator (tf-operator), the Contrail Query Engine goes down unexpectedly while deployed.
Workaround: No known workarounds. The Query Engine restarts automatically.
-
CEM-29067 In a Kubernetes environment deployed using tf-operator, the Contrail collector process fails during upgrades. The Contrail collector services restart and come back online automatically after the failure.
-
JCB-187287 High Availability provisioning of Kubernetes master is not supported.
General
-
CEM-30854 In Ansible-based deployments, do not use Contrail Command to trigger a cluster upgrade. Instead, run the Ansible playbooks directly.
-
CEM-30961 The non-default MTU feature does not work if non-Linux virtual machines such as vSRX are used as workloads.
-
CEM-31060 If multiple config_api instances restart at the same time in a scaled environment, API connection errors may occur because the databases are out of sync.
Workaround: If you intend to restart the containers, restart one config_api_1 container at a time, and wait until that API server finishes its database resync before you restart the next config_api_1 container. If the system goes into an error state because the config_api instances restarted in parallel, do the following:
1. Repair the config Cassandra database.
2. Delete the entire /vnc_api_server_locks folder from ZooKeeper.
3. Restart the first config_api_1 container and run a curl command against it until it returns valid data. Repeat this for each remaining config_api_1 container.
4. Restart the collector.
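The one-at-a-time restart in CEM-31060 comes down to "restart, then poll until the API answers, then move on". The following is a sketch under several assumptions: retry_until and restart_config_api_serially are hypothetical names, CONTROLLERS is a hypothetical space-separated list of controller hosts reachable over ssh with Docker permissions, and the config API is assumed to answer an unauthenticated GET on port 8082 (add credentials to the curl call if Keystone authentication is enabled).

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS command [args...]
retry_until() {
    attempts="$1"
    delay="$2"
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical helper: restart config_api_1 on one controller at a time and
# wait (up to 60 x 10 s) for that API server to return data before the next.
restart_config_api_serially() {
    for host in $CONTROLLERS; do
        ssh "$host" docker restart config_api_1 || return 1
        retry_until 60 10 curl -sf "http://$host:8082/" >/dev/null || return 1
    done
}
```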
-
CEM-31080 When configuring a non-default MTU on virtual networks, we recommend that you do not configure an MTU larger than the MTU of the vhost0 interface.
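A quick pre-check for the CEM-31080 recommendation compares the planned virtual network MTU with the vhost0 MTU on a compute node. This sketch reads the value from Linux sysfs (/sys/class/net/vhost0/mtu), assuming vhost0 is visible there on the node; the function name vn_mtu_ok is hypothetical, and the sysfs path can be overridden for testing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the planned VN MTU does not exceed the
# vhost0 MTU. Returns 1 if it is larger, 2 if the MTU cannot be read.
vn_mtu_ok() {
    vn_mtu="$1"
    mtu_file="${2:-/sys/class/net/vhost0/mtu}"
    vhost_mtu=$(cat "$mtu_file") || return 2
    if [ "$vn_mtu" -gt "$vhost_mtu" ]; then
        echo "VN MTU $vn_mtu exceeds vhost0 MTU $vhost_mtu" >&2
        return 1
    fi
    return 0
}
```

Run it on each compute node with the MTU you plan to configure, for example vn_mtu_ok 9000.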
-
CEM-31084 If the MTU is specified at both the VMI and VN levels, the VMI MTU should take precedence. However, if DHCP option 26 is enabled on the VMI, the VMI MTU does not take precedence.
-
CEM-31095 The non-default MTU feature does not work with sub-interfaces.
Known Behavior in Contrail Networking Release 21.4.L2
- OpenStack
- DPDK and SR-IOV
- SmartNIC
- General Routing
- Contrail Fabric Management
- Kubernetes
- General
- Telemetry and Analytics
OpenStack
CEM-30003 When a project that has resources is renamed in OpenStack, the new project name is not updated in Contrail Networking Release 21.4.L2. As a result, the project name in OpenStack differs from the project name in Contrail Networking Release 21.4.L2. However, if the project has no resources, then when a resource is later created in the project, the project name in Contrail Networking Release 21.4.L2 matches the name in OpenStack.
CEM-29229 An IPU upgrade from Contrail Networking Release 21.4 to Contrail Networking Release 21.4.L1 may fail with the rabbitmq container stuck in the removing state.
Workaround: Complete the following procedure:
1. Enter the pcs resource cleanup command on all OpenStack nodes and ensure that all resources are up.
2. Increase the timeout to 600 seconds by running the pcs resource op defaults update timeout=600s command.
3. Enter the openstack overcloud update converge command with all parameters.
4. After the update converge completes, you may have to clean up some converge failures.
5. To clean up these converge failures, re-enter the openstack overcloud update converge command with all parameters.
CEM-29059 A fast forward upgrade (FFU) from Contrail Networking Release 1912 to Contrail Networking Release 21.4.L1 is not supported. The upgrade attempt fails while upgrading the controller.
CEM-26599 In deployments using remote compute, creating a virtual machine (VM) on a remote site can take a long time, in some cases more than 6 minutes.
CEM-27083 During deployment, a core error sometimes occurs in the Contrail Collector service. The service restarts automatically, and no functional impact is observed.
CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.
CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:
1. On one of the controllers, edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep cluster address, as shown here:
wsrep_cluster_address = gcomm://
#wsrep_cluster_address = gcomm://10.x.x.8:4567,10.x.x.10:4567,10.x.x.11:4567
Note: If all the controllers are shut down at the same time in the managed scenario, you must select the controller that was shut down last.
2. Start the mariadb container (docker start mariadb) on the controller where you edited the file.
3. Wait a couple of minutes, make sure that the mariadb container is not restarting, and then start the mariadb container on the remaining controllers.
4. Revert the changes to the /etc/kolla/mariadb/galera.cnf file and restart the mariadb container on the previously selected controller.
CEM-15764 With the Octavia load balancer, traffic destined to the floating IP address of the load balancer VM is not directed to the backend VMs. Traffic destined to the actual IP address of the load balancer VM works as expected.
CEM-9979 During an upgrade of DPDK computes deployed with TripleO (OOO) Heat templates in a RHOSP environment, vRouter core dumps are observed. They are caused by the order in which the services start during the upgrade and have no impact on cluster operation.
DPDK and SR-IOV
CEM-29841 In environments using DPDK, BGP sessions do not come up because of incorrect next-hop programming in packet flows.
CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, intermittent performance drops are observed along with checksum errors. Functionality is not impacted.
CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.
CEM-18922 On a DPDK compute, the memory of the VMs is mapped to only one NUMA node. If a VM is launched with a flavor that sets hw:mem_page_size='any', VM creation fails after the hugepages in that NUMA node are exhausted. As a workaround, use a flavor with hw:mem_page_size='large' instead.
CEM-18408 With DPDK 19.11 and X710 NICs, performance degrades because of an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K TX and RX descriptors on Fortville NICs for better and more consistent performance, but this setting appears to degrade performance on X710 NICs.
CEM-18163 On a DPDK compute, if contrail-vrouter-agent crashes or if contrail-vrouter-agent is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop / start the instances whose interfaces are down.
CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.
CEM-15561 vRouter offload with Mellanox NIC cards does not work. However, DPDK on Mellanox NICs without offload is supported.
CEM-13685 DPDK vRouter with MLNX CX5 takes about 10 minutes, and an lcore crash is seen. This happens once, during the initial installation.
JCB-177787 In DPDK vRouter use cases that require netns, such as SNAT and LBaaS, jumbo MTU cannot be set. The maximum allowed MTU is 1500.
SmartNIC
CEM-29739 In deployments that use Ansible, XMPP might go down because of vDNS server scaling issues.
Workaround: Configure fewer than 200 vDNS servers.
CEM-28861 On a compute node using an Intel 3000 SmartNIC, the deployment sometimes fails because of issues with initializing the NIC. The log shows a series of mbuf errors.
Workaround: Restart the contrail-vrouter-dpdk container on the failing node and rerun the deployment. You can also delete the problematic compute node and redeploy the cluster.
CEM-11163 On Fortville X710 NICs, performance degrades when TX and RX buffers are configured, because the mbufs get exhausted.
General Routing
CEM-30058 In environments using managed physical network function (PNF) devices with an SRX device, the PNF service chain is not pushing the configuration to the SRX device. The PNF service chain, therefore, does not work.
Workaround: No known workaround in environments using managed PNF devices. The issue is not experienced in environments using unmanaged PNF devices.
CEM-30023 In remote compute environments using BGP, routes are not getting populated for the control node and for the BGP primary and secondary peers.
CEM-30002 The Contrail Collector is unexpectedly failing.
Workaround: No known workaround. The Contrail Collector services are restarted and return online automatically after the failure. This issue has no impact on traffic forwarding performance.
CEM-29723 In environments deployed using Ansible, the Contrail Collector is unexpectedly failing.
Workaround: No known workaround. The Contrail Collector services are restarted and return online automatically after the failure. This issue has no impact on traffic forwarding performance.
CEM-28889 BGP as a Service (BGPaaS) does not work across multiple pods in the same VM due to BGPaaS sessions flapping.
Workaround: Always configure BGPaaS to be deployed with a single pod in a single VM.
CEM-29152 A configuration push for a logical router interconnect is failing when a routing policy is configured.
CEM-28914 The Contrail vRouter is dropping packets from a virtual machine (VM) to a bare metal server (BMS) on an external network that is extended to a Juniper Networks MX series router.
Workaround: Disable Reverse Path Forwarding (RPF).
CEM-27502 To enable QoS on Intel XXV710 NICs, you must first perform the following procedure on the compute node:
1. Reboot the compute node.
2. Enter the systemctl start lldpad command.
3. Enter the systemctl status lldpad and dcbtool sc ens1f0 dcb on commands.
You should now be able to enable QoS on the NIC. If you are unable to enable QoS on the NIC after performing step 3, proceed to step 4.
4. Enter the dcbtool sc dcbx v:force-cee command.
CEM-25388 Updating the DNS server attribute of a VN does not work. To update, add, or remove DNS server details in a VN, delete the VN and re-create it with the desired DNS server details.
CEM-20477 In an EVPN service chain, a service chain with a transparent firewall does not establish a connection between the left and right VNs.
CEM-21835 An allowed address pair (AAP) with a prefix shorter than /32 must not be used if a Bare Metal Server (BMS) route (VXLAN route) is within the AAP CIDR.
CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.
CEM-21570 The AS_PATH retain does not work with a 4-Byte autonomous system number (ASN) enabled cluster.
CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port mirroring when Juniper headers are enabled. The Juniper header is not supported in port mirroring because VXLAN is not the tunnel type used for the dynamic next hop in this case.
CEM-20419 A BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and the VN is associated with a BFD health check whose target-ip is set to "all".
CEM-5043 Updating the VNI on an LR does not update the route table. As a workaround, delete the logical router and create a new logical router with the new VNI.
CEM-3245 Multicast traffic originating from QFX devices that do not support EVPN Type 6 routes is duplicated by vRouters.
CEM-22632 When an MX Series router acts as the data center gateway and MPLSoUDP is used between the controllers and the MX, the floating IP use case does not work. Use MPLSoGRE or VXLAN instead.
Contrail Fabric Management
CEM-29418 Hitless software upgrade attempts fail with a "MODULE FAILURE\nSee stdout/stderr for the exact error" error.
Workaround: You can upgrade QFX Series switches manually by logging in to the console of the device. You can achieve a hitless upgrade by putting the QFX Series switch that is the target of the upgrade into maintenance mode.
CEM-29154 The Zero Touch Provisioning (ZTP) process fails to onboard an MX Series router due to an IP address not getting assigned to an FXP interface.
CEM-28971 An attempt to onboard a fabric using Zero Touch Provisioning (ZTP) does not complete. The control and collector processes remain in the initializing state, and the RabbitMQ process restarts continuously on one control node.
CEM-28941 Configurations are not getting pushed to the fabric device.
Workaround: Restart RabbitMQ and the DM container.
CEM-23931 Provisioning Contrail Fabric Manager using the Ansible deployer intermittently fails while installing swift-client. If this condition occurs, preinstall swift-client on the controllers and then rerun the provisioning.
CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.
CEM-18381 A QFX5120 device cannot be used in the border leaf role for CRB in an SP-style fabric.
CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.
CEM-14679 In the fabric unmanaged PNF use case with a CRB gateway, Device Manager (DM) pushes some bogus static routes under the LR VRF on the spines.
As a workaround, change the value of the dummy_ip variable inside the device_manager container. The line number in the following link is based on the R2008 release code base:
https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594
After you change the value to the desired subnet and save the file, restart the DM container for the change to take effect. Perform this step at the beginning, before fabric onboarding.
CEM-14264 In release 2003, the Virtual Port Group create workflow will not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. The field is editable unlike in previous releases. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance checkbox enabled.
CEM-13767 Although Contrail Fabric Manager (CFM) lets you use custom image names for fabric devices, for VM host-based platforms such as the QFX10000-60C you must name the image in the junos-vmhost-install-x.tgz format when you upload it to CFM.
CEM-8701 Onboarding multiple BMSs in parallel on an SP-style fabric does not work. When you bring up a BMS using the Life Cycle Management workflow, the re-image sometimes does not go through on faster servers, and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes between the BMS port and the TFTP service running on the Contrail nodes have converged. As a workaround, reboot the servers or configure the BIOS on the servers to delay the boot.
CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.
CEM-4358 In Contrail fabric deployments, configuring a QFX5110 as a spine (CRB-Gateway) does not work.
CEM-3959 Moving a BMS across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. If more than one BMS is associated with one VPG and one of them must be moved, the whole VPG must be deleted and reconfigured according to the new association.
CEM-20794 In Contrail fabric management with ML2 plugin, configuring LAG in VMs associated with SRIOV VFs will not work.
CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.
CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between the fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also be assigned the RR (route reflector) RB role. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric’s leaf devices. For L2 DCI mode, we therefore recommend that you mark the physical router device with both the DCI Gateway RB role and the RR role.
CEM-19802 Security Groups cannot be used on QFX10K interfaces.
Kubernetes
CEM-29109 In Kubernetes environments using the Tungsten Fabric operator (tf-operator), the Contrail Query Engine goes down unexpectedly while deployed.
Workaround: No known workarounds. The Query Engine restarts automatically.
CEM-29067 In a Kubernetes environment deployed using tf-operator, the Contrail collector process fails during upgrades. The Contrail collector services restart and come back online automatically after the failure.
JCB-187287 High Availability provisioning of Kubernetes master is not supported.
General
CEM-30015 During a Fast Forward upgrade from Contrail Networking Release 1912.L4 to 21.4.L2, a RHOSP-upgrade-Kernel crash might be observed on a kernel compute node.
CEM-29197 A virtual port group (VPG) cannot be deleted after removing a virtual machine interface (VMI). Workaround: Delete the VMI instead of removing the VMI reference from the VPG.
CEM-29163 The Contrail Agent sometimes fails in remote compute environments during the overcloud load creation process. The agent automatically recovers and restarts.
CEM-28044 The Contrail Networking upgrade process might back up Config DB nodes in the wrong order.
Workaround: Use the updated commandutil utility to back up the data instead. Alternatively, manually inspect the backup file and create a list of contrail_config_database_node UUIDs in the correct order. Pass this list to commandutil during the restore.
The steps to create an ordered list of Config DB nodes are as follows:
Download the commandutil binary from https://webdownload.juniper.net/swdl/dl/secure/site/1/record/165079.html.
Copy commandutil to the /tmp folder.
chmod u+x /tmp/commandutil
docker cp /tmp/commandutil contrail_command:/usr/bin/commandutil_21.4.L2
docker exec contrail_command commandutil_21.4.L2 convert --intype rdbms --outtype yaml --out /etc/contrail/db.yml -c /etc/contrail/command-app-server.yml
mkdir -p ~/backups
mv /etc/contrail/db.yml ~/backups/
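The CEM-28044 commands can be wrapped in one function that stops at the first failing step, sketched below under the assumption that commandutil has already been downloaded to /tmp. The names backup_config_db and DRY_RUN are hypothetical. Like the original commands, it assumes that /etc/contrail/db.yml written inside the contrail_command container is visible at the same path on the host.

```shell
#!/bin/sh
# Hypothetical wrapper for the CEM-28044 steps; assumes commandutil was already
# downloaded to /tmp. Set DRY_RUN=1 to print the commands instead of running them.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# backup_config_db [UTIL] [DEST_DIR]
backup_config_db() {
  util="${1:-/tmp/commandutil}"
  dest="${2:-$HOME/backups}"
  run chmod u+x "$util" || return 1
  run docker cp "$util" contrail_command:/usr/bin/commandutil_21.4.L2 || return 1
  # Dump the RDBMS contents to YAML from inside the contrail_command container.
  run docker exec contrail_command commandutil_21.4.L2 convert \
    --intype rdbms --outtype yaml --out /etc/contrail/db.yml \
    -c /etc/contrail/command-app-server.yml || return 1
  run mkdir -p "$dest" || return 1
  # As in the release note, the dump is then moved from /etc/contrail/db.yml on the host.
  run mv /etc/contrail/db.yml "$dest/"
}
```

Stopping on the first error avoids moving a stale or missing db.yml into the backup folder when the convert step fails.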
CEM-27916 In environments that were deployed using Juju Charms and use Contrail Networking with LXD, nodes that use LXD do not restart after a Contrail Controller reboot. The reboot triggers an iptables issue that Contrail Networking cannot resolve. This iptables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.
Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.
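A minimal sketch of the CEM-27916 workaround that changes the FORWARD policy only when it is not already ACCEPT, reading the current policy from iptables -S FORWARD. The function names and the DRY_RUN switch are hypothetical; run it as root (or through sudo) on each node that uses LXD.

```shell
#!/bin/sh
# Hypothetical helper for CEM-27916; run as root on each node that uses LXD.
# Set DRY_RUN=1 to print the iptables change instead of applying it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# forward_policy: reads `iptables -S FORWARD` output on stdin, prints the chain policy.
forward_policy() {
  awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}

# ensure_forward_accept: applies the workaround only if the policy is not ACCEPT yet.
ensure_forward_accept() {
  policy="$(iptables -S FORWARD | forward_policy)"
  if [ "$policy" != "ACCEPT" ]; then
    run iptables -P FORWARD ACCEPT
  fi
}
```

A policy set with iptables -P is a runtime change; whether it survives the next reboot depends on how the node persists its firewall rules.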
CEM-25524 In a lab scenario with a Kubernetes deployment of one controller and one compute node and inter-namespace traffic enabled, a contrail-controller core was observed very rarely, and the system recovered on its own after the core. This scenario did not occur in HA environments, which are the recommended deployment for production.
CEM-25331 In a scaled environment, restarting a controller node may cause the schema transformer service to restart a few times.
CEM-25109 On the VPG page in the UI, VLAN information is not displayed for the internal SR-IOV VPG. This behavior is intermittent.
JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as a hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler, or if new fragments arrive immediately after the head fragment, the assembler releases them to the flow module. If the agent has not written the flow action by the time the assembler releases the fragments to the flow module, the fragments are enqueued in the hold queue. A maximum of three fragments can be enqueued in the hold queue at a time; the remaining fragments released from the assembler to the flow module are dropped.
As a workaround, the head fragment is enqueued to the assembler only after the agent writes the flow action. If the flow is already present in a non-hold state, the head fragment is immediately enqueued to the assembler.
CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. If you need to add links, delete the service template and add it again.
CEM-18398 Contrail Web UI does not work for System/Node status monitoring. As a workaround, check the status using the CLI on the relevant nodes. This issue does not impact functionality.
CEM-20414 During a Contrail Command container restart, one deploy job is triggered per contrail_cluster object, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.
CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent from the transparent service instance to the QFX carry a VLAN ID in the inner header. This VLAN ID is internal to the vRouter and the QFX is not aware of it, so the switch drops the packets and traffic from the left BMS to the right BMS is lost.
CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.
CEM-16855 The IPv6 IPAM subnet option “enable_dhcp” is always ignored.
Telemetry and Analytics
CEM-30038 The Contrail Web application does not display BFD Neighbors information and Appformix for some of the QFX devices.
Workaround: Remove the show system core-dumps command from the Appformix Monitoring command list.
CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.
Workaround: Perform the following steps:
Stop Kafka on all the analytics nodes.
docker stop analytics_alarm_kafka_1
On a single Contrail controller, perform the Zookeeper cleanup.
docker exec -it config_database_zookeeper_1 bash
bin/zkCli.sh -server <IP>:2181
deleteall /brokers
deleteall /consumers
Start Kafka on all the analytics nodes.
docker start analytics_alarm_kafka_1
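The CEM-21526 steps can be sketched as three shell functions: stop_kafka and start_kafka for each analytics node, and cleanup_zookeeper for a single controller. The function names and DRY_RUN are hypothetical. Instead of an interactive shell, this sketch passes each deleteall command to zkCli.sh directly, which assumes a ZooKeeper 3.5 or later client (deleteall itself needs 3.5+) and that bin/zkCli.sh resolves from the container's working directory, as in the interactive steps above.

```shell
#!/bin/sh
# Hypothetical wrappers for the CEM-21526 steps. Set DRY_RUN=1 to print the commands.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

stop_kafka()  { run docker stop analytics_alarm_kafka_1; }   # each analytics node
start_kafka() { run docker start analytics_alarm_kafka_1; }  # each analytics node

# cleanup_zookeeper ZK_IP: run on a single Contrail controller only.
cleanup_zookeeper() {
  zk_ip="${1:-}"
  [ -n "$zk_ip" ] || { echo "usage: cleanup_zookeeper ZK_IP" >&2; return 2; }
  for znode in /brokers /consumers; do
    # With a command after -server, zkCli.sh runs that command and exits.
    run docker exec config_database_zookeeper_1 \
      bin/zkCli.sh -server "$zk_ip:2181" deleteall "$znode" || return 1
  done
}
```

The order still matters: stop Kafka on every analytics node before the cleanup, and start it again only after both znodes are deleted.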
CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploy to bring up the sFlow nodes.
CEM-20781 Telemetry KPI display is not supported for Junos EVO devices.
CEM-18999 In a heavily scaled data center with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours to re-sync with the API server.
CEM-13380 AppFormix Flows do not show up for multihomed devices on the fabric.
Known Behavior in Contrail Networking Release 21.4.L1
- OpenStack
- DPDK and SR-IOV
- SmartNIC
- General Routing
- Contrail Fabric Management
- Kubernetes
- General
- Telemetry and Analytics
OpenStack
CEM-30003 When a project that has resources is renamed in OpenStack, the new project name is not updated in Contrail Networking Release 21.4.L1. As a result, the project name in OpenStack differs from the project name in Contrail Networking Release 21.4.L1. However, if the project does not have any resources, then when a resource is created in that project, the project name and the resource name in Contrail Networking Release 21.4.L1 are the same as in OpenStack.
CEM-29229 An IPU upgrade from Contrail Networking Release 21.4 to Contrail Networking Release 21.4.L1 may fail with the rabbitmq container stuck in the removing state.
Workaround: Complete the following procedure:
Enter the pcs resource cleanup command on all OpenStack nodes and ensure that all resources are up.
Increase the timeout to 600 seconds by entering the pcs resource op defaults update timeout=600s command.
Enter the openstack overcloud update converge command with all parameters.
After the update converge is complete, you may have to clean up some converge failures.
Re-enter the openstack overcloud update converge command with all parameters to clean up these converge failures.
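The CEM-29229 procedure can be sketched as two shell functions: prepare_pacemaker for the pcs steps and update_converge for the converge runs. Both names and DRY_RUN are hypothetical, and update_converge simply forwards whatever arguments you pass, so supply the same parameters used for the original deployment. pcs status is included only so you can check that all resources are up; the sketch does not parse its output.

```shell
#!/bin/sh
# Hypothetical wrappers for the CEM-29229 procedure. Set DRY_RUN=1 to print the commands.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Clean up Pacemaker resources and raise the default operation timeout.
prepare_pacemaker() {
  run pcs resource cleanup || return 1
  run pcs status            # check that all resources are up before continuing
  run pcs resource op defaults update timeout=600s
}

# First converge run and the rerun: pass the same arguments as the deployment.
update_converge() {
  run openstack overcloud update converge "$@"
}
```

For example, update_converge --templates -e env.yaml, where env.yaml stands in for your own environment files; call it a second time if the first run leaves converge failures.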
CEM-29059 A fast forward upgrade (FFU) from Contrail Networking Release 1912 to Contrail Networking Release 21.4.L1 is not supported. The upgrade attempt fails while upgrading the controller.
CEM-26599 In deployments using remote compute, creating a virtual machine (VM) on a remote site can take an extended amount of time, sometimes more than 6 minutes.
CEM-27083 During deployment, a core error sometimes occurs with the Contrail Collector service. The Contrail Collector service automatically restarts and no functional impact is observed when this issue is experienced.
CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.
CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:
Edit the /etc/kolla/mariadb/galera.cnf file on one of the controllers to remove the wsrep cluster address, as shown here:
wsrep_cluster_address = gcomm://
#wsrep_cluster_address = gcomm://10.x.x.8:4567,10.x.x.10:4567,10.x.x.11:4567
Note: If all the controllers are shut down at the same time in the managed scenario, you must select the controller that was shut down last.
Start the mariadb container (docker start mariadb) on the controller on which you edited the file.
Wait a couple of minutes, ensure that the mariadb container is not restarting, and then start the mariadb container on the remaining controllers.
Revert the changes to the /etc/kolla/mariadb/galera.cnf file and restart the mariadb container on the previously selected controller.
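The CEM-17991 file edit can be scripted so that it is easy to revert, as in the sketch below. It keeps a copy of galera.cnf as galera.cnf.orig, comments out the existing peer list, and adds an empty gcomm:// address. The function names are hypothetical, GNU sed is assumed (sed -i and \n in the replacement), and the container name mariadb is taken from the steps above.

```shell
#!/bin/sh
# Hypothetical helpers for CEM-17991; GNU sed is assumed (sed -i and \n).

# bootstrap_galera_cnf [FILE]: keep a backup, comment out the peer list,
# and add an empty "wsrep_cluster_address = gcomm://" line.
bootstrap_galera_cnf() {
  cnf="${1:-/etc/kolla/mariadb/galera.cnf}"
  # Keep only the first backup, so a rerun does not overwrite the original.
  [ -e "$cnf.orig" ] || cp -p "$cnf" "$cnf.orig" || return 1
  sed -i 's|^wsrep_cluster_address\( *= *gcomm://..*\)$|wsrep_cluster_address = gcomm://\n#wsrep_cluster_address\1|' "$cnf"
}

# restore_galera_cnf [FILE]: put the original file back (last step of the procedure).
restore_galera_cnf() {
  cnf="${1:-/etc/kolla/mariadb/galera.cnf}"
  mv "$cnf.orig" "$cnf"
}

# mariadb_restarting: prints "true" while Docker reports the container as restarting.
mariadb_restarting() {
  docker inspect -f '{{.State.Restarting}}' mariadb
}
```

On the selected controller: run bootstrap_galera_cnf and docker start mariadb; once mariadb_restarting keeps printing false, start mariadb on the remaining controllers; then run restore_galera_cnf and docker restart mariadb on the selected controller.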
CEM-15764 In Octavia Load Balancer, traffic destined to the floating IP of the load balancer VM is not directed to the backend VMs. Traffic destined to the actual IP address of the load balancer VM works as expected.
CEM-9979 During an upgrade of DPDK compute nodes deployed with OOO Heat Templates in a RHOSP environment, vRouter core dumps are observed. This is due to the sequence in which the services are started during the upgrade and has no impact on cluster operation.
DPDK and SR-IOV
CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, an intermittent performance drop is observed along with checksum errors. Functionality is not impacted.
CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.
CEM-18922 On a DPDK compute node, the memory of VMs is mapped to only one NUMA node. If a VM is launched with the hw:mem_page_size='any' flavor, VM creation fails after the hugepages in that NUMA node are exhausted. As a workaround, use the hw:mem_page_size='large' flavor instead.
CEM-18408 In DPDK 19.11 with an X710 NIC, performance degrades due to an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K TX and RX descriptors on Fortville NICs for better and more consistent performance, but these settings appear to degrade performance on the X710 NIC.
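For the CEM-18922 workaround, an existing flavor can be switched from hw:mem_page_size='any' to 'large' with the OpenStack CLI, sketched below. The function name use_large_pages, the DRY_RUN switch, and any flavor name you pass are hypothetical; instances that are already running are generally not affected by a flavor change, so the setting applies to VMs launched afterwards.

```shell
#!/bin/sh
# Hypothetical helper for CEM-18922. Set DRY_RUN=1 to print the command.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# use_large_pages FLAVOR: replace hw:mem_page_size='any' with 'large' on a flavor.
use_large_pages() {
  flavor="${1:-}"
  [ -n "$flavor" ] || { echo "usage: use_large_pages FLAVOR" >&2; return 2; }
  run openstack flavor set --property hw:mem_page_size=large "$flavor"
}
```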
CEM-18163 On a DPDK compute node, if contrail-vrouter-agent crashes or is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop and start the instances whose interfaces are down.
CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.
CEM-15561 vRouter offload with Mellanox NICs does not work. However, DPDK on Mellanox NICs without offload is supported.
CEM-13685 DPDK vRouter with a Mellanox CX5 NIC takes about 10 minutes, and an lcore crash is also seen. This happens once, during the initial installation.
JCB-177787 In DPDK vRouter use cases such as SNAT and LBaaS that require netns, jumbo MTU cannot be set. The maximum allowed MTU is 1500.
SmartNIC
CEM-28861 On a compute node using an Intel 3000 SmartNIC, the deployment sometimes fails due to issues related to initializing the NIC. The log shows a series of mbuf errors.
Workaround: Restart the contrail-vrouter-dpdk container on the failing node and rerun the deployment. You can also delete the problematic compute node and redeploy the cluster.
CEM-11163 On Fortville X710 NICs, performance degradation is observed with TX and RX buffers configured because mbufs get exhausted.
General Routing
CEM-28889 BGP as a Service (BGPaaS) does not work across multiple pods in the same VM due to BGPaaS sessions flapping.
Workaround: Always configure BGPaaS to be deployed with a single pod in a single VM.
CEM-29152 A configuration push for a logical router interconnect fails when a routing policy is configured.
CEM-28914 When Reverse Path Forwarding (RPF) is enabled in a topology where a virtual machine (VM) on a compute node is sending packets to a bare metal server (BMS) over an external network extended to an MX Series router, reply packets are dropped in the Contrail vRouter. The error messages indicate that the RPF lookup is failing.
Workaround: Disable RPF.
CEM-27502 To enable QoS on Intel XXV710 NICs, you must first perform the following procedure on the compute node:
1. Reboot the compute node.
2. Enter the systemctl start lldpad command.
3. Enter the systemctl status lldpad command to confirm that lldpad is running, and then enter the dcbtool sc ens1f0 dcb on command.
You should now be able to enable QoS on the NIC. If you are unable to enable QoS on the NIC after performing step 3, proceed to step 4.
4. Enter the dcbtool sc dcbx v:force-cee command.
CEM-25388 Updating the DNS server attribute associated with a virtual network (VN) does not work. To update, add, or remove the DNS server details in a VN, delete the VN and re-create it with the desired DNS server details.
CEM-20477 In an EVPN service chain, a service chain with a transparent firewall does not establish a connection between the left and right VNs.
CEM-21835 Do not use an allowed address pair (AAP) with a prefix shorter than /32 if a Bare Metal Server (BMS) route (VXLAN route) falls within the AAP CIDR.
CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.
CEM-21570 AS_PATH retain does not work in a cluster that has 4-byte autonomous system number (ASN) support enabled.
CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port mirroring when Juniper headers are enabled. The Juniper header is not supported in port mirroring because VXLAN is not the tunnel type used for the dynamic next hop in this case.
CEM-20419 A BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and the VN is associated with a BFD health check whose target-ip is set to "all".
CEM-5043 Updating the VNI on a logical router (LR) does not update the route table. As a workaround, delete the logical router and create a new logical router with the new VNI.
CEM-3245 Multicast traffic originating from QFX devices that are not Type-6 capable is duplicated by vRouters.
CEM-22632 When an MX Series router acts as the data center gateway and MPLSoUDP is used between the controllers and the MX Series router, the floating IP use case does not work. Use MPLSoGRE or VXLAN instead.
Contrail Fabric Management
CEM-29418 Hitless software upgrade attempts fail with a "MODULE FAILURE\nSee stdout/stderr for the exact error" error.
Workaround: You can upgrade QFX Series switches manually by logging in to the console of the device. You can achieve a hitless upgrade by putting the QFX Series switch that is the target of the upgrade into maintenance mode.
CEM-29154 The Zero Touch Provisioning (ZTP) process fails to onboard an MX Series router due to an IP address not getting assigned to an FXP interface.
CEM-28971 An attempt to onboard a fabric using Zero Touch Provisioning (ZTP) does not complete. The control and collector processes remain in the initializing state, and the RabbitMQ process restarts continuously on one control node.
CEM-28941 Configurations are not getting pushed to the fabric device.
Workaround: Restart RabbitMQ and the DM container.
CEM-23931 Provisioning Contrail Fabric Manager using the Ansible deployer intermittently fails while installing swift-client. If this condition occurs, preinstall swift-client on the controllers and then rerun the provisioning.
CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.
CEM-18381 A QFX5120 device cannot be used in the border leaf role for CRB in an SP-style fabric.
CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.
CEM-14679 In the fabric unmanaged PNF use case with a CRB gateway, Device Manager (DM) pushes some bogus static routes under the LR VRF on the spines.
As a workaround, change the value of the dummy_ip variable inside the device_manager container. The line number in the following link is based on the R2008 release code base:
https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594
After you change the value to the desired subnet and save the file, restart the DM container for the change to take effect. Perform this step at the beginning, before fabric onboarding.
CEM-14264 In release 2003, the Virtual Port Group create workflow will not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. The field is editable unlike in previous releases. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance checkbox enabled.
CEM-13767 Although Contrail Fabric Manager (CFM) lets you use custom image names for fabric devices, for VM host-based platforms such as the QFX10000-60C you must name the image in the junos-vmhost-install-x.tgz format when you upload it to CFM.
CEM-8701 Onboarding multiple BMSs in parallel on an SP-style fabric does not work. When you bring up a BMS using the Life Cycle Management workflow, the re-image sometimes does not go through on faster servers, and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes between the BMS port and the TFTP service running on the Contrail nodes have converged. As a workaround, reboot the servers or configure the BIOS on the servers to delay the boot.
CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.
CEM-4358 In Contrail fabric deployments, configuring a QFX5110 as a spine (CRB-Gateway) does not work.
CEM-3959 Moving a BMS across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. If more than one BMS is associated with one VPG and one of them must be moved, the whole VPG must be deleted and reconfigured according to the new association.
CEM-20794 In Contrail fabric management with ML2 plugin, configuring LAG in VMs associated with SRIOV VFs will not work.
CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.
CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between the fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also be assigned the RR (route reflector) RB role. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric’s leaf devices. For L2 DCI mode, we therefore recommend that you mark the physical router device with both the DCI Gateway RB role and the RR role.
CEM-19802 Security Groups cannot be used on QFX10K interfaces.
Kubernetes
CEM-29067 In a Kubernetes environment deployed using tf-operator, the Contrail collector process fails during upgrades. The Contrail collector services restart and come back online automatically after the failure.
JCB-187287 High Availability provisioning of Kubernetes master is not supported.
General
CEM-29197 A virtual port group (VPG) cannot be deleted after removing a virtual machine interface (VMI). Workaround: Delete the VMI instead of removing the VMI reference from the VPG.
CEM-29163 The Contrail Agent sometimes fails in remote compute environments during the overcloud load creation process. The agent automatically recovers and restarts.
CEM-28044 The Contrail Networking upgrade process might back up Config DB nodes in the wrong order.
Workaround: Use the updated commandutil utility to back up the data instead. Alternatively, manually inspect the backup file and create a list of contrail_config_database_node UUIDs in the correct order. Pass this list to commandutil during the restore.
The steps to create an ordered list of Config DB nodes are as follows:
Download the commandutil binary from https://webdownload.juniper.net/swdl/dl/secure/site/1/record/165079.html.
Copy commandutil to the /tmp folder.
chmod u+x /tmp/commandutil
docker cp /tmp/commandutil contrail_command:/usr/bin/commandutil_21.4.L2
docker exec contrail_command commandutil_21.4.L2 convert --intype rdbms --outtype yaml --out /etc/contrail/db.yml -c /etc/contrail/command-app-server.yml
mkdir -p ~/backups
mv /etc/contrail/db.yml ~/backups/
CEM-27916 In environments that were deployed using Juju Charms and use Contrail Networking with LXD, nodes that use LXD do not restart after a Contrail Controller reboot. The reboot triggers an iptables issue that Contrail Networking cannot resolve. This iptables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.
Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.
CEM-25524 In a lab scenario with a Kubernetes deployment of one controller and one compute node and inter-namespace traffic enabled, a contrail-controller core was observed very rarely, and the system recovered on its own after the core. This scenario did not occur in HA environments, which are the recommended deployment for production.
CEM-25331 In a scaled environment, restarting a controller node may cause the schema transformer service to restart a few times.
CEM-25109 On the VPG page in the UI, VLAN information is not displayed for the internal SR-IOV VPG. This behavior is intermittent.
JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as a hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler, or if new fragments arrive immediately after the head fragment, the assembler releases them to the flow module. If the agent has not written the flow action by the time the assembler releases the fragments to the flow module, the fragments are enqueued in the hold queue. A maximum of three fragments can be enqueued in the hold queue at a time; the remaining fragments released from the assembler to the flow module are dropped.
As a workaround, the head fragment is enqueued to the assembler only after the agent writes the flow action. If the flow is already present in a non-hold state, the head fragment is immediately enqueued to the assembler.
CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. If you need to add links, delete the service template and add it again.
CEM-18398 Contrail Web UI does not work for System/Node status monitoring. As a workaround, check the status using the CLI on the relevant nodes. This issue does not impact functionality.
CEM-20414 During a Contrail Command container restart, one deploy job is triggered per contrail_cluster object, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.
CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent from the transparent service instance to the QFX carry a VLAN ID in the inner header. This VLAN ID is internal to the vRouter and the QFX is not aware of it, so the switch drops the packets and traffic from the left BMS to the right BMS is lost.
CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.
CEM-16855 The IPv6 IPAM subnet option “enable_dhcp” is always ignored.
Telemetry and Analytics
CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.
Workaround: Perform the following steps:
Stop Kafka on all the analytics nodes.
docker stop analytics_alarm_kafka_1
On a single Contrail controller, perform the Zookeeper cleanup.
docker exec -it config_database_zookeeper_1 bash
bin/zkCli.sh -server <IP>:2181
deleteall /brokers
deleteall /consumers
Start Kafka on all the analytics nodes.
docker start analytics_alarm_kafka_1
CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploy to bring up the sFlow nodes.
CEM-20781 Telemetry KPI display is not supported for Junos EVO devices.
CEM-18999 In a heavily scaled data center with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours to re-sync with the API server.
CEM-13380 AppFormix Flows do not show up for multihomed devices on the fabric.
Known Behavior in Contrail Networking Release 21.4
- OpenStack
- DPDK and SR-IOV
- SmartNIC
- General Routing
- Contrail Fabric Management
- Kubernetes
- General
- Telemetry and Analytics
OpenStack
CEM-30003 When a project that has resources is renamed in OpenStack, the new project name is not updated in Contrail Networking Release 21.4. As a result, the project name in OpenStack differs from the project name in Contrail Networking Release 21.4. However, if the project does not have any resources, then when a resource is created in that project, the project name and the resource name in Contrail Networking Release 21.4 are the same as in OpenStack.
CEM-26599 In deployments using remote compute, creating a virtual machine (VM) on a remote site can take an extended amount of time, sometimes more than 6 minutes.
CEM-27083 During deployment, a core error sometimes occurs with the Contrail Collector service. The Contrail Collector service automatically restarts and no functional impact is observed when this issue is experienced.
CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.
CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:
Edit the /etc/kolla/mariadb/galera.cnf file on one of the controllers to remove the wsrep cluster address, as shown here:
wsrep_cluster_address = gcomm://
#wsrep_cluster_address = gcomm://10.x.x.8:4567,10.x.x.10:4567,10.x.x.11:4567
Note: If all the controllers were shut down at the same time in the managed scenario, you must select the controller that was shut down last.
Start the mariadb container (docker start mariadb) on the controller on which you edited the file.
Wait a couple of minutes and ensure that the mariadb container is not restarting. Then start the mariadb container on the remaining controllers.
Revert the changes to the /etc/kolla/mariadb/galera.cnf file and restart the mariadb container on the controller you selected earlier.
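The recovery steps above can be scripted roughly as follows. This is a minimal sketch, not a Juniper-provided procedure: it assumes the Kolla container is named mariadb, that galera.cnf contains a single wsrep_cluster_address line, and that the remaining controllers are reachable over ssh. The function names and the DRY_RUN switch are hypothetical, and the two-minute wait stands in for "a couple of minutes".

```shell
#!/bin/sh
# Sketch of the CEM-17991 recovery. Assumptions: container name "mariadb",
# one wsrep_cluster_address line in galera.cnf, ssh access to the other
# controllers. All function names are hypothetical.
set -eu

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Back up galera.cnf, then write an empty gcomm:// address and keep the
# original address as a comment.
galera_bootstrap_cnf() {
  cnf=${1:-/etc/kolla/mariadb/galera.cnf}
  cp "$cnf" "$cnf.orig"
  awk '/^wsrep_cluster_address[ \t]*=/ {
         print "wsrep_cluster_address = gcomm://"
         print "#" $0
         next
       }
       { print }' "$cnf.orig" > "$cnf"
}

# Put the original galera.cnf back from the backup.
galera_restore_cnf() {
  cnf=${1:-/etc/kolla/mariadb/galera.cnf}
  mv "$cnf.orig" "$cnf"
}

# Start mariadb locally, wait about two minutes, make sure it is not in a
# restart loop, then start it on the remaining controllers given as arguments.
galera_recover() {
  run docker start mariadb
  run sleep 120
  if [ "${DRY_RUN:-0}" != 1 ] &&
     [ "$(docker inspect -f '{{.State.Restarting}}' mariadb)" = true ]; then
    echo "mariadb is restarting; not starting the other controllers" >&2
    return 1
  fi
  for host in "$@"; do
    run ssh "$host" docker start mariadb
  done
}
```

On the selected controller, run galera_bootstrap_cnf, then DRY_RUN=1 galera_recover ctl2 ctl3 to review the commands (ctl2 and ctl3 are placeholder hostnames) before running them for real. Finish with galera_restore_cnf and docker restart mariadb on the same controller.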
CEM-15764 With an Octavia load balancer, traffic destined for the floating IP address of the load balancer VM is not forwarded to the backend VMs. Traffic destined for the actual IP address of the load balancer VM is forwarded as expected.
CEM-9979 During an upgrade of DPDK compute nodes deployed with TripleO Heat Templates in a RHOSP environment, vRouter core dumps are observed. They are caused by the order in which services start during the upgrade and do not affect cluster operation.
DPDK and SR-IOV
CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, an intermittent performance drop is observed along with checksum errors. Functionality is not impacted.
CEM-21547 SR-IOV configuration automation does not work for a multihomed server.
CEM-18922 On a DPDK compute node, VM memory is mapped to only one NUMA node. If a VM is launched with the hw:mem_page_size='any' flavor, VM creation fails after the huge pages in that NUMA node are exhausted. As a workaround, use the hw:mem_page_size='large' flavor instead.
CEM-18408 In DPDK 19.11 with an X710 NIC, performance degrades because of an mbuf leak when txd and rxd are configured. Intel recommends configuring at least 1K TX and RX descriptors on Fortville NICs for better and more consistent performance, but these settings appear to degrade performance on the X710 NIC.
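For the CEM-18922 workaround, hw:mem_page_size is a Nova flavor extra spec, so an existing flavor can be switched with the OpenStack client. A minimal sketch, assuming admin credentials are already loaded in the shell; the flavor name, the helper names, and the DRY_RUN switch are placeholders.

```shell
#!/bin/sh
# Sketch: apply the CEM-18922 workaround to an existing flavor.
# use_large_pages, run, and DRY_RUN are hypothetical helpers.
set -eu

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

use_large_pages() {
  flavor=$1
  # Request huge pages explicitly instead of hw:mem_page_size=any.
  run openstack flavor set --property hw:mem_page_size=large "$flavor"
  # Print the flavor properties so the change can be checked.
  run openstack flavor show -f value -c properties "$flavor"
}
```

For example, DRY_RUN=1 use_large_pages m1.dpdk prints the two commands. Running instances keep the extra specs they were launched with, so expect the change to apply to VMs launched afterward.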
CEM-18163 On a DPDK compute node, if contrail-vrouter-agent crashes or is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface might become inactive. As a workaround, stop and then start the instances whose interfaces are down.
CEM-17883 VLAN tagging does not work with Mellanox CX5 cards on DPDK 19.11.
CEM-15561 vRouter offload does not work with Mellanox NICs. However, DPDK on Mellanox NICs without offload is supported.
CEM-13685 A DPDK vRouter with Mellanox CX5 NICs takes about 10 minutes to come up, and an lcore crash is observed. This happens once, during the initial installation.
JCB-177787 In DPDK vRouter use cases that require netns, such as SNAT and LBaaS, jumbo MTU cannot be set. The maximum allowed MTU is 1500.
SmartNIC
CEM-11163 On Fortville X710 NICs, performance degrades when TX and RX buffers are configured, because the mbufs get exhausted.
General Routing
CEM-27502 To enable QoS on Intel XXV710 NICs, you must first perform the following procedure on the compute node:
Reboot the compute node.
Enter the systemctl start lldpad command.
Enter the systemctl status lldpad command.
Enter the dcbtool sc ens1f0 dcb on command.
You should now be able to enable QoS on the NIC.
If you still cannot enable QoS on the NIC, enter the dcbtool sc dcbx v:force-cee command.
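The CEM-27502 steps that follow the reboot can be collected into a small script. A sketch, assuming it runs as root on the compute node after the reboot; the helper names and the DRY_RUN switch are hypothetical, and the force-cee fallback is copied exactly as written in this note.

```shell
#!/bin/sh
# Sketch of the CEM-27502 steps, to be run after the compute node reboot.
# The interface defaults to ens1f0 as in the release note; helper names
# are hypothetical.
set -eu

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

xxv710_prepare_qos() {
  iface=${1:-ens1f0}
  run systemctl start lldpad
  # systemctl status returns non-zero if lldpad is not active, so set -e stops here.
  run systemctl --no-pager status lldpad
  run dcbtool sc "$iface" dcb on
}

# Fallback if QoS still cannot be enabled; command copied as written in the note.
xxv710_force_cee() {
  run dcbtool sc dcbx v:force-cee
}
```

Run xxv710_prepare_qos (or xxv710_prepare_qos with another interface name for a NIC other than ens1f0), then try to enable QoS; call xxv710_force_cee only if that still fails.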
CEM-25388 Updating the DNS server attribute associated with a VN does not work. To update, add, or remove the DNS server details of a VN, delete the VN and re-create it with the desired DNS server details.
CEM-20477 In an EVPN service chain, a service chain with a transparent firewall does not establish a connection between the left and right VNs.
CEM-21835 Do not use an allowed address pair (AAP) with a prefix length shorter than /32 if a Bare Metal Server (BMS) route (VXLAN route) falls within the AAP CIDR.
CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.
CEM-21570 AS_PATH retain does not work in a cluster with 4-byte autonomous system numbers (ASNs) enabled.
CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port mirroring when Juniper headers are enabled. Juniper headers are not supported for this type of port mirroring because VXLAN is not the tunnel type used for the dynamic next hop.
CEM-20419 A BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and the VN is associated with a BFD health check whose target-ip is set to "all".
CEM-5043 Updating the VNI on a logical router (LR) does not update the route table. As a workaround, delete the logical router and create a new one with the new VNI.
CEM-3245 Multicast traffic originating from QFX devices that are not Type-6 capable is duplicated by vRouters.
CEM-22632 When an MX Series router acts as the data center gateway and MPLSoUDP is used between the controllers and the MX router, the floating IP use case does not work. Use MPLSoGRE or VXLAN instead.
Contrail Fabric Management
CEM-23931 Provisioning Contrail Fabric Manager using Ansible deployer intermittently fails while installing swift-client. If this condition occurs, preinstall swift-client on the controllers and then rerun the provisioning.
CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.
CEM-18381 A QFX5120 device cannot be used in the border leaf role for CRB in an SP-style fabric.
CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.
CEM-14679 In the fabric unmanaged PNF use case with a CRB gateway, Device Manager (DM) pushes some bogus static routes under the LR VRF on the spines.
As a workaround, change the value of the dummy_ip variable inside the device_manager container. The line number in the following URL is based on the R2008 release code base:
https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594
After changing the value to the desired subnet and saving the file, restart the DM container for the change to take effect. Perform this step at the beginning, before fabric onboarding.
CEM-14264 In Release 2003, the Virtual Port Group (VPG) create workflow does not prepopulate the VLAN-ID with the existing value defined for the first VPG of a given virtual network. Unlike in previous releases, the field is editable. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance check box enabled.
CEM-13767 Although Contrail Fabric Manager (CFM) lets you use custom image names for fabric devices, for VM host-based platforms such as QFX10000-60C you must name the image in the junos-vmhost-install-x.tgz format when you upload it to CFM.
CEM-8701 Onboarding multiple BMSs in parallel on an SP-style fabric does not work. When you bring up a BMS using the Life Cycle Management workflow, on faster servers the reimage sometimes does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes between the BMS port and the TFTP service running on the Contrail nodes have converged. As a workaround, reboot the servers or configure the server BIOS to delay the boot.
CEM-8149 BMS LCM is not supported on a fabric with enterprise_style=True. By default, enterprise_style is set to False. Do not set enterprise_style=True if the fabric object onboards the BMS LCM instance.
CEM-4358 In Contrail fabric deployments, configuring QFX5110 as a spine (CRB-Gateway) does not work.
CEM-3959 Moving an individual BMS across TORs is not supported; to move a BMS across TORs, you must move the whole VPG. If more than one BMS is associated with a VPG and one of them needs to be moved, you must delete the whole VPG and reconfigure it according to the new association.
CEM-20794 In Contrail fabric management with the ML2 plugin, configuring LAG in VMs associated with SR-IOV VFs does not work.
CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.
CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between the fabric devices. In this case, the border device (physical router) assigned the DCI-gateway RB (routing and bridging) role must also be assigned the RR (route reflector) RB role. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric leaf devices. For L2 DCI mode, we therefore recommend assigning the physical router device both the DCI Gateway RB role and the RR role.
CEM-19802 Security Groups cannot be used on QFX10K interfaces.
Kubernetes
JCB-187287 High Availability provisioning of Kubernetes master is not supported.
General
CEM-27916 In environments deployed using Juju Charms that use Contrail Networking with LXD, nodes using LXD do not restart after a Contrail Controller reboot. The reboot triggers an iptables issue that Contrail Networking cannot resolve. This iptables issue is described in the "Unable to reach host gateway from lxd container" article on the Charmhub Discourse page.
Workaround: On the nodes using LXD, set the default policy of the FORWARD chain to ACCEPT by entering the sudo iptables -P FORWARD ACCEPT command.
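The CEM-27916 workaround can be made idempotent by checking the current FORWARD policy first. A minimal sketch, assuming sudo access on the LXD node; forward_policy, ensure_forward_accept, run, and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch of the CEM-27916 workaround with a check first.
# Helper names are hypothetical.
set -eu

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Read `iptables -S FORWARD` output on stdin; print the chain's default policy.
forward_policy() {
  awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}

# Set the FORWARD policy to ACCEPT only if it is not already ACCEPT.
ensure_forward_accept() {
  policy=$(sudo iptables -S FORWARD | forward_policy)
  if [ "$policy" != ACCEPT ]; then
    run sudo iptables -P FORWARD ACCEPT
  fi
}
```

A policy set with iptables -P changes only the running rule set; unless it is saved with the distribution's persistence mechanism, it is lost when the host reboots.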
CEM-25524 In a lab Kubernetes deployment with one controller and one compute node and with inter-namespace traffic enabled, a contrail-controller core was observed on rare occasions, and the system recovered on its own. This scenario was not observed in HA environments, which are the recommended deployment for production.
CEM-25331 In a scaled environment, restarting a controller node might cause the schema transformer service to restart a few times.
CEM-25109 In the VPG page of the UI, VLAN information is intermittently not displayed for internal SR-IOV VPGs.
JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, it immediately enqueues the head fragment to the assembler. The flow is created as a hold flow and then trapped to the agent. If fragments that correspond to this head fragment are already in the assembler, or if new fragments arrive immediately after the head fragment, the assembler releases them to the flow module. If the agent has not written the flow action by the time the assembler releases these fragments, the fragments are enqueued in the hold queue. A maximum of three fragments can be enqueued in the hold queue at a time; the remaining fragments released by the assembler to the flow module are dropped.
As a workaround, the head fragment is enqueued to the assembler only after the agent writes the flow action. If the flow already exists in a non-hold state, the head fragment is enqueued to the assembler immediately.
CEM-4370 Additional links cannot be appended to service templates used to create PNF service chains. To add links, delete the service template and add it again.
CEM-18398 Contrail Web UI does not work for System/Node status monitoring. As a workaround, check the status from the CLI on the relevant nodes. This issue does not impact functionality.
CEM-20414 When the Contrail Command container restarts, one deploy job is fired per contrail_cluster object, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.
CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent from the transparent service instance to the QFX carry a VLAN ID in the inner header. This VLAN ID is internal to the vRouter and unknown to the QFX, so the switch drops the packets and traffic from the left BMS to the right BMS is lost.
CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.
CEM-16855 The IPv6 IPAM subnet option “enable_dhcp” is always ignored.
Telemetry and Analytics
CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.
Workaround: Perform the following steps:
Stop Kafka on all the analytics nodes.
docker stop analytics_alarm_kafka_1
On a single Contrail controller, perform the Zookeeper cleanup.
docker exec -it config_database_zookeeper_1 bash
bin/zkCli.sh -server <IP>:2181
deleteall /brokers
deleteall /consumers
Start Kafka on all the analytics nodes.
docker start analytics_alarm_kafka_1
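The Kafka and Zookeeper steps above can be run as one sequence from the controller. This is a sketch under stated assumptions: the analytics nodes are reachable over ssh, zkCli.sh runs a single command given after -server (as in ZooKeeper 3.5 and later, which also provides deleteall), and bin/zkCli.sh resolves from the container's working directory as in the interactive steps. The function names and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch of the CEM-21526 workaround. Assumptions: run on one Contrail
# controller; analytics nodes reachable over ssh; zkCli.sh accepts a one-shot
# command after -server. Helper names are hypothetical.
set -eu

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1: Zookeeper IP address; remaining arguments: analytics node hostnames.
kafka_zk_cleanup() {
  zk_ip=$1
  shift
  # Stop Kafka on every analytics node.
  for node in "$@"; do
    run ssh "$node" docker stop analytics_alarm_kafka_1
  done
  # Remove the Kafka broker and consumer znodes from Zookeeper.
  for znode in /brokers /consumers; do
    run docker exec config_database_zookeeper_1 \
      bin/zkCli.sh -server "$zk_ip:2181" deleteall "$znode"
  done
  # Start Kafka again on every analytics node.
  for node in "$@"; do
    run ssh "$node" docker start analytics_alarm_kafka_1
  done
}
```

Running DRY_RUN=1 kafka_zk_cleanup with the Zookeeper IP and the analytics node names prints the commands in order, so they can be checked before the real run.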
CEM-20846 In rare cases, sFlow node provisioning fails while the Kafka container is initializing. If this happens during provisioning, redeploy to bring up the sFlow nodes.
CEM-20781 Telemetry KPI display is not supported for Junos EVO devices.
CEM-18999 In a heavily scaled data center with approximately 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take approximately 4 hours to resynchronize with the API server.
CEM-13380 AppFormix Flows are not displayed for multihomed devices on the fabric.
Known Behavior in Contrail Networking Release 21.3.1
- OpenStack
- OpenShift
- DPDK and SR-IOV
- SmartNIC
- General Routing
- Contrail Fabric Management
- Kubernetes
- General
- Telemetry and Analytics
OpenStack
CEM-20280 In a joint Kubernetes and OpenStack setup, restarting vrouter-agent sometimes results in an unauthorized operation error. To resolve the issue, restart vrouter-agent again.
CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:
Edit the /etc/kolla/mariadb/galera.cnf file on one of the controllers to remove the wsrep cluster address, as shown here:
wsrep_cluster_address = gcomm://
#wsrep_cluster_address = gcomm://10.x.x.8:4567,10.x.x.10:4567,10.x.x.11:4567
Note: If all the controllers were shut down at the same time in the managed scenario, you must select the controller that was shut down last.
Start the mariadb container (docker start mariadb) on the controller on which you edited the file.
Wait a couple of minutes and ensure that the mariadb container is not restarting. Then start the mariadb container on the remaining controllers.
Revert the changes to the /etc/kolla/mariadb/galera.cnf file and restart the mariadb container on the controller you selected earlier.
CEM-15764 With an Octavia load balancer, traffic destined for the floating IP address of the load balancer VM is not forwarded to the backend VMs. Traffic destined for the actual IP address of the load balancer VM is forwarded as expected.
CEM-9979 During an upgrade of DPDK compute nodes deployed with TripleO Heat Templates in a RHOSP environment, vRouter core dumps are observed. They are caused by the order in which services start during the upgrade and do not affect cluster operation.
OpenShift
CEM-25482 When you upgrade Contrail in OpenShift-based environments, a contrail-collector core is observed during the upgrade. Because this happens only once, within the upgrade window, and the service recovers automatically, no functional impact is expected.
CEM-21614 In an OpenShift 4.6 cluster, the contrail-status command might show warnings for Zookeeper and RabbitMQ and not report their status. As a workaround, you can ignore these warnings and, if the other Contrail services are active, consider Zookeeper and RabbitMQ active as well. To get the vRouter agent status, enter the contrail-status -t 15 command on the compute node.
CEM-20802 When you create a new user-defined namespace on OpenShift 4.x with Contrail, SNAT is enabled by default, so all pods in the namespace can reach internet servers. As a workaround, explicitly configure the Contrail annotation "opencontrail.org/ip_fabric_snat": "false" on the namespace.
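For CEM-20802, one way to apply the annotation is to set it when the namespace is created. A minimal sketch, assuming the oc client is logged in to the cluster; the helper name and the namespace name are placeholders, and this note does not confirm whether adding the annotation to an existing namespace has the same effect.

```shell
#!/bin/sh
# Sketch: print a Namespace manifest with the CEM-20802 annotation set.
# snat_disabled_namespace is a hypothetical helper; $1 is the namespace name.
set -eu

snat_disabled_namespace() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $1
  annotations:
    opencontrail.org/ip_fabric_snat: "false"
EOF
}
```

For example: snat_disabled_namespace team-a | oc apply -f -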
DPDK and SR-IOV
CEM-23810 When Generic Segmentation Offload (GSO) is enabled, IPv6 packets with header lengths of 128 bytes or more are dropped.
Workaround: Disable GSO.
CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, an intermittent performance drop is observed along with checksum errors. Functionality is not impacted.
CEM-21547 SR-IOV configuration automation does not work for a multihomed server.
CEM-18922 On a DPDK compute node, VM memory is mapped to only one NUMA node. If a VM is launched with the hw:mem_page_size='any' flavor, VM creation fails after the huge pages in that NUMA node are exhausted. As a workaround, use the hw:mem_page_size='large' flavor instead.
CEM-18408 In DPDK 19.11 with an X710 NIC, performance degrades because of an mbuf leak when txd and rxd are configured. Intel recommends configuring at least 1K TX and RX descriptors on Fortville NICs for better and more consistent performance, but these settings appear to degrade performance on the X710 NIC.
CEM-18163 On a DPDK compute node, if contrail-vrouter-agent crashes or is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface might become inactive. As a workaround, stop and then start the instances whose interfaces are down.
CEM-17883 VLAN tagging does not work with Mellanox CX5 cards on DPDK 19.11.
CEM-15561 vRouter offload does not work with Mellanox NICs. However, DPDK on Mellanox NICs without offload is supported.
CEM-13685 A DPDK vRouter with Mellanox CX5 NICs takes about 10 minutes to come up, and an lcore crash is observed. This happens once, during the initial installation.
JCB-177787 In DPDK vRouter use cases that require netns, such as SNAT and LBaaS, jumbo MTU cannot be set. The maximum allowed MTU is 1500.
SmartNIC
CEM-11163 On Fortville X710 NICs, performance degrades when TX and RX buffers are configured, because the mbufs get exhausted.
General Routing
CEM-25388 Updating the DNS server attribute associated with a VN does not work. To update, add, or remove the DNS server details of a VN, delete the VN and re-create it with the desired DNS server details.
CEM-20477 In an EVPN service chain, a service chain with a transparent firewall does not establish a connection between the left and right VNs.
CEM-21835 Do not use an allowed address pair (AAP) with a prefix length shorter than /32 if a Bare Metal Server (BMS) route (VXLAN route) falls within the AAP CIDR.
CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.
CEM-21570 AS_PATH retain does not work in a cluster with 4-byte autonomous system numbers (ASNs) enabled.
CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port mirroring when Juniper headers are enabled. Juniper headers are not supported for this type of port mirroring because VXLAN is not the tunnel type used for the dynamic next hop.
CEM-20419 A BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and the VN is associated with a BFD health check whose target-ip is set to "all".
CEM-5043 Updating the VNI on a logical router (LR) does not update the route table. As a workaround, delete the logical router and create a new one with the new VNI.
CEM-3245 Multicast traffic originating from QFX devices that are not Type-6 capable is duplicated by vRouters.
CEM-22632 When an MX Series router acts as the data center gateway and MPLSoUDP is used between the controllers and the MX router, the floating IP use case does not work. Use MPLSoGRE or VXLAN instead.
Contrail Fabric Management
CEM-23931 Provisioning Contrail Fabric Manager using Ansible deployer intermittently fails while installing swift-client. If this condition occurs, preinstall swift-client on the controllers and then rerun the provisioning.
CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.
CEM-18381 A QFX5120 device cannot be used in the border leaf role for CRB in an SP-style fabric.
CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.
CEM-14679 In the fabric unmanaged PNF use case with a CRB gateway, Device Manager (DM) pushes some bogus static routes under the LR VRF on the spines.
As a workaround, change the value of the dummy_ip variable inside the device_manager container. The line number in the following URL is based on the R2008 release code base:
https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594
After changing the value to the desired subnet and saving the file, restart the DM container for the change to take effect. Perform this step at the beginning, before fabric onboarding.
CEM-14264 In Release 2003, the Virtual Port Group (VPG) create workflow does not prepopulate the VLAN-ID with the existing value defined for the first VPG of a given virtual network. Unlike in previous releases, the field is editable. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance check box enabled.
CEM-13767 Although Contrail Fabric Manager (CFM) lets you use custom image names for fabric devices, for VM host-based platforms such as QFX10000-60C you must name the image in the junos-vmhost-install-x.tgz format when you upload it to CFM.
CEM-8701 Onboarding multiple BMSs in parallel on an SP-style fabric does not work. When you bring up a BMS using the Life Cycle Management workflow, on faster servers the reimage sometimes does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes between the BMS port and the TFTP service running on the Contrail nodes have converged. As a workaround, reboot the servers or configure the server BIOS to delay the boot.
CEM-8149 BMS LCM is not supported on a fabric with enterprise_style=True. By default, enterprise_style is set to False. Do not set enterprise_style=True if the fabric object onboards the BMS LCM instance.
CEM-4358 In Contrail fabric deployments, configuring QFX5110 as a spine (CRB-Gateway) does not work.
CEM-3959 Moving an individual BMS across TORs is not supported; to move a BMS across TORs, you must move the whole VPG. If more than one BMS is associated with a VPG and one of them needs to be moved, you must delete the whole VPG and reconfigure it according to the new association.
CEM-20794 In Contrail fabric management with the ML2 plugin, configuring LAG in VMs associated with SR-IOV VFs does not work.
CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.
CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between the fabric devices. In this case, the border device (physical router) assigned the DCI-gateway RB (routing and bridging) role must also be assigned the RR (route reflector) RB role. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric leaf devices. For L2 DCI mode, we therefore recommend assigning the physical router device both the DCI Gateway RB role and the RR role.
CEM-19802 Security Groups cannot be used on QFX10K interfaces.
Kubernetes
JCB-187287 High Availability provisioning of Kubernetes master is not supported.
General
CEM-27916 In environments deployed using Juju Charms that use Contrail Networking with LXD, nodes using LXD do not restart after a Contrail Controller reboot. The reboot triggers an iptables issue that Contrail Networking cannot resolve. This iptables issue is described in the "Unable to reach host gateway from lxd container" article on the Charmhub Discourse page.
Workaround: On the nodes using LXD, set the default policy of the FORWARD chain to ACCEPT by entering the sudo iptables -P FORWARD ACCEPT command.
CEM-25538 Configuring BGPaaS can change the state of the kernel vRouter agent to initializing.
As a workaround, reconfigure the iptables rules on the compute node (as described in CEM-23911) and restart the agent.
CEM-25524 In a lab Kubernetes deployment with one controller and one compute node and with inter-namespace traffic enabled, a contrail-controller core was observed on rare occasions, and the system recovered on its own. This scenario was not observed in HA environments, which are the recommended deployment for production.
CEM-25331 In a scaled environment, restarting a controller node might cause the schema transformer service to restart a few times.
CEM-25109 In the VPG page of the UI, VLAN information is intermittently not displayed for internal SR-IOV VPGs.
JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, it immediately enqueues the head fragment to the assembler. The flow is created as a hold flow and then trapped to the agent. If fragments that correspond to this head fragment are already in the assembler, or if new fragments arrive immediately after the head fragment, the assembler releases them to the flow module. If the agent has not written the flow action by the time the assembler releases these fragments, the fragments are enqueued in the hold queue. A maximum of three fragments can be enqueued in the hold queue at a time; the remaining fragments released by the assembler to the flow module are dropped.
As a workaround, the head fragment is enqueued to the assembler only after the agent writes the flow action. If the flow already exists in a non-hold state, the head fragment is enqueued to the assembler immediately.
CEM-4370 Additional links cannot be appended to service templates used to create PNF service chains. To add links, delete the service template and add it again.
CEM-18398 Contrail Web UI does not work for System/Node status monitoring. As a workaround, check the status from the CLI on the relevant nodes. This issue does not impact functionality.
CEM-20414 When the Contrail Command container restarts, one deploy job is fired per contrail_cluster object, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.
CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent from the transparent service instance to the QFX carry a VLAN ID in the inner header. This VLAN ID is internal to the vRouter and unknown to the QFX, so the switch drops the packets and traffic from the left BMS to the right BMS is lost.
CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.
CEM-16855 The IPv6 IPAM subnet option “enable_dhcp” is always ignored.
Telemetry and Analytics
CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.
Workaround: Perform the following steps:
Stop Kafka on all the analytics nodes.
docker stop analytics_alarm_kafka_1
On a single Contrail controller, perform the Zookeeper cleanup.
docker exec -it config_database_zookeeper_1 bash
bin/zkCli.sh -server <IP>:2181
deleteall /brokers
deleteall /consumers
Start Kafka on all the analytics nodes.
docker start analytics_alarm_kafka_1
CEM-20846 In rare cases, sFlow node provisioning fails while the Kafka container is initializing. If this happens during provisioning, redeploy to bring up the sFlow nodes.
CEM-20781 Telemetry KPI display is not supported for Junos EVO devices.
CEM-18999 In a heavily scaled data center with approximately 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take approximately 4 hours to resynchronize with the API server.
CEM-13380 AppFormix Flows are not displayed for multihomed devices on the fabric.
Known Behavior in Contrail Networking Release 21.3
- OpenStack
- OpenShift
- DPDK and SR-IOV
- SmartNIC
- General Routing
- Contrail Fabric Management
- Kubernetes
- General
- Telemetry and Analytics
OpenStack
CEM-20280 In a joint Kubernetes and OpenStack setup, restarting vrouter-agent sometimes results in an unauthorized operation error. To resolve the issue, restart vrouter-agent again.
CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:
Edit the /etc/kolla/mariadb/galera.cnf file on one of the controllers to remove the wsrep cluster address, as shown here:
wsrep_cluster_address = gcomm://
#wsrep_cluster_address = gcomm://10.x.x.8:4567,10.x.x.10:4567,10.x.x.11:4567
Note: If all the controllers were shut down at the same time in the managed scenario, you must select the controller that was shut down last.
Start the mariadb container (docker start mariadb) on the controller on which you edited the file.
Wait a couple of minutes and ensure that the mariadb container is not restarting. Then start the mariadb container on the remaining controllers.
Revert the changes to the /etc/kolla/mariadb/galera.cnf file and restart the mariadb container on the controller you selected earlier.
CEM-15764 With an Octavia load balancer, traffic destined for the floating IP address of the load balancer VM is not forwarded to the backend VMs. Traffic destined for the actual IP address of the load balancer VM is forwarded as expected.
CEM-9979 During an upgrade of DPDK compute nodes deployed with TripleO Heat Templates in a RHOSP environment, vRouter core dumps are observed. They are caused by the order in which services start during the upgrade and do not affect cluster operation.
OpenShift
CEM-25482 When you upgrade Contrail in OpenShift-based environments, a contrail-collector core is observed during the upgrade. Because this happens only once, within the upgrade window, and the service recovers automatically, no functional impact is expected.
CEM-21614 In an OpenShift 4.6 cluster, the contrail-status command might show warnings for Zookeeper and RabbitMQ and not report their status. As a workaround, you can ignore these warnings and, if the other Contrail services are active, consider Zookeeper and RabbitMQ active as well. To get the vRouter agent status, enter the contrail-status -t 15 command on the compute node.
CEM-20802 When you create a new user-defined namespace on OpenShift 4.x with Contrail, SNAT is enabled by default, so all pods in the namespace can reach internet servers. As a workaround, explicitly configure the Contrail annotation "opencontrail.org/ip_fabric_snat": "false" on the namespace.
DPDK and SR-IOV
CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, an intermittent performance drop is observed along with checksum errors. Functionality is not impacted.
CEM-21547 SR-IOV configuration automation does not work for a multihomed server.
CEM-18922 On a DPDK compute node, VM memory is mapped to only one NUMA node. If a VM is launched with the hw:mem_page_size='any' flavor, VM creation fails after the huge pages in that NUMA node are exhausted. As a workaround, use the hw:mem_page_size='large' flavor instead.
CEM-18408 In DPDK 19.11 with an X710 NIC, performance degrades because of an mbuf leak when txd and rxd are configured. Intel recommends configuring at least 1K TX and RX descriptors on Fortville NICs for better and more consistent performance, but these settings appear to degrade performance on the X710 NIC.
CEM-18163 On a DPDK compute, if contrail-vrouter-agent crashes or if contrail-vrouter-agent is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop / start the instances whose interfaces are down.
CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.
CEM-15561 vRouter offload with Mellanox NIC cards does not work. However, DPDK on Mellanox NICs without offload is supported.
CEM-13685 With MLNX CX5 NICs, the DPDK vRouter takes about 10 minutes, and an lcore crash is also seen. This happens only once, during the initial installation.
JCB-177787 In DPDK vRouter use cases such as SNAT and LBaaS that require netns, jumbo MTU cannot be set. Maximum MTU allowed: <=1500.
SmartNIC
CEM-11163 On Fortville X710 NICs, performance degradation is observed when TX and RX buffers are configured, because the mbufs get exhausted.
General Routing
CEM-25388 Updating the DNS server attribute associated with a VN does not work. To update, add, or remove the DNS server details in a VN, delete and re-create the VN with the desired DNS server details.
CEM-20477 In the case of an EVPN service chain, a service chain with a transparent firewall does not establish a connection between the left and right VNs.
CEM-21835 An allowed address pair (AAP) with a prefix shorter than /32 must not be used if a Bare Metal Server (BMS) route (VXLAN route) falls within the AAP CIDR.
CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.
CEM-21570 AS_PATH retain does not work in a cluster with 4-byte autonomous system numbers (ASNs) enabled.
CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port mirroring when Juniper headers are enabled. The Juniper header is not supported in port mirroring because VXLAN is not the tunnel type used for the dynamic next hop in this case.
CEM-20419 BFD sessions take longer to come up after the agent is restarted when MAC/IP learning is enabled in the VN and the VN is associated with a BFD health check whose target-ip is set to "all".
CEM-5043 A VNI update on an LR does not update the route table. As a workaround, delete the logical router and create a new logical router with the new VNI.
CEM-3245 Multicast traffic originating from QFX devices that are not type-6 capable is duplicated by vRouters.
CEM-22632 With an MX Series router acting as the data center gateway, if MPLSoUDP is used between the controllers and the MX, the Floating IP use case does not work. Use MPLSoGRE or VXLAN instead.
Contrail Fabric Management
CEM-23931 Provisioning Contrail Fabric Manager using the Ansible deployer intermittently fails while installing swift-client. If this occurs, preinstall swift-client on the controllers and then rerun the provisioning.
CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.
CEM-18381 QFX5120 cannot be used in the CRB border leaf role in an SP-style fabric.
CEM-15809 Updating the VLAN-ID on a VPG in an enterprise-style fabric is not supported. As a workaround, delete and re-create the fabric.
CEM-14679 In the fabric unmanaged PNF use case, DM pushes some bogus static routes under the LR VRF on the spines in the case of a CRB gateway.
As a workaround, change the value of the dummy_ip variable inside the device_manager Docker container. The line number in the following URL is based on the Release 2008 code base.
https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594
After changing the value to the desired subnet and saving the file, restart the DM Docker container for the change to take effect. Perform this step at the beginning, before fabric onboarding.
CEM-14264 In Release 2003, the Virtual Port Group create workflow does not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. Unlike in previous releases, the field is editable. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance check box enabled.
CEM-13767 Although Contrail Fabric Manager lets you use custom image names for fabric devices, for vmhost-based platforms such as QFX10000-60C, the image name must be in the junos-vmhost-install-x.tgz format when you upload the image to CFM.
CEM-8701 Onboarding multiple BMSs in parallel on an SP-style fabric does not work. When bringing up a BMS using the Life Cycle Management workflow, on faster servers the re-image sometimes does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the BMS sends its PXE boot request before the routes between the BMS port and the TFTP service running on the Contrail nodes have converged. As a workaround, reboot the servers or configure the server BIOS to delay the boot.
CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.
CEM-4358 In Contrail fabric deployments, configuring QFX5110 as a spine (CRB-Gateway) does not work.
CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. If more than one BMS is associated with a VPG and one of them needs to be moved, the whole VPG must be deleted and reconfigured according to the new association.
CEM-20794 In Contrail fabric management with the ML2 plugin, configuring LAG in VMs associated with SR-IOV VFs does not work.
CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.
CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between the fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB role (routing and bridging) must also be assigned the RR (route reflector) RB role. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric's leaf devices. Therefore, for L2 DCI mode, we recommend that you assign the physical router device both the DCI Gateway RB role and the RR role.
CEM-19802 Security Groups cannot be used on QFX10K interfaces.
Kubernetes
JCB-187287 High Availability provisioning of Kubernetes master is not supported.
General
CEM-27916 In environments that were deployed using Juju Charms and are using Contrail Networking with LXD, nodes that use LXD do not restart after a Contrail Controller reboot. The reboot triggers an iptables issue that Contrail Networking cannot resolve. This iptables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.
Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.
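The CEM-27916 workaround can be made idempotent by checking the current FORWARD policy before changing it. This is a sketch; the function name ensure_forward_accept is hypothetical, and it relies only on the standard iptables -S and -P options.

```shell
# Hypothetical helper for the CEM-27916 workaround: set the FORWARD chain
# policy to ACCEPT only when it is not already ACCEPT.
ensure_forward_accept() {
  local policy
  # "iptables -S FORWARD" prints the chain policy first, e.g. "-P FORWARD DROP".
  policy="$(sudo iptables -S FORWARD | head -n 1)"
  if [ "$policy" != "-P FORWARD ACCEPT" ]; then
    sudo iptables -P FORWARD ACCEPT
  fi
}
```

A policy set with iptables -P is held in the kernel and is not saved across a host reboot by itself; persisting it depends on the host's firewall tooling, which this sketch does not cover.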
CEM-25538 Configuring BGPaaS can leave the kernel-mode agent in the initializing state.
As a workaround, configure the iptables rules on the compute node again (as described in CEM-23911) and restart the agent.
CEM-25524 In a lab scenario with a K8s deployment with one controller and one compute node and with inter-namespace traffic enabled, a contrail-controller core was observed very rarely, and the system recovered by itself after the core. This scenario did not occur in HA environments, which is the recommended deployment for production.
CEM-25331 In a scaled environment, restarting a controller node may cause the schema transformer service to restart a few times.
CEM-25109 On the VPG page in the UI, VLAN information is not reflected for an internal SR-IOV VPG. This behavior is intermittent.
JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, it immediately enqueues the head fragment to the assembler. The flow is created as a hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler, or if new fragments arrive immediately after the head fragment, the assembler releases them to the flow module. If the agent has not written the flow action by the time the assembler releases the fragments to the flow module, the fragments are enqueued in the hold queue. A maximum of three fragments can be held in the hold queue at a time; any further fragments that the assembler releases to the flow module are dropped.
As a workaround, the head fragment is enqueued to the assembler only after the agent writes the flow action. If the flow is already present in a non-hold state, the head fragment is enqueued to the assembler immediately.
CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. If you need to add links, delete the service template and add it again.
CEM-18398 Contrail Web UI does not work for System/Node status monitoring. As a workaround, check the status using the CLI on the relevant nodes. This does not impact functionality.
CEM-20414 During a Contrail Command container restart, one deploy job per contrail_cluster object is triggered, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.
CEM-17648 In the case of BMS-to-BMS EVPN "Transparent" service chaining, tunneled packets sent from the transparent service instance to the QFX carry a VLAN ID. This VLAN ID, carried in the inner header of the tunneled packet, is internal to the vRouter. Because the QFX is not aware of it, the switch drops the packets, and traffic from the left BMS to the right BMS is lost.
CEM-17562 Under Security Groups, the entry that appears with __no_rule__ can be ignored.
CEM-16855 The IPv6 IPAM subnet option "enable_dhcp" is always ignored.
Telemetry and Analytics
CEM-21526 When you upgrade Contrail Networking from a release earlier than R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.
Workaround: Perform the following steps:
Stop Kafka on all the analytics nodes.
docker stop analytics_alarm_kafka_1
On a single Contrail controller, perform the Zookeeper cleanup.
docker exec -it config_database_zookeeper_1 bash
bin/zkCli.sh -server <IP>:2181
deleteall /brokers
deleteall /consumers
Start Kafka on all the analytics nodes.
docker start analytics_alarm_kafka_1
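The three CEM-21526 steps can be driven from one host. This is a sketch under stated assumptions: the function name kafka_zk_cleanup is hypothetical, the SSH user has passwordless access to every node and can run Docker without sudo, the container names match the steps above, and zkCli.sh is run non-interactively by passing the command after the -server option instead of from an interactive shell.

```shell
# Hypothetical wrapper around the CEM-21526 steps. Assumes passwordless SSH
# to each node and that Docker can be run there without sudo.
# Usage: kafka_zk_cleanup <controller> <zookeeper-ip> <analytics-node>...
kafka_zk_cleanup() {
  local controller zk_ip node znode
  if [ "$#" -lt 3 ]; then
    echo "usage: kafka_zk_cleanup <controller> <zookeeper-ip> <analytics-node>..." >&2
    return 2
  fi
  controller="$1"
  zk_ip="$2"
  shift 2
  # Step 1: stop Kafka on every analytics node.
  for node in "$@"; do
    ssh "$node" docker stop analytics_alarm_kafka_1 || return 1
  done
  # Step 2: remove the broker and consumer znodes from a single controller.
  for znode in /brokers /consumers; do
    ssh "$controller" docker exec config_database_zookeeper_1 \
      bin/zkCli.sh -server "${zk_ip}:2181" deleteall "$znode" || return 1
  done
  # Step 3: start Kafka again on every analytics node.
  for node in "$@"; do
    ssh "$node" docker start analytics_alarm_kafka_1 || return 1
  done
}
```

The order matters: Kafka stays stopped on all analytics nodes until the Zookeeper cleanup has finished, matching the manual procedure.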
CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploying brings up the sFlow nodes.
CEM-20781 Telemetry KPI display is not supported for Junos EVO devices.
CEM-18999 In a heavily scaled data center with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours to re-sync with the API server.
CEM-13380 AppFormix Flows do not show up for multihomed devices on the fabric.