Known Behavior

This section lists known limitations with this release.

Known Behavior in Contrail Networking Release 2011.L5

OpenStack

  • CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:

    1. Edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep address on one of the controllers as shown here.

      Note:

      If all the controllers are shut down in the managed scenario at the same time, you must select the controller that was shut down last.

    2. Start the mariadb container (docker start mariadb) on the controller on which you edited the file.

    3. Wait a couple of minutes, confirm that the mariadb container is not restarting, and then run docker start mariadb on the remaining controllers.

    4. Restore the /etc/kolla/mariadb/galera.cnf file changes and restart the mariadb container on the previously selected controller.

  • CEM-15764 In Octavia Load Balancer, traffic destined to the floating IP of the load balancer VM is not directed to the backend VMs. Traffic destined to the actual VM IP of the load balancer VM works as expected.
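
The CEM-17991 recovery steps above can be sketched as follows. This is a minimal sketch, not a verified procedure: it assumes that the "wsrep address" in step 1 is the peer list in the wsrep_cluster_address setting, and strip_wsrep_address is a hypothetical helper name. The commands that touch Docker or /etc are left as comments so that nothing changes until you run them yourself.

```shell
# Sketch of the CEM-17991 recovery. Running this block only defines a
# variable and a helper; it does not touch Docker or /etc.

CNF=/etc/kolla/mariadb/galera.cnf   # path from the release note

# Hypothetical helper: print the given galera.cnf with the peer list removed
# from wsrep_cluster_address. ASSUMPTION: this is the "wsrep address" that
# step 1 asks you to remove.
strip_wsrep_address() {
    sed -E 's#^([[:space:]]*wsrep_cluster_address[[:space:]]*=[[:space:]]*gcomm://).*$#\1#' "$1"
}

# Step 1, on the controller that was shut down last (keep a backup):
#   cp "$CNF" "$CNF.bak" && strip_wsrep_address "$CNF.bak" > "$CNF"
# Step 2, on the same controller:
#   docker start mariadb
# Step 3, after a couple of minutes and once mariadb is not restarting,
# on each remaining controller:
#   docker start mariadb
# Step 4, back on the first controller:
#   cp "$CNF.bak" "$CNF" && docker restart mariadb
```

Keeping the backup in step 1 makes the restore in step 4 a plain file copy.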

OpenShift

  • CEM-21614 In an OpenShift 4.6 cluster, the contrail-status command might show warnings for Zookeeper and RabbitMQ and report no status for them. As a workaround, ignore these warnings: if the other Contrail services are active, you can consider the Zookeeper and RabbitMQ statuses to be active as well. To get the router agent status, enter contrail-status -t 15 on the compute node.

  • CEM-20802 When you create a new user-defined namespace on OpenShift 4.x with Contrail, SNAT is enabled by default, so all pods in the namespace can reach internet servers. As a workaround, explicitly configure the Contrail annotation "opencontrail.org/ip_fabric_snat": "false" on the namespace.
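
One way to apply the CEM-20802 annotation is with the oc annotate command. The sketch below only prints the command so that it can be reviewed first. The helper name snat_off_cmd and the namespace name my-namespace are placeholders, and the source does not say whether annotating an existing namespace affects pods that are already running.

```shell
# Hypothetical helper: print the command that sets the CEM-20802 annotation
# on one namespace. It only prints; it does not contact a cluster.
snat_off_cmd() {
    printf 'oc annotate namespace %s opencontrail.org/ip_fabric_snat=false --overwrite\n' "$1"
}

# Review the generated command for a placeholder namespace name.
snat_off_cmd my-namespace
```

To apply the change, run the printed command against the cluster with a user that is allowed to modify the namespace.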

DPDK and SR-IOV

  • CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.

  • CEM-18922 On a DPDK compute node, VM memory is mapped to only one NUMA node. If a VM is launched with a flavor that sets hw:mem_page_size='any', VM creation fails after the hugepages on that NUMA node are exhausted. As a workaround, use a flavor that sets hw:mem_page_size='large' instead.

  • CEM-18408 In DPDK 19.11 with the X710 NIC, performance degrades due to an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K tx and rx descriptors on Fortville NICs for better and more consistent performance, but these settings appear to degrade performance on the X710 NIC.

  • CEM-18163 On a DPDK compute node, if contrail-vrouter-agent crashes or is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop and start the instances whose interfaces are down.

  • CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.

  • CEM-15561 vRouter offload with Mellanox NIC cards does not work. However, DPDK on Mellanox NICs without offload is supported.

  • CEM-13685 DPDK vRouter with MLNX CX5 takes about 10 minutes and an lcore crash is also seen. This happens once, during initial installation.

  • JCB-177787 In DPDK vRouter use cases such as SNAT and LBaaS that require netns, jumbo MTU cannot be set. Maximum MTU allowed: <=1500.
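
For CEM-18922, it can help to see how many hugepages are still free on each NUMA node before launching VMs. The sketch below reads the standard Linux sysfs counters; the helper name free_hugepages_per_node is hypothetical, and the flavor name in the final comment is a placeholder.

```shell
# Print "<node> <page-size-dir> <free pages>" for every NUMA node.
# $1 is the sysfs root; the default is the standard Linux location.
free_hugepages_per_node() {
    root=${1:-/sys/devices/system/node}
    for f in "$root"/node*/hugepages/hugepages-*/free_hugepages; do
        [ -r "$f" ] || continue          # no match or unreadable: skip
        size=$(basename "$(dirname "$f")")
        node=$(basename "$(dirname "$(dirname "$(dirname "$f")")")")
        printf '%s %s %s\n' "$node" "$size" "$(cat "$f")"
    done
}

# Show the counters on this host (prints nothing if there are none).
free_hugepages_per_node

# Workaround from the release note, for a placeholder flavor name:
#   openstack flavor set --property hw:mem_page_size=large <flavor>
```

A node that reports 0 free pages while another node still has pages available matches the failure pattern described for the 'any' setting.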

SmartNIC

  • CEM-11163 In Fortville X710 NIC: With TX and RX buffers, performance degradation is observed as mbufs get exhausted.

General Routing

  • CEM-28889 BGP as a Service (BGPaaS) does not work across multiple pods in the same VM due to BGPaaS sessions flapping.

    Workaround: Always configure BGPaaS to be deployed with a single pod in a single VM.

  • CEM-27916 In environments that were deployed using Juju Charms and are using Contrail Networking with LXD, nodes that are using LXD are not restarting after a Contrail Controller reboot. The reboot triggers an IP tables issue that is not resolvable by Contrail Networking. This IP tables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.

    Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.

  • CEM-26594 A vRouter agent assert is seen if the vHost interface is not present. This scenario happens when a deployment fails and the vHost interface was not brought up. The vRouter agent tries to come online and continues to assert and retry for 5 minutes. This issue is typically transient and resolves itself when the vHost interface is brought up and the vRouter agent automatically recovers.

  • CEM-21234 While running parallel workloads, the collector service restarts in rare cases. The collector service recovers by itself with no intervention.

  • CEM-21835 An allowed address pair (AAP) less than /32 must not be used if a Bare Metal Server (BMS) route (VXLAN route) is within the AAP CIDR.

  • CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.

  • CEM-21570 The AS_PATH retain does not work with a 4-Byte autonomous system number (ASN) enabled cluster.

  • CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port-mirroring, when Juniper headers are enabled. The Juniper header is not supported in port-mirroring as VXLAN is not the tunnel type used for the dynamic next hop in this case.

  • CEM-20419 A BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and associated with a BFD health check with target-ip set to "all".

  • CEM-5043 A VNI update on an LR does not update the RouteTable. As a workaround, delete the LogicalRouter and create a new LogicalRouter with the new VNI.

  • CEM-3245 Multicast traffic originating from type-6 incapable QFX devices is duplicated by vRouters.
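
Before applying the CEM-27916 workaround, you may want to confirm that the FORWARD chain policy is the problem. The sketch below is one way to read the current policy from the output of iptables -S FORWARD; forward_policy is a hypothetical helper name, and the iptables commands are left as comments because they require root.

```shell
# Extract the FORWARD chain policy from `iptables -S FORWARD` output on stdin.
forward_policy() {
    awk '$1 == "-P" && $2 == "FORWARD" { print $3 }'
}

# On a node using LXD (requires root):
#   sudo iptables -S FORWARD | forward_policy    # prints the current policy
#   sudo iptables -P FORWARD ACCEPT              # workaround from CEM-27916
```

If the helper already prints ACCEPT, the workaround command changes nothing and the cause is likely elsewhere.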

Contrail Fabric Management

  • CEM-23931 Provisioning Contrail Fabric Manager using Ansible deployer intermittently fails while installing swift-client. If this condition occurs, preinstall swift-client on the controllers and then rerun the provisioning.

  • CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.

  • CEM-18381 QFX5120 cannot be used as border leaf role in SP style for CRB role.

  • CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.

  • CEM-14679 In fabric un-managed PNF use-case, some bogus static routes are pushed by DM under LR VRF on spines in case of CRB gateway.

    As a workaround, change the value of the dummy_ip variable inside the device_manager docker. The line number in the link below is based on the 2008 release code base.

    https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594

    After changing the value to the desired subnet and saving the file, restart the DM docker for the change to take effect. Note that this step must be performed before fabric onboarding.

  • CEM-14264 In release 2003, the Virtual Port Group create workflow will not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. The field is editable unlike in previous releases. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance checkbox enabled.

  • CEM-13767 Although Contrail Fabric Manager lets you use custom image names for fabric devices, for vmhost-based platforms such as QFX10000-60C the image name must follow the junos-vmhost-install-x.tgz format when you upload the image to CFM.

  • CEM-8701 Onboarding of multiple BMS in parallel on SP-style fabric does not work. While bringing up a BMS using the Life Cycle Management workflow, sometimes on faster servers the re-image does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes have converged between the BMS port and the TFTP service running on the Contrail nodes. As a workaround, reboot the servers or configure the server BIOS for a delayed boot.

  • CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.

  • CEM-4358 In Contrail fabric deployments, configuring QFX5110 as spine (CRB-Gateway) does not work.

  • CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. That is, if more than one BMS is associated with a VPG and one BMS needs to be moved, the whole VPG must be deleted and reconfigured for the new association.

  • CEM-20794 In Contrail fabric management with ML2 plugin, configuring LAG in VMs associated with SRIOV VFs will not work.

  • CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.

  • CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also have the RR (route reflector) RB role assigned. Without the RR RB role, the overlay iBGP session won’t stretch the Layer 2 tenant virtual network across the fabric’s leaf devices. For L2 DCI mode, we therefore recommend marking the physical router device with the DCI Gateway RB role along with the RR role.

  • CEM-19802 Security Groups cannot be used on QFX10K interfaces.

Kubernetes

  • JCB-187287 High Availability provisioning of Kubernetes master is not supported.

General

  • JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler or if new fragments arrive immediately after the head fragment, the assembler releases them to flow module. Fragments get enqueued in the hold queue if agent does not write flow action by the time the assembler releases fragments to the flow module. A maximum of three fragments are enqueued in the hold queue at a time. The remaining fragments are dropped from the assembler to the flow module.

    As a workaround, the head fragment is enqueued to assembler only after flow action is written by agent. If the flow is already present in non-hold state, it is immediately enqueued to assembler.

  • CEM-28618 LXD containers are not getting an IP address and are stuck in the pending state when deployed on bionic series machines.

    Workaround: Enter the following commands to edit and restart the LXD container, where juju-238c35-0-lxd-4 is the LXD container name.

  • CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. To add links, delete the service template and add it again.

  • CEM-18398 Contrail WebUI doesn’t work for System/Node status monitoring. As a workaround, check using CLI on the relevant nodes. This will not impact functionality.

  • CEM-20414 During Contrail Command container restart, one deploy job per contrail_cluster object is fired, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.

  • CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, the tunneled packet sent from the transparent service instance to the QFX carries a vlan-id in its inner header. This vlan-id is internal to the vRouter and the QFX is not aware of it, so the switch drops the packet and traffic from left-bms to right-bms is lost.

  • CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.

Telemetry and Analytics

  • CEM-28617 A Contrail-collect core file might be produced after a deployment.

    Workaround: Continue to work as normal. The presence of the core file has no impact on functionality.

  • CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.

    Workaround: Perform the following steps:

    1. Stop Kafka on all the analytics nodes.

      docker stop analytics_alarm_kafka_1

    2. On a single Contrail controller, perform the Zookeeper cleanup.

      docker exec -it config_database_zookeeper_1 bash

      bin/zkCli.sh -server <IP>:2181

      deleteall /brokers

      deleteall /consumers

    3. Start Kafka on all the analytics nodes.

      docker start analytics_alarm_kafka_1

  • CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploying brings up the sFlow nodes.

  • CEM-20781 Telemetry KPI display for Junos EVO devices is not supported.

  • CEM-18999 In a heavily scaled datacenter with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours for it to re-sync with the API server.

  • CEM-13380 AppFormix Flows does not show up for multihomed devices on the fabric.

  • CEM-10929 When Contrail Insights queries the LLDP table from a device through SNMP and the SNMP calls time out, Contrail Insights marks the device as invalidConfiguration and notifies the user to take a look. After you verify that snmpwalk is working and there are no network issues, click Edit and reconfigure the device from Settings > Network Devices so that Contrail Insights retries LLDP discovery and adds the device again.

Known Behavior in Contrail Networking Release 2011.L4

OpenStack

  • CEM-26498 In cloud environments that were deployed using Kolla Ansible and are using Contrail Networking, package installations sometimes fail with a GPG Check failure. This GPG Check error occurs due to expired keys in the upstream network that are refreshed by the community. The deployment code, meanwhile, has been updated to disable the check.

    To enable the check and work around this issue, set the variable K8S_YUM_REPO_GPGCHECK under global_configurations in instances.yaml.

  • CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.

  • CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:

    1. Edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep address on one of the controllers as shown here.

      Note:

      If all the controllers are shut down in the managed scenario at the same time, you must select the controller that was shut down last.

    2. Start the mariadb container (docker start mariadb) on the controller on which you edited the file.

    3. Wait a couple of minutes, confirm that the mariadb container is not restarting, and then run docker start mariadb on the remaining controllers.

    4. Restore the /etc/kolla/mariadb/galera.cnf file changes and restart the mariadb container on the previously selected controller.

  • CEM-15764 In Octavia Load Balancer, traffic destined to the floating IP of the load balancer VM is not directed to the backend VMs. Traffic destined to the actual VM IP of the load balancer VM works as expected.

  • CEM-9979 During upgrade of DPDK computes deployed with OOO Heat Templates in RHOSP environment, vRouter coredumps are observed. This is due to the sequence in which the services are started during upgrade and does not have impact on cluster operation.

OpenShift

  • CEM-21614 In an OpenShift 4.6 cluster, the contrail-status command might show warnings for Zookeeper and RabbitMQ and report no status for them. As a workaround, ignore these warnings: if the other Contrail services are active, you can consider the Zookeeper and RabbitMQ statuses to be active as well. To get the router agent status, enter contrail-status -t 15 on the compute node.

  • CEM-20802 When you create a new user-defined namespace on OpenShift 4.x with Contrail, SNAT is enabled by default, so all pods in the namespace can reach internet servers. As a workaround, explicitly configure the Contrail annotation "opencontrail.org/ip_fabric_snat": "false" on the namespace.

DPDK and SR-IOV

  • CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, an intermittent performance drop is observed along with checksum errors. Functionality is not impacted.

  • CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.

  • CEM-18922 On a DPDK compute node, VM memory is mapped to only one NUMA node. If a VM is launched with a flavor that sets hw:mem_page_size='any', VM creation fails after the hugepages on that NUMA node are exhausted. As a workaround, use a flavor that sets hw:mem_page_size='large' instead.

  • CEM-18408 In DPDK 19.11 with the X710 NIC, performance degrades due to an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K tx and rx descriptors on Fortville NICs for better and more consistent performance, but these settings appear to degrade performance on the X710 NIC.

  • CEM-18163 On a DPDK compute node, if contrail-vrouter-agent crashes or is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop and start the instances whose interfaces are down.

  • CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.

  • CEM-15561 vRouter offload with Mellanox NIC cards does not work. However, DPDK on Mellanox NICs without offload is supported.

  • CEM-13685 DPDK vRouter with MLNX CX5 takes about 10 minutes and an lcore crash is also seen. This happens once, during initial installation.

  • JCB-177787 In DPDK vRouter use cases such as SNAT and LBaaS that require netns, jumbo MTU cannot be set. Maximum MTU allowed: <=1500.

SmartNIC

  • CEM-11163 In Fortville X710 NIC: With TX and RX buffers, performance degradation is observed as mbufs get exhausted.

General Routing

  • CEM-28889 BGP as a Service (BGPaaS) does not work across multiple pods in the same VM due to BGPaaS sessions flapping.

    Workaround: Always configure BGPaaS to be deployed with a single pod in a single VM.

  • CEM-27916 In environments that were deployed using Juju Charms and are using Contrail Networking with LXD, nodes that are using LXD are not restarting after a Contrail Controller reboot. The reboot triggers an IP tables issue that is not resolvable by Contrail Networking. This IP tables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.

    Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.

  • CEM-26594 A vRouter agent assert is seen if the vHost interface is not present. This scenario happens when a deployment fails and the vHost interface was not brought up. The vRouter agent tries to come online and continues to assert and retry for 5 minutes. This issue is typically transient and resolves itself when the vHost interface is brought up and the vRouter agent automatically recovers.

  • CEM-21234 While running parallel workloads, the collector service restarts in rare cases. The collector service recovers by itself with no intervention.

  • CEM-21835 An allowed address pair (AAP) less than /32 must not be used if a Bare Metal Server (BMS) route (VXLAN route) is within the AAP CIDR.

  • CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.

  • CEM-21570 The AS_PATH retain does not work with a 4-Byte autonomous system number (ASN) enabled cluster.

  • CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port-mirroring, when Juniper headers are enabled. The Juniper header is not supported in port-mirroring as VXLAN is not the tunnel type used for the dynamic next hop in this case.

  • CEM-20419 A BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and associated with a BFD health check with target-ip set to "all".

  • CEM-5043 A VNI update on an LR does not update the RouteTable. As a workaround, delete the LogicalRouter and create a new LogicalRouter with the new VNI.

  • CEM-3245 Multicast traffic originating from type-6 incapable QFX devices is duplicated by vRouters.

Contrail Fabric Management

  • CEM-23931 Provisioning Contrail Fabric Manager using Ansible deployer intermittently fails while installing swift-client. If this condition occurs, preinstall swift-client on the controllers and then rerun the provisioning.

  • CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.

  • CEM-18381 QFX5120 cannot be used as border leaf role in SP style for CRB role.

  • CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.

  • CEM-14679 In fabric un-managed PNF use-case, some bogus static routes are pushed by DM under LR VRF on spines in case of CRB gateway.

    As a workaround, change the value of the dummy_ip variable inside the device_manager docker. The line number in the link below is based on the 2008 release code base.

    https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594

    After changing the value to the desired subnet and saving the file, restart the DM docker for the change to take effect. Note that this step must be performed before fabric onboarding.

  • CEM-14264 In release 2003, the Virtual Port Group create workflow will not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. The field is editable unlike in previous releases. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance checkbox enabled.

  • CEM-13767 Although Contrail Fabric Manager lets you use custom image names for fabric devices, for vmhost-based platforms such as QFX10000-60C the image name must follow the junos-vmhost-install-x.tgz format when you upload the image to CFM.

  • CEM-8701 Onboarding of multiple BMS in parallel on SP-style fabric does not work. While bringing up a BMS using the Life Cycle Management workflow, sometimes on faster servers the re-image does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes have converged between the BMS port and the TFTP service running on the Contrail nodes. As a workaround, reboot the servers or configure the server BIOS for a delayed boot.

  • CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.

  • CEM-4358 In Contrail fabric deployments, configuring QFX5110 as spine (CRB-Gateway) does not work.

  • CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. That is, if more than one BMS is associated with a VPG and one BMS needs to be moved, the whole VPG must be deleted and reconfigured for the new association.

  • CEM-20794 In Contrail fabric management with ML2 plugin, configuring LAG in VMs associated with SRIOV VFs will not work.

  • CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.

  • CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also have the RR (route reflector) RB role assigned. Without the RR RB role, the overlay iBGP session won’t stretch the Layer 2 tenant virtual network across the fabric’s leaf devices. For L2 DCI mode, we therefore recommend marking the physical router device with the DCI Gateway RB role along with the RR role.

  • CEM-19802 Security Groups cannot be used on QFX10K interfaces.

Kubernetes

  • JCB-187287 High Availability provisioning of Kubernetes master is not supported.

General

  • JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler or if new fragments arrive immediately after the head fragment, the assembler releases them to flow module. Fragments get enqueued in the hold queue if agent does not write flow action by the time the assembler releases fragments to the flow module. A maximum of three fragments are enqueued in the hold queue at a time. The remaining fragments are dropped from the assembler to the flow module.

    As a workaround, the head fragment is enqueued to assembler only after flow action is written by agent. If the flow is already present in non-hold state, it is immediately enqueued to assembler.

  • CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. To add links, delete the service template and add it again.

  • CEM-18398 Contrail WebUI doesn’t work for System/Node status monitoring. As a workaround, check using CLI on the relevant nodes. This will not impact functionality.

  • CEM-20414 During Contrail Command container restart, one deploy job per contrail_cluster object is fired, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.

  • CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, the tunneled packet sent from the transparent service instance to the QFX carries a vlan-id in its inner header. This vlan-id is internal to the vRouter and the QFX is not aware of it, so the switch drops the packet and traffic from left-bms to right-bms is lost.

  • CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.

Telemetry and Analytics

  • CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.

    Workaround: Perform the following steps:

    1. Stop Kafka on all the analytics nodes.

      docker stop analytics_alarm_kafka_1

    2. On a single Contrail controller, perform the Zookeeper cleanup.

      docker exec -it config_database_zookeeper_1 bash

      bin/zkCli.sh -server <IP>:2181

      deleteall /brokers

      deleteall /consumers

    3. Start Kafka on all the analytics nodes.

      docker start analytics_alarm_kafka_1

  • CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploying brings up the sFlow nodes.

  • CEM-20781 Telemetry KPI display for Junos EVO devices is not supported.

  • CEM-18999 In a heavily scaled datacenter with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours for it to re-sync with the API server.

  • CEM-13380 AppFormix Flows does not show up for multihomed devices on the fabric.

  • CEM-10929 When Contrail Insights queries the LLDP table from a device through SNMP and the SNMP calls time out, Contrail Insights marks the device as invalidConfiguration and notifies the user to take a look. After you verify that snmpwalk is working and there are no network issues, click Edit and reconfigure the device from Settings > Network Devices so that Contrail Insights retries LLDP discovery and adds the device again.

Known Behavior in Contrail Networking Release 2011.L3

OpenStack

  • CEM-23142 A few RHOSP13 deployments with Contrail Networking 2011.L2 failed during installation of OpenStack components. RHOSP13 customers are advised to contact Red Hat support, referring to case ID 02995585, before installation.

  • CEM-22783 Upgrading a RHOSP13 cluster from RHOSP13 images tested with Contrail Networking 2011.L1 to the RHOSP13 images tested with Contrail Networking 2011.L2 fails.

    As a workaround, disable the Ceilometer service before upgrading. We recommend that you contact Red Hat support, referring to case ID 02989025, before proceeding with the upgrade.

  • CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.

  • CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:

    1. Edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep address on one of the controllers as shown here.

      Note:

      If all the controllers are shut down in the managed scenario at the same time, you must select the controller that was shut down last.

    2. Start the mariadb container (docker start mariadb) on the controller on which you edited the file.

    3. Wait a couple of minutes, confirm that the mariadb container is not restarting, and then run docker start mariadb on the remaining controllers.

    4. Restore the /etc/kolla/mariadb/galera.cnf file changes and restart the mariadb container on the previously selected controller.

  • CEM-15764 In Octavia Load Balancer, traffic destined to the floating IP of the load balancer VM is not directed to the backend VMs. Traffic destined to the actual VM IP of the load balancer VM works as expected.

  • CEM-9979 During an upgrade of DPDK computes deployed with OOO Heat templates in an RHOSP environment, vRouter core dumps are observed. This is due to the sequence in which the services are started during the upgrade and has no impact on cluster operation.

  • CEM-5141 The UI workflow for deleting compute nodes does not work. Instead, update instances.yaml with “ENABLE_DESTROY: True” and “roles:” (leave it empty) and run the following playbooks.

    For example:

OpenShift

  • CEM-21614 In an OpenShift 4.6 cluster, the contrail-status command might show warnings for Zookeeper and RabbitMQ and report no status for them. As a workaround, you can ignore these warnings; if the other Contrail services are active, you can consider Zookeeper and RabbitMQ to be active as well. To get the router agent status, enter contrail-status -t 15 on the compute node.

  • CEM-20802 When you create a new user-defined namespace on OpenShift 4.x with Contrail, SNAT is enabled by default, so all pods in the namespace can reach Internet servers. As a workaround, explicitly configure the Contrail annotation on the namespace as "opencontrail.org/ip_fabric_snat": "false".
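
One way to apply the CEM-20802 workaround is to put the annotation in the namespace manifest itself. The sketch below builds such a manifest as JSON; it assumes the standard Kubernetes Namespace schema, and the helper name and the namespace name `demo-ns` are hypothetical.

```python
import json

SNAT_ANNOTATION = "opencontrail.org/ip_fabric_snat"

def namespace_manifest(name: str, snat_enabled: bool = False) -> dict:
    """Build a Namespace manifest carrying the Contrail SNAT annotation."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Kubernetes annotation values must be strings, not booleans.
            "annotations": {SNAT_ANNOTATION: "true" if snat_enabled else "false"},
        },
    }

print(json.dumps(namespace_manifest("demo-ns"), indent=2))
```

The printed JSON can be saved to a file and applied with the cluster's usual tooling.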

DPDK and SR-IOV

  • CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, intermittent performance drops are observed along with checksum errors. Functionality is not impacted.

  • CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.

  • CEM-18922 On a DPDK compute, the memory of the VMs is mapped to only one NUMA node. VM creation fails after the hugepages in that NUMA node are exhausted if the VM is launched with a flavor that sets hw:mem_page_size='any'. As a workaround, use a flavor that sets hw:mem_page_size='large' instead.
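
To find flavors affected by CEM-18922, the extra specs of each flavor can be scanned for the 'any' setting. The following is a minimal sketch that works on a plain dictionary; the flavor names are hypothetical, and collecting the extra specs from your OpenStack deployment is left out.

```python
def flavors_at_risk(flavors: dict) -> list:
    """Return the names of flavors whose hw:mem_page_size extra spec is 'any'."""
    return sorted(
        name
        for name, extra_specs in flavors.items()
        if extra_specs.get("hw:mem_page_size") == "any"
    )

# Hypothetical flavor name -> extra specs mapping.
flavors = {
    "dpdk.small": {"hw:mem_page_size": "any"},
    "dpdk.large": {"hw:mem_page_size": "large"},
    "generic": {},
}
print(flavors_at_risk(flavors))
```

Flavors reported by this check are candidates for switching to hw:mem_page_size='large'.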

  • CEM-18408 In DPDK 19.11 with the X710 NIC, performance degrades due to an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K tx and rx descriptors on Fortville NICs for better and more consistent performance, but they seem to degrade performance on the X710 NIC.

  • CEM-18163 On a DPDK compute, if contrail-vrouter-agent crashes or is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop and start the instances whose interfaces are down.

  • CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.

  • CEM-15561 vRouter offload with Mellanox NICs does not work. However, DPDK on Mellanox NICs without offload is supported.

  • CEM-13685 DPDK vRouter with MLNX CX5 takes about 10 minutes, and an lcore crash is also seen. This happens once during initial installation.

  • JCB-177787 In DPDK vRouter use cases such as SNAT and LBaaS that require netns, jumbo MTU cannot be set. The maximum MTU allowed is 1500.

SmartNIC

  • CEM-20620 The gatewayless forwarding feature is not supported on Netronome.

  • CEM-11163 On the Fortville X710 NIC, performance degradation is observed with TX and RX buffers because the mbufs get exhausted.

General Routing

  • CEM-28889 BGP as a Service (BGPaaS) does not work across multiple pods in the same VM due to BGPaaS sessions flapping.

    Workaround: Always configure BGPaaS to be deployed with a single pod in a single VM.

  • CEM-21835 An allowed address pair (AAP) with a prefix length shorter than /32 must not be used if a bare metal server (BMS) route (VXLAN route) falls within the AAP CIDR.
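
The CEM-21835 condition can be checked before an AAP is configured. This is a minimal sketch using Python's standard ipaddress module; the function name and the sample addresses are hypothetical, and only IPv4 is covered because the restriction is stated in terms of /32.

```python
import ipaddress

def aap_conflicts_with_bms(aap_cidr: str, bms_ip: str) -> bool:
    """Return True if the AAP is wider than a /32 host route and
    the BMS address falls inside the AAP CIDR."""
    aap = ipaddress.IPv4Network(aap_cidr, strict=False)
    bms = ipaddress.IPv4Address(bms_ip)
    return aap.prefixlen < 32 and bms in aap

print(aap_conflicts_with_bms("10.1.1.0/24", "10.1.1.20"))   # AAP covers the BMS
print(aap_conflicts_with_bms("10.1.1.20/32", "10.1.1.20"))  # host route only
```

A result of True means the AAP should be narrowed or the BMS address moved outside the AAP range.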

  • CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.

  • CEM-21570 AS_PATH retain does not work in a cluster with 4-byte autonomous system numbers (ASNs) enabled.

  • CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port-mirroring, when Juniper headers are enabled. The Juniper header is not supported in port-mirroring as VXLAN is not the tunnel type used for the dynamic next hop in this case.

  • CEM-20419 The BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and associated with a BFD health check with target-ip set to "all".

  • CEM-5043 A VNI update on an LR does not update the RouteTable. As a workaround, delete the LogicalRouter and create a new LogicalRouter with the new VNI.

  • CEM-3245 Multicast traffic originating from type-6 incapable QFX devices is duplicated by vRouters.

  • CEM-22632 When an MX Series router acts as the data center gateway and MPLSoUDP is used between the controllers and the MX, the floating IP use case does not work. Use MPLSoGRE or VXLAN instead.

Contrail Fabric Management

  • CEM-23931 Provisioning Contrail Fabric Manager using Ansible deployer intermittently fails while installing swift-client. If this condition occurs, preinstall swift-client on the controllers and then rerun the provisioning.

  • CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.

  • CEM-18381 QFX5120 cannot be used as border leaf role in SP style for CRB role.

  • CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.

  • CEM-14679 In fabric un-managed PNF use-case, some bogus static routes are pushed by DM under LR VRF on spines in case of CRB gateway.

    As a workaround, change the value of the dummy_ip variable inside the device_manager docker. The line number in the link below is based on the 2008 release code base.

    https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594

    After changing the value to the desired subnet and saving the file, restart the DM docker to apply the change. Note that this step must be performed before fabric onboarding.

  • CEM-14264 In release 2003, the Virtual Port Group create workflow will not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. The field is editable unlike in previous releases. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance checkbox enabled.

  • CEM-13767 Although Contrail fabric manager lets the user choose custom image names for fabric devices, vmhost-based platforms such as QFX10000-60C require that the image uploaded to CFM be named in the junos-vmhost-install-x.tgz format.
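
The naming rule in CEM-13767 can be checked before an image is uploaded. The sketch below treats the "x" in junos-vmhost-install-x.tgz as a placeholder for any non-empty string, which is an assumption; the helper name and the sample file names are hypothetical.

```python
import re

# "x" in junos-vmhost-install-x.tgz is treated as a free-form placeholder.
VMHOST_IMAGE_RE = re.compile(r"^junos-vmhost-install-.+\.tgz$")

def is_valid_vmhost_image_name(name: str) -> bool:
    """Return True if the name follows the junos-vmhost-install-x.tgz pattern."""
    return VMHOST_IMAGE_RE.match(name) is not None

print(is_valid_vmhost_image_name("junos-vmhost-install-qfx-example.tgz"))
print(is_valid_vmhost_image_name("my-custom-image.tgz"))
```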

  • CEM-8701 Onboarding of multiple BMS in parallel on SP-style fabric does not work. While bringing up a BMS using the Life Cycle Management workflow, sometimes on faster servers the re-image does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes have converged between the BMS port and the TFTP service running on the Contrail nodes. As a workaround, reboot the servers or configure the BIOS on the servers to delay boot.

  • CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.

  • CEM-4358 In Contrail fabric deployments configuring QFX5110 as spine (CRB-Gateway) does not work.

  • CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. That is, if more than one BMS is associated with a VPG and one of them needs to be moved, the whole VPG must be deleted and reconfigured for the new association.

  • CEM-20794 In Contrail fabric management with ML2 plugin, configuring LAG in VMs associated with SRIOV VFs will not work.

  • CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.

  • CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between the fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also have the RR (route reflector) RB role assigned. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric’s leaf devices. For L2 DCI mode, we therefore recommend marking the physical router device with the DCI-gateway RB role along with the RR role.

  • CEM-19802 Security Groups cannot be used on QFX10K interfaces.

Kubernetes

  • JCB-187287 High Availability provisioning of Kubernetes master is not supported.

General

  • CEM-27916 In environments that were deployed using Juju Charms and are using Contrail Networking with LXD, nodes that are using LXD are not restarting after a Contrail Controller reboot. The reboot triggers an IP tables issue that is not resolvable by Contrail Networking. This IP tables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.

    Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.

  • JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler or if new fragments arrive immediately after the head fragment, the assembler releases them to flow module. Fragments get enqueued in the hold queue if agent does not write flow action by the time the assembler releases fragments to the flow module. A maximum of three fragments are enqueued in the hold queue at a time. The remaining fragments are dropped from the assembler to the flow module.

    As a workaround, the head fragment is enqueued to assembler only after flow action is written by agent. If the flow is already present in non-hold state, it is immediately enqueued to assembler.
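
The hold-queue behavior described in JCB-184776 can be pictured with a toy model. This is only an illustration of the three-fragment limit stated above, not vRouter code; the function name and the fragment labels are hypothetical.

```python
HOLD_QUEUE_LIMIT = 3  # from the text: at most three fragments are held at a time

def enqueue_while_on_hold(fragments):
    """Split fragments released while the flow is in hold state into
    (held, dropped) according to the hold-queue limit."""
    held, dropped = [], []
    for fragment in fragments:
        if len(held) < HOLD_QUEUE_LIMIT:
            held.append(fragment)
        else:
            dropped.append(fragment)
    return held, dropped

held, dropped = enqueue_while_on_hold(["f1", "f2", "f3", "f4", "f5"])
print("held:", held)
print("dropped:", dropped)
```

In this model, any fragment released after the queue is full is lost, which matches the drops described for fragments that arrive before the agent writes the flow action.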

  • CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. To add links, delete the service template and add it again.

  • CEM-18398 Contrail WebUI doesn’t work for System/Node status monitoring. As a workaround, check using CLI on the relevant nodes. This will not impact functionality.

  • CEM-20414 During a Contrail Command container restart, one deploy job per contrail_cluster object is fired, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.

  • CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent out of the transparent service instance to the QFX carry a vlan-id in the inner header. This vlan-id is internal to the vRouter and the QFX is not aware of it, so the switch drops the packets and traffic from left-bms to right-bms is lost.

  • CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.

  • CEM-16855 IPv6 ipam subnet option “enable_dhcp” is always ignored.

Telemetry and Analytics

  • CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.

    Workaround: Perform the following steps:

    1. Stop Kafka on all the analytics nodes.

      docker stop analytics_alarm_kafka_1

    2. On a single Contrail controller, perform the Zookeeper cleanup.

      docker exec -it config_database_zookeeper_1 bash

      bin/zkCli.sh -server <IP>:2181

      deleteall /brokers

      deleteall /consumers

    3. Start Kafka on all the analytics nodes.

      docker start analytics_alarm_kafka_1
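
The ordering of the CEM-21526 workaround matters: Kafka is stopped on every analytics node before the Zookeeper cleanup, and started again only afterward. The sketch below lays out that order as a list of (host, command) pairs using the commands given above; the function name, host names, and IP address are hypothetical, and the two deleteall commands are entered inside the zkCli shell.

```python
def kafka_recovery_plan(analytics_nodes, controller, zk_ip):
    """Return the ordered (host, command) steps of the workaround."""
    stop = [(node, "docker stop analytics_alarm_kafka_1") for node in analytics_nodes]
    cleanup = [
        (controller, "docker exec -it config_database_zookeeper_1 bash"),
        (controller, f"bin/zkCli.sh -server {zk_ip}:2181"),
        (controller, "deleteall /brokers"),    # entered at the zkCli prompt
        (controller, "deleteall /consumers"),  # entered at the zkCli prompt
    ]
    start = [(node, "docker start analytics_alarm_kafka_1") for node in analytics_nodes]
    return stop + cleanup + start

plan = kafka_recovery_plan(["analytics-1", "analytics-2"], "controller-1", "192.0.2.10")
for host, command in plan:
    print(f"{host}: {command}")
```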

  • CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploying brings up the sFlow nodes.

  • CEM-20781 Telemetry KPI display for Junos EVO devices is not supported.

  • CEM-18999 In a heavily scaled data center with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours to resynchronize with the API server.

  • CEM-13380 AppFormix Flows does not show up for multihomed devices on the fabric.

  • CEM-10929 When Contrail Insights queries the LLDP table from a device through SNMP and the SNMP calls time out, Contrail Insights marks the device as invalidConfiguration and notifies the user. After verifying that snmpwalk works and that there are no network issues, click Edit and reconfigure the device from Settings > Network Devices so that Contrail Insights retries LLDP discovery and adds the device again.

Known Behavior in Contrail Networking Release 2011.L2

  • CEM-27916 In environments that were deployed using Juju Charms and are using Contrail Networking with LXD, nodes that are using LXD are not restarting after a Contrail Controller reboot. The reboot triggers an IP tables issue that is not resolvable by Contrail Networking. This IP tables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.

    Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.

  • CEM-23142 A few RHOSP13 deployments with Contrail Networking 2011.L2 failed during installation of OpenStack components. RHOSP13 customers are advised to contact Red Hat support, referencing case ID 02995585, before installation.

  • CEM-22783 Upgrading a RHOSP13 cluster from RHOSP13 images tested with Contrail Networking 2011.L1 to the RHOSP13 images tested with Contrail Networking 2011.L2 fails.

    As a workaround, disable the Ceilometer service before upgrading. We recommend contacting Red Hat support and referencing case ID 02989025 before proceeding with the upgrade.

  • CEM-22835 During high-throughput data transfer between virtual machines (VMs) running on the same DPDK-enabled compute node, intermittent performance drops are observed along with checksum errors. Functionality is not impacted.

  • CEM-22632 When an MX Series router acts as the data center gateway and MPLSoUDP is used between the controllers and the MX, the floating IP use case does not work. Use MPLSoGRE or VXLAN instead.

  • CEM-21835 An allowed address pair (AAP) with a prefix length shorter than /32 must not be used if a bare metal server (BMS) route (VXLAN route) falls within the AAP CIDR.

  • CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.

  • CEM-21570 AS_PATH retain does not work in a cluster with 4-byte autonomous system numbers (ASNs) enabled.

  • CEM-21614 In an OpenShift 4.6 cluster, the contrail-status command might show warnings for Zookeeper and RabbitMQ and report no status for them. As a workaround, you can ignore these warnings; if the other Contrail services are active, you can consider Zookeeper and RabbitMQ to be active as well. To get the router agent status, enter contrail-status -t 15 on the compute node.

  • CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.

  • CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.

    Workaround: Perform the following steps:

    1. Stop Kafka on all the analytics nodes.

      docker stop analytics_alarm_kafka_1

    2. On a single Contrail controller, perform the Zookeeper cleanup.

      docker exec -it config_database_zookeeper_1 bash

      bin/zkCli.sh -server <IP>:2181

      deleteall /brokers

      deleteall /consumers

    3. Start Kafka on all the analytics nodes.

      docker start analytics_alarm_kafka_1

  • CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploying brings up the sFlow nodes.

  • CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.

  • CEM-20802 When you create a new user-defined namespace on OpenShift 4.x with Contrail, SNAT is enabled by default, so all pods in the namespace can reach Internet servers. As a workaround, explicitly configure the Contrail annotation on the namespace as "opencontrail.org/ip_fabric_snat": "false".

  • CEM-20794 In Contrail fabric management with ML2 plugin, configuring LAG in VMs associated with SRIOV VFs will not work.

  • CEM-20781 Telemetry KPI display for Junos EVO devices is not supported.

  • CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.

  • CEM-20620 The gatewayless forwarding feature is not supported on Netronome.

  • CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port-mirroring, when Juniper headers are enabled. The Juniper header is not supported in port-mirroring as VXLAN is not the tunnel type used for the dynamic next hop in this case.

  • CEM-20419 The BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and associated with a BFD health check with target-ip set to "all".

  • CEM-20414 During a Contrail Command container restart, one deploy job per contrail_cluster object is fired, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.

  • CEM-20308 After a failover of one of the HA master nodes or a vRouter restart, creation of further user pods might fail because the pods do not get an IP address.

    As a workaround, find the control pods on the HA master nodes that are not in sync with respect to the new user pod and restart them. Perform the following steps:

    1. Log in to each of the three HA masters, find the crictl pod named “control”, and log in to it. Verify whether the output of the following command shows the name of the latest user pod that failed:

      curl --cert /etc/certificates/server-key-localhost --insecure https://localhost:8083/Snh_IFMapTableShowReq?table_name=virtual-machine

    2. Restart the control pods that are not in sync.
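
The comparison in the CEM-20308 steps can be sketched as follows. This assumes that a control pod is in sync when the introspect output collected from it contains the name of the failed user pod, which is an interpretation of step 1; the function name, master names, and sample outputs are hypothetical, and collecting the curl output from each master is left out.

```python
def control_pods_out_of_sync(introspect_output_by_master, failed_pod_name):
    """Return the masters whose control pod output does not mention the failed pod."""
    return sorted(
        master
        for master, output in introspect_output_by_master.items()
        if failed_pod_name not in output
    )

# Hypothetical, heavily shortened introspect outputs keyed by master name.
outputs = {
    "master-1": "virtual-machine: web-1 web-2",
    "master-2": "virtual-machine: web-1",
    "master-3": "virtual-machine: web-1 web-2",
}
print(control_pods_out_of_sync(outputs, "web-2"))
```

The masters returned by this check are the ones whose control pods would be restarted in step 2.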

  • CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.

  • CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between the fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also have the RR (route reflector) RB role assigned. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric’s leaf devices. For L2 DCI mode, we therefore recommend marking the physical router device with the DCI-gateway RB role along with the RR role.

  • CEM-19802 Security Groups cannot be used on QFX10K interfaces.

  • CEM-18979 The vRouter to vRouter encryption feature is beta quality and should be used for future product capability demonstrations only.

  • CEM-18999 In a heavily scaled data center with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours to resynchronize with the API server.

  • CEM-18922 On a DPDK compute, the memory of the VMs is mapped to only one NUMA node. VM creation fails after the hugepages in that NUMA node are exhausted if the VM is launched with a flavor that sets hw:mem_page_size='any'. As a workaround, use a flavor that sets hw:mem_page_size='large' instead.

  • CEM-18408 In DPDK 19.11 with the X710 NIC, performance degrades due to an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K tx and rx descriptors on Fortville NICs for better and more consistent performance, but they seem to degrade performance on the X710 NIC.

  • CEM-18398 Contrail WebUI doesn’t work for System/Node status monitoring. As a workaround, check using CLI on the relevant nodes. This will not impact functionality.

  • CEM-18381 QFX5120 cannot be used as border leaf role in SP style for CRB role.

  • CEM-18163 On a DPDK compute, if contrail-vrouter-agent crashes or is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop and start the instances whose interfaces are down.

  • CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:

    1. Edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep address on one of the controllers as shown here.

      Note:

      If all the controllers are shut down in the managed scenario at the same time, you must select the controller that was shut down last.

    2. Run docker start mariadb on the controller on which you edited the file.

    3. Wait a couple of minutes, ensure that the mariadb container is not restarting, and then run docker start mariadb on the remaining controllers.

    4. Restore the /etc/kolla/mariadb/galera.cnf file changes and restart the mariadb container on the previously selected controller.

  • CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.

  • CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent out of the transparent service instance to the QFX carry a vlan-id in the inner header. This vlan-id is internal to the vRouter and the QFX is not aware of it, so the switch drops the packets and traffic from left-bms to right-bms is lost.

  • CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.

  • CEM-16855 IPv6 ipam subnet option “enable_dhcp” is always ignored.

  • CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.

  • CEM-15764 In Octavia Load Balancer, traffic destined to the floating IP of the load balancer VM is not directed to the backend VMs. Traffic destined to the actual VM IP of the load balancer VM works as expected.

  • CEM-15561 vRouter offload with Mellanox NICs does not work. However, DPDK on Mellanox NICs without offload is supported.

  • CEM-14679 In fabric un-managed PNF use-case, some bogus static routes are pushed by DM under LR VRF on spines in case of CRB gateway.

    As a workaround, change the value of the dummy_ip variable inside the device_manager docker. The line number in the link below is based on the 2008 release code base.

    https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594

    After changing the value to the desired subnet and saving the file, restart the DM docker to apply the change. Note that this step must be performed before fabric onboarding.

  • CEM-14264 In release 2003, the Virtual Port Group create workflow will not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. The field is editable unlike in previous releases. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance checkbox enabled.

  • CEM-13767 Although Contrail fabric manager lets the user choose custom image names for fabric devices, vmhost-based platforms such as QFX10000-60C require that the image uploaded to CFM be named in the junos-vmhost-install-x.tgz format.

  • CEM-13685 DPDK vRouter with MLNX CX5 takes about 10 minutes, and an lcore crash is also seen. This happens once during initial installation.

  • CEM-13380 AppFormix Flows does not show up for multihomed devices on the fabric.

  • CEM-11163 On the Fortville X710 NIC, performance degradation is observed with TX and RX buffers because the mbufs get exhausted.

  • CEM-10929 When Contrail Insights queries the LLDP table from a device through SNMP and the SNMP calls time out, Contrail Insights marks the device as invalidConfiguration and notifies the user. After verifying that snmpwalk works and that there are no network issues, click Edit and reconfigure the device from Settings > Network Devices so that Contrail Insights retries LLDP discovery and adds the device again.

  • CEM-9979 During an upgrade of DPDK computes deployed with OOO Heat templates in an RHOSP environment, vRouter core dumps are observed. This is due to the sequence in which the services are started during the upgrade and has no impact on cluster operation.

  • CEM-8701 Onboarding of multiple BMS in parallel on SP-style fabric does not work. While bringing up a BMS using the Life Cycle Management workflow, sometimes on faster servers the re-image does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes have converged between the BMS port and the TFTP service running on the Contrail nodes. As a workaround, reboot the servers or configure the BIOS on the servers to delay boot.

  • CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.

  • CEM-5141 The UI workflow for deleting compute nodes does not work. Instead, update instances.yaml with “ENABLE_DESTROY: True” and “roles:” (leave it empty) and run the following playbooks.

    For example:

  • CEM-5043 A VNI update on an LR does not update the RouteTable. As a workaround, delete the LogicalRouter and create a new LogicalRouter with the new VNI.

  • CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. To add links, delete the service template and add it again.

  • CEM-4358 In Contrail fabric deployments configuring QFX5110 as spine (CRB-Gateway) does not work.

  • CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. That is, if more than one BMS is associated with a VPG and one of them needs to be moved, the whole VPG must be deleted and reconfigured for the new association.

  • CEM-3245 Multicast traffic originating from type-6 incapable QFX devices is duplicated by vRouters.

  • JCB-187287 High Availability provisioning of Kubernetes master is not supported.

  • JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler or if new fragments arrive immediately after the head fragment, the assembler releases them to flow module. Fragments get enqueued in the hold queue if agent does not write flow action by the time the assembler releases fragments to the flow module. A maximum of three fragments are enqueued in the hold queue at a time. The remaining fragments are dropped from the assembler to the flow module.

    As a workaround, the head fragment is enqueued to assembler only after flow action is written by agent. If the flow is already present in non-hold state, it is immediately enqueued to assembler.

  • JCB-177787 In DPDK vRouter use cases such as SNAT and LBaaS that require netns, jumbo MTU cannot be set. The maximum MTU allowed is 1500.

Known Behavior in Contrail Networking Release 2011.L1

  • CEM-27916 In environments that were deployed using Juju Charms and are using Contrail Networking with LXD, nodes that are using LXD are not restarting after a Contrail Controller reboot. The reboot triggers an IP tables issue that is not resolvable by Contrail Networking. This IP tables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.

    Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.

  • CEM-21910 Post deployment of a RHOSP16 cluster with IPA enabled, sometimes the vRouter agent will be in the initializing state. To clear this and bring back the agent to normal, issue the following command on the compute node:

  • CEM-21909 The deployment of a RHOSP16 cluster with IPA enabled needs the following parameters to be set in any heat-templates used for deployment:

  • CEM-21835 An allowed address pair (AAP) with a prefix length shorter than /32 must not be used if a bare metal server (BMS) route (VXLAN route) falls within the AAP CIDR.

  • CEM-21793 A BGP multihop session cannot be established with a Kubernetes container through a trunked subport when the Kubernetes container is running inside a VM instance that is managed by OpenStack, because Contrail Networking responds to the BGP request with the wrong MAC address.

  • CEM-21446 If the sFlow node deletion fails for the first time, re-provision the cluster to remove the node. The other nodes will stay active during the reprovisioning.

  • CEM-7262 You must not attach the Bidirectional Forwarding Detection (BFD) protocol to a BGPaaS object. You can attach BFD to a VMI.

  • CEM-21570 AS_PATH retain does not work in a cluster with 4-byte autonomous system numbers (ASNs) enabled.

  • CEM-21614 In an OpenShift 4.6 cluster, the contrail-status command might show warnings for Zookeeper and RabbitMQ and report no status for them. As a workaround, you can ignore these warnings; if the other Contrail services are active, you can consider Zookeeper and RabbitMQ to be active as well. To get the router agent status, enter contrail-status -t 15 on the compute node.

  • CEM-21547 The SR-IOV configuration automation does not work for a Multihomed server.

  • CEM-21526 When you upgrade Contrail Networking from a release prior to R2011 to Contrail Networking Release R2011.138, a Kafka container running on an analytics node might report connection issues.

    Workaround: Perform the following steps:

    1. Stop Kafka on all the analytics nodes.

      docker stop analytics_alarm_kafka_1

    2. On a single Contrail controller, perform the Zookeeper cleanup.

      docker exec -it config_database_zookeeper_1 bash

      bin/zkCli.sh -server <IP>:2181

      deleteall /brokers

      deleteall /consumers

    3. Start Kafka on all the analytics nodes.

      docker start analytics_alarm_kafka_1

  • CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploying brings up the sFlow nodes.

  • CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.

  • CEM-20828 In Contrail fabric management with ML2 plugin, the sFlow nodes are referred to with the UUID and cannot be referred to by the hostname.

  • CEM-20802 When you create a new user-defined namespace on OpenShift 4.x with Contrail, SNAT is enabled by default, so all pods in the namespace can reach Internet servers. As a workaround, explicitly configure the Contrail annotation on the namespace as "opencontrail.org/ip_fabric_snat": "false".

  • CEM-20794 In Contrail fabric management with ML2 plugin, configuring LAG in VMs associated with SR-IOV VFs does not work.

  • CEM-20781 Telemetry KPI display for Junos EVO devices is not supported.

  • CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.

  • CEM-20620 The gatewayless forwarding feature is not supported on Netronome.

  • CEM-20513 While upgrading Contrail Networking Release 19xx with RHOSP13 to Contrail Networking Release 2011 with RHOSP16.1, the FFU upgrade of the compute node fails with the following error: "RPC failed at server. Insufficient access: Insufficient 'add' privilege to add the entry 'krbprincipalname=qemu/compute-0-ffu.internalapi.nodel8.local@NODEL8.LOCAL,cn=services,cn=accounts,dc=nodel8,dc=local'."

    Perform the following steps to apply the patch:

    • On undercloud:

      1. Apply the patch as given at https://review.opendev.org/c/openstack/tripleo-heat-templates/+/764064/3/deployment/nova/novajoin-container-puppet.yaml.

      2. Deploy the undercloud by running the openstack undercloud install command.

      3. Ensure the novajoin_notifier container is up and not constantly restarting.

        podman ps |grep novajoin

        Validate the status by checking /var/log/containers/novajoin/* logs.

        TLS-E cannot work properly without the novajoin containers running.

      4. Ensure that the transport_url for novajoin does not contain the guest user.

        grep ^transport_url /var/lib/config-data/puppet-generated/novajoin/etc/novajoin/join.conf

    • On compute (kernel/dpdk) node:

      1. Authenticate as admin by using kinit.

        [heat-admin@overcloud-contraildpdk-0-ffu ~]$ kinit admin

        Use the FreeIPA password for admin@NODEL8.LOCAL.

      2. Add the DNS record of the compute node.

      3. Add the IPA service.

      4. Add the host to the IPA service.
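
    The transport_url check in step 4 of the undercloud procedure above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the input is assumed to be a single "transport_url = ..." line as printed by the grep command, and the URL is assumed to have the usual scheme://user:password@host:port form, possibly with several comma-separated hosts. The user names, passwords, and addresses in the usage example are made up.

```python
from typing import List
from urllib.parse import urlsplit


def transport_url_users(line: str) -> List[str]:
    """Return the user names found in a 'transport_url = ...' config line."""
    _, _, value = line.partition("=")
    netloc = urlsplit(value.strip()).netloc
    users = []
    # A transport URL may list several hosts separated by commas.
    for host in netloc.split(","):
        if "@" in host:
            userinfo = host.rsplit("@", 1)[0]
            users.append(userinfo.split(":", 1)[0])
    return users


def uses_guest_user(line: str) -> bool:
    """True if any host entry in the transport_url line logs in as 'guest'."""
    return "guest" in transport_url_users(line)


if __name__ == "__main__":
    # Hypothetical example lines, for illustration only.
    print(uses_guest_user("transport_url=rabbit://guest:secret@192.0.2.5:5672/?ssl=1"))
    print(uses_guest_user("transport_url=rabbit://novajoin:secret@192.0.2.5:5672/?ssl=1"))
```

    Comparing the user name exactly, rather than searching the whole line for the substring "guest", avoids false positives from host names or passwords that happen to contain that word.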

  • CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port-mirroring, when Juniper headers are enabled. The Juniper header is not supported in port-mirroring as VXLAN is not the tunnel type used for the dynamic next hop in this case.

  • CEM-20419 The BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and associated with a BFD health check with target-ip set to "all".

  • CEM-20414 During a Contrail Command container restart, one deploy job per contrail_cluster object is fired, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.

  • CEM-20308 After a failover of one of the HA master nodes or a vRouter restart, creation of new user pods might fail because the pods do not get an IP address.

    As a workaround, find the control pods on the HA master nodes that are not in sync with respect to the new user pod and restart them. Perform the following steps:

    1. Log in to each of the three HA masters, find the crictl pod named “control”, and log in to it. Verify whether the output of “curl --cert /etc/certificates/server-key-localhost --insecure https://localhost:8083/Snh_IFMapTableShowReq?table_name=virtual-machine” shows the name of the latest user pod that failed.

    2. Restart the control pods that are not in sync.
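
    The comparison in step 1 can be sketched as a small helper that takes the introspect output collected from each master and reports which masters do not mention the failed pod. This is a minimal sketch: the function name, the master names, and the sample output strings are hypothetical, and a plain substring match on the pod name is an assumption about how to read the output, not a documented interface.

```python
from typing import Dict, List


def out_of_sync_masters(responses: Dict[str, str], pod_name: str) -> List[str]:
    """Return the masters whose introspect output does not mention pod_name.

    `responses` maps a master's name to the text returned by the
    Snh_IFMapTableShowReq?table_name=virtual-machine request on that master.
    """
    return sorted(m for m, body in responses.items() if pod_name not in body)


if __name__ == "__main__":
    # Hypothetical outputs, shortened for illustration.
    collected = {
        "master-0": "<name>pod-a</name><name>pod-b</name>",
        "master-1": "<name>pod-a</name>",
        "master-2": "<name>pod-a</name><name>pod-b</name>",
    }
    print(out_of_sync_masters(collected, "pod-b"))
```

    The masters returned by the helper are the ones whose control pods are candidates for the restart in step 2.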

  • CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.

  • CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also have the RR (route reflector) RB role assigned. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric’s leaf devices. We therefore recommend that, for L2 DCI mode, you mark the physical router device with the DCI Gateway RB role along with the RR role.

  • CEM-20248 While upgrading Contrail Networking Release 19xx with RHOSP13 to Contrail Networking Release 2011 with RHOSP16.1, overcloud nodes transition to the ERROR state after the undercloud is upgraded. As a workaround, apply the patch mentioned at https://bugzilla.redhat.com/show_bug.cgi?id=1850929.

  • CEM-19802 Security Groups cannot be used on QFX10K interfaces.

  • CEM-19151 During deployment, a race condition can occur that causes the ipa-client installation on compute nodes to fail. This is an issue with Red Hat. As a workaround, before deployment starts, modify the following file to add a sleep of 400 seconds on the undercloud.

  • CEM-18979 The vRouter to vRouter encryption feature is beta quality and should be used for future product capability demonstrations only.

  • CEM-18999 In a heavily scaled datacenter with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours for it to re-sync with the API server.

  • CEM-18922 On a DPDK compute, the memory of the VMs is mapped to only one NUMA node. VM creation fails after the hugepages in that NUMA node are exhausted if the VM is launched with the hw:mem_page_size='any' flavor. As a workaround, use the hw:mem_page_size='large' flavor instead.

  • CEM-18909 In case of RHOSP16 deployment with TLS, XMPP connection down is seen post deployment completion. While this is a cosmetic issue and does not impact functionality, as a workaround, restart the vRouter agent container on all compute nodes to update status.

  • CEM-18408 In DPDK1911 with an X710 NIC, performance degrades due to an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K tx and rx descriptors on Fortville NICs for better and more consistent performance, but they seem to have a degrading effect on the X710 NIC.

  • CEM-18398 Contrail WebUI doesn’t work for System/Node status monitoring. As a workaround, check using CLI on the relevant nodes. This will not impact functionality.

  • CEM-18381 QFX5120 cannot be used as border leaf role in SP style for CRB role.

  • CEM-18163 On a DPDK compute, if contrail-vrouter-agent crashes or if contrail-vrouter-agent is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop / start the instances whose interfaces are down.

  • CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:

    1. Edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep address on one of the controllers as shown here.

      Note:

      If all the controllers are shut down in the managed scenario at the same time, you must select the controller that was shut down last.

    2. Run docker start mariadb on the controller on which you edited the file.

    3. Wait for a couple of minutes, ensure that the mariadb container is not restarting, and then run docker start mariadb on the remaining controllers.

    4. Restore the /etc/kolla/mariadb/galera.cnf file changes and restart the mariadb container on the previously selected controller.
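
    The ordering of the recovery steps above, including the rule that the controller shut down last is the one to edit first, can be sketched as follows. This is a minimal sketch, not a supported tool: the function name, the controller names, and the use of numeric shutdown timestamps are hypothetical, and the helper only returns a reviewable list of steps without executing anything.

```python
from typing import Dict, List, Tuple


def galera_recovery_plan(shutdown_times: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (controller, step) pairs in the order described above.

    `shutdown_times` maps each controller to the time it was shut down;
    the controller with the latest time is edited and started first.
    """
    if not shutdown_times:
        raise ValueError("at least one controller is required")

    first = max(shutdown_times, key=lambda c: shutdown_times[c])
    others = sorted(c for c in shutdown_times if c != first)

    plan = [
        (first, "edit /etc/kolla/mariadb/galera.cnf to remove the wsrep address"),
        (first, "docker start mariadb"),
        (first, "wait a couple of minutes; confirm mariadb is not restarting"),
    ]
    plan += [(c, "docker start mariadb") for c in others]
    plan += [
        (first, "restore /etc/kolla/mariadb/galera.cnf"),
        (first, "restart the mariadb container"),
    ]
    return plan


if __name__ == "__main__":
    # Hypothetical controllers and shutdown timestamps (seconds).
    for host, step in galera_recovery_plan(
        {"ctrl1": 100.0, "ctrl2": 300.0, "ctrl3": 200.0}
    ):
        print(f"{host}: {step}")
```

    In this example ctrl2 has the latest shutdown time, so it is the controller whose galera.cnf is edited and whose mariadb container is started first.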

  • CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.

  • CEM-17866 Monitoring/Operations page crashes with "Cannot read property 'className' of undefined". As a workaround, refresh the page to display the content properly.

  • CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent out of the Transparent service instance to the QFX carry a vlan-id in the inner header. This vlan-id is internal to the vRouter and the QFX is not aware of it, so the switch drops the packets and traffic from left-bms to right-bms is lost.

  • CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.

  • CEM-16855 IPv6 ipam subnet option “enable_dhcp” is always ignored.

  • CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.

  • CEM-15764 In Octavia Load Balancer, traffic destined to the Floating IP of the load balancer VM does not get directed to the backend VMs. Traffic destined to the actual VM IP of the Load Balancer VM will work fine.

  • CEM-15561 vRouter offload with Mellanox NIC cards does not work. However, DPDK on Mellanox NICs without offload is supported.

  • CEM-14679 In fabric un-managed PNF use-case, some bogus static routes are pushed by DM under LR VRF on spines in case of CRB gateway.

    As a workaround, change the value of the dummy_ip variable inside the device_manager docker. The line number in the link below is based on the 2008 release code-base.

    https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594

    After changing the value to the desired subnet and saving the file, restart the DM docker for the change to take effect. Note that this step should be performed at the beginning, before fabric onboarding.

  • CEM-14264 In release 2003, the Virtual Port Group create workflow will not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. The field is editable unlike in previous releases. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance checkbox enabled.

  • CEM-13767 Although Contrail fabric manager lets you use custom image names for fabric devices, for vmhost-based platforms such as the QFX10000-60C, the image name must follow the junos-vmhost-install-x.tgz format when you upload the image to CFM.

  • CEM-13685 DPDK vRouter with MLNX CX5 takes about 10 minutes, and an lcore crash is also seen. This happens once, during initial installation.

  • CEM-13380 AppFormix flows do not show up for multihomed devices on the fabric.

  • CEM-11163 In Fortville X710 NIC: With TX and RX buffers, performance degradation is observed as the mbufs get exhausted.

  • CEM-10929 When Contrail Insights queries the LLDP table from a device through SNMP and the SNMP calls time out, Contrail Insights marks the device as invalidConfiguration and notifies the user to take a look. After you verify that snmpwalk is working and there are no network issues, click Edit and reconfigure that device from Settings > Network Devices to make Contrail Insights run LLDP discovery and add this device again.

  • CEM-9979 During upgrade of DPDK computes deployed with OOO Heat Templates in RHOSP environment, vRouter coredumps are observed. This is due to the sequence in which the services are started during upgrade and does not have impact on cluster operation.

  • CEM-8701 Onboarding of multiple BMS in parallel on SP-style fabric does not work. While bringing up a BMS using the Life Cycle Management workflow, sometimes on faster servers the re-image does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes have converged between the BMS port and the TFTP service running on the Contrail nodes. As a workaround, reboot the servers or configure the BIOS in the servers to delay the boot.

  • CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.

  • CEM-5141 For deleting compute nodes, the UI workflow will not work. Instead, update the instances.yaml with “ENABLE_DESTROY: True” and “roles:” (leave it empty) and run the following playbooks.

    For example:

  • CEM-5043 A VNI update on an LR does not update the RouteTable. As a workaround, delete the LogicalRouter and create a new LogicalRouter with the new VNI.

  • CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. If there is a need to add additional links, the service template needs to be deleted and re-added again.

  • CEM-4358 In Contrail fabric deployments configuring QFX5110 as spine (CRB-Gateway) does not work.

  • CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. That is, if more than one BMS is associated with a VPG and one BMS needs to be moved, the whole VPG must be deleted and reconfigured according to the new association.

  • CEM-3245 Multicast traffic originating from type-6 incapable QFX devices is duplicated by vRouters.

  • JCB-187287 High Availability provisioning of Kubernetes master is not supported.

  • JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler or if new fragments arrive immediately after the head fragment, the assembler releases them to flow module. Fragments get enqueued in the hold queue if agent does not write flow action by the time the assembler releases fragments to the flow module. A maximum of three fragments are enqueued in the hold queue at a time. The remaining fragments are dropped from the assembler to the flow module.

    As a workaround, the head fragment is enqueued to assembler only after flow action is written by agent. If the flow is already present in non-hold state, it is immediately enqueued to assembler.

  • JCB-177787 In DPDK vRouter use cases such as SNAT and LBaaS that require netns, jumbo MTU cannot be set. Maximum MTU allowed: <=1500.

Known Behavior in Contrail Networking Release 2011

  • CEM-27916 In environments that were deployed using Juju Charms and are using Contrail Networking with LXD, nodes that are using LXD are not restarting after a Contrail Controller reboot. The reboot triggers an IP tables issue that is not resolvable by Contrail Networking. This IP tables issue is described in the Unable to reach host gateway from lxd container article on the Charmhub Discourse page.

    Workaround: Enable forward accept on the nodes using LXD by entering the sudo iptables -P FORWARD ACCEPT command.

  • CEM-20846 In rare cases, sFlow node provisioning fails while initializing the Kafka container. If this happens during provisioning, redeploying brings up the sFlow nodes.

  • CEM-20829 In Contrail fabric management with ML2 plugin, the telemetry_in_band_interface used for sFlow must be a physical interface. VLAN interfaces are not supported.

  • CEM-20828 In Contrail fabric management with ML2 plugin, the sFlow nodes are referred to with the UUID and cannot be referred to by the hostname.

  • CEM-20802 When you create a new user-defined namespace on OpenShift 4.x with Contrail, SNAT is enabled by default, so all the pods in this namespace can reach internet servers by default. As a workaround, explicitly configure the Contrail annotation on the namespace as "opencontrail.org/ip_fabric_snat": "false".

  • CEM-20794 In Contrail fabric management with ML2 plugin, configuring LAG in VMs associated with SR-IOV VFs does not work.

  • CEM-20781 Telemetry KPI display for Junos EVO devices is not supported.

  • CEM-20693 The BGP routes widget under Fabrics > Ports > Leaf network device accounts for routes from the inet.0 table only.

  • CEM-20620 The gatewayless forwarding feature is not supported on Netronome.

  • CEM-20513 While upgrading Contrail Networking Release 19xx with RHOSP13 to Contrail Networking Release 2011 with RHOSP16.1, the FFU upgrade of the compute node fails with the following error: "RPC failed at server. Insufficient access: Insufficient 'add' privilege to add the entry 'krbprincipalname=qemu/compute-0-ffu.internalapi.nodel8.local@NODEL8.LOCAL,cn=services,cn=accounts,dc=nodel8,dc=local'."

    Perform the following steps to apply the patch:

    • On undercloud:

      1. Apply the patch as given at https://review.opendev.org/c/openstack/tripleo-heat-templates/+/764064/3/deployment/nova/novajoin-container-puppet.yaml.

      2. Deploy the undercloud by running the openstack undercloud install command.

      3. Ensure the novajoin_notifier container is up and not constantly restarting.

        podman ps |grep novajoin

        Validate the status by checking /var/log/containers/novajoin/* logs.

        TLS-E cannot work properly without the novajoin containers running.

      4. Ensure that the transport_url for novajoin does not contain the guest user.

        grep ^transport_url /var/lib/config-data/puppet-generated/novajoin/etc/novajoin/join.conf

    • On compute (kernel/dpdk) node:

      1. Authenticate as admin by using kinit.

        [heat-admin@overcloud-contraildpdk-0-ffu ~]$ kinit admin

        Use the FreeIPA password for admin@NODEL8.LOCAL.

      2. Add the DNS record of the compute node.

      3. Add the IPA service.

      4. Add the host to the IPA service.

  • CEM-20421 In Contrail Networking, the logical router (LR) does not support dynamic next-hop port-mirroring, when Juniper headers are enabled. The Juniper header is not supported in port-mirroring as VXLAN is not the tunnel type used for the dynamic next hop in this case.

  • CEM-20419 The BFD session takes longer to come up after the agent is restarted when MAC/IP is enabled in the VN and associated with a BFD health check with target-ip set to "all".

  • CEM-20414 During a Contrail Command container restart, one deploy job per contrail_cluster object is fired, which leads to restart issues. As a workaround, modify the contrail-cluster or openstack-cluster object through commandcli instead of modifying or creating the endpoint directly through the UI.

  • CEM-20308 After a failover of one of the HA master nodes or a vRouter restart, creation of new user pods might fail because the pods do not get an IP address.

    As a workaround, find the control pods on the HA master nodes that are not in sync with respect to the new user pod and restart them. Perform the following steps:

    1. Log in to each of the three HA masters, find the crictl pod named “control”, and log in to it. Verify whether the output of “curl --cert /etc/certificates/server-key-localhost --insecure https://localhost:8083/Snh_IFMapTableShowReq?table_name=virtual-machine” shows the name of the latest user pod that failed.

    2. Restart the control pods that are not in sync.

  • CEM-20280 In a Kubernetes and OpenStack joint setup, vrouter-agent restart sometimes leads to an unauthorized operation error. To resolve the issue, restart the vrouter-agent again.

  • CEM-20272 In L2 DCI mode, if the selected fabrics have the same overlay ASN, overlay iBGP is used between fabric devices. In this case, the border device (physical router) marked with the DCI-gateway RB (routing and bridging) role must also have the RR (route reflector) RB role assigned. Without the RR RB role, the overlay iBGP session does not stretch the Layer 2 tenant virtual network across the fabric’s leaf devices. We therefore recommend that, for L2 DCI mode, you mark the physical router device with the DCI Gateway RB role along with the RR role.

  • CEM-20248 While upgrading Contrail Networking Release 19xx with RHOSP13 to Contrail Networking Release 2011 with RHOSP16.1, overcloud nodes transition to the ERROR state after the undercloud is upgraded. As a workaround, apply the patch mentioned at https://bugzilla.redhat.com/show_bug.cgi?id=1850929.

  • CEM-19802 Security Groups cannot be used on QFX10K interfaces.

  • CEM-19151 During deployment, a race condition can occur that causes the ipa-client installation on compute nodes to fail. This is an issue with Red Hat. As a workaround, before deployment starts, modify the following file to add a sleep of 400 seconds on the undercloud.

  • CEM-18979 The vRouter to vRouter encryption feature is beta quality and should be used for future product capability demonstrations only.

  • CEM-18999 In a heavily scaled datacenter with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours for it to re-sync with the API server.

  • CEM-18922 On a DPDK compute, the memory of the VMs is mapped to only one NUMA node. VM creation fails after the hugepages in that NUMA node are exhausted if the VM is launched with the hw:mem_page_size='any' flavor. As a workaround, use the hw:mem_page_size='large' flavor instead.

  • CEM-18909 In case of RHOSP16 deployment with TLS, XMPP connection down is seen post deployment completion. While this is a cosmetic issue and does not impact functionality, as a workaround, restart the vRouter agent container on all compute nodes to update status.

  • CEM-18408 In DPDK1911 with an X710 NIC, performance degrades due to an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K tx and rx descriptors on Fortville NICs for better and more consistent performance, but they seem to have a degrading effect on the X710 NIC.

  • CEM-18398 Contrail WebUI doesn’t work for System/Node status monitoring. As a workaround, check using CLI on the relevant nodes. This will not impact functionality.

  • CEM-18381 QFX5120 cannot be used as border leaf role in SP style for CRB role.

  • CEM-18163 On a DPDK compute, if contrail-vrouter-agent crashes or if contrail-vrouter-agent is restarted in a scaled setup with many sub-interfaces, all the sub-interfaces and their parent interface may become inactive. As a workaround, stop / start the instances whose interfaces are down.

  • CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:

    1. Edit the /etc/kolla/mariadb/galera.cnf file to remove the wsrep address on one of the controllers as shown here.

      Note:

      If all the controllers are shut down in the managed scenario at the same time, you must select the controller that was shut down last.

    2. Run docker start mariadb on the controller on which you edited the file.

    3. Wait for a couple of minutes, ensure that the mariadb container is not restarting, and then run docker start mariadb on the remaining controllers.

    4. Restore the /etc/kolla/mariadb/galera.cnf file changes and restart the mariadb container on the previously selected controller.

  • CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.

  • CEM-17866 Monitoring/Operations page crashes with "Cannot read property 'className' of undefined". As a workaround, refresh the page to display the content properly.

  • CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent out of the Transparent service instance to the QFX carry a vlan-id in the inner header. This vlan-id is internal to the vRouter and the QFX is not aware of it, so the switch drops the packets and traffic from left-bms to right-bms is lost.

  • CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.

  • CEM-16855 IPv6 ipam subnet option “enable_dhcp” is always ignored.

  • CEM-15809 Updating VLAN-ID on a VPG in an enterprise style fabric is not supported. As a workaround, delete and recreate the fabric.

  • CEM-15764 In Octavia Load Balancer, traffic destined to the Floating IP of the load balancer VM does not get directed to the backend VMs. Traffic destined to the actual VM IP of the Load Balancer VM will work fine.

  • CEM-15561 vRouter offload with Mellanox NIC cards does not work. However, DPDK on Mellanox NICs without offload is supported.

  • CEM-14679 In fabric un-managed PNF use-case, some bogus static routes are pushed by DM under LR VRF on spines in case of CRB gateway.

    As a workaround, change the value of the dummy_ip variable inside the device_manager docker. The line number in the link below is based on the 2008 release code-base.

    https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594

    After changing the value to the desired subnet and saving the file, restart the DM docker for the change to take effect. Note that this step should be performed at the beginning, before fabric onboarding.

  • CEM-14264 In release 2003, the Virtual Port Group create workflow will not pre-populate the VLAN-ID with the existing value that was defined with the first VPG for a given virtual network. The field is editable unlike in previous releases. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance checkbox enabled.

  • CEM-13767 Although Contrail fabric manager lets you use custom image names for fabric devices, for vmhost-based platforms such as the QFX10000-60C, the image name must follow the junos-vmhost-install-x.tgz format when you upload the image to CFM.

  • CEM-13685 DPDK vRouter with MLNX CX5 takes about 10 minutes, and an lcore crash is also seen. This happens once, during initial installation.

  • CEM-13380 AppFormix flows do not show up for multihomed devices on the fabric.

  • CEM-11163 In Fortville X710 NIC: With TX and RX buffers, performance degradation is observed as the mbufs get exhausted.

  • CEM-10929 When Contrail Insights queries the LLDP table from a device through SNMP and the SNMP calls time out, Contrail Insights marks the device as invalidConfiguration and notifies the user to take a look. After you verify that snmpwalk is working and there are no network issues, click Edit and reconfigure that device from Settings > Network Devices to make Contrail Insights run LLDP discovery and add this device again.

  • CEM-9979 During upgrade of DPDK computes deployed with OOO Heat Templates in RHOSP environment, vRouter coredumps are observed. This is due to the sequence in which the services are started during upgrade and does not have impact on cluster operation.

  • CEM-8701 Onboarding of multiple BMS in parallel on SP-style fabric does not work. While bringing up a BMS using the Life Cycle Management workflow, sometimes on faster servers the re-image does not go through and the instance is not moved from the ironic VN to the tenant VN. This happens when the PXE boot request from the BMS is sent before the routes have converged between the BMS port and the TFTP service running on the Contrail nodes. As a workaround, reboot the servers or configure the BIOS in the servers to delay the boot.

  • CEM-8149 BMS LCM with fabric set with enterprise_style=True is not supported. By default, enterprise_style is set to False. Avoid using enterprise_style=True if the fabric object onboards the BMS LCM instance.

  • CEM-5141 For deleting compute nodes, the UI workflow will not work. Instead, update the instances.yaml with “ENABLE_DESTROY: True” and “roles:” (leave it empty) and run the following playbooks.

    For example:

  • CEM-5043 A VNI update on an LR does not update the RouteTable. As a workaround, delete the LogicalRouter and create a new LogicalRouter with the new VNI.

  • CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. If there is a need to add additional links, the service template needs to be deleted and re-added again.

  • CEM-4358 In Contrail fabric deployments configuring QFX5110 as spine (CRB-Gateway) does not work.

  • CEM-3959 BMS movement across TORs is not supported. To move a BMS across TORs, the whole VPG must be moved. That is, if more than one BMS is associated with a VPG and one BMS needs to be moved, the whole VPG must be deleted and reconfigured according to the new association.

  • CEM-3245 Multicast traffic originating from type-6 incapable QFX devices is duplicated by vRouters.

  • JCB-187287 High Availability provisioning of Kubernetes master is not supported.

  • JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler or if new fragments arrive immediately after the head fragment, the assembler releases them to flow module. Fragments get enqueued in the hold queue if agent does not write flow action by the time the assembler releases fragments to the flow module. A maximum of three fragments are enqueued in the hold queue at a time. The remaining fragments are dropped from the assembler to the flow module.

    As a workaround, the head fragment is enqueued to assembler only after flow action is written by agent. If the flow is already present in non-hold state, it is immediately enqueued to assembler.

  • JCB-177787 In DPDK vRouter use cases such as SNAT and LBaaS that require netns, jumbo MTU cannot be set. Maximum MTU allowed: <=1500.