Known Behavior
This section lists known limitations with this release.
Known Behavior in Contrail Networking Release 2008
CEM-19462 The Infrastructure > External Systems page in Contrail Command does not display the CVFM plugin even though the plugin installed successfully. Contact the Juniper Networks Technical Assistance Center (JTAC) for help patching the CVFM plugin.
CEM-19175 If internet connectivity is poor, the Ansible deployer times out while pulling packages from the internet. As a workaround, rerun the playbook after internet connectivity is restored.
CEM-19151 During deployment, a race condition causes the ipa-client installation on compute nodes to fail. This is a Red Hat issue. As a workaround, before deployment starts, edit the following file on the undercloud to add a 400-second sleep:
sudo vi /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml
Add the following tasks to the file:
- name: DBG
  debug:
    msg: "sleep 400 sec if reboot_required == {{ reboot_required }}"
- name: DBG sleep
  shell: sleep 400
  when:
    - reboot_required is defined and reboot_required
Then find all copies of kernelargs.yml on the undercloud and in the containers, because it is not certain whether the host copy or a container copy is used:
sudo find / -name kernelargs.yml
Overwrite each container copy with the edited file. For example:
sudo cp /usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml /var/lib/containers/storage/overlay/6dc6b96b1392e5302b63156fa093525e17131bef1203cad005a911ad09241f5a/diff/usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml
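The find-and-copy step of this workaround can also be scripted. The following Python sketch is a hedged illustration, not a supported tool: it walks a directory tree for files named kernelargs.yml, as `sudo find / -name kernelargs.yml` does, and overwrites every copy other than the edited host file. The helper names find_copies and sync_copies are hypothetical, and restricting the search to /var/lib/containers/storage/overlay is an assumption; run it as root and review the paths it reports.

```python
import os
import shutil

# Edited file on the undercloud host (path taken from the workaround above).
HOST_COPY = "/usr/share/ansible/roles/tripleo-kernel/tasks/kernelargs.yml"


def find_copies(root, name="kernelargs.yml"):
    """Return every file called `name` under `root`, like `find <root> -name <name>`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def sync_copies(source, root):
    """Overwrite every other copy of `source` found under `root`; return the updated paths."""
    updated = []
    src = os.path.realpath(source)
    for path in find_copies(root, os.path.basename(source)):
        if os.path.realpath(path) == src:
            continue  # do not copy the edited file onto itself
        shutil.copyfile(source, path)
        updated.append(path)
    return updated


if __name__ == "__main__":
    # Assumption: container copies live under the overlay storage directory.
    # The original workaround searches the whole file system instead.
    for p in sync_copies(HOST_COPY, "/var/lib/containers/storage/overlay"):
        print("updated", p)
```

Searching only the overlay directory keeps the walk fast; passing "/" as the root reproduces the original find command, and the realpath check then skips the host copy itself.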
CEM-19093 When one of the HA master nodes goes down, the Contrail Web UI might become broken and inaccessible. As a workaround, restart the Contrail Web UI pod by using “oc delete pod <>”.
CEM-18979 The vRouter to vRouter encryption feature is beta quality and should be used for future product capability demonstrations only.
CEM-18999 In a heavily scaled data center with around 128 racks, 4000 VNs, and 256k VMIs, if the Contrail Insights OpenStack adapter is restarted, it might take around 4 hours to resync with the API server.
CEM-18922 On a DPDK compute node, VM memory is mapped to only one NUMA node. If a VM is launched with a flavor that sets hw:mem_page_size='any', VM creation fails after the hugepages on that NUMA node are exhausted. As a workaround, use a flavor that sets hw:mem_page_size='large' instead.
CEM-18909 In an RHOSP16 deployment with TLS, the XMPP connection is shown as down after deployment completes. This is a cosmetic issue that does not impact functionality. As a workaround, restart the vRouter agent container on all compute nodes to update the status.
CEM-18864 After an HA master node fails over or a vRouter restarts, creation of further user pods might fail because the pods do not get an IP address. As a workaround, find the control pods on the HA master nodes that are not in sync with the new user pod, and restart them. Perform the following steps:
Log in to each of the three HA master nodes and use crictl to find the pod named “control”.
Log in to the control pod, run the following command, and look in the output for the name of the latest user pod that failed:
curl --cert /etc/certificates/server-key-localhost --insecure "https://localhost:8083/Snh_IFMapTableShowReq?table_name=virtual-machine"
Restart the control pods that are not in sync.
CEM-18793 Canonical Juju Contrail CNI (K8S) deployment using an existing Keystone does not work with the Version 2 authorization policy definition.
CEM-18667 In a scaled Contrail Enterprise Multicloud cluster with around 4000 VNs, the Contrail Command UI can take up to 8 minutes to display the cluster details.
CEM-18410 Some OpenShift system pods might go into the CrashLoopBackOff state because of the following error:
Invalid configuration: unable to load OpenShift configuration: unable to retrieve authentication information for tokens: Post https://172.30.0.1:443/apis/authentication.k8s.io/v1/tokenreviews: dial tcp 172.30.0.1:443: connect: no route to host
This can occur whether provisioning fails or succeeds. As a workaround, restart the affected pod by using “oc delete pod <>”.
CEM-18408 In DPDK 19.11 with an X710 NIC, performance degrades because of an mbuf leak if txd and rxd are configured. Intel recommends configuring at least 1K TX and RX descriptors on Fortville NICs for better and more consistent performance, but these settings appear to degrade performance on the X710 NIC.
CEM-18398 The Contrail Web UI does not work for system and node status monitoring. As a workaround, check the status by using the CLI on the relevant nodes. This issue does not impact functionality.
CEM-18381 The QFX5120 cannot be used in the border leaf role with CRB in an SP-style fabric.
CEM-18349 Whenever an update for a Physical Router, Physical Interface, node, or port object is received through RabbitMQ, CVFM clears its local cache and rereads all VMI, VN, VPG, Physical Router, and other objects through the Contrail API service. In a system with a large number of objects, the API server can become unresponsive during this time before it recovers.
CEM-18313 If a router-VN is connected to the master LR in a Contrail Enterprise Multicloud cluster, the DCI and PNF features might break.
CEM-18285 Image upgrade on a QFX-10008 device through Contrail Enterprise Multicloud does not work.
CEM-18251 In a stable setup, the contrail-control pod restarts, with errors and restarts in its status-monitor container. There is no functional impact.
CEM-18195 For installing OpenShift 4.x clusters with Contrail, enterprise-grade disks are recommended, preferably SSDs for servers that host databases.
CEM-18193 In a scaled Contrail Enterprise Multicloud cluster, the job to delete devices might report a failure even though the operation succeeds. Ignore the error and verify that the devices were actually deleted.
CEM-18163 On a DPDK compute node in a scaled setup with many sub-interfaces, if contrail-vrouter-agent crashes or is restarted, all the sub-interfaces and their parent interface might become inactive. As a workaround, stop and then start the instances whose interfaces are down.
CEM-17991 In an OpenStack HA setup provisioned using Kolla and OpenStack Rocky, if you shut down all the servers at the same time and bring them up later, the Galera cluster fails. To recover the Galera cluster, follow these steps:
Edit the /etc/kolla/mariadb/galera.cnf file on one of the controllers to remove the wsrep cluster addresses, as shown here:
wsrep_cluster_address = gcomm://
#wsrep_cluster_address = gcomm://10.x.x.8:4567,10.x.x.10:4567,10.x.x.11:4567
Note: If all the controllers are shut down at the same time in the managed scenario, you must select the controller that was shut down last.
Start the mariadb container (docker start mariadb) on the controller where you edited the file.
Wait a few minutes, make sure that the mariadb container is not restarting, and then start the mariadb container (docker start mariadb) on the remaining controllers.
Revert the changes to the /etc/kolla/mariadb/galera.cnf file and restart the mariadb container on the controller that you selected earlier.
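The galera.cnf edit in the first step and its reversal in the last step can be illustrated with a small Python sketch. It assumes that galera.cnf contains a single uncommented wsrep_cluster_address line and no previously commented one; the function names to_bootstrap and restore are hypothetical. Keep a backup copy of the original file regardless, because restoring from the backup is the simplest way to undo the change.

```python
import re

# An active (uncommented) wsrep_cluster_address line, with any leading indentation.
ACTIVE = re.compile(r"^(\s*)wsrep_cluster_address\s*=.*$")
BOOTSTRAP_LINE = "wsrep_cluster_address = gcomm://"


def to_bootstrap(text):
    """Add an empty gcomm:// address and comment out the peer list (first step)."""
    out = []
    for line in text.splitlines():
        m = ACTIVE.match(line)
        if m and line.strip() != BOOTSTRAP_LINE:
            indent = m.group(1)
            out.append(indent + BOOTSTRAP_LINE)
            out.append(indent + "#" + line.lstrip())
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def restore(text):
    """Undo to_bootstrap(): drop the empty address and uncomment the peer list (last step)."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == BOOTSTRAP_LINE:
            continue
        if stripped.startswith("#wsrep_cluster_address"):
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + stripped[1:])
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Working on the file text rather than editing in place lets you review the result (or diff it against the backup) before writing it to /etc/kolla/mariadb/galera.cnf.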
CEM-17883 VLAN tag does not work with Mellanox CX5 cards with DPDK 19.11.
CEM-17866 The Monitoring/Operations page crashes with the error "Cannot read property 'className' of undefined". As a workaround, refresh the page to display the content properly.
CEM-17648 In BMS-to-BMS EVPN “Transparent” service chaining, tunneled packets sent from the transparent service instance to the QFX carry a VLAN ID in the inner header. Because this VLAN ID is internal to the vRouter and unknown to the QFX, the switch drops the packets, so traffic from the left BMS to the right BMS is lost.
CEM-17562 Under Security Groups, the entry appearing with __no_rule__ can be ignored.
CEM-15809 Updating the VLAN ID on a VPG in an enterprise-style fabric is not supported. As a workaround, delete and recreate the fabric.
CEM-15764 In the Octavia load balancer, traffic destined for the floating IP address of the load balancer VM is not directed to the backend VMs. Traffic destined for the actual IP address of the load balancer VM works as expected.
CEM-15561 vRouter offload with Mellanox NICs does not work. However, DPDK on Mellanox NICs without offload is supported.
CEM-14679 In the fabric unmanaged PNF use case with a CRB gateway, Device Manager (DM) pushes some bogus static routes under the LR VRF on the spines.
As a workaround, change the value of the dummy_ip variable inside the device_manager container. The line number in the following URL is based on the 2008 release code base:
https://github.com/Juniper/contrail-controller/blob/R2008/src/config/fabric-ansible/ansible-playbooks/filter_plugins/fabric.py#L2594
After you change the value to the desired subnet and save the file, restart the DM container to apply the change. Perform this step at the beginning, before fabric onboarding.
CEM-14264 In release 2003, the Virtual Port Group create workflow does not prepopulate the VLAN ID with the value defined in the first VPG for a given virtual network. Unlike in previous releases, the field is editable. This issue occurs in a fabric that was provisioned with the Fabric-wide VLAN-ID significance check box enabled.
CEM-13767 Although Contrail Fabric Manager (CFM) lets you use custom image names for fabric devices, for VM host-based platforms such as the QFX10000-60C, the image name must follow the junos-vmhost-install-x.tgz format when you upload the image to CFM.
CEM-13685 DPDK vRouter with Mellanox CX5 takes about 10 minutes to come up, and an lcore crash is seen. This happens once, during initial installation.
CEM-13380 AppFormix flows do not show up for multihomed devices on the fabric.
CEM-11163 On Fortville X710 NICs with TX and RX buffers configured, performance degrades as mbufs are exhausted.
CEM-10929 When Contrail Insights queries the LLDP table from a device through SNMP and the SNMP calls time out, Contrail Insights marks the device as invalidConfiguration and notifies the user to check it. After you verify that snmpwalk works and there are no network issues, click Edit and reconfigure the device from Settings > Network Devices so that Contrail Insights runs LLDP discovery and adds the device again.
CEM-9979 During the upgrade of DPDK compute nodes deployed with OOO Heat Templates in an RHOSP environment, vRouter core dumps are observed. This is caused by the order in which services are started during the upgrade and does not impact cluster operation.
CEM-8701 Onboarding multiple BMS in parallel on an SP-style fabric does not work. While bringing up a BMS by using the Life Cycle Management workflow, on faster servers the reimage sometimes does not go through and the instance is not moved from the Ironic VN to the tenant VN. This happens if the PXE boot request from the BMS is sent before the routes between the BMS port and the TFTP service running in the Contrail nodes have converged. As a workaround, reboot the servers or configure a delayed boot in the server BIOS.
CEM-8149 BMS LCM is not supported on a fabric set with enterprise_style=True. By default, enterprise_style is set to False. Do not set enterprise_style=True if the fabric object onboards the BMS LCM instance.
CEM-5141 The UI workflow for deleting compute nodes does not work. Instead, update instances.yaml with “ENABLE_DESTROY: True” and “roles:” (left empty), and run the following playbooks:
ansible-playbook -i inventory/ -e orchestrator=openstack --tags nova playbooks/install_openstack.yml
ansible-playbook -i inventory/ -e orchestrator=openstack playbooks/install_contrail.yml
For example:
global_configuration:
  ENABLE_DESTROY: True
  ...
  ...
instances:
  ...
  ...
  srvr5:
    provider: bms
    ip: 19x.xxx.x.55
    roles:
  ...
  ...
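The instances.yaml change and the two playbook runs can be sketched in Python as follows. This is a minimal illustration under stated assumptions: mark_for_destroy and playbook_commands are hypothetical helpers, mark_for_destroy works on instances.yaml after it has been parsed into a dictionary (for example, with a YAML library), and srvr5 is the instance name from the example above.

```python
import copy


def mark_for_destroy(config, instance_name):
    """Return a copy of a parsed instances.yaml mapping prepared for compute-node deletion."""
    new = copy.deepcopy(config)
    # Equivalent of "ENABLE_DESTROY: True" under global_configuration.
    new.setdefault("global_configuration", {})["ENABLE_DESTROY"] = True
    instances = new.get("instances", {})
    if instance_name not in instances:
        raise KeyError(f"instance {instance_name!r} not found in instances.yaml")
    # An empty "roles:" key in YAML parses as None, so set the roles to None.
    instances[instance_name]["roles"] = None
    return new


def playbook_commands():
    """The two playbook runs from the workaround, as argument lists."""
    return [
        ["ansible-playbook", "-i", "inventory/", "-e", "orchestrator=openstack",
         "--tags", "nova", "playbooks/install_openstack.yml"],
        ["ansible-playbook", "-i", "inventory/", "-e", "orchestrator=openstack",
         "playbooks/install_contrail.yml"],
    ]
```

Each argument list could then be passed to subprocess.run(cmd, check=True) in the directory that contains inventory/ and playbooks/, in the order shown, after the updated mapping has been written back to instances.yaml.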
CEM-5043 Updating the VNI on an LR does not update the route table. As a workaround, delete the logical router and create a new logical router with the new VNI.
CEM-4370 Additional links cannot be appended to service templates used to create PNF service chaining. To add links, you must delete the service template and add it again.
CEM-4358 In Contrail fabric deployments configuring QFX5110 as spine (CRB-Gateway) does not work.
CEM-3959 Moving a BMS across TORs is not supported. To move a BMS to another TOR, you must move the whole VPG. This means that if more than one BMS is associated with a VPG and one of them needs to be moved, you must delete the whole VPG and reconfigure it according to the new association.
CEM-3245 Multicast traffic originating from QFX devices that are not type-6 capable is duplicated by vRouters.
JCB-187287 High Availability provisioning of Kubernetes master is not supported.
JCB-184776 When the vRouter receives the head fragment of an ICMPv6 packet, the head fragment is immediately enqueued to the assembler. The flow is created as a hold flow and then trapped to the agent. If fragments corresponding to this head fragment are already in the assembler, or if new fragments arrive immediately after the head fragment, the assembler releases them to the flow module. If the agent has not yet written the flow action when the assembler releases fragments to the flow module, the fragments are enqueued in the hold queue. A maximum of three fragments can be in the hold queue at a time; any remaining fragments released from the assembler to the flow module are dropped.
As a workaround, the head fragment is enqueued to the assembler only after the agent writes the flow action. If the flow is already present in a non-hold state, the head fragment is enqueued to the assembler immediately.
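The fragment-drop behavior and the fix can be illustrated with a simplified Python model. It is not vRouter code: it models only the non-head fragments that the assembler releases to the flow module, uses the hold-queue limit of three stated above, and assumes that held fragments are forwarded once the agent writes the flow action. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

HOLD_QUEUE_LIMIT = 3  # at most three fragments in the hold queue at a time


@dataclass
class Flow:
    action_written: bool = False  # True once the agent writes the flow action
    hold_queue: list = field(default_factory=list)
    dropped: list = field(default_factory=list)
    forwarded: list = field(default_factory=list)


def release_to_flow_module(flow, fragment):
    """Handle one fragment that the assembler hands to the flow module."""
    if flow.action_written:
        flow.forwarded.append(fragment)
    elif len(flow.hold_queue) < HOLD_QUEUE_LIMIT:
        flow.hold_queue.append(fragment)
    else:
        flow.dropped.append(fragment)  # hold queue full: fragment is lost


def agent_writes_action(flow):
    """The agent programs the flow; held fragments are then forwarded (assumption)."""
    flow.action_written = True
    flow.forwarded.extend(flow.hold_queue)
    flow.hold_queue.clear()


def simulate(n_fragments, fixed):
    """Release n non-head fragments, with or without the fix described above."""
    flow = Flow()
    fragments = [f"frag{i}" for i in range(n_fragments)]
    if fixed:
        # Fix: the head fragment reaches the assembler only after the action is written,
        # so the assembler cannot release fragments while the flow is still in hold state.
        agent_writes_action(flow)
    for frag in fragments:
        release_to_flow_module(flow, frag)
    if not fixed:
        agent_writes_action(flow)
    return flow
```

With five non-head fragments, the model holds three and drops two before the fix, and forwards all five after it.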
JCB-177787 In DPDK vRouter use cases that require netns, such as SNAT and LBaaS, jumbo MTU cannot be set. The maximum allowed MTU is 1500.