Deploy DPDK vRouter for Optimal Container Networking
DPDK Overview
Cloud-Native Contrail® Networking™ supports the Data Plane Development Kit (DPDK). DPDK is an open-source set of libraries and drivers for rapid packet processing. Cloud-Native Contrail Networking accelerates container networking with DPDK vRouter technology. DPDK enables fast packet processing by allowing network interface cards (NICs) to transfer packets directly into an application's address space using direct memory access (DMA). This approach lets the application poll for packets and avoids the overhead of NIC interrupts.
Utilizing DPDK enables the Cloud-Native Contrail vRouter to process more packets per second than it can when running as a kernel module. Cloud-Native Contrail Networking leverages the processing power of the DPDK vRouter to support high-demand container service functions.
When you provision a Contrail compute node with DPDK, the corresponding YAML file specifies the:
- Number of CPU cores to use for forwarding packets.
- Number of huge pages to allocate for DPDK.
- UIO driver to use for DPDK.
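For illustration only, these values might appear in the provisioning YAML as follows. The key names shown here (cpuCoreMask, hugePages2M, uioDriver) are hypothetical placeholders; the exact schema depends on your deployer version.
# Hypothetical provisioning snippet; key names vary by deployer version
vrouter:
  agentModeType: dpdk
  cpuCoreMask: "2-5"     # CPU cores used for forwarding packets
  hugePages2M: 1024      # number of 2MB huge pages allocated for DPDK
  uioDriver: vfio-pci    # UIO driver that DPDK uses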
DPDK vRouter Support for DPDK and Non-DPDK Workloads
When a container or pod needs access to the DPDK vRouter, one of the following workload types applies:
- Non-DPDK workload (pod): This workload includes non-DPDK pod applications that are unaware of the underlying DPDK vRouter. These applications are not designed for DPDK and do not use DPDK capabilities. In Cloud-Native Contrail Networking, this workload type functions normally in a DPDK vRouter-enabled cluster.
- Containerized DPDK workload: These workloads are built on the DPDK platform. DPDK interfaces are brought up using vHost protocol, which acts as a datapath for management and control functions. Pods act as the vHost Server, and the underlying DPDK vRouter acts as the vHost Client.
- Mix of Non-DPDK and DPDK workloads: The management or control channel on an application in this pod might be non-DPDK (Veth pair), and the datapath might be a DPDK interface.
Non-DPDK Pod Overview
A virtual ethernet (Veth) pair plumbs the networking of a non-DPDK pod. One end of the Veth pair attaches to the pod's namespace. The other end attaches to the kernel of the host machine. The Container Networking Interface (CNI) establishes the Veth pair and allocates IP addresses using IP Address Management (IPAM).
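For example, you can list the host-side ends of these Veth pairs on a compute node with a standard Linux command (shown for illustration only; interface names vary by CNI and pod):
ip -br link show type veth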
DPDK Pod Overview
A DPDK pod contains a vhost interface and a virtio interface. The pod uses the vhost interface for management purposes and the virtio interface for high-throughput packet processing applications. A DPDK application in the pod uses the vhost protocol to establish communication with the DPDK vRouter in the host. The DPDK application receives an argument to establish a filepath for a UNIX socket. The vRouter uses this socket to establish the control channel, run negotiations, and create vrings over huge pages of shared memory for high-speed datapaths.
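For illustration only, a DPDK application in the pod might pass this socket path to the DPDK Environment Abstraction Layer (EAL) through a virtio-user device argument similar to the following. The device name and socket path are hypothetical examples; server=1 reflects the pod acting as the vhost server:
--vdev=virtio_user0,path=/run/vrouter/uvh_vif_0,server=1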
Mix of Non-DPDK and DPDK Pod Overview
This pod might contain non-DPDK and DPDK applications. A non-DPDK application uses a non-DPDK interface (Veth pair), and the DPDK application uses the DPDK interfaces (vhost, virtio). These two workloads occur simultaneously.
DPDK vRouter Architecture
The Contrail DPDK vRouter is a container that runs inside the Contrail compute node. The vRouter runs as either a Linux kernel module or a user space DPDK process. The vRouter is responsible for transmitting packets between virtual workloads (tenants, guests) on physical devices. The vRouter also transmits packets between virtual interfaces and physical interfaces.
The Cloud-Native Contrail vRouter supports the following encapsulation protocols:
- MPLS over UDP (MPLSoUDP)
- MPLS over GRE (MPLSoGRE)
- Virtual Extensible LAN (VXLAN)
Compared with the traditional Linux kernel deployment, deploying the vRouter as a user space DPDK process drastically increases the performance and processing speed of the vRouter application. This increase in performance is the result of the following factors:
- The virtual network functions (VNFs) operating in user space are built for DPDK and designed to take advantage of DPDK’s packet processing power.
- DPDK's poll mode drivers (PMDs) use the physical interface (NIC) of the host instead of the Linux kernel's interrupt-based drivers. The NIC's registers are mapped into user space, which makes them accessible to DPDK's PMDs.
As a result, the Linux OS does not need to manage the NIC's registers. This means that the DPDK application manages all packet polling, packet processing, and packet forwarding of a NIC. Instead of waiting for an I/O interrupt to occur, a DPDK application constantly polls for packets and processes these packets immediately upon receiving them.
DPDK Interface Support for Containers
The architecture and benefits of DPDK have traditionally optimized VM networking. Cloud-Native Contrail Networking lets your Kubernetes containers take full advantage of these features. In Kubernetes, a containerized DPDK pod typically contains two or more interfaces. The following interfaces form the backbone of a DPDK pod:
- Vhost user protocol (for management): The vhost user protocol is a backend component that interfaces with the host. In Cloud-Native Contrail Networking, the vhost interface acts as a datapath for management and control functions between the pod and vRouter. This protocol comprises the following two planes:
- The control plane exchanges information (memory mapping for DMA, capability negotiation for establishing and terminating the data plane) between a pod and vRouter through a Unix socket.
- The data plane is implemented through direct memory access and transmits data packets between a pod and vRouter.
- Virtio interface (for high-throughput applications): At a high level, virtio is a virtual device that transmits packets between a pod and vRouter. The virtio interface is a shared memory (shm) solution that lets pods access DPDK libraries and features.
These interfaces enable the DPDK vRouter to transmit packets between pods. The interfaces give pods access to advanced networking features provided by the vRouter (huge pages, lockless ring buffers, poll mode drivers). For more information about these features, visit A journey to the vhost-users realm.
Applications use DPDK to create vhost and virtio interfaces. The application or pod then uses DPDK libraries directly to establish control channels using Unix domain sockets. The interfaces establish datapaths between a pod and vRouter using shared memory vrings.
DPDK vRouter Host Prerequisites
To deploy a DPDK vRouter, you must configure the following huge pages and NIC settings on the host node:
- Huge pages configuration: Specify the percentage of host memory to be reserved for DPDK huge pages. The following kernel command line sets a default huge page size of 2MB and allocates 8192 2MB huge pages:
GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 default_hugepagesz=2M hugepagesz=2M hugepages=8192"
The following example allocates four 1GB huge pages and 1024 2MB huge pages:
GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024"
Note: We recommend that you use a 1GB huge page size.
- Enable the input-output memory management unit (IOMMU): DPDK applications require IOMMU support. Configure IOMMU settings and enable IOMMU from the BIOS. Apply the following flags as boot parameters to enable IOMMU:
"intel_iommu=on iommu=pt"
- NIC driver binding: Ensure that the kernel driver is bound to port 0 of the host's NIC and that the DPDK PMD driver is bound to port 1 of the host's NIC. Note: In an environment where both DPDK and kernel drivers use different ports of a common NIC, we strongly recommend that you bind the kernel driver to port 0 of the NIC and the DPDK PMD driver to port 1 of that NIC. Other port assignment configurations might cause performance issues. For more information, see section 24.9.11 of the following DPDK documentation: I40E Poll Mode Driver.
- PCI driver (vfio-pci, uio_pci_generic): Specify which PCI driver to use based on the NIC type. Note: The vfio-pci driver is built in. If you use uio_pci_generic instead:
  - Manually install the uio_pci_generic module if needed:
    root@node-dpdk1:~# apt install linux-modules-extra-$(uname -r)
  - Verify that the uio_pci_generic module is installed:
    root@node-dpdk1:~# ls /lib/modules/5.4.0-59-generic/kernel/drivers/uio/
    uio.ko  uio_dmem_genirq.ko  uio_netx.ko  uio_pruss.ko  uio_aec.ko  uio_hv_generic.ko  uio_pci_generic.ko  uio_sercos3.ko  uio_cif.ko  uio_mf624.ko  uio_pdrv_genirq.ko
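After you complete these prerequisites, you can spot-check them from the host. The following commands are a sketch: the PCI address 0000:04:00.1 is an example placeholder, and the dpdk-devbind.py tool comes from your DPDK installation.
# Confirm that huge pages are allocated at the expected size
grep Huge /proc/meminfo
# Confirm that the IOMMU is active (an empty directory means it is not)
ls /sys/kernel/iommu_groups/
# Load the uio_pci_generic module if you use it instead of vfio-pci
modprobe uio_pci_generic
# Review current NIC driver bindings, then bind the DPDK port (port 1 in this example) to the PMD-capable driver
dpdk-devbind.py --status
dpdk-devbind.py --bind=vfio-pci 0000:04:00.1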
Deploy a Kubernetes Cluster with DPDK vRouter in Compute Node
Cloud-Native Contrail Networking utilizes a DPDK deployer to launch a Kubernetes cluster with DPDK compatibility. This deployer performs lifecycle management functions and applies DPDK vRouter prerequisites. A custom resource (CR) for the DPDK vRouter is a subset of the deployer. The CR contains the following:
- Controllers for deploying Cloud-Native Contrail Networking resources
- Built-in controller logic for the vRouter
Apply the DPDK deployer YAML file, and deploy the DPDK vRouter CR with agentModeType: dpdk using the following command:
kubectl apply -f <vrouter_cr.yaml>
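The following is a minimal sketch of what a vRouter CR might contain. The apiVersion, metadata values, and any field other than agentModeType are illustrative assumptions; verify the exact schema with kubectl explain or your release documentation.
apiVersion: dataplane.juniper.net/v1alpha1   # assumed version; confirm with kubectl api-resources
kind: Vrouter                                # kind inferred from the vrouters.dataplane.juniper.net CRD
metadata:
  name: contrail-vrouter-dpdk                # example name
  namespace: contrail                        # example namespace
spec:
  agentModeType: dpdk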
After you apply the CR YAML file, the deployer creates a daemonset for the vRouter. This daemonset spins up a pod with a DPDK container.
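To confirm that the daemonset and its DPDK pod are running, you can use standard commands such as the following (the namespace and resource names depend on your installation, so the grep filter is only illustrative):
kubectl get daemonsets -A | grep vrouter
kubectl get pods -A | grep vrouter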
If you get an error message, ensure that your cluster has the custom resource definition (CRD) for the vRouter using the following command:
kubectl get crds
The following is an example of the output you receive:
NAME                             CREATED AT
vrouters.dataplane.juniper.net   2021-06-16T16:06:34Z
If no CRD is present in the cluster, check the deployer using the following command:
kubectl get deployment contrail-k8s-deployer -n contrail-deploy -o yaml
Check the image used by the contrail-k8s-crdloader container. This image should be the latest image the deployer uses. Update the image and ensure that your new pod uses this image.
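For example, you can read the image with a jsonpath query; this assumes the contrail-k8s-crdloader container appears in the deployer's pod template:
kubectl get deployment contrail-k8s-deployer -n contrail-deploy -o jsonpath='{.spec.template.spec.containers[?(@.name=="contrail-k8s-crdloader")].image}'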
After you verify that your new pod is running the latest image, use the following command to verify that the CRD for the vRouter is present:
kubectl get crds
After you verify that the CRD for the vRouter is present, use the following command to apply the vRouter CR:
kubectl apply -f <vrouter_cr.yaml>
DPDK vRouter Custom Resource Settings
You can configure the following settings of the vRouter's CR:
- service_core_mask: Specify a service core mask. The service core mask enables you to dynamically allocate CPU cores for services. You can enter the following input formats:
  - Hexadecimal (for example, 0xf)
  - List of CPUs separated by commas (for example, 1,2,4)
  - Range of CPUs separated by a dash (for example, 1-4)
  Note: PMDs require the bulk of your available CPU cores for packet processing. As a result, we recommend that you reserve a maximum of 1 to 2 CPU cores for service_core_mask and dpdk_ctrl_thread_mask. These two masks share the reserved CPU cores.
- cpu_core_mask: Specify a CPU core mask. DPDK's PMDs use these cores for high-throughput packet-processing applications. The following are supported input formats:
  - Hexadecimal (for example, 0xf)
  - List of CPUs separated by commas (for example, 1,2,4)
  - Range of CPUs separated by a dash (for example, 1-4)
- dpdk_ctrl_thread_mask: Specify a control thread mask. DPDK uses these core threads for internal processing. The following are supported input formats:
  - Hexadecimal (for example, 0xf)
  - List of CPUs separated by commas (for example, 1,2,4)
  - Range of CPUs separated by a dash (for example, 1-4)
  Note: PMDs require the bulk of your available CPU cores for packet processing. As a result, we recommend that you reserve a maximum of 1 to 2 CPU cores for service_core_mask and dpdk_ctrl_thread_mask. These two masks share the reserved CPU cores.
- dpdk_command_additional_args: Specify DPDK vRouter settings that are not default settings. Arguments that you enter here are appended to the DPDK PMD command line. The following is an example argument:
  --yield_option 0
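The following sketch extends the minimal CR shown earlier with these settings. The placement and exact spelling of the keys under spec are illustrative assumptions based on the setting names above; confirm the schema for your release before applying it.
apiVersion: dataplane.juniper.net/v1alpha1              # assumed version
kind: Vrouter
metadata:
  name: contrail-vrouter-dpdk
  namespace: contrail
spec:
  agentModeType: dpdk
  cpu_core_mask: "2-7"                                  # cores reserved for PMD packet processing
  service_core_mask: "0"                                # service core(s)
  dpdk_ctrl_thread_mask: "1"                            # DPDK control threads
  dpdk_command_additional_args: "--yield_option 0"      # appended to the DPDK PMD command line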