Pod Scheduling

SUMMARY Juniper Cloud-Native Contrail Networking (CN2) release 23.1 supports network-aware pod scheduling using contrail-scheduler. This feature enhances the Kubernetes pod scheduler with plugins that analyze the network metrics of a node before scheduling pods. This article provides an overview of network-aware pod scheduling, along with implementation and deployment information.

Pod Scheduling in Kubernetes

In Kubernetes, scheduling refers to the process of matching pods to nodes so that the kubelet is able to run them. A scheduler monitors requests for pod creation and attempts to assign these pods to suitable nodes using a series of extension points during a scheduling and binding cycle. Potential nodes are filtered based on attributes like the resource requirements of a pod. If a node doesn't have the available resources for a pod, that node is filtered out. If more than one node passes the filtering phase, Kubernetes scores and ranks the remaining nodes based on their suitability for a given pod. The scheduler assigns a pod to the node with the highest ranking. If two nodes have the same score, the scheduler picks a node at random.

Pod Scheduling in CN2

CN2 release 22.4 enhanced the default Kubernetes pod scheduler to schedule pods based on the Virtual Machine Interface (VMI) considerations of DPDK nodes. This enhanced scheduler, called contrail-scheduler, supports custom plugins that enable the scheduling of pods based on current active VMIs in a DPDK node.

CN2 release 23.1 improves on this feature with two additional plugins. With these plugins, contrail-scheduler schedules pods based on the following network metrics:

  • Number of active ingress/egress traffic flows

  • Bandwidth utilization

  • Number of virtual machine interfaces (VMIs)

Network-Aware Pod Scheduling Overview

Many high-performance applications have bandwidth or network interface requirements as well as the typical CPU or VMI requirements. If contrail-scheduler assigns a pod to a node with low bandwidth availability, that application cannot run optimally. CN2 release 23.1 addresses this issue with the introduction of a metrics collector, a central collector, and custom scheduler plugins. These components collect, store, and process network metrics so that the contrail-scheduler schedules pods based on these metrics.

Network-Aware Pod Scheduling Components

CN2's network-aware pod scheduling solution comprises the following main components:

  • Metrics collector: This runs as a container in the vRouter pod on each node in the cluster. The vRouter agent sends metrics data to the metrics collector over localhost:6700, the address specified in the agent: default: collectors field of the vRouter custom resource (CR). The metrics collector then forwards the requested data to its configured sinks. The central collector is one of these sinks and receives this data from the metrics collector.

  • Central collector: This component aggregates and stores the data received from the metrics collectors on all of the nodes in a cluster. The central collector exposes gRPC endpoints that consumers use to request this data. For example, the contrail-scheduler uses these gRPC endpoints to retrieve and process network metrics and schedule pods accordingly.

  • Contrail scheduler: This custom scheduler introduces the following three custom plugins:

    • VMICapacity plugin (available from release 22.4 onwards): Implements Filter, Score, and NormalizeScore extension points in the scheduler framework. The contrail-scheduler uses these extension points to determine the best node to assign a pod to based on active VMIs.

    • FlowsCapacity plugin: Determines the best node to schedule a pod based on the number of active flows in a node. The more traffic flows on a node, the more competition there is for new pod traffic. Nodes with a lower flow count are ranked higher by the scheduler.

    • BandwidthUsage plugin: Determines the best node to assign a pod based on the bandwidth usage of a node. The node with the least bandwidth usage (incoming and outgoing traffic) per second is ranked highest.


      Each configured plugin sends its score to the scheduler. The scheduler takes the weighted scores from all of the plugins and finds the best node on which to schedule the pod. For example, if FlowsCapacity (weight 1) scores a node 80 and BandwidthUsage (weight 2) scores it 60, that node's aggregate score is 80 + (2 × 60) = 200.

Deploy Network-Aware Pod Scheduling Components

See the following sections for information about deploying the components for network-aware pod scheduling:

Metrics Collector Deployment

Central Collector Deployment

Contrail Scheduler Deployment

Metrics Collector Deployment

CN2 includes the metrics collector in vRouter pod deployments by default. The agent: default: field of the vRouter spec contains a collectors: field that is configured with the metrics collector receiver address. The example below shows the value collectors: - localhost:6700. Because the metrics collector runs in the same pod as the vRouter agent, it can communicate over the localhost port. Note that port 6700 is fixed as the metrics collector receiver address and cannot be changed. The vRouter agent sends metrics data to this address.

The following is a section of a default vRouter deployment with the collector enabled:
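A minimal sketch of the relevant spec section only; surrounding fields of the vRouter CR are omitted here:

  # vRouter spec fragment: the agent sends metrics to the collector
  # at the fixed localhost:6700 receiver address.
  spec:
    agent:
      default:
        collectors:
          - localhost:6700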

Central Collector Deployment

The central collector Deployment object must always have a replica count set to 1. The following Deployment section shows an example:
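A sketch of the relevant Deployment section; the full Deployment example appears later in this topic:

  # Central collector Deployment fragment: the replica count must be 1.
  spec:
    replicas: 1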

A configMap provides key-value configuration data to the pods in your cluster. Create a configMap for the central collector configuration. This configuration is mounted in the container.

The following is an example of a central collector config file:
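The sketch below is assembled from the fields described in the following list; the values shown, and the exact keys under metric_configmap, are illustrative assumptions rather than a published schema:

  # Hypothetical central collector config file.
  http_port: 6701                              # gRPC service port; value is illustrative
  tls_config:
    server_name: central-collector.contrail    # upstream (northbound API) server name
    key_file: /etc/central-collector/tls.key   # path is an assumption
  service_name: central-collector.contrail     # service exposed on top of the Deployment
  metric_configmap:
    name: metrics-collector-config             # configMap name is an assumption
    namespace: contrail                        # namespace is an assumption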

This config file contains the following fields:

  • http_port: Specifies the port that the central collector gRPC service runs on.

  • tls_config: Specifies the server_name and key_file associated with the central collector service. This field contains upstream (northbound API) server information.

  • service_name: Specifies the name of the service the central collector exposes. In this case, central-collector.contrail is exposed as a service on top of the central collector Deployment. Consumers within the cluster can interact with the central collector using this service name.

  • metric_configmap: The fields in this section designate the details of the metrics collector configMap. The central collector uses this information to configure the metrics collector sink with the metrics that sink wants to receive. The following is a sample command to create a configMap:
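    A sketch of the command; the configMap name, source file, and namespace are assumptions:

      kubectl create configmap metrics-collector-config \
        --from-file=config.yaml \
        --namespace contrail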

The following is an example of a central collector Deployment:
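The manifest below is a hedged sketch, not the shipped manifest; the image reference, labels, and mount path are assumptions:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: central-collector
    namespace: contrail
  spec:
    replicas: 1                                        # must remain 1
    selector:
      matchLabels:
        app: central-collector
    template:
      metadata:
        labels:
          app: central-collector
      spec:
        containers:
          - name: central-collector
            image: <registry>/central-collector:<tag>  # placeholder image reference
            volumeMounts:
              - name: config                           # mounts the central collector config
                mountPath: /etc/central-collector      # path is an assumption
        volumes:
          - name: config
            configMap:
              name: central-collector-config           # must match the configMap you created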


Verify the volume and volumeMounts fields before deploying.

The central collector service is exposed on top of the Deployment object. The following YAML file is an example of a central collector service file:
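A hedged sketch of the Service; the selector label and port value are assumptions, while the name and namespace follow the note below:

  apiVersion: v1
  kind: Service
  metadata:
    name: central-collector      # must match the service name in the collector config
    namespace: contrail          # must match the central collector Deployment namespace
  spec:
    selector:
      app: central-collector     # must match the labels on the Deployment pods
    ports:
      - name: grpc
        port: 6701               # must match http_port in the collector config; value is illustrative
        targetPort: 6701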


The name field must match the service name specified in the central collector configuration. The namespace must match the namespace of the central collector Deployment. For example, namespace: contrail.

Contrail Scheduler Deployment

Perform the following steps to deploy the contrail-scheduler:

  • Create a namespace for the contrail-scheduler.

  • Create a ServiceAccount object (required) and configure the cluster roles for the ServiceAccount. A ServiceAccount assigns a role to a pod or component within a cluster. In this case, the fields kind: ClusterRole and name: system:kube-scheduler grant the contrail-scheduler ServiceAccount the same permissions as the default Kubernetes scheduler (kube-scheduler), as in the sketch below.
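    A hedged sketch of the ServiceAccount and its binding; the object names and namespace are assumptions, while the roleRef values come from this step:

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: contrail-scheduler                    # name is an assumption
        namespace: contrail-scheduler               # namespace created in the previous step
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: contrail-scheduler-as-kube-scheduler
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: system:kube-scheduler                 # grants the default scheduler's permissions
      subjects:
        - kind: ServiceAccount
          name: contrail-scheduler
          namespace: contrail-scheduler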

  • Create a configMap for the VMI plugin configuration. You must create the configMap within the same namespace as the contrail-scheduler Deployment.

    The following is an example of a VMI plugin config:
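    Because the plugin's schema isn't reproduced here, the sketch below is hypothetical; both the key names and the values are assumptions:

      # Hypothetical VMI plugin config.
      collectorEndpoint: central-collector.contrail:6701   # central collector gRPC service; port is illustrative
      maxVMICapacity: 64                                   # assumed per-node VMI ceiling used for scoring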

  • Create a Secret for the kubeconfig file. This file is then mounted in the contrail-scheduler Deployment. Secrets store confidential data as files in a mounted volume or as a container environment variable.
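    A sketch of the command, assuming a local file named kubeconfig; the Secret name and namespace are assumptions:

      kubectl create secret generic contrail-scheduler-kubeconfig \
        --from-file=kubeconfig \
        --namespace contrail-scheduler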

  • Create a configMap for the contrail-scheduler config.

    The following is an example of a scheduler config:
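    A hedged sketch of a KubeSchedulerConfiguration; the kubeconfig path, mount paths, and weights are assumptions:

      apiVersion: kubescheduler.config.k8s.io/v1
      kind: KubeSchedulerConfiguration
      clientConnection:
        kubeconfig: /etc/kubernetes/kubeconfig    # mounted from the Secret; path is an assumption
      profiles:
        - schedulerName: contrail-scheduler
          plugins:
            multiPoint:
              enabled:
                - name: VMICapacity
                  weight: 1
                - name: FlowsCapacity
                  weight: 1
                - name: BandwidthUsage
                  weight: 1
          pluginConfig:
            - name: VMICapacity
              args:
                config: /etc/contrail/vmi-plugin-config.yaml   # filepath where the VMI plugin config is mounted; path is an assumption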

    Note the following fields:

    • schedulerName: The name of the scheduler you want to deploy.

    • pluginConfig: Contains information about the plugins included in the contrail-scheduler deployment. The deployment includes the following plugins:

      • VMICapacity

      • FlowsCapacity

      • BandwidthUsage

    • config: This field contains the filepath where the VMI plugin config is mounted.

    • multiPoint: Enables extension points for each of the included plugins. Instead of requiring you to enable specific extension points for a plugin, the multiPoint field lets you enable or disable all of the extension points implemented for a given plugin at once. A plugin's weight determines the priority of its score: at the end of scoring, each plugin sends out a weighted score, and the pod is scheduled on the node with the highest aggregated score.

  • Create a contrail-scheduler Deployment. The following is an example of a Deployment:
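    A hedged sketch of the Deployment; the image reference, command-line flag, and mount paths are assumptions:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: contrail-scheduler
        namespace: contrail-scheduler
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: contrail-scheduler
        template:
          metadata:
            labels:
              app: contrail-scheduler
          spec:
            serviceAccountName: contrail-scheduler          # the ServiceAccount created earlier
            containers:
              - name: contrail-scheduler
                image: <registry>/contrail-scheduler:<tag>  # placeholder image reference
                args:
                  - --config=/etc/contrail/scheduler-config.yaml   # flag name is an assumption
                volumeMounts:
                  - name: scheduler-config
                    mountPath: /etc/contrail
                  - name: kubeconfig
                    mountPath: /etc/kubernetes
            volumes:
              - name: scheduler-config
                configMap:
                  name: contrail-scheduler-config           # the configMap created earlier
              - name: kubeconfig
                secret:
                  secretName: contrail-scheduler-kubeconfig # the Secret created earlier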

After you apply this Deployment, the new contrail-scheduler is active.

Use the Contrail Scheduler to Deploy Pods

To use the contrail-scheduler to schedule (deploy) new pods, enter the name of your contrail-scheduler in the schedulerName field of the pod spec. The following is an example of a pod manifest with the schedulerName defined:
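A minimal example; the pod name and container image are illustrative:

  apiVersion: v1
  kind: Pod
  metadata:
    name: example-pod
  spec:
    schedulerName: contrail-scheduler   # must match the name of your deployed scheduler
    containers:
      - name: app
        image: nginx:latest             # illustrative image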