Pod Scheduling for Multi-cluster Deployments
SUMMARY Juniper Cloud-Native Contrail Networking (CN2) release 23.2 supports network-aware pod scheduling for multi-cluster deployments. This article provides information about the changes to this feature from the previous release (23.1), and additional components for multi-cluster support.
Pod Scheduling in CN2
This article provides information about network-aware pod scheduling for multi-cluster deployments. For information about this feature for single-cluster deployments, see Pod Scheduling.
CN2's custom scheduler, contrail-scheduler, schedules
pods based on the following network metrics:
- Number of active ingress/egress traffic flows
- Bandwidth utilization
- Number of virtual machine interfaces (VMIs)
Network-Aware Pod Scheduling Overview
Many high-performance applications have bandwidth or network interface requirements as well as the typical CPU or VMI requirements. If contrail-scheduler assigns a pod to a
node with low bandwidth availability, that application cannot run optimally. CN2 release 23.1
addressed this issue with the introduction of a metrics-collector, a
central-collector, and custom scheduler plugins. These components collect,
store, and process network metrics so that the contrail-scheduler schedules
pods based on these metrics. CN2 release 23.2 enables multi-cluster deployments to take
advantage of this feature.
Custom Controllers
In 23.2, CN2 introduces a MetricsConfig CR and a CentralCollector CR. The following two new controllers manage these CRs:
- MetricsConfig controller: This controller reconciles and monitors the MetricsConfig custom resource (CR). This controller writes config information for the metrics-collectors in the central and distributed clusters. This controller also listens to create, read, update, delete (CRUD) events of kubemanager resources on the central cluster. The config file of a metrics-collector contains the following information:
  - The writeLocation field, which references the metrics-collector configMap details for mounting: the name, namespace, and key of the configMap where the metrics-collector writes its configuration.
  - The receiving service's IP address and port. The metrics-collector uses this service IP and port to forward requested metrics data.
  - Configuration data regarding Transport Layer Security (TLS), if required.
  - A metrics section, which defines the types of metrics the receiving service is requesting.
The following is an example of a metrics-collector config:
receivers:
- encoding: json
  metrics:
  - <relevant_metrics>
  - <relevant_metrics>
  serviceName: <service_name_or_ip>
  port: <receiver_listener_port>
writeLocation:
  name: mc_configmap
  namespace: contrail
  key: config.yaml
- CentralCollector controller: This controller reconciles and monitors the CentralCollector CR and manages the life cycle management (LCM) of the central-collector. Additionally, it creates a MetricsConfig CR, which in turn drives the metrics-collector configuration. The CentralCollector controller also creates the clusterIP service for the central-collector. Unlike 23.1, you do not need to create a central-collector Deployment or configMap. Create and apply the CR, and the CentralCollector controller manages all configuration, deployment, and LCM functions.
apiVersion: collectors.juniper.net/v1
kind: CentralCollector
metadata:
  name: central-collector
  namespace: contrail
spec:
  common:
    containers:
    - image: <REGISTRY>/central-collector:<TAG>
      name: central-collector
  metricsCollectorConfigmapLoc:
    name: "metrics-collector-configmap"
    namespace: contrail
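For illustration, the metrics-collector config structure described above can be modeled and sanity-checked as follows. This is a hypothetical Python sketch, not part of CN2: the metric names, service name, port, and the validate helper are invented; only the field names come from the example config.

```python
# Hypothetical model of the metrics-collector config shown earlier.
# Field names follow the example; values are invented placeholders.
config = {
    "receivers": [
        {
            "encoding": "json",
            "metrics": ["vmi_count", "flow_count"],  # example metric names
            "serviceName": "central-collector",      # receiving service
            "port": 50051,                           # example listener port
        }
    ],
    "writeLocation": {  # configMap the collector writes its config to
        "name": "mc_configmap",
        "namespace": "contrail",
        "key": "config.yaml",
    },
}

def validate(cfg: dict) -> bool:
    """Check for the fields the MetricsConfig controller is described as writing."""
    has_receiver = all(
        {"metrics", "serviceName", "port"} <= r.keys() for r in cfg["receivers"]
    )
    has_write_loc = {"name", "namespace", "key"} <= cfg["writeLocation"].keys()
    return has_receiver and has_write_loc
```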
Multi-cluster Pod Scheduling Components
Aside from the CentralCollector controller and the
MetricsConfig controller, the following components comprise CN2's
network-aware pod scheduling solution for multi-cluster deployments:
- Metrics-collector: This component runs in a container in the vRouter pod on each node in the central cluster and distributed clusters. The metrics-collector forwards requested data to the sinks specified in its configuration. The central-collector is one of these configured sinks and receives this data from the metrics-collector. This release adds an additional field in the config file of the metrics-collector. This field designates a cluster name, specifying which cluster the metrics-collector is collecting data from; the metrics-collector sends this name in the metadata to the receiver.
- Central-collector: This component acts as an aggregator and stores data received from all of the nodes in a cluster via the metrics-collector. The central-collector exposes gRPC endpoints which consumers use to request this data for nodes in a cluster.
- Contrail-scheduler: This custom scheduler introduces the following three custom plugins:
  - VMICapacity plugin (available from release 22.4 onwards): Implements the Filter, Score, and NormalizeScore extension points in the scheduler framework. The contrail-scheduler uses these extension points to determine the best node to assign a pod to based on active VMIs.
  - FlowsCapacity plugin: Determines the best node to schedule a pod on based on the number of active flows on a node. Too many traffic flows on a node means more competition for new pod traffic. Nodes with a lower flow count are ranked higher by the scheduler.
  - BandwidthUsage plugin: Determines the best node to assign a pod to based on the bandwidth usage of a node. The node with the least bandwidth usage (incoming and outgoing traffic) per second is ranked highest.
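The ranking idea behind the FlowsCapacity and BandwidthUsage plugins can be sketched as follows. This is an illustrative Python sketch of the Score/NormalizeScore concept, not CN2's implementation; the function, node names, and numbers are invented for the example.

```python
# Illustration of the NormalizeScore idea: per-node metrics where lower is
# better (active flows, bandwidth usage) are mapped to 0-100 scores where
# higher is better, so the scheduler can rank nodes.
def normalize_scores(raw_metrics: dict[str, float]) -> dict[str, int]:
    """Map raw per-node metrics (lower is better) to 0-100 scores (higher is better)."""
    worst = max(raw_metrics.values())
    if worst == 0:
        return {node: 100 for node in raw_metrics}
    return {
        node: round(100 * (worst - value) / worst)
        for node, value in raw_metrics.items()
    }

# Example: active flow counts reported per node (invented numbers).
flows = {"node-a": 120, "node-b": 30, "node-c": 0}
scores = normalize_scores(flows)
best = max(scores, key=scores.get)  # node-c has no active flows, so it ranks highest
```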
Metrics Collector Deployment
CN2 includes the metrics collector in vRouter pod deployments by default. The agent: default: field of the vRouter spec contains a collectors: field, which is configured with the metrics collector receiver address. The example below shows the value collectors: - localhost:6700. Since the metrics collector runs in the same pod as the vRouter agent, it can communicate over the localhost port. Note that port 6700 is fixed as the metrics collector receiver address and cannot be changed. The vRouter agent sends metrics data to this address.
The following is a section of a default vRouter deployment with the collector enabled:
apiVersion: dataplane.juniper.net/v1
kind: Vrouter
metadata:
  name: contrail-vrouter-nodes
  namespace: contrail
spec:
  agent:
    default:
      collectors:
      - localhost:6700
      xmppAuthEnable: true
      sandesh:
        introspectSslEnable: true
Create Permissions
After you configure the vRouter to send metrics to the metrics-collector
over port 6700, you must apply a Role-Based Access Control (RBAC) manifest. Applying this manifest
creates required permissions for the contrail-telemetry-controller. The
contrail-telemetry-controller reconciles the
MetricsConfig CR and creates the configMap for the
metrics-collector.
The following is an example RBAC manifest. Note that this manifest also creates the
namespace (contrail-analytics) for the metrics-collector.
apiVersion: v1
kind: Namespace
metadata:
  name: contrail-analytics
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: contrail-telemetry-controller
  namespace: contrail-analytics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: contrail-telemetry-controller-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - ""
  resources:
  - configmaps/status
  verbs:
  - get
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - pods/status
  verbs:
  - get
  - list
  - watch
  - patch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get
  - update
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - telemetry.juniper.net
  resources:
  - metricsconfigs
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - telemetry.juniper.net
  resources:
  - metricsconfigs/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - configplane.juniper.net
  resources:
  - kubemanagers
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - configplane.juniper.net
  resources:
  - kubemanagers/status
  verbs:
  - get
  - patch
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: contrail-telemetry-controller-role-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: contrail-telemetry-controller-role
subjects:
- kind: ServiceAccount
  name: contrail-telemetry-controller
  namespace: contrail-analytics
After applying the RBAC manifest, you must create a Deployment for the
contrail-telemetry-controller. The following is an example
Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: contrail-telemetry-controller
  namespace: contrail-analytics
  labels:
    app: contrail-telemetry-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: contrail-telemetry-controller
  template:
    metadata:
      labels:
        app: contrail-telemetry-controller
    spec:
      serviceAccount: contrail-telemetry-controller
      securityContext:
        fsGroup: 2000
        runAsGroup: 3000
        runAsNonRoot: true
        runAsUser: 1000
      containers:
      - name: contrail-telemetry-controller
        image: <REGISTRY>/contrail-telemetry-controller:<TAG>
        command:
        - /contrail-telemetry-controller
Central Collector Deployment
Apply the CentralCollector CR. Once applied, the
CentralCollector controller creates all of the necessary objects for the
central-collector. The following is an example
CentralCollector CR.
apiVersion: collectors.juniper.net/v1
kind: CentralCollector
metadata:
  name: central-collector
  namespace: contrail
spec:
  common:
    containers:
    - image: <REGISTRY>/central-collector:<TAG>
      name: central-collector
  metricsCollectorConfigmapLoc:
    name: "metrics-collector-configmap"
    namespace: contrail
Create Configmaps
Perform the following steps to create configMaps for multi-cluster
pod-scheduling components.
Perform these steps in each of the clusters of your multi-cluster environment.
- Create a configMap called cluster-details:
  Applying the CentralCollector CR in the previous section also creates a configMap called cluster-details. The CR creates this configMap in the same namespace as the CR. You must replicate this configMap in the namespace where you intend to deploy contrail-scheduler. The configMap includes the following information:
  - Central-collector's service clusterIP
  - Metrics gRPC service port: Used by the contrail-scheduler to retrieve and process network metrics and schedule pods accordingly.
  - Name of the cluster where the configMap is located: Used to identify the cluster where contrail-scheduler is running.
  The following is an example cluster-details configMap:
clustername: <name_of_the_cluster>
centralcollectoraddress: <ip:port>
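As an illustration of how a consumer such as contrail-scheduler might read these two fields, the following hypothetical Python sketch splits centralcollectoraddress into a host and port. The parsing code and example values are invented; only the field names come from the cluster-details configMap above.

```python
# Hypothetical sketch (not CN2 code) of reading the cluster-details data.
def parse_cluster_details(details: dict[str, str]) -> tuple[str, str, int]:
    """Return (cluster name, central-collector host, metrics gRPC port)."""
    # rpartition splits on the last ":" so IPv4 addresses stay intact.
    host, _, port = details["centralcollectoraddress"].rpartition(":")
    return details["clustername"], host, int(port)

details = {
    "clustername": "distributed-cluster-1",        # example value
    "centralcollectoraddress": "10.0.0.12:50051",  # example ip:port
}
name, host, port = parse_cluster_details(details)
```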
- Create a vmi-config configMap: This configMap defines the maximum VMI count allowed on DPDK nodes. The following is an example configMap:
nodeLabels:
  "agent-mode": "dpdk"
maxVMICount: 64
- Create a configMap for the contrail-scheduler: This configMap defines the contrail-scheduler configuration. The following is an example configMap with VMICapacity, FlowsCapacity, and BandwidthUsage plugin information:
apiVersion: kubescheduler.config.k8s.io/v1
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /tmp/config/kubeconfig
  qps: 50
enableContentionProfiling: true
enableProfiling: true
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
profiles:
- schedulerName: no-plugins-scheduler
- schedulerName: vmi-scheduler
  pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: VMICapacityArgs
      config: /tmp/vmi/config.yaml
      clusterConfig: /tmp/cluster/config.yaml
    name: VMICapacity
  plugins:
    multiPoint:
      enabled:
      - name: VMICapacity
- schedulerName: flows-scheduler
  pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: FlowsCapacityArgs
      clusterConfig: /tmp/cluster/config.yaml
    name: FlowsCapacity
  plugins:
    multiPoint:
      enabled:
      - name: FlowsCapacity
- schedulerName: bandwidth-scheduler
  pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: BandwidthUsageArgs
      clusterConfig: /tmp/cluster/config.yaml
    name: BandwidthUsage
  plugins:
    multiPoint:
      enabled:
      - name: BandwidthUsage
- schedulerName: contrail-scheduler
  pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: VMICapacityArgs
      config: /tmp/vmi/config.yaml
      clusterConfig: /tmp/cluster/config.yaml
    name: VMICapacity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: FlowsCapacityArgs
      clusterConfig: /tmp/cluster/config.yaml
    name: FlowsCapacity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: BandwidthUsageArgs
      clusterConfig: /tmp/cluster/config.yaml
    name: BandwidthUsage
  plugins:
    multiPoint:
      enabled:
      - name: VMICapacity
        weight: 50
      - name: FlowsCapacity
        weight: 1
      - name: BandwidthUsage
        weight: 20
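The weights in the contrail-scheduler profile above (VMICapacity 50, FlowsCapacity 1, BandwidthUsage 20) determine how strongly each plugin influences node ranking. The following hypothetical Python sketch illustrates the weighted-sum idea the kube-scheduler framework applies to normalized plugin scores; the node names and score values are invented for the example.

```python
# Illustrative sketch (not CN2 code): each plugin's normalized 0-100 node
# score is multiplied by its weight from the scheduler profile, then
# summed per node. The weights match the contrail-scheduler profile above.
WEIGHTS = {"VMICapacity": 50, "FlowsCapacity": 1, "BandwidthUsage": 20}

def combined_score(plugin_scores: dict[str, dict[str, int]]) -> dict[str, int]:
    """plugin_scores maps plugin name -> {node: normalized 0-100 score}."""
    totals: dict[str, int] = {}
    for plugin, per_node in plugin_scores.items():
        for node, score in per_node.items():
            totals[node] = totals.get(node, 0) + WEIGHTS[plugin] * score
    return totals

# Example normalized scores (invented numbers):
scores = {
    "VMICapacity":    {"node-a": 80, "node-b": 40},
    "FlowsCapacity":  {"node-a": 10, "node-b": 90},
    "BandwidthUsage": {"node-a": 30, "node-b": 70},
}
totals = combined_score(scores)
# node-a: 50*80 + 1*10 + 20*30 = 4610; node-b: 50*40 + 1*90 + 20*70 = 3490
```

With these weights, VMICapacity dominates the ranking, which is why node-a wins despite its high flow count.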
- Create a ServiceAccount object (required) and configure the ClusterRoles for the ServiceAccount. A ServiceAccount assigns a role to a pod or component within a cluster. In the example below, the ServiceAccount is granted the same permissions as the default Kubernetes scheduler (kube-scheduler). The following is an example ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: contrail-scheduler
  namespace: contrail-scheduler
The following is an example of the ClusterRoleBinding objects:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: contrail-scheduler
subjects:
- kind: ServiceAccount
  name: contrail-scheduler
  namespace: contrail-scheduler
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: contrail-scheduler-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: contrail-scheduler
  namespace: contrail-scheduler
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: contrail-scheduler-extension-apiserver-authentication-reader
  namespace: contrail-scheduler
roleRef:
  kind: Role
  name: extension-apiserver-authentication-reader
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: contrail-scheduler
  namespace: contrail-scheduler
- Create a kubeconfig Secret. Mount the kubeconfig file within the scheduler container when you apply the contrail-scheduler Deployment.
kubectl create secret generic kubeconfig -n contrail-scheduler --from-file=kubeconfig=<path-to-kubeconfig-file>
- Create a Deployment for the contrail-scheduler. The following is an example Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: contrail-scheduler
  namespace: contrail-scheduler
  labels:
    app: scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scheduler
  template:
    metadata:
      labels:
        app: scheduler
    spec:
      serviceAccountName: contrail-scheduler
      securityContext:
        fsGroup: 2000
        runAsGroup: 3000
        runAsNonRoot: true
        runAsUser: 1000
      containers:
      - name: contrail-scheduler
        image: <REGISTRY>/contrail-scheduler:<TAG>
        command:
        - /contrail-scheduler
        - --authentication-kubeconfig=/tmp/config/kubeconfig
        - --authorization-kubeconfig=/tmp/config/kubeconfig
        - --config=/tmp/scheduler/scheduler-config
        - --secure-port=10271
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /healthz
            port: 10271
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 30
        resources:
          requests:
            cpu: 100m
        startupProbe:
          failureThreshold: 24
          httpGet:
            path: /healthz
            port: 10271
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 30
        volumeMounts:
        - mountPath: /tmp/config
          name: kubeconfig
          readOnly: true
        - mountPath: /tmp/scheduler
          name: scheduler-config
          readOnly: true
        - mountPath: /tmp/vmi
          name: vmi-config
          readOnly: true
        - mountPath: /tmp/cluster
          name: cluster-details
          readOnly: true
      hostPID: false
      volumes:
      - name: kubeconfig
        secret:
          secretName: kubeconfig
      - name: scheduler-config
        configMap:
          name: scheduler-config
      - name: vmi-config
        configMap:
          name: vmi-config
      - name: cluster-details
        configMap:
          name: cluster-details
Note: When creating Secrets or configMaps, ensure that the key used during creation matches the key used when mounting them in the Deployment. For example, in the Deployment above, the path /tmp/scheduler/scheduler-config is provided in the command section, and /tmp/scheduler is provided in the volume mount section. In this case, the key is "scheduler-config". If the keys do not match, you must explicitly specify a custom key in the volume mounts for custom file names. If no file name is given explicitly, use "config.yaml" as the key name.
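The key-matching rule in the note above can be illustrated with a short sketch: a configMap key mounted at mountPath appears inside the container at mountPath/key. The helper below is hypothetical, not part of CN2; it simply shows why the --config path in the scheduler command must line up with the mount path and key.

```python
# Illustration of the note above: a configMap key mounted at mountPath
# appears in the container filesystem as mountPath/key.
import posixpath

def mounted_file(mount_path: str, key: str) -> str:
    """Path at which a configMap key appears inside the container."""
    return posixpath.join(mount_path, key)

# From the example Deployment: --config=/tmp/scheduler/scheduler-config
expected = "/tmp/scheduler/scheduler-config"
actual = mounted_file("/tmp/scheduler", "scheduler-config")
assert actual == expected  # keys match, so the scheduler finds its config
```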