Pod Scheduling
SUMMARY Juniper Cloud-Native Contrail Networking (CN2) release 23.1 supports network-aware pod scheduling using contrail-scheduler. This feature enhances the Kubernetes pod scheduler with plugins that analyze the network metrics of a node before scheduling pods. This article provides overview, implementation, and deployment information about network-aware pod scheduling.
Pod Scheduling in Kubernetes
In Kubernetes, scheduling refers to the process of matching pods to nodes so that the kubelet is able to run them. A scheduler monitors requests for pod creation and attempts to assign these pods to suitable nodes using a series of extension points during a scheduling and binding cycle. Potential nodes are filtered based on attributes like the resource requirements of a pod. If a node doesn't have the available resources for a pod, that node is filtered out. If more than one node passes the filtering phase, Kubernetes scores and ranks the remaining nodes based on their suitability for a given pod. The scheduler assigns a pod to the node with the highest ranking. If two nodes have the same score, the scheduler picks a node at random.
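For example, the filtering phase takes a pod's resource requests into account. The following is a generic Kubernetes illustration of such requests (the pod name, image, and request values are arbitrary), not a CN2-specific manifest:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # illustrative name
spec:
  containers:
  - name: app
    image: busybox
    resources:
      requests:
        cpu: 500m              # nodes with less than 500m CPU available are filtered out
        memory: 256Mi          # nodes with less than 256Mi memory available are filtered out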
Pod Scheduling in CN2
CN2 release 22.4 enhanced the default Kubernetes pod scheduler to schedule pods based on the Virtual Machine Interface (VMI) considerations of DPDK nodes. This enhanced scheduler, called contrail-scheduler, supports custom plugins that enable the scheduling of pods based on the current active VMIs on a DPDK node.
CN2 release 23.1 improves on this feature by supporting two additional plugins. With these plugins, contrail-scheduler schedules pods based on the following network metrics:
- Number of active ingress/egress traffic flows
- Bandwidth utilization
- Number of virtual machine interfaces (VMIs)
Network-Aware Pod Scheduling Overview
Many high-performance applications have bandwidth or network interface requirements in addition to the typical CPU or VMI requirements. If contrail-scheduler assigns a pod to a node with low bandwidth availability, that application cannot run optimally. CN2 release 23.1 addresses this issue by introducing a metrics collector, a central collector, and custom scheduler plugins. These components collect, store, and process network metrics so that contrail-scheduler can schedule pods based on those metrics.
Network-Aware Pod Scheduling Components
The following main components comprise CN2's network-aware pod scheduling solution:
- Metrics collector: Runs in a container in the vRouter pod, alongside the vRouter agent, on each node in the cluster. The vRouter agent sends metrics data to the metrics collector over localhost:6700, the address specified in the agent: default: collectors field of the vRouter CR. The metrics collector then forwards requested data to the sinks specified in its configuration. The central collector is one of these configured sinks and receives the data from the metrics collector.
- Central collector: Acts as an aggregator and stores the data received from the metrics collectors on all of the nodes in a cluster. The central collector exposes gRPC endpoints that consumers use to request this data for nodes in a cluster. For example, contrail-scheduler uses these gRPC endpoints to retrieve and process network metrics and schedule pods accordingly.
- Contrail scheduler: This custom scheduler introduces the following three custom plugins:
  - VMICapacity plugin (available from release 22.4 onwards): Implements the Filter, Score, and NormalizeScore extension points in the scheduler framework. contrail-scheduler uses these extension points to determine the best node to assign a pod to based on active VMIs.
  - FlowsCapacity plugin: Determines the best node to schedule a pod on based on the number of active flows on a node. More traffic flows on a node means more competition for new pod traffic, so the scheduler ranks nodes with a lower flow count higher.
  - BandwidthUsage plugin: Determines the best node to assign a pod to based on the bandwidth usage of a node. The node with the least bandwidth usage (incoming and outgoing traffic) per second is ranked highest.

  Note: Each configured plugin sends its score to the scheduler. The scheduler takes the weighted scores from all of the plugins and finds the best node to schedule a pod on.
Deploy Network-Aware Pod Scheduling Components
See the following sections for information about deploying the components for network-aware pod scheduling:
Metrics Collector Deployment
CN2 includes the metrics collector in vRouter pod deployments by default. The agent: default: section of the vRouter spec contains a collectors: field that is configured with the metrics collector receiver address. The example below shows the value collectors: - localhost:6700. Because the metrics collector runs in the same pod as the vRouter agent, it can communicate over the localhost port. Note that port 6700 is fixed as the metrics collector receiver address and cannot be changed. The vRouter agent sends metrics data to this address.
The following is a section of a default vRouter deployment with the collector enabled:
apiVersion: dataplane.juniper.net/v1
kind: Vrouter
metadata:
  name: contrail-vrouter-nodes
  namespace: contrail
spec:
  agent:
    default:
      collectors:
      - localhost:6700
      xmppAuthEnable: true
      sandesh:
        introspectSslEnable: true
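To confirm the configured collectors on a running cluster, you can inspect the vRouter custom resource. This assumes the resource can be addressed as vrouter, per the Vrouter kind above; adjust the resource name if your CRD uses a different short name:

kubectl get vrouter contrail-vrouter-nodes -n contrail -o yaml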
Central Collector Deployment
The central
collector Deployment object must always have a replica count set to 1. The following Deployment
section shows an
example:
spec:
  selector:
    matchLabels:
      component: central-collector
  replicas: 1
  template:
    metadata:
      labels:
        component: central-collector
A configMap provides key-value configuration data to the pods in your cluster. Create a configMap for the central collector configuration. This configuration is mounted in the container.
The following is an example of a central collector config file:
http_port: 9090
tls_config:
  key_file: /etc/config/server.key
  cert_file: /etc/config/server.crt
  ca_file: /etc/config/ca.crt
service_name: central-collector.contrail
metric_configmap:
  name: mc_configmap
  namespace: contrail
  key: config.yaml
This config file contains the following fields:
- http_port: Specifies the port that the central collector gRPC service runs on.
- tls_config: Specifies the server_name and key_file that the central collector service is associated with. This field contains upstream (northbound API) server information.
- service_name: Specifies the name of the service that the central collector exposes. In this case, central-collector.contrail is exposed as a service on top of the central collector Deployment. Consumers within the cluster can interact with the central collector using this service name.
- metric_configmap: The fields in this section designate the details of the metrics collector configMap. The central collector uses this information to configure a metrics-collector sink with the metrics that the sink wants to receive.

The following is a sample command to create a configMap:

kubectl create cm -n contrail central-collector-config --from-file=config.yaml=<path-to-config-file>
The following is an example of a central collector Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: central-collector
  namespace: contrail
  labels:
    app: central-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: central-collector
  template:
    metadata:
      labels:
        app: central-collector
    spec:
      securityContext:
        fsGroup: 2000
        runAsGroup: 3000
        runAsNonRoot: true
        runAsUser: 1000
      containers:
      - name: contrail-scheduler
        image: enterprise-hub.juniper.net/contrail-container-prod/central-collector:latest
        command:
        - /central-collector
        - --kubeconfig=/tmp/config/kubeconfig
        - --config=/etc/central-collector/config.yaml
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: /tmp/config
          name: kubeconfig
          readOnly: true
        - mountPath: /etc/central-collector
          name: central-collector-config
          readOnly: true
        - mountPath: /etc/config/tls
          name: tls
          readOnly: true
      volumes:
      - name: kubeconfig
        secret:
          secretName: cc-kubeconfig
      - name: tls
        secret:
          secretName: central-collector-tls
      - name: central-collector-config
        configMap:
          name: central-collector-config
Verify the volume and volumeMounts fields before deploying.
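For example, you can apply and verify the Deployment with standard kubectl commands (the file name is illustrative):

kubectl apply -f central-collector-deployment.yaml
kubectl rollout status deployment/central-collector -n contrail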
The central collector service is exposed on top of the Deployment object. The following YAML file is an example of a central collector service file:

apiVersion: v1
kind: Service
metadata:
  name: central-collector
  namespace: contrail
spec:
  selector:
    component: central-collector
  ports:
  - name: grpc
    port: <port-as-per-config>
  - name: json
    protocol: TCP
    port: 10000
The name field must match the service name specified in the central collector configuration. The namespace must match the namespace of the central collector Deployment; for example, namespace: contrail.
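To confirm that the service is up and resolves to the central collector pod, you can list the service and its endpoints:

kubectl get svc central-collector -n contrail
kubectl get endpoints central-collector -n contrail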
Contrail Scheduler Deployment
Perform the following steps to deploy contrail-scheduler:

1. Create a namespace for contrail-scheduler.

   kubectl create ns contrail-scheduler
2. Create a ServiceAccount object (required) and configure the cluster roles for the ServiceAccount. A ServiceAccount assigns a role to a pod or component within a cluster. In this case, the fields kind: ClusterRole and name: system:kube-scheduler grant the contrail-scheduler ServiceAccount the same permissions as the default Kubernetes scheduler (kube-scheduler), as shown in the sketch below.
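   The following is a minimal sketch of the ServiceAccount and its ClusterRoleBinding. The object names and namespace here are assumptions chosen to match the Deployment example later in this procedure; only the ClusterRole name system:kube-scheduler comes from the text above, and your cluster may require additional roles.

   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: contrail-scheduler                      # assumed name; must match serviceAccountName in the Deployment
     namespace: contrail-scheduler
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: contrail-scheduler-as-kube-scheduler    # assumed name
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: system:kube-scheduler                   # built-in role used by the default scheduler
   subjects:
   - kind: ServiceAccount
     name: contrail-scheduler
     namespace: contrail-scheduler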
3. Create a configMap for the VMI plugin configuration. You must create the configMap within the same namespace as the contrail-scheduler Deployment.

   kubectl create configmap vmi-config -n contrail-scheduler --from-file=vmi-config=<path-to-vmi-config>
   The following is an example of a VMI plugin config:

   nodeLabels:
     "test-agent-mode": "dpdk"
   maxVMICount: 64
   address: "central-collector.contrail:9090"
4. Create a Secret for the kubeconfig file. This file is then mounted in the contrail-scheduler Deployment. Secrets store confidential data as files in a mounted volume or as a container environment variable.

   kubectl create secret generic kubeconfig -n contrail-scheduler --from-file=kubeconfig=<path-to-kubeconfig-file>
5. Create a configMap for the contrail-scheduler config.

   kubectl create configmap scheduler-config -n contrail-scheduler --from-file=scheduler-config=<path-to-scheduler-config>
The following is an example of a scheduler config:
apiVersion: kubescheduler.config.k8s.io/v1beta3
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /tmp/config/kubeconfig
  qps: 50
enableContentionProfiling: true
enableProfiling: true
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
profiles:
- schedulerName: contrail-scheduler
  pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: VMICapacityArgs
      config: /tmp/vmi/config.yaml
    name: VMICapacity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: FlowsCapacityArgs
      address: central-collector.contrail:9090
    name: FlowsCapacity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: BandwidthUsageArgs
      address: central-collector.contrail:9090
    name: BandwidthUsage
  plugins:
    multiPoint:
      enabled:
      - name: VMICapacity
        weight: 50
      - name: FlowsCapacity
        weight: 1
      - name: BandwidthUsage
        weight: 20
Note the following fields:
- schedulerName: The name of the scheduler you want to deploy.
- pluginConfig: Contains the configuration for the plugins included in the contrail-scheduler deployment. The deployment includes the following plugins:
  - VMICapacity
  - FlowsCapacity
  - BandwidthUsage
- config: Contains the file path where the VMI plugin config is mounted.
- multiPoint: Enables extension points for each of the included plugins. Instead of having to enable specific extension points for a plugin, the multiPoint field lets you enable or disable all of the extension points that are developed for a given plugin. The weight of a plugin determines the priority of that plugin's score. At the end of scoring, each plugin sends out a weighted score, and a pod is scheduled on the node with the highest aggregated score.
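As an illustration of the weighting (the per-plugin scores here are assumed, normalized values), suppose a node scores 80 from VMICapacity, 60 from FlowsCapacity, and 90 from BandwidthUsage. With the weights in the example config above, the node's aggregated score is:

(50 × 80) + (1 × 60) + (20 × 90) = 4000 + 60 + 1800 = 5860

The scheduler compares this aggregate against those of the other candidate nodes and assigns the pod to the node with the highest total.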
6. Create a contrail-scheduler Deployment. The following is an example of a Deployment:

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: contrail-scheduler
     namespace: contrail-scheduler
     labels:
       app: scheduler
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: scheduler
     template:
       metadata:
         labels:
           app: scheduler
       spec:
         serviceAccountName: contrail-scheduler
         securityContext:
           fsGroup: 2000
           runAsGroup: 3000
           runAsNonRoot: true
           runAsUser: 1000
         containers:
         - name: contrail-scheduler
           image: <registry>/contrail-scheduler:<tag>
           command:
           - /contrail-scheduler
           - --authentication-kubeconfig=/tmp/config/kubeconfig
           - --authorization-kubeconfig=/tmp/config/kubeconfig
           - --config=/tmp/scheduler/scheduler-config
           - --secure-port=10271
           imagePullPolicy: Always
           livenessProbe:
             failureThreshold: 8
             httpGet:
               path: /healthz
               port: 10271
               scheme: HTTPS
             initialDelaySeconds: 30
             periodSeconds: 10
             timeoutSeconds: 30
           resources:
             requests:
               cpu: 100m
           startupProbe:
             failureThreshold: 24
             httpGet:
               path: /healthz
               port: 10271
               scheme: HTTPS
             initialDelaySeconds: 30
             periodSeconds: 10
             timeoutSeconds: 30
           volumeMounts:
           - mountPath: /tmp/config
             name: kubeconfig
             readOnly: true
           - mountPath: /tmp/scheduler
             name: scheduler-config
             readOnly: true
           - mountPath: /tmp/vmi
             name: vmi-config
             readOnly: true
         hostPID: false
         volumes:
         - name: kubeconfig
           secret:
             secretName: kubeconfig
         - name: scheduler-config
           configMap:
             name: scheduler-config
         - name: vmi-config
           configMap:
             name: vmi-config
After you apply this Deployment, the new contrail-scheduler is active.
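You can confirm that the scheduler pod is running before scheduling workloads with it:

kubectl get pods -n contrail-scheduler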
Use the Contrail Scheduler to Deploy Pods
Enter the name of your contrail-scheduler in the schedulerName field to use contrail-scheduler to schedule (deploy) new pods. The following is an example of a pod manifest with the schedulerName defined:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: web
spec:
  schedulerName: contrail-scheduler
  containers:
  - name: app
    image: busybox
    command:
    - sh
    - -c
    - sleep 500
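To apply this manifest and confirm which node contrail-scheduler selected for the pod (the file name is illustrative):

kubectl apply -f my-app-pod.yaml
kubectl get pod my-app -o wide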