
Kubernetes in Practice

 

This chapter introduces some of the fundamental objects and features of Kubernetes.

Imagine you have a pod that needs to be hosted on a machine with certain specifications (SSD storage, physical location, processing power, etc.), or you want to search or group your pods for easier administration. What do you do? Labels are the way to go. In Kubernetes, labels are attached to objects.

Let’s use labels to launch a pod on a certain machine.

Labels

In Kubernetes, any object can be identified using a label.

You can assign multiple labels per object, but you should avoid using too many labels or too few; too many will get confusing, and too few won’t give you the real benefits of grouping, selecting, and searching.

Best practice is to assign labels to indicate:

  • Application/program ID using this pod

  • Owner (who manages this pod/application)

  • Stage (the pod/application in development/testing/production version)

  • Resource requirements (SSD, CPU, storage)

  • Location (preferred location/zone/data center to run this pod/application)

Okay, let’s assign the labels (stage: testing) and (zone: production) to two nodes, respectively, then try to launch a pod on a node that has the label (stage: testing):
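A sketch of the labeling commands, assuming two nodes named node1 and node2 (substitute your own node names):

$ kubectl label nodes node1 stage=testing
$ kubectl label nodes node2 zone=production
$ kubectl get nodes --show-labels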

Now let’s launch a basic Nginx pod tagged with stage: testing in the nodeSelector and confirm it lands on a node labeled stage: testing. The kube-scheduler uses the labels listed in the nodeSelector section of the pod YAML to select the node on which to launch the pod:
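A sketch of such a pod YAML; the pod name nginx-testing is an assumption:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-testing
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  nodeSelector:
    stage: testing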

Note

Kube-scheduler picks the node based on various factors like individual and collective resource requirements, hardware, software, or policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, and deadlines.

Note

You can assign a pod to a certain node without labels by adding the argument nodeName: nodeX under the spec in the YAML file where nodeX is the name of the node.

Namespace

As in many other platforms, there is normally more than one user (or team) working on a Kubernetes cluster. Suppose a pod named webserver1 has been built by the devops department; when the sales department attempts to launch a pod with the same name, the system gives an error:

Error from server (AlreadyExists): error when creating "webserver1.yaml": pods "webserver1" already exists

Kubernetes won’t allow the same object name for a Kubernetes resource to appear more than once in the same scope.

Namespaces provide the scope for Kubernetes resources, much like a project/tenant in OpenStack. Names of resources need to be unique within a namespace, but not across namespaces. It’s a natural way to divide cluster resources between multiple users.

Kubernetes starts with three initial namespaces:

  • default: The default namespace for objects with no other namespace.

  • kube-system: The namespace for objects created by the Kubernetes system.

  • kube-public: Initially created by the kubeadm tool when deploying a cluster. By convention, the purpose of this namespace is to make some resources readable by all users without authentication. It exists mostly in Kubernetes clusters bootstrapped with the kubeadm tool.

Create a Namespace

Creating a namespace is pretty simple. The kubectl command does the magic; you don’t need a YAML file:
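A one-line sketch of the command (the namespace name dev matches the output described next):

$ kubectl create namespace dev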

And the new namespace dev is now created:

Now the webserver1 pod in dev namespace won’t conflict with webserver1 pod in the sales namespace:

Quota

You can now apply constraints that limit resource consumption per namespace, similar to an OpenStack tenant. For example, you can limit the number of objects that can be created in a namespace, the total amount of compute resources that may be consumed, and so on. The constraint in Kubernetes is called a quota. Here’s an example:
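A sketch of creating such a quota with kubectl; the dev namespace is carried over from the earlier example:

$ kubectl create quota quota-onepod --hard=pods=1 --namespace dev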

There, we just created the quota quota-onepod, and the constraint we gave is pods=1 – so only one pod is allowed to be created in this namespace:

And now create a pod within it:

That works fine, so now let’s create a second pod in it:

Immediately we run into the error exceeded quota. Let’s delete the quota quota-onepod; the new pod will be created once the quota is removed:

ReplicationController

You learned how to launch a pod representing your containers from its YAML file in Chapter 2. One question might arise in your container-filled mind: what if I need three pods that are exactly the same (each running an Apache container) to make the web service more robust? Do I change the name in the YAML file and then repeat the same commands to create the required pods? Or maybe use a shell script? Kubernetes already has objects to address this demand: the RC – ReplicationController, and the RS – ReplicaSet.

A ReplicationController (rc) ensures that a specified number of pod replicas are running at any one time. In other words, a replication controller makes sure that a pod or a homogeneous set of pods is always up and available.

Creating an rc

Let’s create an rc with an example. First create a YAML file for an rc object named webserver:
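A sketch of what the webserver rc YAML might look like; the nginx image and the app: webserver label are assumptions:

apiVersion: v1
kind: ReplicationController
metadata:
  name: webserver
spec:
  replicas: 3
  selector:
    app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      containers:
      - name: webserver
        image: nginx
        ports:
        - containerPort: 80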

Remember that kind indicates the object type this YAML file defines; here it is an rc instead of a pod. The metadata shows the rc’s name, webserver. The spec is the detailed specification of this rc object, and replicas: 3 indicates the same pod will be cloned so that the total number of pods created by the rc is always three. Finally, the template provides information about the containers that will run in the pod, the same as what you saw in a pod YAML file. Now use this YAML file to create the rc object:

If you are quick enough, you may capture the intermediate status when the new pods are being created:

Eventually you will see three pods launched:

The rc works with the pods directly. The workflow is shown in Figure 1.

Figure 1: rc Workflows

With the replicas parameter specified in the rc object YAML file, the Kubernetes replication controller, running as part of the kube-controller-manager process on the master node, keeps monitoring the number of running pods spawned by the rc and automatically launches new ones should any of them fail. The key thing to learn is that individual pods may die at any time, but the pool as a whole is always up and running, making for a robust service. You will understand this better when you learn about the Kubernetes service.

Test Rc

You can test an rc’s impact by deleting one of the pods. To delete a resource with kubectl, use the kubectl delete sub-command:

As you can see, when one pod is being terminated, a new pod is immediately spawned. Eventually the old pod will go away and the new pod will be up and running. The total number of running pods will remain unchanged.

You can also scale replicas up or down with the rc. For example, to scale up from 3 to 5 replicas:
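A sketch of the scale command, reusing the webserver rc from above:

$ kubectl scale rc webserver --replicas=5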

There are other benefits to the rc. Actually, since this abstraction is so popular and heavily used, two very similar objects, the rs – ReplicaSet and the Deploy – Deployment, have been developed with more powerful features. Generally speaking, you can call them the next-generation rc. For now, let’s stop exploring rc features and move our focus to these two new objects.

Before moving to the next object, you can delete the rc:

ReplicaSet

A ReplicaSet, or rs object, is pretty much the same thing as an rc object, with just one major exception – the format of the selector:

The rc uses equality-based selectors only, while the rs supports an extra selector format: set-based. Functionally the two forms of selectors do the same job – that is, select the pod with a matching label:
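A sketch comparing the two selector forms, assuming the app: webserver label used earlier:

# rc: equality-based selector
selector:
  app: webserver

# rs: set-based selector
selector:
  matchLabels:
    app: webserver
  matchExpressions:
  - {key: app, operator: In, values: [webserver]}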

An rs is created and it launches a pod, just the same as what an rc would do. If you compare the kubectl describe on the two objects:

As you can see, for the most part the outputs are the same, the only exception being the selector format. You can also scale the rs the same way as you would with an rc:

Before moving to the next object, delete the rs:

Deployment

You may wonder why Kubernetes has different objects to do almost the same job. As mentioned earlier, the features of the rc have been extended through the rs and the deployment. We’ve seen the rs, which does the same job as the rc, only with a different selector format. Now we’ll check out the other new object, the Deploy – Deployment, and explore its features.

Create a Deployment

If you simply change the kind attribute from ReplicaSet to Deployment, you’ll get the YAML file of a deployment object:

Create a deployment with the kubectl command:

Actually, the deployment is a higher-level abstraction than the rc and rs. A deployment does not create a pod directly, and the describe command reveals this:

Deployment Workflow

When you create a deployment, a ReplicaSet is automatically created. The pods defined in the deployment object are created and supervised by the deployment’s ReplicaSet.

The workflow is shown in Figure 2:

Figure 2: Deployment Workflow

You might still be wondering why you need the rs as one more layer sitting between the deployment and the pod; that’s answered next.

Rolling Update

The rolling update feature is one of the more powerful features that comes with the deployment object. Let’s demonstrate the feature with a test case to explain how it works.

Note

In fact, a similar rolling update feature exists for the old rc object, but its implementation has quite a few drawbacks compared with the new version supported by the Deployment. In this book we focus on the new implementation with the Deployment.

Test Case: Rolling Update

Suppose you have an nginx-deployment, with replicas=3 and pod image version 1.7.9. We want to upgrade the image from version 1.7.9 to the new version 1.9.1. With kubectl you can use the set image option and specify the new version number to trigger the update:
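A sketch of the command; the container name nginx inside the deployment is an assumption:

$ kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1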

Now check the deployment information again:

There are two changes you can observe here:

  • The image version in deployment is updated

  • A new rs nginx-deployment-6fdbb596db is created, with replicas set to 1

And with the new rs having replicas of 1, a new pod (the fourth one) is now generated:

The new pod runs the new image:

While the old pods still run the old image:

Let’s wait and keep checking the pod status… eventually all old pods are terminated and three new pods are running – the pod names confirm they are new ones:

So the update is done, and all pods are now running with the new version of the image.

How It Works

Hold on, you might argue, this is not an update; this should be called a replacement, because Kubernetes used three new pods with new images to replace the old pods! Strictly speaking, this is true. But this is how it works. Kubernetes’s philosophy is that pods are cheap and replacement is easy – imagine how much work it would be if you had to log in to each pod, uninstall the old image, and clean up the environment, only to install a new image. Let’s look at the details of this process and understand why it is called a rolling update.

When you update the pod with new software, the deployment object introduces a new rs that starts the pod update process. The idea here is not to log in to the existing pod and do the image update in-place; instead, the new rs creates a new pod equipped with the new software release. Once this new (and additional) pod is up and running, the original rs is scaled down by one, so the total number of running pods remains unchanged. The new rs then continues to scale up by one while the original rs scales down by one. This process repeats until the number of pods created by the new rs reaches the original replica number defined in the deployment, at which point all of the original rs pods have been terminated. The process is depicted in Figure 3.

Figure 3: Deployment Overview

As you can see, the whole process of creating a new rs, scaling up the new rs, and scaling down the old one simultaneously, is fully automated and taken care of by the deployment object. It is deployment that is deploying and driving the ReplicaSet object, which, in this sense, is working merely as a backend.

This is why deployment is considered a higher-layer object in Kubernetes, and also the reason why it is officially recommended that you never use ReplicaSet alone, without deployment.

Record

Deployment also has the ability to record the whole process of rolling updates, so in case it is needed, you can review the update history after the update job is done:
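A sketch of checking the recorded history; the deployment name follows the rolling update example above:

$ kubectl rollout history deployment nginx-deployment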

Pause/Resume/Undo

Additionally, you can also pause/resume the update process to verify the changes before proceeding:

You can even undo the update when things are going wrong during the maintenance window:
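Sketches of the corresponding rollout commands for the pause, resume, and undo operations:

$ kubectl rollout pause deployment nginx-deployment
$ kubectl rollout resume deployment nginx-deployment
$ kubectl rollout undo deployment nginx-deployment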

Typically you do this when something is broken in your deployment. Compared with how much work it took to prepare for a software upgrade during maintenance windows in the old days, this is an amazing feature for anyone who has suffered through software upgrades!

Tip

This is pretty similar to the Junos rollback magic command that you probably use every day when you need to quickly revert the changes you make to your router.

Secrets

All modern network systems need to deal with sensitive information, such as usernames, passwords, and SSH keys. The same applies to the pods in a Kubernetes environment. However, exposing this information in your pod specs as cleartext may introduce security concerns, and you need a tool or method to resolve the issue – at least to avoid cleartext credentials as much as possible.

The Kubernetes secrets object is designed specifically for this purpose – it encodes all sensitive data and exposes it to pods in a controlled way.

The official definition of Kubernetes secrets is:

"A Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. Such information might otherwise be put in a Pod specification or in an image; putting it in a secret object allows for more control over how it is used and reduces the risk of accidental exposure."

Users can create secrets, and the system also creates secrets. To use a secret, a pod needs to reference the secret.

There are many different types of secrets, each serving a specific use case, and there are also many methods to create a secret and a lot of different ways to refer to it in a pod. A complete discussion of secrets is beyond the scope of this book, so please refer to the official documentation to get all of the details and track all up-to-date changes.

Here, we’ll look at some commonly used secret types. You will also learn several methods to create a secret and how to refer to it in your pods. And once you get to the end of the section, you should understand the main benefits of a Kubernetes secrets object and how it can help improve your system security.

Let’s begin with a few secret terms:

  • Opaque: This type of secret can contain arbitrary key-value pairs, so it is treated as unstructured data from Kubernetes’ perspective. All other types of secret have a predefined content format.

  • kubernetes.io/dockerconfigjson: This type of secret is used to authenticate with a private container registry (for example, a Juniper server) to pull your own private image.

  • TLS: A TLS secret contains a TLS private key and certificate. It is used to secure an ingress. You will see an example of an ingress with a TLS secret in Chapter 4.

  • kubernetes.io/service-account-token: When processes running in the containers of a pod access the API server, they have to be authenticated as a particular account (for example, the default account). An account associated with a pod is called a service account. The kubernetes.io/service-account-token type of secret contains information about a Kubernetes service account. We won’t elaborate on this type of secret or on service accounts in this book.

  • Opaque secret: A secret of type Opaque represents arbitrary user-owned data – usually you put some kind of sensitive data in a secret, for example, a username, password, or security PIN; just about anything you believe is sensitive and want to carry into your pod.

Define Opaque Secret

First, to make our sensitive data look less sensitive, let’s encode it with the base64 tool:
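A sketch, assuming the sensitive data is the username admin and the password password:

$ echo -n 'admin' | base64
YWRtaW4=
$ echo -n 'password' | base64
cGFzc3dvcmQ=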

Then put the encoded version of the data in a secret definition YAML file:

Alternatively, you can define the same secret from kubectl CLI directly, with the --from-literal option:
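A sketch of the equivalent command; the secret name mysecret is an assumption, and note that --from-literal takes the plain values and does the base64 encoding for you:

$ kubectl create secret generic mysecret \
    --from-literal=username=admin --from-literal=password=password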

Either way, a secret will be generated:

Refer Opaque Secret

Next you will need to use the secret in a pod, so that the user information contained in the secret is carried into the pod. As mentioned, there are different ways to refer to the opaque secret in a pod, and correspondingly, the result will differ.

Typically, user information carried from a secret can appear in a container in one of these forms:

  • Files

  • Environment variables

Now let’s demonstrate using a secret to generate environment variables in a container:
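A sketch of such a pod YAML; the pod name, image, variable names, and the secret name mysecret are assumptions consistent with the earlier sketch:

apiVersion: v1
kind: Pod
metadata:
  name: webclient
spec:
  containers:
  - name: webclient
    image: nginx
    env:
    - name: MY_USERNAME
      valueFrom:
        secretKeyRef:
          name: mysecret
          key: username
    - name: MY_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mysecret
          key: password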

Spawn the pod and container from this YAML file:

Log in to the container and verify the generated environment variables:

The original sensitive data encoded with base64 is now present in the container!

Dockerconfigjson Secret

The dockerconfigjson secret, as the name indicates, carries the Docker account credential information that is typically stored in a .docker/config.json file. The image in a Kubernetes pod may point to a private container registry. In that case, Kubernetes needs to authenticate it with that registry in order to pull the image. The dockerconfigjson type of secret is designed for this very purpose.

Docker Credential Data

The most straightforward method to create a kubernetes.io/dockerconfigjson type of secret is to provide login information directly with the kubectl command and let it generate the secret:
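A sketch of the command; the secret name regcred is an assumption, the registry and username follow the values used later in this section, and the password and email are placeholders:

$ kubectl create secret docker-registry regcred \
    --docker-server=hub.juniper.net \
    --docker-username=JNPR-FieldUser213 \
    --docker-password=<password> \
    --docker-email=<email>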

Verify the secret creation:

Note

Only the first line in the output is the secret you just created. The second line is a kubernetes.io/service-account-token type of secret that the Kubernetes system creates automatically when the Contrail setup is up and running.

Now inspect the details of the secret:

Not surprisingly, you don’t see any sensitive information in the form of cleartext. There is a data portion of the output where you can see a very long string as the value of the key .dockerconfigjson. Its appearance has been transformed from the original data, and at least it no longer contains sensitive information in cleartext – after all, one purpose of using a secret is to improve system security.

However, the transformation is done by encoding, not encryption, so there is still a way to manually retrieve the original sensitive information: just pipe the value of key .dockerconfigjson into the base64 tool, and the original username and password information is viewable again:

Some highlights in this output are:

  • The python -mjson.tool is used to format the decoded json data before displaying to the terminal.

  • There is an auth key-value pair. It is the token generated based on the authentication information you gave (username and password).

  • Later on, when equipped with this secret, a pod will use this token, instead of the username and password to authenticate itself towards the private Docker registry hub.juniper.net in order to pull a Docker image.

Tip

Here’s another way to decode the data directly from the secret object:

The --output=xxxx option filters the kubectl get output so only the value of .dockerconfigjson under data is displayed. The value is then piped into base64 with option --decode (alias of -d) to get it decoded.

A docker-registry secret created manually like this will only work with a single private registry. To support multiple private container registries you can create a secret from the Docker credential file.

Docker Credential File (~/.Docker/config.json)

As the name of the key .dockerconfigjson in the secret we created indicates, it serves a similar role as the Docker config file: .docker/config.json. Actually, you can generate the secret directly from the Docker configuration file.

To generate the Docker credential information, first check the Docker config file:

There’s nothing really here. Depending on the usage of the setup you may see different output, but the point is that this Docker config file is updated automatically every time you run docker login against a new registry:

The file mydockerpass.txt holds the login password for the username JNPR-FieldUser213. Saving the password in a file and then piping it to the docker login command with the --password-stdin option has the advantage of not exposing the password in cleartext in the shell history.

Tip

If you want, you can type the password directly, and you will get a friendly warning that this is insecure.

Now the Docker credential information is generated in the updated config.json file:

The login process creates or updates a config.json file that holds the authorization token. Let’s create a secret from the .docker/config.json file:
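A sketch of such a command; the secret name regcred-file is an assumption:

$ kubectl create secret generic regcred-file \
    --from-file=.dockerconfigjson=$HOME/.docker/config.json \
    --type=kubernetes.io/dockerconfigjson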

YAML File

You can also create a secret directly from a YAML file the same way you create other objects like service or ingress.

To manually encode the content of the .docker/config.json file:

Then put the base64-encoded value of the .docker/config.json file as data in the YAML file below:

Keep in mind that base64 is all about encoding instead of encryption – it is considered the same as plain text. So sharing this file compromises the secret.

Refer Secret in Pod

After a secret is created, it can be referred to by a pod, rc, or deployment in order to pull an image from the private registry. There are many ways to refer to secrets. This section examines using imagePullSecrets under the pod spec to refer to the secret.

An imagePullSecret is a way to pass a secret that contains a Docker (or other) image registry password to the kubelet so it can pull a private image on behalf of your pod.

Create a pod pulling the Juniper cSRX container from the private repository:
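A sketch of such a pod YAML; the exact image path and tag on the private registry, and the secret name regcred, are assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: csrx
spec:
  containers:
  - name: csrx
    image: hub.juniper.net/security/csrx:<tag>   # placeholder image path
  imagePullSecrets:
  - name: regcred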

Now, generate the pod:

The cSRX is up and running:

And behind the scenes, the pod authenticates itself towards the private registry, pulls the image, and launches the cSRX container:

As you saw from our test, the secret objects are created independently of the pods, and inspecting the object spec does not provide the sensitive information directly on the screen.

Secrets are not written to the disk, but are instead stored in a tmpfs file system, only on nodes that need them. Also, secrets are deleted when the pod that is dependent on them is deleted.

On most native Kubernetes distributions, communication between users and the API server is protected by SSL/TLS. Therefore, secrets transmitted over these channels are properly protected.

Any given pod does not have access to the secrets used by another pod, which facilitates encapsulation of sensitive data across different pods. Each container in a pod has to request a secret volume in its volumeMounts for it to be visible inside the container. This feature can be used to construct security partitions at the pod level.

Service

When a pod gets instantiated, terminated, or moved from one node to another, and in doing so changes its IP address, how do we keep track of it to get uninterrupted functionality from the pod? Even if the pod isn’t moving, how does traffic reach a group of pods via a single entity?

The answer to both questions is Kubernetes service.

Service is an abstraction that defines a logical set of pods and a policy by which you can access them. Think of a service as your waiter in a big restaurant – the waiter isn’t cooking; instead he’s an abstraction of everything happening in the kitchen, and you only have to deal with this single waiter.

A service is a Layer 4 load balancer and exposes pod functionality via a specific IP and port. The service and pods are linked via labels, just like with an rs. There are three different types of services:

  • ClusterIP

  • NodePort

  • LoadBalancer

ClusterIP Service

The clusterIP service is the simplest service, and the default mode if the ServiceType is not specified. Figure 4 illustrates how clusterIP service works.

Figure 4: ClusterIP Service

You can see that the ClusterIP service is exposed on a clusterIP and a service port. When client pods need to access the service, they send requests towards this clusterIP and service port. This model works great if all requests come from inside the same cluster. The nature of the clusterIP limits the scope of the service to within the cluster. Overall, by default, the clusterIP is not externally reachable.

Create ClusterIP Service

The YAML file looks pretty simple and self-explanatory. It defines a service, service-web-clusterip, with the service port 8888 mapping to a targetPort, which means container port 80 in some pod. The selector indicates that whichever pod has the label app: webserver will be a backend pod responding to service requests.
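A sketch of what this YAML might look like, reconstructed from the description above:

apiVersion: v1
kind: Service
metadata:
  name: service-web-clusterip
spec:
  ports:
  - port: 8888
    targetPort: 80
  selector:
    app: webserver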

Okay, now generate the service:

Use kubectl commands to quickly verify the service and backend pod objects:

The service is created successfully, but there are no pods for the service. This is because there is no pod with a label matching the selector in the service. So you just need to create a pod with the proper label.

Now, you can define a pod directly but given the benefits of rc and the deployment over pods, as discussed earlier, using rc or deployment is more practical (you’ll soon see why).

As an example, let’s define a Deployment object named webserver:
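A sketch of such a deployment; the nginx image is a placeholder (the book’s examples use a customized web server image):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      containers:
      - name: webserver
        image: nginx
        ports:
        - containerPort: 80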

The Deployment webserver has a label app: webserver, matching the selector defined in our service. The replicas: 1 instructs the controller to launch only one pod at the moment. Let’s see:

And immediately the pod is chosen to be the backend.

Other brief notes about the previous kubectl get svc command output:

  • The service got a clusterIP, or service IP, of 10.101.150.135 allocated from the service IP pool.

  • The service port is 8888, as defined in the YAML.

  • By default, the protocol type is TCP if not declared in the YAML file. You can use protocol: UDP to declare a UDP service.

  • The backend pod can be located with the label selector.

Tip

The example shown here uses an equality-based selector (-l) to locate the backend pod, but you can also use set-based syntax to achieve the same effect. For example: kubectl get pod -o wide -l 'app in (webserver)'.

Verify ClusterIP Service

To verify that the service actually works, let’s start another pod as a client to initiate an HTTP request toward the service. For this test, we’ll launch and log in to a client pod and use the curl command to send an HTTP request toward the service. You’ll see the same pod being used as a client to send requests throughout this book:

Create the client pod:

Tip

The client pod is just another spawned pod based on the exact same image as the one the webserver Deployment and its pods use. This is the same as with physical servers and VMs: nothing stops a server from doing a client’s job:

The HTTP request toward the service reaches a backend pod running the web server application, which responds with an HTML page.

To better demonstrate which pod is providing the service, let’s set up a customized pod image that runs a simple web server. The web server is configured in such a way that when receiving a request it will return a simple HTML page with local pod IP and hostname embedded. This way the curl returns something more meaningful in our test.

The returned HTML is relatively readable as is, but there is a way to make it even easier to see:

The w3m tool is a lightweight console-based web browser installed on the host. With w3m you can render an HTML webpage into text, which is more readable than the raw HTML.

Now the service is verified: requests to the service have been redirected to the correct backend pod, with a pod IP of 10.47.255.238 and a pod name of webserver-7c7c458cc5-vl6zs.

Specify a ClusterIP

If you want a specific clusterIP, you can specify it in the spec. The IP address should be within the service IP pool.

Here’s some sample YAML with specific clusterIP:

NodePort Service

The second general type of service, NodePort, exposes a service on each node’s IP at a static port. It maps the static port on each node with a port of the application on the pod as shown in Figure 5.

Figure 5: NodePort Service
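A sketch of what this NodePort service YAML might look like; the service name and the service port 8888 are assumptions, while the nodePort 32001 matches the value used later in this section:

apiVersion: v1
kind: Service
metadata:
  name: service-web-nodeport
spec:
  type: NodePort
  ports:
  - port: 8888
    targetPort: 80
    nodePort: 32001
  selector:
    app: webserver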

Here are some highlights in this services YAML file:

  • selector: The label selector that determines which set of pods is targeted by this service; here, any pod with the label app: webserver will be selected by this service as the backend.

  • Port: This is the service port.

  • TargetPort: The actual port used by the application in the container. Here, it’s port 80, as we are planning to run a web server.

  • NodePort: The port on the host of each node in the cluster.

Let’s create the service:

  • Type: The default service type is ClusterIP. In this example, we set the type to NodePort.

  • NodePort: By default, Kubernetes allocates node ports in the 30000-32767 range if no value is mentioned in the spec. This range can be changed using the flag --service-node-port-range. The NodePort value can also be set explicitly, but make sure it’s in the configured range.

  • Endpoints: The podIP and the exposed container port. The request toward service IP and service port will be directed here, and 10.47.255.252:80 indicates that we have created a pod that has a matching label with the service, so its IP is selected as one of the backends.

Note

For this test, make sure there is at least one pod with the label app:webserver running. Pods in previous sections are all created with this label. Recreating the client pod suffices if you’ve removed them already.

Now we can test this by using the curl command to trigger an HTTP request toward any node IP address:
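A sketch of the test; the node IP is a placeholder and the nodePort 32001 follows the example:

$ curl http://<node-ip>:32001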

With the power of the NodePort service, you can access the web server running in the pod from any node via the nodePort 32001:

Load Balancer Service

The third service, the load balancer service, goes one step beyond the NodePort service by exposing the service externally using a cloud provider’s load balancer. The load balancer service by its nature automatically includes all the features and functions of NodePort and ClusterIP services.

Kubernetes clusters running on cloud providers support the automatic provisioning of a load balancer. The only difference between the three service types is the type value. To reuse the same NodePort service YAML file and create a load balancer service, just set the type to LoadBalancer:
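A sketch of the relevant part of the spec, with only the type changed:

spec:
  type: LoadBalancer
  ports:
  - port: 8888
    targetPort: 80
  selector:
    app: webserver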

The cloud will see this keyword and a load balancer will be created. Meanwhile, an external public loadbalancerIP is allocated to serve as the frontend virtual IP. Traffic coming to this loadbalancerIP will be redirected to the service backend pod. Keep in mind that this redirection is solely a transport layer operation: the loadbalancerIP and port are translated to the private backend clusterIP and its targetPort. It does not involve any application layer activity – there is no parsing of URLs, proxying of HTTP requests, and so on, as happens in an HTTP proxying process. Because the loadbalancerIP is publicly reachable, any Internet host that has access to it (and the service port) can access the service provided by the Kubernetes cluster.

From an Internet host’s perspective, when it requests the service, it refers to this public external loadbalancerIP plus the service port, and the request will reach the backend pod. The loadbalancerIP acts as a gateway between the service inside the cluster and the outside world.

Some cloud providers allow you to specify the loadBalancerIP. In those cases, the load balancer is created with the user-specified loadBalancerIP. If the loadBalancerIP field is not specified, the load balancer is set up with an ephemeral IP address. If you specify a loadBalancerIP but your cloud provider does not support the feature, the loadbalancerIP field that you set is ignored.

How a load balancer is implemented in the load balancer service is vendor-specific. A GCE load balancer may work in a totally different way with an AWS load balancer. There is a detailed demonstration of how the load balancer service works in a Contrail Kubernetes environment in Chapter 4.

External IPs

Exposing service outside of the cluster can also be achieved via the externalIPs option. Here’s an example:
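A sketch of a service with externalIPs; the service name and the external IP value are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: service-web-externalip
spec:
  ports:
  - port: 8888
    targetPort: 80
  selector:
    app: webserver
  externalIPs:
  - 10.85.188.10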

In the Service spec, externalIPs can be specified along with any of the service types. External IPs are not managed by Kubernetes and are the responsibility of the cluster administrator.

Note

External IPs are different from the loadbalancerIP: an external IP is assigned by the cluster administrator, while the loadbalancerIP comes with the load balancer created by a cluster that supports it.

Service Implementation: Kube-proxy

By default, Kubernetes uses the kube-proxy module for services, but CNI providers can have their own implementations for services.

Kube-proxy can be deployed in one of three modes:

  • user-space proxy-mode

  • iptables proxy-mode

  • ipvs proxy-mode

When traffic hits the node, it’s forwarded to one of the backend pods via a deployed kube-proxy forwarding plane. Detailed explanations and comparisons of these three modes will not be covered in this book, but you can check Kubernetes official website for more information. Chapter 4 illustrates how Juniper Contrail as a Container Network Interface (CNI) provider implements the service.

Endpoints

There is one object we haven’t explored so far: EP, the endpoint. We’ve learned that a particular pod, or a group of pods with matching labels, is chosen to be the backend through the label selector, so service request traffic is redirected to it. The IP and port information of the matching pods is maintained in the endpoint object. Pods may die and spawn at any time, and their mortal nature means new pods will most likely be respawned with new IP addresses. During this dynamic process the endpoints are always updated accordingly to reflect the current backend pod IPs, so service traffic redirection keeps working properly. (CNI providers that have their own service implementation update the backend of the service based on the endpoint objects.)

Here is an example to demonstrate some quick steps to verify the service, corresponding endpoint, and the pod, with matching labels.

To create a service:

To list the endpoint:

To locate the pod with the label that is used by the selector in service:

And finally, scale the backend pods:
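A minimal command sequence covering the steps above, reusing the service and deployment names from the earlier examples (the file name and replica count are arbitrary):

$ kubectl apply -f service-web-clusterip.yaml       # create the service
$ kubectl get endpoints service-web-clusterip       # list the endpoints
$ kubectl get pod -o wide -l app=webserver          # locate the matching pods
$ kubectl scale deployment webserver --replicas=3   # scale the backend pods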

Now check the endpoints again, and you will see that they are updated accordingly:

Service Without Selector

In the preceding example, the endpoints object is automatically generated by the Kubernetes system whenever a service is created and at least one pod with a matching label exists. But there is another endpoint use case: a service with no label selector defined, where you manually map the service to a network address and port by adding an endpoint object yourself, and then connect the endpoint with the service. This can be very useful in some scenarios – for example, a setup where you have a backend web server running on a physical server and you still want to integrate it into a Kubernetes service. In that case, you just create the service as usual and then create an endpoint with an address and port pointing to the web server. That’s it! The service does not care about the backend type; it redirects the service request traffic exactly the same way as if the backend were a pod.

Ingress

You’ve now seen ways of exposing a service to clients outside the cluster; another method is Ingress. As discussed in the service section, a service works at the transport layer, but in reality you access services via URLs.

Ingress, or ing for short, is another core concept of Kubernetes; it provides the HTTP/HTTPS routing that does not exist in a service. Ingress is built on top of services. With ingress, you can define URL-based rules to distribute HTTP/HTTPS routes to multiple different backend services; therefore, ingress exposes services via HTTP/HTTPS routes. The requests are then forwarded to each service’s corresponding backend pods.

Ingress Versus Service

There are similarities between load balancer service and ingress. Both can expose service outside of the cluster, but there are some significant differences.

Operation Layer

Ingress operates at the application layer of the OSI network model, while a service only operates at the transport layer. Ingress understands the HTTP/HTTPS protocol; a service only forwards traffic based on the IP and port, which means it does not care about application layer protocol (HTTP/HTTPS) details. Ingress can also operate at the transport layer, but a service does the same thing, so it doesn’t make sense for ingress to do it as well unless there is a special reason to do so.

Forwarding Mode

Ingress does application layer proxying pretty much the same way a traditional web load balancer does. A typical web load balancer proxy sits between machine A (client) and machine B (server) and works at the application layer. It is aware of the application layer protocols (HTTP/HTTPS), so the client-server interaction is not transparent to the load balancer. Basically it creates two connections, one each with the source (A) and the destination (B) machine. Machine A does not even know about the existence of machine B. For machine A, the proxy is the only thing it talks to, and it does not care how and where the proxy gets its data.

Number of Public IPs

Each service needs its own public IP if it is exposed directly to the outside of the cluster. When ingress acts as a frontend to all of these services, one public IP is sufficient, which makes life easier for cloud administrators.

Ingress Object

Before going into detail about the ingress object, the best way to get a feel for it is to look at the YAML definition:
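A sketch of such an ingress, reconstructed from the description that follows; the service names, ports, and apiVersion are assumptions (the apiVersion in particular depends on your Kubernetes release):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-fanout
spec:
  rules:
  - host: www.juniper.net
    http:
      paths:
      - path: /dev
        backend:
          serviceName: webservice-dev
          servicePort: 8888
      - path: /qa
        backend:
          serviceName: webservice-qa
          servicePort: 8888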

You can see it looks pretty simple. The spec defines only one item – the rules. The rules say a host, which is the Juniper URL here, may have two possible paths in the URL string. The path is whatever follows the host in the URL; in this case the paths are /dev and /qa. Each path is then associated with a different service. When ingress sees HTTP requests arrive, it proxies the traffic to the backend service associated with each URL path. Each service, as we learned in the service section, delivers the request to its corresponding backend pods. That’s it. This is actually one of the three types of ingress that Kubernetes supports today – the simple fanout ingress. The other two types of ingress are discussed later in this chapter.

About URL, Host, and Path

The terms host and path are used frequently in the Kubernetes Ingress documentation. The host is the fully qualified domain name of the server. The path, or url-path, is the rest of the string after the host in a URL. If the URL contains a port, then the path is the string after the port.

Take a look at the following URL:

The host is www.juniper.net, and whatever follows port 1234 is called the path – my/resource in this example. If a URL has no port, then the string following the host is the path. For more details you can read RFC 1738, but for the purpose of this book, understanding what is introduced here will suffice.

If you now think Kubernetes Ingress just defines some rules and the rules are just to instruct the system to direct incoming request to different services, based on the URLs, you are basically right at a high level. Figure 6 illustrates the interdependency between the three Kubernetes objects: ingress, service, and pod.

Figure 6: Ingress, Service, and Pod

In practice there are other things you need to understand: to handle the ingress rules, you need at least one more component, called the ingress controller.

Ingress Controller

An ingress controller is responsible for reading the ingress rules and then programming them into the proxy, which does the real work – dispatching traffic based on the host/URL.

Ingress controllers are typically implemented by third-party vendors. Different Kubernetes environments have different ingress controllers based on the need of the cluster. Each ingress controller has its own implementations to program the ingress rules. The bottom line is, there has to be an ingress controller running in the cluster.

Some ingress controller providers are:

  • nginx

  • gce

  • haproxy

  • avi

  • f5

  • istio

  • contour

You may deploy any number of ingress controllers within a cluster. When you create an ingress, you should annotate each ingress with the appropriate ingress.class to indicate which ingress controller should be used (if more than one exists within your cluster).

The annotation used in ingress objects will be explained in the annotation section.

Ingress Examples

There are three types of ingresses:

  • Single Service ingress

  • Simple Fanout ingress

  • Name-based Virtual Hosting ingress

We’ve looked at the simple fanout ingress, so now let’s look at a YAML file example for the other two types of ingress.

Single Service Ingress

This is the simplest form of ingress. The ingress will get an external IP so the service can be exposed to the public, however, it has no rules defined, so it does not parse host or path in the URLs. All requests go to the same service.

Simple Fanout Ingress

We checked this out at the beginning of this section. Compared to the single service ingress, the simple fanout ingress is more practical. It’s not only able to expose a service via a public IP, but it can also do URL routing, or fan-out, based on the path. This is a very common usage scenario when a company wants to direct traffic to each department’s dedicated servers based on the URL suffix after the domain name.

Virtual Host Ingress

The name-based virtual host ingress is similar to the simple fanout ingress in that it can do rule-based URL routing. The unique power of this type of ingress is that it supports routing HTTP traffic to multiple host names at the same IP address. The example here may not be practical (unless one day the two domains merge!) but it is good enough to showcase the idea. In the YAML file two hosts are defined, the “juniperhr” and “junipersales” URLs respectively. Even though the ingress is allocated only one public IP, requests toward that same public IP will still be routed to different backend services based on the host in the URL. That’s why it is called a virtual hosting ingress, and there’s a very detailed case study in Chapter 4 for you to explore.

Note

It is also possible to merge a simple fanout ingress and a virtual host ingress into one, but the details are not covered here.

Multiple Ingress Controller

You can have multiple ingress controllers in one cluster, but the cluster needs to know which one to use. For example, in Chapter 4 we’ll talk about Contrail’s built-in ingress controller, which does not stop us from installing another third-party ingress controller such as the nginx ingress controller. Instead, you end up having two ingress controllers in the same cluster with the names:

  • opencontrail (default)

  • nginx

Contrail’s implementation is the default one, so you don’t have to specifically select it. To select nginx as the ingress controller, use the annotation kubernetes.io/ingress.class:
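A sketch of the annotation in an ingress object’s metadata:

metadata:
  name: ingress-example
  annotations:
    kubernetes.io/ingress.class: "nginx"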

This will tell Contrail’s ingress controller opencontrail to ignore the ingress configuration.

Kubernetes Network Policy

The Kubernetes networking model requires all pods to be able to access all other pods by default. This is called a flat network because it follows an allow-any-any model. It significantly simplifies the design and implementation of Kubernetes networking and makes it much more scalable.

Note

Chapter 4 details the requirements that Kubernetes enforces on network implementations.

Security is an important concern. In reality, in many cases a certain level of network segmentation is required to ensure that only certain pods can talk to each other, and that is where Kubernetes network policy comes into the picture. A Kubernetes network policy defines the access permissions for groups of pods the same way a security group in the cloud is used to control access to VM instances.

Kubernetes supports network policy via the NetworkPolicy object, which is a Kubernetes resource just like pod, service, ingress, and many others you’ve learned about earlier in this chapter. The role of the Network Policy object is to define how groups of pods are allowed to communicate with each other.

Let’s examine how Kubernetes network policy works:

  1. Initially, in a Kubernetes cluster, all pods are non-isolated by default and work in an allow-any-any model, so any pod can talk to any other pod.

  2. Now apply a network policy named policy1 to pod A. In policy1 you define a rule to explicitly allow pod A to talk to pod B. In this case let’s call pod A a target pod because it is the pod that the network policy will act on.

  3. From this moment on, a few things happen:

    • Target pod A can talk to pod B, and can talk to pod B only, because B is the only pod you allowed in the policy. Due to the nature of the policy rules, you can call the rule a whitelist.

    • For target pod A only, any connections that are not explicitly allowed by the whitelist of this network policy policy1 will be rejected. You don’t need to explicitly define this in policy1, because it will be enforced by the nature of Kubernetes network policy. Let’s call this implicit policy the deny all policy.

    • As for other non-targeted pods – for example, pod B or pod C, to which neither policy1 nor any other network policy is applied – they continue to follow the allow-any-any model. Therefore they are not affected and can continue to communicate with all other pods in the cluster. This is another implicit policy, an allow all policy.

  4. Assuming you also want pod A to be able to communicate with pod C, you need to update the network policy policy1 and its rules to explicitly allow it. In other words, you need to keep updating the whitelist to allow more traffic types.

As you can see, when you define a policy, at least three policies will be applied in the cluster:

  • Explicit policy1: This is the network policy you defined, with the whitelist rules allowing certain types of traffic for the selected (target) pod.

  • An implicit deny all network policy: This denies all other traffic that is not in the whitelist of the target pod.

  • An implicit allow all network policy: This allows all other traffic for the other non-targeted pods that are not selected by policy1. We’ll see deny all and allow all policies again in Chapter 8.

Here are some highlights of the Kubernetes network policy.

  • Pod specific: A network policy specification applies to one pod or a group of pods selected based on labels, the same way an rc or Deploy does.

  • Whitelist-based rules: Explicit rules compose a whitelist, and each rule describes a certain type of traffic to be allowed. All other traffic not described by any rule in the whitelist is dropped for the target pod.

  • Implicit allow all: A pod is affected only if it is selected as the target by a network policy, and it is affected only by the selecting network policy. The absence of a network policy applied to a pod means an implicit allow all policy applies to it. In other words, a non-targeted pod continues its allow-any-any networking model.

  • Separation of ingress and egress: Policy rules need to be defined for a specific direction. The direction can be Ingress, Egress, none, or both.

  • Flow-based (vs. packet-based): Once the initiating packet is allowed, the return packet in the same flow will also be allowed. For example, suppose an ingress policy applied on pod A allows an ingress HTTP request, then the whole HTTP interaction will be allowed for pod A. This includes the three-way TCP connection establishment and all data and acknowledgments in both directions.

Note

Network policies are implemented by the network component, so you must be using a network solution that supports network policy. Simply creating the NetworkPolicy resource without a controller to implement it will have no effect. In this book Contrail is such a network component with network policy implemented. In Chapter 8, you’ll see how these network policies work in Contrail.

Network Policy Definition

Like all other objects in Kubernetes, network policy can be defined in a YAML file. Let’s look at an example (the same example will be used in Chapter 8):

Let’s look at the spec part of this YAML file since the other sections are somewhat self-explanatory. The spec has the following structure:

Here you can see that a network policy definition YAML file can logically be divided into four sections:

  • podSelector: This defines the pods selection. It identifies the pods to which the current network policy will be applied.

  • policyTypes: Specifies the type of policy rules: Ingress, Egress or both.

  • ingress: Defines the ingress policy rules for the target pods.

  • egress: Defines the egress policy rules for the target pods.

Next we’ll look at each section in more detail.

Selecting Target Pods

When you define a network policy, Kubernetes needs to know which pods you want this policy to act on. Similar to how service selects its backend pods, the network policy selects pods to which it will be applied based on labels:

Here, all pods that have the label app: webserver-dev are selected to be the target pods by the network policy. All of the following content in spec will apply to only the target pods.

Policy Types

The second section defines the policyTypes for the target pods:

PolicyTypes can either be ingress, egress, or both. And both types define specific traffic types in the form of one or more rules, as discussed next.

Policy Rules

The ingress and egress sections define the direction of traffic, from the selected target pods’ perspective. Consider the following simplified example:
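A sketch of such a simplified spec; the label values follow the pod names used below:

spec:
  podSelector:
    matchLabels:
      app: webserver-dev
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: client1-dev
    ports:
    - protocol: TCP
      port: 80
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: client1-dev
    ports:
    - protocol: TCP
      port: 8080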

Assuming the target pod is the webserver-dev pod, and there is only one pod, client1-dev, in the cluster with a matching label, two things will happen:

  1. The ingress direction: the pod webserver-dev can accept a TCP session with a destination port of 80, initiated from pod client1-dev. This explains why we said Kubernetes network policy is flow-based instead of packet-based: if the policy were packet-based, the TCP connection could not be established, because on receiving the incoming TCP SYN, the returning outgoing TCP SYN-ACK would be rejected without a matching egress policy.

  2. The egress direction: pod webserver-dev can initiate a TCP session with a destination port 8080, towards pod client1-dev.

Tip

For the egress connection to go through, the other end needs to define an ingress policy to allow the incoming connection.

Network Policy Rules

Each from or to statement defines a rule in the network policy:

  • A from statement defines an ingress policy rule.

  • A to statement defines an egress policy rule.

  • Both rules can optionally have ports statements, which will be discussed later.

So you can define multiple rules to allow complex traffic modes for each direction:

Each rule identifies the network endpoints where the target pods can communicate. Network endpoints can be identified by different methods:

  • ipBlock: Selects pods based on an IP address block.

  • namespaceSelector: Selects pods based on the label of the namespace.

  • podSelector: Selects pods based on label of the pod.

Note

The podSelector selects different things when it is used in different places of a YAML file. Previously (under spec) it selected pods that the network policy applies to, which we’ve called target pods. Here, in a rule (under from or to), it selects which pods the target pod is communicating with. Sometimes we call these pods peering pods, or endpoints.

So the YAML structure for a rule can look like this:

For example:

Here, the ingress network endpoints are the subnet 10.169.25.20/32; or all pods in namespaces that have the label project: jtac; or pods that have the label app: client1-dev in the current namespace (the namespace of the target pod). The egress network endpoint is pod dbserver-dev. We’ll come to the ports part soon.

AND versus OR

It’s also possible to allow only a few pods from other namespaces, instead of communicating with all of their pods. In our example, podSelector is used all along, which assumes the same namespace as the target pod. Another method is to use podSelector along with a namespaceSelector. In that case, the namespaces that the pods belong to are those with labels matching the namespaceSelector, instead of the target pod’s namespace.

For example, assume that the target pod is webserver-dev, its namespace is dev, and only the namespace qa has a label project=qa matching the namespaceSelector:
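A sketch of the AND form, where namespaceSelector and podSelector appear in the same from element:

  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          project: qa
      podSelector:
        matchLabels:
          app: client1-qa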

Here, the target pod can only communicate with those pods that are in namespace qa, AND (not OR) with the label app: client1-qa.

Be careful here, because it is totally different from the definition below, which allows the target pod to talk to those pods that are in namespace qa, OR (not AND) have the label app: client1-qa in the target pod’s namespace dev:
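A sketch of the OR form, where namespaceSelector and podSelector are separate entries in the from list:

  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          project: qa
    - podSelector:
        matchLabels:
          app: client1-qa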

Protocol and Ports

It is also possible to specify ports for an ingress and egress rule. The protocol type can also be specified along with a protocol port. For example:

The ports in ingress say that the target pods can accept incoming traffic on the specified ports and protocol. Ports in egress say that the target pods can initiate traffic to the specified ports and protocol. If ports are not mentioned, all ports and protocols are allowed.

Line-By-Line Explanation

Let’s look at our example again in detail:

You should now know exactly what the network policy is trying to enforce.

Lines 1-3: pod webserver-dev is selected by the policy, so it is the target pod; all following policy rules will apply on it, and on it alone.

Lines 4-6: the policy will define rules for both Ingress and Egress traffic.

Lines 7-19: ingress: section defines the ingress policy.

Line 8: from: and line 17: ports, these two sections define one policy rule on ingress policy.

Lines 9-16: these eight lines under the from: section compose an ingress whitelist:

  • Lines 9-10: any incoming data with source IP being 10.169.25.20/32 can access the target pod webserver-dev.

  • Lines 11-13: any pods under namespace jtac can access target pod webserver-dev.

  • Lines 14-16: any pods with label client1-dev can access target pod webserver-dev.

Lines 17-19: ports section is second (and optional) part of the same policy rule. Only TCP port 80 (web service) on target pod webserver-dev is exposed and accessible. Access to all other ports will be denied.

Lines 20-26: egress: section defines the egress policy.

Lines 21: to: and line 24: ports, these two sections define one policy rule in egress policy.

  • Lines 21-24: these four lines under to: section compose an egress whitelist, here the target pod can send egress traffic to pod dbserver-dev.

Line 25: the ports section is the second part of the same policy rule. The target pod webserver-dev can only start TCP sessions with a destination port of 80 toward other pods.

And that’s not all. If you remember, earlier in this chapter we talked about the Kubernetes default allow-any-any network model and the implicit deny-all and allow-all policies; so far we have only explained the explicit part of it (policy1 in our network policy introduction). Beyond that, there are two more implicit policies:

The deny all network policy: for the target pod webserver-dev, deny all traffic other than what is explicitly allowed in the above whitelists. This implies at least two rules:

  • ingress: deny all incoming traffic destined to the target pod webserver-dev, other than what is defined in the ingress whitelist.

  • egress: deny all outgoing traffic sourcing from the target pod webserver-dev, other than what is defined in the egress whitelist.

An allow all network policy allows all traffic for the other pods that are not the target of this network policy, in both the ingress and egress directions.

Note

In Chapter 8 we’ll take a more in depth look at these implicit network policies and their rules in Contrail implementation.

Create Network Policy

You can create and verify the network policy the same way that you create other Kubernetes objects:
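A sketch of the commands, assuming the policy YAML is saved in a file named policy1.yaml:

$ kubectl apply -f policy1.yaml
$ kubectl get networkpolicy
$ kubectl describe networkpolicy policy1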

In Chapter 8 we’ll set up a test environment to verify the effect of this network policy in more detail.

Liveness Probe

What happens if the application in the pod is running but it can’t serve its main purpose, for whatever reason? Applications that run for a long time might also transition to broken states, and in that case the last thing you want is a call reporting a problem in an application that could easily be fixed by restarting the pod. Liveness probes are a Kubernetes feature made specifically for this kind of situation. A liveness probe sends a pre-defined request to the pod on a regular basis and restarts the pod if the request fails. The most commonly used liveness probe is the HTTP GET request, but a probe can also open a TCP socket or even issue a command.

Next is an HTTP GET request probe example, where initialDelaySeconds is the waiting time before the first HTTP GET request to port 80; after that the probe runs every 20 seconds, as specified in periodSeconds. If the probe fails, the pod restarts automatically. You have the option to specify the path, which here is just the main website. You can also send the probe with a customized header. Take a quick look:
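A sketch of such a pod; the pod name, image, header, and the initialDelaySeconds value are assumptions, while periodSeconds and the port follow the description above:

apiVersion: v1
kind: Pod
metadata:
  name: webserver-liveness
spec:
  containers:
  - name: webserver
    image: nginx
    livenessProbe:
      httpGet:
        path: /
        port: 80
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 15
      periodSeconds: 20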

Now let’s launch this pod then log in to it to terminate the process that handles the HTTP GET request:

You can see that the pod was automatically restarted, and you can also see the reason for that restart in the event:

This is a TCP socket probe example. A TCP socket probe is similar to the HTTP GET request probe, but it opens a TCP socket:

The command probe is similar to the HTTP GET and TCP socket probes, but it executes a command in the container:

Readiness Probe

A liveness probe makes sure that your pod is in good health, but for some applications that isn’t enough. Some applications need to load large files before starting. You might think that setting a higher initialDelaySeconds value solves the problem, but that is not an efficient solution. The readiness probe is a solution designed especially for Kubernetes services: the pod will not receive traffic until it is ready. Whenever the readiness probe fails, the endpoint for the pod is removed from the service, and it is added back when the readiness probe succeeds. The readiness probe is configured in the same way as the liveness probe:
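A sketch of a readinessProbe stanza under the container spec, mirroring the liveness example above (the values are assumptions):

    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 15
      periodSeconds: 20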

Note

It’s recommended to use both the readiness probe and the liveness probe: the liveness probe restarts the pod if it fails, and the readiness probe makes sure the pod is ready before it gets traffic.

Probe Parameters

Probes have a number of parameters that you can use to more precisely control the behavior of liveness and readiness checks.

  • initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.

  • periodSeconds: How often (in seconds) to perform the probe. Default is 10 seconds. Minimum value is 1.

  • timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.

  • successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.

  • failureThreshold: When a pod starts and the probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of a liveness probe means restarting the pod. In case of a readiness probe the pod will be marked Unready. Defaults to 3. Minimum value is 1.

And HTTP probes have additional parameters that can be set on httpGet:

  • host: The host name to connect to, which defaults to the pod IP. You probably want to set “Host” in httpHeaders instead.

  • scheme: The scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.

  • path: Path to access on the HTTP server.

  • httpHeaders: Custom headers to set in the request. HTTP allows repeated headers.

  • port: Name or number of the port to access on the container. Number must be in the range 1 to 65535.

Annotation

You have already seen how labels in Kubernetes are used for identifying, selecting, and organizing objects. But labels are just one way to attach metadata to Kubernetes objects.

Another way is annotations, a key/value map used to attach non-identifying metadata to objects. Annotations have many use cases, such as attaching:

  • pointers for logging and analytics

  • phone numbers, directory entries, and web sites

  • timestamps, image hashes, and registry addresses

  • network, namespaces

  • and types of ingress controller

Here’s an example of annotations:
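A sketch of annotations in an object’s metadata; the keys and values are purely illustrative:

metadata:
  name: webserver
  annotations:
    owner: "neteng@example.com"
    imagehash: "sha256:0d1a"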

Annotations can be used to assign network information to pods, and in Chapter 9, you’ll see how a Kubernetes annotation can instruct Juniper Contrail to attach an interface to a certain network. Cool.

Before seeing annotations in action, let’s first create a network with a minimum configuration based on the de facto Kubernetes network custom resource definition. A NetworkAttachmentDefinition is used here to indicate the CNI plugin as well as the parameters of the network to which we will attach the pod interface:
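A sketch of a minimal NetworkAttachmentDefinition; the network name net-a matches the pod example below, and the apiVersion follows the de facto standard (verify it against your CNI plugin’s documentation):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: net-a
spec:
  config: '{
    "cniVersion": "0.3.0",
    "type": "awesome-plugin"
  }'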

The type, awesome-plugin, is the name of the CNI plugin, which could be Flannel, Calico, Contrail-K8s-cni, etc.

Create a pod and use annotations to attach its interface to a network called net-a:
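A sketch of such a pod; the pod name and image are placeholders, and the annotation key follows the de facto standard described in the note below:

apiVersion: v1
kind: Pod
metadata:
  name: multinet-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: net-a
spec:
  containers:
  - name: webserver
    image: nginx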

Note

According to the official Kubernetes network custom resource definition, the annotation k8s.v1.cni.cncf.io/networks is used to represent NetworkAttachmentDefinition and has two formats:

Note

To maintain compatibility with existing Kubernetes deployments, all pods must be attached to the cluster-wide default network. This means that even if you have attached a pod interface to a specific network, the pod will have two interfaces: one attached to the cluster-wide default network, and the other attached to the network specified in the annotation argument (net-a in this case).