Many Kubernetes applications are now installed and managed using Operators. If you have touched Kubernetes, you must have heard about them. Maybe you have worked with them following different installation guides. But can you confidently explain what Operators are? Why do we need them? What are Custom Resource (CR), CustomResourceDefinition (CRD), Operator Lifecycle Manager (OLM), ClusterServiceVersion (CSV), CatalogSource, Subscription, InstallPlan, OperatorGroup and the other concepts that come with the Operators topic?

The goal of this article is to explain Operators and make you feel more confident and efficient when working with them.

I will start from the beginning by explaining the core concepts you have to understand before moving on to Operators.

The article includes the following topics:
1. Kubernetes API, Resources and Controllers
2. CustomResourceDefinition (CRD), Custom Resource (CR), Operators
3. Operator Lifecycle Manager (OLM)

1. Kubernetes API, Resources and Controllers

One of the core components of Kubernetes is the API Server. Kubernetes components such as the Scheduler, the Controller Manager, the Kubelet and kube-proxy use the API Server to communicate in the cluster. The API Server is also used by end users: for example, every command run with kubectl goes to the API Server.

When we call the Kubernetes API we are working with Kubernetes Resources. A Resource in Kubernetes is a collection of API objects of a certain kind. For example, "pods" is a resource which collects all Pods deployed in the cluster. Other examples of resources are Deployments, ReplicaSets, Jobs, etc.

You can check available resources in the cluster with:
kubectl api-resources

Controllers are control loops that watch the state of your cluster, then make or request changes where needed. They work in a non-terminating loop, comparing the controlled component's current state with the desired state and adjusting it if required. For example, the ReplicaSet controller creates Pod replicas and controls their number.

Kubernetes understands Pods and it has the ReplicaSet controller; therefore, it can create a Pod and it can control the number of replicas deployed. But Pods are application agnostic: they have no idea which applications run inside them or how to manage (administer) those applications.
For example, let's say I have a database Pod running. If I want a multi-pod database cluster instead of the single Pod, I cannot just scale it up. I have to understand the underlying application and configure it correctly to be able to run it in a cluster configuration.
What if I decide to upgrade it? Again, updating the image version may not be enough for some applications. Other tasks include backup, restore, logging and metrics exposure, auto-scaling, auto-tuning, anomaly detection, etc.

Here we come to a question: wouldn't it be good to have Kubernetes do all those tasks for us?
What if we had a way to "explain" our application architecture to Kubernetes so it would know how to run it, scale it, etc.? What if we could create a Custom Resource (just like Pods) with its own Controller which would perform all the administrative activities for our application?
Good news for us: we can do it in Kubernetes!

2. CustomResourceDefinition (CRD), Custom Resource (CR), Operators

In order to register a new custom resource in Kubernetes we create a CustomResourceDefinition (CRD). A CRD can be considered a schema: it defines what can be configured for our custom resource. In other words, it defines the "spec:" section of our custom resource.

If Kubernetes didn't know about Pods and we wanted to create a Pod as a custom resource, in its CRD we would define such properties as affinity, containers, imagePullSecrets, serviceAccountName, tolerations, etc.
(You can run kubectl explain po.spec to get the list of properties available in the Pod specification.)
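For illustration, a minimal CRD for a hypothetical CustomPod resource might look like the following sketch (the group example.com and the spec fields image and replicas are invented for this example):

```yaml
# Illustrative only: a CRD registering a hypothetical "CustomPod" resource.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # the name must match <plural>.<group>
  name: custompods.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: custompods
    singular: custompod
    kind: CustomPod
    shortNames:
      - cpod
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:              # this is the "spec:" section the CRD defines
              type: object
              properties:
                image:
                  type: string
                replicas:
                  type: integer
```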

After we have created a CRD, we can naturally create the Custom Resource (CR) itself, following the spec defined in the CRD.

CustomResourceDefinition (CRD) with a Custom Resource (CR)

When we apply the CRD defined above, a new resource called CustomPod will be registered in the Kubernetes API Server.
The list of CRDs can be retrieved with kubectl get crd

Applying a CRD in k8s and getting a list of CRDs

After that we can create a custom resourse (CustomPod in our case):

Applying Custom Resource definition and getting it using kubectl
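Assuming a CRD that registers a CustomPod kind under a hypothetical example.com group with image and replicas fields, the custom resource itself could be as simple as:

```yaml
# Illustrative CustomPod instance; the group, kind and fields
# are assumptions for this sketch.
apiVersion: example.com/v1
kind: CustomPod
metadata:
  name: my-custom-pod
spec:
  image: nginx:1.25
  replicas: 3
```

Applying this manifest only stores the object via the API Server; nothing runs until a controller acts on it.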

OK, looks good, but what happened in the system when I created the CustomPod?

When I create a Kubernetes Pod and set an image for its containers, Kubernetes downloads the image and runs the Pod. But it looks like nothing happened when I created my CustomPod.

That is true, because we don't have a controller which would implement the specification we provided for the CustomPod. And that controller is called an Operator in Kubernetes terms.

An Operator is a custom controller running in a pod which watches for the CRs you define. In other words, an Operator is an application that runs as a Deployment in Kubernetes; it reads the definition of a Custom Resource and applies the specification from that custom resource.

The Operator continues to monitor the state of the CR. For example, if you delete a pod, the Operator will recreate it to comply with the CR manifest. Just as Kubernetes controllers do for stateless applications, the Operator does the controller work for a stateful application. It can also perform more complex tasks like automatic upgrades, backups, etc.

But how do we create an Operator for our CustomPod?
An Operator can be developed using Helm, Ansible or Go. I am not going to provide an example implementation in this article; it could be a topic for a future post. The Operator Framework (an open source toolkit for managing Operators) provides the Operator SDK which can be used to build an Operator (https://sdk.operatorframework.io/).
In terms of Operator capabilities there are five levels of maturity, starting from Basic Install, through Seamless Upgrades, and finally up to the so-called "Auto Pilot". These levels are described at: https://sdk.operatorframework.io/docs/advanced-topics/operator-capabilities/operator-capabilities/

With Helm you can achieve up to the second level (inclusive), while Ansible and Go allow you to develop fully capable Operators.

To put it all together:
  • A CRD (CustomResourceDefinition) defines a schema for a Kubernetes Custom Resource and registers it on the API Server
  • A CR (Custom Resource) is an instance of a CRD
  • An Operator is a controller (a special application) which watches for the defined CRs and deploys and manages the Custom Resource

CRD, Custom Resource and Operator

3. Operator Lifecycle Manager (OLM)

Operators can be installed in a cluster as Deployments (kubectl apply -f an-operator.yaml). But in order to simplify the installation and management of Operators there is a special toolkit called Operator Lifecycle Manager (OLM).

OLM extends Kubernetes to provide a declarative way to install, manage, and upgrade Operators on a cluster.
For example, OLM is installed by default in RH OpenShift and aids cluster administrators in installing, upgrading, and granting access to Operators running on their cluster.

OLM can be considered a package manager (like YUM), but for Operators.
OLM registers the following CustomResourceDefinitions (CRDs) which are required for it to work:

  • ClusterServiceVersion (CSV)
  • CatalogSource
  • Subscription
  • InstallPlan
  • OperatorGroup
  1. ClusterServiceVersion (CSV): the primary metadata resource that describes the Operator. Think of it as a Linux package (like an RPM).
    A CSV is a YAML manifest that tells OLM everything it needs to know to run the Operator, including its Deployment, image, RBAC rules, documentation, etc.
    If the Operator deployment is deleted, OLM will recreate it to bring the cluster back to the desired state based on the CSV.

Each CSV contains:

  • General metadata about the Operator: name, version, description, icon etc.
  • Operator installation info: description of created deployments and required permissions
  • CRDs: CRDs owned by the Operator and CRDs it depends on
  • Annotations on the CRD fields: provides hints to users on how to properly specify values for the fields
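A heavily trimmed CSV sketch showing where those pieces live (the names, version and replaces values are illustrative, and the deployment spec is elided):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: custompod-operator.v0.1.0      # illustrative name
spec:
  displayName: CustomPod Operator
  version: 0.1.0
  replaces: custompod-operator.v0.0.9  # previous CSV in the upgrade chain
  customresourcedefinitions:
    owned:                             # CRDs this Operator manages
      - name: custompods.example.com
        kind: CustomPod
        version: v1
  install:
    strategy: deployment
    spec:
      permissions: []                  # namespaced RBAC rules
      clusterPermissions: []           # cluster-wide RBAC rules
      deployments:
        - name: custompod-operator
          spec: {}                     # a standard Deployment spec goes here
```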

You can get a list of CSVs using oc get csv. CSV is a namespaced resource; therefore, if you want to see all available CSVs in the cluster, run oc get csv -A. For instance, here is an output of the CSVs installed by IBM Cloud Pak for Integration:

IBM Cloud Pak for Integration (CP4I) CSVs

Here you can see the "Version" and "Replaces" fields. The CSV version is the same as the Operator's, and a new CSV is generated when the Operator version is upgraded. OLM can upgrade CSVs automatically, and "Replaces" shows the name of the CSV being replaced by this one.

  2. CatalogSource: a repository of CSVs, CRDs, and packages that define an application.
    It contains the information needed to access a repository of Operators.
    To make an Operator available in your cluster, you must add it to a catalog by deploying a CatalogSource resource. The catalog can then be queried to locate Operators for installation via OLM.

For example, when installing IBM Cloud Pak for Integration the first step is to add a Catalog Source to the OpenShift cluster. It can be added in the Developer perspective of the OpenShift web console. Once installed, the IBM Catalog Source makes IBM's Operators (CSVs) available in OpenShift's OperatorHub.

IBM's catalog source installation in OpenShift

To list available CatalogSources run oc get CatalogSources -A:

get Catalog Sources available in the cluster

If we describe, for example, the "ibm-operator-catalog" CatalogSource, we will see that its specification defines an image, a sourceType (grpc) and an updateStrategy:

a specification of a CatalogSource

A grpc sourceType with an image means that OLM will pull the specified image and run a pod with an API endpoint that can be queried for the metadata in the store.

Catalog sources can automatically check for new versions to keep up to date. The updateStrategy defines how updated catalog source images are discovered. It consists of an interval that defines the polling duration and an embedded strategy type.
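Put together, a grpc CatalogSource with a polling update strategy might look like this (the name and image reference are illustrative, not the actual IBM catalog definition):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-operator-catalog            # illustrative name
  namespace: openshift-marketplace
spec:
  displayName: My Operator Catalog
  sourceType: grpc                     # OLM pulls the image and runs a pod serving the catalog API
  image: icr.io/example/catalog:latest # illustrative image reference
  updateStrategy:
    registryPoll:
      interval: 45m                    # how often to check for an updated catalog image
```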

Within a CatalogSource, Operators are organized into packages and streams of updates called channels. For example, for the etcd Operator the package name is etcd and there are a few channels available:

channels available for ETCD
  3. Subscription: tells OLM which CatalogSource and which of its channels to use and whether to perform updates automatically or manually. End users create a Subscription to install, and subsequently update, the Operators that OLM provides. A subscription is made to a channel, which is a stream of Operator versions, such as "stable" or "beta".

For example, the subscriptions installed with IBM Cloud Pak for Integration are shown on the screenshot below:

IBM CP4I subscriptions

Let's describe the IBM MQ subscription:
oc get subs ibm-mq-v1.5-ibm-operator-catalog-openshift-marketplace -n openshift-operators -o yaml
Its specification defines a subscription to the CatalogSource "ibm-operator-catalog" and the channel "v1.5". It also tells OLM to perform automatic updates of the IBM MQ Operator.
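In manifest form, such a subscription looks roughly like this (a sketch along those lines, not a copy of the actual CP4I resource):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ibm-mq
  namespace: openshift-operators
spec:
  name: ibm-mq                   # package name in the catalog
  source: ibm-operator-catalog   # CatalogSource to pull from
  sourceNamespace: openshift-marketplace
  channel: v1.5                  # stream of Operator versions
  installPlanApproval: Automatic # or Manual to approve each InstallPlan by hand
```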

IBM MQ subscription in OLM
  4. InstallPlan
    When a Subscription finds a new version of an Operator, it creates an appropriate InstallPlan. Users can also create an InstallPlan resource directly. An InstallPlan describes the full list of resources that OLM will create to satisfy the CSV's resource requirements. These resources include CRDs, ServiceAccounts, (Cluster)Roles and (Cluster)RoleBindings. OLM watches for resolved InstallPlans and creates all of the discovered resources for them.
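A stripped-down InstallPlan might look like this (illustrative values; the status section, which lists the resolved resources, is omitted):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  name: install-abc12       # illustrative generated name
  namespace: openshift-operators
spec:
  clusterServiceVersionNames:
    - ibm-mq.v1.5.0         # CSV(s) this plan will install; illustrative
  approval: Automatic       # taken from the Subscription
  approved: true            # with Manual approval this starts as false
```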

To list InstallPlans in the cluster run oc get installplan -A

InstallPlans available in an OpenShift cluster
  5. OperatorGroup
    Used to control Operator multitenancy or, in other words, to make an Operator work in its own environment. Thus, an Operator belonging to an OperatorGroup will not react to custom resource changes in a namespace not indicated by the group. An OperatorGroup configures all Operators deployed in the same namespace as the OperatorGroup object to watch for their Custom Resources (CRs) in a list of namespaces or cluster-wide.

To list OperatorGroups run oc get operatorgroup -A

OperatorGroups available in a cluster

For example, let's describe the ibm-common-services-operators OperatorGroup:
oc get operatorgroup ibm-common-services-operators -n ibm-common-services -o yaml
Its specification defines a list of targetNamespaces:

A specification of an OperatorGroup

In this example, the Operator will be scoped to the "ibm-common-services" namespace.
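An OperatorGroup scoped this way can be expressed as follows (a sketch along the lines of the described resource):

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: ibm-common-services-operators
  namespace: ibm-common-services
spec:
  targetNamespaces:
    - ibm-common-services   # member Operators watch for CRs only here
```

Omitting targetNamespaces entirely would instead make the member Operators cluster-wide (if their InstallModes support it).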

An Operator is considered a member of an OperatorGroup if the following conditions are true:
- The Operator’s CSV exists in the same namespace as the OperatorGroup.
- The Operator’s CSV’s InstallModes support the set of namespaces targeted by the OperatorGroup.

To summarize the OLM concept:
OLM is a special toolkit which helps to install and manage Operators in a Kubernetes cluster. OLM is installed by default in RH OpenShift. It provides 5 CRDs which are required for it to work:

  • ClusterServiceVersion (CSV): a manifest which describes an Operator to OLM
  • CatalogSource: a repository of CSVs
  • Subscription: subscribes OLM to a CatalogSource and its channels for automatic upgrades of Operators
  • InstallPlan: contains the list of resources that OLM must create in order to install or update an Operator
  • OperatorGroup: scopes Operators to particular namespaces