Home
Products
OpenShift Container Platform
4.18
Scalability and performance
Chapter 11. Using CPU Manager and Topology Manager

Chapter 11. Using CPU Manager and Topology Manager

CPU Manager manages groups of CPUs and constrains workloads to specific CPUs.

CPU Manager is useful for workloads that have some of these attributes:

Require as much CPU time as possible.
Are sensitive to processor cache misses.
Are low-latency network applications.
Coordinate with other processes and benefit from sharing a single processor cache.

Topology Manager collects hints from the CPU Manager, Device Manager, and other Hint Providers to align pod resources, such as CPU, SR-IOV VFs, and other device resources, for all Quality of Service (QoS) classes on the same non-uniform memory access (NUMA) node.

Topology Manager uses topology information from the collected hints to decide if a pod can be accepted or rejected on a node, based on the configured Topology Manager policy and pod resources requested.

Topology Manager is useful for workloads that use hardware accelerators to support latency-critical execution and high throughput parallel computation.

To use Topology Manager you must configure CPU Manager with the static policy.

11.1. Setting up CPU Manager

To configure CPU manager, create a KubeletConfig custom resource (CR) and apply it to the desired set of nodes.

Procedure

Label a node by running the following command:

Copy to Clipboard Toggle word wrap

oc label node perf-node.example.com cpumanager=true

# oc label node perf-node.example.com cpumanager=true

To enable CPU Manager for all compute nodes, edit the CR by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
oc edit machineconfigpool worker
```
```
# oc edit machineconfigpool worker
```

Add the custom-kubelet: cpumanager-enabled label to metadata.labels section.

Copy to Clipboard Toggle word wrap

metadata:
  creationTimestamp: 2020-xx-xxx
  generation: 3
  labels:
    custom-kubelet: cpumanager-enabled

metadata:
  creationTimestamp: 2020-xx-xxx
  generation: 3
  labels:
    custom-kubelet: cpumanager-enabled

Create a KubeletConfig, cpumanager-kubeletconfig.yaml, custom resource (CR). Refer to the label created in the previous step to have the correct nodes updated with the new kubelet config. See the machineConfigPoolSelector section:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static 
     cpuManagerReconcilePeriod: 5s 
```
```
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static 
```
1
```
     cpuManagerReconcilePeriod: 5s 
```
2
1
Specify a policy:
none. This policy explicitly enables the existing default CPU affinity scheme, providing no affinity beyond what the scheduler does automatically. This is the default policy.
static. This policy allows containers in guaranteed pods with integer CPU requests. It also limits access to exclusive CPUs on the node. If static, you must use a lowercase s.
2
Optional. Specify the CPU Manager reconcile frequency. The default is 5s.
Create the dynamic kubelet config by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
oc create -f cpumanager-kubeletconfig.yaml
```
```
# oc create -f cpumanager-kubeletconfig.yaml
```
This adds the CPU Manager feature to the kubelet config and, if needed, the Machine Config Operator (MCO) reboots the node. To enable CPU Manager, a reboot is not needed.

Check for the merged kubelet config by running the following command:

Copy to Clipboard Toggle word wrap

oc get machineconfig 99-worker-XXXXXX-XXXXX-XXXX-XXXXX-kubelet -o json | grep ownerReference -A7

# oc get machineconfig 99-worker-XXXXXX-XXXXX-XXXX-XXXXX-kubelet -o json | grep ownerReference -A7

Example output

Copy to Clipboard Toggle word wrap

       "ownerReferences": [
            {
                "apiVersion": "machineconfiguration.openshift.io/v1",
                "kind": "KubeletConfig",
                "name": "cpumanager-enabled",
                "uid": "7ed5616d-6b72-11e9-aae1-021e1ce18878"
            }
        ]

       "ownerReferences": [
            {
                "apiVersion": "machineconfiguration.openshift.io/v1",
                "kind": "KubeletConfig",
                "name": "cpumanager-enabled",
                "uid": "7ed5616d-6b72-11e9-aae1-021e1ce18878"
            }
        ]

Check the compute node for the updated kubelet.conf file by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
oc debug node/perf-node.example.com
```
```
# oc debug node/perf-node.example.com
sh-4.2# cat /host/etc/kubernetes/kubelet.conf | grep cpuManager
```
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
cpuManagerPolicy: static        
cpuManagerReconcilePeriod: 5s   
```
```
cpuManagerPolicy: static        
```
1
```
cpuManagerReconcilePeriod: 5s   
```
2
1
cpuManagerPolicy is defined when you create the KubeletConfig CR.
2
cpuManagerReconcilePeriod is defined when you create the KubeletConfig CR.
Create a project by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
oc new-project <project_name>
```
```
$ oc new-project <project_name>
```

Create a pod that requests a core or multiple cores. Both limits and requests must have their CPU value set to a whole integer. That is the number of cores that will be dedicated to this pod:

Copy to Clipboard Toggle word wrap

cat cpumanager-pod.yaml

# cat cpumanager-pod.yaml

Example output

Copy to Clipboard Toggle word wrap

apiVersion: v1
kind: Pod
metadata:
  generateName: cpumanager-
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: cpumanager
    image: gcr.io/google_containers/pause:3.2
    resources:
      requests:
        cpu: 1
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  nodeSelector:
    cpumanager: "true"

apiVersion: v1
kind: Pod
metadata:
  generateName: cpumanager-
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: cpumanager
    image: gcr.io/google_containers/pause:3.2
    resources:
      requests:
        cpu: 1
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  nodeSelector:
    cpumanager: "true"

Create the pod:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
oc create -f cpumanager-pod.yaml
```
```
# oc create -f cpumanager-pod.yaml
```

Verification

Verify that the pod is scheduled to the node that you labeled by running the following command:

Copy to Clipboard Toggle word wrap

oc describe pod cpumanager

# oc describe pod cpumanager

Example output

Copy to Clipboard Toggle word wrap

Name:               cpumanager-6cqz7
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:  perf-node.example.com/xxx.xx.xx.xxx
...
 Limits:
      cpu:     1
      memory:  1G
    Requests:
      cpu:        1
      memory:     1G
...
QoS Class:       Guaranteed
Node-Selectors:  cpumanager=true

Name:               cpumanager-6cqz7
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:  perf-node.example.com/xxx.xx.xx.xxx
...
 Limits:
      cpu:     1
      memory:  1G
    Requests:
      cpu:        1
      memory:     1G
...
QoS Class:       Guaranteed
Node-Selectors:  cpumanager=true

Verify that a CPU has been exclusively assigned to the pod by running the following command:

Copy to Clipboard Toggle word wrap

oc describe node --selector='cpumanager=true' | grep -i cpumanager- -B2

# oc describe node --selector='cpumanager=true' | grep -i cpumanager- -B2

Example output

Copy to Clipboard Toggle word wrap

NAMESPACE    NAME                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
cpuman       cpumanager-mlrrz    1 (28%)       1 (28%)     1G (13%)         1G (13%)       27m

NAMESPACE    NAME                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
cpuman       cpumanager-mlrrz    1 (28%)       1 (28%)     1G (13%)         1G (13%)       27m

Verify that the cgroups are set up correctly. Get the process ID (PID) of the pause process by running the following commands:

Copy to Clipboard Toggle word wrap

oc debug node/perf-node.example.com

# oc debug node/perf-node.example.com

Copy to Clipboard Toggle word wrap

systemctl status | grep -B5 pause

sh-4.2# systemctl status | grep -B5 pause

Note

If the output returns multiple pause process entries, you must identify the correct pause process.

Example output

Copy to Clipboard Toggle word wrap

├─init.scope

# ├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 17
└─kubepods.slice
  ├─kubepods-pod69c01f8e_6b74_11e9_ac0f_0a2b62178a22.slice
  │ ├─crio-b5437308f1a574c542bdf08563b865c0345c8f8c0b0a655612c.scope
  │ └─32706 /pause

Verify that pods of quality of service (QoS) tier Guaranteed are placed within the kubepods.slice subdirectory by running the following commands:

Copy to Clipboard Toggle word wrap

cd /sys/fs/cgroup/kubepods.slice/kubepods-pod69c01f8e_6b74_11e9_ac0f_0a2b62178a22.slice/crio-b5437308f1ad1a7db0574c542bdf08563b865c0345c86e9585f8c0b0a655612c.scope

# cd /sys/fs/cgroup/kubepods.slice/kubepods-pod69c01f8e_6b74_11e9_ac0f_0a2b62178a22.slice/crio-b5437308f1ad1a7db0574c542bdf08563b865c0345c86e9585f8c0b0a655612c.scope

Copy to Clipboard Toggle word wrap

for i in `ls cpuset.cpus cgroup.procs` ; do echo -n "$i "; cat $i ; done

# for i in `ls cpuset.cpus cgroup.procs` ; do echo -n "$i "; cat $i ; done

Note

Pods of other QoS tiers end up in child cgroups of the parent kubepods.

Example output

Copy to Clipboard Toggle word wrap

cpuset.cpus 1
tasks 32706

cpuset.cpus 1
tasks 32706

Check the allowed CPU list for the task by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
grep ^Cpus_allowed_list /proc/32706/status
```
```
# grep ^Cpus_allowed_list /proc/32706/status
```
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
```
 Cpus_allowed_list:    1
```
```
 Cpus_allowed_list:    1
```

Verify that another pod on the system cannot run on the core allocated for the Guaranteed pod. For example, to verify the pod in the besteffort QoS tier, run the following commands:

Copy to Clipboard Toggle word wrap

cat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc494a073_6b77_11e9_98c0_06bba5c387ea.slice/crio-c56982f57b75a2420947f0afc6cafe7534c5734efc34157525fa9abbf99e3849.scope/cpuset.cpus

# cat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc494a073_6b77_11e9_98c0_06bba5c387ea.slice/crio-c56982f57b75a2420947f0afc6cafe7534c5734efc34157525fa9abbf99e3849.scope/cpuset.cpus

Copy to Clipboard Toggle word wrap

oc describe node perf-node.example.com

# oc describe node perf-node.example.com

Example output

Copy to Clipboard Toggle word wrap

...
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         2
 ephemeral-storage:           124768236Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      8162900Ki
 pods:                        250
Allocatable:
 attachable-volumes-aws-ebs:  39
 cpu:                         1500m
 ephemeral-storage:           124768236Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      7548500Ki
 pods:                        250
-------                               ----                           ------------  ----------  ---------------  -------------  ---
  default                                 cpumanager-6cqz7               1 (66%)       1 (66%)     1G (12%)         1G (12%)       29m

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests          Limits
  --------                    --------          ------
  cpu                         1440m (96%)       1 (66%)

...
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         2
 ephemeral-storage:           124768236Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      8162900Ki
 pods:                        250
Allocatable:
 attachable-volumes-aws-ebs:  39
 cpu:                         1500m
 ephemeral-storage:           124768236Ki
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      7548500Ki
 pods:                        250
-------                               ----                           ------------  ----------  ---------------  -------------  ---
  default                                 cpumanager-6cqz7               1 (66%)       1 (66%)     1G (12%)         1G (12%)       29m

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests          Limits
  --------                    --------          ------
  cpu                         1440m (96%)       1 (66%)

This VM has two CPU cores. The system-reserved setting reserves 500 millicores, meaning that half of one core is subtracted from the total capacity of the node to arrive at the Node Allocatable amount. You can see that Allocatable CPU is 1500 millicores. This means you can run one of the CPU Manager pods since each will take one whole core. A whole core is equivalent to 1000 millicores. If you try to schedule a second pod, the system will accept the pod, but it will never be scheduled:

Copy to Clipboard Toggle word wrap

NAME                    READY   STATUS    RESTARTS   AGE
cpumanager-6cqz7        1/1     Running   0          33m
cpumanager-7qc2t        0/1     Pending   0          11s

NAME                    READY   STATUS    RESTARTS   AGE
cpumanager-6cqz7        1/1     Running   0          33m
cpumanager-7qc2t        0/1     Pending   0          11s

11.2. Topology Manager policies

Topology Manager aligns Pod resources of all Quality of Service (QoS) classes by collecting topology hints from Hint Providers, such as CPU Manager and Device Manager, and using the collected hints to align the Pod resources.

Topology Manager supports four allocation policies, which you assign in the KubeletConfig custom resource (CR) named cpumanager-enabled:

none policy: This is the default policy and does not perform any topology alignment.
best-effort policy: For each container in a pod with the best-effort topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager stores this and admits the pod to the node.
restricted policy: For each container in a pod with the restricted topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager stores the preferred NUMA Node affinity for that container. If the affinity is not preferred, Topology Manager rejects this pod from the node, resulting in a pod in a Terminated state with a pod admission failure.
single-numa-node policy: For each container in a pod with the single-numa-node topology management policy, kubelet calls each Hint Provider to discover their resource availability. Using this information, the Topology Manager determines if a single NUMA Node affinity is possible. If it is, the pod is admitted to the node. If a single NUMA Node affinity is not possible, the Topology Manager rejects the pod from the node. This results in a pod in a Terminated state with a pod admission failure.

11.3. Setting up Topology Manager

To use Topology Manager, you must configure an allocation policy in the KubeletConfig custom resource (CR) named cpumanager-enabled. This file might exist if you have set up CPU Manager. If the file does not exist, you can create the file.

Prerequisites

Configure the CPU Manager policy to be static.

Procedure

To activate Topology Manager:

Configure the Topology Manager allocation policy in the custom resource.

Copy to Clipboard Toggle word wrap

oc edit KubeletConfig cpumanager-enabled

$ oc edit KubeletConfig cpumanager-enabled

Copy to Clipboard Toggle word wrap

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static 
     cpuManagerReconcilePeriod: 5s
     topologyManagerPolicy: single-numa-node

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static


     cpuManagerReconcilePeriod: 5s
     topologyManagerPolicy: single-numa-node

1: This parameter must be static with a lowercase s.
2: Specify your selected Topology Manager allocation policy. Here, the policy is single-numa-node. Acceptable values are: default, best-effort, restricted, single-numa-node.

11.4. Pod interactions with Topology Manager policies

The example Pod specs below help illustrate pod interactions with Topology Manager.

The following pod runs in the BestEffort QoS class because no resource requests or limits are specified.

Copy to Clipboard Toggle word wrap

spec:
  containers:
  - name: nginx
    image: nginx

spec:
  containers:
  - name: nginx
    image: nginx

The next pod runs in the Burstable QoS class because requests are less than limits.

Copy to Clipboard Toggle word wrap

spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"

spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"

If the selected policy is anything other than none, Topology Manager would not consider either of these Pod specifications.

The last example pod below runs in the Guaranteed QoS class because requests are equal to limits.

Copy to Clipboard Toggle word wrap

spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
      requests:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"

spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
      requests:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"

Topology Manager would consider this pod. The Topology Manager would consult the hint providers, which are CPU Manager and Device Manager, to get topology hints for the pod.

Topology Manager will use this information to store the best topology for this container. In the case of this pod, CPU Manager and Device Manager will use this stored information at the resource allocation stage.

Chapter 11. Using CPU Manager and Topology Manager

11.1. Setting up CPU Manager

11.2. Topology Manager policies

11.3. Setting up Topology Manager

11.4. Pod interactions with Topology Manager policies

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links