Kubernetes scheduling


No scheduler

If no scheduler is running in the cluster, pods can be assigned to nodes manually. This can be achieved either by setting spec/nodeName when the pod is created, or by creating a Binding object that targets an already-pending pod, which is exactly what the scheduler itself does.

Example
binding.yaml
---
apiVersion: v1
kind: Binding
metadata:
  name: nginx
target:
  apiVersion: v1
  kind: Node
  name: node01

pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  nodeName: node01
If using spec/nodeName, note that the field is immutable on an existing pod, so the pod must be recreated

kubectl replace --force -f pod.yaml

or if using a Binding

kubectl create -f binding.yaml
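
A Binding can also be POSTed directly to the pod's binding subresource on the API server. A minimal sketch, assuming kubectl proxy is listening on localhost:8001 and the pending nginx pod lives in the default namespace:

kubectl proxy --port=8001 &
curl --header "Content-Type: application/json" --request POST \
  --data '{"apiVersion": "v1", "kind": "Binding", "metadata": {"name": "nginx"}, "target": {"apiVersion": "v1", "kind": "Node", "name": "node01"}}' \
  http://localhost:8001/api/v1/namespaces/default/pods/nginx/binding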

Taints and tolerations

Tainting a node, or group of nodes, repels workloads that lack a matching toleration; combined with node affinity, this can also restrict a workload to run only on those nodes. A taint carries one of three effects:

  • NoSchedule — no new pods will be scheduled onto the node unless they have a matching toleration; existing pods are unaffected
  • PreferNoSchedule — like above, but as a soft preference: the scheduler tries to avoid the node and only places non-tolerating pods there when no other node fits
  • NoExecute — no new pods will be scheduled, and existing pods are evicted from the node unless they have a matching toleration
    • Pods that do not tolerate the taint are evicted immediately
    • Pods that tolerate the taint without specifying spec/tolerationSeconds remain bound indefinitely
    • Pods that tolerate the taint and also define spec/tolerationSeconds are evicted after that time elapses, as sketched after the example below
Example
pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
    class: frontend
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  tolerations:
  - key: class
    operator: Equal
    value: frontend
    effect: NoSchedule
kubectl taint nodes node01 class=frontend:NoSchedule
kubectl apply -f pod.yaml
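
The NoExecute timing rules above can be bounded with spec/tolerationSeconds. A minimal sketch, assuming the same class=frontend key but tainted with the NoExecute effect; the pod survives on the node for five minutes after the taint lands, then is evicted:

  tolerations:
  - key: class
    operator: Equal
    value: frontend
    effect: NoExecute
    tolerationSeconds: 300

A taint is removed by repeating the command with a trailing hyphen, e.g. kubectl taint nodes node01 class=frontend:NoSchedule-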

Node selectors and affinities

Labeling nodes and declaring either node selectors or affinities allows scheduling pods on specific nodes or groups of nodes. A node selector supports only exact label matches, while affinities and anti-affinities allow richer expressions such as In, NotIn, and Exists, as well as both hard requirements and soft preferences.

Example
selector.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
    class: frontend
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  nodeSelector:
    size: medium

affinity.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
    class: frontend
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: NotIn
            values:
            - small
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: size
            operator: In
            values:
            - medium
kubectl label nodes node01 size=small
kubectl label nodes node02 size=medium
kubectl label nodes node03 size=large
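
Anti-affinity can also be expressed between pods rather than against node labels. A minimal sketch, assuming the class: frontend pod label from the examples above; it prevents two frontend pods from landing on the same node:

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: class
            operator: In
            values:
            - frontend
        topologyKey: kubernetes.io/hostname

Placement can then be verified with kubectl get pods -o wide, which prints the node each pod landed on.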

Static pods

The Kubelet supports both a --pod-manifest-path CLI argument and a staticPodPath parameter in its configuration file for defining static pods, typically critical workloads such as the control plane components. These pods cannot be managed through the API server by normal means; the Kubelet manages them directly from the manifest files found in that directory on the node.

Often, the Kubelet's configuration file is located at /var/lib/kubelet/config.yaml, but this will not always be true since the location itself is configurable. If in doubt, the --config argument passed to the running Kubelet gives the file's true location.
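A minimal sketch of tracking the configuration down on a node; the paths shown are common kubeadm defaults, not guaranteed:

ps -ef | grep kubelet                              # note the --config argument
grep staticPodPath /var/lib/kubelet/config.yaml    # e.g. staticPodPath: /etc/kubernetes/manifests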

When running workloads in this way, the Kubelet can only create pods; higher-level objects such as Deployments still require the API server. These pods' names are implicitly suffixed with the name of the node, and the mirror objects are further distinguished by metadata/ownerReferences listing kind: Node.

---
metadata:
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: node01
    uid: 00000000-0000-0000-0000-000000000000
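
Putting it together, dropping a manifest into the static pod directory is enough for the Kubelet to start the workload. A sketch, assuming staticPodPath is /etc/kubernetes/manifests on node01:

# on node01
cat <<EOF > /etc/kubernetes/manifests/static-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-nginx
spec:
  containers:
  - name: nginx
    image: nginx
EOF

# the mirror pod appears in the API, suffixed with the node name
kubectl get pod static-nginx-node01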