Node Selection and Affinity Rules

How the Scheduler Works

Every Pod that doesn’t have a nodeName set goes through the Kubernetes scheduler. The scheduler’s job is to find the best node for the Pod. It does this in two decision phases, filtering and scoring, and then binds the Pod to the chosen node.

Filtering (called predicates in older Kubernetes documentation) eliminates nodes that cannot run the Pod. A node can be filtered out because it has insufficient CPU or memory to satisfy the Pod’s resource requests, it has a taint the Pod doesn’t tolerate, its labels don’t match the Pod’s nodeSelector, a required node affinity rule excludes it, or it is not ready or is under a condition such as disk pressure. The nodes that survive filtering are called feasible.

Scoring (called priorities in older Kubernetes documentation) ranks the feasible nodes. Each node receives a score based on factors like resource balance (prefer nodes with more available resources), pod affinity preferences (prefer nodes near related Pods), and anti-affinity preferences (avoid nodes with conflicting Pods). Preferred affinity rules contribute to the score according to their weights: a preference with weight 100 has more influence than one with weight 10. The node with the highest aggregate score is selected; if several nodes tie for the top score, the scheduler picks one of them at random.

Binding assigns the Pod to the winning node by setting the Pod’s .spec.nodeName field.

The following diagram illustrates this three-stage process:

[Figure: Kubernetes scheduler decision flow showing filtering, scoring, and binding phases]

The diagram shows the scheduler receiving an unscheduled Pod, running it through filtering to eliminate ineligible nodes, scoring the remaining feasible nodes, and binding the Pod to the highest-scoring node. Filtering is a hard gate — nodes either pass or are eliminated entirely. Scoring is a soft preference — every feasible node receives a numeric score, and the highest score wins. If no nodes survive filtering, the Pod stays in Pending state until conditions change.

Understanding the split between filtering and scoring is critical because nearly every scheduling rule you write maps to one of the two. nodeSelector and requiredDuringSchedulingIgnoredDuringExecution rules are filters. preferredDuringSchedulingIgnoredDuringExecution rules are scoring inputs. Knowing which phase a rule affects tells you whether it can cause a Pod to stay Pending (filtering) or whether the scheduler will find an alternative (scoring).
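As a rough illustration of that mapping, the manifest below annotates each scheduling-related field with the phase it feeds. The Pod name, the dedicated=batch taint, the disk label, and the request sizes are hypothetical values chosen for this sketch.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: phase-demo            # hypothetical name
spec:
  # Filtering: nodes without the label disk=ssd are eliminated.
  nodeSelector:
    disk: ssd
  # Filtering: lets the Pod pass nodes tainted dedicated=batch:NoSchedule
  # (an assumed taint); without it, those nodes would be eliminated.
  tolerations:
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
  affinity:
    nodeAffinity:
      # Scoring: matching nodes rank higher; non-matching nodes stay feasible.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["us-east-1a"]
  containers:
    - name: app
      image: nginx:1.25
      resources:
        # Filtering: nodes without this much unreserved CPU and memory
        # are eliminated.
        requests:
          cpu: 250m
          memory: 128Mi
```

Only the filtering fields can leave this Pod Pending; the preferred rule never does.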

nodeName: Direct Assignment

The most direct way to place a Pod is to set .spec.nodeName explicitly:

apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod
spec:
  nodeName: worker-2
  containers:
    - name: app
      image: nginx:1.25

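This bypasses the scheduler entirely. The Pod is assigned to worker-2 without filtering or scoring, and the kubelet on that node tries to run it. If worker-2 doesn’t exist, the Pod never runs. If worker-2 lacks the resources the Pod requests, the kubelet rejects the Pod and it fails. Taints are not evaluated for scheduling in this case: a NoSchedule taint does not stop the Pod, although a NoExecute taint that the Pod doesn’t tolerate still evicts it.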

Because nodeName bypasses all scheduling logic, it’s rarely used in production. It breaks high availability (the Pod is tied to a single node that might go down), ignores resource constraints (the scheduler’s filtering is skipped), and hardcodes infrastructure details into workload definitions. On the CKAD, you should know it exists but prefer nodeSelector or nodeAffinity for placement control.

nodeSelector: Simple Label Matching

nodeSelector is the standard way to constrain a Pod to nodes with specific labels. It’s a map of key-value pairs — the Pod is scheduled only on nodes whose labels include all the specified pairs.

First, label a node:

kubectl label node worker-1 disk=ssd

Verify the label:

kubectl get nodes --show-labels | grep disk

Now create a Pod that requires SSD storage:

apiVersion: v1
kind: Pod
metadata:
  name: ssd-app
spec:
  nodeSelector:
    disk: ssd
  containers:
    - name: app
      image: nginx:1.25

The scheduler filters out any node that doesn’t have the label disk=ssd. If worker-1 is the only node with that label, ssd-app will always land there. If no node has the label, the Pod stays Pending.

kubectl apply -f ssd-app.yaml
kubectl get pod ssd-app -o wide

NAME      READY   STATUS    RESTARTS   AGE   IP          NODE
ssd-app   1/1     Running   0          5s    10.42.1.5   worker-1

Multiple labels work as a logical AND:

nodeSelector:
  disk: ssd
  region: us-east

The Pod is scheduled only on nodes that have both disk=ssd and region=us-east. There is no way to express OR logic, negative matching, or preferences with nodeSelector — for those, you need nodeAffinity.
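The same field works in workload controllers, where it belongs under the Pod template’s spec rather than the Deployment’s own spec. A minimal sketch, assuming nodes labeled as above; the Deployment name and app label are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ssd-web               # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ssd-web
  template:
    metadata:
      labels:
        app: ssd-web
    spec:
      # nodeSelector is a Pod-level field, so it sits in template.spec.
      nodeSelector:
        disk: ssd
        region: us-east
      containers:
        - name: app
          image: nginx:1.25
```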

Built-in Node Labels

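Kubernetes automatically applies several well-known labels to nodes. The hostname, OS, and architecture labels are set by the kubelet on every node; the zone and region labels are populated only when a cloud provider or the cluster administrator supplies them: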

Label	Example Value	Description
`kubernetes.io/hostname`	`worker-1`	Node hostname
`kubernetes.io/os`	`linux`	Operating system
`kubernetes.io/arch`	`amd64`	CPU architecture
`topology.kubernetes.io/zone`	`us-east-1a`	Cloud availability zone
`topology.kubernetes.io/region`	`us-east-1`	Cloud region

You can use these in nodeSelector without manually labeling nodes:

nodeSelector:
  kubernetes.io/arch: amd64

nodeAffinity: Expressive Rules

nodeAffinity extends nodeSelector with operators, multiple expressions, and the ability to specify both hard requirements and soft preferences.

Required Affinity (Hard Rule)

requiredDuringSchedulingIgnoredDuringExecution is a filtering constraint. If no node matches, the Pod is not scheduled.

apiVersion: v1
kind: Pod
metadata:
  name: zone-restricted
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
                  - us-east-1b
  containers:
    - name: app
      image: nginx:1.25

This Pod runs only on nodes in zone us-east-1a or us-east-1b. The In operator checks if the node label’s value is in the provided list.

The name IgnoredDuringExecution means that if a node’s labels change after the Pod is running (removing the matching label), the Pod is not evicted. The rule is enforced only at scheduling time.

Available operators:

Operator	Behavior
`In`	Label value is one of the listed values
`NotIn`	Label value is not in the listed values
`Exists`	Label key exists (value doesn’t matter)
`DoesNotExist`	Label key does not exist
`Gt`	Label value, parsed as an integer, is greater than the single listed value
`Lt`	Label value, parsed as an integer, is less than the single listed value
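To make the less common operators concrete, here is a sketch of a nodeSelectorTerms fragment that uses NotIn, Exists, and Gt. The label keys example.com/gpu and example.com/cpu-cores are hypothetical. Exists and DoesNotExist take no values, and Gt and Lt take exactly one value, written as a string that parses as an integer.

```yaml
nodeSelectorTerms:
  - matchExpressions:
      # Reject nodes in zone us-east-1c. Nodes that lack the zone label
      # entirely also satisfy NotIn.
      - key: topology.kubernetes.io/zone
        operator: NotIn
        values: ["us-east-1c"]
      # Require the label key to be present; any value is accepted.
      - key: example.com/gpu
        operator: Exists
      # Require an integer label value greater than 8.
      - key: example.com/cpu-cores
        operator: Gt
        values: ["8"]
```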

Multiple matchExpressions within a single nodeSelectorTerm are ANDed:

nodeSelectorTerms:
  - matchExpressions:
      - key: disk
        operator: In
        values: ["ssd"]
      - key: region
        operator: In
        values: ["us-east"]

Both conditions must be true for the node to pass. Multiple nodeSelectorTerms at the top level are ORed — the node must match at least one term.
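OR logic therefore takes one term per alternative. In this sketch, which reuses the disk and region labels from above plus a hypothetical nvme value, a node passes if it has disk=ssd and region=us-east, or if it has disk=nvme regardless of region:

```yaml
nodeSelectorTerms:
  # Term 1: both expressions must match.
  - matchExpressions:
      - key: disk
        operator: In
        values: ["ssd"]
      - key: region
        operator: In
        values: ["us-east"]
  # Term 2: an alternative; matching either term is enough.
  - matchExpressions:
      - key: disk
        operator: In
        values: ["nvme"]
```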

Preferred Affinity (Soft Rule)

preferredDuringSchedulingIgnoredDuringExecution is a scoring preference. It influences node ranking without eliminating nodes:

apiVersion: v1
kind: Pod
metadata:
  name: prefer-ssd
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: disk
                operator: In
                values:
                  - ssd
        - weight: 20
          preference:
            matchExpressions:
              - key: region
                operator: In
                values:
                  - us-east
  containers:
    - name: app
      image: nginx:1.25

Each preference has a weight from 1 to 100. During scoring, the scheduler sums the weights of the preferences each node satisfies and feeds that sum into the node’s overall score alongside the other scoring factors. A node with disk=ssd and region=us-east earns 100 (80 + 20) from these rules. A node with only disk=ssd earns 80. A node with neither still qualifies: it’s scored lower, not eliminated.

Combining Required and Preferred

In practice, you often combine both:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/os
              operator: In
              values: ["linux"]
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
            - key: disk
              operator: In
              values: ["ssd"]

This says: “must run on Linux nodes (hard requirement), and prefer SSD nodes if available (soft preference).” The Pod will never land on a Windows node, but if no SSD nodes are available, it still schedules on a Linux node that lacks the disk=ssd label.

podAffinity: Schedule Near Other Pods

Node affinity selects nodes based on node labels. Pod affinity selects nodes based on which other Pods are already running there. The question changes from “what kind of node do I want?” to “which Pods do I want to be near?”

apiVersion: v1
kind: Pod
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - api
          topologyKey: kubernetes.io/hostname
  containers:
    - name: web
      image: nginx:1.25

This Pod will only be scheduled on a node where a Pod with label app=api is already running. By default, the labelSelector considers only Pods in the same namespace as the incoming Pod. The topologyKey: kubernetes.io/hostname means “same node”: the topology domain is individual hosts.

If no node is running an app=api Pod, the frontend Pod stays Pending.
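The label selector is evaluated against Pods in the incoming Pod’s own namespace unless the term says otherwise. If the api Pods lived in a different namespace, a sketch of the term might look like this, where the namespace name backend is an assumption made for illustration:

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: api
        # Look for matching Pods in this namespace instead of the Pod's own.
        namespaces:
          - backend
        topologyKey: kubernetes.io/hostname
```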

topologyKey Explained

The topologyKey field defines the topology domain for affinity calculations. It’s a node label key that groups nodes into domains:

topologyKey	Domain	Meaning
`kubernetes.io/hostname`	Individual node	Same node
`topology.kubernetes.io/zone`	Availability zone	Same zone (e.g., `us-east-1a`)
`topology.kubernetes.io/region`	Region	Same region (e.g., `us-east-1`)

With topologyKey: topology.kubernetes.io/zone, the Pod is scheduled in the same zone as matching Pods — not necessarily the same node, but a node in the same availability zone.
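A sketch of that zone-level rule, assuming nodes carry the topology.kubernetes.io/zone label and that the Pods to stay close to are labeled app=cache (a hypothetical label):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zone-local-client     # hypothetical name
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: cache
          # Any node in a zone that already hosts an app=cache Pod qualifies.
          topologyKey: topology.kubernetes.io/zone
  containers:
    - name: client
      image: nginx:1.25
```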

podAntiAffinity: Schedule Away from Other Pods

Pod anti-affinity is the inverse: ensure that Pods are not co-located. The most common use case is spreading replicas of the same application across nodes to improve availability.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - api
              topologyKey: kubernetes.io/hostname
      containers:
        - name: api
          image: api-server:2.0

Each replica of api-server is placed on a different node. If the cluster has only two worker nodes, the third replica stays Pending — there’s no node without an existing app=api Pod.

Using preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution relaxes this to a best-effort spread:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - api
          topologyKey: kubernetes.io/hostname

Now the scheduler tries to spread replicas across nodes but will place multiple replicas on the same node if necessary. This avoids Pending Pods in small clusters while still achieving distribution when possible.
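The two forms can also be combined. The fragment below is one possible sketch, not a recommendation: it keeps the hard one-replica-per-node rule from the Deployment above and adds a soft preference for spreading across zones, which assumes nodes carry the topology.kubernetes.io/zone label.

```yaml
affinity:
  podAntiAffinity:
    # Hard rule: never share a node with another app=api Pod.
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: api
        topologyKey: kubernetes.io/hostname
    # Soft rule: prefer a zone that has no app=api Pod yet.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: api
          topologyKey: topology.kubernetes.io/zone
```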

nodeSelector vs nodeAffinity: When to Use Which

Capability	nodeSelector	nodeAffinity
Simple key=value matching	Yes	Yes
In / NotIn operators	No	Yes
Exists / DoesNotExist	No	Yes
Gt / Lt (numeric)	No	Yes
Soft preferences (weights)	No	Yes
OR logic between terms	No	Yes

Use nodeSelector when exact label matches are sufficient (multiple pairs are still ANDed): it’s less YAML and harder to misconfigure. Use nodeAffinity when you need set-based operators, OR logic, negation, or soft preferences. On the CKAD, the question usually specifies which to use; if it doesn’t, nodeSelector is faster to type.
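For the simple case the two are interchangeable. As a sketch, the two Pods below (names are hypothetical) express the same hard constraint, a node labeled disk=ssd, once with each mechanism:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ssd-via-selector      # hypothetical name
spec:
  nodeSelector:
    disk: ssd
  containers:
    - name: app
      image: nginx:1.25
---
apiVersion: v1
kind: Pod
metadata:
  name: ssd-via-affinity      # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disk
                operator: In
                values: ["ssd"]
  containers:
    - name: app
      image: nginx:1.25
```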