Pod and Container Security Context

A securityContext is a set of fields in the Pod or container spec that instruct the container runtime how to run the process. These fields control the user and group IDs, filesystem permissions, Linux capabilities, and privilege escalation behavior. Getting them right is the difference between a container that follows the principle of least privilege and one that hands an attacker a root shell on your cluster.

Pod-Level vs Container-Level

SecurityContext exists at two levels:

apiVersion: v1
kind: Pod
metadata:
  name: security-demo
spec:
  securityContext:          # Pod-level: applies to ALL containers
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]  # keep the container running so kubectl exec works
      securityContext:      # Container-level: applies to THIS container only
        runAsUser: 2000
        allowPrivilegeEscalation: false

Override rule: when the same field is set at both levels, the container-level value wins. In the example above, the app container runs as UID 2000 (container-level), not UID 1000 (Pod-level). The runAsGroup and fsGroup from the Pod-level still apply because the container does not override them.
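The override rule can be modeled as a small dictionary merge. This is a simplified sketch for illustration, not the kubelet's actual code; the function name effective_security_context and the SHARED_FIELDS set are assumptions made for this example.

```python
# Simplified model of the override rule (illustrative; not kubelet source code).
# Fields that may appear at both the Pod level and the container level.
SHARED_FIELDS = {"runAsUser", "runAsGroup", "runAsNonRoot",
                 "seLinuxOptions", "seccompProfile"}


def effective_security_context(pod_sc, container_sc):
    """Return the values a container ends up with for the shared fields."""
    effective = {k: v for k, v in pod_sc.items() if k in SHARED_FIELDS}
    # Container-level values win whenever both levels set the same field.
    effective.update({k: v for k, v in container_sc.items() if k in SHARED_FIELDS})
    return effective


pod_level = {"runAsUser": 1000, "runAsGroup": 3000, "fsGroup": 2000}
container_level = {"runAsUser": 2000, "allowPrivilegeEscalation": False}

print(effective_security_context(pod_level, container_level))
# {'runAsUser': 2000, 'runAsGroup': 3000}
```

fsGroup and allowPrivilegeEscalation are left out of the result on purpose: each exists at only one level, so there is nothing to merge.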

Fields that exist only at the Pod level: fsGroup, supplementalGroups, sysctls. Fields that exist only at the container level: capabilities, readOnlyRootFilesystem, allowPrivilegeEscalation. Fields that exist at both levels: runAsUser, runAsNonRoot, runAsGroup, seLinuxOptions, seccompProfile.

runAsUser

Specifies the UID (user ID) the container process runs as:

securityContext:
  runAsUser: 1000

The process inside the container runs as UID 1000. If the container image defines a USER instruction in its Dockerfile, runAsUser overrides it. This is powerful — you can enforce non-root execution on any image, even one that defaults to root.

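kubectl exec security-demo -- whoami
# Output depends on whether the UID has a name in /etc/passwd
# If not mapped: "whoami: unknown uid 2000"

kubectl exec security-demo -- id
# uid=2000 gid=3000 groups=2000
# (UID 2000, not 1000: the container-level runAsUser in security-demo overrides the Pod-level value;
#  some runtimes also list the primary GID 3000 under groups)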

runAsNonRoot

A boolean guard that rejects the container if it would run as root:

securityContext:
  runAsNonRoot: true

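If the image would run as UID 0 (it specifies USER 0, or specifies no USER at all and so defaults to root) and no runAsUser is set to override it, the kubelet refuses to start the container. The Pod enters a CreateContainerConfigError state with the message container has runAsNonRoot and image will run as root. An image whose USER is a name rather than a number (USER root, USER nginx) is also rejected, typically with a message that the image has a non-numeric user and the kubelet cannot verify that it is non-root. Setting a numeric runAsUser resolves both cases.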

This is a safety net. Set runAsNonRoot: true at the Pod level, then set an explicit runAsUser at the container level. If someone later changes the image to one that defaults to root, the guard catches it.

spec:
  securityContext:
    runAsNonRoot: true
  containers:
    - name: app
      image: nginx:1.25
      securityContext:
        runAsUser: 101  # nginx user in the nginx image
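The decision the guard makes can be sketched as a small function. This is a simplified model under stated assumptions, not the kubelet implementation: the function name check_run_as_non_root is made up for this example, image_user stands for the image's USER value as a plain string, and the "uid:gid" form of USER is not handled.

```python
def check_run_as_non_root(run_as_non_root, run_as_user=None, image_user=None):
    """Return None if the container may start, else a rejection message.

    run_as_user: effective runAsUser from the securityContext, or None.
    image_user:  the image's USER as a string ("0", "101", "nginx"),
                 or None when the image declares no USER.
    """
    if not run_as_non_root:
        return None  # guard disabled, nothing to check
    if run_as_user is not None:
        # An explicit runAsUser takes precedence over the image's USER.
        if run_as_user == 0:
            return "container's runAsUser breaks non-root policy"
        return None
    if image_user is None:
        image_user = "0"  # an image without USER is treated as root
    if image_user.isdigit():
        if int(image_user) == 0:
            return "container has runAsNonRoot and image will run as root"
        return None
    # A user name cannot be proven non-root without resolving it.
    return ("container has runAsNonRoot and image has non-numeric user "
            f"({image_user}), cannot verify user is non-root")


print(check_run_as_non_root(True))                       # rejected: image defaults to root
print(check_run_as_non_root(True, run_as_user=101))      # None (allowed)
print(check_run_as_non_root(True, image_user="nginx"))   # rejected: non-numeric user
```

The second call corresponds to the Pod above: runAsNonRoot at the Pod level plus a numeric runAsUser at the container level passes the guard regardless of what the image declares.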

runAsGroup

Specifies the primary GID (group ID) for the container process:

securityContext:
  runAsGroup: 3000

All processes in the container run with GID 3000 as their primary group. This affects file creation — new files get group ownership 3000.

kubectl exec security-demo -- id
# uid=2000 gid=3000 groups=2000
# (gid=3000 comes from the Pod-level runAsGroup; uid=2000 from the container-level runAsUser)

fsGroup

A Pod-level field that sets the group that owns the Pod's volumes and that every container process joins as a supplemental group:

spec:
  securityContext:
    fsGroup: 2000

When fsGroup is set, Kubernetes:

Adds GID 2000 to the supplemental groups of every container in the Pod.
Changes the group ownership of all files in mounted volumes to GID 2000 (for volume types that support ownership management, such as emptyDir).
Sets the setgid bit on volume directories, so new files inherit the group.

This is essential when a non-root container needs to write to a PersistentVolume. Without fsGroup, a volume might be owned by root, and UID 1000 cannot write to it.

apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
  volumes:
    - name: data
      emptyDir: {}
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "ls -ld /data && touch /data/test && ls -la /data/test && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data

kubectl logs fsgroup-demo
# drwxrwsrwx 2 root 2000 ... /data
# -rw-r--r-- 1 1000 2000 ... /data/test

The s in drwxrwsrwx is the setgid bit. The file test is owned by UID 1000 (the user) with GID 2000 (the fsGroup).
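To see how the setgid bit turns into the s in that permission string, the mode can be rebuilt with Python's standard stat module. This is only an illustration of the mode bits, run locally; it does not touch a cluster.

```python
import stat

# A directory that is rwx for everyone, plus the setgid bit (octal 2777).
mode = stat.S_IFDIR | stat.S_ISGID | 0o777

print(stat.filemode(mode))                   # drwxrwsrwx
print(bool(mode & stat.S_ISGID))             # True: setgid is set
print(stat.filemode(mode & ~stat.S_ISGID))   # drwxrwxrwx once setgid is cleared
```

The group execute position shows s instead of x when both group execute and setgid are set, which is what the fsGroup-managed /data directory reports.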

readOnlyRootFilesystem

Mounts the container’s root filesystem as read-only:

securityContext:
  readOnlyRootFilesystem: true

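With this setting, any attempt to write to the root filesystem fails. Run the check against a Pod whose container sets this field (security-demo as defined earlier does not); <pod> is a placeholder:

kubectl exec <pod> -- touch /test
# touch: /test: Read-only file system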

Applications that need to write temporary files (logs, caches, PID files) must use emptyDir volumes mounted at the appropriate paths:

containers:
  - name: app
    image: nginx:1.25
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: tmp
        mountPath: /tmp
      - name: cache
        mountPath: /var/cache/nginx
      - name: run
        mountPath: /var/run
volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
  - name: run
    emptyDir: {}

This pattern is common in production: a read-only root filesystem with writable emptyDir volumes for specific paths. It prevents an attacker from modifying binaries, installing tools, or writing backdoors to the container filesystem.
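Keeping volumeMounts and volumes in sync by hand is error-prone, so the pairing can be generated. The helper below is a hypothetical sketch (writable_mounts and its naming scheme are assumptions for this example, not a Kubernetes API); it only builds the two lists as plain dictionaries.

```python
import json


def writable_mounts(paths):
    """Build matching volumeMounts and volumes entries for writable paths."""
    mounts, volumes = [], []
    for path in paths:
        # Derive a volume name from the path, e.g. /var/run -> var-run.
        name = path.strip("/").replace("/", "-").replace("_", "-").lower() or "root"
        mounts.append({"name": name, "mountPath": path})
        volumes.append({"name": name, "emptyDir": {}})
    return mounts, volumes


mounts, volumes = writable_mounts(["/tmp", "/var/cache/nginx", "/var/run"])
print(json.dumps({"volumeMounts": mounts, "volumes": volumes}, indent=2))
```

The volume names differ from the hand-written ones above (var-cache-nginx instead of cache); only the match between each mount's name and its volume's name matters.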

allowPrivilegeEscalation

Controls whether a process can gain more privileges than its parent:

securityContext:
  allowPrivilegeEscalation: false

When set to false, the no_new_privs flag is applied to the container process. This prevents setuid binaries (like sudo, su, or ping) from granting elevated privileges. The container process — and all its child processes — cannot escalate beyond the privileges it started with.

Always set this to false unless the application explicitly requires setuid behavior. Most application containers have no legitimate reason to escalate privileges.

Linux Capabilities

Linux capabilities divide root’s monolithic power into discrete units. Instead of granting full root access, you grant specific capabilities:

Capability	Allows
`NET_BIND_SERVICE`	Bind to ports below 1024
`SYS_TIME`	Modify system clock
`NET_RAW`	Use raw sockets (ping)
`SYS_PTRACE`	Trace processes (debugging)
`CHOWN`	Change file ownership

The security best practice is to drop all capabilities and add back only what the application needs:

securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE

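This container can bind to port 80 (below 1024) but cannot change file ownership, modify the system clock, use raw sockets, or perform any other privileged operation. One caveat: an added capability lands in the effective set only for a process running as root (UID 0). A non-root process does not gain it automatically, so a non-root server is usually simpler to run on a port above 1024.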
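How drop and add combine can be sketched with set arithmetic. This is a simplified model, not runtime source code: the function name resulting_capabilities is an assumption, ASSUMED_DEFAULT is a partial illustrative default rather than any runtime's real list, and add: ALL is not handled.

```python
# Partial, illustrative default set; real runtimes define their own list.
ASSUMED_DEFAULT = {"CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL",
                   "SETGID", "SETUID", "NET_BIND_SERVICE"}


def resulting_capabilities(runtime_default, drop=(), add=()):
    """Model the capability set granted to the container."""
    caps = set(runtime_default)
    if "ALL" in drop:
        caps.clear()          # drop: [ALL] removes every default capability
    else:
        caps -= set(drop)
    caps |= set(add)          # additions are applied after the drops
    return caps


print(resulting_capabilities(ASSUMED_DEFAULT, drop=["ALL"], add=["NET_BIND_SERVICE"]))
# {'NET_BIND_SERVICE'}
print(sorted(resulting_capabilities(ASSUMED_DEFAULT, drop=["CHOWN"])))
```

Because additions are applied after drops, the drop ALL then add pattern always yields exactly the capabilities listed under add.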

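To see which capabilities a running container has, read the capability masks of PID 1. The output below is abridged and assumes a container that runs as root with the capabilities block above; <pod> is a placeholder:

kubectl exec <pod> -- cat /proc/1/status | grep -i cap
# CapPrm: 0000000000000400
# CapEff: 0000000000000400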

The hex values map to specific capability sets. For exam purposes, you need to know the YAML syntax for dropping and adding capabilities, not the hex decoding.
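For the curious, the decoding is a bit-by-bit lookup: bit N of the mask corresponds to capability number N in the Linux headers. The sketch below hard-codes that numbering (names shown without the CAP_ prefix, as in the YAML); decode_caps is a name chosen for this example.

```python
# Linux capability numbers 0..40, in kernel header order.
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask):
    """Translate a CapEff/CapPrm hex string into capability names."""
    mask = int(hex_mask, 16)
    names = []
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            # Bits beyond the table are reported by number.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"unknown({bit})")
    return names


print(decode_caps("0000000000000400"))  # ['NET_BIND_SERVICE']
print(decode_caps("0000000000000000"))  # []
```

The mask 0x400 has only bit 10 set, and capability number 10 is NET_BIND_SERVICE, which matches the drop ALL, add NET_BIND_SERVICE configuration.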

A container without drop: ALL retains a default set of capabilities that is defined by the container runtime, not by Kubernetes, and varies between runtimes. With containerd or Docker defaults it typically includes CHOWN, DAC_OVERRIDE, FOWNER, FSETID, KILL, SETGID, SETUID, NET_BIND_SERVICE, NET_RAW, and several others. Dropping all of them and adding back selectively is the safest approach.

Complete Locked-Down Pod

This YAML represents a production-hardened container with every security field covered in this section set to its most restrictive value:

apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10000
    runAsGroup: 10000
    fsGroup: 10000
  volumes:
    - name: tmp
      emptyDir: {}
    - name: cache
      emptyDir: {}
  containers:
    - name: app
      image: nginxinc/nginx-unprivileged:1.25  # listens on 8080 as non-root; stock nginx:1.25 expects root and port 80
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /var/cache/nginx

What this achieves:

runAsNonRoot: true — rejects images that default to root without an explicit UID
runAsUser: 10000 — process runs as an unprivileged UID
runAsGroup: 10000 — primary group is unprivileged
fsGroup: 10000 — volumes are accessible by the container’s group
allowPrivilegeEscalation: false — no setuid/setgid escalation
readOnlyRootFilesystem: true — root filesystem is immutable
capabilities.drop: ALL — no Linux capabilities retained
emptyDir volumes — writable space only where explicitly needed
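The checklist above can be turned into a small audit function. This is a hypothetical sketch (the name audit and the finding strings are assumptions for this example); it inspects plain dictionaries shaped like the securityContext blocks and does not talk to a cluster.

```python
def audit(pod_sc, container_sc):
    """Return a list of findings; an empty list means every check passed."""
    findings = []
    # Shared fields: the container-level value wins when both are set.
    run_as_non_root = container_sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
    run_as_user = container_sc.get("runAsUser", pod_sc.get("runAsUser"))
    if run_as_non_root is not True:
        findings.append("runAsNonRoot is not true")
    if run_as_user is None or run_as_user == 0:
        findings.append("runAsUser is unset or 0")
    # Container-only fields.
    if container_sc.get("allowPrivilegeEscalation") is not False:
        findings.append("allowPrivilegeEscalation is not false")
    if container_sc.get("readOnlyRootFilesystem") is not True:
        findings.append("readOnlyRootFilesystem is not true")
    if "ALL" not in container_sc.get("capabilities", {}).get("drop", []):
        findings.append("capabilities.drop does not include ALL")
    return findings


hardened_pod_sc = {"runAsNonRoot": True, "runAsUser": 10000,
                   "runAsGroup": 10000, "fsGroup": 10000}
hardened_container_sc = {"allowPrivilegeEscalation": False,
                         "readOnlyRootFilesystem": True,
                         "capabilities": {"drop": ["ALL"]}}

print(audit(hardened_pod_sc, hardened_container_sc))  # []
print(audit({}, {}))                                  # all five findings
```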

Testing Security Context

Deploy the hardened Pod and verify each constraint:

kubectl apply -f hardened-pod.yaml
kubectl wait --for=condition=ready pod/hardened-pod --timeout=30s

Verify User and Group

kubectl exec hardened-pod -- whoami
# Fails with an error such as "whoami: cannot find name for user ID 10000"
# (exact wording depends on the image; the mapped username prints if /etc/passwd contains UID 10000)

kubectl exec hardened-pod -- id
# uid=10000 gid=10000 groups=10000

Verify Read-Only Filesystem

kubectl exec hardened-pod -- touch /test
# touch: cannot touch '/test': Read-only file system
# (exact wording depends on the image's touch implementation)

kubectl exec hardened-pod -- touch /tmp/test
# (succeeds — /tmp is an emptyDir)

Verify Capabilities

kubectl exec hardened-pod -- cat /proc/1/status | grep CapEff
# CapEff: 0000000000000000
# (no effective capabilities)

Verify No Privilege Escalation

kubectl exec hardened-pod -- cat /proc/1/status | grep NoNewPrivs
# NoNewPrivs: 1
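The separate grep checks can be folded into one pass over /proc/1/status. The sketch below parses a hard-coded sample rather than live output; SAMPLE, parse_status, and check_process are names and data invented for this example. In practice the text would come from kubectl exec hardened-pod -- cat /proc/1/status.

```python
# Illustrative excerpt of /proc/1/status (fields are tab-separated).
SAMPLE = """\
Name:\tnginx
Uid:\t10000\t10000\t10000\t10000
Gid:\t10000\t10000\t10000\t10000
CapEff:\t0000000000000000
NoNewPrivs:\t1
"""


def parse_status(text):
    """Turn 'Key:<tab>value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_process(fields):
    """Evaluate the constraints verified in this section."""
    real_uid = fields["Uid"].split()[0]  # first column is the real UID
    return {
        "non_root": real_uid != "0",
        "no_effective_caps": int(fields["CapEff"], 16) == 0,
        "no_new_privs": fields.get("NoNewPrivs") == "1",
    }


print(check_process(parse_status(SAMPLE)))
# {'non_root': True, 'no_effective_caps': True, 'no_new_privs': True}
```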

Pod-Level vs Container-Level: When to Use Each

Scenario	Level	Reason
All containers should run as non-root	Pod	Applies uniformly
Each container needs a different UID	Container	Override per container
Volumes need shared group access	Pod (`fsGroup`)	Only available at Pod level
Drop capabilities for one container	Container	Only available at container level
Read-only filesystem for one container	Container	Only available at container level
Shared security baseline for all containers	Pod	Set Pod-level, override at container level as needed

Exam Strategy

SecurityContext tasks on the CKAD exam typically provide a running Pod and ask you to add security constraints. The workflow:

Get the existing YAML: kubectl get pod <name> -o yaml > pod.yaml
Delete the running Pod: kubectl delete pod <name>
Edit the YAML: Add the required securityContext fields at the correct level.
Re-apply: kubectl apply -f pod.yaml
Verify: kubectl exec <pod> -- id, kubectl exec <pod> -- touch /test

Know which fields exist at which level. A common exam mistake is placing capabilities at the Pod level (it only exists at the container level) or placing fsGroup at the container level (it only exists at the Pod level). These misplacements cause validation errors that cost precious minutes.
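A quick way to drill the level rules is a checker that flags misplaced fields. This is a hypothetical practice aid, not a replacement for the API server's validation; misplaced_fields and the two sets are assumptions for this example and cover only the fields named in this section.

```python
POD_ONLY = {"fsGroup", "supplementalGroups", "sysctls"}
CONTAINER_ONLY = {"capabilities", "readOnlyRootFilesystem", "allowPrivilegeEscalation"}


def misplaced_fields(pod_spec):
    """Return a description of every field placed at the wrong level."""
    problems = []
    for field in pod_spec.get("securityContext", {}):
        if field in CONTAINER_ONLY:
            problems.append(f"spec.securityContext.{field}: container-level only")
    for container in pod_spec.get("containers", []):
        for field in container.get("securityContext", {}):
            if field in POD_ONLY:
                problems.append(
                    f"container {container.get('name')}: {field} is Pod-level only")
    return problems


bad_spec = {
    "securityContext": {"capabilities": {"drop": ["ALL"]}},          # wrong level
    "containers": [{"name": "app", "securityContext": {"fsGroup": 2000}}],  # wrong level
}
for problem in misplaced_fields(bad_spec):
    print(problem)
```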

Practice combining multiple security constraints in a single Pod. Exam tasks frequently ask for three or four security settings at once — for example, “run as user 1000, drop all capabilities, set the root filesystem to read-only, and prevent privilege escalation.” Having the YAML structure memorized means you can write a complete locked-down Pod spec in under two minutes without consulting documentation.
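As a memory aid for that structure, the example task can be written as a function that assembles the manifest. This is a sketch under stated assumptions: locked_down_pod, the Pod name exam-pod, the busybox image, and the sleep command are all illustrative choices, not part of any exam task. The output is JSON, which kubectl accepts in place of YAML.

```python
import json


def locked_down_pod(name, image, uid):
    """Build a Pod manifest with the four constraints from the example task."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "securityContext": {"runAsUser": uid},            # Pod level
            "containers": [{
                "name": "app",
                "image": image,
                "command": ["sleep", "3600"],                 # keep the Pod running
                "securityContext": {                          # container level
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }


manifest = locked_down_pod("exam-pod", "busybox", 1000)
print(json.dumps(manifest, indent=2))
```

Note where each setting lands: runAsUser may sit at either level, while the other three must sit under the container's securityContext.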