Pod and Container Security Context
Covers the SecurityContext API at both Pod and container levels, including runAsUser, runAsNonRoot, runAsGroup, fsGroup, readOnlyRootFilesystem, allowPrivilegeEscalation, and Linux capabilities. Explains the override hierarchy where container-level settings take precedence over Pod-level settings. Includes a complete locked-down Pod YAML and verification commands.
A securityContext is a set of fields in the Pod or container spec that instruct the container runtime how to run the process. These fields control the user and group IDs, filesystem permissions, Linux capabilities, and privilege escalation behavior. Getting them right is the difference between a container that follows the principle of least privilege and one that hands an attacker a root shell on your cluster.
Pod-Level vs Container-Level
SecurityContext exists at two levels:
apiVersion: v1
kind: Pod
metadata:
  name: security-demo
spec:
  securityContext:               # Pod-level: applies to ALL containers
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]   # keep the container running so kubectl exec works
    securityContext:             # Container-level: applies to THIS container
      runAsUser: 2000
      allowPrivilegeEscalation: false
Override rule: when the same field is set at both levels, the container-level value wins. In the example above, the app container runs as UID 2000 (container-level), not UID 1000 (Pod-level). The runAsGroup and fsGroup from the Pod level still apply: the container does not override runAsGroup, and fsGroup can only be set at the Pod level.
Fields that exist only at the Pod level: fsGroup, supplementalGroups, sysctls. Fields that exist only at the container level: capabilities, readOnlyRootFilesystem, allowPrivilegeEscalation. Fields that exist at both levels: runAsUser, runAsNonRoot, runAsGroup, seLinuxOptions, seccompProfile.
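For a field that exists at both levels, the override rule behaves like a simple fallback. The sketch below only illustrates that rule and is not how the kubelet is implemented; effective_field is a hypothetical helper name, and the values are the ones from the security-demo Pod.

```shell
#!/bin/sh
# effective_field CONTAINER_VALUE POD_VALUE
# Prints the container-level value when it is set, otherwise the Pod-level one.
# Hypothetical helper that mirrors the override rule for a single field.
effective_field() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$2"
  fi
}

# security-demo: the container sets runAsUser but not runAsGroup.
effective_field 2000 1000   # runAsUser  -> prints 2000
effective_field ""   3000   # runAsGroup -> prints 3000
```

On a live cluster the two inputs can be read with kubectl get pod security-demo -o jsonpath='{.spec.containers[0].securityContext.runAsUser}' and -o jsonpath='{.spec.securityContext.runAsUser}'; a field that is not set prints as an empty string.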
runAsUser
Specifies the UID (user ID) the container process runs as:
securityContext:
  runAsUser: 1000
The process inside the container runs as UID 1000. If the container image defines a USER instruction in its Dockerfile, runAsUser overrides it. This is powerful — you can enforce non-root execution on any image, even one that defaults to root.
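kubectl exec security-demo -- whoami
# Output depends on whether the UID has a name in /etc/passwd.
# The app container in security-demo runs as UID 2000 (the container-level override),
# so if the UID is not mapped: "whoami: unknown uid 2000"
kubectl exec security-demo -- id
# uid=2000 gid=3000 groups=2000
# (the exact groups list can vary with the container runtime)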
runAsNonRoot
A boolean guard that rejects the container if it would run as root:
securityContext:
  runAsNonRoot: true
If the container image specifies USER root (or does not specify a USER, defaulting to root), and no runAsUser is set to override it, the kubelet refuses to start the container. The Pod enters a CreateContainerConfigError state with the message container has runAsNonRoot and image will run as root.
This is a safety net. Set runAsNonRoot: true at the Pod level, then set an explicit runAsUser at the container level. If someone later changes the image to one that defaults to root, the guard catches it.
spec:
  securityContext:
    runAsNonRoot: true
  containers:
  - name: app
    image: nginx:1.25
    securityContext:
      runAsUser: 101   # nginx user in the nginx image
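The decision the guard makes can be sketched as a small function. This is a simplified model for illustration, not the kubelet's actual code: nonroot_check and its messages are hypothetical, and image USER values of the form user:group are not handled.

```shell
#!/bin/sh
# nonroot_check RUN_AS_USER IMAGE_USER
# Simplified model of the runAsNonRoot guard.
# RUN_AS_USER: runAsUser from the spec, or "" when unset.
# IMAGE_USER:  USER from the image, or "" when the image sets none (root).
nonroot_check() {
  run_as_user=$1
  image_user=$2
  if [ -n "$run_as_user" ]; then
    uid=$run_as_user              # an explicit runAsUser overrides the image
  elif [ -z "$image_user" ]; then
    uid=0                         # no USER in the image means root
  else
    case $image_user in
      *[!0-9]*)                   # a named user cannot be proven non-root
        echo "reject: cannot verify named user '$image_user'"
        return 1 ;;
      *) uid=$image_user ;;
    esac
  fi
  if [ "$uid" -eq 0 ]; then
    echo "reject: would run as root"
    return 1
  fi
  echo "allow: uid $uid"
}

nonroot_check 101 ""    # prints: allow: uid 101
```

The named-user branch is why setting a numeric runAsUser alongside runAsNonRoot: true is the more predictable combination: a numeric UID can be checked, a username in the image cannot.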
runAsGroup
Specifies the primary GID (group ID) for the container process:
securityContext:
  runAsGroup: 3000
All processes in the container run with GID 3000 as their primary group. This affects file creation — new files get group ownership 3000.
kubectl exec security-demo -- id
# uid=2000 gid=3000 groups=2000
# (uid is 2000 because of the container-level override; gid=3000 comes from runAsGroup)
fsGroup
A Pod-level field that sets the group (GID) used for volume ownership and adds it to each container's supplemental groups:
spec:
  securityContext:
    fsGroup: 2000
When fsGroup is set, Kubernetes:
- Adds GID 2000 to the supplemental groups of every container in the Pod.
- Changes the group ownership of all files in mounted volumes to GID 2000 (for volume types that support ownership management).
- Sets the setgid bit on volume directories, so new files inherit the group.
This is essential when a non-root container needs to write to a PersistentVolume. Without fsGroup, a volume might be owned by root, and UID 1000 cannot write to it.
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
  volumes:
  - name: data
    emptyDir: {}
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "ls -la /data && touch /data/test && ls -la /data/test && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
kubectl logs fsgroup-demo
# drwxrwsrwx 2 root 2000 ... /data
# -rw-r--r-- 1 1000 2000 ... /data/test
The s in drwxrwsrwx is the setgid bit. The file test is owned by UID 1000 (the user) with GID 2000 (the fsGroup).
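The permission string can be reproduced locally without a cluster. This is a minimal sketch that assumes a Linux shell; the scratch directory merely stands in for a volume mount and has nothing to do with how Kubernetes prepares the volume.

```shell
#!/bin/sh
# Local illustration of the setgid bit on a directory.
dir=$(mktemp -d)                      # scratch directory standing in for /data
chmod 2777 "$dir"                     # leading 2 = setgid bit, 777 = rwx for everyone
mode=$(ls -ld "$dir" | cut -c1-10)    # keep only the 10-character permission string
echo "$mode"
rm -rf "$dir"
```

On a typical Linux system this prints drwxrwsrwx, the same string shown in the Pod's log output: the s replaces the group x to show that both group execute and setgid are set.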
readOnlyRootFilesystem
Mounts the container’s root filesystem as read-only:
securityContext:
  readOnlyRootFilesystem: true
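With this setting, any attempt to write to the root filesystem fails. For example, once the app container in security-demo has readOnlyRootFilesystem: true added to its securityContext:
kubectl exec security-demo -- touch /test
# touch: /test: Read-only file system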
Applications that need to write temporary files (logs, caches, PID files) must use emptyDir volumes mounted at the appropriate paths:
containers:
- name: app
  image: nginx:1.25
  securityContext:
    readOnlyRootFilesystem: true
  volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: cache
    mountPath: /var/cache/nginx
  - name: run
    mountPath: /var/run
volumes:
- name: tmp
  emptyDir: {}
- name: cache
  emptyDir: {}
- name: run
  emptyDir: {}
This pattern is common in production: a read-only root filesystem with writable emptyDir volumes for specific paths. It prevents an attacker from modifying binaries, installing tools, or writing backdoors to the container filesystem.
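One way to confirm the pattern from outside the container is to look at the mount options of / in /proc/mounts. The helper below is a sketch: root_is_readonly is a hypothetical name, and it assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
#!/bin/sh
# root_is_readonly: reads /proc/mounts content on stdin and succeeds when the
# mount at "/" carries the "ro" option. If "/" appears more than once, the
# last entry decides.
root_is_readonly() {
  awk '
    $2 == "/" {
      found = 0
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

Assuming the image ships cat, a check could look like kubectl exec <pod> -- cat /proc/mounts | root_is_readonly && echo "root filesystem is read-only". The emptyDir mounts appear as separate entries with rw, which is exactly the split the pattern aims for.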
allowPrivilegeEscalation
Controls whether a process can gain more privileges than its parent:
securityContext:
  allowPrivilegeEscalation: false
When set to false, the no_new_privs flag is applied to the container process. This prevents setuid binaries (like sudo, su, or ping) from granting elevated privileges. The container process — and all its child processes — cannot escalate beyond the privileges it started with.
Always set this to false unless the application explicitly requires setuid behavior. Most application containers have no legitimate reason to escalate privileges.
Linux Capabilities
Linux capabilities divide root’s monolithic power into discrete units. Instead of granting full root access, you grant specific capabilities:
| Capability | Allows |
|---|---|
| NET_BIND_SERVICE | Bind to ports below 1024 |
| SYS_TIME | Modify system clock |
| NET_RAW | Use raw sockets (ping) |
| SYS_PTRACE | Trace processes (debugging) |
| CHOWN | Change file ownership |
The security best practice is to drop all capabilities and add back only what the application needs:
securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE
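This container can bind to port 80 (below 1024) but cannot change file ownership, modify the system clock, use raw sockets, or perform any other privileged operation. Note that an added capability is effective for a process running as root; a non-root process typically starts with an empty effective set unless its binary carries file capabilities.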
To see which capabilities a running container has:
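kubectl exec security-demo -- cat /proc/1/status | grep -i cap
# Example for a container that runs as root with the capabilities block above
# (the CapInh, CapBnd, and CapAmb lines are omitted here):
# CapPrm: 0000000000000400
# CapEff: 0000000000000400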
The hex values are bitmasks with one bit per capability: 0x400 is bit 10, which is NET_BIND_SERVICE. For exam purposes, you need to know the YAML syntax for dropping and adding capabilities, not the hex decoding.
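For the curious, the decoding is a bit test per capability. The sketch below assumes the bit order of the Linux capability list (bit 0 is CHOWN, bit 10 is NET_BIND_SERVICE); decode_caps is a hypothetical helper, and it needs a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Capability names in bit order (bit 0 = CHOWN ... bit 40 = CHECKPOINT_RESTORE).
CAP_NAMES="CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD LEASE
AUDIT_WRITE AUDIT_CONTROL SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM
BLOCK_SUSPEND AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE"

# decode_caps HEXMASK: prints the names of the capabilities set in the mask,
# separated by spaces (an empty line when no bit is set).
decode_caps() {
  mask=$((0x$1))
  bit=0
  out=""
  for name in $CAP_NAMES; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out }$name"
    fi
    bit=$((bit + 1))
  done
  printf '%s\n' "$out"
}

decode_caps 0000000000000400   # prints: NET_BIND_SERVICE
```

Where the libcap tools are installed, capsh --decode=0000000000000400 does the same job.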
A container without drop: ALL retains a default set of capabilities that varies by runtime. For common runtimes such as containerd and Docker, the default set includes CHOWN, DAC_OVERRIDE, FOWNER, FSETID, KILL, SETGID, SETUID, NET_BIND_SERVICE, NET_RAW, and several others. Dropping all of them and adding back selectively is the safest approach.
Complete Locked-Down Pod
This YAML combines the fields covered above into one production-hardened Pod, with each of them set to its most restrictive value:
apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10000
    runAsGroup: 10000
    fsGroup: 10000
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
  containers:
  - name: app
    # Assumes an nginx build configured to listen on 8080 and to keep its PID
    # file under a writable path. The stock image defaults (port 80, PID file
    # under /var/run) do not fit this non-root, read-only setup.
    image: nginx:1.25
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /var/cache/nginx
What this achieves:
- runAsNonRoot: true — rejects images that default to root without an explicit UID
- runAsUser: 10000 — process runs as an unprivileged UID
- runAsGroup: 10000 — primary group is unprivileged
- fsGroup: 10000 — volumes are accessible by the container’s group
- allowPrivilegeEscalation: false — no setuid/setgid escalation
- readOnlyRootFilesystem: true — root filesystem is immutable
- capabilities.drop: ALL — no Linux capabilities retained
- emptyDir volumes — writable space only where explicitly needed
Testing Security Context
Deploy the hardened Pod and verify each constraint:
kubectl apply -f hardened-pod.yaml
kubectl wait --for=condition=ready pod/hardened-pod --timeout=30s
Verify User and Group
kubectl exec hardened-pod -- whoami
# Reports that UID 10000 has no name, e.g. "whoami: unknown uid 10000"
# (exact wording depends on the image's whoami),
# or prints the mapped username if /etc/passwd contains UID 10000
kubectl exec hardened-pod -- id
# uid=10000 gid=10000 groups=10000
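When the same check has to be repeated, the id output can be compared against the expected numbers instead of being read by eye. A minimal sketch, with id_matches as a hypothetical helper name:

```shell
#!/bin/sh
# id_matches EXPECTED_UID EXPECTED_GID
# Reads `id` output on stdin and succeeds when the numeric uid and gid match.
# Names in parentheses, such as uid=101(nginx), are ignored.
id_matches() {
  line=$(cat)
  uid=$(printf '%s\n' "$line" | sed -n 's/^uid=\([0-9][0-9]*\).*/\1/p')
  gid=$(printf '%s\n' "$line" | sed -n 's/.* gid=\([0-9][0-9]*\).*/\1/p')
  [ "$uid" = "$1" ] && [ "$gid" = "$2" ]
}
```

Typical use would be kubectl exec hardened-pod -- id | id_matches 10000 10000 && echo "ids ok".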
Verify Read-Only Filesystem
kubectl exec hardened-pod -- touch /test
# touch: /test: Read-only file system
kubectl exec hardened-pod -- touch /tmp/test
# (succeeds — /tmp is an emptyDir)
Verify Capabilities
kubectl exec hardened-pod -- cat /proc/1/status | grep CapEff
# CapEff: 0000000000000000
# (no effective capabilities)
Verify No Privilege Escalation
kubectl exec hardened-pod -- cat /proc/1/status | grep NoNewPrivs
# NoNewPrivs: 1
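The capability and privilege-escalation checks both read /proc/1/status, so they can be folded into one pass. The helper below is a sketch under that assumption; status_is_locked_down is a hypothetical name.

```shell
#!/bin/sh
# status_is_locked_down: reads /proc/<pid>/status content on stdin and succeeds
# only when CapEff is all zeros and NoNewPrivs is 1.
status_is_locked_down() {
  awk '
    $1 == "CapEff:"     { capeff = $2 }
    $1 == "NoNewPrivs:" { nnp = $2 }
    END { exit ((capeff ~ /^0+$/ && nnp == "1") ? 0 : 1) }
  '
}
```

For example: kubectl exec hardened-pod -- cat /proc/1/status | status_is_locked_down && echo "locked down". A missing CapEff or NoNewPrivs line counts as a failure, which keeps the check conservative.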
Pod-Level vs Container-Level: When to Use Each
| Scenario | Level | Reason |
|---|---|---|
| All containers should run as non-root | Pod | Applies uniformly |
| Each container needs a different UID | Container | Override per container |
| Volumes need shared group access | Pod (fsGroup) | Only available at Pod level |
| Drop capabilities for one container | Container | Only available at container level |
| Read-only filesystem for one container | Container | Only available at container level |
| Shared security baseline for all containers | Pod | Set Pod-level, override at container level as needed |
Exam Strategy
SecurityContext tasks on the CKAD exam typically provide a running Pod and ask you to add security constraints. The workflow:
- Get the existing YAML: kubectl get pod <name> -o yaml > pod.yaml
- Delete the running Pod: kubectl delete pod <name>
- Edit the YAML: add the required securityContext fields at the correct level.
- Re-apply: kubectl apply -f pod.yaml
- Verify: kubectl exec <pod> -- id, kubectl exec <pod> -- touch /test
Know which fields exist at which level. A common exam mistake is placing capabilities at the Pod level (it only exists at the container level) or placing fsGroup at the container level (it only exists at the Pod level). These misplacements cause validation errors that cost precious minutes.
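As a memory aid, the field placement can be written as a lookup. The sketch covers only the fields listed in this article, not the full API, and field_level is a hypothetical helper name.

```shell
#!/bin/sh
# field_level FIELD: prints where a securityContext field may be set
# ("pod", "container", or "both"), for the fields covered in this article.
field_level() {
  case $1 in
    fsGroup|supplementalGroups|sysctls)
      echo "pod" ;;
    capabilities|readOnlyRootFilesystem|allowPrivilegeEscalation)
      echo "container" ;;
    runAsUser|runAsNonRoot|runAsGroup|seLinuxOptions|seccompProfile)
      echo "both" ;;
    *)
      echo "unknown"
      return 1 ;;
  esac
}

field_level capabilities   # prints: container
field_level fsGroup        # prints: pod
```

On a live cluster, kubectl explain pod.spec.securityContext and kubectl explain pod.spec.containers.securityContext list the authoritative fields for each level.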
Practice combining multiple security constraints in a single Pod. Exam tasks frequently ask for three or four security settings at once — for example, “run as user 1000, drop all capabilities, set the root filesystem to read-only, and prevent privilege escalation.” Having the YAML structure memorized means you can write a complete locked-down Pod spec in under two minutes without consulting documentation.