StatefulSets, DaemonSets, and When to Use Each
Covers StatefulSets in depth: stable Pod identity with ordinal naming, headless Service for stable DNS, volumeClaimTemplates for per-Pod persistent storage, ordered vs parallel Pod management, and update strategies. Covers DaemonSets: one Pod per node, use cases for cluster-wide agents, nodeSelector for targeted scheduling, update strategies, and tolerations. Includes complete YAML manifests for both resource types and three hands-on exercises.
Deployments treat Pods as interchangeable cattle — any Pod can be replaced by any other Pod without affecting the application. This works for stateless web servers, API gateways, and worker processes. It fails catastrophically for databases, message brokers, and distributed consensus systems that depend on stable identities and persistent storage.
Kubernetes provides two specialized workload controllers for scenarios where Deployments fall short: StatefulSets for applications that need stable identity and storage, and DaemonSets for applications that need exactly one Pod on every node.
StatefulSets
The Problem with Deployments for Stateful Apps
Consider deploying a three-node PostgreSQL cluster with streaming replication. Each node needs:
- A stable hostname so replicas can connect to the primary at a predictable address
- Persistent storage that survives Pod restarts and rescheduling — and that stays bound to the same Pod identity
- Ordered startup so the primary initializes before replicas attempt to connect
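A Deployment provides none of these. Pod names combine a ReplicaSet hash with a random suffix (postgres-7d6f8b5c4d-xkm2p) and change on every replacement. Every replica shares whatever PVC the Pod template references, so per-replica storage must be managed by hand. Pods start and stop in arbitrary order. A StatefulSet addresses all three requirements.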
StatefulSet Guarantees
A StatefulSet provides three guarantees that Deployments do not:
1. Stable, unique Pod identity. Each Pod gets a predictable name derived from the StatefulSet name and an ordinal index: web-0, web-1, web-2. This name persists across restarts and rescheduling. If web-1 is deleted, the replacement Pod is also named web-1.
2. Stable, persistent storage. Each Pod gets its own PersistentVolumeClaim through volumeClaimTemplates. The PVC is named <template-name>-<pod-name> (e.g., data-web-0). When a Pod is rescheduled to a different node, it reattaches to the same PVC, provided the underlying volume can be attached from that node (node-local volumes cannot move). When a Pod is deleted, its PVC is not deleted, so the data persists.
3. Ordered, graceful management. By default, Pods are created in order (0, 1, 2) and terminated in reverse order (2, 1, 0). Each Pod must be Running and Ready before the next Pod is created. This ensures the primary database node is available before replicas attempt to connect.
Pod Naming and Ordinal Index
StatefulSet Pods follow the naming pattern <statefulset-name>-<ordinal>:
web-0 # First Pod (ordinal 0)
web-1 # Second Pod (ordinal 1)
web-2 # Third Pod (ordinal 2)
The ordinal index is stable — if web-1 fails and is replaced, the new Pod is still web-1. This stability allows applications to embed identity into their configuration. A Kafka broker can derive its broker ID from the ordinal. A Redis Cluster node can derive its slot assignment.
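As a rough illustration of the naming rule, the POSIX shell sketch below prints the Pod names a StatefulSet would use. The function name statefulset_pod_names is hypothetical and exists only for this example; it does string formatting and never contacts a cluster.

```shell
# Hypothetical helper: print the Pod names for a StatefulSet, one per line.
# Usage: statefulset_pod_names <statefulset-name> <replicas>
statefulset_pod_names() {
  name=$1
  replicas=$2
  i=0
  # Ordinals start at 0 and run up to replicas - 1.
  while [ "$i" -lt "$replicas" ]; do
    printf '%s-%s\n' "$name" "$i"
    i=$((i + 1))
  done
}

statefulset_pod_names web 3
```

An application entrypoint could apply the same idea in reverse, for example by stripping everything up to the last hyphen of its hostname to recover its ordinal.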
Headless Service Requirement
A StatefulSet requires a headless Service — a Service with clusterIP: None. This Service doesn’t load-balance traffic to a random Pod. Instead, it creates individual DNS records for each Pod:
web-0.nginx-headless.default.svc.cluster.local
web-1.nginx-headless.default.svc.cluster.local
web-2.nginx-headless.default.svc.cluster.local
The DNS record format is <pod-name>.<service-name>.<namespace>.svc.cluster.local. Each record resolves to the Pod’s IP address, allowing other Pods to connect to a specific StatefulSet member by name.
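The record format can be sketched as a small shell function. The name pod_fqdn is hypothetical, and the sketch assumes the default cluster domain cluster.local, which some clusters override.

```shell
# Hypothetical helper: build the per-Pod DNS name behind a headless Service.
# Usage: pod_fqdn <pod-name> <service-name> [namespace]
# Assumption: the cluster domain is the default, cluster.local.
pod_fqdn() {
  printf '%s.%s.%s.svc.cluster.local\n' "$1" "$2" "${3:-default}"
}

pod_fqdn web-0 nginx-headless default
# From a Pod in the same namespace, the short form web-0.nginx-headless
# is normally enough, because the resolver search path supplies the rest.
```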
The headless Service is defined separately from the StatefulSet and linked via the serviceName field:
apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
volumeClaimTemplates
Instead of referencing an existing PVC, a StatefulSet defines PVC templates that generate a unique PVC for each Pod:
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: ["ReadWriteOnce"]
    resources:
      requests:
        storage: 1Gi
This template creates PVCs named data-web-0, data-web-1, data-web-2. Each PVC stays bound to its Pod identity and, by default, is not deleted when the StatefulSet is scaled down or deleted. This is intentional: persistent data should survive workload changes.
To reclaim storage after a StatefulSet is deleted, manually delete the PVCs:
kubectl delete pvc data-web-0 data-web-1 data-web-2
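For larger StatefulSets, typing every PVC name is error-prone. The sketch below derives the names from the <template-name>-<pod-name> rule and prints the delete command for review rather than running it. The helper statefulset_pvc_names is hypothetical.

```shell
# Hypothetical helper: print the PVC names created from one volumeClaimTemplate.
# Usage: statefulset_pvc_names <template-name> <statefulset-name> <replicas>
statefulset_pvc_names() {
  template=$1
  name=$2
  replicas=$3
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '%s-%s-%s\n' "$template" "$name" "$i"
    i=$((i + 1))
  done
}

# Print the command instead of executing it. The substitution is left
# unquoted on purpose so the names are joined by single spaces.
echo kubectl delete pvc $(statefulset_pvc_names data web 3)
```

If the StatefulSet was ever scaled higher, pass the highest replica count it reached, since PVCs left behind by removed Pods still exist.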
Complete StatefulSet YAML
apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
  labels:
    app: nginx
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - port: 80
    name: web
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx-headless
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
Key fields:
- serviceName: must match the headless Service name. This field is required; the StatefulSet won’t be created without it.
- selector.matchLabels: must match the Pod template labels, same as Deployments.
- volumeClaimTemplates: defines PVCs that are created per Pod. The name here (data) matches the volumeMounts[].name in the container spec.
podManagementPolicy
The default policy is OrderedReady — Pods are created sequentially (0, then 1, then 2), each waiting for the previous Pod to be Running and Ready.
For applications that don’t require ordered startup, set Parallel:
spec:
  podManagementPolicy: Parallel
  replicas: 3
With Parallel, all Pods are created simultaneously, like a Deployment. This reduces startup time but removes the ordering guarantee. Use Parallel when Pods are independent — for example, a StatefulSet used purely for stable storage identifiers without inter-Pod dependencies.
StatefulSet Update Strategies
StatefulSets support two update strategies:
RollingUpdate (default). Pods are updated in reverse ordinal order (2, 1, 0). Each Pod is terminated, recreated, and must become Running and Ready before the controller moves to the next. The partition parameter restricts the update to a subset of Pods: only Pods with an ordinal greater than or equal to the partition value are updated. This enables canary deployments:
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2  # Only update Pod web-2
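To see which Pods a given partition value selects, the following sketch lists them in update order. It only models the rule stated above (ordinal greater than or equal to the partition, highest ordinal first); the function name rolling_update_targets is hypothetical.

```shell
# Hypothetical helper: list the Pods a RollingUpdate would replace, in the
# order it replaces them (highest ordinal first).
# Usage: rolling_update_targets <statefulset-name> <replicas> <partition>
rolling_update_targets() {
  name=$1
  replicas=$2
  partition=$3
  i=$((replicas - 1))
  # Stop as soon as the ordinal drops below the partition (or below 0).
  while [ "$i" -ge "$partition" ] && [ "$i" -ge 0 ]; do
    printf '%s-%s\n' "$name" "$i"
    i=$((i - 1))
  done
}

rolling_update_targets web 3 2   # prints web-2
rolling_update_targets web 3 0   # prints web-2, web-1, web-0 (one per line)
```

A canary rollout under this model starts with the partition at the highest ordinal and lowers it step by step once the updated Pods look healthy.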
OnDelete. Pods are not automatically updated. You manually delete Pods, and the StatefulSet controller recreates them with the new spec. This gives full control over the update order:
spec:
  updateStrategy:
    type: OnDelete
Scaling a StatefulSet
Scale up:
kubectl scale statefulset web --replicas=5
New Pods are created in order: web-3, then web-4. Each waits for the previous to be Ready (unless podManagementPolicy: Parallel).
Scale down:
kubectl scale statefulset web --replicas=2
Pods are removed in reverse order: web-4, then web-3, then web-2. Their PVCs remain — data is preserved even after scale-down.
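The ordering for both directions can be sketched as one function. It assumes the default OrderedReady policy and prints the planned actions only; the name scale_order is hypothetical.

```shell
# Hypothetical helper: print the Pods affected by a replica change, in the
# order the default OrderedReady policy would handle them.
# Usage: scale_order <statefulset-name> <current-replicas> <target-replicas>
scale_order() {
  name=$1
  current=$2
  target=$3
  if [ "$target" -gt "$current" ]; then
    # Scale up: lowest new ordinal first.
    i=$current
    while [ "$i" -lt "$target" ]; do
      printf 'create %s-%s\n' "$name" "$i"
      i=$((i + 1))
    done
  else
    # Scale down: highest ordinal first.
    i=$((current - 1))
    while [ "$i" -ge "$target" ]; do
      printf 'delete %s-%s\n' "$name" "$i"
      i=$((i - 1))
    done
  fi
}

scale_order web 3 5   # create web-3, then create web-4
scale_order web 5 2   # delete web-4, then web-3, then web-2
```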
DaemonSets
One Pod Per Node
A DaemonSet ensures that exactly one copy of a Pod runs on every node (or a selected subset of nodes) in the cluster. When a new node joins the cluster, the DaemonSet controller schedules a Pod on it. When a node is removed, the Pod is garbage collected.
This differs fundamentally from Deployments and StatefulSets, which manage a fixed replica count distributed across available nodes by the scheduler. A DaemonSet’s replica count is determined by the number of matching nodes, not by a replicas field.
Use Cases
DaemonSets are the standard pattern for cluster-wide infrastructure agents:
| Agent Type | Examples |
|---|---|
| Log collection | Fluentd, Fluent Bit, Filebeat |
| Monitoring | Prometheus Node Exporter, Datadog agent |
| Networking | Calico, Cilium, kube-proxy |
| Storage | CSI node drivers, local-volume-provisioner |
| Security | Falco, Twistlock defenders |
These agents need to run on every node because they collect node-level data (logs, metrics, network packets) or provide node-level services (network routing, volume mounting).
Complete DaemonSet YAML
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  labels:
    app: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.16
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
Key observations:
- No replicas field. The number of Pods is determined by the number of matching nodes.
- hostPath volumes. DaemonSets commonly mount host directories because they need access to node-level data. This is a legitimate use of hostPath: unlike in Deployments, each DaemonSet Pod runs on a unique node, so there’s no contention.
- Resource limits. Essential for DaemonSet Pods to prevent a logging agent from consuming all CPU or memory on a node.
Node Selection with nodeSelector
By default, a DaemonSet runs on every node, including control plane nodes (if they have no taints preventing it). To restrict a DaemonSet to specific nodes, use nodeSelector:
spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: ""
This runs the DaemonSet Pod only on nodes labeled node-role.kubernetes.io/worker. Use this to exclude control plane nodes or target nodes with specific hardware (GPU nodes, SSD nodes).
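The effect of a nodeSelector on the Pod count can be modeled offline. The sketch below filters a made-up node list by one label; the node names, the labels, and the helper matching_nodes are all hypothetical, and only a single key=value pair is modeled.

```shell
# Hypothetical helper: read "node-name label=value,label=value" lines on
# stdin and print the nodes whose labels include the given key=value pair.
# Usage: matching_nodes <key=value>
matching_nodes() {
  want=$1
  while read -r node labels; do
    # Wrap both sides in commas so only whole labels match.
    case ",$labels," in
      *",$want,"*) echo "$node" ;;
    esac
  done
}

# Made-up inventory: two of the three nodes carry disk=ssd.
printf '%s\n' \
  'worker-1 disk=ssd,zone=a' \
  'worker-2 disk=hdd,zone=a' \
  'worker-3 disk=ssd,zone=b' | matching_nodes disk=ssd
```

Each node printed would receive one DaemonSet Pod. Against a real cluster, kubectl get nodes -l disk=ssd answers the same question.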
Verify which nodes are running DaemonSet Pods:
kubectl get pods -l app=log-collector -o wide
NAME                  READY   STATUS    NODE
log-collector-abc12   1/1     Running   worker-1
log-collector-def34   1/1     Running   worker-2
log-collector-ghi56   1/1     Running   worker-3
Tolerations for Control Plane Nodes
Control plane nodes typically have a taint that prevents regular Pods from scheduling on them:
node-role.kubernetes.io/control-plane:NoSchedule
If you need a DaemonSet Pod on control plane nodes (e.g., for monitoring), add a toleration:
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
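How this toleration lines up with the taint can be sketched as a simple comparison. This is a deliberately simplified model of operator: Exists only; the function name tolerates is hypothetical, and other operators and empty keys are not covered.

```shell
# Hypothetical helper: succeed if a toleration with operator Exists covers a taint.
# Usage: tolerates <taint-key> <taint-effect> <toleration-key> [toleration-effect]
# Simplified model: keys must be equal, and an omitted toleration effect
# matches every effect.
tolerates() {
  [ "$1" = "$3" ] && { [ -z "${4:-}" ] || [ "$2" = "$4" ]; }
}

key=node-role.kubernetes.io/control-plane
if tolerates "$key" NoSchedule "$key" NoSchedule; then
  echo "Pod may be scheduled on the control plane node"
else
  echo "Pod is kept off the control plane node"
fi
```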
DaemonSet Update Strategies
DaemonSets support two update strategies:
RollingUpdate (default). Pods are replaced node by node. The maxUnavailable parameter (default 1) controls how many Pods can be down simultaneously:
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
OnDelete. Pods are not automatically updated. You manually delete Pods on specific nodes, and the DaemonSet controller recreates them with the new spec.
Checking DaemonSet Status
kubectl get daemonset log-collector
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
log-collector   3         3         3       3            3           <none>          5d
Key columns:
- DESIRED: number of nodes that should run the Pod
- CURRENT: number of Pods created
- READY: number of Pods in Ready state
- UP-TO-DATE: number of Pods running the latest spec
- AVAILABLE: number of Pods available (Ready for at least minReadySeconds)
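If DESIRED is lower than the number of nodes you expected, some nodes are probably excluded by the nodeSelector or carry taints the DaemonSet doesn’t tolerate. If DESIRED ≠ CURRENT, the controller has not yet created Pods on every eligible node. If CURRENT ≠ READY, Pods may be failing their health checks.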
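A first-pass check of these columns can be scripted. The sketch below takes the three counts as arguments and prints where to look next; the function name daemonset_hint and its messages are hypothetical.

```shell
# Hypothetical helper: turn three DaemonSet status counts into a hint.
# Usage: daemonset_hint <desired> <current> <ready>
daemonset_hint() {
  desired=$1
  current=$2
  ready=$3
  if [ "$desired" -ne "$current" ]; then
    echo "DESIRED and CURRENT differ: run kubectl describe daemonset and read the events"
  elif [ "$current" -ne "$ready" ]; then
    echo "CURRENT and READY differ: inspect Pod probes and logs"
  else
    echo "all scheduled Pods are ready"
  fi
}

daemonset_hint 3 3 3
daemonset_hint 3 3 2
```

On a live cluster the three numbers are available as .status.desiredNumberScheduled, .status.currentNumberScheduled, and .status.numberReady, for example through kubectl get daemonset log-collector -o jsonpath.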
StatefulSet vs Deployment vs DaemonSet
| Characteristic | Deployment | StatefulSet | DaemonSet |
|---|---|---|---|
| Pod names | Random hash | Ordinal (web-0, web-1) | Random hash per node |
| Pod identity | Interchangeable | Stable, unique | One per node |
| Scaling | replicas field | replicas field | Number of nodes |
| Storage | Shared PVC or none | Per-Pod PVC (volumeClaimTemplates) | Typically hostPath |
| Startup order | Arbitrary | Sequential (by default) | Arbitrary (per node) |
| Shutdown order | Arbitrary | Reverse sequential | Arbitrary |
| DNS | Service load-balancing | Per-Pod DNS via headless Service | N/A (node-level) |
| Use case | Stateless apps | Databases, caches, consensus | Node agents, log/metrics |
Decision rule: If Pods are interchangeable, use a Deployment. If Pods need stable identity and storage, use a StatefulSet. If you need one Pod per node, use a DaemonSet.
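The decision rule is small enough to write down as a function. The sketch below is one possible encoding; the name choose_controller and the yes/no arguments are hypothetical, and checking the per-node requirement first is an assumption of this example rather than something the rule prescribes.

```shell
# Hypothetical helper: pick a workload controller from two yes/no answers.
# Usage: choose_controller <one-pod-per-node: yes|no> <stable-identity-or-storage: yes|no>
choose_controller() {
  if [ "$1" = yes ]; then
    echo DaemonSet
  elif [ "$2" = yes ]; then
    echo StatefulSet
  else
    echo Deployment
  fi
}

choose_controller no no    # stateless web server
choose_controller no yes   # replicated database
choose_controller yes no   # node-level log collector
```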
Exercises
Exercise 1: Helm Chart Installation with Custom Values
Requirements:
- Add the Bitnami Helm repository
- Install the bitnami/nginx chart as a release named my-web in namespace helm-exercise with:
  - 2 replicas
  - Service type ClusterIP
- Verify the release is deployed and the Pods are running
- Upgrade the release to 3 replicas
- Roll back to the original 2-replica configuration
- Verify the rollback was successful
Exercise 2: StatefulSet with Headless Service
Requirements:
- Create a namespace stateful-exercise
- Create a headless Service named web-headless in stateful-exercise with:
  - No cluster IP (clusterIP: None)
  - Selector: app: web
  - Port 80
- Create a StatefulSet named web in stateful-exercise with:
  - 3 replicas
  - serviceName: web-headless
  - Container: nginx:1.25
  - A volumeClaimTemplate named html requesting 100Mi storage
  - Mount the volume at /usr/share/nginx/html
- Verify Pods are named web-0, web-1, web-2
- Verify each Pod has its own PVC: html-web-0, html-web-1, html-web-2
- Write a unique file to each Pod’s volume and verify it persists after Pod deletion:
  kubectl exec web-0 -n stateful-exercise -- sh -c 'echo "pod-0-data" > /usr/share/nginx/html/index.html'
- Delete web-0 and verify the replacement Pod retains the data
- Verify stable DNS resolution:
  kubectl run dns-test --rm -it --image=busybox -n stateful-exercise -- nslookup web-0.web-headless
Exercise 3: DaemonSet with Node Selection
Requirements:
- Create a namespace daemon-exercise
- Label one of your nodes with disk=ssd:
  kubectl label node <node-name> disk=ssd
- Create a DaemonSet named node-monitor in daemon-exercise with:
  - Container: busybox:1.36
  - Command: ["sh", "-c", "while true; do echo $(hostname) $(date); sleep 60; done"]
  - nodeSelector: { disk: ssd }
  - Resource requests: 50m CPU, 64Mi memory
- Verify the DaemonSet only runs on the labeled node(s)
- Label a second node with disk=ssd and verify a new Pod appears automatically
- Remove the label from one node and verify the Pod is removed:
  kubectl label node <node-name> disk-
Solutions are provided in the next chapter.