StatefulSets, DaemonSets, and When to Use Each

Deployments treat Pods as interchangeable cattle — any Pod can be replaced by any other Pod without affecting the application. This works for stateless web servers, API gateways, and worker processes. It fails catastrophically for databases, message brokers, and distributed consensus systems that depend on stable identities and persistent storage.

Kubernetes provides two specialized workload controllers for scenarios where Deployments fall short: StatefulSets for applications that need stable identity and storage, and DaemonSets for applications that need exactly one Pod on every node.

StatefulSets

The Problem with Deployments for Stateful Apps

Consider deploying a three-node PostgreSQL cluster with streaming replication. Each node needs:

A stable hostname so replicas can connect to the primary at a predictable address
Persistent storage that survives Pod restarts and rescheduling — and that stays bound to the same Pod identity
Ordered startup so the primary initializes before replicas attempt to connect

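A Deployment provides none of these. Pod names end in generated suffixes (postgres-7d6f8b5c4d-xkm2p) that change whenever a Pod is replaced. Every replica references the same PVC from the Pod template, so per-Pod volumes must be created and wired up by hand. Pods start and stop in arbitrary order. A StatefulSet addresses all three requirements.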

StatefulSet Guarantees

A StatefulSet provides three guarantees that Deployments do not:

1. Stable, unique Pod identity. Each Pod gets a predictable name derived from the StatefulSet name and an ordinal index: web-0, web-1, web-2. This name persists across restarts and rescheduling. If web-1 is deleted, the replacement Pod is also named web-1.

2. Stable, persistent storage. Each Pod gets its own PersistentVolumeClaim through volumeClaimTemplates. The PVC is named <template-name>-<pod-name> (e.g., data-web-0). When a Pod is rescheduled to a different node, it reattaches to the same PVC, provided the underlying volume can be attached from the new node (zonal and node-local volumes restrict where the Pod can land). When a Pod is deleted, its PVC is not deleted, so the data persists.

3. Ordered, graceful management. By default, Pods are created in order (0, 1, 2) and terminated in reverse order (2, 1, 0). Each Pod must be Running and Ready before the next Pod is created. This ensures the primary database node is available before replicas attempt to connect.

Pod Naming and Ordinal Index

StatefulSet Pods follow the naming pattern <statefulset-name>-<ordinal>:

web-0    # First Pod (ordinal 0)
web-1    # Second Pod (ordinal 1)
web-2    # Third Pod (ordinal 2)

The ordinal index is stable — if web-1 fails and is replaced, the new Pod is still web-1. This stability allows applications to embed identity into their configuration. A Kafka broker can derive its broker ID from the ordinal. A Redis Cluster node can derive its slot assignment.
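
As a minimal sketch of how an entrypoint script might derive identity from the Pod name, the snippet below strips everything up to the last hyphen. Inside a StatefulSet Pod the hostname equals the Pod name, so the function would typically be called with "$(hostname)". The function names and the 1-based server_id convention are illustrative assumptions, not part of Kubernetes.

```shell
#!/bin/sh
# pod_ordinal <pod-name>
# Print the ordinal index of a StatefulSet Pod name such as "web-2".
pod_ordinal() {
  # Remove the longest prefix ending in "-", leaving only the ordinal.
  printf '%s\n' "${1##*-}"
}

# server_id <pod-name>
# Hypothetical convention: a 1-based ID derived from the 0-based ordinal.
server_id() {
  echo $(( $(pod_ordinal "$1") + 1 ))
}

pod_ordinal web-2   # prints 2
server_id web-0     # prints 1
```

Because only the text after the last hyphen is used, StatefulSet names that themselves contain hyphens (for example kafka-broker-10) are handled the same way.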

Headless Service Requirement

A StatefulSet requires a headless Service — a Service with clusterIP: None. This Service doesn’t load-balance traffic to a random Pod. Instead, it creates individual DNS records for each Pod:

web-0.nginx-headless.default.svc.cluster.local
web-1.nginx-headless.default.svc.cluster.local
web-2.nginx-headless.default.svc.cluster.local

The DNS record format is <pod-name>.<service-name>.<namespace>.svc.cluster.local. Each record resolves to the Pod’s IP address, allowing other Pods to connect to a specific StatefulSet member by name.
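
The record format can be turned into a small helper that builds a peer list, for example to seed a cluster configuration file. This is a sketch that assumes the default cluster domain cluster.local; the function names are hypothetical.

```shell
#!/bin/sh
# pod_fqdn <statefulset> <ordinal> <service> [namespace]
# Build the per-Pod DNS name; the namespace defaults to "default".
pod_fqdn() {
  printf '%s-%s.%s.%s.svc.cluster.local\n' "$1" "$2" "$3" "${4:-default}"
}

# peer_list <statefulset> <replicas> <service> [namespace]
# Print one DNS name per replica, ordinals 0 .. replicas-1.
peer_list() {
  i=0
  while [ "$i" -lt "$2" ]; do
    pod_fqdn "$1" "$i" "$3" "${4:-default}"
    i=$((i + 1))
  done
}

peer_list web 3 nginx-headless
```

From inside the same namespace, the shorter form <pod-name>.<service-name> normally resolves as well, because the Pod's DNS search path supplies the rest.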

The headless Service is defined separately from the StatefulSet and linked via the serviceName field:

apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80

volumeClaimTemplates

Instead of referencing an existing PVC, a StatefulSet defines PVC templates that generate a unique PVC for each Pod:

volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi

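This template creates PVCs named data-web-0, data-web-1, data-web-2. Each PVC is bound to its respective Pod and, by default, is not deleted when the StatefulSet is scaled down or deleted. This is intentional: persistent data should survive workload changes. (Newer Kubernetes releases let you change this behavior through the spec.persistentVolumeClaimRetentionPolicy field.)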

To reclaim storage after a StatefulSet is deleted, manually delete the PVCs:

kubectl delete pvc data-web-0 data-web-1 data-web-2
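
For larger StatefulSets, typing each PVC name by hand is error-prone. The sketch below generates the names from the <template-name>-<pod-name> pattern; pvc_names is a hypothetical helper, and it only prints names rather than deleting anything.

```shell
#!/bin/sh
# pvc_names <template-name> <statefulset-name> <count>
# Print the PVC names for ordinals 0 .. count-1, one per line.
pvc_names() {
  i=0
  while [ "$i" -lt "$3" ]; do
    printf '%s-%s-%s\n' "$1" "$2" "$i"
    i=$((i + 1))
  done
}

pvc_names data web 3
```

Review the printed list first. Piping it into xargs kubectl delete pvc would then remove exactly those claims.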

Complete StatefulSet YAML

apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
  labels:
    app: nginx
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
    - port: 80
      name: web
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx-headless
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi

Key fields:

serviceName. Names the headless Service that governs the StatefulSet’s network identity. It must match the Service name exactly; otherwise the per-Pod DNS records are not created.
selector.matchLabels. Must match the Pod template labels, same as Deployments.
volumeClaimTemplates. Defines PVCs that are created per Pod. The name here (data) matches the volumeMounts[].name in the container spec.

podManagementPolicy

The default policy is OrderedReady — Pods are created sequentially (0, then 1, then 2), each waiting for the previous Pod to be Running and Ready.

For applications that don’t require ordered startup, set Parallel:

spec:
  podManagementPolicy: Parallel
  replicas: 3

With Parallel, all Pods are created simultaneously, like a Deployment. This reduces startup time but removes the ordering guarantee. Use Parallel when Pods are independent — for example, a StatefulSet used purely for stable storage identifiers without inter-Pod dependencies.

StatefulSet Update Strategies

StatefulSets support two update strategies:

RollingUpdate (default). Pods are updated in reverse ordinal order (2, 1, 0). Each Pod is terminated, recreated, and must become Running and Ready before the controller moves to the next. The partition parameter restricts the update to a subset of Pods: only Pods with an ordinal greater than or equal to the partition value are updated. This enables canary deployments:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # Only update Pod web-2
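
To see which Pods a given partition value would touch, the rule (ordinal greater than or equal to partition, highest ordinal first) can be sketched as a short function. rolling_update_order is a hypothetical helper that models the rule; it does not query a cluster.

```shell
#!/bin/sh
# rolling_update_order <replicas> <partition>
# Print the ordinals a RollingUpdate would replace, in update order.
rolling_update_order() {
  i=$(( $1 - 1 ))
  while [ "$i" -ge "$2" ]; do
    echo "$i"
    i=$((i - 1))
  done
}

rolling_update_order 3 2   # prints 2
rolling_update_order 3 0   # prints 2, 1, 0 (one per line)
```

Lowering the partition step by step (2, then 1, then 0) widens the rollout one Pod at a time.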

OnDelete. Pods are not automatically updated. You manually delete Pods, and the StatefulSet controller recreates them with the new spec. This gives full control over the update order:

spec:
  updateStrategy:
    type: OnDelete

Scaling a StatefulSet

Scale up:

kubectl scale statefulset web --replicas=5

New Pods are created in order: web-3, then web-4. Each waits for the previous to be Ready (unless podManagementPolicy: Parallel).

Scale down:

kubectl scale statefulset web --replicas=2

Pods are removed in reverse order: web-4, then web-3, then web-2. Their PVCs remain — data is preserved even after scale-down.
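
The ordering for both directions can be modeled in a few lines. This is an illustrative sketch of the default OrderedReady behavior; scale_plan is a hypothetical name and the function only prints the sequence.

```shell
#!/bin/sh
# scale_plan <statefulset-name> <current-replicas> <target-replicas>
# Print the Pods created (ascending) or deleted (descending) by a scale.
scale_plan() {
  name=$1; cur=$2; tgt=$3
  if [ "$tgt" -gt "$cur" ]; then
    i=$cur
    while [ "$i" -lt "$tgt" ]; do
      echo "create $name-$i"
      i=$((i + 1))
    done
  else
    i=$((cur - 1))
    while [ "$i" -ge "$tgt" ]; do
      echo "delete $name-$i"
      i=$((i - 1))
    done
  fi
}

scale_plan web 3 5   # create web-3, create web-4
scale_plan web 5 2   # delete web-4, delete web-3, delete web-2
```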

DaemonSets

One Pod Per Node

A DaemonSet ensures that exactly one copy of a Pod runs on every node (or a selected subset of nodes) in the cluster. When a new node joins the cluster, the DaemonSet controller schedules a Pod on it. When a node is removed, the Pod is garbage collected.

This differs fundamentally from Deployments and StatefulSets, which manage a fixed replica count distributed across available nodes by the scheduler. A DaemonSet’s replica count is determined by the number of matching nodes, not by a replicas field.

Use Cases

DaemonSets are the standard pattern for cluster-wide infrastructure agents:

Agent Type	Examples
Log collection	Fluentd, Fluent Bit, Filebeat
Monitoring	Prometheus Node Exporter, Datadog agent
Networking	Calico, Cilium, kube-proxy
Storage	CSI node drivers, local-volume-provisioner
Security	Falco, Twistlock defenders

These agents need to run on every node because they collect node-level data (logs, metrics, network packets) or provide node-level services (network routing, volume mounting).

Complete DaemonSet YAML

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  labels:
    app: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: fluentd
          image: fluentd:v1.16
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers

Key observations:

No replicas field. The number of Pods is determined by the number of matching nodes.
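hostPath volumes. DaemonSets commonly mount host directories because they need access to node-level data. This is a legitimate use of hostPath: unlike in Deployments, each DaemonSet Pod runs on a unique node, so there’s no contention. Note that /var/lib/docker/containers exists only on nodes running Docker Engine; on containerd or CRI-O nodes, container logs are found under /var/log/pods.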
Resource limits. Essential for DaemonSet Pods to prevent a logging agent from consuming all CPU or memory on a node.

Node Selection with nodeSelector

By default, a DaemonSet runs on every node, including control plane nodes (if they have no taints preventing it). To restrict a DaemonSet to specific nodes, use nodeSelector:

spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: ""

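This runs the DaemonSet Pod only on nodes labeled node-role.kubernetes.io/worker. Kubernetes does not apply this label automatically on most clusters, so add it yourself or select on a label your nodes already carry. Use this to exclude control plane nodes or target nodes with specific hardware (GPU nodes, SSD nodes).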

Verify which nodes are running DaemonSet Pods:

kubectl get pods -l app=log-collector -o wide

NAME                  READY   STATUS    NODE
log-collector-abc12   1/1     Running   worker-1
log-collector-def34   1/1     Running   worker-2
log-collector-ghi56   1/1     Running   worker-3

Tolerations for Control Plane Nodes

Control plane nodes typically have a taint that prevents regular Pods from scheduling on them:

node-role.kubernetes.io/control-plane:NoSchedule

If you need a DaemonSet Pod on control plane nodes (e.g., for monitoring), add a toleration:

spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
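
Before adding tolerations, it helps to see which taints the nodes actually carry, since the key differs across distributions and versions. A minimal sketch, assuming kubectl is configured for the target cluster; list_node_taints is a hypothetical wrapper name.

```shell
#!/bin/sh
# list_node_taints
# Print each node name alongside the keys of its taints (<none> if untainted).
list_node_taints() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Call it against a live cluster:
#   list_node_taints
```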

DaemonSet Update Strategies

DaemonSets support two update strategies:

RollingUpdate (default). Pods are replaced node by node. The maxUnavailable parameter controls how many Pods can be down simultaneously; it defaults to 1, which updates one node at a time:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1

OnDelete. Pods are not automatically updated. You manually delete Pods on specific nodes, and the DaemonSet controller recreates them with the new spec.
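
With RollingUpdate, progress can be followed from the command line. The wrapper below is a sketch: the function name and the five-minute timeout are arbitrary choices, and it assumes kubectl access to the cluster.

```shell
#!/bin/sh
# watch_ds_rollout <daemonset-name> [namespace]
# Block until the DaemonSet rollout finishes or the timeout expires.
watch_ds_rollout() {
  kubectl rollout status "daemonset/$1" -n "${2:-default}" --timeout=5m
}

# Example against a live cluster:
#   watch_ds_rollout log-collector
```

The command exits non-zero if the rollout does not complete in time, which makes it usable as a gate in a deployment script.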

Checking DaemonSet Status

kubectl get daemonset log-collector

NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
log-collector   3         3         3       3            3           <none>          5d

Key columns:

DESIRED: number of nodes that should run the Pod
CURRENT: number of Pods created
READY: number of Pods in Ready state
UP-TO-DATE: number of Pods running the latest spec
AVAILABLE: number of Pods that have been Ready for at least minReadySeconds (0 by default)

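If DESIRED is lower than the number of nodes you expect, check the nodeSelector and whether those nodes carry taints the DaemonSet doesn’t tolerate; such nodes are not counted. If DESIRED ≠ CURRENT, Pods are missing on some eligible nodes, so inspect the events shown by kubectl describe daemonset. If CURRENT ≠ READY, Pods may be Pending or failing their health checks.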
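
The column comparison can be scripted for use in a health check. The sketch below works on three plain numbers so it can be tried without a cluster; ds_health and its output labels are hypothetical.

```shell
#!/bin/sh
# ds_health <desired> <current> <ready>
# Classify a DaemonSet from its status counters.
ds_health() {
  if [ "$2" -lt "$1" ]; then
    echo "pods-missing"      # fewer Pods exist than nodes that want one
  elif [ "$3" -lt "$2" ]; then
    echo "pods-not-ready"    # Pods exist but some are not Ready
  else
    echo "ok"
  fi
}

ds_health 3 3 3   # prints ok
ds_health 3 3 1   # prints pods-not-ready
```

On a live cluster the three values could be read from the DaemonSet status fields desiredNumberScheduled, currentNumberScheduled, and numberReady, for example with kubectl get daemonset log-collector -o jsonpath.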

StatefulSet vs Deployment vs DaemonSet

Characteristic	Deployment	StatefulSet	DaemonSet
Pod names	Random hash	Ordinal (web-0, web-1)	Random hash per node
Pod identity	Interchangeable	Stable, unique	One per node
Scaling	replicas field	replicas field	Number of nodes
Storage	Shared PVC or none	Per-Pod PVC (volumeClaimTemplates)	Typically hostPath
Startup order	Arbitrary	Sequential (by default)	Arbitrary (per node)
Shutdown order	Arbitrary	Reverse sequential	Arbitrary
DNS	Service load-balancing	Per-Pod DNS via headless Service	N/A (node-level)
Use case	Stateless apps	Databases, caches, consensus	Node agents, log/metrics

Decision rule: If Pods are interchangeable, use a Deployment. If Pods need stable identity and storage, use a StatefulSet. If you need one Pod per node, use a DaemonSet.

Exercises

Exercise 1: Helm Chart Installation with Custom Values

Requirements:

Add the Bitnami Helm repository
Install the bitnami/nginx chart as a release named my-web in namespace helm-exercise with:
- 2 replicas
- Service type ClusterIP
Verify the release is deployed and the Pods are running
Upgrade the release to 3 replicas
Roll back to the original 2-replica configuration
Verify the rollback was successful

Exercise 2: StatefulSet with Headless Service

Requirements:

Create a namespace stateful-exercise
Create a headless Service named web-headless in stateful-exercise with:
- No cluster IP (clusterIP: None)
- Selector: app: web
- Port 80
Create a StatefulSet named web in stateful-exercise with:
- 3 replicas
- serviceName: web-headless
- Container: nginx:1.25
- A volumeClaimTemplate named html requesting 100Mi storage
- Mount the volume at /usr/share/nginx/html
Verify Pods are named web-0, web-1, web-2
Verify each Pod has its own PVC: html-web-0, html-web-1, html-web-2

Write a unique file to each Pod’s volume and verify it persists after Pod deletion:

kubectl exec web-0 -n stateful-exercise -- sh -c 'echo "pod-0-data" > /usr/share/nginx/html/index.html'

Delete web-0 and verify the replacement Pod retains the data

Verify stable DNS resolution:

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -n stateful-exercise -- nslookup web-0.web-headless

Exercise 3: DaemonSet with Node Selection

Requirements:

Create a namespace daemon-exercise
Label one of your nodes with disk=ssd:
```
kubectl label node <node-name> disk=ssd
```
Create a DaemonSet named node-monitor in daemon-exercise with:
- Container: busybox:1.36
- Command: ["sh", "-c", "while true; do echo $(hostname) $(date); sleep 60; done"]
- nodeSelector: { disk: ssd }
- Resource requests: 50m CPU, 64Mi memory
Verify the DaemonSet only runs on the labeled node(s)
Label a second node with disk=ssd and verify a new Pod appears automatically
Remove the label from one node and verify the Pod is removed:
```
kubectl label node <node-name> disk-
```

Solutions are provided in the next chapter.