Probe Configuration Solutions

Exercise 1: Diagnose and Fix a CrashLoopBackOff from a Broken Liveness Probe

Step 1: Create the Pod

Save the following manifest as broken-probe.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: broken-probe
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
      livenessProbe:
        httpGet:
          path: /does-not-exist
          port: 80
        periodSeconds: 2
        failureThreshold: 3

Apply it:

kubectl apply -f broken-probe.yaml

Step 2: Observe the Failure

Wait approximately 10–15 seconds, then check the Pod status:

kubectl get pods broken-probe

Expected output:

NAME           READY   STATUS    RESTARTS      AGE
broken-probe   1/1     Running   2 (4s ago)    20s

The restart count climbs. After several restarts with increasing back-off delays, the status changes to CrashLoopBackOff:

NAME           READY   STATUS             RESTARTS      AGE
broken-probe   0/1     CrashLoopBackOff   4 (12s ago)   45s

Note: The term “CrashLoopBackOff” is slightly misleading here. The container itself is not crashing; nginx starts and runs correctly. The kubelet is killing the container because the liveness probe reports failure. From Kubernetes’s perspective, the effect is the same: the container is repeatedly terminated and restarted.
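The increasing back-off delays can be sketched numerically. The snippet below is only an illustration and assumes the kubelet's documented default behavior (a delay starting at 10 seconds, doubling after each restart, and capped at 300 seconds); the function name backoff_delays is hypothetical, and real timings also include probe and container start-up time.

```shell
# Print the approximate back-off delay (in seconds) applied before each of
# the next N restarts. Assumption: documented kubelet defaults of a 10s
# starting delay, doubling each time, capped at 300s.
backoff_delays() {
  count=$1
  delay=10
  out=""
  i=0
  while [ "$i" -lt "$count" ]; do
    out="${out:+$out }$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then
      delay=300
    fi
    i=$((i + 1))
  done
  echo "$out"
}

backoff_delays 7
# prints: 10 20 40 80 160 300 300
```

This is why the restart count climbs quickly at first and then slows down once the Pod reports CrashLoopBackOff.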

Step 3: Diagnose with Events

kubectl describe pod broken-probe

Look at the Events section at the bottom:

Events:
  Type     Reason     Age                From     Message
  ----     ------     ----               ----     -------
  Normal   Scheduled  60s                default-scheduler  Successfully assigned default/broken-probe to ...
  Normal   Pulled     8s (x4 over 60s)   kubelet  Container image "nginx:1.25" already present on machine
  Normal   Created    8s (x4 over 60s)   kubelet  Created container nginx
  Normal   Started    8s (x4 over 60s)   kubelet  Started container nginx
  Warning  Unhealthy  4s (x10 over 56s)  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    4s (x3 over 52s)   kubelet  Container nginx failed liveness probe, will be restarted

Two events reveal the root cause:

“Liveness probe failed: HTTP probe failed with statuscode: 404” — The path /does-not-exist returns a 404, which is outside the 200–399 success range.
“Container nginx failed liveness probe, will be restarted” — After failureThreshold: 3 consecutive failures (3 × 2s = 6 seconds), the kubelet kills the container.
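The two rules behind these events (the 200–399 success range and the failureThreshold × periodSeconds window) can be expressed as a small sketch. The function names http_probe_ok and seconds_until_restart are hypothetical helpers for illustration, not part of kubectl or the kubelet.

```shell
# Succeeds (exit status 0) when an HTTP status code counts as a probe
# success, i.e. it falls in the 200-399 range.
http_probe_ok() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Approximate seconds of consecutive failures before the kubelet restarts
# the container: periodSeconds multiplied by failureThreshold.
seconds_until_restart() {
  period=$1
  threshold=$2
  echo $((period * threshold))
}

for code in 200 301 404 500; do
  if http_probe_ok "$code"; then
    echo "$code success"
  else
    echo "$code failure"
  fi
done

echo "restart after about $(seconds_until_restart 2 3)s of failures"
# prints: restart after about 6s of failures
```

With the values from this exercise (periodSeconds: 2, failureThreshold: 3), a 404 on every probe leads to a restart roughly every 6 seconds until back-off delays take over.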

The container’s own logs show no startup or runtime failure. The only [error] entries are nginx recording the probe’s requests for a file that does not exist:

kubectl logs broken-probe

...
2026/03/01 10:00:01 [error] 29#29: *1 open() "/usr/share/nginx/html/does-not-exist" failed (2: No such file or directory)

The 404 is nginx’s expected response for a nonexistent file. The problem is the probe configuration, not the application.
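When the Events table is long, it can help to isolate the Warning rows. The sketch below defines a hypothetical helper, warnings_only, and runs it against a few rows copied from the Events output in this step; against a live cluster, the same filter could be fed from kubectl describe pod broken-probe.

```shell
# Keep only rows whose first column is "Warning".
# awk splits on whitespace, so the leading indentation is ignored.
warnings_only() {
  awk '$1 == "Warning"'
}

# Sample rows taken from the Events section shown in this step.
sample_events='  Normal   Started    8s (x4 over 60s)   kubelet  Started container nginx
  Warning  Unhealthy  4s (x10 over 56s)  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    4s (x3 over 52s)   kubelet  Container nginx failed liveness probe, will be restarted'

printf '%s\n' "$sample_events" | warnings_only
```

Only the Unhealthy row survives the filter, which points straight at the failing probe.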

Step 4: Fix the Probe

Most fields of a running Pod, including its probes, cannot be changed in place, so delete the Pod and recreate it:

kubectl delete pod broken-probe

Update the YAML — change the probe path from /does-not-exist to /:

apiVersion: v1
kind: Pod
metadata:
  name: broken-probe
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      ports:
        - containerPort: 80
      livenessProbe:
        httpGet:
          path: /
          port: 80
        periodSeconds: 2
        failureThreshold: 3

Apply the corrected manifest:

kubectl apply -f broken-probe.yaml

Step 5: Verify the Fix

kubectl get pods broken-probe

Expected output after a few seconds:

NAME           READY   STATUS    RESTARTS   AGE
broken-probe   1/1     Running   0          10s

Zero restarts. The Pod stays Running because nginx returns a 200 status code for the / path, and the liveness probe passes consistently.

Verify with kubectl describe pod:

kubectl describe pod broken-probe | grep -A2 Liveness

    Liveness:       http-get http://:80/ delay=0s timeout=1s period=2s #success=1 #failure=3

The events should also show only Normal entries, with no Unhealthy warnings.
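The restart check can also be scripted. The following sketch assumes a hypothetical helper named restart_count that reads a single row of kubectl get pods output (for example, one produced with the --no-headers flag) and prints the RESTARTS value; the sample rows are copied from this exercise.

```shell
# Print the RESTARTS column (4th whitespace-separated field) of one
# `kubectl get pods` row. For a value such as "2 (4s ago)", only the
# leading count is printed.
restart_count() {
  awk '{ print $4 }'
}

echo 'broken-probe   1/1     Running   0          10s' | restart_count
echo 'broken-probe   1/1     Running   2 (4s ago)    20s' | restart_count

# Against a live cluster (not run here):
#   kubectl get pods broken-probe --no-headers | restart_count
```

A value of 0 that stays at 0 over a minute or so indicates the liveness probe is passing.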

Cleanup

kubectl delete pod broken-probe

Exercise 2: Diagnose a Pod Stuck in Pending

Step 1: Create the Pod

Save the following manifest as pending-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pending-pod
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      resources:
        requests:
          cpu: "100"

The request of 100 means 100 whole CPU cores. Few nodes have that much allocatable CPU, and a typical practice cluster has none, so the Pod cannot be scheduled.
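To make the unit difference concrete, here is a minimal sketch that converts a CPU quantity to millicores. The helper cpu_to_millicores is hypothetical and handles only whole numbers and the m suffix (not decimals such as 0.5); the 4-core node is an assumed example, and the real scheduler compares the request against the node's allocatable CPU minus what other Pods have already requested.

```shell
# Convert a Kubernetes CPU quantity to millicores.
# Handles whole cores ("100") and millicores ("100m") only.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

requested=$(cpu_to_millicores "100")
node_allocatable=$(cpu_to_millicores "4")   # assumption: a 4-core node

echo "requested: ${requested}m, node allocatable: ${node_allocatable}m"
if [ "$requested" -gt "$node_allocatable" ]; then
  echo "request does not fit on this node"
fi
```

The same helper shows that "100m" is 100 millicores, a thousand times smaller than "100".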

Apply it:

kubectl apply -f pending-pod.yaml

Step 2: Observe the Pending State

kubectl get pods pending-pod

Expected output:

NAME          READY   STATUS    RESTARTS   AGE
pending-pod   0/1     Pending   0          30s

The Pod stays in Pending indefinitely. READY is 0/1 and RESTARTS stays at 0 — no container has ever started.

Step 3: Diagnose from Events

kubectl describe pod pending-pod

The Events section reveals the scheduling failure:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  15s   default-scheduler  0/1 nodes are available:
           1 Insufficient cpu. preemption:
           0/1 nodes are available: 1 No preemption victims found for
           incoming pod.

The message is unambiguous: “Insufficient cpu.” The scheduler evaluated every node in the cluster and none had 100 CPUs available.

You can also view this event from the namespace-wide event stream:

kubectl get events --sort-by='.lastTimestamp' --field-selector type=Warning

LAST SEEN   TYPE      REASON             OBJECT              MESSAGE
15s         Warning   FailedScheduling   pod/pending-pod     0/1 nodes are available: 1 Insufficient cpu...

Step 4: Understand Why There Are No Logs

Unlike CrashLoopBackOff, a Pending Pod has never started a container. There are no logs to retrieve:

kubectl logs pending-pod

Error from server (BadRequest): container "nginx" in pod "pending-pod" is waiting to start: ContainerCreating

For a Pod that has not been scheduled, events are the primary diagnostic tool: the scheduler’s FailedScheduling event states why placement failed.

Step 5: Verify the Diagnosis and Clean Up

The root cause is confirmed: the Pod requests 100 CPU cores, which exceeds the capacity of every node. In a real scenario, the fix is to reduce the CPU request to a reasonable value (e.g., 100m for 0.1 cores, or 500m for half a core).

kubectl delete pod pending-pod
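For reference, the fix described in this step could look like the sketch below, which writes a corrected manifest with a 100m request. The filename pending-pod-fixed.yaml is an assumption chosen for illustration; applying it is left as a comment because it needs a running cluster.

```shell
# Write a corrected manifest that requests 100m (0.1 core) instead of 100 cores.
cat > pending-pod-fixed.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pending-pod
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      resources:
        requests:
          cpu: "100m"
EOF

# Show the corrected request line.
grep 'cpu:' pending-pod-fixed.yaml

# Against a cluster (not run here):
#   kubectl apply -f pending-pod-fixed.yaml
```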

Key Takeaway

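A Pod that stays Pending with a FailedScheduling event has a scheduling problem. Until the Pod is assigned to a node, no image is pulled and the container runtime is never invoked, so diagnosis happens entirely through events. (The Pending phase also covers the window after scheduling while images are still being pulled; that case produces different events.) Memorize the common FailedScheduling messages: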

Insufficient cpu or Insufficient memory — lower the request or add nodes.
didn’t match Pod’s node affinity/selector — correct the nodeSelector or node labels.
had taint {key:NoSchedule} — add a toleration.
persistentvolumeclaim “x” not bound — provision the PersistentVolume or fix the StorageClass.
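The list above can be turned into a small lookup as a memory aid. This is a sketch only: suggest_fix is a hypothetical helper, and the patterns are the message fragments quoted in this list, so exact wording may differ between Kubernetes versions.

```shell
# Map a FailedScheduling message to the remedy from the list above.
# Matching is by substring; unknown messages fall through to a default.
suggest_fix() {
  case "$1" in
    *"Insufficient cpu"*|*"Insufficient memory"*)
      echo "lower the request or add nodes" ;;
    *"node affinity/selector"*)
      echo "correct the nodeSelector or node labels" ;;
    *"taint"*)
      echo "add a toleration" ;;
    *"persistentvolumeclaim"*)
      echo "provision the PersistentVolume or fix the StorageClass" ;;
    *)
      echo "unrecognized: read the full event message" ;;
  esac
}

suggest_fix "0/1 nodes are available: 1 Insufficient cpu."
# prints: lower the request or add nodes
```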