Load Shedding and Backpressure - architecting resilient distributed systems high-scale engineering and failure mode mitigation • Dev|Journal

Introduction to Load Shedding and Backpressure

Load shedding is a proactive strategy for keeping a system stable under high load: it intentionally rejects or drops incoming requests before the backend becomes overwhelmed [1]. This helps prevent cascading failures across microservices, which can occur when a failure in one component ties up all available threads in the services that depend on it. Backpressure, in contrast, is a mechanism by which a downstream system signals an upstream system to slow down or stop sending data because it cannot process it at the current rate.

Admission Control and Its Importance

Admission control is the process of deciding whether to accept or reject an incoming request based on current system health, available resources, and predefined policies. It is vital for preventing overload, because it keeps the accepted workload within what the system can serve without degrading performance. Admission control only protects the system if every request path goes through it: traffic that bypasses admission control during high load can still lead to cascading failures.
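The accept-or-reject decision described above can be sketched in a few lines. This is a minimal illustration, not a production design: the class name AdmissionController, the max_in_flight policy, and the method names are hypothetical, and the sketch assumes a single-threaded caller (a real server would need a lock or atomic counter).

```python
from dataclasses import dataclass


@dataclass
class AdmissionController:
    """Hypothetical admission controller with a static in-flight cap."""

    max_in_flight: int  # assumed policy: fixed concurrency cap
    in_flight: int = 0  # requests currently admitted and not yet finished

    def try_admit(self) -> bool:
        # Shed the request when the cap is reached; the caller decides
        # how to report the rejection to the client.
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        # Call once for every admitted request when it completes,
        # whether it succeeded or failed.
        if self.in_flight > 0:
            self.in_flight -= 1
```

A caller would wrap each request in try_admit() and release(), rejecting immediately when try_admit() returns False rather than queueing the request.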

Little’s Law and Its Application

Little’s Law is a theorem from queueing theory which states that the long-term average number of items (L) in a stationary system equals the long-term average arrival rate (λ) multiplied by the average time (W) an item spends in the system: L = λW. Applied to a service, it gives an estimate of how many requests are in flight at once, which is the quantity a concurrency limit needs to bound. The version of Little’s Law used for concurrency limits is: Limit = Average RPS * Average Latency [6], with latency expressed in seconds so that the units cancel.
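As a worked example of the formula above, the sketch below computes a concurrency limit from an assumed traffic profile. The function name and the numbers (400 requests per second, 250 ms average latency) are made up for illustration.

```python
def concurrency_limit(avg_rps: float, avg_latency_seconds: float) -> float:
    """Little's Law, L = lambda * W, with latency given in seconds."""
    return avg_rps * avg_latency_seconds


# Assumed figures: 400 requests/second at 0.25 s average latency.
limit = concurrency_limit(avg_rps=400.0, avg_latency_seconds=0.25)
print(limit)  # 100.0
```

Under these assumptions roughly 100 requests are in flight on average, so a limit far below that would shed normal traffic and a limit far above it would offer little protection.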

Token Bucket and Leaky Bucket Algorithms

The Token Bucket algorithm is used for traffic shaping and rate limiting. Tokens accumulate at a fixed rate up to a maximum capacity, and each request consumes a token, so a bucket that has filled up can absorb a short burst. The Leaky Bucket algorithm, on the other hand, provides a constant outflow rate regardless of the inflow, and is used to smooth bursty traffic into a steady stream. The key difference is that Token Bucket allows bursts up to the bucket size, whereas Leaky Bucket enforces a rigid output rate [6].
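A token bucket is small enough to sketch directly. The version below is one possible implementation under stated assumptions: the class name, the rate and capacity parameters, and the injectable clock (there only to make the behaviour easy to test) are all hypothetical, and the code is not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket: refill at `rate` tokens/second up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill lazily, based on the time elapsed since the last call.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Admit the request only if enough tokens are available.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With rate=1 and capacity=3, three back-to-back requests pass and a fourth is rejected until time has passed and tokens have refilled. A leaky bucket would instead release requests at the fixed rate even when the bucket had been idle.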

Implementation of Load Shedding

NGINX implements load shedding via the ‘limit_req’ and ‘limit_conn’ directives [1, 2]. Common HTTP status codes for shed requests are 429 (Too Many Requests) and 503 (Service Unavailable) [2].

The Vegas algorithm for concurrency limits uses a delay-based approach in which the bottleneck queue is estimated as L * (1 - minRTT/sampleRtt), where L is the current limit, minRTT is the lowest round-trip time observed, and sampleRtt is the most recent measurement [10]. The limit then adapts to that estimate: it is increased by 1 if the queue is below a threshold alpha, and decreased by 1 if the queue is above a threshold beta [10].
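The Vegas-style update rule can be written as a pair of small functions. This is a sketch of the rule as stated in the text, not the code of any particular library: the function names, the default alpha and beta values, and the min_limit/max_limit clamp are assumptions added for illustration.

```python
def estimate_queue(limit: int, min_rtt: float, sample_rtt: float) -> float:
    """Bottleneck queue estimate: L * (1 - minRTT / sampleRtt)."""
    return limit * (1.0 - min_rtt / sample_rtt)


def update_limit(
    limit: int,
    min_rtt: float,
    sample_rtt: float,
    alpha: float = 3.0,    # assumed threshold below which the limit grows
    beta: float = 6.0,     # assumed threshold above which the limit shrinks
    min_limit: int = 1,    # assumed clamp, not part of the stated rule
    max_limit: int = 1000, # assumed clamp, not part of the stated rule
) -> int:
    queue = estimate_queue(limit, min_rtt, sample_rtt)
    if queue < alpha:
        new_limit = limit + 1  # little queueing: probe for more capacity
    elif queue > beta:
        new_limit = limit - 1  # queue building up: back off
    else:
        new_limit = limit      # within the target band: hold steady
    return max(min_limit, min(max_limit, new_limit))
```

For example, with a limit of 20 and a sample RTT twice the minimum, the estimated queue is 10, which exceeds beta, so the limit drops to 19. When the sample RTT equals the minimum, the estimated queue is 0 and the limit grows by 1.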

Conclusion

In conclusion, load shedding and backpressure are essential mechanisms for maintaining system stability under high load. Admission control decides which requests to accept, Little’s Law gives a starting estimate for the concurrency limit, and algorithms such as Token Bucket and Leaky Bucket shape the traffic that is admitted. Together they let a system reject excess work early instead of degrading for every caller.

Sources

[1] https://umatechnology.org/load-shedding-rules-for-nginx-ingress-layers-made-for-99-999-slas/
[2] https://github.com/Netflix/concurrency-limits
[6] https://github.com/Netflix/concurrency-limits
[10] https://github.com/Netflix/concurrency-limits