Skip to main content
modern python mastery technical interview patterns for production code

Combined Techniques: Heaps + Windows

9 min read Chapter 19 of 34
Summary

This section synthesizes heaps and sliding windows to...

This section synthesizes heaps and sliding windows to address dynamic optimization problems in streaming data, such as running median computation and streaming top-K queries. It introduces the MedianFinder class, which uses dual heaps (max-heap via negation for lower half, min-heap for upper half) with rebalancing to achieve O(1) median retrieval and O(log n) insertion. Extending to sliding windows, the sliding_window_median function incorporates lazy deletion using a dictionary to mark removed elements, enabling amortized O(log k) operations per window shift. Performance analysis compares heap-based methods to alternatives like balanced trees, highlighting efficiency gains with O(n log k) time for sliding windows. Common anti-patterns are addressed, such as avoiding list.pop(0) in favor of deque.popleft(), and production challenges like memory management and thread-safety are discussed. The principles are applied to streaming top-K problems, using min-heaps with lazy deletion for O(N log K) time complexity, with practical examples in monitoring system latency percentiles.

Combined Techniques: Heaps + Windows

Building upon the isolated explorations of priority queues using heapq and sliding window patterns with deque and itertools, this section synthesizes these data structures to address dynamic optimization challenges in streaming data. The combination of heaps for efficient order statistic maintenance and windows for bounded sequence analysis enables solutions to problems like running median computation and streaming top-K queries, which are foundational in real-time monitoring, financial tick analysis, and latency percentile tracking. By leveraging Python 3.12+ features—such as strict type hints, typing.Protocol, and match/case—these implementations achieve both performance efficiency and code clarity, adhering to modern idiomatic practices.

Running Median with Dual Heaps

At the core of combining heaps and windows is the MedianFinder class, which implements a running median using two heaps: a max-heap for the lower half of numbers (simulated by negating values in a min-heap) and a min-heap for the upper half. This dual heap structure allows for O(1) median retrieval and O(log n) insertion by maintaining balanced heap sizes, with rebalancing logic to ensure the size difference is at most one. The following code exemplifies this approach with Python 3.12+ type annotations and docstrings, avoiding mutable default arguments by using None with conditional initialization where applicable.

from heapq import heappush, heappop
from typing import List
import itertools
from collections import deque
from collections.abc import Sequence
from typing import Protocol, TypeVar, Generic

T = TypeVar('T', int, float)

class MedianFinder:
    """Implements running median using two heaps."""
    def __init__(self) -> None:
        self.max_heap: List[T] = []  # Lower half, store negatives for max-heap
        self.min_heap: List[T] = []  # Upper half

    def add_num(self, num: T) -> None:
        if not self.max_heap or num <= -self.max_heap[0]:
            heappush(self.max_heap, -num)
        else:
            heappush(self.min_heap, num)
        # Rebalance heaps
        if len(self.max_heap) > len(self.min_heap) + 1:
            heappush(self.min_heap, -heappop(self.max_heap))
        elif len(self.min_heap) > len(self.max_heap):
            heappush(self.max_heap, -heappop(self.min_heap))

    def find_median(self) -> float:
        if len(self.max_heap) == len(self.min_heap):
            return (-self.max_heap[0] + self.min_heap[0]) / 2.0
        else:
            return -self.max_heap[0] if len(self.max_heap) > len(self.min_heap) else self.min_heap[0]

# Reference shared material: CH4_class_RunningMedian for similar implementation.

Median retrieval is O(1): if heap sizes are equal, the median is the average of the roots; otherwise, it is the root of the larger heap. Inserting a number involves adding to the appropriate heap based on comparison with the current median estimate and rebalancing to enforce the size constraint, with O(log n) time complexity. This structure outperforms naive approaches like repeated sorting or using bisect.insort, which incur O(n) insertion time for maintaining a sorted list.

Sliding Window Median with Lazy Deletion

Extending the running median to a sliding window introduces the challenge of element removal as the window slides. The sliding window median problem computes the median for each fixed-size window over a sequence, efficiently addressed by augmenting the dual heap structure with lazy deletion. In this technique, elements that move out of the window are marked for removal in a dictionary (e.g., dict[element, count]) instead of immediate deletion from heaps, enabling amortized O(log k) operations per window shift, where k is the window size. Before accessing heap roots, the heaps are pruned by checking the lazy deletion dictionary to ensure invariants hold.

The implementation below demonstrates this with Python 3.12+ features, including collections.defaultdict for the lazy deletion dict and strict type hints for function signatures.

def sliding_window_median(nums: List[int], k: int) -> List[float]:
    """Compute median for each sliding window of size k using heaps with lazy deletion."""
    import heapq
    from collections import defaultdict
    
    max_heap: List[int] = []  # Lower half, store negatives
    min_heap: List[int] = []  # Upper half
    lazy: defaultdict[int, int] = defaultdict(int)  # Lazy deletion dictionary
    medians: List[float] = []
    
    def add_num(num: int) -> None:
        if not max_heap or num <= -max_heap[0]:
            heapq.heappush(max_heap, -num)
        else:
            heapq.heappush(min_heap, num)
        rebalance()
    
    def remove_num(num: int) -> None:
        lazy[num] += 1
        rebalance()
    
    def rebalance() -> None:
        while max_heap and lazy.get(-max_heap[0], 0) > 0:
            lazy[-heapq.heappop(max_heap)] -= 1
        while min_heap and lazy.get(min_heap[0], 0) > 0:
            lazy[heapq.heappop(min_heap)] -= 1
        if len(max_heap) > len(min_heap) + 1:
            heapq.heappush(min_heap, -heapq.heappop(max_heap))
        elif len(min_heap) > len(max_heap):
            heapq.heappush(max_heap, -heapq.heappop(min_heap))
    
    def get_median() -> float:
        rebalance()  # Ensure heaps are pruned
        if len(max_heap) == len(min_heap):
            return (-max_heap[0] + min_heap[0]) / 2.0
        else:
            return -max_heap[0] if len(max_heap) > len(min_heap) else min_heap[0]
    
    # Initialize first window
    for i in range(k):
        add_num(nums[i])
    medians.append(get_median())
    
    # Slide window
    for i in range(k, len(nums)):
        remove_num(nums[i - k])  # Remove leftmost element
        add_num(nums[i])          # Add new element
        medians.append(get_median())
    
    return medians

This approach contrasts with a naive method that sorts each window repeatedly, leading to O(k log k) time per window shift and O(n k log k) total for n windows. By using heaps with lazy deletion, the time reduces to O(n log k), with space complexity O(k) for the heaps and lazy deletion dict.

Performance and Complexity Analysis

To dissect the efficiency gains, consider the following performance comparison tables, which highlight time and space complexities for heap-based versus alternative methods.

AspectHeap-Based Median FinderBalanced Tree (Bisect Module)
Insert TimeO(log n)O(n) for inserting into sorted list
Median Query TimeO(1)O(1) for accessing middle element
Space ComplexityO(n)O(n)
Best Use CaseDynamic data streams, frequent insertsStatic or infrequently updated data
Idiomatic in PythonUses heapq, type hintsUses bisect.insort, less efficient for streams
AlgorithmTime Complexity per OperationSpace ComplexityNotes
Naive Sliding Window Median (sorting)O(k log k) per windowO(k)Inefficient for large k or n
Heap + Lazy Deletion Sliding Window MedianAmortized O(log k) per elementO(k)Efficient, uses dual heaps
Streaming Top-K with Sliding WindowO(N log K) for N elementsO(K)Min-heap of size K with lazy deletion

Complexity analysis further elaborates:

  • MedianFinder: add_num operates in O(log n) time with O(1) additional space per insertion; find_median is O(1); overall space is O(n) for storing n elements in heaps.
  • Sliding Window Median: Initialization builds heaps for the first window in O(k log k) time; per slide, add_num and remove_num with lazy deletion are amortized O(log k); total for n elements is O(n log k) time, with O(k) space for heaps and lazy dict.
  • Streaming Top-K with Sliding Window: This problem finds the K largest or smallest elements in a moving window of size N, solved using a min-heap of size K (for top-K largest) updated incrementally. Per element, heap operations like heappushpop take O(log K) time, with total O(N log K) for N elements and O(K) space, using similar lazy deletion for removal.

Type-Safe Implementations

Adhering to Python 3.12+ standards, type annotations ensure clarity and static analysis. The following diagrams describe key data structures and function signatures.

Heap structures:

  • max_heap: List[int] (with negated values for max-heap simulation)
  • min_heap: List[int]
  • lazy_deletion_dict: dict[int, int] or defaultdict[int, int]

Deque for sliding window indices:

  • dq: deque[int] # Stores indices of elements in the sequence

Function signatures:

  • def add_num(self, num: int) -> None:
  • def sliding_window_median(nums: List[int], k: int) -> List[float]:
  • def streaming_top_k(elements: Sequence[int], k: int, window_size: int) -> List[int]:

Protocol for generic heap operations:

from typing import Protocol, TypeVar
T = TypeVar('T')
class HeapProtocol(Protocol[T]):
    def push(self, item: T) -> None: ...
    def pop(self) -> T: ...

Using typing.Protocol instead of abstract base classes for structural typing aligns with duck typing principles, as discussed in prerequisite sections like CH1-S1.

Common Pitfalls and Best Practices

Several anti-patterns can undermine the efficiency of heap and window combinations. This section calls out key issues with corrective measures.

  1. Anti-pattern: Using list.pop(0) for sliding window instead of deque.popleft(), leading to O(n) time per removal. Fix: Use collections.deque with maxlen or manual popleft for O(1) operations, as covered in sibling section CH4-S2.

  2. Anti-pattern: Not implementing lazy deletion for sliding window median, causing O(k log k) time to delete from heaps directly. Fix: Introduce a lazy deletion dictionary to mark elements and prune heaps lazily, as demonstrated in the sliding_window_median code.

  3. Anti-pattern: Using global variables for heap states in concurrent code, risking race conditions. Fix: Encapsulate state in classes and use threading.Lock for thread-safety, since heapq and deque are not thread-safe.

  4. Anti-pattern: Missing type hints in function signatures, reducing code clarity and static analysis. Fix: Add strict type hints per style guide, e.g., def func(nums: List[int]) -> List[float].

  5. Anti-pattern: Overusing match/case for simple heap logic where if/else suffices, adding unnecessary complexity. Fix: Reserve match/case for state machines or complex pattern matching, as shown in relevant materials like CH1-S2_class_HTTPState.

Production gotchas further highlight deployment challenges:

  • Memory blow-up: Unbounded heap growth if lazy deletion is not correctly implemented, especially for long-running streams. Mitigation involves regularly pruning heaps and monitoring memory usage.
  • Thread-safety: heapq and deque are not thread-safe; concurrent access can lead to data corruption. Mitigation uses locks (e.g., threading.Lock) or atomic operations in multi-threaded environments.
  • Performance overhead: Frequent heap operations in tight loops can cause CPU spikes. Mitigation includes profiling code and considering batching updates or alternative data structures for specific workloads.
  • Version compatibility: Code relying on Python 3.12+ features (e.g., new typing enhancements) may break in older versions. Mitigation specifies Python version in requirements or uses compatibility shims.
  • Error handling: Neglecting edge cases like empty windows or invalid inputs can crash production systems. Mitigation adds validation and error handling with specific exception types, avoiding bare except clauses.

Extending to Streaming Top-K

The principles of combining heaps and windows extend naturally to streaming top-K with sliding window. This problem maintains the K largest or smallest elements in a moving window, with applications in real-time analytics and monitoring. A min-heap of size K tracks the K largest elements; as the window slides, new elements are added with heappushpop and old elements are removed using lazy deletion similar to the median case. The algorithm achieves O(N log K) time complexity for N elements and O(K) space, leveraging the same lazy deletion dict for efficient updates.

For instance, in a monitoring system latency percentiles scenario, this approach enables efficient computation of top latencies over a sliding time window, avoiding the inefficiencies of repeated sorting. By referencing shared materials like top_k_manual_heap, implementations can ensure consistency with established patterns.

In summary, the fusion of heaps and sliding windows through techniques like dual heap structures and lazy deletion provides robust solutions for dynamic optimization problems. By adhering to Python 3.12+ idioms—such as strict type hints, typing.Protocol, and avoidance of anti-patterns—developers can build scalable, maintainable systems for streaming data analysis.