

Solving CUDA Out of Memory Errors in Stable Diffusion WebUI


These articles are AI-generated summaries. Please check the original sources for full details.

How to Fix CUDA Out of Memory Errors in Stable Diffusion WebUI

Stable Diffusion WebUI often triggers CUDA out of memory errors during high-resolution generations. SDXL's U-Net alone has roughly 2.6 billion parameters, about 5 GB of weights in fp16. Add the text encoders, the VAE, and activation memory, and a single generation frequently exceeds the VRAM of consumer GPUs.
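Weight memory is parameter count times bytes per parameter. The sketch below assumes about 2.6 billion parameters, the approximate size reported for SDXL's U-Net. It counts weights only, not the text encoders, VAE, or activations.

```python
def weight_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

# Assumption: ~2.6e9 parameters for the SDXL U-Net (approximate figure).
SDXL_UNET_PARAMS = 2.6e9

for dtype, nbytes in (("fp32", 4), ("fp16", 2)):
    print(f"{dtype}: {weight_size_gb(SDXL_UNET_PARAMS, nbytes):.1f} GB")
```

This prints `fp32: 10.4 GB` and `fp16: 5.2 GB`, which is why fp16 weights are the default on consumer cards.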

Why This Matters

VRAM management is often a configuration problem rather than a hardware limitation. PyTorch's caching allocator keeps freed blocks reserved for reuse instead of returning them to the driver. Over repeated runs this reserved pool can fragment into pieces too small for the next large allocation, so a generation that succeeds once can crash on the next run with identical settings. In practice, a well-tuned 8 GB card can complete generations that a poorly configured 12 GB card fails on.
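A toy model makes the fragmentation failure concrete: plenty of memory can be free in total while no single free range is large enough for one tensor. The block sizes are hypothetical, and this is not PyTorch's allocator, only an illustration of the contiguity constraint.

```python
# Toy model of a fragmented memory pool; sizes in MB are hypothetical.
# NOT PyTorch's allocator: it only shows why total free memory can still
# fail a single large request.

def allocate_back_to_back(block_mb, count):
    """Place `count` blocks contiguously; returns (offset, size) pairs."""
    return [(i * block_mb, block_mb) for i in range(count)]

def free_every_other(blocks):
    """Free blocks 1, 3, 5, ...; the surviving blocks keep the holes apart."""
    return blocks[0::2], blocks[1::2]

def fits(holes, request_mb):
    """A tensor needs one contiguous range, so only a single hole counts."""
    return any(size >= request_mb for _, size in holes)

blocks = allocate_back_to_back(block_mb=100, count=10)  # 1000 MB pool, full
in_use, holes = free_every_other(blocks)                 # 500 MB now free
total_free = sum(size for _, size in holes)
print(total_free, fits(holes, 200))  # 500 False
```

Real behavior is more involved: on an allocation failure PyTorch releases unused cached blocks and retries, so fragmentation bites when partially used segments cannot be released.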

Key Insights

  • Memory-efficient attention via --xformers can reduce VRAM usage by 30-40% (West, 2026).
  • Model splitting via --medvram keeps the U-Net, VAE, and text encoder from being resident in VRAM at the same time, at a 10-15% speed cost.
  • Tuning PyTorch's CUDA caching allocator through the PYTORCH_CUDA_ALLOC_CONF environment variable reduces fragmentation of reserved memory into unusably small chunks.
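Memory-efficient attention helps because standard attention materializes a score tensor of shape (batch, heads, tokens, tokens), which grows with the square of the token count. The sketch assumes one token per latent pixel, an 8x VAE downsampling factor, 8 heads, and fp16 scores. These are illustrative simplifications, not the exact layout of any particular model.

```python
def latent_tokens(width_px: int, height_px: int, vae_factor: int = 8) -> int:
    """Tokens seen by attention at full latent resolution (one per latent pixel)."""
    return (width_px // vae_factor) * (height_px // vae_factor)

def attention_scores_bytes(tokens: int, heads: int = 8, batch: int = 1,
                           bytes_per_elem: int = 2) -> int:
    """Size of one materialized (batch, heads, tokens, tokens) score tensor."""
    return batch * heads * tokens * tokens * bytes_per_elem

for side in (512, 1024):
    n = latent_tokens(side, side)
    gb = attention_scores_bytes(n) / 1e9
    print(f"{side}x{side}: {n} tokens, {gb:.2f} GB of attention scores")
```

Doubling the side length quadruples the token count and multiplies the score tensor by 16 (0.27 GB versus 4.29 GB here). Kernels that process attention in tiles without holding the full matrix therefore save the most at high resolutions.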

Working Examples

Command line arguments and environment variables for VRAM optimization.

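# webui-user.sh (AUTOMATIC1111 launcher settings)
# --xformers: memory-efficient attention; --medvram: keep only the active model part in VRAM
# --opt-split-attention: alternative attention optimization, largely redundant when xformers is active
# --no-half-vae: run the VAE in fp32 to avoid black images (costs extra VRAM; see the Pitfall below)
export COMMANDLINE_ARGS="--xformers --medvram --opt-split-attention --no-half-vae"

# Linux environment variable for allocator tuning (CUDA only; not applicable on macOS)
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:512,garbage_collection_threshold:0.8"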
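When a custom Python script loads PyTorch directly instead of going through webui-user.sh, the same allocator settings can be applied from Python. This is a minimal sketch. The helper name `build_alloc_conf` is hypothetical, and its range checks are modeled on limits PyTorch's allocator enforces (treat them as assumptions if your version differs). The variable must be set before the first CUDA allocation, and setting it before importing torch is the safe choice.

```python
import os

def build_alloc_conf(max_split_size_mb: int = 512,
                     gc_threshold: float = 0.8) -> str:
    """Return a PYTORCH_CUDA_ALLOC_CONF value as comma-separated key:value pairs.

    Hypothetical helper; the bounds are assumed from PyTorch's allocator checks.
    """
    if max_split_size_mb <= 20:
        raise ValueError("max_split_size_mb must be greater than 20")
    if not 0.0 < gc_threshold < 1.0:
        raise ValueError("garbage_collection_threshold must be in (0, 1)")
    return (f"max_split_size_mb:{max_split_size_mb},"
            f"garbage_collection_threshold:{gc_threshold}")

# Set before `import torch` so the CUDA caching allocator sees it at startup.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", build_alloc_conf())

# import torch  # import only after the variable is set
```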

Manual VRAM flush function for custom inference scripts.

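import gc
import torch

def cleanup_vram():
    """Release cached, unreferenced CUDA memory and report usage.

    empty_cache() cannot free memory still held by live tensors, so drop
    references (e.g. `del latents`) before calling this.
    """
    if not torch.cuda.is_available():
        return
    gc.collect()                  # free unreachable Python objects that hold tensors
    torch.cuda.empty_cache()      # return unused cached blocks to the driver
    torch.cuda.ipc_collect()      # reclaim memory released by CUDA IPC, if any
    print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"Reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")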

Practical Applications

  • Use case: For high-resolution output, generate at the model's native resolution and let Hires fix run a second pass at the upscaled resolution, instead of generating at the high resolution directly.
  • Pitfall: --no-half-vae prevents black-image artifacts caused by fp16 overflow in the VAE, but it runs the VAE in fp32, which can spike VRAM during the decode step.
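A rough estimate shows why the decode step is sensitive to both precision and resolution. The sketch sizes a single decoder activation at full output resolution. The 128-channel count is an assumption for illustration, and a real decode holds several such tensors plus convolution workspace at once.

```python
def activation_mib(channels: int, height: int, width: int,
                   bytes_per_elem: int) -> float:
    """Size of one (channels, height, width) activation tensor in MiB."""
    return channels * height * width * bytes_per_elem / 2**20

CHANNELS = 128  # assumed channel count of a full-resolution decoder layer

for side in (512, 1024, 2048):
    fp16 = activation_mib(CHANNELS, side, side, 2)
    fp32 = activation_mib(CHANNELS, side, side, 4)
    print(f"{side}px: fp16 {fp16:.0f} MiB, fp32 {fp32:.0f} MiB")
```

Running the VAE in fp32 doubles every figure, and each doubling of the output side length multiplies it by four. Combining --no-half-vae with a large Hires fix target is therefore where decode-time spikes are most likely.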

References:
