Overview
Stable Diffusion XL (SDXL) is Stability AI’s flagship text-to-image model, offering exceptional quality and versatility. It is the most widely used and best-supported diffusion model, with strong community support and thousands of fine-tuned variants available. SDXL is the recommended starting point for most users because it balances quality, speed, and VRAM requirements well.
Model Variants
SDXL Base 1.0
The standard SDXL model, optimized for high-quality image generation.
- Resolution: Native 1024x1024 (can generate up to 2048x2048)
- Quality: Excellent detail and composition
- VRAM: 8GB minimum, 12GB recommended
- Speed: ~4 seconds per image (RTX 4090, 50 steps)
SDXL Turbo
A distilled variant optimized for ultra-fast generation.
- Speed: 3-4x faster than base (1-4 inference steps)
- Quality: Very good (slightly below base)
- VRAM: 8GB minimum
- Use Case: Rapid prototyping, real-time applications
SDXL Refiner
A specialized model for refining SDXL base outputs (optional).
Loading SDXL with HyperGen
Basic Loading
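A minimal sketch, assuming a hypothetical `hypergen.load_model` entry point (names are illustrative, not a confirmed API):

```python
import hypergen  # hypothetical import

# Load the SDXL base checkpoint from the Hugging Face Hub
pipeline = hypergen.load_model("stabilityai/stable-diffusion-xl-base-1.0")
```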
Optimized Loading
For better performance and lower VRAM usage:
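A sketch of the same load with half precision and explicit device placement; `dtype` and `device` are assumed parameter names:

```python
import torch
import hypergen  # hypothetical import

# Half precision roughly halves VRAM usage with minimal quality impact
pipeline = hypergen.load_model(
    "stabilityai/stable-diffusion-xl-base-1.0",
    dtype=torch.float16,
    device="cuda",
)
```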
Memory-Optimized Loading
For GPUs with 8GB VRAM:
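The idea is to stack the techniques described under Memory Optimization below. A sketch with assumed method names (they mirror common diffusers calls):

```python
import torch
import hypergen  # hypothetical import

pipeline = hypergen.load_model(
    "stabilityai/stable-diffusion-xl-base-1.0",
    dtype=torch.float16,             # ~50% VRAM reduction
)
pipeline.enable_vae_slicing()        # ~10% VRAM reduction
pipeline.enable_attention_slicing()  # ~15-20% VRAM reduction, ~5% slower
```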
Training LoRAs with SDXL
SDXL is the most popular model for LoRA training due to its excellent quality and wide compatibility.
Basic LoRA Training
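A sketch of a minimal training call, assuming a hypothetical `hypergen.train_lora` helper (parameter names are illustrative):

```python
import hypergen  # hypothetical import

hypergen.train_lora(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    dataset="./my_dataset",   # images plus matching .txt caption files
    rank=16,
    learning_rate=1e-4,
    max_steps=1000,
    output_dir="./lora_output",
)
```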
Recommended Training Parameters
- Quick Training (8GB VRAM)
- Balanced Training (12GB VRAM)
- High-Quality Training (16GB+ VRAM)
The Quick Training preset is for fast iteration and testing (see the sketch after this list). Settings:
- Lower rank (8) for faster training
- Fewer steps for quick results
- Works on 8GB VRAM
- Training time: ~10 minutes (50 images)
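A sketch of the Quick Training preset with the hypothetical helper from above; the step count is chosen to match the ~10 minute figure in the benchmarks below:

```python
import hypergen  # hypothetical import

hypergen.train_lora(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    dataset="./my_dataset",
    rank=8,           # lower rank for faster training
    max_steps=1000,   # ~10 min on an RTX 4090 (see benchmarks)
    batch_size=1,     # fits in 8GB VRAM
)
```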
Training for Different Use Cases
Style Transfer LoRA
Learning an artistic style or aesthetic (see the sketch after this list). Dataset:
- 50-200 images in the target style
- Consistent aesthetic across all images
- Captions describing content, not style (describe what you see, not “in artistic style”)
- High resolution (1024x1024+)
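A style-run sketch using the same hypothetical helper; the rank and step values are illustrative starting points, not tuned recommendations:

```python
import hypergen  # hypothetical import

hypergen.train_lora(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    dataset="./style_dataset",   # 50-200 images, consistent aesthetic
    rank=16,
    max_steps=1500,
    output_dir="./style_lora",
)
```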
Character/Subject LoRA
Learning a specific person, character, or object (see the sketch after this list). Dataset:
- 20-100 images of the subject
- Variety of poses, angles, and expressions
- Different lighting conditions
- Detailed captions
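A subject-run sketch; the trigger-token captioning shown in the comment is a common convention, and all names here are illustrative:

```python
import hypergen  # hypothetical import

# Captions pair a unique trigger token with an objective description, e.g.:
#   "photo of sks_person smiling, outdoor cafe, soft afternoon light"
hypergen.train_lora(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    dataset="./subject_dataset",  # 20-100 varied images of the subject
    rank=32,
    max_steps=1000,
    output_dir="./subject_lora",
)
```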
Concept LoRA
Learning a new concept or composition style. Dataset:
- 30-150 images demonstrating the concept
- Varied examples showing different aspects
- Captions focusing on composition and elements
Inference Parameters
Basic Generation
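A sketch, assuming a hypothetical `generate` method whose parameters mirror the guide below:

```python
import hypergen  # hypothetical import

pipeline = hypergen.load_model("stabilityai/stable-diffusion-xl-base-1.0")
image = pipeline.generate(
    prompt="a red fox in a snowy forest, golden hour, highly detailed",
    num_inference_steps=40,   # 40-50 recommended
    guidance_scale=7.5,       # 7-8 recommended
    height=1024,
    width=1024,
)
image.save("fox.png")
```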
Parameter Guide
Prompt: Text description of the desired image.
Negative prompt: What to avoid in the generated image. Common negative prompts include “blurry, low quality, deformed, watermark”.
Inference steps: The number of denoising steps.
- 20-30: Fast, good quality
- 40-50: Better quality (recommended)
- 50-100: Highest quality, diminishing returns
Guidance scale: How closely the model follows the prompt.
- 5-6: More creative, less literal
- 7-8: Balanced (recommended)
- 9-12: Very literal, can be oversaturated
Height: Image height in pixels (must be a multiple of 8).
- 1024: Standard (recommended)
- 768: Faster, lower quality
- 1536-2048: Higher detail, slower
Width: Image width in pixels (must be a multiple of 8); the same recommendations as height apply.
Recommended Settings
Speed Priority
Balanced
Quality Priority
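As a sketch, these presets could map to the parameter sets below. The values follow the ranges in the parameter guide above; the dict layout, and the `pipeline` reused from the basic example, are illustrative:

```python
# Illustrative parameter sets for the three presets above
PRESETS = {
    "speed":    dict(num_inference_steps=20, guidance_scale=7.0),
    "balanced": dict(num_inference_steps=40, guidance_scale=7.5),
    "quality":  dict(num_inference_steps=50, guidance_scale=8.0),
}

image = pipeline.generate(prompt="a lighthouse at dusk", **PRESETS["balanced"])
```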
Advanced Generation
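A sketch of a fuller call, reusing the `pipeline` from Basic Generation; `negative_prompt`, `num_images`, and `seed` are assumed parameter names:

```python
images = pipeline.generate(
    prompt="portrait of an astronaut, studio lighting, highly detailed",
    negative_prompt="blurry, low quality, deformed, watermark",
    num_inference_steps=50,
    guidance_scale=8.0,
    num_images=4,
    seed=42,   # fixed seed for reproducible results
)
for i, image in enumerate(images):
    image.save(f"astronaut_{i}.png")
```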
Performance Benchmarks
Generation Performance
Based on an NVIDIA RTX 4090 at 1024x1024 resolution:
| Steps | VRAM | Time | Quality |
|---|---|---|---|
| 20 | ~9GB | ~1.8s | Good |
| 30 | ~9GB | ~2.5s | Very Good |
| 40 | ~9GB | ~3.5s | Excellent |
| 50 | ~9GB | ~4.2s | Excellent+ |
Training Performance
LoRA training on an RTX 4090 with 50 images:
| Configuration | VRAM | Time (1000 steps) |
|---|---|---|
| Rank 8, Batch 1 | ~8GB | ~10 min |
| Rank 16, Batch 1 | ~9GB | ~12 min |
| Rank 32, Batch 1 | ~11GB | ~15 min |
| Rank 32, Batch 2 | ~14GB | ~18 min |
VRAM Requirements
1. 8GB VRAM
GPUs: RTX 3060 8GB, RTX 2080 Ti
Capabilities:
- Generation: 1024x1024
- Training: Rank 8-16
- Batch size: 1
2. 12GB VRAM
GPUs: RTX 3060 12GB, RTX 4070 Ti
Capabilities:
- Generation: 1024x1024
- Training: Rank 16-32
- Batch size: 1-2
3. 16GB+ VRAM
GPUs: RTX 4080, RTX 4090, A100
Capabilities:
- Generation: Up to 2048x2048
- Training: Rank 32-64
- Batch size: 2-4
Best Practices
Prompt Engineering
- Structure
- Negative Prompts
- Style Control
- Quality Modifiers
Good prompt structure: subject first, then key attributes, then style and quality modifiers. For example, “portrait of an elderly fisherman, weathered face, golden hour lighting, highly detailed photograph” follows this structure, while “nice picture of an old man” leaves the model guessing.
Training Best Practices
1. Dataset Quality
Prepare high-quality training data. Do:
- Use high-resolution images (1024x1024 or higher)
- Ensure consistent quality
- Include variety (poses, angles, lighting)
- Write detailed captions
- 20-150 images is usually sufficient
Don't:
- Use low-resolution or blurry images
- Include duplicates
- Mix different subjects in same dataset
- Leave images uncaptioned
2. Caption Writing
Write effective captions. A good caption describes the image objectively, for example “a woman with short dark hair sitting on a park bench, soft morning light, wearing a red coat”; a poor one is vague, like “a photo of my subject”. Tips:
- Describe what you see objectively
- Include composition, lighting, colors
- Be consistent in style
- Don’t describe what you want to learn
3. Hyperparameter Selection
Choose appropriate hyperparameters. Start with the defaults (see the sketch after this list), then adjust based on results:
- Underfitting? Increase steps, rank, or learning rate
- Overfitting? Decrease steps, add more data
- Out of memory? Reduce rank or batch size
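A sketch of the defaults, with the adjustments from the list above noted in comments; the values are the illustrative defaults used throughout this page:

```python
import hypergen  # hypothetical import

defaults = dict(
    rank=16,             # raise if underfitting; lower if out of memory
    learning_rate=1e-4,  # raise slightly if underfitting
    max_steps=1000,      # raise if underfitting; lower if overfitting
    batch_size=1,        # keep at 1 on smaller GPUs
)
hypergen.train_lora(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    dataset="./my_dataset",
    **defaults,
)
```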
4. Checkpoint Management
Save checkpoints during training and test each one afterward. Test checkpoints at 500, 1000, 1500, and 2000 steps to find the best one; a sketch follows.
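A sketch, assuming hypothetical `save_every` and `load_lora` helpers:

```python
import hypergen  # hypothetical import

hypergen.train_lora(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    dataset="./my_dataset",
    max_steps=2000,
    save_every=500,   # writes checkpoint-500, -1000, -1500, -2000
    output_dir="./lora_output",
)

# Generate the same test prompt with each checkpoint and compare
pipeline = hypergen.load_model("stabilityai/stable-diffusion-xl-base-1.0")
for step in (500, 1000, 1500, 2000):
    pipeline.load_lora(f"./lora_output/checkpoint-{step}")
    pipeline.generate(prompt="test prompt in the trained style").save(f"test_{step}.png")
```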
Memory Optimization
VAE Slicing
Reduce VAE memory usage:
- Reduces VRAM by ~10%
- Minimal performance impact
- Recommended for all users
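Assuming a method name that mirrors the common diffusers call, reusing the `pipeline` from the loading examples:

```python
pipeline.enable_vae_slicing()  # decode the image in slices instead of one pass
```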
Attention Slicing
Reduce attention memory usage:
- Reduces VRAM by ~15-20%
- Small performance impact (~5% slower)
- Useful for 8GB GPUs
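With the same caveat on the method name:

```python
pipeline.enable_attention_slicing()  # compute attention in slices to cap peak memory
```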
CPU Offload
Offload to CPU when not in use:
- Reduces VRAM by ~40-50%
- Significant performance impact (~20% slower)
- Use only if necessary
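Again with an assumed, diffusers-style method name:

```python
pipeline.enable_cpu_offload()  # keep idle submodules in system RAM, move to GPU on demand
```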
Lower Precision
Use float16 instead of float32:
- Reduces VRAM by ~50%
- Minimal quality impact
- Strongly recommended
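A sketch of loading in half precision; the `dtype` parameter name is an assumption:

```python
import torch
import hypergen  # hypothetical import

pipeline = hypergen.load_model(
    "stabilityai/stable-diffusion-xl-base-1.0",
    dtype=torch.float16,
)
```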
Troubleshooting
Common Issues
Out of Memory (Generation)
Error: CUDA out of memory during image generation. Solutions:
- Enable memory optimizations (VAE slicing, attention slicing; see Memory Optimization above).
- Reduce image resolution (e.g., 768x768 instead of 1024x1024).
- Use float16 precision.
- Generate fewer images per batch.
Out of Memory (Training)
Error: CUDA out of memory during LoRA training. Solutions:
- Reduce LoRA rank (e.g., 8 instead of 32).
- Use batch size 1.
- Use gradient accumulation to keep the effective batch size up.
- Use float16 precision.
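An illustrative low-memory configuration combining these mitigations; parameter names are assumptions:

```python
import hypergen  # hypothetical import

hypergen.train_lora(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    dataset="./my_dataset",
    rank=8,                          # smaller rank, fits ~8GB VRAM
    batch_size=1,
    gradient_accumulation_steps=4,   # effective batch size 4 without the memory cost
    mixed_precision="fp16",
)
```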
Poor Image Quality
Issue: Generated images are low quality or don’t match the prompt. Solutions:
- Increase inference steps (40-50 recommended).
- Adjust guidance scale (try 7-8).
- Improve the prompt (be specific; add quality modifiers).
- Use negative prompts.
Poor Training Results
Issue: The trained LoRA doesn’t work well. Solutions:
- Increase training steps.
- Improve the dataset:
  - Add more images
  - Improve caption quality
  - Use higher resolution images
  - Add more variety
- Adjust hyperparameters (rank, learning rate).
- Check earlier checkpoints:
  - The model might be overfitting
  - Try checkpoint-500 or checkpoint-1000
Slow Generation
Issue: Image generation is very slow. Solutions:
- Reduce inference steps (20-30 still gives good quality).
- Use SDXL Turbo (1-4 steps).
- Reduce resolution.
- Disable CPU offload if it is enabled.
Example Workflows
Basic Image Generation
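An end-to-end sketch with the hypothetical API used throughout this page:

```python
import hypergen  # hypothetical import

pipeline = hypergen.load_model("stabilityai/stable-diffusion-xl-base-1.0")
image = pipeline.generate(
    prompt="a lighthouse on a rocky coast at dusk, dramatic sky, highly detailed",
    negative_prompt="blurry, low quality",
    num_inference_steps=40,
    guidance_scale=7.5,
)
image.save("lighthouse.png")
```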
LoRA Training Pipeline
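Train, then load the result for inference (all names illustrative):

```python
import hypergen  # hypothetical import

# 1. Train the LoRA
hypergen.train_lora(
    model="stabilityai/stable-diffusion-xl-base-1.0",
    dataset="./my_dataset",
    rank=16,
    max_steps=1000,
    output_dir="./lora_output",
)

# 2. Load the base model plus the LoRA and generate
pipeline = hypergen.load_model("stabilityai/stable-diffusion-xl-base-1.0")
pipeline.load_lora("./lora_output/checkpoint-1000")
pipeline.generate(prompt="a portrait in the trained style").save("result.png")
```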
Batch Generation
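Loop over prompts with a single loaded pipeline (sketch):

```python
import hypergen  # hypothetical import

pipeline = hypergen.load_model("stabilityai/stable-diffusion-xl-base-1.0")
prompts = [
    "a castle on a hill, sunrise",
    "a castle on a hill, midnight, full moon",
    "a castle on a hill, autumn fog",
]
for i, prompt in enumerate(prompts):
    image = pipeline.generate(prompt=prompt, num_inference_steps=30)
    image.save(f"castle_{i}.png")
```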
Community Fine-Tunes
SDXL has thousands of community fine-tunes available.
Next Steps
- Training Guide: Complete LoRA training documentation
- Dataset Preparation: Learn how to prepare training data
- Serving SDXL: Deploy SDXL with the API
- Supported Models: View all compatible models