
Overview

Stable Diffusion XL (SDXL) is Stability AI’s flagship text-to-image model, offering exceptional quality and versatility. It is among the most widely used diffusion models, with excellent community support and thousands of fine-tuned variants available.
SDXL is the recommended starting point for most users due to its excellent balance of quality, speed, and VRAM requirements.

Model Variants

SDXL Base 1.0

The standard SDXL model optimized for high-quality image generation.
from hypergen import model

m = model.load("stabilityai/stable-diffusion-xl-base-1.0")
m.to("cuda")
Key Features:
  • Resolution: Native 1024x1024 (can generate up to 2048x2048)
  • Quality: Excellent detail and composition
  • VRAM: 8GB minimum, 12GB recommended
  • Speed: ~4 seconds per image (RTX 4090, 50 steps)

SDXL Turbo

A distilled variant optimized for ultra-fast generation (1-4 steps).
from hypergen import model

m = model.load("stabilityai/sdxl-turbo")
m.to("cuda")
Key Features:
  • Speed: 3-4x faster than base (1-4 inference steps)
  • Quality: Very good (slightly below base)
  • VRAM: 8GB minimum
  • Use Case: Rapid prototyping, real-time applications
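
Because Turbo is distilled for few-step sampling, generation calls differ mainly in the step count. A minimal sketch; disabling classifier-free guidance (guidance_scale=0.0) follows the common recommendation for Turbo-style distilled models and is an assumption here, not a documented hypergen default:
from hypergen import model

m = model.load("stabilityai/sdxl-turbo")
m.to("cuda")

# 1-4 steps is the intended operating range for Turbo
image = m.generate(
    "A watercolor painting of a lighthouse at dusk",
    num_inference_steps=4,
    guidance_scale=0.0,  # assumption: CFG disabled, as is typical for Turbo
)
image[0].save("turbo_output.png")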

SDXL Refiner

A specialized model for refining SDXL base outputs (optional).
from hypergen import model

# Load base model
m_base = model.load("stabilityai/stable-diffusion-xl-base-1.0")

# Load refiner (optional)
m_refiner = model.load("stabilityai/stable-diffusion-xl-refiner-1.0")
The refiner is optional and typically used for professional workflows. Most users don’t need it.
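
This page doesn’t show how hypergen hands the base output to the refiner; the sketch below assumes a hypothetical image= parameter on generate for img2img-style refinement, so treat it as illustrative rather than the library’s confirmed API:
from hypergen import model

# Stage 1: draft with the base model
m_base = model.load("stabilityai/stable-diffusion-xl-base-1.0")
m_base.to("cuda")
draft = m_base.generate("A portrait of an astronaut, studio lighting")

# Stage 2: polish details with the refiner
# (image= is a hypothetical parameter, used here for illustration)
m_refiner = model.load("stabilityai/stable-diffusion-xl-refiner-1.0")
m_refiner.to("cuda")
final = m_refiner.generate(
    "A portrait of an astronaut, studio lighting",
    image=draft[0],
)
final[0].save("refined.png")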

Loading SDXL with HyperGen

Basic Loading

from hypergen import model

# Load SDXL
m = model.load("stabilityai/stable-diffusion-xl-base-1.0")
m.to("cuda")

# Generate an image
image = m.generate("A majestic lion in the African savanna")
image[0].save("output.png")

Optimized Loading

For better performance and lower VRAM usage:
from hypergen import model

# Load with fp16 precision
m = model.load(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype="float16",
    variant="fp16",
    use_safetensors=True,
)
m.to("cuda")
Using torch_dtype="float16" reduces VRAM usage by ~50% with minimal quality loss.

Memory-Optimized Loading

For GPUs with 8GB VRAM:
from hypergen import model

m = model.load("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype="float16")
m.to("cuda")

# Enable memory optimizations
m.enable_vae_slicing()           # Reduce VAE memory usage
m.enable_attention_slicing()     # Reduce attention memory usage

Training LoRAs with SDXL

SDXL is the most popular model for LoRA training due to its excellent quality and wide compatibility.

Basic LoRA Training

from hypergen import model, dataset

# Load model
m = model.load("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype="float16")
m.to("cuda")

# Load dataset
ds = dataset.load("./my_images")

# Train LoRA
lora = m.train_lora(
    ds,
    steps=1000,
    rank=16,
    alpha=32,
    learning_rate=1e-4,
)
Three preset configurations cover common VRAM budgets: Quick Training (8GB), Balanced Training (12GB), and High-Quality Training (16GB+). Quick Training, for fast iteration and testing (the other two presets are sketched after its settings list):
lora = m.train_lora(
    ds,
    steps=800,
    learning_rate=1e-4,
    rank=8,
    alpha=16,
    batch_size=1,
    gradient_accumulation_steps=4,
)
Settings:
  • Lower rank (8) for faster training
  • Fewer steps for quick results
  • Works on 8GB VRAM
  • Training time: ~10 minutes (50 images)
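
The Balanced and High-Quality presets follow the same pattern. The configurations below are sketches assembled from settings that appear elsewhere on this page (the defaults under Hyperparameter Selection and the rank 32 / batch 2 row in the training benchmarks), not separately tuned recipes:
# Balanced preset (~12GB VRAM): the default hyperparameters
lora = m.train_lora(
    ds,
    steps=1000,
    learning_rate=1e-4,
    rank=16,
    alpha=32,
    batch_size=1,
    gradient_accumulation_steps=4,
)

# High-quality preset (16GB+ VRAM): higher rank and a real batch
lora = m.train_lora(
    ds,
    steps=1500,
    learning_rate=5e-5,
    rank=32,
    alpha=64,
    batch_size=2,
    gradient_accumulation_steps=2,
)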

Training for Different Use Cases

Learning an artistic style or aesthetic:
lora = m.train_lora(
    ds,
    steps=1500,
    learning_rate=1e-4,
    rank=16,
    alpha=32,
    batch_size=1,
    gradient_accumulation_steps=4,
)
Dataset:
  • 50-200 images in the target style
  • Consistent aesthetic across all images
  • Captions describing content, not style
  • High resolution (1024x1024+)
Example caption:
A landscape with mountains and a lake, trees in the foreground
(Describe the content you see, not the style; the style itself is what the LoRA learns.)
Learning a specific person, character, or object:
lora = m.train_lora(
    ds,
    steps=1000,
    learning_rate=5e-5,
    rank=32,
    alpha=64,
    batch_size=1,
    gradient_accumulation_steps=4,
)
Dataset:
  • 20-100 images of the subject
  • Variety of poses, angles, and expressions
  • Different lighting conditions
  • Detailed captions
Example caption:
A photo of [subject name], smiling, wearing a blue shirt,
front-facing portrait, natural lighting
Learning a new concept or composition style:
lora = m.train_lora(
    ds,
    steps=2000,
    learning_rate=1e-4,
    rank=24,
    alpha=48,
    batch_size=1,
    gradient_accumulation_steps=8,
)
Dataset:
  • 30-150 images demonstrating the concept
  • Varied examples showing different aspects
  • Captions focusing on composition and elements

Inference Parameters

Basic Generation

image = m.generate(
    prompt="A serene Japanese garden with cherry blossoms",
    negative_prompt="blurry, low quality, distorted",
    num_inference_steps=50,
    guidance_scale=7.5,
    height=1024,
    width=1024,
)

Parameter Guide

prompt (str, required)
Text description of the desired image.

negative_prompt (str, default: "")
What to avoid in the generated image. Common negative prompts:
negative_prompt="blurry, low quality, distorted, deformed, bad anatomy"

num_inference_steps (int, default: 50)
Number of denoising steps:
  • 20-30: Fast, good quality
  • 40-50: Better quality (recommended)
  • 50-100: Highest quality, diminishing returns

guidance_scale (float, default: 7.5)
How closely to follow the prompt:
  • 5-6: More creative, less literal
  • 7-8: Balanced (recommended)
  • 9-12: Very literal, can be oversaturated

height (int, default: 1024)
Image height in pixels (must be a multiple of 8):
  • 1024: Standard (recommended)
  • 768: Faster, lower quality
  • 1536-2048: Higher detail, slower

width (int, default: 1024)
Image width in pixels (must be a multiple of 8).

Speed Priority

num_inference_steps=20
guidance_scale=7.0
height=768
width=768
Generation time: ~1.5s

Balanced

num_inference_steps=40
guidance_scale=7.5
height=1024
width=1024
Generation time: ~3.5s

Quality Priority

num_inference_steps=50
guidance_scale=7.5
height=1024
width=1024
Generation time: ~4.5s
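
The presets above can live in a plain dictionary and be unpacked into generate, keeping the trade-off explicit at each call site; an illustrative pattern:
# Preset table matching the values above
PRESETS = {
    "speed":    {"num_inference_steps": 20, "guidance_scale": 7.0, "height": 768,  "width": 768},
    "balanced": {"num_inference_steps": 40, "guidance_scale": 7.5, "height": 1024, "width": 1024},
    "quality":  {"num_inference_steps": 50, "guidance_scale": 7.5, "height": 1024, "width": 1024},
}

image = m.generate("A quiet harbor at dawn", **PRESETS["balanced"])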

Advanced Generation

# Generate with custom seed for reproducibility
image = m.generate(
    prompt="A futuristic cityscape at night",
    seed=42,
    num_inference_steps=50,
)

# Generate multiple variations
images = m.generate(
    prompt="A cat wearing a wizard hat",
    num_images=4,  # Generate 4 images
    guidance_scale=7.5,
)

# High-resolution generation
image = m.generate(
    prompt="A detailed portrait",
    height=1536,
    width=1536,
    num_inference_steps=60,
)

Performance Benchmarks

Generation Performance

Based on NVIDIA RTX 4090, 1024x1024 resolution:
Steps | VRAM | Time  | Quality
20    | ~9GB | ~1.8s | Good
30    | ~9GB | ~2.5s | Very Good
40    | ~9GB | ~3.5s | Excellent
50    | ~9GB | ~4.2s | Excellent+
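
To get comparable numbers on your own hardware, time generation directly; a minimal sketch (the warm-up run keeps one-time setup cost out of the measurement):
import time

from hypergen import model

m = model.load("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype="float16")
m.to("cuda")

# Warm-up: excludes one-time compilation and caching overhead
m.generate("warm-up", num_inference_steps=20)

start = time.perf_counter()
m.generate("A majestic lion in the African savanna", num_inference_steps=40)
print(f"Generation took {time.perf_counter() - start:.1f}s")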

Training Performance

LoRA training on RTX 4090, 50 images:
Configuration    | VRAM  | Time (1000 steps)
Rank 8, Batch 1  | ~8GB  | ~10 min
Rank 16, Batch 1 | ~9GB  | ~12 min
Rank 32, Batch 1 | ~11GB | ~15 min
Rank 32, Batch 2 | ~14GB | ~18 min

VRAM Requirements

1. 8GB VRAM

GPUs: RTX 3060 Ti, RTX 2080
Capabilities:
  • Generation: 1024x1024 
  • Training: Rank 8-16 
  • Batch size: 1 

2. 12GB VRAM

GPUs: RTX 3060 12GB, RTX 4070 Ti
Capabilities:
  • Generation: 1024x1024 
  • Training: Rank 16-32 
  • Batch size: 1-2 

3. 16GB+ VRAM

GPUs: RTX 4080, RTX 4090, A100
Capabilities:
  • Generation: Up to 2048x2048 
  • Training: Rank 32-64 
  • Batch size: 2-4 
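
The tiers above map directly to a small lookup if you want to pick training settings programmatically. A sketch that reads available VRAM via PyTorch (assumes torch is importable alongside hypergen):
import torch

def training_config() -> dict:
    """Pick LoRA settings from the VRAM tiers above."""
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb >= 16:
        return {"rank": 32, "alpha": 64, "batch_size": 2}
    if vram_gb >= 12:
        return {"rank": 16, "alpha": 32, "batch_size": 1}
    return {"rank": 8, "alpha": 16, "batch_size": 1}

# m and ds as loaded in the training examples above
lora = m.train_lora(ds, steps=1000, **training_config())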

Best Practices

Prompt Engineering

Effective prompts combine four elements: structure, negative prompts, style control, and quality modifiers. Good prompt structure:
[Main subject], [Details], [Style], [Lighting], [Quality modifiers]
Good:
prompt = """
A majestic mountain landscape with snow-capped peaks,
pine forest in the foreground, golden hour lighting,
photorealistic, highly detailed, 8k
"""
Poor:
prompt = "nice mountain"

Training Best Practices

1. Dataset Quality

Prepare high-quality training data. Do:
  • Use high-resolution images (1024x1024 or higher)
  • Ensure consistent quality
  • Include variety (poses, angles, lighting)
  • Write detailed captions
  • 20-150 images is usually sufficient
Don’t:
  • Use low-resolution or blurry images
  • Include duplicates
  • Mix different subjects in same dataset
  • Leave images uncaptioned

2. Caption Writing

Write effective captions. Good caption:
A person wearing a red jacket and blue jeans,
standing in front of a brick wall,
natural daylight, slight smile
Poor caption:
person
Tips:
  • Describe what you see objectively
  • Include composition, lighting, colors
  • Be consistent in style
  • Don’t describe what you want to learn
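
This page doesn’t specify how captions are attached to images; a common convention, assumed here, is one sidecar .txt file per image, which a short script can generate:
from pathlib import Path

# Assumption: dataset.load("./my_images") reads a .txt caption per image;
# adjust to hypergen's actual dataset format.
captions = {
    "img_001.jpg": "A person wearing a red jacket and blue jeans, "
                   "standing in front of a brick wall, natural daylight",
    "img_002.jpg": "A person in a green sweater at a wooden desk, "
                   "soft window light, slight smile",
}

root = Path("./my_images")
for filename, caption in captions.items():
    (root / filename).with_suffix(".txt").write_text(caption)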

3. Hyperparameter Selection

Choose appropriate hyperparameters. Start with the defaults:
steps=1000
learning_rate=1e-4
rank=16
alpha=32
Adjust based on results:
  • Underfitting? Increase steps, rank, or learning rate
  • Overfitting? Decrease steps, add more data
  • Out of memory? Reduce rank or batch size

4. Checkpoint Management

Save and test checkpoints:
lora = m.train_lora(
    ds,
    steps=2000,
    save_steps=500,  # Save every 500 steps
    output_dir="./checkpoints"
)
Test checkpoints at 500, 1000, 1500, and 2000 steps to find the best one.
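
Applying a saved checkpoint at inference isn’t covered on this page; the loop below assumes a hypothetical m.load_lora(path) purely for illustration:
# m.load_lora is assumed here, not a confirmed hypergen API
test_prompt = "A photo of [subject name], front-facing portrait"

for step in (500, 1000, 1500, 2000):
    m.load_lora(f"./checkpoints/checkpoint-{step}")  # hypothetical call
    image = m.generate(test_prompt, seed=42)  # fixed seed for comparability
    image[0].save(f"test_step_{step}.png")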

Memory Optimization

Reduce VAE memory usage:
m.enable_vae_slicing()
  • Reduces VRAM by ~10%
  • Minimal performance impact
  • Recommended for all users
Reduce attention memory usage:
m.enable_attention_slicing()
  • Reduces VRAM by ~15-20%
  • Small performance impact (~5% slower)
  • Useful for 8GB GPUs
Offload to CPU when not in use:
m.enable_model_cpu_offload()
  • Reduces VRAM by ~40-50%
  • Significant performance impact (~20% slower)
  • Use only if necessary
Use float16 instead of float32:
m = model.load(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype="float16"
)
  • Reduces VRAM by ~50%
  • Minimal quality impact
  • Strongly recommended
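
These optimizations compose; a typical stack for an 8GB card, using only the calls shown above:
from hypergen import model

m = model.load(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype="float16",    # ~50% VRAM reduction
)
m.to("cuda")
m.enable_vae_slicing()        # ~10% reduction, negligible cost
m.enable_attention_slicing()  # ~15-20% reduction, ~5% slower
# Last resort if generation still runs out of memory (~20% slower):
# m.enable_model_cpu_offload()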

Troubleshooting

Common Issues

Error: CUDA out of memory during image generation. Solutions:
  1. Enable memory optimizations:
    m.enable_vae_slicing()
    m.enable_attention_slicing()
    
  2. Reduce image resolution:
    image = m.generate(prompt, height=768, width=768)
    
  3. Use float16 precision:
    m = model.load("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype="float16")
    
  4. Generate fewer images:
    image = m.generate(prompt, num_images=1)
    
Error: CUDA out of memory during LoRA training. Solutions:
  1. Reduce LoRA rank:
    lora = m.train_lora(ds, rank=8, alpha=16)
    
  2. Use batch size 1:
    lora = m.train_lora(ds, batch_size=1)
    
  3. Use gradient accumulation:
    lora = m.train_lora(ds, batch_size=1, gradient_accumulation_steps=8)
    
  4. Use float16 precision:
    m = model.load("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype="float16")
    
Issue: Generated images are low quality or don’t match the prompt. Solutions:
  1. Increase inference steps:
    image = m.generate(prompt, num_inference_steps=50)
    
  2. Adjust guidance scale:
    image = m.generate(prompt, guidance_scale=8.0)
    
  3. Improve prompt:
    prompt = "A detailed [subject], [style], highly detailed, 8k, professional"
    
  4. Use negative prompts:
    negative_prompt = "blurry, low quality, distorted"
    
Issue: Trained LoRA doesn’t work well. Solutions:
  1. Increase training steps:
    lora = m.train_lora(ds, steps=2000)
    
  2. Improve dataset:
    • Add more images
    • Improve caption quality
    • Use higher resolution images
    • Add more variety
  3. Adjust hyperparameters:
    lora = m.train_lora(
        ds,
        steps=1500,
        learning_rate=5e-5,
        rank=32,
        alpha=64,
    )
    
  4. Check earlier checkpoints:
    • Model might be overfitting
    • Try checkpoint-500 or checkpoint-1000
Issue: Image generation is very slow. Solutions:
  1. Reduce inference steps:
    image = m.generate(prompt, num_inference_steps=30)
    
  2. Use SDXL Turbo:
    m = model.load("stabilityai/sdxl-turbo")
    image = m.generate(prompt, num_inference_steps=4)
    
  3. Reduce resolution:
    image = m.generate(prompt, height=768, width=768)
    
  4. Disable CPU offload if enabled:
    m.disable_model_cpu_offload()
    

Example Workflows

Basic Image Generation

from hypergen import model

# Load SDXL
m = model.load("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype="float16")
m.to("cuda")

# Generate image
image = m.generate(
    prompt="A serene mountain lake at sunset, photorealistic, highly detailed",
    negative_prompt="blurry, low quality",
    num_inference_steps=40,
    guidance_scale=7.5,
)

image[0].save("mountain_lake.png")

LoRA Training Pipeline

from hypergen import model, dataset

# Load model
m = model.load("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype="float16")
m.to("cuda")

# Load and prepare dataset
ds = dataset.load("./training_images")

# Train LoRA with checkpoints
lora = m.train_lora(
    ds,
    steps=2000,
    learning_rate=1e-4,
    rank=16,
    alpha=32,
    batch_size=1,
    gradient_accumulation_steps=4,
    save_steps=500,
    output_dir="./lora_checkpoints"
)

print("Training complete!")
print("Checkpoints saved in ./lora_checkpoints/")

Batch Generation

from hypergen import model

# Load model
m = model.load("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype="float16")
m.to("cuda")

# Define prompts
prompts = [
    "A majestic lion",
    "A serene landscape",
    "A futuristic city",
    "A beautiful flower",
]

# Generate multiple images
for i, prompt in enumerate(prompts):
    image = m.generate(
        prompt=f"{prompt}, highly detailed, 8k, professional",
        negative_prompt="blurry, low quality",
        num_inference_steps=40,
    )
    image[0].save(f"output_{i}.png")

print(f"Generated {len(prompts)} images")

Community Fine-Tunes

SDXL has thousands of community fine-tunes available. Here are some popular ones:
# Anime style
m = model.load("stablediffusionapi/anything-v5")

# Realistic photography
m = model.load("SG161222/RealVisXL_V4.0")

# Artistic style
m = model.load("RunDiffusion/Juggernaut-XL-v9")

# Product photography
m = model.load("playgroundai/playground-v2.5-1024px-aesthetic")
Browse HuggingFace’s SDXL models for more options.
