Overview
FLUX.1 is Black Forest Labs’ state-of-the-art text-to-image generation model. It delivers exceptional image quality, strong prompt adherence, and fine detail in generated images.

FLUX.1 Dev
Best for: Production use, highest quality
- Superior image quality
- Excellent prompt following
- Requires 16GB+ VRAM
FLUX.1 Schnell
Best for: Fast iteration, prototyping
- Optimized for speed (1-4 steps)
- Good quality/speed tradeoff
- Requires 12GB+ VRAM
Model Variants
FLUX.1 Dev
The development variant, optimized for the highest-quality outputs.
- Quality: State-of-the-art image generation
- Prompt Following: Excellent text comprehension
- License: Non-commercial (requires license for commercial use)
- VRAM: 16GB+ recommended
FLUX.1 Schnell
The “schnell” (fast) variant, optimized for rapid generation.
- Speed: 3-4x faster than Dev
- Quality: Excellent (slightly below Dev)
- License: Apache 2.0 (permissive, commercial-friendly)
- VRAM: 12GB+ recommended
Loading FLUX.1 with HyperGen
Basic Loading
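The loading code was not reproduced here; as a sketch, assuming a diffusers-style interface (the `hypergen` module, `load_model` function, and method names below are hypothetical, not HyperGen’s documented API):

```python
# Hypothetical HyperGen-style loading sketch -- names are assumptions,
# not HyperGen's documented API.
import torch
from hypergen import load_model  # hypothetical import

# bfloat16 halves memory versus float32 and is the usual dtype for FLUX.1
model = load_model("black-forest-labs/FLUX.1-dev", dtype=torch.bfloat16)

image = model.generate(
    "a red fox in the snow",
    num_inference_steps=50,  # Dev quality-priority setting (see below)
)
```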
Advanced Loading Options
Low VRAM Configuration
For systems with limited VRAM (12GB):

Training LoRAs with FLUX.1
FLUX.1 supports efficient LoRA fine-tuning with HyperGen’s optimized training pipeline.

Basic LoRA Training
Recommended Training Parameters
- Style Transfer
- Subject/Character
- Concept
Goal: Learn an artistic style or aesthetic

Dataset Requirements:
- 50-200 images
- Consistent style across images
- Captions describing content, not style
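As a concrete starting point, the style-transfer recommendations above might translate into a configuration like the following. The parameter names are illustrative assumptions, not HyperGen’s documented API; the values follow the guidance elsewhere on this page:

```python
# Illustrative LoRA training configuration for style transfer.
# Parameter names are hypothetical; values follow this page's guidance.
style_lora_config = {
    "model": "flux.1-dev",
    "rank": 32,                   # 16-32 on low VRAM, up to 128 on 24GB+
    "alpha": 64,                  # commonly set to 2x rank (see Troubleshooting)
    "learning_rate": 4e-5,        # a value the Troubleshooting section suggests
    "train_steps": 2000,
    "batch_size": 1,
    "gradient_accumulation": 4,   # Batch 1, GA 4 row in the training benchmarks
    "resolution": 1024,
}
```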
Memory-Optimized Training
For 16GB VRAM GPUs:

High-Quality Training
For 24GB+ VRAM GPUs:

Inference Parameters
Generation Settings
Recommended Parameters
FLUX.1 Dev
Quality Priority:
Balanced:
Speed Priority:
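Concretely, the three presets might look like this. The step counts mirror the benchmark table later on this page; the `guidance_scale` of 3.5 is the commonly used default for FLUX.1 Dev and is included here as an assumption:

```python
# Illustrative generation presets for FLUX.1 Dev (names are hypothetical).
# Step counts match the generation benchmarks table on this page.
flux_dev_presets = {
    "quality":  {"num_inference_steps": 50, "guidance_scale": 3.5},
    "balanced": {"num_inference_steps": 30, "guidance_scale": 3.5},
    "speed":    {"num_inference_steps": 20, "guidance_scale": 3.5},
}
```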
FLUX.1 Schnell
Best Settings:
Alternative:
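For Schnell, the step counts again follow the benchmark table. FLUX.1 Schnell is distilled to run without classifier-free guidance, so a `guidance_scale` of 0.0 is shown here; treat that value and the preset names as assumptions:

```python
# Illustrative generation presets for FLUX.1 Schnell (names are hypothetical).
flux_schnell_presets = {
    "best":        {"num_inference_steps": 4, "guidance_scale": 0.0},
    "alternative": {"num_inference_steps": 2, "guidance_scale": 0.0},
}
```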
Advanced Generation
Performance Benchmarks
Generation Benchmarks
Based on an NVIDIA RTX 4090, 1024x1024 images:

| Variant | Steps | VRAM Used | Time | Quality |
|---|---|---|---|---|
| FLUX.1 Dev | 50 | ~18GB | ~8s | Outstanding |
| FLUX.1 Dev | 30 | ~18GB | ~5s | Excellent |
| FLUX.1 Dev | 20 | ~18GB | ~3.5s | Very Good |
| FLUX.1 Schnell | 4 | ~16GB | ~1.5s | Excellent |
| FLUX.1 Schnell | 2 | ~16GB | ~1s | Very Good |
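For planning batch jobs, the per-image times in the table translate directly into throughput:

```python
# Convert the per-image times from the table above into images per minute.
def images_per_minute(seconds_per_image: float) -> float:
    return 60.0 / seconds_per_image

dev_50_steps = images_per_minute(8.0)     # FLUX.1 Dev, 50 steps: ~8s/image -> 7.5/min
schnell_4_steps = images_per_minute(1.5)  # FLUX.1 Schnell, 4 steps: ~1.5s/image -> 40/min
```

At these rates, Schnell produces roughly 5x more images per minute than Dev at its quality-priority settings.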
Training Benchmarks
LoRA training on an RTX 4090, 50 images, rank 32:

| Configuration | VRAM | Time (1000 steps) | Time (2000 steps) |
|---|---|---|---|
| Batch 1, GA 1 | ~16GB | ~20 min | ~40 min |
| Batch 1, GA 4 | ~16GB | ~22 min | ~44 min |
| Batch 1, GA 8 | ~16GB | ~25 min | ~50 min |
| Batch 2, GA 4 | ~22GB | ~28 min | ~56 min |
GA = Gradient Accumulation Steps. Higher values slightly increase training time but improve quality.
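Gradient accumulation trades a small amount of time for a larger effective batch without the VRAM cost of a real one:

```python
# Effective batch size = per-device batch size x gradient accumulation steps.
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    return batch_size * grad_accum_steps

# The "Batch 1, GA 8" and "Batch 2, GA 4" rows above train with the same
# effective batch size of 8, but the first fits in ~16GB instead of ~22GB.
assert effective_batch_size(1, 8) == effective_batch_size(2, 4) == 8
```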
GPU Requirements
Minimum (Schnell)
VRAM: 12GB

Examples:
- RTX 3060 (12GB)
- RTX 4070
- A10

Recommended settings:
- Enable optimizations
- Batch size 1
- Rank 16-32
Recommended (Dev)
VRAM: 16GB

Examples:
- RTX 4080
- RTX 4090
- A100 (40GB)

Recommended settings:
- Standard settings
- Batch size 1-2
- Rank 32-64
Optimal (Dev)
VRAM: 24GB+

Examples:
- RTX 4090
- A100 (40GB)
- H100

Recommended settings:
- Maximum quality
- Batch size 2-4
- Rank 64-128
Best Practices
Prompt Engineering
FLUX.1 has excellent prompt comprehension. Here are tips for best results:

- Structure
- Details
- Style Control
- Text in Images
- Details
- Style Control
- Text in Images
Good prompt structure:

Example:
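As an illustration, a prompt can be assembled in a subject, details, style, composition order; this ordering and the helper below are assumptions for the sketch, not a HyperGen requirement:

```python
# Assemble a structured prompt: subject -> details -> style -> composition.
# The helper and the example strings are illustrative.
def build_prompt(subject: str, details: str, style: str, composition: str) -> str:
    return ", ".join([subject, details, style, composition])

prompt = build_prompt(
    "a lighthouse on a rocky coast",
    "storm clouds, crashing waves, warm light in the window",
    "oil painting, impressionist style",
    "wide shot, golden hour lighting",
)
```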
Training Best Practices
1
Dataset Preparation
Quality over quantity:
- Use high-resolution images (1024x1024 or higher)
- Ensure consistent quality across dataset
- 20-150 images is usually sufficient
- Remove duplicates and near-duplicates
2
Caption Quality
Write descriptive captions:
- Describe what you see, not what you want to learn
- Include details about composition, lighting, colors
- Be consistent in caption style
- Use natural language
3
Hyperparameter Tuning
Start with defaults, then adjust:
- Begin with recommended settings
- If underfitting (not learning), increase:
  - Training steps
  - LoRA rank
  - Learning rate (carefully)
- If overfitting (memorizing):
  - Decrease training steps
  - Decrease LoRA rank
  - Add more training images
4
Monitor Training
Save checkpoints regularly:

Test different checkpoints to find the best one.
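For example, saving every 500 steps across a 2000-step run yields four candidate checkpoints to compare (the interval is an illustrative choice, not a HyperGen default):

```python
# Steps at which checkpoints are written when saving every `save_every` steps.
def checkpoint_steps(total_steps: int, save_every: int) -> list[int]:
    return list(range(save_every, total_steps + 1, save_every))

# A 2000-step run saved every 500 steps gives four checkpoints to evaluate.
candidates = checkpoint_steps(2000, 500)  # [500, 1000, 1500, 2000]
```

Earlier checkpoints often generalize better when the later ones show signs of overfitting.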
Memory Optimization
Enable Model CPU Offload
Offload model components to CPU when not in use:

Pros: Reduces VRAM by 40-50%
Cons: Slower generation (10-20% slower)
Enable VAE Slicing
Process the VAE in smaller slices:

Pros: Reduces VRAM by 10-15%
Cons: Minimal performance impact
Enable Attention Slicing
Compute attention in slices:

Pros: Reduces VRAM by 15-20%
Cons: Slower generation (5-10% slower)
Use Lower Rank
Reduce the LoRA rank during training:

Pros: Reduces VRAM by 20-30%
Cons: Lower model capacity
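As a back-of-envelope check, the lower-bound savings quoted above can be combined to estimate the effect of stacking optimizations. Treating the reductions as independent multiplicative factors is a simplifying assumption; actual savings depend on the model and workload:

```python
# Estimate VRAM after applying memory optimizations, using the lower-bound
# savings quoted above and assuming the reductions compound multiplicatively.
SAVINGS = {
    "cpu_offload": 0.40,        # 40-50% reduction
    "vae_slicing": 0.10,        # 10-15% reduction
    "attention_slicing": 0.15,  # 15-20% reduction
}

def estimated_vram(baseline_gb: float, optimizations: list[str]) -> float:
    vram = baseline_gb
    for opt in optimizations:
        vram *= 1.0 - SAVINGS[opt]
    return round(vram, 2)

# FLUX.1 Dev's ~18GB baseline with CPU offload alone drops to roughly 10.8GB.
dev_with_offload = estimated_vram(18.0, ["cpu_offload"])
```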
Troubleshooting
Common Issues
Out of Memory During Generation
Solutions:
- Enable memory optimizations
- Reduce image resolution
- Generate fewer images at once
Out of Memory During Training
Solutions:
- Reduce batch size
- Lower LoRA rank
- Use gradient accumulation
Poor Training Results
Possible causes and solutions:
- Not enough training steps:
  - Increase to 2000-3000 steps
- Low-quality dataset:
  - Use higher-resolution images
  - Add more diverse examples
  - Improve caption quality
- Wrong hyperparameters:
  - Try learning_rate=4e-5 or 6e-5
  - Increase rank to 64
  - Adjust alpha to 2x rank
Slow Generation Speed
Solutions:
- Use FLUX.1 Schnell instead of Dev
- Reduce inference steps
- Disable CPU offload if enabled
Example Projects
Portrait LoRA Training
Style Transfer LoRA
Batch Generation
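A batch run typically sweeps a list of prompts against a fixed set of seeds for reproducibility. Planning the job list is plain Python; the generation call itself is omitted since HyperGen’s API is not shown on this page:

```python
# Plan a batch generation run: every (prompt, seed) pair to render.
from itertools import product

prompts = ["a misty forest at dawn", "a desert canyon at dusk"]
seeds = [0, 1, 2]  # fixed seeds make runs reproducible and comparable

jobs = list(product(prompts, seeds))  # 2 prompts x 3 seeds = 6 jobs
```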
License Information
FLUX.1 Dev
License: FLUX.1 Dev Non-Commercial License

Permitted:
- Personal use
- Research
- Evaluation

Not permitted:
- Commercial use (requires a separate license)
FLUX.1 Schnell
License: Apache 2.0

Permitted:
- Personal use
- Research
- Commercial use
- Modification and distribution
Next Steps
Training Guide
Complete LoRA training documentation
Dataset Preparation
Learn how to prepare training data
Serving FLUX.1
Deploy FLUX.1 with the API
API Reference
Complete model API documentation