LoRA Training Basics

The step-by-step process of teaching a diffusion model a new face, object, or style using a small set of training images.

Training a LoRA requires three things: a base model (e.g., Flux Dev), a dataset of images, and a training script (Kohya-ss, SimpleTuner, or a managed service like Synexa).

Dataset preparation is the most important step. For a face LoRA, collect 15–30 images showing the subject from varied angles, lighting conditions, and expressions. Crop tightly to the face or slightly wider, and remove backgrounds if possible. A mislabelled or low-quality image does more harm than an extra good image does good: quality beats quantity.
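As a rough illustration of checking a dataset before training, the sketch below counts the images in a folder against the 15–30 guideline and flags byte-identical duplicates. The function name audit_dataset, the extension list, and the duplicate check itself are assumptions made for this example; they are not part of any particular training tool, and the check says nothing about visual quality.

```python
import hashlib
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your trainer accepts.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def audit_dataset(dataset_dir, min_images=15, max_images=30):
    """Count unique images and list byte-identical duplicates in a folder."""
    images = sorted(
        p for p in Path(dataset_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    seen = {}          # SHA-256 digest -> first file with that content
    duplicates = []    # (kept file name, duplicate file name)
    for path in images:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append((seen[digest].name, path.name))
        else:
            seen[digest] = path
    unique = len(seen)
    return {
        "unique_images": unique,
        "duplicates": duplicates,
        "count_in_range": min_images <= unique <= max_images,
    }
```

A hash comparison only catches exact file copies. Near-duplicates, blurry shots, and poor crops still need a manual look.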

Captioning: each training image needs a text caption. Auto-captioners (BLIP-2, Florence-2) generate initial captions; you then prepend a unique trigger token (e.g., 'ohwx person') that the LoRA will learn to associate with the subject. During inference, including that token 'activates' the learned identity.
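The trigger-token step can be sketched as a small script. It assumes captions are stored as one .txt file per image in the dataset folder, which is a common layout but should be checked against your training script. The helper names prepend_trigger and tag_caption_files are hypothetical.

```python
from pathlib import Path

TRIGGER = "ohwx person"  # example trigger token from the text above


def prepend_trigger(caption: str, trigger: str = TRIGGER) -> str:
    """Return the caption with the trigger token in front (safe to re-run)."""
    caption = caption.strip()
    if caption.startswith(trigger):
        return caption
    return f"{trigger}, {caption}" if caption else trigger


def tag_caption_files(dataset_dir: str, trigger: str = TRIGGER) -> int:
    """Prepend the trigger to every .txt caption; return how many files changed."""
    changed = 0
    for path in sorted(Path(dataset_dir).glob("*.txt")):
        original = path.read_text(encoding="utf-8").strip()
        updated = prepend_trigger(original, trigger)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

For example, prepend_trigger("a man standing in a park") gives "ohwx person, a man standing in a park". Running the script twice leaves already-tagged captions untouched.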

Hyperparameters: the key settings are learning rate (typically 1e-5 to 1e-4), network rank (r = 16–64; a higher rank is more expressive but produces a larger file and raises the risk of overfitting), and number of training steps (500–2000 for faces). An overfitted LoRA produces images that look exactly like the training photos but fails to generalise to new poses and prompts.
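To make the rank trade-off concrete, the sketch below counts the parameters LoRA adds to a single linear layer: two low-rank matrices of shapes (r × d_in) and (d_out × r), so r × (d_in + d_out) in total. It also estimates the number of optimiser steps from image count, repeats, epochs, and batch size. The "repeats" convention and the 3072-wide layer are illustrative assumptions, not values taken from any specific model or trainer.

```python
import math


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer."""
    return rank * (d_in + d_out)


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Optimiser steps, assuming each epoch shows every image `repeats` times."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Illustrative layer size (an assumption, not a measured model dimension).
D = 3072
full = D * D                      # 9,437,184 weights in the frozen layer
r16 = lora_params(D, D, 16)       # 98,304 added weights
r64 = lora_params(D, D, 64)       # 393,216 added weights

# 20 images x 10 repeats x 5 epochs at batch size 1 -> 1000 steps,
# which sits inside the 500-2000 range quoted for faces.
steps = total_steps(20, 10, 5, 1)
```

Quadrupling the rank quadruples the added parameters per layer, which is why file size grows with rank while the base model's weights stay untouched.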

On Synexa, dataset prep, captioning, and training are handled automatically: you upload images and receive a ready-to-use LoRA file, with no GPU setup required.

Frequently Asked Questions

How many images do I need to train a good LoRA?
15–30 images is the practical sweet spot for face LoRAs. Object or style LoRAs may need 50–200 images with greater variety.

What is a trigger word and do I need one?
A trigger word (or token) is the unique string you include in prompts to activate the LoRA's learned concept. It is required for identity LoRAs and optional for style LoRAs.

How long does LoRA training take?
On a single A100, a face LoRA typically trains in 10–30 minutes. Synexa's managed pipeline completes most jobs in under 20 minutes.

Can I train on images that contain multiple people?
Training images should ideally show only the target subject. Multiple people in frame confuse the model about which identity to learn.
