Guide · 16 min read

AI Image Prompting Masterclass — From Beginner to Pro

Most prompts fail because they read like a wish, not a brief. This is the framework professional studios use in 2026: a six-part structure (subject, scene, light, camera, style, modifiers), with model-specific quirks called out. Master this once and you'll stop burning credits on re-rolls.

The six-part prompt structure

Every great prompt has the same six parts in roughly this order: (1) subject — who or what is in the frame, (2) scene — where they are and what they're doing, (3) light — direction, quality, colour temperature, (4) camera — lens, aperture, distance, (5) style — film stock, era, art reference, (6) modifiers — quality boosters, negative prompts, weights.

Example: 'Editorial portrait of a 28-year-old woman in a black slip dress (subject), leaning against a rain-wet brick wall at night in Tokyo's Shibuya district (scene), neon signs reflecting on her skin, cool blue key light with warm magenta rim from behind (light), shot on Hasselblad H6D 85mm f/1.4, three-quarter framing (camera), 90s Wong Kar-wai cinematography, anamorphic flare (style), ultra detailed, sharp focus on eyes (modifiers).'

A prompt at that level of detail usually produces a usable image within the first few generations on Flux, SDXL, and Midjourney. Reduce the same scene to 'a girl in Tokyo at night, cinematic' and you get dozens of generic results and a wasted afternoon.
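To make the six-part structure concrete, here is a minimal Python sketch that stores each part separately and joins them in order. The class name PromptBrief and its methods are illustrative assumptions, not part of any tool's API; the example values come from the Tokyo prompt above.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptBrief:
    """The six parts of a prompt, in the order the guide recommends."""
    subject: str
    scene: str
    light: str
    camera: str
    style: str
    modifiers: str = ""

    def missing_parts(self) -> list[str]:
        """Names of parts left empty, so gaps are visible before generating."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the non-empty parts with commas, keeping the six-part order."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)


brief = PromptBrief(
    subject="Editorial portrait of a 28-year-old woman in a black slip dress",
    scene="leaning against a rain-wet brick wall at night in Tokyo's Shibuya district",
    light="neon signs reflecting on her skin, cool blue key light with warm magenta rim from behind",
    camera="shot on Hasselblad H6D 85mm f/1.4, three-quarter framing",
    style="90s Wong Kar-wai cinematography, anamorphic flare",
    modifiers="ultra detailed, sharp focus on eyes",
)
print(brief.render())
```

Keeping the parts in separate fields makes it easy to see which one is empty and to swap a single part later without retyping the rest.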

Lighting vocabulary that actually steers the model

Lighting is the single highest-leverage word group. Specific terms the models all understand: rim light, key light, fill light, hair light, kicker, beauty dish, softbox, ring light, golden hour, blue hour, overcast, harsh midday, candlelight, neon ambient, tungsten interior, mixed colour temperature, single-source, three-point lighting, Rembrandt lighting, butterfly lighting, split lighting, chiaroscuro, low-key, high-key.

Avoid 'good lighting' and 'beautiful lighting' — they mean nothing to the model. Name the specific setup.

Camera vocabulary

Lens choice is shorthand for an entire look. 24mm = wide environmental; 35mm = documentary, street; 50mm = natural perspective; 85mm = classic portrait; 135mm = compressed flattering portrait; 200mm = sports, wildlife. Aperture: f/1.2-f/2 = creamy bokeh; f/4-f/5.6 = sharp subject + soft background; f/8-f/16 = everything in focus.

Camera bodies the models recognise: Hasselblad H6D, Phase One, Leica M, Canon R5, Sony A7R, Arri Alexa, RED Komodo. Specify when it matters — 'shot on Hasselblad' steers toward medium-format aesthetic regardless of which model you use.
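The lens and aperture shorthand above can be kept as a small lookup, sketched here in Python. The names LENS_LOOKS, aperture_look, and camera_clause are hypothetical helpers for illustration; the values are the ones listed in this section, and apertures that fall between the listed ranges are reported as not covered rather than guessed.

```python
# Focal length (mm) mapped to the look this guide associates with it.
LENS_LOOKS = {
    24: "wide environmental",
    35: "documentary, street",
    50: "natural perspective",
    85: "classic portrait",
    135: "compressed flattering portrait",
    200: "sports, wildlife",
}


def aperture_look(f_number: float) -> str:
    """Return the look for an f-number, using only the ranges the guide lists."""
    if 1.2 <= f_number <= 2:
        return "creamy bokeh"
    if 4 <= f_number <= 5.6:
        return "sharp subject + soft background"
    if 8 <= f_number <= 16:
        return "everything in focus"
    return "outside the ranges the guide lists"


def camera_clause(focal_mm: int, f_number: float, body: str = "") -> str:
    """Build the camera part of a prompt, e.g. 'shot on Leica M 35mm f/8'."""
    lens = f"{focal_mm}mm f/{f_number:g}"
    return " ".join(part for part in ("shot on", body, lens) if part)


print(camera_clause(85, 1.4, "Hasselblad H6D"))
print(LENS_LOOKS[85], "/", aperture_look(1.4))
```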

Style references — the safest and the riskiest

Safe and effective references: '90s Wong Kar-wai', 'Roger Deakins cinematography', 'Annie Leibovitz portrait', 'Helmut Newton', 'Tim Walker editorial', 'Vogue Paris cover', 'Kodachrome 200', 'Polaroid 600', 'CineStill 800T', 'Hollywood Golden Age'.

Risky references, which work but may be filtered or produce off-brand results: living artists by name (Greg Rutkowski and others; many models have been trained to ignore these names), copyrighted franchises (Marvel, Disney; output is usually unusable commercially), specific celebrity faces (some models block them, others produce uncanny results).

Model-specific quirks

Flux: best at natural language. Write prompts as full sentences, not tag lists. Negative prompts are barely needed. Guidance around 3.5 (the dev variant uses a distilled guidance scale rather than classic CFG). It handles complex multi-clause prompts better than most other models.

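SDXL: prefers comma-separated descriptors with weights. A heavy negative prompt is required. CFG around 7. For emphasis, use ((double parens)) or an explicit weight such as (rim light:1.2); this syntax comes from Automatic1111-style frontends, not from the model itself, so check that your tool supports it.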

Midjourney: terse prompts win. Use --ar, --stylize, --weird, --cref. Avoid lengthy negatives (use --no instead). The model imposes strong stylisation by default.

Ideogram: optimised for text-in-image. Put the text in quotes. Specify font style ('bold sans-serif', 'condensed serif'). Other models still struggle with legible text.
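The formatting differences between the first three models can be sketched as three small renderers. This is an illustration under stated assumptions: the function names are hypothetical, the (term:weight) form is the Automatic1111-style emphasis syntax, and --ar and --stylize are the Midjourney parameters named above. Nothing here calls a real generation API; the functions only build strings.

```python
from typing import Mapping, Optional, Sequence


def render_flux(sentences: Sequence[str]) -> str:
    """Flux: natural language. Capitalise each clause and end it with a full stop."""
    cleaned = [s.strip().rstrip(".") for s in sentences if s.strip()]
    return " ".join(s[0].upper() + s[1:] + "." for s in cleaned)


def render_sdxl(terms: Sequence[str],
                weights: Optional[Mapping[str, float]] = None) -> str:
    """SDXL: comma-separated descriptors, with optional (term:weight) emphasis."""
    weights = weights or {}
    rendered = []
    for term in terms:
        if term in weights:
            rendered.append(f"({term}:{weights[term]:g})")
        else:
            rendered.append(term)
    return ", ".join(rendered)


def render_midjourney(terms: Sequence[str], aspect: str = "",
                      stylize: Optional[int] = None) -> str:
    """Midjourney: terse phrases followed by parameters."""
    prompt = ", ".join(terms)
    if aspect:
        prompt += f" --ar {aspect}"
    if stylize is not None:
        prompt += f" --stylize {stylize}"
    return prompt


print(render_flux(["a woman leans against a rain-wet brick wall",
                   "cool blue key light falls from the left"]))
print(render_sdxl(["portrait", "rim light", "85mm"], {"rim light": 1.2}))
print(render_midjourney(["portrait", "neon ambient"], aspect="2:3", stylize=100))
```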

Negative prompts — what to always include

Universal SDXL negative: 'blurry, low quality, deformed hands, extra fingers, watermark, text, signature, cropped, bad anatomy, jpeg artifacts'.

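Flux needs much less: where your tool exposes a negative prompt for Flux at all (many pipelines for the guidance-distilled variants do not), 'watermark, text, deformed hands' is usually enough.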

Midjourney: use --no for specific subjects you want excluded (e.g. --no people, --no text).
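A small helper can keep these per-model defaults in one place. The sketch below is hypothetical (the name negative_for and the model keys are assumptions for illustration); the term lists are the ones given in this section, and whether a given Flux tool accepts a negative prompt at all depends on the tool.

```python
from typing import Sequence

SDXL_NEGATIVE = (
    "blurry", "low quality", "deformed hands", "extra fingers", "watermark",
    "text", "signature", "cropped", "bad anatomy", "jpeg artifacts",
)
FLUX_NEGATIVE = ("watermark", "text", "deformed hands")


def negative_for(model: str, extra: Sequence[str] = ()) -> str:
    """Return the negative prompt (SDXL, Flux) or the --no suffix (Midjourney)."""
    if model == "sdxl":
        # dict.fromkeys removes duplicates while keeping the original order.
        return ", ".join(dict.fromkeys((*SDXL_NEGATIVE, *extra)))
    if model == "flux":
        return ", ".join(dict.fromkeys((*FLUX_NEGATIVE, *extra)))
    if model == "midjourney":
        # Midjourney has no negative prompt field; exclusions go after --no.
        return ("--no " + ", ".join(extra)) if extra else ""
    raise ValueError(f"unknown model: {model}")


print(negative_for("sdxl", ["logo"]))
print(negative_for("midjourney", ["people", "text"]))
```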

Iterative workflow

Pros don't expect perfection on the first try. The workflow is: (1) draft a prompt using the six-part structure, (2) generate 4-8 variants on a fast tier, (3) pick the closest and lock its seed, (4) tweak one variable at a time (light, then camera, then style) until it's right, (5) re-render the winner on the high-quality tier with the locked seed.

Tools that let you lock the seed and tweak one word (Synexa, Automatic1111, ComfyUI) save more time than tools that don't. If your tool re-rolls the entire image every time you change a word, switch tools.
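The five steps can be sketched as plain data manipulation, independent of any particular generator. Everything named here (Job, explore, tweak, finalize, and the tier labels "fast" and "high") is a hypothetical illustration; a real tool would render each Job, and you would pick the winner by eye rather than by index.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Job:
    """One render request: prompt parts plus the seed and quality tier."""
    subject: str
    scene: str
    light: str
    camera: str
    style: str
    seed: int = 0
    tier: str = "fast"


def explore(draft: Job, count: int, rng: random.Random) -> List[Job]:
    """Step 2: same prompt, a different random seed for each variant."""
    return [replace(draft, seed=rng.randrange(2**32)) for _ in range(count)]


def tweak(winner: Job, part: str, options: Sequence[str]) -> List[Job]:
    """Step 4: change exactly one part and keep the winner's seed locked."""
    if part not in ("light", "camera", "style"):
        raise ValueError(f"cannot tweak {part!r}")
    return [replace(winner, **{part: option}) for option in options]


def finalize(winner: Job) -> Job:
    """Step 5: same prompt and seed, re-rendered on the high-quality tier."""
    return replace(winner, tier="high")


draft = Job("editorial portrait", "rain-wet brick wall at night",
            "cool blue key light", "85mm f/1.4", "90s Wong Kar-wai")
variants = explore(draft, count=4, rng=random.Random())
winner = variants[0]  # placeholder choice; in practice you compare the images
light_options = tweak(winner, "light", ["golden hour", "neon ambient"])
final = finalize(light_options[1])
print(final)
```

Because each tweak changes one field and nothing else, any difference between two renders can be attributed to that field.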

Common failure modes and the fix

Bad hands: add 'detailed hands, five fingers' to the positive and 'extra fingers, deformed hands, mutated hands' to the negative. If that is not enough, crop tight on the subject so the hands are out of frame, or generate the image and then inpaint the hands separately.

Generic faces: add a specific ethnicity, age, distinctive feature (freckles, jawline, eye colour). 'Beautiful woman' produces the same model-favoured face every time.

Over-saturated 'AI sheen': add 'natural skin tones, subtle film grain, muted colour grade' and lower CFG. Avoid 'masterpiece, 8k, hyperrealistic' — these push the model toward the over-rendered look.

Subject keeps drifting from prompt: raise CFG by 1-2 points, or simplify the prompt — too many competing clauses confuse the model.
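Several of these failure modes come from phrases that can be caught before generating. The following is a minimal sketch of a prompt linter; the function name lint_prompt and the phrase table are assumptions built only from the phrases this guide warns against, not an exhaustive list.

```python
import re

# Phrase to avoid, mapped to the guide's suggested fix.
VAGUE_PHRASES = {
    "good lighting": "name the setup, e.g. 'Rembrandt lighting' or 'softbox'",
    "beautiful lighting": "name the setup, e.g. 'golden hour' or 'rim light'",
    "beautiful woman": "add age, ethnicity, or a distinctive feature",
    "masterpiece": "pushes toward the over-rendered look",
    "8k": "pushes toward the over-rendered look",
    "hyperrealistic": "pushes toward the over-rendered look",
}


def lint_prompt(prompt: str) -> list[str]:
    """Return one warning per vague phrase found as a whole word or phrase."""
    lowered = prompt.lower()
    warnings = []
    for phrase, advice in VAGUE_PHRASES.items():
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            warnings.append(f"'{phrase}': {advice}")
    return warnings


for warning in lint_prompt("a beautiful woman, good lighting, 8k"):
    print(warning)
```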

Frequently asked questions

What is the best way to write an AI image prompt?
Use a six-part structure: subject, scene, lighting, camera, style, modifiers. Write in full sentences for Flux; comma-separated descriptors for SDXL; terse phrases for Midjourney.

How long should an AI image prompt be?
Flux handles 60-120 word prompts well. SDXL prefers 30-80 words of comma-separated descriptors. Midjourney works best with 15-40 word prompts. Longer is not better; clarity is.

Do I need a negative prompt?
Yes for SDXL: always include 'blurry, low quality, deformed hands, watermark, text'. Mostly no for Flux: the model is well-aligned out of the box. For Midjourney, use the --no flag instead of negative prompts.

Why do my AI images all look the same?
Two likely causes: your prompts are too generic (add specific lighting, camera, and style references), or you're not varying the seed (lock seed only when iterating, otherwise leave it random).

How do I prompt for photorealistic AI images?
Use Flux Pro Ultra. Specify lens (85mm), aperture (f/1.4), camera body (Hasselblad H6D), natural lighting (golden hour, window light), and avoid 'hyperrealistic 8k masterpiece' phrases that push toward the over-saturated AI look.
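
The word ranges given in the FAQ above can be checked mechanically. This is a rough sketch: the name check_length is hypothetical, words are counted by splitting on whitespace, and the ranges are the guide's rules of thumb rather than hard limits of any model.

```python
# Recommended prompt length in words per model, from the FAQ above.
WORD_RANGES = {
    "flux": (60, 120),
    "sdxl": (30, 80),
    "midjourney": (15, 40),
}


def check_length(prompt: str, model: str) -> tuple[int, str]:
    """Return the word count and whether it is 'short', 'ok', or 'long'."""
    low, high = WORD_RANGES[model]
    count = len(prompt.split())
    if count < low:
        return count, "short"
    if count > high:
        return count, "long"
    return count, "ok"


print(check_length("a girl in Tokyo at night, cinematic", "midjourney"))
```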
