
Photo Generation System — Complete Technical Report

43 rounds of R&D across 8 subjects • Iconic by AI • March 2026

1. Executive Summary

Rounds tested: 43 • Test subjects: 8 • Scene styles: 10
Portrait avg: 85% • Half body avg: 76% • Full body avg: 60%

After 43 rounds of systematic testing, we have a proven formula for portrait and half-body photo generation (85% and 76% average scores). Full body remains harder (60%), but we now understand how lora_scale must increase with camera distance. Scene photos work via direct LoRA generation, the biggest breakthrough of the project. The next frontiers are environment integration (making people look like they belong in the scene, not pasted on top) and overproducing + curating (generating 3-4x the photos we need and picking the best).

The Winning Formula

Model: Flux LoRA (trained per subject on 8 selfies)
Guidance: 3.5 (scene photos) / 2.5 (studio portraits)
Steps: 35
LoRA scale: 0.9 (close-up) → 1.1 (half body) → 1.3 (full body)
Film stock: Kodak Portra 400 ONLY (warm skin tones)
Identity: OHWX {age} {ethnicity} {gender}, no hair/eye colour
Candidates: generate 4, keep the best (+3-5 pts)
Best poses: Walking + Laughing (natural movement)
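The formula can be held in one settings object so every generation script reads the same defaults. The Python below is a minimal sketch. The class and field names are hypothetical, because the project's real configuration is not shown in this report. Only the values come from the list above. LoRA scale is left out because it depends on framing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinningFormula:
    """Hypothetical holder for the defaults listed above (names are illustrative)."""
    steps: int = 35
    guidance_scene: float = 3.5   # scene photos
    guidance_studio: float = 2.5  # studio portraits
    film_stock: str = "Kodak Portra 400"
    candidates: int = 4           # generate 4, keep the best

    def guidance_for(self, is_scene: bool) -> float:
        """Scene photos get 3.5, studio portraits 2.5."""
        return self.guidance_scene if is_scene else self.guidance_studio


FORMULA = WinningFormula()
```

Making the dataclass frozen stops a script from silently mutating the shared defaults at runtime.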

2. The 10 Rules We Learned

Rules of AI Photo Generation

1. Identity lives in the LoRA weights, not the prompt. Don't describe hair colour, eye colour, or facial features; the LoRA already knows them. Adding them FIGHTS the model and drops scores by 5+ points.
2. Distance kills identity. The further the camera, the smaller the face and the weaker the LoRA's grip. Compensate by increasing lora_scale: 0.9 (close-up) → 1.1 (half body) → 1.3 (full body).
3. Never post-process. Face swap, inpainting, outpainting, Kontext edit: ALL make things worse. Generate it right the first time. Every modification degrades identity.
4. Concrete visual anchors, not abstract descriptions. "Shoes visible on the ground" works. "Full body photograph" doesn't. Flux needs specific physical details, not concepts.
5. Movement makes photos real. Walking and laughing poses transform stiff AI shots into natural influencer photos. Static poses look like test shots.
6. Sunny beats dramatic. Natural sunlight (Amalfi Coast) produces more realistic photos than dramatic lighting (Tokyo Neon). Neon looks more "AI-generated".
7. Kodak Portra 400 only. Never mix film stocks. Fuji = cooler/greener (wrong for portraits). Ektar = over-saturated skin. Portra = warm, natural skin tones.
8. Overgenerate and curate. Generate 4 candidates, keep the best. Adds 3-5 points for ~$0.12 extra per style. The cheapest quality improvement available.
9. Weave the environment INTO the person. Don't describe person + backdrop separately. Describe snow ON their shoulders, frost in their hair, rosy cheeks from the cold. Otherwise they look photoshopped.
10. LoRA quality = selfie quality. Bad selfies = bad LoRA = bad photos forever. The 8-angle selfie protocol exists for a reason. Neil 2 (39% avg) proves this.
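In code, rule 8 amounts to taking the maximum over a candidate list. The sketch below assumes each candidate already has an overall score from the scoring system. best_of and extra_cost are hypothetical helpers. The $0.04 per-image price is taken from the production cost estimate at the end of this report. At that price, the 3 extra images per style cost the ~$0.12 quoted in rule 8.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

PRICE_PER_IMAGE = 0.04  # USD, from the production cost line (assumed flat per image)


def best_of(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Return the highest-scoring candidate; ties go to the earliest one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def extra_cost(n_candidates: int, price: float = PRICE_PER_IMAGE) -> float:
    """Cost of the images generated beyond the single one you would keep anyway."""
    return (n_candidates - 1) * price
```

For example, with illustrative scores {"c1": 71.0, "c2": 84.5, "c3": 78.0, "c4": 80.2}, best_of(list(scores), scores.__getitem__) picks "c2".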

3. The 3 Shot Types

After testing everything from selfie-distance to 15-metre environment shots, we've settled on exactly 3 shot types. Each has its own proven lora_scale, aspect ratio, and prompting approach.

Close-Up / Selfie: 78-85%
Aspect 3:4 • LoRA scale 0.9 • Face dominates frame • Best for headshots & portraits • 0.7 tested = 39% (too low)

Half Body: 76%
Aspect 3:4 • LoRA scale 1.1 • Waist-up, shows outfit • Best for scene photos • Best-of-4 → 85%

Full Body: 60%
Aspect 9:16 • LoRA scale 1.3 • Head to shoes, ground visible • Needs movement language • Emily hit 85% (laughing)

Distance Shots — PARKED (R38 avg 6%)

Environment-first prompting (describe scene first, subject last) controls distance but R38 went too extreme. The LoRA can't hold identity when the face is tiny. Parked until the 3 core types are production-ready.
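The three shot types map directly onto a lookup table. The sketch below is minimal and assumes framings are identified by the hypothetical keys close_up, half_body and full_body. Distance shots are rejected explicitly because they are parked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotType:
    lora_scale: float
    aspect_ratio: str


# Values from the shot-type cards above. Keys are illustrative, not the project's own.
SHOT_TYPES = {
    "close_up": ShotType(lora_scale=0.9, aspect_ratio="3:4"),
    "half_body": ShotType(lora_scale=1.1, aspect_ratio="3:4"),
    "full_body": ShotType(lora_scale=1.3, aspect_ratio="9:16"),
}


def shot_settings(name: str) -> ShotType:
    """Look up the proven LoRA scale and aspect ratio for a framing."""
    if name == "distance":
        raise ValueError("distance shots are parked (R38 averaged 6%)")
    if name not in SHOT_TYPES:
        raise ValueError(f"unknown shot type: {name!r}")
    return SHOT_TYPES[name]
```

Keeping scale and aspect ratio in one record means a script cannot accidentally pair the 1.3 full-body scale with a 3:4 canvas.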

4. LoRA Scale by Distance — The Key Discovery

The single most important finding from rounds 31-43: lora_scale must increase with camera distance. At full body distance, the face is a tiny fraction of the image. The LoRA's influence gets diluted across a larger canvas. Higher lora_scale forces more of the trained face through.

Framing | LoRA scale | Avg score | Evidence | Status
Close-up | 0.9 | 78-85% | R1-R13 baseline. R39 tested 0.7 = 39% (failed) | CONFIRMED
Half body | 1.1 | 76% | R28-R30 scenes. R35 full body at 1.1 = 60% (+9 pts) | CONFIRMED
Full body | 1.3 | 60% | R37 at 1.3 = 61%. R43 snow at 1.3 = 55% | WORKING
Distance | 1.3+ | 6% | R38 environment-first. Too far back, LoRA can't hold | PARKED

LoRA Scale Experiment Timeline

Round | LoRA scale | Framing | Avg | Key finding
R1-R13 | 0.9 | Portrait | 82% | Baseline, optimal for close-up
R31 | 0.9 | Full body | 56% | First full body; face too small
R33 | 0.9 | Full body | 52% | Verbose prompts don't help
R34 | 0.9 | Full body | 51% | Movement helps quality, not score
R35 | 1.1 | Full body | 60% | +9 pts; scale compensates for distance
R37 | 1.3 | Full body | 61% | Slight improvement over 1.1
R38 | 1.3 | Distance | 6% | Too far; LoRA can't hold at any scale
R39 | 0.7 | Close-up | 39% | FAILED: identity drifts; 0.9 is the floor
R43 | 0.9/1.1/1.3 | All 3 | 52% | Snow scene drags all framings down

The relationship is clear: further camera = higher lora_scale. But there's a ceiling — beyond 1.3, you risk waxy/over-fitted faces. And beyond a certain distance (R38), no amount of LoRA scale can hold the identity. The practical limit is full body with feet visible.

5. Scoring System

Face Match 35% • Sharpness 15% • Exposure 15% • Aesthetic 20% • Body Proportion 15%

Level | Score range | Action
Green | 70%+ | Ship (production quality)
Amber | 25-69% | Review (may be usable)
Red | <25% | Reject (auto-regenerate)
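The report gives the weights and thresholds but not the formula that combines them. The sketch below assumes the simplest reading. Each metric is scored 0-100, the overall score is the weighted sum, and that sum is then mapped to a level. The metric keys and function names are hypothetical.

```python
# Weights from the scoring breakdown above (they sum to 1.0).
WEIGHTS = {
    "face_match": 0.35,
    "sharpness": 0.15,
    "exposure": 0.15,
    "aesthetic": 0.20,
    "body_proportion": 0.15,
}


def overall_score(components: dict[str, float]) -> float:
    """Weighted sum of per-metric scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)


def level(score: float) -> str:
    """Map an overall score to the ship / review / reject bands."""
    if score >= 70:
        return "green"  # ship
    if score >= 25:
        return "amber"  # review
    return "red"        # reject and auto-regenerate
```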

6. The 43-Round Journey

[Chart: Average score by round, coloured by phase. Green = core tuning | Blue = style expansion | Red = outpainting (failed) | Amber = expression test | Purple = scene/full body | Teal = LoRA scale R&D]

Round | Phase | Focus | Avg | Key result
R1 | Core | Baseline, default settings | 68% | First generation, no tuning
R2 | Core | Guidance sweep (1-5) | 72% | 2.5 emerged as optimal
R3 | Core | Steps sweep (20-50) | 74% | 35 steps optimal (50 = no gain)
R4 | Core | LoRA scale sweep (0.7-1.0) | 76% | 0.9 best for portraits
R5 | Core | Film stock comparison | 77% | Portra 400 wins
R6 | Core | Identity with hair/eye | 73% | Hair/eye colour HURTS scores
R7 | Core | Identity without hair/eye | 78% | +5 pts from removing hair/eye from prompt
R8 | Core | Lighting variations | 79% | Studio dramatic + golden hour best
R9 | Core | Background variations | 80% | Solid/gradient > complex
R10 | Core | Clothing styles test | 81% | Tailored blazer, smart casual top
R11 | Core | Multi-subject validation | 80% | Formula holds across all 8 subjects
R12 | Core | Best-of-4 selection | 83% | +3-5 pts from picking best of 4
R13 | Core | Production run, all styles | 85% | Peak portrait performance
R14 | Style | New clothing styles | 82% | 6 new styles, 4 scored well
R15 | Style | Female-specific styles | 80% | Gala dress, cocktail: good
R16 | Style | Extended style library | 76% | Niche styles pull average down
R17 | Outpaint | Outpainting v1 | 66% | Just added a border
R18 | Outpaint | Outpainting v2 | 65% | Edge artifacts
R19 | Outpaint | Outpainting v3 | 64% | Prompt guidance ignored
R20 | Outpaint | Face swap | 62% | Avg -10% from originals
R21 | Outpaint | Kontext Pro edit | 65% | Distorts face
R22 | Outpaint | Inpainting | 58% | Tiny files, garbage
R23 | Outpaint | Combined methods | 63% | Stacking failures doesn't help
R24 | Outpaint | Outpainting v4 | 62% | More border = worse
R25 | Outpaint | Kontext Pro scene swap | 19% | WORST: complete identity destruction
R26 | Outpaint | Outpainting v5 | 64% | Declared dead end
R27 | Expression | Smiling test | 62% | -20% vs neutral; LoRA trained neutral
R28 | Scene | LoRA direct scene gen | 85% | BREAKTHROUGH: scene in prompt works
R29 | Scene | Portrait scenes (all subjects) | 78% | 10 scenes, all ≥70%
R30 | Scene | Half body scenes | 76% | Three-quarter framing, 9/10 ≥70%
R31 | Full Body | First full body | 56% | Face too small at distance
R32 | Full Body | Portrait → outpaint down | 75% | Two-step works 75% of the time
R33 | Full Body | Explicit framing text | 52% | More words ≠ more specific
R34 | Full Body | Pose/movement language | 51% | Photos look natural (scores same)
R35 | LoRA Scale | lora_scale 1.1 | 60% | +9 pts, biggest lever found
R36 | LoRA Scale | lora_scale 1.3 (uniform) | 61% | Marginal improvement over 1.1
R37 | LoRA Scale | lora_scale 1.3 + distance | 61% | Distance language doesn't help uniformly
R38 | Distance | Environment-first prompt | 6% | Too far back, PARKED
R39 | LoRA Scale | lora_scale 0.7 close-up | 39% | FAILED: 0.9 is the floor
R40 | Full Body | Body proportion fix | 52% | "Natural proportions" prompt added
R41 | Scene | Snow scene half body | 52% | People look photoshopped in scene
R43 | Scene | Snow, 3 shot types + Portra | 52% | Snow drags all framings down ~25%

7. What Failed & Why

Outpainting (R17-R26) — avg 62-66%

10 rounds trying to extend the canvas for scene backgrounds. Every variant (masked edges, prompt guidance, larger canvas) just added a visible border around the original. The model can't generate coherent scene extensions.

Kontext Pro Scene Swap (R25) — 19%

The worst result across all 43 rounds. Edits the image to change the background but completely destroys face identity. Output is a different person.

Face Swap — yan-ops (R20) — avg -10%

Post-processing face swap from reference selfie. Hurts high-scoring photos most — introduces artifacts that degrade an already good image.

Inpainting — flux-fill-pro (R22) — 58%

Mask and regenerate background. Tiny file sizes (low detail), garbage quality. Inpainting model doesn't respect LoRA identity.

Smiling Expression (R27) — 62% (-20% from neutral)

LoRA trained on neutral expressions. Forcing a smile distorts the learned face geometry. Neutral or "natural relaxed expression" only.

LoRA 0.7 on Close-ups (R39) — 39%

Theory: lower LoRA at close distance = more natural. Reality: identity drifts to wrong age/gender/ethnicity. 0.9 is the absolute floor.

Environment-First Distance Shots (R38) — 6%

Flipped prompt to describe environment first, subject last. Controls distance but R38 pushed too far — LoRA can't hold identity when face is tiny. Usable concept but needs careful calibration.

Fuji Superia 400 Film Stock (R41) — wrong choice

Used Fuji instead of Kodak Portra. Fuji produces cooler/greener tones — wrong for warm portrait skin tones. Always use Kodak Portra 400.

Pattern: Every approach that MODIFIES a generated image (swap, edit, extend, inpaint) fails. Generate it right the first time — let the LoRA handle identity, describe everything else in the prompt.

8. Scene Photos Breakthrough

LoRA Direct Generation = The Answer

Instead of generating a portrait and modifying it, describe the scene IN the prompt. The LoRA generates the person IN the scene from scratch. Identity is baked into the model weights — no post-processing needed.

10 Working Scenes (all ≥70% on portrait)

Scene | Portrait | Half body | Best for
Tokyo Neon | 85% | 84% | Urban, edgy look (but more "AI" feel)
Art Gallery | 83% | 80% | Clean, minimal, sophisticated
Amalfi Coast | 80% | 78% | Warm, natural, sun-drenched
Paris Cafe | 78% | 80% | European elegance
Modern Office | 78% | 80% | Professional headshots
London Street | 77% | 76% | Urban, moody, British
NYC Rooftop | 76% | 74% | Skyline backdrop
Riviera Terrace | 75% | 72% | Coastal luxury
Garden Party | 74% | 70% | Outdoor, natural light
Mountain Lodge | 72% | 70% | Cozy, warm tones

The Environment Integration Problem (R41, R43)

Scene photos WORK in terms of face scores — but the person often looks photoshopped into the scene rather than being part of it. This is the next frontier to solve.

Fix discovered in R43: Weave the environment INTO the person description. Don't say "person + snowy background". Say "snowflakes settling on their shoulders, rosy cheeks from the cold, frost dusting their hair, breath visible in the air". Make the scene physically interact with the subject.

This improved visual integration in R43 but didn't fix it completely — snow scene is inherently harder than sunny scenes. The approach needs further testing on easier scenes (Amalfi Coast, Paris Cafe) where the integration is more subtle (warm light on skin, wind in hair).
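The two prompting styles differ mainly in where the environment details attach. The sketch below is minimal and assumes prompts are comma-separated phrases, as in the example prompts in section 12. The helper names are hypothetical. Placing the interaction phrases directly after the subject is an illustrative choice, not a tested rule.

```python
def backdrop_only(subject: str, backdrop: str) -> str:
    """The 'person + background' style that tends to look pasted on."""
    return f"{subject}, {backdrop}"


def weave_environment(subject: str, interactions: list[str], backdrop: str) -> str:
    """Attach scene details physically to the person first, then name the setting."""
    return ", ".join([subject, *interactions, backdrop])
```

Called with the R43 phrases ("snowflakes settling on their shoulders", "rosy cheeks from the cold", "frost dusting their hair", "breath visible in the air"), the snow becomes part of the person's description instead of a separate backdrop clause.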

9. Pose & Movement Language

Round 34 tested 4 movement descriptions. Scores stayed within a 9-point range (47-56%), but the visual quality of Walking and Laughing was dramatically better: photos looked like real influencer editorial shoots instead of stiff AI test shots.

Pose | Avg score | Why
Laughing | 56% | Natural expression, face visible, dynamic energy. Best overall.
Walking | 55% | Natural stride, body in motion. Use with "shoes visible on the ground".
Looking Away | 48% | Face turned from camera; the scorer can't match what it can't see.
Leaning | 47% | Static pose, less natural. Better than standing still but worse than walking.

Always use movement language. "Walking confidently" or "captured mid-laugh" transforms every shot. Static poses ("standing in front of") produce stiff, obviously-AI photos. Movement + concrete anchors ("shoes on the ground", "hands in jacket pockets") = natural photos.

10. Body Proportions & Film Stock

The Flux Body Problem

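Flux has a systematic bias toward exaggerated hips and butt on women in full body shots. This is a model limitation, not a prompt issue. Mitigation: add "natural body proportions" to the prompt (introduced in R40). A community Body FLUX FIX LoRA is a possible further fix but is still untested.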

Film Stock — Kodak Portra 400 ONLY

Film stock | Character | Verdict
Kodak Portra 400 | Warm, flattering skin tones, soft natural grain | USE THIS
Fuji Pro 400H | Cooler, pastel, greener tones | Too cold for portraits
Kodak Ektar 100 | Vivid, saturated colours | Over-saturates skin
CineStill 800T | Blue/cyan tungsten shift | Night scenes only
Kodak Tri-X | Black and white, high contrast | Removes colour info
Never mix film stocks in one prompt

Saying "shot on Portra 400 with Ektar colours" confuses the model. Pick one and stick with it. Portra 400 for everything portrait-related.
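Mixed or wrong stocks are easy to introduce when prompts are assembled from layers, so a lint check can run before generation. The sketch below is minimal, and its names are hypothetical. It matches short keywords so that loose mentions such as "Ektar colours" or "Fuji Superia" (the R41 mistake) are caught. It is a plain substring test, not a parser.

```python
# Lower-case keywords for the stocks discussed above (plus Fuji in general).
STOCK_KEYWORDS = ("portra", "fuji", "ektar", "cinestill", "tri-x")


def film_stock_problems(prompt: str) -> list[str]:
    """Return film-stock rule violations; an empty list means the prompt passes."""
    text = prompt.lower()
    mentioned = [kw for kw in STOCK_KEYWORDS if kw in text]
    if not mentioned:
        return ["no film stock named; add 'shot on Kodak Portra 400'"]
    if len(mentioned) > 1:
        return [f"mixed film stocks: {mentioned}"]
    if mentioned != ["portra"]:
        return [f"wrong film stock: {mentioned[0]}"]
    return []
```

Returning a list instead of raising lets a batch script log every problem prompt in a single pass.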

11. Subject Performance

Subject | Gender | Portrait avg | Full body avg | Best ever | Notes
Sarah | Female | 89% | 66% | 94% | Top performer. Ideal LoRA training data.
Mike | Male | 86% | 48% | 92% | Consistent. Strong across all styles.
Emily | Female | 85% | 71% | 92% | Best female full body (hit 85% laughing in R35).
Chloe 2 | Female | 84% | 64% | 89% | Second session improved over Chloe 1.
Scott | Male | 82% | 63% | 88% | Primary test subject. Bald = distinctive LoRA.
Chloe | Female | 80% | 64% | 86% | Weak LoRA; sometimes generates Chinese-looking face.
Emily 2 | Female | 80% | 55% | 87% | Consistent with Emily 1.
Neil 2 | Male | 39% | 32% | 52% | Poor selfie quality. Bad training = bad LoRA forever.

Chloe 1's problem: her LoRA sometimes produces a Chinese-looking face, especially in the first half of multi-shot runs. This is a training data issue; her selfies may have insufficient angle diversity. Chloe 2 (re-trained with better selfies) scores 4 points higher on portraits (84% vs 80%), with the same full body average (64%).

12. Prompt Architecture

Every prompt is built from 6 modular layers via prompt_builder.js. No prompts are hardcoded in generation scripts.

1. Identity: OHWX {age} {ethnicity} {gender}. Trigger word + demographic anchoring only. NO hair/eye.
2. Clothing: Style-specific outfit with physical details. Winter gear: "snow settling on shoulders". No black dress for women.
3. Movement: "Walking confidently", "captured mid-laugh". Concrete anchors: "shoes visible on the ground".
4. Environment: Scene description INTERACTING with subject. "Snowflakes on collar", not just "snowy background".
5. Lighting: Golden hour, soft winter light, warm sun. Sunny natural > dramatic neon for realism.
6. Camera: "Shot on Kodak Portra 400". One film stock only. Slight natural grain = good.

Example Prompts by Shot Type

CLOSE-UP (lora 0.9):

OHWX 30 year old Caucasian man, wearing a tailored navy blazer, close-up portrait with warm golden hour light illuminating face, shallow depth of field, natural relaxed expression, shot on Kodak Portra 400

HALF BODY (lora 1.1):

OHWX 30 year old Caucasian man, wearing a linen shirt, three-quarter shot walking along Amalfi Coast cliff path, warm Mediterranean sun on face, captured mid-laugh, natural body proportions, shot on Kodak Portra 400

FULL BODY (lora 1.3):

OHWX 30 year old Caucasian man, wearing a casual shirt and dark jeans, walking confidently along a sun-drenched coastal path, shoes visible on the ground, full figure from head to shoes, natural body proportions, warm smile, golden hour light, shot on Kodak Portra 400
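The six layers above can be sketched as a small builder. prompt_builder.js itself is not reproduced in this report, so the Python below is a hypothetical stand-in. PromptLayers and build_prompt are illustrative names. The builder fixes the layer order listed above but does not reproduce the exact phrasing of the example prompts.

```python
from dataclasses import dataclass


@dataclass
class PromptLayers:
    """One field per layer; names are illustrative, not prompt_builder.js's."""
    age: int
    ethnicity: str
    gender: str
    clothing: str = ""
    movement: str = ""
    environment: str = ""
    lighting: str = ""
    film_stock: str = "Kodak Portra 400"


def build_prompt(layers: PromptLayers) -> str:
    # Layer 1: trigger word + demographic anchor only (no hair or eye colour).
    identity = f"OHWX {layers.age} year old {layers.ethnicity} {layers.gender}"
    parts = [
        identity,
        layers.clothing,                 # layer 2
        layers.movement,                 # layer 3
        layers.environment,              # layer 4
        layers.lighting,                 # layer 5
        f"shot on {layers.film_stock}",  # layer 6: exactly one film stock
    ]
    # Drop empty layers so optional ones do not leave stray commas.
    return ", ".join(p for p in parts if p)
```

Making identity and camera the only layers that are always present matches rules 1 and 7. Every other layer is optional per shot.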

13. Settings Sensitivity

Parameter | Optimal | Tested | Effect
Guidance | 2.5-3.5 | 1.0-5.0 | Below 2: too loose. Above 4: over-saturated. 2.5 for studio, 3.5 for scenes.
Steps | 35 | 20-50 | Min 1, max 50 for Flux. 20 = soft, 30 = usable, 35 = optimal, 50 = no gain but 2x cost.
LoRA Scale | 0.9-1.3 | 0.0-2.0 | 0.7 = identity drift. 0.9 = portrait. 1.1 = half body. 1.3 = full body. >1.5 = waxy/over-fit.
Film Stock | Portra 400 | 5 stocks | Only warm skin tone stock. Never mix stocks. Never use Fuji for portraits.
Candidates | 4 | 1-6 | 1→4: +3-5 pts. 4→6: +1 pt (not worth 50% cost increase).
Identity | Age+Eth+Gen | 3 variants | Full description: -5 pts. Age+ethnicity+gender: best. No anchor: random drift.
Aspect Ratio | 3:4 / 9:16 | | 3:4 for portrait + half body. 9:16 for full body (forces vertical framing with ground).
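The table's thresholds can be turned into a pre-flight check. The sketch below is minimal. The function name is hypothetical, and the cut-offs are copied from the table, plus the section 4 note that anything above 1.3 risks waxy faces. It warns instead of failing because several of the limits are soft.

```python
def settings_warnings(guidance: float, steps: int, lora_scale: float, candidates: int) -> list[str]:
    """Compare settings against the ranges in the sensitivity table (hypothetical helper)."""
    warnings: list[str] = []
    if guidance < 2:
        warnings.append("guidance below 2: too loose")
    elif guidance > 4:
        warnings.append("guidance above 4: over-saturated")
    if not 1 <= steps <= 50:
        warnings.append("steps outside Flux's 1-50 range")
    elif steps < 30:
        warnings.append("fewer than 30 steps: soft images")
    elif steps > 35:
        warnings.append("more than 35 steps: extra cost, no measured gain")
    if lora_scale < 0.9:
        warnings.append("lora_scale below 0.9: identity drift")
    elif lora_scale > 1.5:
        warnings.append("lora_scale above 1.5: waxy/over-fit faces")
    elif lora_scale > 1.3:
        warnings.append("lora_scale above 1.3: untested, waxy-face risk")
    if candidates < 4:
        warnings.append("fewer than 4 candidates: forgoes the +3-5 pt best-of-4 gain")
    elif candidates > 4:
        warnings.append("more than 4 candidates: about +1 pt for the extra cost")
    return warnings
```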

14. Style Rankings

Top Styles — Male

# | Style | Best | Avg
1 | Studio Dramatic | 92% | 89%
2 | Smart Casual | 90% | 88%
3 | Tailored Blazer | 88% | 85%
4 | Outdoor Natural | 91% | 85%
5 | Corporate Headshot | 87% | 84%
6 | Leather Jacket | 86% | 82%

Top Styles — Female

# | Style | Best | Avg
1 | Studio Dramatic | 92% | 89%
2 | Outdoor Natural | 91% | 87%
3 | Elegant Blouse | 89% | 86%
4 | Smart Professional | 88% | 85%
5 | Cocktail Dress | 86% | 83%
6 | Casual Chic | 85% | 82%

15. Unsolved Problems & Next Steps

Problems Still Open

Problem | Impact | Possible fix
Environment integration | People look photoshopped into scenes | Weave scene INTO person description (R43 approach). Test on easy sunny scenes first.
Distance inconsistency | Same prompt = different distances per shot | Composition anchoring: "head near top of frame, feet at bottom". Generate more, filter by distance.
Full body scores | 60% avg vs 85% portrait | More candidates (4-6), stricter curation. Or portrait → outpaint down (R32 method).
Body proportions (women) | Exaggerated hips/butt | "Natural body proportions" prompt. Community Body FLUX FIX LoRA (untested).
Snow/cold scenes | 52% avg (25% below sunny scenes) | Snow is inherently harder. Focus on sunny scenes for production. Snow = nice-to-have.
Chloe 1 LoRA | Sometimes generates Chinese face | Retrain with better selfie angles. Chloe 2 session already 4 pts better on portraits.

Production Strategy Going Forward

Overgenerate + Curate

Stop testing variables. Generate 3-4x the photos needed using proven settings on easy scenes (Amalfi Coast, Art Gallery, Paris Cafe, Modern Office). Pick the best. This is cheaper and faster than chasing the last 10% through prompt engineering.

Production Run Settings

Scenes: Amalfi Coast, Art Gallery, Paris Cafe, Modern Office (top portrait scorers after Tokyo Neon, which looks more "AI")
Framings: Close-up (0.9) + Half body (1.1); full body optional
Candidates: 4 per style, keep best
Poses: Walking + Laughing only
Film: Kodak Portra 400
Cost: ~$1.28 per subject (4 scenes × 2 framings × 4 candidates = 32 images × $0.04)
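The cost line is a plain product of the run's dimensions, so it is safer to compute it than to multiply by hand. The sketch below is minimal. cost_per_subject is a hypothetical helper, and $0.04 per image is the unit price stated on that line.

```python
def cost_per_subject(scenes: int, framings: int, candidates: int,
                     price_per_image: float = 0.04) -> float:
    """Total images = scenes x framings x candidates; cost = images x unit price."""
    images = scenes * framings * candidates
    return images * price_per_image
```

With the production settings (4 scenes, 2 framings, 4 candidates), this comes to 32 images, or about $1.28 per subject. Adding the optional full body framing makes it 48 images, about $1.92.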