We have 7 demo accounts — real people with real selfies and trained LoRA models. Every time we change the photo generation code (prompts, settings, scoring, post-processing), we regenerate ALL demo batches and compare before/after scores. This gives us a controlled, repeatable way to measure whether changes actually improve quality.
Run: `node run_test_batches.js`. Old photos are backed up, so nothing gets deleted. Every run is saved to `test_results/` with full score history.
| Name | Gender | Age | Height | Photos | Avg | Best | Worst | Photos ≥85% | Photos ≥70% | Photos <50% | Gap to 85% (pts) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Scott | Male | 35 | 5'7" | 8 | 81% | 87% | 79% | 1 | 8/8 | 0 | -4% |
| Sarah | Female | 45 | 5'3" | 8 | 78% | 94% | 52% | 3 | 6/8 | 0 | -7% |
| Emily | Female | 18 | 5'8" | 21 | 73% | 80% | 62% | 0 | 18/21 | 0 | -12% |
| Neil 1 | Male | 49 | 6'1" | 17 | 73% | 83% | 50% | 0 | 14/17 | 0 | -12% |
| Neil 2 | Male | 49 | 6'1" | 11 | 71% | 77% | 64% | 0 | 7/11 | 0 | -14% |
| Mike | Male | 52 | 6'0" | 21 | 69% | 77% | 58% | 0 | 14/21 | 0 | -16% |
| Chloe | Female | 16 | 5'6" | 17 | 61% | 73% | 44% | 0 | 1/17 | 3 | -24% |
| OVERALL | | | | 103 | 72% | 94% | 44% | 4 | 68/103 | 3 | -13% |
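The OVERALL row can be re-derived from the seven batch rows. The sketch below is in JavaScript to match `run_test_batches.js`. `summarize` is a hypothetical helper and is not part of that script.

- Photo count, best and worst all match the table.
- The 72% figure matches the plain mean of the seven batch averages.
- Weighting by photo count gives about 71%. The weighted figure is lower because the two highest-scoring batches (Scott and Sarah) have only 8 photos each.

The per-batch averages are already rounded, so treat both figures as approximate.

```javascript
// Per-batch stats copied from the table above (avg/best/worst in %, photos = count).
const batches = [
  { name: "Scott", photos: 8, avg: 81, best: 87, worst: 79 },
  { name: "Sarah", photos: 8, avg: 78, best: 94, worst: 52 },
  { name: "Emily", photos: 21, avg: 73, best: 80, worst: 62 },
  { name: "Neil 1", photos: 17, avg: 73, best: 83, worst: 50 },
  { name: "Neil 2", photos: 11, avg: 71, best: 77, worst: 64 },
  { name: "Mike", photos: 21, avg: 69, best: 77, worst: 58 },
  { name: "Chloe", photos: 17, avg: 61, best: 73, worst: 44 },
];

const TARGET = 85; // quality bar used throughout this doc

// Hypothetical helper: rebuilds the OVERALL row from per-batch rows.
function summarize(rows) {
  const photos = rows.reduce((n, r) => n + r.photos, 0);
  // Each batch counts once, regardless of size.
  const meanOfBatchAvgs = rows.reduce((s, r) => s + r.avg, 0) / rows.length;
  // Larger batches count more (approximate, since avg is already rounded).
  const weightedAvg = rows.reduce((s, r) => s + r.avg * r.photos, 0) / photos;
  return {
    photos,
    meanOfBatchAvgs,
    weightedAvg,
    best: Math.max(...rows.map((r) => r.best)),
    worst: Math.min(...rows.map((r) => r.worst)),
    // Gap in percentage points, using the rounded mean as the table does.
    gapToTarget: Math.round(meanOfBatchAvgs) - TARGET,
  };
}

const overall = summarize(batches);
console.log(overall);
```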
The LoRA generates photos where the face sometimes drifts — different angle, slightly different features, wrong expression. The clothing and background are fine, but the face doesn't match the selfies closely enough. Some photos nail it (94%!), others miss badly (44%).
Sarah's best photo scores 94%, which shows the LoRA CAN produce a near-perfect face. Her worst is 52%. If we could put that 94% face on every photo, we'd be done. The face in that photo is right, and changing the outfit around it shouldn't affect the face score.
The gap between best and worst per person tells the story:
| Person | Best | Worst | Gap | Meaning |
|---|---|---|---|---|
| Sarah | 94% | 52% | 42pts | LoRA CAN do it, just inconsistent |
| Neil 1 | 83% | 50% | 33pts | Some styles break the face more than others |
| Chloe | 73% | 44% | 29pts | Teen face is hard; needs retrain at 1500 steps |
| Mike | 77% | 58% | 19pts | Moderate drift — fixable |
| Emily | 80% | 62% | 18pts | Young face, moderate drift |
| Neil 2 | 77% | 64% | 13pts | Most consistent — smallest gap |
| Scott | 87% | 79% | 8pts | Almost there — best performer |
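Instead of trying to get the LoRA to generate a perfect face every time (unreliable), we fix the face after generation. We swap the real face onto the generated photo, then clean it up. Post-generation face swapping is a common approach in AI photo studios. Candidate models: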
| Model | What It Does | Best For | Speed | Cost |
|---|---|---|---|---|
| Face Fusion (`lucataco/facefusion`) | Takes a source face and swaps it onto a target photo. Preserves the pose, lighting and expression of the target. | Exactly what we need: swap the selfie face onto the generated photo | ~10s | ~$0.01 |
| InsightFace Swap (`yan-ops/face_swap`) | InsightFace-based face swap. High-quality face replacement. | Same approach, proven InsightFace tech | ~8s | ~$0.01 |
| Flux Inpainting (`black-forest-labs/flux-fill-pro`) | Masks part of the image and regenerates the masked area from a prompt. Could mask clothes/background and keep the face. | Could change clothes on the best photo, but might shift the face | ~15s | ~$0.03 |
| CodeFormer (`lucataco/codeformer`) | Face restoration: removes blemishes, smooths skin, fixes artifacts after a swap. | Post-swap cleanup so the face looks natural | ~5s | ~$0.005 |
| IP-Adapter Face (various) | Uses a face image as a "style reference" during generation, guiding the model toward a similar face. | Could help during generation, but less reliable than a post-generation swap | ~20s | ~$0.03 |
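Current pipeline: generate the photo with the LoRA. Face quality depends on luck.
Proposed pipeline: generate the photo, swap in the real face from a selfie, then run face restoration. The face comes from the selfie, so it should be far more consistent across photos.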
| Step | Current Cost | New Cost | Change |
|---|---|---|---|
| LoRA Training | ~$2.50 | ~$2.50 | Same |
| Photo Generation (21 photos x 2 candidates) | ~$1.26 | ~$1.26 | Same |
| Face Swap (21 photos) | - | ~$0.21 | New |
| Face Restore (21 photos) | - | ~$0.11 | New |
| Face Scoring | ~$0 (local) | ~$0 (local) | Same |
| TOTAL per client | ~$3.76 | ~$4.08 | +$0.32 (+8%) |
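The cost table follows from a few unit prices. This sketch assumes generation costs a flat ~$0.03 per candidate ($1.26 / 42 candidates) and uses the table's swap and restore prices. `costPerClient` is a hypothetical helper.

Unrounded, restore is $0.105 for 21 photos, so the new total is ~$4.075, about 8.4% more. The table rounds restore up to $0.11, which gives $4.08 (+8%).

```javascript
// Approximate unit prices (USD) implied by the cost table above.
const PRICES = {
  loraTraining: 2.5, // one-off per client
  generation: 0.03, // per candidate: $1.26 / (21 photos x 2 candidates)
  faceSwap: 0.01, // per delivered photo
  faceRestore: 0.005, // per delivered photo (CodeFormer)
};

// Hypothetical helper: total cost for one client, with or without swap + restore.
function costPerClient({ photos = 21, candidatesPerPhoto = 2, withSwap = false } = {}) {
  let total = PRICES.loraTraining + photos * candidatesPerPhoto * PRICES.generation;
  if (withSwap) {
    // Swap and restore run once per delivered photo, not per candidate.
    total += photos * (PRICES.faceSwap + PRICES.faceRestore);
  }
  return total;
}

const current = costPerClient();
const proposed = costPerClient({ withSwap: true });
const increasePct = ((proposed - current) / current) * 100;
console.log(current.toFixed(2), proposed.toFixed(2), increasePct.toFixed(1) + "%");
```

Swap and restore are charged per delivered photo, so the added cost grows linearly with batch size. Changing `photos` shows how a larger package shifts the total.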
8% cost increase for a potential jump from 72% to 85%+ average. That's the trade.
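Test first: run the face swap on existing demo photos and save each result alongside its original (e.g. `01_Classic_Studio_swapped.png`). Score both and compare them side by side. Nothing is deleted; this step only checks whether swapping improves scores.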
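Here is a sketch of the side-by-side comparison. It assumes each original is named like `01_Classic_Studio.png` and the swapped copy gets a `_swapped` suffix before the extension. `swappedName` and `compare` are hypothetical helpers. The `{ file, before, after }` score records are an assumed shape, not the actual format in `test_results/`.

```javascript
// Hypothetical helper: "01_Classic_Studio.png" -> "01_Classic_Studio_swapped.png".
function swappedName(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return filename + "_swapped"; // no extension (or dotfile)
  return filename.slice(0, dot) + "_swapped" + filename.slice(dot);
}

// Hypothetical helper: compares before/after face scores (0-100) per photo.
// `results` is an assumed list of { file, before, after } records.
function compare(results, target = 85) {
  const rows = results.map((r) => ({
    file: r.file,
    swappedFile: swappedName(r.file),
    before: r.before,
    after: r.after,
    delta: r.after - r.before,
  }));
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return {
    rows,
    avgBefore: mean(rows.map((r) => r.before)),
    avgAfter: mean(rows.map((r) => r.after)),
    improved: rows.filter((r) => r.delta > 0).length, // photos the swap helped
    atTargetAfter: rows.filter((r) => r.after >= target).length,
  };
}
```

Keeping both filenames in each row lets a report show the original and swapped images next to their scores.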
Then integrate: add the swap step to `generate_photos.js`. Every generated photo gets the swap applied and is scored after the swap. Auto-regeneration runs only if a photo is still under 85% after the swap.
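Here is a minimal sketch of that per-photo flow: generate, swap, restore, score, and regenerate only while the result is under 85%. Each step is injected as an async function. `generate`, `swapFace`, `restoreFace` and `scoreFace` are hypothetical hooks standing in for:

- the LoRA call,
- a swap model such as `lucataco/facefusion`,
- CodeFormer,
- the local scorer.

The sketch says nothing about those models' real APIs. Three choices here are assumptions, not something this doc specifies:

- scoring the restored image,
- capping retries at `maxAttempts = 3`,
- keeping the best attempt when none reach the target.

```javascript
const REGEN_THRESHOLD = 85; // regenerate only if the swapped photo scores below this

// steps = { generate, swapFace, restoreFace, scoreFace }: hypothetical async hooks.
async function producePhoto(style, selfie, steps, maxAttempts = 3) {
  const { generate, swapFace, restoreFace, scoreFace } = steps;
  let best = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(style); // LoRA output; face may drift
    const swapped = await swapFace(selfie, raw); // selfie face onto generated photo
    const restored = await restoreFace(swapped); // post-swap cleanup
    const score = await scoreFace(restored, selfie); // face match vs. selfie, 0-100
    if (!best || score > best.score) best = { image: restored, score, attempt };
    if (score >= REGEN_THRESHOLD) break; // good enough: no regeneration
  }
  return best; // best attempt, even if none reached the threshold
}
```

Injecting the steps keeps the retry rule testable with stubs, without any API calls or cost.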
Scott (81% avg): Closest to target. Face swap could push all 8 photos above 85%.
Sarah (78% avg): Best single photo (94%). 3 already above 85%. The Yoga Studio photo needs the swap badly.
Emily, Neil 1, Neil 2, Mike (69-73% avg): Consistent 70s. No single photo above 85%. Face swap should lift each of these batches as a whole.
Chloe (61% avg): Worst performer. Teen face plus a 1000-step LoRA. Needs a retrain at 1500 steps AND the face swap.
Pick one batch (Scott — already at 81%). Run face swap on all 8 photos using his best selfie as the source. Save swapped versions alongside originals. Score both. If face swap lifts all photos to 85%+, we roll it out to all 7 batches.
Estimated time: 30 minutes to build, 2 minutes to run on Scott's 8 photos.
Estimated cost: ~$0.08 (8 photos x $0.01 swap).
If it works on Scott, we immediately test on Chloe (hardest case at 61%). If it works on both, we integrate it into the pipeline.
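The go/no-go sequence above (Scott first, then Chloe, then integration) can be written as a small gate. This sketch assumes "works" means every swapped photo in the batch scores 85% or higher, the criterion stated for Scott. Applying the same bar to Chloe is an assumption. `swapWorks` and `nextStep` are hypothetical helpers, and the "hold" messages only restate the plan; they do not decide what happens after a failure.

```javascript
// "Works" = every swapped photo in the batch reaches the target (85% by default).
function swapWorks(afterScores, target = 85) {
  return afterScores.length > 0 && afterScores.every((s) => s >= target);
}

// Next action given after-swap scores for Scott and Chloe
// (leave a batch undefined if it has not been run yet).
function nextStep({ scott, chloe } = {}) {
  if (scott === undefined) return "run face swap on Scott's 8 photos";
  if (!swapWorks(scott)) return "hold: Scott did not reach 85% on every photo";
  if (chloe === undefined) return "test face swap on Chloe";
  if (!swapWorks(chloe)) return "hold: Chloe did not reach 85% on every photo";
  return "integrate face swap into generate_photos.js";
}
```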