Bias Notice
Last updated · 2026-05-04
MOG.LAB rates faces. Any model that does that inherits the blind spots of the data it was trained on. This page is our plain-English account of where ours falls short and what we're doing about it.
What the model was trained on
Our primary scoring head is a CNN fine-tuned on SCUT-FBP5500, a public benchmark dataset of 5,500 face images with human aesthetic ratings. SCUT-FBP5500 is, by the authors' own description, predominantly East Asian and Caucasian young adults. There are far fewer Black, South Asian, Hispanic, Indigenous, older, or gender-nonconforming faces in that dataset than appear on the open internet, let alone in the world.
Concretely, this means the model has been told what “attractive” looks like by a sample that doesn't reflect the full range of humans who might use this app. On faces underrepresented in the training data, scores are noisier and biased — usually downward.
What we do at inference time
We z-score the model's raw output within (apparent age, gender presentation) buckets, using a precomputed reference distribution for each bucket. The score you see compares you against people of similar age and gender presentation, instead of pretending one absolute scale exists across very different demographics.
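For the curious, here is a minimal sketch of that normalization in Python. The bucket keys, reference means and standard deviations, and the function name are illustrative placeholders, not our production values; the real reference statistics are precomputed offline.

```python
# Per-bucket z-scoring. All numbers below are made-up placeholders;
# the production reference stats are precomputed offline per bucket.
REFERENCE = {
    # (apparent age bucket, gender presentation) -> (mean, std) of raw output
    ("18-24", "feminine"): (5.8, 1.1),
    ("18-24", "masculine"): (5.5, 1.2),
    ("25-34", "feminine"): (5.6, 1.0),
    ("25-34", "masculine"): (5.4, 1.1),
}

def normalized_score(raw: float, age_bucket: str, presentation: str) -> float:
    """Compare the raw model output against the reference distribution
    for the same (age bucket, gender presentation) pair."""
    mean, std = REFERENCE[(age_bucket, presentation)]
    return (raw - mean) / std
```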
We deliberately do not z-score on perceived ethnicity. That's a slippery slope toward “you're a 7 for [group],” which is exactly the bias we're trying to remove. The fix for ethnicity bias belongs in the training data, not in inference-time normalization.
What we're doing to fix it
- More diverse training data. Mixing in MEBeauty (a multi-ethnic facial-beauty dataset) and our own opt-in pairwise calibration data so the model sees a wider population during training.
- Pairwise human calibration. Asking humans to compare pairs of faces (psychometrically more reliable than absolute 0–10 ratings) and fitting a Bradley–Terry model on top; there's a sketch of the fit after this list.
- Honesty about uncertainty. Every score ships with a confidence band (also sketched after this list). When the ensemble is split, a second model breaks the tie and the band widens. We surface that instead of hiding it.
- A strict quality gate. Bad lighting, a bad angle, or partial occlusion voids the round (no Elo change) instead of converting a camera artifact into a verdict against you. See the camera setup screen before each round.
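Here is what the pairwise calibration fit looks like in miniature. The comparison data, learning rate, and iteration count are invented for illustration; a production fit would use real judgments and regularization, but the core Bradley–Terry update is exactly this simple.

```python
import math

# Toy Bradley-Terry fit by gradient ascent on the log-likelihood.
# Each pair is (winner, loser) from a human pairwise judgment;
# these comparisons are invented for the demo.
comparisons = [(0, 1), (1, 2), (2, 0), (0, 1), (0, 2), (1, 2)]
n_items = 3
theta = [0.0] * n_items  # latent quality per face, on a logit scale

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

for _ in range(500):
    grad = [0.0] * n_items
    for winner, loser in comparisons:
        p = sigmoid(theta[winner] - theta[loser])  # P(winner beats loser)
        grad[winner] += 1.0 - p
        grad[loser] -= 1.0 - p
    theta = [t + 0.05 * g for t, g in zip(theta, grad)]

# theta now ranks the faces by how consistently humans preferred them.
# Only differences between thetas matter, and since each comparison's
# gradients sum to zero, the mean of theta stays pinned at 0.
```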
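And the confidence band: the wider the disagreement inside the ensemble, the wider the band. The multiplier `k` and the function name are assumptions for the sketch, not the exact rule we ship.

```python
import statistics

def score_with_band(ensemble_scores: list[float], k: float = 2.0):
    """Display the ensemble mean, with a band that widens as the
    ensemble members disagree. k = 2.0 is an illustrative choice."""
    center = statistics.fmean(ensemble_scores)
    spread = statistics.stdev(ensemble_scores)
    return center, center - k * spread, center + k * spread

print(score_with_band([6.1, 6.2, 6.0]))  # tight agreement, narrow band
print(score_with_band([4.0, 6.5, 7.8]))  # split ensemble, wide band
```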
What we are NOT doing
- We are not estimating or labeling your ethnicity, sexual orientation, health, mood, or income.
- We are not using your score for anything outside the leaderboard inside this app. Not for hiring, dating, advertising, identity verification, insurance, or background checks. None of that.
- We do not believe a face score is a meaningful measure of a person. The product is designed as a calibrated meme, not a verdict. If a number on a website damages how you feel about your face, please close the tab.
Reporting bias
If you believe the model is treating you (or a category of people) unfairly, please tell us: bias@mog.lab. Specific examples (handles, approximate timestamps) are far more useful than general impressions, and we read every report.