Bias Notice
Last updated · 2026-05-04
MOG.LAB rates faces. Any model that does that inherits the blind spots of the data it was trained on. This page is our plain-English account of where ours falls short and what we're doing about it.
What the model was trained on
Our primary scoring head is a CNN fine-tuned on SCUT-FBP5500, a public benchmark dataset of 5,500 face images with human aesthetic ratings. SCUT-FBP5500 is, by the authors' own description, predominantly East Asian and Caucasian young adults. There are far fewer Black, South Asian, Hispanic, Indigenous, older, or gender-nonconforming faces in that dataset than appear on the open internet, let alone in the world.
Concretely, this means the model has been told what “attractive” looks like by a sample that doesn't reflect the full range of humans who might use this app. On faces underrepresented in the training data, scores are noisier and biased — usually downward.
What we do at inference time
We z-score the model's raw output within (apparent age, gender presentation) buckets, using a precomputed reference distribution for each bucket. The score you see compares you against people of similar age and gender presentation, instead of pretending one absolute scale exists across very different demographics.
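For the curious, here is a minimal sketch of that normalization in Python. The bucket keys, reference means and standard deviations, and the function name are illustrative placeholders, not our production values; the real reference statistics are precomputed offline.

```python
# Per-bucket z-scoring. All numbers below are made-up placeholders;
# the production reference stats are precomputed offline per bucket.
REFERENCE = {
    # (apparent age bucket, gender presentation) -> (mean, std) of raw output
    ("18-24", "feminine"): (5.8, 1.1),
    ("18-24", "masculine"): (5.5, 1.2),
    ("25-34", "feminine"): (5.6, 1.0),
    ("25-34", "masculine"): (5.4, 1.1),
}

def normalized_score(raw: float, age_bucket: str, presentation: str) -> float:
    """Compare the raw model output against the reference distribution
    for the same (age bucket, gender presentation) pair."""
    mean, std = REFERENCE[(age_bucket, presentation)]
    return (raw - mean) / std
```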
We deliberately do not z-score on perceived ethnicity. That's a slippery slope toward “you're a 7 for [group],” which is exactly the bias we're trying to remove. The fix for ethnicity bias belongs in the training data, not in inference-time normalization.
What we're doing to fix it
- More diverse training data. Mixing in MEBeauty (a multi-ethnic facial-beauty dataset) and our own opt-in pairwise calibration data so the model sees a wider population during training.
- Pairwise human calibration. Asking humans to compare pairs of faces (psychometrically more reliable than absolute 0–10 ratings) and fitting a Bradley–Terry model on top; there's a sketch of the fit after this list.
- Honesty about uncertainty. Every score ships with a confidence band (also sketched after this list). When the ensemble is split, a second model breaks the tie and the band widens. We surface that instead of hiding it.
- A strict quality gate. Bad lighting, a bad angle, or partial occlusion voids the round (no Elo change) instead of converting a camera artifact into a verdict against you. See the camera setup screen before each round.
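Here is what the pairwise calibration fit looks like in miniature. The comparison data, learning rate, and iteration count are invented for illustration; a production fit would use real judgments and regularization, but the core Bradley–Terry update is exactly this simple.

```python
import math

# Toy Bradley-Terry fit by gradient ascent on the log-likelihood.
# Each pair is (winner, loser) from a human pairwise judgment;
# these comparisons are invented for the demo.
comparisons = [(0, 1), (1, 2), (2, 0), (0, 1), (0, 2), (1, 2)]
n_items = 3
theta = [0.0] * n_items  # latent quality per face, on a logit scale

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

for _ in range(500):
    grad = [0.0] * n_items
    for winner, loser in comparisons:
        p = sigmoid(theta[winner] - theta[loser])  # P(winner beats loser)
        grad[winner] += 1.0 - p
        grad[loser] -= 1.0 - p
    theta = [t + 0.05 * g for t, g in zip(theta, grad)]

# theta now ranks the faces by how consistently humans preferred them.
# Only differences between thetas matter, and since each comparison's
# gradients sum to zero, the mean of theta stays pinned at 0.
```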
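And the confidence band: the wider the disagreement inside the ensemble, the wider the band. The multiplier `k` and the function name are assumptions for the sketch, not the exact rule we ship.

```python
import statistics

def score_with_band(ensemble_scores: list[float], k: float = 2.0):
    """Display the ensemble mean, with a band that widens as the
    ensemble members disagree. k = 2.0 is an illustrative choice."""
    center = statistics.fmean(ensemble_scores)
    spread = statistics.stdev(ensemble_scores)
    return center, center - k * spread, center + k * spread

print(score_with_band([6.1, 6.2, 6.0]))  # tight agreement, narrow band
print(score_with_band([4.0, 6.5, 7.8]))  # split ensemble, wide band
```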
What we are NOT doing
- We are not estimating or labeling your ethnicity, sexual orientation, health, mood, or income.
- We are not using your score for anything outside the leaderboard inside this app. Not for hiring, dating, advertising, identity verification, insurance, or background checks. None of that.
- We do not believe a face score is a meaningful measure of a person. The product is designed as a calibrated meme, not a verdict. If a number on a website damages how you feel about your face, please close the tab.
Reporting bias
If you believe the model is treating you (or a category of people) unfairly, please tell us: bias@mog.lab. Specific examples (handles, approximate timestamps) are far more useful than general impressions, and we read every report.