Five dimensions.
One framework each.

Every score is anchored to a named, established framework — not vibes. Same scrape input, same score on a re-run.

dimensions.json

5 dimensions

dimension

Headline Clarity

framework

MECLABS Conversion Heuristic →

Flint McGlaughlin's heuristic — C = 4m + 3v + 2(i-f) - 2a — names the H1 as the load-bearing element of value (v) and friction (f). We reward H1s that immediately answer who this is for and what they get; we penalize H1s that lean on the subhead to disambiguate.

dimension

Value Proposition

framework

April Dunford's positioning →

We score against Dunford's positioning components: a named competitive alternative, a unique attribute, the value that attribute unlocks, and the best-fit customer. Category-speak with no named alternative or no defensible attribute scores below 50.

dimension

CTA Strength

framework

Cialdini commitment + Shapiro specific-outcome →

We reward CTAs that name the outcome ('Start my 3-minute audit', 'Generate my report'). Generic verbs that name no outcome ('Get Started', 'Learn More', 'Sign Up') default to the 40–59 band regardless of placement.

dimension

Social Proof

framework

Cialdini social proof + NN/g trustworthiness →

Specific and verifiable beats voluminous: 'Used by 12,400 Shopify stores' beats a wall of 30 unlabelled logos. Unverifiable claims ('Loved by thousands') and logo walls without context lose points.

dimension

Visual Hierarchy

framework

NN/g F-pattern + Gestalt proximity →

Based on the screenshot and headline ordering, does the eye's first fixation land on the value proposition? Or on a nav item, hero illustration, or decoration? Pages whose F-pattern's first horizontal sweep crosses the value prop score higher.

rubric.md

0–100 bands

// per-dimension scoring scale

90–100World-class. You'd screenshot this in a deck.
75–89Strong. Minor gaps.
60–74Solid but forgettable.
40–59Noticeable weaknesses.
20–39Actively hurts conversion.
0–19Broken.

The overall score is the rounded arithmetic mean of the five dimension scores. The model never sets the overall directly — that removes an entire class of "fudged the total to match the gut verdict" failures.

disclosure.json

full transparency

// the parts we don't hide

model: claude-sonnet-4-6
temperature: 0 (no creative sampling)
scoring: average of 5 dimensions, no hand-tuning
prompt version: v0.4.0-2026-04-17
reproducibility: Same scrape input → same score on re-run, gated by an internal eval harness with stddev ≤ 2.

non_negotiables.md

guardrails

// rules baked into the prompt

▸ Every analysis paragraph must contain at least two direct quotes from the actual page text. No quote, no score.
▸ When a score crosses below 60 or above 85, the analysis must name the framework being applied. That is the evidence you paid for.
▸ Each "steal this" is a concrete, copy-pasteable tactic — implementable in under 15 minutes — not an abstraction like "clarify your value prop."
▸ Banned phrases: could be improved, might benefit from, consider adding, overall,. If the model writes one, the report is rejected.