Auditor Visibility Fix Report
Date: 2026-05-04
Fix: Above-the-Fold scoring now uses visible page content instead of raw HTML offset
Status: ✅ Deployed — all 233 tests pass
Problem Statement
The _score_above_fold() function in landing_page_engine/auditor.py used a raw character
offset (html[:3000]) to identify content "above the fold." On pages that use inline
<style> blocks for critical CSS (a legitimate performance best practice), this window
captured only CSS, not the hero HTML, resulting in artificially low above-the-fold scores.
Affected pages: Both Level 2 demo pages (rebuilt_v2_90_target.html, rebuilt_v3_95_target.html).
Both use inline <style> blocks of approximately
8,000–10,000 characters, pushing hero content beyond the 3,000-character window.
Score impact before fix:
- Both pages scored 5/10 on above_the_fold
- The dimension correctly detected viewport meta, hero class, and H1 (global HTML scans) but missed the CTA button, subheadline, and image (positional scans via html_top)
Root Cause
    # BEFORE (broken):
    html_top = html[:3000]  # first ~3000 chars = likely above fold
For pages with large inline CSS, html[:3000] contains only the <style> block.
The regex patterns for CTA buttons, subheadlines, and images find nothing, so those
3 signals (worth up to 4 points) all return 0.
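The failure mode can be reproduced on a synthetic page. The sketch below is illustrative only: the page markup, the CTA_PATTERN regex, and the 9,000-character CSS filler are assumptions made for the demonstration, not the auditor's actual patterns or the real demo pages.

```python
import re

# Hypothetical CTA pattern; the auditor's real regexes are not shown in this report.
CTA_PATTERN = re.compile(
    r'<(?:a|button)\b[^>]*class="[^"]*\b(?:cta|btn)\b[^"]*"', re.IGNORECASE
)

def build_page(css_chars: int) -> str:
    """Build a synthetic page whose inline <style> block is about css_chars long."""
    css = "a{color:red}" * (css_chars // 12)  # each rule is 12 characters
    return (
        "<!DOCTYPE html><html><head>"
        '<meta name="viewport" content="width=device-width, initial-scale=1">'
        f"<style>{css}</style></head><body>"
        '<section class="hero"><h1>Headline</h1>'
        '<a class="btn cta" href="#signup">Start now</a></section>'
        "</body></html>"
    )

page = build_page(9000)
html_top = page[:3000]  # the pre-fix window: raw character offset

window_has_cta = bool(CTA_PATTERN.search(html_top))  # positional scan
page_has_cta = bool(CTA_PATTERN.search(page))        # same scan, whole page
print(window_has_cta, page_has_cta)
```

The CTA exists on the page, but the raw 3,000-character window ends inside the <style> block, so the positional scan never sees it.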
Fix Applied
New helper function _extract_visible_html()
Added immediately before _score_above_fold() in auditor.py:
    import re  # required by the helper below

    def _extract_visible_html(html: str) -> str:
        """Strip non-visible content for accurate above-fold detection."""
        text = re.sub(r'<head[^>]*>.*?</head>', '', html, flags=re.IGNORECASE | re.DOTALL)
        text = re.sub(r'<style[^>]*>.*?</style>', '', text, flags=re.IGNORECASE | re.DOTALL)
        text = re.sub(r'<script[^>]*>.*?</script>', '', text, flags=re.IGNORECASE | re.DOTALL)
        text = re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)
        return text
Updated _score_above_fold()
    # AFTER (fixed):
    visible_html = _extract_visible_html(html)
    html_top = visible_html[:3000]  # first ~3000 chars of VISIBLE content
The global scans (viewport meta, hero class, H1) continue to operate on the full html
string — they work correctly already and don't depend on character position.
Score Impact
| Page | Dimension | Before | After | Delta |
|------|-----------|--------|-------|-------|
| v2 (90-target) | above_the_fold | 5/10 | 7/10 | +2 |
| v2 (90-target) | TOTAL | 78/100 | 80/100 | +2 |
| v3 (95-target) | above_the_fold | 5/10 | 7/10 | +2 |
| v3 (95-target) | TOTAL | 83/100 | 85/100 | +2 |
Tier change: v3 reaches the 85-point threshold, moving from Good to Optimized.
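The arithmetic in the table can be checked with a few lines. This is a minimal sketch: the 85-point threshold and the scores come from the table above, while the function names and the inclusive comparison are assumptions (the latter is consistent with v3 becoming Optimized at exactly 85).

```python
# Threshold taken from the report; other tier boundaries are not stated there.
OPTIMIZED_THRESHOLD = 85

def total_after(total_before: int, dim_before: int, dim_after: int) -> int:
    """Apply a single-dimension score change to the page total."""
    return total_before + (dim_after - dim_before)

def is_optimized(total: int) -> bool:
    """Assumed inclusive threshold check for the Optimized tier."""
    return total >= OPTIMIZED_THRESHOLD

v2_total = total_after(78, 5, 7)
v3_total = total_after(83, 5, 7)
print(v2_total, is_optimized(v2_total))  # 80 False
print(v3_total, is_optimized(v3_total))  # 85 True
```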
Why 7 and not higher
The 3 points below maximum (10 - 7 = 3) reflect real page characteristics:
- +1 point missed: image above fold. Neither page has a hero image; both use CSS instead.
- +2 points missed: ATF subheadline. The audience sub-headline is a <p> element inside the hero section. After stripping CSS, the first 3,000 visible characters capture the skip-to-content link, <header>, <nav>, and the opening of <section class="hero">. The sub-headline <p> falls just past this window in both pages.
These are accurate scores. The pages genuinely have a hero subheadline just beyond the
visible-window boundary, and no hero image. A Lighthouse audit on a live page would
confirm this interpretation.
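Whether an element lands inside the window can be checked by measuring its offset in the stripped markup. The sketch below is an assumption-laden illustration: first_offset(), in_window(), the subheadline class name, and the synthetic markup are all hypothetical stand-ins, not the real demo pages.

```python
import re
from typing import Optional

def first_offset(visible_html: str, pattern: str) -> Optional[int]:
    """Return the character offset of the first regex match, or None if absent."""
    match = re.search(pattern, visible_html, flags=re.IGNORECASE)
    return match.start() if match else None

def in_window(offset: Optional[int], window: int = 3000) -> bool:
    """True when the element starts inside the first `window` characters."""
    return offset is not None and offset < window

# Synthetic stand-in for already-stripped markup: a long nav pushes the hero down.
nav = '<nav>' + '<a href="#">Link</a>' * 147 + '</nav>'
visible = (
    f'<header>{nav}</header>'
    '<section class="hero"><h1>Headline</h1>'
    '<p class="subheadline">For busy teams</p></section>'
)

h1_at = first_offset(visible, r'<h1[\s>]')
sub_at = first_offset(visible, r'<p[^>]*class="[^"]*subheadline')
print(h1_at, in_window(h1_at))
print(sub_at, in_window(sub_at))
```

In this synthetic layout the H1 starts just inside the window and the subheadline just outside it, which mirrors the situation described for both demo pages.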
Test Results Post-Fix
| Suite | Tests | Result |
|-------|-------|--------|
| Landing Page Engine | 44 | ✅ 44/44 PASS |
| Content Compliance Engine | 87 | ✅ 87/87 PASS |
| Execution Kernel | 71 | ✅ 71/71 PASS |
| Operator App | 31 | ✅ 31/31 PASS |
| Total | 233 | ✅ 233/233 PASS |
Zero regressions introduced. All existing tests continue to pass.
Files Modified
- landing_page_engine/auditor.py: added _extract_visible_html(), updated _score_above_fold()
Files Created
- landing_page_engine/reports/auditor_visibility_fix_report.md (this file)
- landing_page_engine/reports/updated_v2_v3_scorecard.md
- FINAL_AEGIS_LANDING_PAGE_AUDITOR_FIX_REPORT.md