Auditor Visibility Fix Report

Date: 2026-05-04
Fix: Above-the-Fold scoring now uses visible page content instead of raw HTML offset
Status: ✅ Deployed — all 233 tests pass

Problem Statement

The _score_above_fold() function in landing_page_engine/auditor.py used a raw character
offset (html[:3000]) to identify content "above the fold." On pages that use inline
<style> blocks for critical CSS (a legitimate performance best practice), this window
captured only CSS — not the hero HTML — resulting in artificially low above-the-fold scores.

Affected pages: Both Level 2 demo pages (rebuilt_v2_90_target.html,
rebuilt_v3_95_target.html). Both use inline <style> blocks of approximately
8,000–10,000 characters, pushing hero content beyond the 3,000-character window.

Score impact before fix:

Both pages scored 5/10 on above_the_fold. The dimension correctly detected the viewport meta tag, hero class, and H1 (global HTML scans) but missed the CTA button, subheadline, and image (positional scans via html_top).

Root Cause

# BEFORE (broken):
html_top = html[:3000]  # first ~3000 chars = likely above fold

For pages with large inline CSS, html[:3000] contains only the <style> block.
The regex patterns for the CTA button, subheadline, and image find nothing, so those
three signals (worth 5 of the dimension's 10 points) all return 0.
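
The failure is easy to reproduce on a synthetic page. The sketch below is illustrative only: the page markup and the `CTA_PATTERN` regex are assumptions made for this example, not the auditor's actual patterns.

```python
import re

# Hypothetical CTA pattern for illustration; the auditor's real regex may differ.
CTA_PATTERN = re.compile(
    r'<(?:a|button)\b[^>]*class="[^"]*\b(?:cta|btn)\b', re.IGNORECASE
)

# Synthetic page: about 9,000 characters of inline critical CSS ahead of the hero.
inline_css = ".hero{padding:0}" * 560  # 16 chars * 560 = 8,960 chars
page = (
    "<!DOCTYPE html><html><head><style>" + inline_css + "</style></head>"
    '<body><section class="hero"><h1>Title</h1>'
    '<a class="btn cta" href="#start">Start</a></section></body></html>'
)

html_top = page[:3000]  # the pre-fix window: raw character offset

cta_in_window = bool(CTA_PATTERN.search(html_top))  # window holds only CSS
cta_in_page = bool(CTA_PATTERN.search(page))        # the CTA does exist further down

print(cta_in_window, cta_in_page)
```

The CTA exists in the page but never reaches the positional scan, which is the behavior seen on both demo pages.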

Fix Applied

New helper function `_extract_visible_html()`

Added immediately before _score_above_fold() in auditor.py:

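import re  # the helper relies on the standard-library re module

def _extract_visible_html(html: str) -> str:
    """Strip non-visible content for accurate above-fold detection."""
    # Drop the whole <head>, including any inline critical CSS it contains.
    text = re.sub(r'<head[^>]*>.*?</head>', '', html, flags=re.IGNORECASE | re.DOTALL)
    # Drop <style> and <script> blocks that appear outside <head>.
    text = re.sub(r'<style[^>]*>.*?</style>', '', text, flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r'<script[^>]*>.*?</script>', '', text, flags=re.IGNORECASE | re.DOTALL)
    # Drop HTML comments.
    text = re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)
    return text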
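
A quick check of the helper on a small document shows what survives the stripping. The function body is copied from this report so the snippet runs standalone; the sample markup is made up for illustration.

```python
import re

def _extract_visible_html(html: str) -> str:
    """Strip non-visible content for accurate above-fold detection."""
    text = re.sub(r'<head[^>]*>.*?</head>', '', html, flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r'<style[^>]*>.*?</style>', '', text, flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r'<script[^>]*>.*?</script>', '', text, flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)
    return text

# Made-up sample: head with inline CSS, a comment, a script, then the hero.
sample = (
    "<html><head><title>T</title><style>h1{color:red}</style></head>"
    "<body><!-- tracking --><script>var x = 1;</script>"
    '<section class="hero"><h1>Hi</h1></section></body></html>'
)

visible = _extract_visible_html(sample)
print(visible)
# <html><body><section class="hero"><h1>Hi</h1></section></body></html>
```

One thing to be aware of: `<head[^>]*>` can also match an opening `<header ...>` tag textually, although a match only completes if a `</head>` follows it. Writing the pattern as `<head\b[^>]*>` would rule that case out.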

Updated `_score_above_fold()`

# AFTER (fixed):
visible_html = _extract_visible_html(html)
html_top = visible_html[:3000]  # first ~3000 chars of VISIBLE content

The global scans (viewport meta, hero class, H1) continue to operate on the full html
string — they work correctly already and don't depend on character position.
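
The split between global and positional scans can be sketched as follows. This is a minimal illustration, not the code in auditor.py: the function name `score_above_fold_sketch`, every regex, and the 1/2/2 split of the global points are assumptions. Only the positional weights (2 for the CTA, 2 for the subheadline, 1 for the image) are inferred from the scores in this report.

```python
import re

def _extract_visible_html(html: str) -> str:
    """Same stripping steps as the helper described in this report."""
    flags = re.IGNORECASE | re.DOTALL
    for pattern in (r'<head[^>]*>.*?</head>', r'<style[^>]*>.*?</style>',
                    r'<script[^>]*>.*?</script>', r'<!--.*?-->'):
        html = re.sub(pattern, '', html, flags=flags)
    return html

# (pattern, points) pairs. Patterns and the global point split are assumed.
GLOBAL_SIGNALS = [
    (r'<meta[^>]+name=["\']viewport["\']', 1),   # viewport meta
    (r'class=["\'][^"\']*\bhero\b', 2),          # hero class
    (r'<h1\b', 2),                               # H1
]
POSITIONAL_SIGNALS = [
    (r'<(?:a|button)\b[^>]*class=["\'][^"\']*\b(?:cta|btn)\b', 2),  # CTA button
    (r'<p\b[^>]*class=["\'][^"\']*\bsubhead', 2),                   # subheadline
    (r'<img\b', 1),                                                 # hero image
]

def score_above_fold_sketch(html: str, window: int = 3000) -> int:
    # Positional scans see only the first `window` chars of visible content.
    html_top = _extract_visible_html(html)[:window]
    # Global scans run on the full document, independent of position.
    score = sum(pts for pat, pts in GLOBAL_SIGNALS
                if re.search(pat, html, re.IGNORECASE))
    score += sum(pts for pat, pts in POSITIONAL_SIGNALS
                 if re.search(pat, html_top, re.IGNORECASE))
    return score

# Synthetic page: large inline CSS, hero with H1 and CTA, no subheadline or image.
page = (
    '<html><head><meta name="viewport" content="width=device-width">'
    "<style>" + ".hero{margin:0}" * 600 + "</style></head>"
    '<body><section class="hero"><h1>Title</h1>'
    '<a class="btn" href="#go">Go</a></section></body></html>'
)
score = score_above_fold_sketch(page)
print(score)  # 7 under the assumed weights
```

Keeping the global scans on the full string means the fix touches only the signals that depend on position.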

Score Impact

| Page | Dimension | Before | After | Delta |
|------|-----------|--------|-------|-------|
| v2 (90-target) | above_the_fold | 5/10 | 7/10 | +2 |
| v2 (90-target) | TOTAL | 78/100 | 80/100 | +2 |
| v3 (95-target) | above_the_fold | 5/10 | 7/10 | +2 |
| v3 (95-target) | TOTAL | 83/100 | 85/100 | +2 |

Tier change: v3 reaches the 85-point threshold, moving from Good to Optimized.

Why 7 and not higher

The 3 points below the maximum (10 - 7 = 3) reflect real page characteristics:

1 point missed, image above the fold: neither page has a hero image; both use CSS gradient backgrounds. This is a correct choice for performance rather than a defect, but the image signal is not present, so the point is not awarded.

2 points missed, above-the-fold subheadline: the audience subheadline appears after the H1 block inside a <p> element. After stripping CSS, the first 3,000 visible characters capture the skip-to-content link, <header>, <nav>, and the opening of <section class="hero">. The subheadline <p> falls just past this window on both pages.

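These scores are accurate under the auditor's definition: neither page has a hero
image, and on both pages the hero subheadline sits just beyond the 3,000-character
visible window. Because the window is a character-count proxy rather than a rendered
viewport, a Lighthouse audit on a live page would be needed to confirm this interpretation.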
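
Whether a signal falls inside or just past the window can be checked by looking up the offset of its first match in the stripped HTML. The sketch below assumes a hypothetical helper `first_match_offset` and a made-up subheadline pattern; the input string is a synthetic stand-in for already-stripped page content, not either demo page.

```python
import re
from typing import Optional

WINDOW = 3000  # same window size the auditor uses

def first_match_offset(visible_html: str, pattern: str) -> Optional[int]:
    """Return the character offset of the first match, or None if absent."""
    match = re.search(pattern, visible_html, re.IGNORECASE)
    return match.start() if match else None

# Synthetic stand-in: 3,100 characters of navigation markup before the subheadline.
visible = "<nav>" + "x" * 3100 + "</nav>" + '<p class="subhead">For teams</p>'

offset = first_match_offset(visible, r'<p\b[^>]*class="[^"]*subhead')
inside_window = offset is not None and offset < WINDOW

print(offset, inside_window)  # 3111 False
```

Running the same lookup against the stripped output for each demo page would show how far past the boundary the subheadline actually sits.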

Test Results Post-Fix

| Suite | Tests | Result |
|-------|-------|--------|
| Landing Page Engine | 44 | ✅ 44/44 PASS |
| Content Compliance Engine | 87 | ✅ 87/87 PASS |
| Execution Kernel | 71 | ✅ 71/71 PASS |
| Operator App | 31 | ✅ 31/31 PASS |
| Total | 233 | ✅ 233/233 PASS |

No regressions: all 233 existing tests continue to pass.

Files Modified

landing_page_engine/auditor.py — added _extract_visible_html(), updated _score_above_fold()

Files Created

landing_page_engine/reports/auditor_visibility_fix_report.md (this file)
landing_page_engine/reports/updated_v2_v3_scorecard.md
FINAL_AEGIS_LANDING_PAGE_AUDITOR_FIX_REPORT.md