Every statistic in Lighthouse passes through a 6-stage deterministic verification pipeline before publication. No editorial shortcuts. No LLM-assigned grades.
Continuous monitoring of 23,000+ live intelligence feeds — government statistical APIs, academic RSS, industry report publishers, regulatory filings, and verified social signals. New data is ingested automatically across 13 languages and 27 government statistical APIs.
23K+ sources monitored
Named entity recognition and statistical pattern detection identify numeric claims, their context, units, and methodology signals. Each claim is tagged with an extraction confidence score before entering the grading queue.
Sub-100ms per stat
Cross-source deduplication at scale eliminates redundancy before grading begins. Content-hashing identifies identical claims across multiple sources. Matching claims accumulate corroboration weight rather than creating duplicate entries.
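The deduplication step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the normalisation rule (lowercase, collapsed whitespace), the names `claim_hash`, `ClaimEntry`, and `deduplicate`, the use of a source count as the corroboration weight, and the sample claims are all hypothetical and not taken from Lighthouse.

```python
import hashlib
from dataclasses import dataclass, field


def claim_hash(text: str) -> str:
    """Hash a claim after light normalisation (assumed: case and whitespace only)."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass
class ClaimEntry:
    text: str
    sources: set[str] = field(default_factory=set)

    @property
    def corroboration_weight(self) -> int:
        # Assumption: weight is simply the number of distinct sources.
        return len(self.sources)


def deduplicate(claims: list[tuple[str, str]]) -> dict[str, ClaimEntry]:
    """Collapse (claim_text, source_id) pairs into one entry per content hash."""
    entries: dict[str, ClaimEntry] = {}
    for text, source_id in claims:
        key = claim_hash(text)
        if key not in entries:
            entries[key] = ClaimEntry(text=text)
        # A repeated claim adds a source instead of creating a new entry.
        entries[key].sources.add(source_id)
    return entries


# Made-up claims for illustration only.
claims = [
    ("Example claim: 10% of widgets are blue", "source-a"),
    ("Example claim:  10% of widgets are BLUE", "source-b"),
    ("Example claim: 25% of widgets are red", "source-a"),
]
entries = deduplicate(claims)
```

Here the first two claims share a hash after normalisation, so they end up as one entry backed by two sources.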
Content-hash dedup
A 6-dimension deterministic scoring algorithm evaluates source credibility, methodology rigor, sample adequacy, data recency, corroboration breadth, and independence. Output: an A–D evidence grade plus a full confidence score breakdown. Zero editorial opinion involved.
6-dimension formula
A 10-step verification pipeline runs checks that include value precision (0.1% tolerance), source liveness (HTTP validation), staleness (90/180-day thresholds), cross-platform consistency, and contradiction detection. Every step is formula-based and fully reproducible.
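Two of these checks are simple enough to sketch. The example below assumes that "0.1% tolerance" means a relative tolerance between the extracted value and the value in the source, and that the source requirement counts distinct source identifiers. The field names, `StepResult`, and `run_checks` are hypothetical; the liveness, staleness, and contradiction checks are omitted.

```python
import math
from typing import Callable, NamedTuple


class StepResult(NamedTuple):
    step: str
    passed: bool
    reason: str


def check_value_precision(stat: dict) -> StepResult:
    # Assumed reading of the 0.1% tolerance: relative difference of at most 0.001.
    ok = math.isclose(stat["extracted_value"], stat["source_value"], rel_tol=0.001)
    reason = "within 0.1% of source value" if ok else "differs from source by more than 0.1%"
    return StepResult("value_precision", ok, reason)


def check_min_sources(stat: dict) -> StepResult:
    # Duplicated source ids count once.
    distinct = len(set(stat["source_ids"]))
    ok = distinct >= 2
    return StepResult("min_sources", ok, f"{distinct} distinct source(s)")


CHECKS: tuple[Callable[[dict], StepResult], ...] = (
    check_value_precision,
    check_min_sources,
)


def run_checks(stat: dict, checks=CHECKS) -> list[StepResult]:
    """Run every check and keep each outcome, so failures stay traceable."""
    return [check(stat) for check in checks]


good = run_checks(
    {"extracted_value": 42.03, "source_value": 42.0, "source_ids": ["a", "b"]}
)
bad = run_checks(
    {"extracted_value": 42.1, "source_value": 42.0, "source_ids": ["a", "a"]}
)
```

Because each step is a pure function of the statistic, rerunning the checks on the same input gives the same pass/fail trace.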
2+ sources required
Only statistics passing all quality gates are published. Every stat carries a full 6-layer provenance chain and a citation safety classification: Safe to Cite, Use with Context, Use with Hedging, or Do Not Cite. Citation-ready within hours of source publication.
Citation-ready hours
Every published statistic is assigned one of four evidence grades based on a deterministic rubric, never editorial opinion. Grades reflect the quality, verifiability, and corroboration of the underlying research.
Grade A: Large-sample primary research, peer-reviewed methodology, 3+ independent sources, published within 18 months. Safe to cite without qualification in any context.
Grade B: Industry study with documented methodology, 2+ independent sources, adequate sample size, published within 24 months. Reliable for most citation contexts.
Grade C: Single credible source or cross-source consensus without a primary study. Auto-generated hedging language provided. Suitable for directional use with appropriate caveats.
Grade D: Unverified origin, insufficient corroboration. Flagged as Do Not Cite. Included in the database for completeness and research context only.
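Read in order as grades A through D (an assumption based on the A–D scale), the four descriptions above can be written as a rule cascade. This is a simplified sketch of the prose rubric only, not the weighted formula itself. The `Evidence` fields and the function name `evidence_grade` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    independent_sources: int
    age_months: int
    large_sample: bool = False
    peer_reviewed: bool = False
    documented_methodology: bool = False
    adequate_sample: bool = False
    credible_source: bool = False
    cross_source_consensus: bool = False


def evidence_grade(e: Evidence) -> str:
    """Return the first grade whose stated criteria are all met."""
    if (e.large_sample and e.peer_reviewed
            and e.independent_sources >= 3 and e.age_months <= 18):
        return "A"
    if (e.documented_methodology and e.adequate_sample
            and e.independent_sources >= 2 and e.age_months <= 24):
        return "B"
    if e.credible_source or e.cross_source_consensus:
        return "C"
    # Unverified origin or insufficient corroboration.
    return "D"


peer_reviewed_study = Evidence(
    independent_sources=3, age_months=12, large_sample=True, peer_reviewed=True
)
industry_report = Evidence(
    independent_sources=2, age_months=20,
    documented_methodology=True, adequate_sample=True,
)
single_source = Evidence(independent_sources=1, age_months=6, credible_source=True)
unknown_origin = Evidence(independent_sources=0, age_months=6)
```

Checking the strictest grade first means a statistic that satisfies several tiers receives the highest one it qualifies for.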
Every grade is computed across exactly 6 weighted dimensions. The weights are defined by formula and applied consistently across all statistics. LLMs are never the source of a numeric score.
Bar widths are illustrative of relative weighting. Exact dimension weights are published in the technical appendix.
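A weighted score over six dimensions can be computed as below. The weights here are equal placeholders chosen purely for illustration, since the actual weights are published in the technical appendix and are not given on this page. The 0 to 1 score range, the dimension keys, and the function names are also assumptions.

```python
DIMENSIONS = (
    "source_credibility",
    "methodology_rigor",
    "sample_adequacy",
    "data_recency",
    "corroboration_breadth",
    "independence",
)

# Placeholder weights (equal, summing to 1). NOT the published weights.
WEIGHTS = {name: 1 / len(DIMENSIONS) for name in DIMENSIONS}


def confidence_breakdown(scores: dict[str, float]) -> dict[str, float]:
    """Per-dimension contribution: weight times score (scores assumed in [0, 1])."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return {name: WEIGHTS[name] * scores[name] for name in DIMENSIONS}


def confidence_score(scores: dict[str, float]) -> float:
    """Overall score is the sum of the weighted contributions."""
    return sum(confidence_breakdown(scores).values())


example_scores = {name: 0.5 for name in DIMENSIONS}
example_scores["data_recency"] = 1.0
breakdown = confidence_breakdown(example_scores)
total = confidence_score(example_scores)
```

Keeping the breakdown separate from the total mirrors the idea of publishing a full confidence breakdown alongside the grade.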
Every published statistic receives a citation safety classification that tells you exactly how and when it is safe to reference the data — with no ambiguity.
Safe to Cite: Cite with full confidence. Complete provenance chain is available. Suitable for published reports, enterprise presentations, and journalism.
Use with Context: Reliable for directional use. Note the source limitation when citing. Recommended for internal research and supporting evidence.
Use with Hedging: Auto-generated disclaimer language is provided with every statistic. Use as supporting context, not as a primary claim.
Do Not Cite: Insufficient provenance to support citation. Available for internal research purposes only. Not suitable for any public-facing use.
Every published statistic carries a 6-layer immutable audit trail. Each layer is append-only — no record can be modified or deleted, only superseded by a new version.
The original document — content-hashed and stored permanently. URL, retrieval timestamp, full-text snapshot, and Terms of Service compliance status are recorded at ingestion.
Records which extraction model and prompt version were used, along with the raw model response and the structured output. Every extraction is traceable to a specific model version and run configuration.
Every correction creates an immutable version record with a full diff. No data is overwritten. Previous versions remain accessible with an explanation of what changed and why.
The full pipeline trace — pass/fail status for each of the 10 verification steps, the specific values that triggered any failure, and the precise reason the statistic was accepted or rejected.
Upstream sources this statistic corroborates, and downstream derived statistics that depend on it. Any change to a root stat automatically propagates confidence recalculations across its dependency tree.
A 0–100 completeness score reflecting the depth and integrity of the provenance chain. Higher lineage scores indicate more complete audit trails and are used to prioritise quality assurance review.
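The append-only behaviour described for the audit trail, where a correction supersedes a record rather than overwriting it, can be illustrated with a small in-memory sketch. The class and field names are hypothetical, the diff is reduced to a one-line string, and real storage and permanence guarantees are out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a stored record cannot be mutated
class VersionRecord:
    version: int
    value: float
    diff: str
    reason: str
    recorded_at: datetime


class AppendOnlyTrail:
    """Holds every version of a statistic; exposes no update or delete."""

    def __init__(self, value: float, reason: str = "initial publication") -> None:
        self._records: list[VersionRecord] = []
        self._append(value, diff=f"(none) -> {value}", reason=reason)

    def _append(self, value: float, diff: str, reason: str) -> VersionRecord:
        record = VersionRecord(
            version=len(self._records) + 1,
            value=value,
            diff=diff,
            reason=reason,
            recorded_at=datetime.now(timezone.utc),
        )
        self._records.append(record)
        return record

    def supersede(self, new_value: float, reason: str) -> VersionRecord:
        """Record a correction as a new version; earlier versions stay intact."""
        old_value = self.current.value
        return self._append(new_value, diff=f"{old_value} -> {new_value}", reason=reason)

    @property
    def current(self) -> VersionRecord:
        return self._records[-1]

    @property
    def history(self) -> tuple[VersionRecord, ...]:
        # A tuple copy, so callers cannot edit the internal list.
        return tuple(self._records)


trail = AppendOnlyTrail(12.5)
trail.supersede(12.7, reason="source published a correction")
```

After the correction, `trail.history` still contains the original 12.5 record, and the new record carries the reason for the change.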
Confidence scores are not static. Lighthouse applies a time-aware staleness model that automatically downgrades confidence as data ages — with topic-specific thresholds for breaking news versus evergreen benchmarks.
Statistics that have not been refreshed by a new primary source within 90 days receive an automatic confidence penalty. The grade may drop by one level if no corroborating update is found. A staleness flag is added to the citation display.
Statistics older than 180 days without a refreshed source are blocked from new citations and marked as potentially stale. They remain in the database with full provenance history but are excluded from top-of-page results.
Structural benchmarks — such as average human reading speed or long-run global literacy rates — are classified as evergreen and exempt from standard staleness thresholds. Evergreen classification is assigned by formula, not editorial discretion.
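The three staleness rules above reduce to a small classification function. This sketch uses the standard 90 and 180 day thresholds as defaults and leaves topic-specific thresholds to the caller. The enum, the function name, and the treatment of the exact boundary days (a statistic is penalised only once it is more than 90 days old) are assumptions.

```python
from datetime import date
from enum import Enum


class Freshness(Enum):
    FRESH = "fresh"
    PENALISED = "penalised"  # confidence penalty plus staleness flag
    BLOCKED = "blocked"      # no new citations, excluded from top results


def classify_freshness(
    last_refreshed: date,
    today: date,
    evergreen: bool = False,
    penalty_days: int = 90,
    block_days: int = 180,
) -> Freshness:
    """Map the age of the latest primary source to a staleness state."""
    if evergreen:
        # Evergreen benchmarks are exempt from the standard thresholds.
        return Freshness.FRESH
    age_days = (today - last_refreshed).days
    if age_days > block_days:
        return Freshness.BLOCKED
    if age_days > penalty_days:
        return Freshness.PENALISED
    return Freshness.FRESH


refreshed = date(2024, 1, 1)
after_60_days = classify_freshness(refreshed, date(2024, 3, 1))
after_121_days = classify_freshness(refreshed, date(2024, 5, 1))
after_213_days = classify_freshness(refreshed, date(2024, 8, 1))
evergreen_stat = classify_freshness(refreshed, date(2024, 8, 1), evergreen=True)
```

Passing different `penalty_days` and `block_days` values is one way a topic-specific threshold could be expressed.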
Browse 4.6 million+ evidence-graded statistics — each one with a full provenance chain, citation safety classification, and confidence breakdown available on demand.