Fourth Down Data

How Fourth Down Data computes what it computes.

This page documents our methodology — how we calculate EPA, rank teams, define situations, and decide what we’re confident enough to say. We’re more transparent than is typical in sports analytics because we want our work scrutinized, and because the few cases where we’ve corrected our own methodology in public have been the most important things we’ve shipped.

When we measure something one way and not another, we say which way and why. When we test something and find it doesn’t work, we publish the null finding. When we change how a metric is computed, we document the change. The goal isn’t to convince you our numbers are right — it’s to show our work clearly enough that you can evaluate it.

Expected Points Added

EPA is the analytical unit underneath most of FDD. Every offensive play has an expected point value before it happens (based on down, distance, field position, and other game-state factors) and a new expected point value after it happens. The difference — points gained or lost relative to expectation — is the play’s EPA.
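As a minimal sketch of that arithmetic, the function below computes EPA for one play from expected-point values that are assumed to exist already. The `PlayResult` fields and the handling of scoring plays and changes of possession are illustrative assumptions. They follow conventions common in public EPA models and are not a description of FDD's internal code. The numbers in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayResult:
    ep_before: float                        # offense's expected points before the snap
    ep_after: Optional[float]               # expected points for whichever team has the ball after the play
    possession_changed: bool = False        # turnover or turnover on downs
    points_scored: Optional[float] = None   # realized points on a scoring play, from the original offense's view


def epa(play: PlayResult) -> float:
    """Expected points after the play minus expected points before it."""
    if play.points_scored is not None:
        # Scoring play: realized points replace the post-play expectation
        # (negative for a defensive score such as a pick-six).
        value_after = play.points_scored
    elif play.possession_changed:
        # The opponent now has the ball, so its expected points count against the original offense.
        value_after = -play.ep_after
    else:
        value_after = play.ep_after
    return value_after - play.ep_before
```

Suppose a play moves the offense from a state worth 1.0 expected points to one worth 1.5. Its EPA is +0.5. Now suppose the offense starts a play in a 2.0-point state and turns the ball over, leaving the opponent with 1.0 expected points. That play has an EPA of -3.0.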

Our EPA model is computed independently from play-by-play data on every FBS scrimmage play from 2017–2025 — approximately 1.06 million plays. We train on historical play outcomes and validate against held-out seasons to avoid the lookahead biases that can inflate apparent predictive power.
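One way to validate against held-out seasons without lookahead is a walk-forward split. Each evaluation season is scored by a model trained only on earlier seasons. The source doesn't specify the exact split scheme, so treat `walk_forward_splits` and its `min_train` parameter as a hypothetical sketch.

```python
from typing import Iterable, Iterator


def walk_forward_splits(seasons: Iterable[int], min_train: int = 2) -> Iterator[tuple[list[int], int]]:
    """Yield (training seasons, held-out season) pairs.

    Every training season strictly precedes the held-out season, so the
    model never sees outcomes from the season it is evaluated on or later.
    """
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


for train, held_out in walk_forward_splits(range(2017, 2026)):
    print(f"train on {train[0]}-{train[-1]}, evaluate on {held_out}")
```

With 2017–2025 data and `min_train=2`, the first split trains on 2017–2018 and evaluates on 2019. The last split trains on 2017–2024 and evaluates on 2025.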

A few methodological decisions worth being explicit about:

Success rate uses the proper down-based definition (the play gained enough yards relative to down and distance to keep the offense on track), not the cruder “EPA > 0” proxy.
Defensive EPA has a directional convention worth flagging: across different views in our database, “defensive EPA” can mean either defensive performance (where higher is better) or yards/points allowed (where lower is better). Each view documents its convention. When you see EPA on FDD, it follows the performance convention by default: higher is better for both offense and defense.
Scrimmage plays only. Our “all plays” category includes runs and passes only — not punts or field goals. Including special teams in average EPA dilutes the signal of offensive and defensive efficiency.
Garbage time is excluded from our no-garbage variant: plays in the fourth quarter with a 21+ point lead, or in the third quarter with a 28+ point lead, are filtered out. This matches widely used analytical conventions and keeps efficiency metrics from being skewed by run-out-the-clock or score-padding plays.
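The filters above can be sketched as simple per-play predicates. The source doesn't list the success-rate thresholds. This sketch therefore assumes the commonly cited convention: at least 50% of the yards to go on first down, 70% on second, and 100% on third and fourth. The `Play` fields and play-type labels are also hypothetical. The garbage-time cutoffs are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class Play:
    play_type: str     # hypothetical labels: "run", "pass", "punt", "field_goal", ...
    down: int
    distance: int      # yards needed for a first down
    yards_gained: int
    quarter: int
    score_diff: int    # offense score minus defense score before the snap


# ASSUMPTION: commonly cited success thresholds, in tenths of the yards to go.
# Integer arithmetic avoids floating-point edge cases at the boundary.
SUCCESS_TENTHS = {1: 5, 2: 7, 3: 10, 4: 10}


def is_scrimmage(play: Play) -> bool:
    """Runs and passes only; punts, field goals, and other special teams are excluded."""
    return play.play_type in {"run", "pass"}


def is_success(play: Play) -> bool:
    """Down-based success: the play gained a large enough share of the yards to go."""
    return 10 * play.yards_gained >= SUCCESS_TENTHS[play.down] * play.distance


def is_garbage_time(play: Play) -> bool:
    """Fourth quarter with a 21+ point lead, or third quarter with a 28+ point lead (either team)."""
    lead = abs(play.score_diff)
    return (play.quarter == 4 and lead >= 21) or (play.quarter == 3 and lead >= 28)


def no_garbage(plays: list[Play]) -> list[Play]:
    """Scrimmage plays that survive the garbage-time filter."""
    return [p for p in plays if is_scrimmage(p) and not is_garbage_time(p)]
```

Under these assumptions, a 5-yard gain on first-and-10 counts as a success. A 6-yard gain on second-and-10 does not.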

Team Strength & Rankings

Our team strength rating is computed from per-drive efficiency adjusted for opponent quality. Each FBS team gets an offensive rating, a defensive rating, and an overall rating per season — all derived through ridge regression on drive-level outcomes against the league’s actual schedule.

The intuition: a team that scores 3.5 points per drive against a tough schedule is stronger than a team scoring at the same rate against weak opponents. The ridge regression adjusts for that schedule context across the full FBS field, separating teams that are genuinely strong from teams that have padded their stats against weak opponents.
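Here is a minimal sketch of drive-level opponent adjustment with ridge regression. It assumes each drive is reduced to (offense, defense, points scored) and that the model is additive: points = league mean + offense effect + defense effect. The target, the penalty strength `alpha`, and the absence of home-field or other terms are all assumptions. The source says only that the ratings come from ridge regression on drive-level outcomes. The fitted defensive coefficient measures points allowed above average, so its sign is flipped to follow FDD's higher-is-better convention.

```python
import numpy as np


def opponent_adjusted_ratings(drives, alpha=1.0):
    """Ridge regression of drive points on offense and defense team indicators.

    drives: iterable of (offense_team, defense_team, points_on_drive).
    Returns {team: {"offense", "defense", "overall"}} in points per drive,
    where higher is better for all three.
    """
    drives = list(drives)
    teams = sorted({d[0] for d in drives} | {d[1] for d in drives})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)

    X = np.zeros((len(drives), 2 * n))
    y = np.zeros(len(drives))
    for row, (off, dfn, pts) in enumerate(drives):
        X[row, idx[off]] = 1.0        # offensive effect of the team with the ball
        X[row, n + idx[dfn]] = 1.0    # points-allowed effect of the defense faced
        y[row] = pts

    # Centering X and y fits an unpenalized intercept (the league mean).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(2 * n), Xc.T @ yc)

    off_rating = beta[:n]
    def_rating = -beta[n:]            # flip sign: allowing fewer points = higher rating
    return {
        t: {
            "offense": float(off_rating[i]),
            "defense": float(def_rating[i]),
            "overall": float(off_rating[i] + def_rating[i]),
        }
        for t, i in idx.items()
    }


# Toy schedule: A scores 6 on every drive and allows 0; B and C trade 3-point drives.
drives = [
    ("A", "B", 6), ("A", "C", 6),
    ("B", "A", 0), ("C", "A", 0),
    ("B", "C", 3), ("C", "B", 3),
]
ratings = opponent_adjusted_ratings(drives, alpha=1.0)
```

On this toy schedule, team A ends up with the highest offensive, defensive, and overall ratings. B and C, whose results are symmetric, get identical ratings. The ridge penalty shrinks every team's ratings toward zero, which limits how far a small number of drives can push a rating.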

A few methodological notes:

Opponent adjustment is applied at the drive level, not the play level. This is methodologically cleaner — drives are the natural unit of offensive vs defensive performance, and adjusting at this grain avoids over-weighting situations like long fields after turnovers or short fields after special-teams setups.
Strength of schedule is implicit in the adjusted ratings, not a separate input. We don’t compute SoS as its own metric to feed in — it’s baked into the regression.
Ratings exist for every team-season from 2017–2025, all on the same scale, so historical comparisons are meaningful: a 2019 LSU offense and a 2024 Indiana offense can be compared directly.
Our Ratings, Checked covers how these ratings line up with SP+ and the AP poll.

Situational Analysis

Situational EPA captures how teams perform in specific game contexts — not just overall, but in red zone, on third down, when behind schedule, and so on. Ten situations are tracked across 2017–2025, on both offense and defense.

The situations and how they’re defined:

Red zone — plays where the offense is at the opponent’s 20-yard line or closer.
Third down — plays on third down, all distances.
Passing downs — second-and-7-or-more, third-and-5-or-more, fourth-and-5-or-more. This matches Football Outsiders’ published thresholds.
Standard downs — all scrimmage plays that aren’t passing downs.
Early downs — first down, or second-and-6-or-less.
Goal-to-go — plays inside the 20 where distance equals yards to goal.
First half / Second half — split by halves.
No-garbage — excludes garbage time (4th quarter with 21+ point lead, 3rd quarter with 28+ point lead).
All scrimmage plays — runs and passes only, no special teams.

Each situation is measured by EPA per play, success rate, and total play volume.
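The sketch below shows how the situation definitions above might translate into per-play tags. It then aggregates the three reported measures (EPA per play, success rate, and play count) by situation. The `Play` fields are hypothetical, including `yardline_100` (yards from the opponent's goal line). Overtime is left out of the half split. The no-garbage filter is omitted for brevity, since it needs quarter and score fields this sketch doesn't carry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Play:
    down: int
    distance: int       # yards to go for a first down
    yardline_100: int   # yards from the opponent's goal line (hypothetical field)
    half: int           # 1 or 2; overtime periods are not tagged here
    epa: float
    success: bool


def situations(p: Play) -> set[str]:
    """Tag one scrimmage play with every situation it belongs to."""
    tags = {"all_scrimmage"}
    if p.yardline_100 <= 20:
        tags.add("red_zone")
        if p.distance == p.yardline_100:
            tags.add("goal_to_go")
    if p.down == 3:
        tags.add("third_down")
    passing = (p.down == 2 and p.distance >= 7) or (p.down in (3, 4) and p.distance >= 5)
    tags.add("passing_downs" if passing else "standard_downs")
    if p.down == 1 or (p.down == 2 and p.distance <= 6):
        tags.add("early_downs")
    if p.half == 1:
        tags.add("first_half")
    elif p.half == 2:
        tags.add("second_half")
    return tags


def summarize(plays: list[Play]) -> dict[str, dict[str, float]]:
    """EPA per play, success rate, and play volume for each situation."""
    groups: dict[str, list[Play]] = defaultdict(list)
    for p in plays:
        for tag in situations(p):
            groups[tag].append(p)
    return {
        tag: {
            "epa_per_play": sum(p.epa for p in ps) / len(ps),
            "success_rate": sum(p.success for p in ps) / len(ps),
            "plays": len(ps),
        }
        for tag, ps in groups.items()
    }


plays = [
    Play(down=3, distance=6, yardline_100=15, half=2, epa=0.4, success=False),
    Play(down=1, distance=10, yardline_100=10, half=1, epa=-0.2, success=True),
]
summary = summarize(plays)
```

Each play can land in several situations at once. In this example, both plays count toward the red-zone totals, but only the first-and-goal from the 10 counts as goal-to-go.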

A real limitation to call out: raw situational EPA is not opponent-adjusted. A team’s red zone defense looking good in raw EPA could mean genuinely elite defense, or it could mean facing weak red zone offenses. Schedule-strength context lives in the separate opponent-adjusted ratings (above). When you read raw situational EPA, that’s what happened in those situations — not a quality rating that controls for who was on the other side.

Methodology Principles

Some principles that govern how we present analytical findings:

We’re descriptive, not predictive. When we surface analytical findings about a team or game, we’re telling you what’s true in the data — not predicting what will happen Saturday. Sports analytics is full of confident predictions; we’re explicitly not adding to that pile.
When something doesn’t work, we publish that. Reality Check, our tool for surfacing teams whose records diverge from what our analytical model says, exists in part because we tested whether that divergence predicts future game outcomes. It doesn’t, at least not robustly enough to support betting decisions. We published the empirical null finding rather than burying it. That kind of honesty is uncommon, and we treat it as a feature.
When we change our methodology, we say so. We’ve corrected predicate bugs in our own analytical layer mid-development, and we’ve deprecated a 46-feature spread prediction model because the empirical evidence didn’t support its claims. These changes are documented in our commit history and in the methodology updates published on this page.
We don’t claim parity with proprietary models we can’t see. When our analytical rankings disagree with the AP poll or Vegas spreads, we don’t claim we’re right and they’re wrong. The Reality Check tool surfaces divergence as analytical context: descriptive, not prescriptive.
Our data is commercially licensed. The play-by-play data underlying our EPA computations is obtained through a commercial licensing agreement. Our analytical layer (EPA model, team strength ratings, similarity engine, situational metrics) is computed independently, by us, from that play-by-play data.