Introduction
Is it possible to quantify the dynamics of pass plays independent of the pass actually thrown? How can we evaluate defensive coverage on all the valuable areas of the field? We tackle these questions by building the Passing Value in Expectation (PaVE) metric from the provided tracking data.
Any casual football viewer knows that an incomplete pass doesn’t necessarily reflect immaculate defensive coverage, and passes can be completed despite multiple defenders draped over the receiver. Rather, coaches aim to design defensive game-plans that position defenders to minimize the value of potential passes a QB can throw based on the game situation and offensive positioning.
Key highlights of our approach:
- Intuitive, Modular Modeling: All core aspects of the model represent how football is naturally played, with each decomposed part easily modified or replaced without affecting the overall pipeline.
- Trajectory-Based Analysis: Defense is often played beyond just the arrival point of the ball. We analyze an average of 1.3 million potential passes per second of game time, using pass trajectories provided by NFL3D (Dutta et al.).
- Predictivity: The metric correlates with offensive passing output and is predictive of future offensive passing output.
Much of the high-level conceptual framework behind PaVE comes from work on off-ball scoring opportunity in soccer (Spearman and Spearman et al.).
Methodology
To evaluate offensive value over all potential passes across the field, we break each pass into three components: influence, selection, and value.
Here \(PaVE_F\) is the per-frame value, and PaVE is evaluated on a play with N frames from snap to throw. We evaluate defenses as minimizing PaVE and offenses as maximizing it.
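One plausible way to write this decomposition is given below. It is a sketch only: it assumes the three components combine multiplicatively for each potential pass p and that the play-level metric averages over frames. The symbol \(V(p)\) for the expected value of pass p is introduced here for illustration.

\[
PaVE_F = \sum_{p} PPI(p)\, S(p)\, V(p), \qquad PaVE = \frac{1}{N} \sum_{F=1}^{N} PaVE_F
\]

Under this reading, a defense lowers PaVE by reducing offensive influence on the passes that are both likely to be selected and valuable.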
Influence
The basic building block for pass influence is the probability that a player j can arrive at a location \(\ell\) by the time T at which the ball arrives there. We define ToA(j, \(\ell\)) to be the time of arrival of player j at \(\ell\). We assume players have a reaction time \(t_{react}\) to the pass and project their velocity at that time onto the vector from their location to \(\ell\).
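The time-of-arrival idea can be sketched as follows. This is a minimal illustration, not the authors' formulation. It assumes the player keeps their current velocity during the reaction time, then accelerates at a constant rate toward \(\ell\) up to a top speed. It also assumes a logistic arrival probability with scale σ. The function names are hypothetical; the default values come from the Hyperparameter Tuning section.

```python
import math

# Defaults taken from the Hyperparameter Tuning section; the kinematic model
# itself is an illustrative assumption, not the authors' exact formulation.
A_MAX = 7.67    # max acceleration, yd/s^2
V_MAX = 9.42    # max velocity, yd/s
T_REACT = 0.2   # reaction time, s
SIGMA = 0.31    # influence parameter, s


def time_of_arrival(pos, vel, target, a_max=A_MAX, v_max=V_MAX, t_react=T_REACT):
    """Hypothetical ToA(j, l): seconds for a player to reach `target`."""
    # Assumption: the player keeps their current velocity while reacting.
    px = pos[0] + vel[0] * t_react
    py = pos[1] + vel[1] * t_react
    dx, dy = target[0] - px, target[1] - py
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return t_react
    # Project the velocity onto the unit vector pointing at the target.
    v0 = min((vel[0] * dx + vel[1] * dy) / dist, v_max)
    # Time and distance needed to accelerate from v0 to v_max at a_max.
    t_acc = (v_max - v0) / a_max
    d_acc = v0 * t_acc + 0.5 * a_max * t_acc ** 2
    if d_acc >= dist:
        # Target reached before top speed: solve 0.5*a*t^2 + v0*t = dist.
        t_run = (-v0 + math.sqrt(v0 ** 2 + 2.0 * a_max * dist)) / a_max
    else:
        # Accelerate to top speed, then cover the rest at v_max.
        t_run = t_acc + (dist - d_acc) / v_max
    return t_react + t_run


def arrival_probability(toa, ball_time, sigma=SIGMA):
    """Assumed logistic form: P(player arrives by the time the ball does)."""
    z = (ball_time - toa) / sigma
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)
```

With this form, a player whose time of arrival equals the ball's arrival time T gets probability 0.5, and σ controls how quickly that probability saturates on either side.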
We integrate across the pass trajectory, modeling potential player influence PPI(p) as a sequence of Bernoulli random variables corresponding to every player-time combination along the trajectory.
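The source does not spell out how the Bernoulli variables are combined, so the following is only one possible reading. It assumes the player-time trials are independent and ordered along the trajectory, and it credits each player with the probability of being the first to succeed. The function name and input layout are hypothetical.

```python
def first_influence_probabilities(trials):
    """Credit each player with the chance of being first to reach the ball.

    `trials` is a sequence of (player_id, p) pairs ordered along the pass
    trajectory, where p is that player's success probability at that step.
    Independence and the first-success rule are assumptions of this sketch.
    """
    remaining = 1.0   # probability that no earlier trial has succeeded
    influence = {}
    for player_id, p in trials:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        influence[player_id] = influence.get(player_id, 0.0) + remaining * p
        remaining *= 1.0 - p
    return influence, remaining


# Usage: a receiver gets two chances along the trajectory, a defender one.
influence, nobody = first_influence_probabilities(
    [("WR", 0.5), ("CB", 0.5), ("WR", 0.5)]
)
```

The per-player values plus the leftover probability always sum to one, so the result can be read as a distribution over who influences the pass first.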
Selection
Pass selection reflects both QB ability and QB decision-making. We define a selection distribution S(p) over passes from the historical distribution of time of flight given the distance traveled.
Here \(\alpha\) is a tuned hyperparameter, and S(p) is normalized to be a valid probability distribution.
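As a rough illustration of the normalization step, the sketch below assumes each candidate pass already has a non-negative historical likelihood for its time of flight given its distance. It also assumes that \(\alpha\) acts as an exponent on that likelihood before normalization. The role of \(\alpha\) is a guess made for illustration, and the function name is hypothetical.

```python
def selection_distribution(likelihoods, alpha=1.2):
    """Turn per-pass likelihoods into a probability distribution S(p).

    Assumption: alpha sharpens (alpha > 1) or flattens (alpha < 1) the
    historical likelihoods before they are normalized to sum to one.
    """
    if any(value < 0 for value in likelihoods):
        raise ValueError("likelihoods must be non-negative")
    weights = [value ** alpha for value in likelihoods]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one likelihood must be positive")
    return [weight / total for weight in weights]


# Usage: two candidate passes, the second historically more typical.
s = selection_distribution([0.2, 0.8], alpha=1.2)
```

Under this assumption, \(\alpha\) = 1 leaves the historical distribution unchanged apart from normalization, while larger values concentrate selection on the most typical passes.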
Value
We model expected YAC (xYAC) and EPA to get expected EPA (xEPA).
xYAC Model
We used XGBoost trained on 21 features from the tracking data. The three most important features were closest-defender speed, ball y coordinate, and closest-defender y coordinate. The model achieved a validation accuracy of 81%, with predictions an average of 2.4 yards from the true value.
xEPA Model
The model is similar to the nflfastR EP model by Baldwin et al. and was trained on the entire publicly available play-by-play dataset. Its output has a 0.8 correlation with nflfastR EPA.
Validation
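Football is a dynamic sport with many interacting factors and high-variance outcomes. We find that PaVE is more predictive of passing yards gained than either EPA or passing yards itself. In the table below, rows give the metric measured in one bin and columns give the metric measured in the next bin: PaVE relates to next-bin passing yards at 0.44, compared with 0.33 for EPA and 0.43 for passing yards.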
| Metric in bin (rows) \ metric in bin+1 (columns) | PaVE | EPA | Yards |
|---|---|---|---|
| PaVE | 0.19 | 0.30 | 0.44 |
| EPA | ~ | 0.36 | 0.33 |
| Yards | ~ | ~ | 0.43 |
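A bin-to-next-bin comparison of this kind can be sketched as follows. The sketch assumes the table entries are Pearson correlations and that bins are fixed-size runs of consecutive observations. Neither detail is stated in the text, and all function names are hypothetical.

```python
import math


def bin_means(values, bin_size):
    """Average consecutive observations into bins of `bin_size` (drops any remainder)."""
    n_bins = len(values) // bin_size
    return [
        sum(values[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def next_bin_correlation(metric_now, metric_next, bin_size):
    """Correlate `metric_now` in bin k with `metric_next` in bin k + 1."""
    now = bin_means(metric_now, bin_size)
    nxt = bin_means(metric_next, bin_size)
    return pearson(now[:-1], nxt[1:])
```

Passing the same series for both arguments gives a diagonal entry of the table (a metric's stability with itself), while passing PaVE and yards gives the PaVE-to-yards entry.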
Hyperparameter Tuning
Four model parameters were tuned:
- Max acceleration: \(a_{max}\) = 7.67 yd/s²
- Max velocity: \(v_{max}\) = 9.42 yd/s
- Influence parameter: \(\sigma\) = 0.31 s
- Selection parameter: \(\alpha\) = 1.2
Additional parameters: \(t_{react}\) = 0.2 s, \(z_{min}\) = 1 yd, \(z_{max}\) = 3 yd.
Applications
Conclusion
PaVE represents a novel attempt at comprehensively modeling the highly variable action that makes up an NFL passing play. This first attempt is not only a predictive measure that passes the eye test, but also a foundation for many further applications. For example, NFL teams could use a more mature version of PaVE weekly by swapping out or refining specific parts: shifting selection probabilities to reflect a specific QB’s tendencies, or simulating defensive schemes to minimize a receiver’s value based on route tendencies. We truly hope that this metric PaVEs the way to new findings for passing in the NFL.