Introduction

Is it possible to quantify the dynamics of pass plays independent of the pass that was actually thrown? How can we evaluate defensive coverage across all the valuable areas of the field? We tackle these questions by building the Passing Value in Expectation (PaVE) metric from the provided tracking data.

Any casual football viewer knows that an incomplete pass doesn’t necessarily reflect immaculate defensive coverage, and passes can be completed despite multiple defenders draped over the receiver. Rather, coaches aim to design defensive game-plans that position defenders to minimize the value of potential passes a QB can throw based on the game situation and offensive positioning.

Key highlights of our approach:

  • Intuitive, Modular Modeling: All core aspects of the model represent how football is naturally played, with each decomposed part easily modified or replaced without affecting the overall pipeline.
  • Trajectory-Based Analysis: Defense is often played beyond just the arrival point of the ball. We analyze an average of 1.3 million potential passes per second of game time, using pass trajectories provided by NFL3D (Dutta et al.).
  • Predictive Power: The metric correlates with offensive passing output and is predictive of future offensive passing output.

Much of the high-level conceptual framework behind PaVE comes from off-ball scoring opportunity in soccer (Spearman and Spearman et al.).

Methodology

To evaluate offensive value over all potential passes across the field, we break each pass into three components:

  • Component 1, Selection \(S(p)\): The probability that the quarterback selects a given pass to throw.
  • Component 2, Influence \(I(p)\): The aggregate probability that a member of the offense ultimately influences the pass, given that the pass is thrown.
  • Component 3, Value \(V_C(p)\), \(V_I(p)\): The value gained from a completion or an incompletion, respectively, given that the pass is thrown.

$$PaVE_{\mathcal{F}} = \sum_{p \in \mathcal{F}} \underbrace{\left(V_C(p) \cdot I(p) + V_I(p) \cdot (1 - I(p))\right)}_{\text{expected pass value}} \cdot S(p)$$

where \(PaVE_{\mathcal{F}}\) is the value for a single frame \(\mathcal{F}\), and PaVE for a play is evaluated over its N frames from snap to throw. We treat defenses as trying to minimize PaVE and offenses as trying to maximize it.
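As a rough illustration of the per-frame sum, the sketch below computes the selection-weighted expected value over a handful of candidate passes. The `CandidatePass` class, the `frame_pave` function, and the numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the original notebook.

```python
from dataclasses import dataclass


@dataclass
class CandidatePass:
    """One potential pass p in a frame (hypothetical container)."""
    selection: float         # S(p): probability the QB throws this pass
    influence: float         # I(p): probability the offense influences the pass
    value_complete: float    # V_C(p): value if completed
    value_incomplete: float  # V_I(p): value if incomplete


def frame_pave(passes):
    """Sum the expected pass value of each candidate, weighted by S(p)."""
    total = 0.0
    for p in passes:
        expected_value = (p.value_complete * p.influence
                          + p.value_incomplete * (1.0 - p.influence))
        total += expected_value * p.selection
    return total


# Illustrative numbers only, not real tracking data.
passes = [
    CandidatePass(selection=0.6, influence=0.5,
                  value_complete=2.0, value_incomplete=-0.5),
    CandidatePass(selection=0.4, influence=0.25,
                  value_complete=4.0, value_incomplete=-1.0),
]
pave = frame_pave(passes)
```

In this toy frame the first pass contributes 0.75 × 0.6 and the second 0.25 × 0.4, so a high-value pass with low influence and low selection probability adds relatively little to the total.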

Influence

The basic building block for pass influence is the probability that a player j can arrive at a location \(\ell\) by the time T at which the ball arrives there. We define \(ToA(j, \ell)\) as the time of arrival of player j at \(\ell\). We assume players have a reaction time \(t_{react}\) to the pass, and we project their velocity at that time onto the vector from their location to \(\ell\).

$$P_{inf}(\ell, T, j) = \left(1 + \exp\left(\frac{-\pi(T - ToA(j,\ell))}{\sqrt{3}\sigma}\right)\right)^{-1} \cdot \mathbf{1}(z \in [z_{min}, z_{max}])$$
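The logistic form above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name is hypothetical, the default values of σ, z_min and z_max are the ones quoted under Hyperparameter Tuning, and reading z as the height of the ball at \(\ell\) is our assumption based on those yard-valued bounds.

```python
import math

SIGMA = 0.31  # influence parameter, seconds
Z_MIN = 1.0   # lower bound on z, yards
Z_MAX = 3.0   # upper bound on z, yards


def influence_probability(ball_arrival_time, player_arrival_time, ball_height,
                          sigma=SIGMA, z_min=Z_MIN, z_max=Z_MAX):
    """Logistic probability that a player reaches the spot by the time the ball does."""
    # Indicator term: outside the height window (assumed: ball height) the result is 0.
    if not (z_min <= ball_height <= z_max):
        return 0.0
    x = math.pi * (ball_arrival_time - player_arrival_time) / (math.sqrt(3.0) * sigma)
    # Numerically stable evaluation of 1 / (1 + exp(-x)).
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

When the player and the ball arrive at the same instant the probability is 0.5; it rises toward 1 as the player arrives earlier than the ball, and σ controls how sharp that transition is.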

We integrate across the pass trajectory, modeling potential player influence PPI(p) as a sequence of Bernoulli random variables corresponding to every player-time combination along the trajectory.
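One simple way to collapse many Bernoulli events into a single probability is to treat them as independent and compute the chance that at least one of them succeeds. The text does not state how the player-time events are combined, so the independence assumption and the function below are purely illustrative.

```python
def any_influence_probability(probabilities):
    """Probability that at least one independent Bernoulli event succeeds.

    `probabilities` holds one success probability per player-time
    combination along the trajectory (independence is an assumption).
    """
    p_none = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        p_none *= 1.0 - p  # chance that this event also fails
    return 1.0 - p_none
```

For example, two events with probability 0.5 each give a combined probability of 0.75 under this assumption. A sequential model, in which an earlier influence removes the chance of a later one, would need a different combination rule.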

Selection

Pass selection reflects both QB ability and QB decision-making. We define a distribution over passes, written H(p) below, from the historical distribution of times of flight given distance traveled, and weight it by the offense's potential influence on the pass.

$$S(p) = H(p) \cdot (PPI_{off}(p))^{\alpha}$$

where \(\alpha\) is a tuned hyperparameter and S(p) is normalized to be a valid probability distribution.
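The selection formula and its normalization can be sketched as follows. The function and argument names are hypothetical; the default α = 1.2 is the tuned value reported under Hyperparameter Tuning, and the inputs are assumed to be non-negative weights, one per candidate pass.

```python
def selection_distribution(history_weights, offense_influence, alpha=1.2):
    """Return S(p) for each candidate pass, normalized to sum to 1.

    history_weights:   H(p) for each pass
    offense_influence: PPI_off(p) for each pass, in [0, 1]
    """
    if len(history_weights) != len(offense_influence):
        raise ValueError("inputs must have the same length")
    raw = [h * (ppi ** alpha)
           for h, ppi in zip(history_weights, offense_influence)]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("no pass has positive selection weight")
    return [r / total for r in raw]
```

With α > 1, passes the offense is more likely to influence receive a disproportionately larger share of the selection probability; with α = 1 the share is simply proportional to H(p) · PPI_off(p).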

Value

We model expected YAC (xYAC) and EPA to get expected EPA (xEPA).

xYAC Model

We used XGBoost trained on 21 features derived from the tracking data. The three most important features were closest defender speed, ball y coordinate, and closest defender y coordinate. The model achieved a validation accuracy of 81%, with predictions an average of 2.4 yards from the true value.

xEPA Model

The model is similar to the nflfastR EP model by Baldwin et al. and was trained on the entire publicly available play-by-play dataset. Its output has a 0.8 correlation with nflfastR EPA.

Validation

Football is a dynamic sport with many interacting factors and high variance outcomes. We find that PaVE is more predictive of passing yards gained than either EPA or passing yards itself.

bin \ bin+1    PaVE    EPA     Yards
PaVE           0.19    0.30    0.44
EPA            ~       0.36    0.33
Yards          ~       ~       0.43

Pearson correlation coefficient between successive 4-game bins within the 2018 NFL season. Rows give the metric measured in one bin; columns give the metric measured in the following bin.
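The bin-to-bin comparison can be sketched as follows: pair each value of one metric in bin k with the value of another metric in bin k+1, then compute the Pearson correlation over all pairs. Grouping by team is our assumption (the text does not say what unit is binned), and the helper names and the numbers in the usage lines are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def successive_pairs(metric_now, metric_next):
    """Pair metric_now in bin k with metric_next in bin k+1, per team."""
    xs, ys = [], []
    for team, now_bins in metric_now.items():
        next_bins = metric_next[team]
        for k in range(len(now_bins) - 1):
            xs.append(now_bins[k])
            ys.append(next_bins[k + 1])
    return xs, ys


# Hypothetical per-team values for four 4-game bins (not real data).
pave_bins = {"TEAM_A": [0.10, 0.14, 0.12, 0.16], "TEAM_B": [0.05, 0.07, 0.06, 0.09]}
yard_bins = {"TEAM_A": [250, 270, 265, 280], "TEAM_B": [200, 215, 205, 230]}
xs, ys = successive_pairs(pave_bins, yard_bins)
r = pearson(xs, ys)
```

Passing the same metric for both arguments gives the diagonal entries of the table (a metric's correlation with itself one bin later).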

Hyperparameter Tuning

We tuned four model parameters: max acceleration (\(a_{max}\) = 7.67 yd/s²), max velocity (\(v_{max}\) = 9.42 yd/s), the influence parameter (σ = 0.31 s), and the selection parameter (α = 1.2). Additional parameters: \(t_{react}\) = 0.2 s, \(z_{min}\) = 1 yd, \(z_{max}\) = 3 yd.
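To show how the kinematic parameters could feed the time of arrival ToA(j, ℓ) used by the influence model, here is a minimal one-dimensional sketch. It assumes the player drifts at their current velocity during the reaction time, then accelerates at the max acceleration along the straight line to the target until reaching max velocity, and may arrive at any speed. These modeling choices and the function name are our assumptions; the original implementation may differ.

```python
import math

A_MAX = 7.67   # max acceleration, yd/s^2
V_MAX = 9.42   # max velocity, yd/s
T_REACT = 0.2  # reaction time, s


def time_of_arrival(position, velocity, target,
                    a_max=A_MAX, v_max=V_MAX, t_react=T_REACT):
    """Estimate seconds until a player can reach `target` (yards, yd/s)."""
    # Assumption: the player keeps their current velocity while reacting.
    x = position[0] + velocity[0] * t_react
    y = position[1] + velocity[1] * t_react
    dx, dy = target[0] - x, target[1] - y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return t_react

    # Project the velocity onto the unit vector pointing at the target.
    v0 = min((velocity[0] * dx + velocity[1] * dy) / dist, v_max)
    elapsed = t_react

    if v0 < 0.0:
        # Moving away: brake to a stop first, which costs time and distance.
        elapsed += -v0 / a_max
        dist += v0 * v0 / (2.0 * a_max)
        v0 = 0.0

    # Accelerate at a_max until v_max, then cruise at v_max.
    t_acc = (v_max - v0) / a_max
    d_acc = v0 * t_acc + 0.5 * a_max * t_acc * t_acc
    if dist <= d_acc:
        # Arrives while still accelerating: solve 0.5*a*t^2 + v0*t = dist.
        return elapsed + (-v0 + math.sqrt(v0 * v0 + 2.0 * a_max * dist)) / a_max
    return elapsed + t_acc + (dist - d_acc) / v_max
```

The returned time can be compared with the ball's arrival time T in the influence probability; a player already moving toward the target gets a head start, while one moving away pays a braking penalty.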

Applications

Defensive Positional Breakdown
PaVE broken down by position reveals that cornerbacks are by far the most valuable position with respect to pass defense, confirming an expected trend wherein players closer to receivers and in more valuable areas of the field have better defensive PaVE.
Coverage Comparison on Route Combinations
We identified plays with the GO-OUT-CROSS-IN route combination and compared examples of good and poor coverage as measured by PaVE. This application can be used to identify the most successful coverages against various route combinations.
Defensive Player Optimization
We investigate PaVE as an objective function for joint optimization of defensive player trajectories. This can serve as a guide to uncover new coverage strategies against various route combinations.

Conclusion

PaVE represents a novel attempt at comprehensively modeling the highly variable action that makes up an NFL passing play. This first attempt is not only a predictive measure that passes the eye test, but also a foundation for many further applications. For example, NFL teams could use a more mature version of PaVE weekly by swapping out or refining specific parts: shifting selection probabilities to reflect a specific QB’s tendencies while simulating defensive schemes to minimize a receiver’s value based on route tendencies. We truly hope that this metric PaVEs the way to new findings for passing in the NFL.

Citation

Ranasaria, U., Dutta, R., Sanjeev, S., & Murali, A. (2021). PaVE the Way for NFL Passing Analytics: Passing Value in Expectation. Kaggle. NFL Big Data Bowl 2021.

Kaggle Notebook