Predicting NFL Prop Bets with Bootstrap Sampling

Published on January 18, 2024

In this blog post, I explore how bootstrap sampling can be used to predict NFL player prop bets and identify value betting opportunities. By generating numerous simulated outcomes based on historical player performance data, we can create probability distributions for various statistical categories and compare them against sportsbook lines.

Understanding Prop Bets and Bootstrap Sampling

Prop bets (or proposition bets) are wagers on individual player performances rather than game outcomes. Examples include:

  • Will Patrick Mahomes throw over/under 275.5 passing yards?
  • Will Derrick Henry rush for over/under 95.5 yards?
  • Will Davante Adams have over/under 6.5 receptions?

Bootstrap sampling is a statistical technique that involves random sampling with replacement from a dataset to create multiple simulated datasets. For our purposes, it allows us to generate thousands of possible game outcomes for a player based on their historical performance.

Data Collection and Preprocessing

The first step in our process is gathering historical NFL player data. For this analysis, I used:

  • Play-by-play data from the 2022 and 2023 NFL seasons
  • Player game logs for rushing, passing, and receiving statistics
  • Current Las Vegas sportsbook lines for player props

After collecting the data, I performed several preprocessing steps:


# Load and prepare the data
import pandas as pd
import numpy as np
from nfl_data_py import import_pbp_data, import_weekly_data

# Get play-by-play data
seasons = [2022, 2023]
pbp_data = import_pbp_data(seasons)

# Get weekly player stats
weekly_data = import_weekly_data(seasons)

# Filter for a specific player (example with Josh Allen).
# player_display_name holds the full name; player_name is abbreviated (e.g. "J.Allen").
josh_allen_data = weekly_data[weekly_data['player_display_name'] == 'Josh Allen']

The Bootstrap Sampling Methodology

For each player and prop bet, I implemented the following bootstrap sampling approach:

  1. Extract the player's historical performance data for the relevant statistic (e.g., passing yards, rushing attempts)
  2. Apply contextual filters based on:
    • Home/away status
    • Opponent defensive ranking
    • Recent performance (last 4-6 games)
    • Game total and spread (to capture game script)
  3. Generate 10,000 bootstrap samples from the filtered data
  4. Calculate the percentage of samples that exceed the sportsbook line
  5. Identify value opportunities where our probability differs significantly from the implied probability of the betting line
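
The contextual filters in step 2 could be sketched as follows. This is a minimal illustration, not the exact filtering used for the results in this post: `filter_game_log` is a hypothetical helper, and `is_home` and `opp_pass_def_rank` are hypothetical columns that would have to be joined onto the game log from schedule and team-defense data. The game log itself is made up.

```python
import pandas as pd

def filter_game_log(games, is_home=None, opp_rank_range=None, last_n=None):
    """Return the subset of a player's game log that matches the given context.

    Assumes hypothetical columns `is_home` (bool) and `opp_pass_def_rank` (int).
    """
    out = games.sort_values(["season", "week"])
    if is_home is not None:
        out = out[out["is_home"] == is_home]
    if opp_rank_range is not None:
        lo, hi = opp_rank_range
        # between() is inclusive on both ends
        out = out[out["opp_pass_def_rank"].between(lo, hi)]
    if last_n is not None:
        # Keep only the most recent games that survived the other filters
        out = out.tail(last_n)
    return out

# Made-up game log, for illustration only
games = pd.DataFrame({
    "season": [2023] * 8,
    "week": list(range(1, 9)),
    "is_home": [True, False, True, False, True, False, True, False],
    "opp_pass_def_rank": [5, 12, 18, 25, 14, 9, 20, 11],
    "passing_yards": [236, 274, 218, 320, 359, 169, 265, 324],
})

similar = filter_game_log(games, opp_rank_range=(10, 20), last_n=4)
print(similar[["week", "opp_pass_def_rank", "passing_yards"]])
```

Stacking filters shrinks the sample quickly, which matters because the bootstrap function in this post refuses to run on fewer than five games.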

Here's the core implementation of the bootstrap sampling function:


def bootstrap_prop_prediction(player_data, stat_column, line, n_samples=10000):
    """
    Perform bootstrap sampling to predict probability of exceeding a prop line
    
    Args:
        player_data (DataFrame): Historical performance data for the player
        stat_column (str): The statistical category to analyze
        line (float): The sportsbook's over/under line
        n_samples (int): Number of bootstrap samples to generate
    
    Returns:
        float: Probability of exceeding the line
    """
    # Extract the relevant statistic
    stat_values = player_data[stat_column].dropna().values
    
    if len(stat_values) < 5:
        return None  # Not enough data
    
    # Generate bootstrap samples
    bootstrap_samples = np.random.choice(stat_values, size=(n_samples,), replace=True)
    
    # Calculate probability of exceeding the line
    prob_over = np.mean(bootstrap_samples > line)
    
    return prob_over
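
Because each bootstrap draw is a single game, the estimate above converges to the share of historical games that went over the line. It does not show how uncertain that share is when it rests on only a handful of games. One possible variation, sketched below, resamples whole game logs and reports a percentile interval around the over probability. The function name, the 90% level, and the yardage values are assumptions for illustration and are not part of the function above.

```python
import numpy as np

def bootstrap_over_probability_ci(stat_values, line, n_resamples=10000, ci=0.90, seed=0):
    """Resample whole game logs to get an interval around P(stat > line)."""
    values = np.asarray(stat_values, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(values)
    # Each row is one resampled game log of n games drawn with replacement
    idx = rng.integers(0, n, size=(n_resamples, n))
    over_rates = (values[idx] > line).mean(axis=1)
    alpha = (1.0 - ci) / 2.0
    lower, upper = np.quantile(over_rates, [alpha, 1.0 - alpha])
    # Point estimate: observed share of games over the line
    point = float((values > line).mean())
    return point, float(lower), float(upper)

# Made-up passing-yard totals, for illustration only
yards = [236, 274, 218, 320, 359, 169, 265, 324,
         297, 212, 254, 281, 303, 190, 246, 288]
point, lower, upper = bootstrap_over_probability_ci(yards, line=275.5)
print(f"P(over) = {point:.3f}, 90% interval = ({lower:.3f}, {upper:.3f})")
```

A wide interval is a signal that an apparent edge may be noise from a small sample rather than a real mispricing.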
                

Case Study: Josh Allen Passing Yards

Let's examine a specific example using Josh Allen's passing yards prop for an upcoming game:

  • Sportsbook line: 275.5 passing yards
  • Historical data: Last 16 games of Josh Allen's passing performance
  • Contextual filter: Similar defensive opponents (ranked 10-20 in passing yards allowed)

After running our bootstrap sampling process:

[Figure: Josh Allen Passing Yards Distribution]

Results:

  • Bootstrap mean: 264.8 yards
  • Probability of exceeding 275.5 yards: 42.3%
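  • Sportsbook implied probability: 52.4% at standard -110 odds (50% once the vig is removed, assuming both sides are priced at -110)
  • Edge identified: 7.7 percentage points in favor of betting the under against the 50% no-vig line (57.7% vs. 50%), or 5.3 points against the 52.4% break-even rate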
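
The size of the edge depends on the baseline it is measured against. At -110 the break-even probability is 110/210, or about 52.4%; 50% is the fair probability after removing the vig when both sides are priced at -110. The sketch below makes that arithmetic explicit. The helper names are my own, and pricing both sides at -110 is an assumption.

```python
def implied_probability(american_odds):
    """Break-even win probability for American odds (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)

def no_vig_probabilities(over_odds, under_odds):
    """Normalize both sides so the two probabilities sum to 1."""
    p_over = implied_probability(over_odds)
    p_under = implied_probability(under_odds)
    total = p_over + p_under
    return p_over / total, p_under / total

model_p_under = 1.0 - 0.423          # bootstrap P(over) from the results above
breakeven = implied_probability(-110)
fair_over, fair_under = no_vig_probabilities(-110, -110)

print(f"break-even at -110: {breakeven:.3f}")
print(f"no-vig under:       {fair_under:.3f}")
print(f"edge vs no-vig:     {model_p_under - fair_under:.3f}")
print(f"edge vs break-even: {model_p_under - breakeven:.3f}")
```

The edge against the break-even rate is the one that determines whether a bet is profitable, so it is the more conservative number to act on.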

Evaluating Multiple Props

Applying this methodology across multiple player props for a given NFL week, we can identify the most promising betting opportunities:

[Figure: Value Props for Week 7]

In this sample from Week 7 of the 2023 NFL season, our model identified several props with significant value edges, including:

  • Lamar Jackson over 62.5 rushing yards (63.1% bootstrap probability vs. 52.4% implied)
  • Ja'Marr Chase under 85.5 receiving yards (68.2% bootstrap probability vs. 52.4% implied)
  • Travis Kelce over 6.5 receptions (59.3% bootstrap probability vs. 47.6% implied)

Model Performance and Backtesting

To validate our approach, I backtested the model on a sample of 50 player props from Weeks 5-8 of the 2023 NFL season. Here are the results:

  • Overall accuracy: 56.0% (28/50)
  • Props with > 5% edge: 60.7% accuracy (17/28)
  • Props with > 10% edge: 66.7% accuracy (8/12)

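Accuracy rises with the size of the predicted edge, which is consistent with the model identifying value. With only 50, 28, and 12 props in the three groups, though, the samples are too small to rule out chance, so these results are encouraging rather than conclusive.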
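
One way to put the backtest numbers in context is to compare each hit rate with the 52.4% needed to break even at -110, and to convert it into a flat-stake return. The sketch below does both with an exact binomial tail. It assumes every prop was priced at -110 and staked at one unit, which the backtest above does not specify, and the helper names are my own.

```python
from math import comb

def binomial_tail(wins, n, p):
    """P(X >= wins) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(wins, n + 1))

def flat_stake_roi(wins, n, american_odds=-110):
    """Return on investment when risking one unit per bet at the same odds."""
    payout = 100 / -american_odds if american_odds < 0 else american_odds / 100
    profit = wins * payout - (n - wins)
    return profit / n

breakeven = 110 / 210  # win rate needed to break even at -110
groups = [("all props", 28, 50), ("> 5% edge", 17, 28), ("> 10% edge", 8, 12)]
for label, wins, n in groups:
    p_value = binomial_tail(wins, n, breakeven)
    roi = flat_stake_roi(wins, n)
    print(f"{label}: {wins}/{n}, "
          f"P(at least this many wins by chance) = {p_value:.2f}, ROI = {roi:+.1%}")
```

Under these assumptions the tail probabilities are well above conventional significance thresholds, so a larger backtest would be needed before treating the hit rates as evidence of a durable edge.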

Limitations and Improvements

While the bootstrap sampling approach shows promise, it has several limitations:

  • Small sample sizes for some players, especially rookies or those with limited playing time
  • Inability to capture external factors like weather conditions, coaching changes, or injuries to teammates
  • Limited incorporation of defensive matchup quality beyond basic rankings

Future improvements could include:

  • Weighted bootstrap sampling to prioritize more recent performances
  • Integration of advanced metrics like EPA (Expected Points Added), DVOA, or NextGen Stats
  • Ensemble approaches combining bootstrap results with other predictive models
  • Bayesian methods to incorporate prior knowledge about player performance
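
As an illustration of the first idea, a weighted bootstrap can be sketched by passing sampling probabilities to NumPy's `Generator.choice`. The exponential decay and the `half_life` parameter are assumptions chosen for the example, not a tested configuration, and the yardage values are made up.

```python
import numpy as np

def weighted_bootstrap_over_probability(stat_values, line, half_life=4.0,
                                        n_samples=10000, seed=0):
    """Estimate P(stat > line), weighting recent games more heavily.

    stat_values must be ordered oldest to newest. half_life is the number of
    games after which a game's sampling weight halves (hypothetical tuning knob).
    """
    values = np.asarray(stat_values, dtype=float)
    games_ago = np.arange(len(values))[::-1]   # 0 for the most recent game
    weights = 0.5 ** (games_ago / half_life)
    weights /= weights.sum()                   # sampling probabilities must sum to 1
    rng = np.random.default_rng(seed)
    samples = rng.choice(values, size=n_samples, replace=True, p=weights)
    return float(np.mean(samples > line)), weights

# Made-up game log in which the player has improved recently
yards = [200, 210, 220, 230, 300, 310, 320, 330]
prob_over, weights = weighted_bootstrap_over_probability(yards, line=275.5)
print(f"weighted P(over) = {prob_over:.3f}")
print(f"unweighted share over the line = {np.mean(np.array(yards) > 275.5):.3f}")
```

A shorter half-life reacts faster to changes in role or form but leans on fewer effective games, so it trades bias for variance.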

Conclusion

Bootstrap sampling provides a powerful framework for evaluating NFL player prop bets. By generating probability distributions from historical data, we can identify market inefficiencies and make more informed betting decisions.

The approach excels at quantifying uncertainty and producing intuitive probability estimates that can be directly compared to sportsbook lines. While no model can perfectly predict individual player performance, this methodology offers a systematic, data-driven approach to finding value in the prop betting market.