
Prop Engine

MLB Player Props Prediction | 2023-2025

A prediction system for MLB player props that outputs full probability distributions rather than point estimates, with Monte Carlo backtesting against the payout structures of PrizePicks and similar platforms.

The Problem

MLB player props (hits, strikeouts, runs, etc.) are binary over/under markets. Traditional point-estimate models output a single predicted value, but the betting decision depends on the probability of landing over or under the line, which requires the full distribution. A projection of 7.5 strikeouts held with 90% confidence is a very different bet from the same projection held with 55% confidence.
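To make the point concrete, here is a minimal sketch with two made-up strikeout distributions that share the same mean but imply very different over probabilities. The distributions, the 6.5 line, and the helper names are illustrative assumptions, not output from the actual models.

```python
from typing import Dict


def mean(pmf: Dict[int, float]) -> float:
    """Expected value of a PMF over integer outcomes."""
    return sum(k * p for k, p in pmf.items())


def prob_over(pmf: Dict[int, float], line: float) -> float:
    """Probability that the outcome lands strictly above the line."""
    return sum(p for k, p in pmf.items() if k > line)


# Hypothetical distributions: both average 7.5 strikeouts.
tight = {7: 0.5, 8: 0.5}
wide = {3: 0.25, 6: 0.25, 9: 0.25, 12: 0.25}

for name, pmf in (("tight", tight), ("wide", wide)):
    print(name, mean(pmf), prob_over(pmf, 6.5))
```

A point estimate would report 7.5 for both, yet the tight distribution always clears a 6.5 line while the wide one clears it only half the time.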

Technical Approach

I built two custom models that output probability distributions:

  • XGDiscrete: XGBoost with a softmax objective for discrete outcome probabilities
  • RidgeKNN: Ridge regression for feature weighting combined with KNN neighbor voting for distribution estimation
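The neighbor-voting half of the RidgeKNN idea can be sketched in a few lines. This is an assumption-heavy illustration: it supposes the feature weights are the absolute coefficients of an already fitted ridge regression and that they scale a Euclidean distance. The function name `knn_distribution`, the weights, and the data are all hypothetical.

```python
import math
from collections import Counter


def knn_distribution(query, rows, outcomes, weights, k=3):
    """Empirical PMF over the outcomes of the k nearest rows.

    Distance is Euclidean with per-feature weights; the weights are
    assumed here to come from a separately fitted ridge regression.
    """
    def dist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    nearest = sorted(range(len(rows)), key=lambda i: dist(query, rows[i]))[:k]
    counts = Counter(outcomes[i] for i in nearest)
    return {value: n / len(nearest) for value, n in sorted(counts.items())}


# Hypothetical historical rows: (rolling xwOBA, some low-importance feature).
rows = [(0.30, 0.10), (0.32, 0.90), (0.31, 0.50), (0.10, 0.12), (0.12, 0.11)]
outcomes = [2, 2, 1, 0, 0]      # e.g. hits recorded in each historical game
weights = (10.0, 0.1)           # stand-in for |ridge coefficients|

pmf = knn_distribution((0.31, 0.10), rows, outcomes, weights, k=3)
print(pmf)
```

Because the first feature carries most of the weight, the three neighbors are the rows with similar xwOBA, and their outcomes form the predicted distribution directly.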

Feature engineering via dbt includes:

  • Rolling statistics (xwOBA, hard hit rate, K rate) with handedness and pitch type splits
  • Park factors scraped from Baseball Savant
  • Defensive metrics, handedness adjustments, wind, and temperature
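The rolling statistics live in dbt SQL models, where they would typically be window functions. The logic is easier to show in Python. The sketch below assumes a lookback counted in games and excludes the current game from its own feature to avoid leakage; both choices are assumptions, as is the function name.

```python
def rolling_mean(values, window):
    """Mean of the previous `window` values at each position.

    The current value is excluded, so the feature for game i only
    uses games that finished before it. Returns None with no history.
    """
    out = []
    for i in range(len(values)):
        prior = values[max(0, i - window):i]
        out.append(sum(prior) / len(prior) if prior else None)
    return out


# Hypothetical strikeout totals from four consecutive starts.
strikeouts = [5, 7, 9, 6]
print(rolling_mean(strikeouts, window=2))  # [None, 5.0, 6.0, 8.0]
```

Changing `window` gives the configurable lookback; splits by handedness or pitch type would apply the same function to filtered subsets.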

The backtesting framework uses Monte Carlo simulation to evaluate parlay strategies across different payout structures (PrizePicks flex, Underdog power). Bankroll sizing uses the Kelly criterion.
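A stripped-down version of the parlay simulation might look like the following. It assumes independent legs and takes the payout table as a dictionary from number of hits to total return per unit staked. The multipliers shown are placeholders for illustration, not the platforms' actual payouts, and `simulate_parlay` is a hypothetical name.

```python
import random


def simulate_parlay(leg_probs, payouts, n_sims=100_000, seed=0):
    """Simulate a parlay with independent legs.

    payouts maps number of hits -> total return per unit staked.
    Returns (mean profit, standard deviation of profit) per unit.
    """
    rng = random.Random(seed)
    profits = []
    for _ in range(n_sims):
        hits = sum(rng.random() < p for p in leg_probs)
        profits.append(payouts.get(hits, 0.0) - 1.0)
    mean = sum(profits) / n_sims
    var = sum((x - mean) ** 2 for x in profits) / n_sims
    return mean, var ** 0.5


legs = [0.55, 0.55, 0.55]            # model win probabilities per leg
power = {3: 5.0}                     # placeholder all-or-nothing table
flex = {3: 2.25, 2: 1.25}            # placeholder table with partial payout

for name, table in (("power", power), ("flex", flex)):
    mean, sd = simulate_parlay(legs, table)
    print(f"{name}: mean profit {mean:+.3f}, std {sd:.3f}")
```

Comparing the two tables on the same legs shows the trade-off the backtest is meant to measure: partial payouts lower the variance, and the standard deviation is large relative to the mean in both cases.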

Interesting Challenges

Player props markets are efficient. The edge from statistical models is small, requiring disciplined bankroll management. The Monte Carlo simulation was essential for understanding variance in multi-leg parlays.
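The standard Kelly formula for a binary bet shows how quickly a small edge translates into a small stake. This sketch covers only the single-outcome case with net odds of b-to-1; a multi-tier parlay payout would need a more general optimization, which is not shown. The function name is illustrative.

```python
def kelly_fraction(p_win, net_odds):
    """Kelly stake as a fraction of bankroll for a binary bet.

    f* = (b * p - q) / b with q = 1 - p and b = net_odds.
    Clipped at zero: with no edge, the recommended stake is nothing.
    """
    return max(0.0, (p_win * (net_odds + 1.0) - 1.0) / net_odds)


for p in (0.50, 0.52, 0.55):
    print(p, round(kelly_fraction(p, net_odds=1.0), 4))
```

At even odds a 52% win probability justifies staking about 4% of bankroll and 55% about 10%, which is why overestimating the edge by a few points is so costly.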

Player injuries and lineup changes introduce noise that rolling statistics can't fully capture.

What I'd Do Differently

The system treats each prop independently. A full game simulator would capture the correlation structure between props and players: when one batter gets on base, the next batter's RBI opportunities change. That correlation is where parlay edge lives.
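A toy simulation illustrates why correlation matters for parlays. It is not a game simulator: it assumes two legs with the same marginal probability that share a common random draw some fraction of the time, which is a deliberately crude way to induce positive correlation. The `share` parameter and function name are hypothetical.

```python
import random


def both_hit_rate(p, share, n_sims=100_000, seed=0):
    """Estimate P(both legs hit) for two legs with marginal probability p.

    With probability `share` the second leg reuses the first leg's draw
    (perfectly correlated); otherwise it gets an independent draw.
    """
    rng = random.Random(seed)
    both = 0
    for _ in range(n_sims):
        u1 = rng.random()
        u2 = u1 if rng.random() < share else rng.random()
        both += (u1 < p) and (u2 < p)
    return both / n_sims


independent = both_hit_rate(0.55, share=0.0)
correlated = both_hit_rate(0.55, share=0.3)
print(f"independent: {independent:.3f}, correlated: {correlated:.3f}")
```

Each leg still hits 55% of the time in both runs, but the two-leg parlay hits noticeably more often when the legs move together, so a model that multiplies independent probabilities misprices it.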

Key Features

  • Custom ML models with probability distributions
  • 35+ dbt SQL models for feature engineering
  • Rolling statistics with configurable lookback windows
  • Batter vs pitcher matchup modeling
  • Monte Carlo simulation for parlay backtesting
  • Streamlit dashboard for daily props

Tech

Python, scikit-learn, XGBoost, PostgreSQL, SQLAlchemy, FastAPI, Streamlit, dbt