
PropEngine | MLB Player Props Prediction

2023-2025 | Baseball Modeling

An MLB player props prediction system that outputs probability distributions, with Monte Carlo backtesting against PrizePicks and similar platforms.

The Problem

MLB player props (hits, strikeouts, runs, etc.) are binary over/under markets. Traditional point-estimation models output a single predicted value, but the betting decision depends on how much probability falls on each side of the line, which requires the full probability distribution. A predicted 7.5 strikeouts with 90% confidence is a very different bet from a predicted 7.5 with 55% confidence.
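As a minimal sketch of that point, the two strikeout distributions below share a mean of 7.5 but give very different probabilities of clearing a line. The numbers, the 6.5 line, and the `prob_over` and `mean` helpers are illustrative assumptions, not output or code from PropEngine.

```python
def mean(pmf):
    """Expected value of a discrete pmf given as {outcome: probability}."""
    return sum(k * p for k, p in pmf.items())


def prob_over(pmf, line):
    """P(outcome > line) for a discrete pmf given as {outcome: probability}."""
    if abs(sum(pmf.values()) - 1.0) > 1e-9:
        raise ValueError("pmf must sum to 1")
    return sum(p for k, p in pmf.items() if k > line)


# Hypothetical strikeout distributions (made-up numbers for illustration).
tight = {6: 0.10, 7: 0.40, 8: 0.40, 9: 0.10}
wide = {3: 0.15, 5: 0.30, 9: 0.25, 11: 0.30}

line = 6.5  # hypothetical over/under line
for name, pmf in (("tight", tight), ("wide", wide)):
    print(f"{name}: mean={mean(pmf):.2f}, P(over {line})={prob_over(pmf, line):.2f}")
```

A point-estimate model would report 7.5 for both pitchers. Only the distribution shows that one "over" is a 90% proposition and the other a 55% one.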

Technical Approach

I built two custom models that output probability distributions:

- XGDiscrete: XGBoost with a softmax objective, producing a probability for each discrete outcome

- RidgeKNN: Ridge regression for feature weighting combined with KNN neighbor voting for distribution estimation
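The RidgeKNN idea can be sketched in a few lines. This is a simplified illustration, not the project's implementation: it assumes the feature weights have already been derived elsewhere (for example, from the absolute values of ridge coefficients), and the `knn_distribution` function, the two features, and the sample history are all hypothetical.

```python
import math
from collections import Counter


def knn_distribution(query, history, weights, k=3):
    """Estimate a discrete outcome distribution from the k nearest past games.

    query   -- feature vector for the upcoming game
    history -- list of (feature_vector, observed_outcome) pairs
    weights -- non-negative per-feature weights (assumed to come from a
               separately fitted ridge regression)
    """
    def distance(a, b):
        # Weighted Euclidean distance: heavily weighted features dominate.
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    neighbors = sorted(history, key=lambda row: distance(query, row[0]))[:k]
    votes = Counter(outcome for _, outcome in neighbors)
    # Each neighbor casts one vote; vote shares form the estimated distribution.
    return {outcome: n / len(neighbors) for outcome, n in sorted(votes.items())}


# Hypothetical history: (K rate, walk rate) -> strikeouts recorded in that game.
history = [
    ((0.30, 0.10), 8),
    ((0.28, 0.12), 7),
    ((0.31, 0.09), 8),
    ((0.18, 0.20), 4),
    ((0.20, 0.22), 5),
]
dist = knn_distribution(query=(0.29, 0.11), history=history, weights=(1.0, 0.5), k=3)
print(dist)
```

The appeal of this design is that the output is an empirical distribution rather than a parametric one, so it can be skewed or multi-modal if the neighboring games were.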

Feature engineering via dbt includes:

- Rolling statistics (xwOBA, hard hit rate, K rate) with configurable lookback windows

- Batter vs pitcher matchup history with split statistics

- Park factors scraped from Baseball Savant

- Defensive metrics and handedness adjustments
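The real features live in dbt SQL models that are not shown here. As a language-neutral illustration of a rolling statistic with a configurable lookback, here is a small Python sketch. The `rolling_rate` helper and the sample data are hypothetical, and excluding the current game from its own window is my assumption about how to avoid leaking the outcome into the feature.

```python
def rolling_rate(events, opportunities, window):
    """Trailing rate over the previous `window` games, excluding the current one.

    Returns one value per game; None where there are no prior opportunities.
    """
    rates = []
    for i in range(len(events)):
        start = max(0, i - window)
        ev = sum(events[start:i])          # e.g. strikeouts in the lookback
        opp = sum(opportunities[start:i])  # e.g. plate appearances in the lookback
        rates.append(ev / opp if opp else None)
    return rates


# Hypothetical per-game batter data.
strikeouts = [1, 0, 2, 1]
plate_appearances = [4, 4, 5, 4]

k_rate_2 = rolling_rate(strikeouts, plate_appearances, window=2)
print(k_rate_2)
```

Pooling events and opportunities across the window, rather than averaging per-game rates, keeps games with few plate appearances from being over-weighted.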

The backtesting framework uses Monte Carlo simulation to evaluate parlay strategies across different payout structures (PrizePicks flex, Underdog power). It also integrates the Kelly criterion for bankroll sizing.
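A stripped-down version of that kind of simulation might look like the sketch below. The payout tables are hypothetical placeholders, not actual PrizePicks or Underdog payouts, and the legs are assumed independent with a known win probability. The `simulate_parlay` function is illustrative only.

```python
import random
import statistics


def simulate_parlay(leg_probs, payouts, n_trials=20000, seed=0):
    """Simulate profit per entry (in stake units) for a multi-leg parlay.

    leg_probs -- model win probability for each leg (legs assumed independent)
    payouts   -- {number_of_correct_legs: payout multiple}; missing keys pay 0
    Returns (mean profit, standard deviation of profit).
    """
    rng = random.Random(seed)
    profits = []
    for _ in range(n_trials):
        hits = sum(rng.random() < p for p in leg_probs)
        profits.append(payouts.get(hits, 0.0) - 1.0)  # subtract the 1-unit stake
    return statistics.mean(profits), statistics.stdev(profits)


# Hypothetical payout tables for a 3-leg entry (NOT real platform payouts).
all_or_nothing = {3: 5.0}
flex_style = {3: 2.25, 2: 1.25}

legs = [0.60, 0.60, 0.60]  # assumed per-leg win probabilities
power_mean, power_sd = simulate_parlay(legs, all_or_nothing)
flex_mean, flex_sd = simulate_parlay(legs, flex_style)
print(f"all-or-nothing: mean={power_mean:+.3f}, sd={power_sd:.3f}")
print(f"flex-style:     mean={flex_mean:+.3f}, sd={flex_sd:.3f}")
```

With these made-up numbers the analytical expected profits are +0.08 and +0.026 units per entry. The simulation adds the spread around those means, which is much wider for the all-or-nothing structure.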

Interesting Challenges

Player props markets are efficient. The edge from statistical models is small, requiring disciplined bankroll management. The Monte Carlo simulation was essential for understanding variance in multi-leg parlays.
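To make the bankroll point concrete, here is the standard Kelly formula for a single all-or-nothing bet, with an optional fractional-Kelly scale. This is a sketch under assumptions: the `kelly_fraction` helper, the 5x payout, and the probabilities are hypothetical, and entries with several payout tiers would need a numerical optimization instead of this closed form.

```python
def kelly_fraction(p_win, payout_multiple, scale=1.0):
    """Kelly stake as a fraction of bankroll for a win-or-lose-everything bet.

    p_win           -- probability the bet wins
    payout_multiple -- total return per unit staked on a win (stake included)
    scale           -- 1.0 for full Kelly, 0.25 for quarter Kelly, etc.
    """
    b = payout_multiple - 1.0  # net odds received on a win
    if b <= 0:
        raise ValueError("payout_multiple must exceed 1")
    f = (p_win * b - (1.0 - p_win)) / b
    return max(0.0, f) * scale  # never stake on a negative-edge bet


# Hypothetical 3-leg entry paying 5x, with three independent 60% legs.
p_all_three = 0.60 ** 3
full = kelly_fraction(p_all_three, 5.0)
quarter = kelly_fraction(p_all_three, 5.0, scale=0.25)
print(f"full Kelly: {full:.3%}, quarter Kelly: {quarter:.3%}")
```

Even with an 8% expected return per entry, full Kelly here is only about 2% of bankroll, which illustrates how a small edge translates into small stakes.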

Player injuries and lineup changes introduce noise that rolling statistics can't fully capture. Weather and umpire effects also matter but were deprioritized.

What I'd Do Differently

The system treats each prop independently. A multi-output model capturing correlations between props (e.g., a pitcher's strikeouts and hits allowed) could find parlay edges that independent models miss.
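A quick calculation shows why that matters. For two binary legs, the probability that both hit is the product of the marginals plus a covariance term. The `joint_prob` helper, the 55% marginals, and the 0.3 correlation are hypothetical values chosen for illustration, not estimates from the project's data.

```python
import math


def joint_prob(p_a, p_b, rho):
    """P(both legs hit) for two binary legs with correlation rho.

    Uses P(A and B) = p_a * p_b + rho * sqrt(p_a (1 - p_a) p_b (1 - p_b)).
    """
    cov = rho * math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    joint = p_a * p_b + cov
    # The result must respect the Frechet bounds for the given marginals.
    lower, upper = max(0.0, p_a + p_b - 1.0), min(p_a, p_b)
    if not (lower - 1e-12 <= joint <= upper + 1e-12):
        raise ValueError("rho is not feasible for these marginals")
    return joint


# Hypothetical pair: a pitcher's strikeouts over and hits allowed under.
independent = joint_prob(0.55, 0.55, rho=0.0)
correlated = joint_prob(0.55, 0.55, rho=0.3)
print(f"independent: {independent:.4f}, correlated: {correlated:.4f}")
```

Under these assumptions the two-leg hit probability moves from about 30% to about 38%, a gap that independent per-prop models cannot see.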

Key Features

  • Custom ML models with probability distributions
  • 35+ dbt SQL models for feature engineering
  • Rolling statistics with configurable lookback windows
  • Batter vs pitcher matchup modeling
  • Monte Carlo simulation for parlay backtesting
  • Streamlit dashboard for daily props

Tech Stack

Python, scikit-learn, XGBoost, PostgreSQL, SQLAlchemy, FastAPI, Streamlit, dbt