PropEngine | MLB Player Props Prediction
An MLB player props prediction system that outputs probability distributions, with Monte Carlo backtesting for PrizePicks and similar platforms.
The Problem
MLB player props (hits, strikeouts, runs, etc.) are binary over/under markets. Traditional point-estimation models output a single predicted value, but the betting decision depends on the full probability distribution: what matters is the probability that the outcome lands over or under the posted line. A projection of 7.5 strikeouts that clears the line 90% of the time is a very different bet from a projection of 7.5 that clears it 55% of the time.
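To make the point concrete, here is a minimal sketch (not the project's code) of how a discrete distribution turns into an over probability. The two strikeout distributions and the 6.5 line are hypothetical numbers chosen so that both have the same mean.

```python
def mean(pmf):
    """Expected value of a pmf given as {outcome: probability}."""
    return sum(k * p for k, p in pmf.items())


def prob_over(pmf, line):
    """Probability that the outcome lands strictly above the posted line."""
    return sum(p for k, p in pmf.items() if k > line)


# Hypothetical strikeout distributions: same mean, very different spread.
tight = {6: 0.10, 7: 0.40, 8: 0.40, 9: 0.10}
wide = {4: 0.25, 6: 0.20, 8: 0.25, 11: 0.30}

line = 6.5  # hypothetical posted line
for name, pmf in (("tight", tight), ("wide", wide)):
    print(name, round(mean(pmf), 2), round(prob_over(pmf, line), 2))
```

Both distributions average 7.5 strikeouts, yet the over clears about 90% of the time under the first and about 55% under the second. A point estimate alone cannot tell these two bets apart.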
Technical Approach
I built two custom models that output probability distributions:
- XGDiscrete: XGBoost with a softmax objective that outputs a probability for each discrete outcome
- RidgeKNN: Ridge regression for feature weighting combined with KNN neighbor voting for distribution estimation
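The RidgeKNN idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual implementation: the `weights` argument stands in for feature weights derived from a fitted Ridge model (for example, absolute coefficients), and the function name, feature vectors, and outcomes are all hypothetical.

```python
import math
from collections import Counter


def knn_distribution(query, history, weights, k=3):
    """Estimate a pmf over discrete outcomes by voting among the k nearest rows.

    history: list of (feature_vector, outcome) pairs from past games.
    weights: non-negative per-feature weights (assumed to come from Ridge).
    """
    def distance(a, b):
        # Weighted Euclidean distance: heavier features dominate the metric.
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    nearest = sorted(history, key=lambda row: distance(query, row[0]))[:k]
    votes = Counter(outcome for _, outcome in nearest)
    return {outcome: n / len(nearest) for outcome, n in sorted(votes.items())}


# Hypothetical rows: (K rate, noise feature) -> strikeouts recorded.
history = [
    ((0.30, 0.9), 8),
    ((0.28, 0.1), 7),
    ((0.31, 0.5), 8),
    ((0.15, 0.9), 4),
    ((0.14, 0.2), 5),
]
weights = (1.0, 0.0)  # the second feature is weighted out entirely
pmf = knn_distribution((0.29, 0.0), history, weights, k=3)
print(pmf)
```

With the second feature weighted to zero, the three nearest rows are the ones with similar K rates, so the votes fall on 7 and 8 strikeouts. The output is a distribution rather than a single number, which is what the over/under decision needs.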
Feature engineering via dbt includes:
- Rolling statistics (xwOBA, hard hit rate, K rate) with configurable lookback windows
- Batter vs pitcher matchup history with split statistics
- Park factors scraped from Baseball Savant
- Defensive metrics and handedness adjustments
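The project computes these features in dbt SQL; the Python sketch below only illustrates the logic of a rolling rate with a configurable lookback window. The function name and input layout are hypothetical, and excluding the current game from its own window is an assumption made here to avoid leaking the outcome into the feature.

```python
def rolling_k_rate(games, window):
    """Strikeout rate over the previous `window` games, current game excluded.

    games: chronological list of (strikeouts, plate_appearances) tuples.
    Returns one value per game; None where no prior games exist.
    """
    rates = []
    for i in range(len(games)):
        prior = games[max(0, i - window):i]  # only games before game i
        plate_appearances = sum(pa for _, pa in prior)
        strikeouts = sum(k for k, _ in prior)
        rates.append(strikeouts / plate_appearances if plate_appearances else None)
    return rates


# Hypothetical batter game log: (strikeouts, plate appearances).
game_log = [(1, 4), (0, 4), (2, 4), (1, 4)]
print(rolling_k_rate(game_log, window=2))
```

Changing `window` changes how quickly the feature reacts to recent form, which is the trade-off a configurable lookback exposes.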
The backtesting framework uses Monte Carlo simulation to evaluate parlay strategies across different payout structures (PrizePicks flex, Underdog power). Bankroll sizing uses the Kelly criterion.
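A stripped-down version of the simulation idea might look like the sketch below. It assumes independent legs and uses illustrative payout multipliers, which are not any platform's actual payout table; the function name and parameters are hypothetical.

```python
import random


def simulate_parlay(leg_probs, payouts, n_sims=100_000, seed=0):
    """Monte Carlo estimate of mean and std of net return per unit staked.

    leg_probs: model win probability for each leg (legs assumed independent).
    payouts: maps number of correct legs -> payout multiplier; others pay 0.
    """
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(n_sims):
        hits = sum(rng.random() < p for p in leg_probs)
        net = payouts.get(hits, 0.0) - 1.0  # subtract the 1-unit stake
        total += net
        total_sq += net * net
    mean_return = total / n_sims
    variance = max(0.0, total_sq / n_sims - mean_return ** 2)
    return mean_return, variance ** 0.5


# Illustrative 3-leg flex-style table: partial payout for 2 of 3 correct.
flex_payouts = {3: 2.25, 2: 1.25}
ev, sd = simulate_parlay([0.60, 0.60, 0.60], flex_payouts)
print(round(ev, 3), round(sd, 3))
```

Under these assumed numbers the analytic expected return is about 0.026 units per entry while the standard deviation is about 0.85, so the simulated mean should land close to that small positive value with a spread many times larger. Swapping in a different `payouts` table is how the same loop compares payout structures.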
Interesting Challenges
Player props markets are efficient. The edge from statistical models is small, requiring disciplined bankroll management. The Monte Carlo simulation was essential for understanding variance in multi-leg parlays.
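For reference, the textbook Kelly fraction for a simple binary bet shows how quickly the recommended stake shrinks with the edge. This sketch covers only the single-bet case with a fixed payout; multi-outcome flex payouts need a more general treatment. The `multiplier` parameter for fractional Kelly is an assumption added for illustration.

```python
def kelly_fraction(p, net_odds, multiplier=1.0):
    """Kelly stake as a fraction of bankroll for a binary bet.

    p: probability of winning.
    net_odds: profit per unit staked on a win (b in the usual formula).
    Uses f* = (b * p - (1 - p)) / b, floored at zero when there is no edge.
    """
    full_kelly = (net_odds * p - (1.0 - p)) / net_odds
    return max(0.0, full_kelly) * multiplier


print(round(kelly_fraction(0.55, 1.0), 4))        # full Kelly at even odds
print(round(kelly_fraction(0.55, 1.0, 0.25), 4))  # quarter Kelly
print(kelly_fraction(0.50, 1.0))                  # no edge, no bet
```

A 55% win probability at even odds gives a full-Kelly stake of 10% of bankroll, and a quarter-Kelly stake of 2.5%. Because the formula is sensitive to errors in `p`, fractional Kelly is a common way to stay conservative when the estimated edge is small.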
Player injuries and lineup changes introduce noise that rolling statistics can't fully capture. Weather and umpire effects also matter but were deprioritized.
What I'd Do Differently
The system treats each prop independently. A multi-output model capturing correlations between props (e.g., a pitcher's strikeouts and hits allowed) could find parlay edges that independent models miss.
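A small calculation shows why the independence assumption matters. For two binary legs with win probabilities p_a and p_b and correlation rho, the joint win probability is p_a * p_b + rho * sqrt(p_a(1 - p_a) p_b(1 - p_b)). The probabilities and the correlation value below are hypothetical, not estimates from the project.

```python
import math


def joint_win_prob(p_a, p_b, rho):
    """P(both legs win) for two Bernoulli legs with correlation rho.

    rho must be feasible for the given marginals; extreme values can push
    the result outside [0, 1], so this is meant for moderate correlations.
    """
    return p_a * p_b + rho * math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


independent = joint_win_prob(0.55, 0.55, 0.0)
correlated = joint_win_prob(0.55, 0.55, 0.2)
print(round(independent, 4), round(correlated, 4))
```

With two 55% legs, the independent estimate is 0.3025 while a correlation of 0.2 raises the joint probability to about 0.352. A model that ignores the correlation would understate the value of that two-leg parlay, and overstate it when the legs are negatively correlated.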
Key Features
- Custom ML models with probability distributions
- 35+ dbt SQL models for feature engineering
- Rolling statistics with configurable lookback windows
- Batter vs pitcher matchup modeling
- Monte Carlo simulation for parlay backtesting
- Streamlit dashboard for daily props