ML Platform · Python · Formula 1 · 2025
Stacked Ensemble · Causal ELO · Monte Carlo Simulation · Walk-Forward Backtest 2014–2026

F1
Predict.

A race-prediction platform built the way prediction should work: temporally honest. Five models blended into one ensemble, causal ELO ratings that never glimpse the future, and tens of thousands of Monte Carlo simulations a second — all walk-forward backtested across four eras of Formula 1 regulation.
75
Engineered features
5
Model ensemble
50k
Monte Carlo sims / sec
2014–26
Walk-forward backtest range

The
Signal.

Predicting a Formula 1 race looks like a parlour trick — pundits do it on television every weekend, with total confidence and zero accountability for being wrong. A real prediction has to survive contact with reality: weather that flips a session inside out, a grid-slot lottery that compounds across twenty cars, a DNF probability that rewrites the whole field, and regulation eras — 2014's turbo-hybrid reset, 2022's ground-effect reset, 2026's incoming reset — that make "last year's pattern" actively misleading.

Underneath the noise, there is a signal — driver form, constructor trajectory, track-specific history, qualifying pace relative to the field. The honest question isn't "can a model guess the podium." Anyone can guess. It's: can a model that is forbidden from seeing the future still beat broadcast consensus, scored the same way every week, across more than a decade of regulation change?

"A prediction that can't be checked against the next ten years of races isn't a prediction. It's a guess wearing a chart."

That single constraint — temporal safety — turned out to be the entire project. Every feature, every rating, every simulated race in F1Predict has to be computable using only information that existed before the race it predicts. It sounds obvious. It is the easiest rule in machine learning to break by accident, and the most common reason a backtest looks brilliant while the live model looks foolish.

Chapter Two

Built to
Actually Work.

Three engineering principles, defended at every layer of the stack — because a prediction platform that leaks the future, fakes its ratings, or runs too slowly to explore alternatives isn't a platform. It's a demo.

Layer 01
Temporal Safety
Zero leakage, by construction
Every one of the ~75 engineered features is computed only from information that existed strictly before the race being predicted: grid history, prior-race form, constructor trajectory, and weather as it stood at the time. Walk-forward validation retrains on the past and tests on the future, in chronological order, every single time. No shuffled splits, ever.
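Here is a minimal sketch of the "strictly before" rule applied to one feature, a driver's recent form. It assumes results arrive as (round, driver, finishing position) tuples. The name `prior_form` and the three-race window are illustrative and are not F1Predict's actual code. The key move is to read the feature before the current race's result is appended to the history.

```python
from collections import defaultdict, deque


def prior_form(results, window=3):
    """Mean finishing position over each driver's previous `window` races.

    `results` is a list of (round_no, driver, finish_pos) tuples. The value
    stored for a race is computed from that driver's earlier rounds only.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    features = {}
    for round_no, driver, finish in sorted(results):
        past = history[driver]
        # Read the feature BEFORE this race's result enters the history.
        features[(round_no, driver)] = sum(past) / len(past) if past else None
        past.append(finish)
    return features
```

A driver's first race gets `None`, not a guess. Any imputation has to be just as causal as the feature itself.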
Layer 02
Causal ELO Ratings
Updated race by race, never in hindsight
An ELO table computed after the fact, over the full race history, is fine for ranking the past and dishonest for prediction. F1Predict's ELO engine updates causally: a driver's rating going into Sunday reflects only the races that happened before Sunday. The rating used to predict a result is provably the same rating that existed the moment before that result was known.
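A causal update loop could look like the sketch below. It assumes each race is given as a finishing order, winner first. Each race is treated as a set of pairwise head-to-heads, which is a common multi-player extension of Elo. `causal_elo`, K = 32 and the 1500 starting rating are conventional, hypothetical choices rather than F1Predict's parameters. Constructor ratings are left out.

```python
from collections import defaultdict


def causal_elo(races, k=32.0, base=1500.0):
    """Return the pre-race rating of every driver in every race.

    `races` is a chronological list of finishing orders (driver names,
    winner first). Each race is scored as pairwise head-to-heads.
    """
    ratings = defaultdict(lambda: base)
    pre_race = []
    for order in races:
        # 1. Snapshot first: these are the only ratings a prediction may use.
        pre_race.append({d: ratings[d] for d in order})
        # 2. Only then update from the finished result.
        n = len(order)
        delta = defaultdict(float)
        for i, ahead in enumerate(order):
            for behind in order[i + 1:]:
                expected = 1.0 / (1.0 + 10 ** ((ratings[behind] - ratings[ahead]) / 400))
                # Zero-sum exchange, scaled so one race moves a rating by at most k.
                delta[ahead] += k * (1.0 - expected) / (n - 1)
                delta[behind] -= k * (1.0 - expected) / (n - 1)
        for d, change in delta.items():
            ratings[d] += change
    return pre_race
```

The snapshot is taken before the update. That means `pre_race[i]` can be joined onto race `i` as a feature without leaking that race's result.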
Layer 03
Stacked Ensemble + Monte Carlo
Five models, one meta-learner, fifty thousand futures
RandomForest, ExtraTrees, LightGBM, HistGradientBoosting, and GradientBoosting each read the same race differently; a Ridge meta-learner blends their disagreement into one calibrated probability. That probability seeds a vectorised Monte Carlo engine — 10,000 to 50,000 simulated races per second, each rolling its own grid variance, DNF chance, and safety-car timing.
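The meta-learner step can be sketched in NumPy. The sketch assumes the five base models' probabilities for past races are already available as out-of-fold (walk-forward) predictions. `fit_ridge_blender` and `blend` are hypothetical names. F1Predict uses scikit-learn's Ridge; this hand-rolled closed-form solve, with an unpenalised intercept, only shows the math. A linear blend is not guaranteed to stay in [0, 1], so the sketch clips it.

```python
import numpy as np


def fit_ridge_blender(base_preds, outcomes, alpha=1.0):
    """Fit ridge weights that blend base-model probabilities.

    base_preds: (n_rows, n_models) out-of-fold probabilities, one column per
    base learner. outcomes: (n_rows,) 0/1 labels. Centring both sides keeps
    the intercept out of the penalty.
    """
    X = np.asarray(base_preds, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Closed-form ridge: (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def blend(base_preds, w, b):
    # A linear blend can leave [0, 1]; clip before treating it as a probability.
    return np.clip(np.asarray(base_preds, dtype=float) @ w + b, 0.0, 1.0)
```

The meta-learner must only ever see out-of-fold predictions. Fitting it on in-sample base-model outputs rewards whichever member overfits hardest.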
01 · Data Layer: FastF1 · jolpica-f1 · Open-Meteo
~75 engineered features, spanning 2014–2026
02 · Causal ELO Engine
Temporally safe driver and constructor ratings
03 · Stacked Ensemble: five models into a Ridge meta-learner
RandomForest · ExtraTrees · LightGBM · HistGB · GradientBoosting
04 · Vectorised Monte Carlo: 10k–50k sims / sec
DNF, safety-car, and era-specific modelling, every run
05 · Output: P(win) · P(podium) · P(points) · what-if
Championship-swing analysis from any starting condition
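Here is a minimal sketch of the vectorised simulation stage in NumPy. Every simulated race is one row of a (simulations × drivers) array, so the whole batch is ranked in one call instead of looping race by race. The following are all assumptions made for illustration, not F1Predict's model:

- the Gaussian pace noise
- the single per-race safety-car draw that widens that noise
- every parameter name and default

Era-specific adjustments are omitted.

```python
import numpy as np


def simulate_races(strength, dnf_prob, n_sims=50_000, noise_sd=1.0,
                   sc_prob=0.5, sc_noise_boost=1.5, seed=0):
    """Vectorised race simulation returning finishing positions (1 = winner).

    strength: (n_drivers,) higher is faster. dnf_prob: (n_drivers,) per-race
    retirement probability. A safety car, drawn once per simulated race,
    widens the pace noise to mimic a reshuffled field.
    """
    rng = np.random.default_rng(seed)
    strength = np.asarray(strength, dtype=float)
    dnf_prob = np.asarray(dnf_prob, dtype=float)
    n = strength.size
    sc = rng.random(n_sims) < sc_prob                        # (n_sims,)
    sd = np.where(sc, noise_sd * sc_noise_boost, noise_sd)[:, None]
    pace = strength + rng.normal(size=(n_sims, n)) * sd      # (n_sims, n)
    dnf = rng.random((n_sims, n)) < dnf_prob
    pace[dnf] = -np.inf                                      # retirements sort last
    order = np.argsort(-pace, axis=1)                        # fastest first
    positions = np.empty_like(order)
    rows = np.arange(n_sims)[:, None]
    positions[rows, order] = np.arange(1, n + 1)             # invert the ranking
    return positions
```

Each row of the result is one complete race. Win, podium and points probabilities are then just column-wise counts over the rows.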
Language
Python
Python, end to end — the analysis core, the Rich CLI, and the Streamlit dashboard all share one engine. No duplicated logic between the two interfaces, no drift between what the terminal says and what the dashboard shows.
Modelling
scikit-learn · LightGBM
scikit-learn for RandomForest, ExtraTrees, GradientBoosting, HistGradientBoosting, and the Ridge meta-learner; LightGBM for the fifth, gradient-boosted member. Every model trains on the exact same temporally safe feature set.
Data Sources
FastF1 · jolpica-f1
FastF1 for timing and telemetry, jolpica-f1 (the Ergast successor) for historical results, Open-Meteo for weather. All three are free. Results and weather reach back past the 2014 season, while FastF1's lap-timing and telemetry data starts in 2018.
CLI
Typer · Rich
Typer + Rich — a terminal interface with real tables, progress bars, and colour, built so the tool is genuinely pleasant to run a hundred times a day during development.
Dashboard
Streamlit
Streamlit — the same analysis core, surfaced as an interactive web dashboard for exploring a race or a season without touching the terminal.
Validation
Walk-Forward CV
Walk-forward cross-validation across four F1 eras (2014–2026) — train on the past, test on the future, slide the window forward, repeat. The only honest way to validate a time-series model.
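Walk-forward splitting at season granularity can be sketched as a small generator. The source doesn't state whether F1Predict steps season by season or race by race, or whether its training window expands or has a fixed length. The hypothetical `walk_forward_splits` therefore supports both window types through an illustrative `max_train` parameter.

```python
def walk_forward_splits(seasons, min_train=3, max_train=None):
    """Yield (train_seasons, test_season) pairs in chronological order.

    Every test season is strictly later than every season it trains on.
    max_train=None gives an expanding window; an integer caps the window
    length so it slides forward instead.
    """
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        start = 0 if max_train is None else max(0, i - max_train)
        yield ordered[start:i], ordered[i]
```

For example, `walk_forward_splits(range(2014, 2019))` trains on 2014–2016 to test 2017, then on 2014–2017 to test 2018.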
Chapter Three

What the
Numbers Say.

Real output, from a real sample report — the Bahrain Grand Prix 2024 grid, run through the full pipeline. These are the model's actual probabilities, not illustrative placeholders.

The Podium Board · Bahrain GP 2024
Driver · P(Win) · P(Podium) · Expected points (xPts)
Verstappen · 39.9% · 73.0% · 16.0
Leclerc · 15.5% · 46.9% · 10.5
Pérez · 14.6% · 44.5% · 10.1
Sainz · 9.0% · 35.0% · 8.5
Russell · 6.0% · 26.0% · 6.5
Hamilton · 5.0% · 22.0% · 6.0
Alonso · 4.0% · 18.0% · 5.0
Norris · 3.5% · 16.0% · 4.5
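One plausible way to derive the three columns from the simulations:

- count wins
- count top-three finishes
- average the points each simulated finishing position is worth

The sketch uses the 2024 points table for P1–P10 (25, 18, 15, 12, 10, 8, 6, 4, 2, 1). It ignores the fastest-lap bonus point that still existed in 2024. `summarise` is a hypothetical helper, not the report generator itself.

```python
# 2024 points for P1-P10; the fastest-lap bonus is deliberately ignored here.
POINTS = {1: 25, 2: 18, 3: 15, 4: 12, 5: 10, 6: 8, 7: 6, 8: 4, 9: 2, 10: 1}


def summarise(finishes):
    """finishes: driver -> list of simulated finishing positions (1 = winner)."""
    board = {}
    for driver, positions in finishes.items():
        n = len(positions)
        board[driver] = {
            "p_win": sum(p == 1 for p in positions) / n,
            "p_podium": sum(p <= 3 for p in positions) / n,
            "x_pts": sum(POINTS.get(p, 0) for p in positions) / n,
        }
    return board
```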
ELO Trajectories · Illustrative, 2021–2024 (Verstappen, Leclerc, Pérez)
Relative form arcs drawn to match each driver's real trajectory across the period, shown for shape and not claimed as the model's logged output.
< 1s
Monte Carlo runtime · 50,000 simulated races
5
Models in the stacked ensemble, blended via Ridge
2014–2026
Walk-forward backtest range · four regulation eras
~250–300 MB
Peak training memory · runs on a laptop
Source code · Full pipeline · Sample reports
View on GitHub
Chapter Four

What Building
Real ML Taught Me.

The model isn't the lesson. What building each layer of it actually taught me is.

Finding 01
Temporal safety isn't optional — it's the whole experiment
My first walk-forward run scored suspiciously well. Hunting for why, I found one feature — a season-end constructor ranking — that had quietly looked into the future. That single leaked column had inflated every number on the page. I rebuilt the entire feature pipeline around one rule: if it couldn't have been computed on the morning of the race, it doesn't exist yet. The honest score was lower. It was also the first one I could trust.
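An as-of lookup is one way to enforce the "morning of the race" rule for standings-type features. This sketch assumes standings are stored as snapshots keyed by (season, round), each taken after that round finished. `standings_as_of` is a hypothetical helper. For round r it returns the snapshot from the latest round strictly before r in the same season, so a season-end table can never be attached to an earlier race. Carrying the previous season's final table into round 1 is left out.

```python
import bisect


def standings_as_of(snapshots, season, round_no):
    """Latest standings snapshot published strictly before `round_no`.

    snapshots: dict mapping (season, round) -> {constructor: position},
    where each entry is the table *after* that round finished.
    Returns None if no earlier round exists in that season.
    """
    keys = sorted(k for k in snapshots if k[0] == season)
    # Snapshots for rounds >= round_no did not exist on race morning.
    idx = bisect.bisect_left(keys, (season, round_no))
    return snapshots[keys[idx - 1]] if idx > 0 else None
```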
Finding 02
Complexity has to be earned, one model at a time
F1Predict didn't start as a five-model stack — it started as a single RandomForest. Each later addition (ExtraTrees, then the gradient-boosted members, then the Ridge meta-learner) had to prove a measurable lift on the same walk-forward split before it was allowed to stay. Two candidates I tried never cleared that bar, and they aren't in the ensemble. Complexity that doesn't pay rent gets cut.
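That admission rule amounts to greedy forward selection against a fixed walk-forward score. In this sketch, `score` is any function that returns a loss on one fixed walk-forward split, where lower is better (log loss, for example). `admit_models` and the `min_lift` threshold are hypothetical. Candidates are tried in the order given, which mirrors the order described but makes the outcome order-dependent.

```python
def admit_models(candidates, score, min_lift=0.0):
    """Greedy, in-order admission of ensemble members.

    A candidate joins only if adding it lowers the walk-forward loss by
    more than `min_lift`. score(members) -> loss (lower is better).
    """
    members = [candidates[0]]
    best = score(members)
    for cand in candidates[1:]:
        trial = score(members + [cand])
        if best - trial > min_lift:   # the candidate has to pay rent
            members.append(cand)
            best = trial
    return members, best
```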
Finding 03
A distribution answers a better question than a guess does
A single predicted finishing order is a confident-sounding number with nowhere to put its uncertainty. Fifty thousand simulated races are the opposite. They show you that the front of the order is close to a lock, the midfield is close to a coin flip, and "second place" is really a cloud of plausible outcomes with a shape worth understanding. Monte Carlo didn't make the model more accurate. It made the model honest about what it doesn't know.
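One way to put a number on that "cloud" is the spread of each driver's simulated finishing positions. This standard-library sketch reports the 10th and 90th percentiles. `position_band` and the choice of percentiles are illustrative assumptions. A narrow band is a near-lock; a wide one is a coin flip.

```python
from statistics import quantiles


def position_band(positions, lo=10, hi=90):
    """Return the (lo-th, hi-th) percentiles of simulated finishing positions.

    A narrow band means the outcome is close to settled; a wide band means
    the model genuinely doesn't know.
    """
    cuts = quantiles(positions, n=100, method="inclusive")  # 99 cut points
    return cuts[lo - 1], cuts[hi - 1]
```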
Interactive · Live in Your Browser

The
What-If Lab.

F1Predict's CLI has a command for exactly this — whatif --driver … --grid … --weather … --recent-form-boost …. This is that command, made visual: move the sliders and watch the starting grid recompute its odds live, seeded with the model's real Bahrain GP 2024 baseline probabilities.

Controls: Grid Position (default P1) · Weather (default Dry) · Recent Form Boost (default 1.00×)
Outputs: P(Win) · P(Podium) · Expected Points
Real Bahrain GP 2024 baseline probabilities, scaled live by a grid-position decay curve, a per-driver weather table, and your form multiplier — then renormalised across the six selectable drivers. A browser-scale approximation of the real pipeline, not the Python model itself.
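Here is a minimal sketch of that browser-side adjustment, written in Python for consistency with the rest of the project. It covers P(win) only and makes three assumptions:

- The selected driver's probability is rescaled by the grid and form factors.
- Weather multipliers apply to every driver.
- The result is renormalised to sum to one.

The exponential decay, its rate of 0.15 per grid slot, the `weather_table` layout and `what_if_win` itself are all assumptions. The page doesn't specify the curve's shape or the table's values.

```python
import math


def what_if_win(baseline, driver, grid, base_grid, weather, form=1.0,
                weather_table=None, rate=0.15):
    """Rescale one driver's baseline P(win), then renormalise the field.

    baseline: driver -> baseline P(win).
    weather_table: weather -> {driver: multiplier}; missing entries mean 1.0.
    """
    multipliers = (weather_table or {}).get(weather, {})
    scores = {}
    for d, p in baseline.items():
        s = p * multipliers.get(d, 1.0)              # weather applies to everyone
        if d == driver:
            # Exponential decay per grid slot relative to the baseline start.
            s *= math.exp(-rate * (grid - base_grid)) * form
        scores[d] = s
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}
```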

Also Explore

Project · Statistics & Simulation
ScoutSelect

Alliance selection platform built for real FTC competition — Monte Carlo, OPR, Bayesian shrinkage.

Research · Independent Study
JouleRoute-LM

Independent research on real BERT-base weights — every number measured on hardware, never simulated.

Smaller Builds · incl. GHCountdown
Builds

Eight self-taught projects shipped from real friction — including GHCountdown, now living here.
