F1Predict — George Hu

Chapter One

The Signal.

Predicting a Formula 1 race looks like a parlour trick — pundits do it on television every weekend, with total confidence and zero accountability for being wrong. A real prediction has to survive contact with reality: weather that flips a session inside out, a grid-slot lottery that compounds across twenty cars, a DNF probability that rewrites the whole field, and regulation eras — 2014's turbo-hybrid reset, 2022's ground-effect reset, 2026's incoming reset — that make "last year's pattern" actively misleading.

Underneath the noise, there is a signal — driver form, constructor trajectory, track-specific history, qualifying pace relative to the field. The honest question isn't "can a model guess the podium." Anyone can guess. It's: can a model that is forbidden from seeing the future still beat broadcast consensus, scored the same way every week, across more than a decade of regulation change?

"A prediction that can't be checked against the next ten years of races isn't a prediction. It's a guess wearing a chart."

That single constraint — temporal safety — turned out to be the entire project. Every feature, every rating, every simulated race in F1Predict has to be computable using only information that existed before the race it predicts. It sounds obvious. It is the easiest rule in machine learning to break by accident, and the most common reason a backtest looks brilliant while the live model looks foolish.

Chapter Two

Built to Actually Work.

Three engineering principles, defended at every layer of the stack — because a prediction platform that leaks the future, fakes its ratings, or runs too slowly to explore alternatives isn't a platform. It's a demo.

Layer 01

Temporal Safety

Zero leakage, by construction

Every one of the ~75 engineered features is computed only from information that existed strictly before the race being predicted — grid history, prior-race form, constructor trajectory, weather as it would have read at the time. Walk-forward validation re-trains on the past and tests on the future, in chronological order, every single time. No shuffled splits, ever.
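
As a minimal sketch of what "strictly before" can mean for a single feature, the snippet below computes a driver's recent form from earlier races only. The function name `prior_form`, the three-race window, and the sample rows are illustrative assumptions, not F1Predict's actual feature code.

```python
from collections import defaultdict, deque


def prior_form(results, window=3):
    """Mean finishing position over each driver's previous `window` races.

    `results` must be a chronologically ordered list of
    (race_id, driver, position) tuples. The feature for a race is read
    before that race's own result is recorded, so it never sees the outcome
    it is meant to help predict. Drivers with no earlier race get None.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    features = []
    for race_id, driver, position in results:
        past = history[driver]
        # Read the feature first ...
        form = sum(past) / len(past) if past else None
        features.append((race_id, driver, form))
        # ... and only then let this race join the history.
        past.append(position)
    return features


# Hypothetical results, used only to exercise the function.
rows = [
    (1, "VER", 1), (1, "LEC", 3),
    (2, "VER", 2), (2, "LEC", 1),
    (3, "VER", 1), (3, "LEC", 4),
]
form_features = prior_form(rows)
```

The ordering inside the loop is the whole point: swapping the read and the append would quietly put each race's result into its own feature.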

Layer 02

Causal ELO Ratings

Updated race by race, never in hindsight

A rating table computed once over the full results history is fine for ranking the past but dishonest for prediction, because each rating already contains the result it is supposed to forecast. F1Predict's ELO engine updates causally: a driver's rating going into Sunday reflects only the races that happened before Sunday. By construction, the rating used to predict a result is the same rating that existed the moment before that result was known.
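
A causal update of this kind can be sketched as follows. This is an assumption-laden illustration rather than the project's engine: it treats a race as a set of pairwise head-to-heads, and the K-factor of 8, the starting rating of 1500, and the function names are all hypothetical choices.

```python
from itertools import combinations


def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A finishes ahead of B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_causal_elo(races, k=8.0, start=1500.0):
    """Rate drivers race by race.

    `races` is a chronologically ordered list of finishing orders
    (winner first). Returns (snapshots, ratings), where snapshots[i] holds
    the ratings as they stood before race i was scored. Those snapshots are
    the only ratings a model predicting race i is allowed to see.
    """
    ratings = {}
    snapshots = []
    for order in races:
        for driver in order:
            ratings.setdefault(driver, start)
        # Freeze the pre-race view before any result is applied.
        pre = {driver: ratings[driver] for driver in order}
        snapshots.append(pre)
        # Score every pair using pre-race ratings only.
        delta = {driver: 0.0 for driver in order}
        for ahead, behind in combinations(order, 2):
            surprise = 1.0 - expected_score(pre[ahead], pre[behind])
            delta[ahead] += k * surprise
            delta[behind] -= k * surprise
        for driver in order:
            ratings[driver] = pre[driver] + delta[driver]
    return snapshots, ratings


# Hypothetical two-race history.
snapshots, final_ratings = run_causal_elo(
    [["VER", "LEC", "PER"], ["LEC", "VER", "PER"]]
)
```

Feeding `snapshots[i]` to the model as the feature for race `i`, rather than `final_ratings`, is what keeps the rating on the right side of the result.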

Layer 03

Stacked Ensemble + Monte Carlo

Five models, one meta-learner, fifty thousand futures

RandomForest, ExtraTrees, LightGBM, HistGradientBoosting, and GradientBoosting each read the same race differently; a Ridge meta-learner blends their disagreement into one calibrated probability. That probability seeds a vectorised Monte Carlo engine — 10,000 to 50,000 simulated races per second, each rolling its own grid variance, DNF chance, and safety-car timing.
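
To show how simulated races turn into P(win), P(podium), and expected points, here is a deliberately small sketch. It is not the vectorised engine described above: it uses plain standard-library loops to stay dependency-free, draws finishing orders by repeated strength-weighted sampling, applies one flat DNF probability, and leaves out grid variance and safety-car timing entirely. The strengths and driver labels are hypothetical.

```python
import random

# Championship points for P1 to P10 (fastest-lap bonus ignored).
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def simulate_race(strengths, dnf_prob, rng):
    """Return one simulated finishing order of the cars that finish."""
    running = [d for d in strengths if rng.random() >= dnf_prob]
    order = []
    while running:
        weights = [strengths[d] for d in running]
        # Pick the next finisher in proportion to relative strength.
        nxt = rng.choices(running, weights=weights, k=1)[0]
        order.append(nxt)
        running.remove(nxt)
    return order


def summarise(strengths, dnf_prob=0.1, n_sims=5000, seed=0):
    """Aggregate many simulated races into per-driver probabilities."""
    rng = random.Random(seed)
    tally = {d: {"win": 0, "podium": 0, "points": 0} for d in strengths}
    for _ in range(n_sims):
        for pos, driver in enumerate(simulate_race(strengths, dnf_prob, rng)):
            if pos == 0:
                tally[driver]["win"] += 1
            if pos < 3:
                tally[driver]["podium"] += 1
            if pos < len(POINTS):
                tally[driver]["points"] += POINTS[pos]
    return {
        d: {
            "p_win": t["win"] / n_sims,
            "p_podium": t["podium"] / n_sims,
            "x_pts": t["points"] / n_sims,
        }
        for d, t in tally.items()
    }


# Hypothetical strengths, for example taken from an ensemble's win scores.
board = summarise({"A": 4.0, "B": 2.0, "C": 1.0, "D": 1.0}, n_sims=2000)
```

Each driver's row in `board` is a frequency over simulated outcomes, which is why the three columns can be read as a distribution rather than a single predicted order.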

The Pipeline

01. Data Layer: FastF1 · jolpica-f1 · Open-Meteo

~75 engineered features, spanning 2014–2026

02. Causal ELO Engine

Temporal-safe driver and constructor ratings

03. Stacked Ensemble: five models into a Ridge meta-learner

RandomForest · ExtraTrees · LightGBM · HistGB · GradientBoosting

04. Vectorised Monte Carlo: 10k–50k sims / sec

DNF, safety-car, and era-specific modelling, every run

05. Output: P(win) · P(podium) · P(points) · what-if

Championship-swing analysis from any starting condition

Language

Python

Python, end to end — the analysis core, the Rich CLI, and the Streamlit dashboard all share one engine. No duplicated logic between the two interfaces, no drift between what the terminal says and what the dashboard shows.

Modelling

scikit-learn · LightGBM

scikit-learn supplies four of the five base learners (RandomForest, ExtraTrees, HistGradientBoosting, GradientBoosting) and the Ridge meta-learner; LightGBM supplies the fifth. Every model trains on the exact same temporally-safe feature set.

Data Sources

FastF1 · jolpica-f1

FastF1 for timing and telemetry, jolpica-f1 (the Ergast successor) for historical results, Open-Meteo for weather — all free, all queryable back to the 2014 season.

CLI

Typer · Rich

Typer + Rich — a terminal interface with real tables, progress bars, and colour, built so the tool is genuinely pleasant to run a hundred times a day during development.

Dashboard

Streamlit

Streamlit — the same analysis core, surfaced as an interactive web dashboard for exploring a race or a season without touching the terminal.

Validation

Walk-Forward CV

Walk-forward cross-validation across four F1 eras (2014–2026) — train on the past, test on the future, slide the window forward, repeat. The only honest way to validate a time-series model.
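
The train-on-the-past, test-on-the-future loop can be sketched as a small split generator. This version assumes an expanding training window and season-level folds; the source does not say whether F1Predict expands or slides a fixed-width window, and `walk_forward_splits` and `min_train` are hypothetical names.

```python
def walk_forward_splits(seasons, min_train=3):
    """Yield (train_seasons, test_season) pairs in chronological order.

    Every training set contains only seasons strictly earlier than the
    season being tested, and it grows by one season per fold.
    """
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


splits = list(walk_forward_splits(range(2014, 2021)))
for train, test in splits:
    print(f"train {train[0]}-{train[-1]} -> test {test}")
```

Unlike a shuffled k-fold split, no fold here can ever train on a season that comes after the one it is scored on.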

Chapter Three

What the Numbers Say.

Real output, from a real sample report — the Bahrain Grand Prix 2024 grid, run through the full pipeline. These are the model's actual probabilities, not illustrative placeholders.

The Podium Board · Bahrain GP 2024

Driver · P(Win) · P(Podium) · xPts
Verstappen · 39.9% · 73.0% · 16.0
Leclerc · 15.5% · 46.9% · 10.5
Pérez · 14.6% · 44.5% · 10.1
Sainz · 9.0% · 35.0% · 8.5
Russell · 6.0% · 26.0% · 6.5
Hamilton · 5.0% · 22.0% · 6.0
Alonso · 4.0% · 18.0% · 5.0
Norris · 3.5% · 16.0% · 4.5

ELO Trajectories · Illustrative, 2021–2024

Chart series: Verstappen · Leclerc · Pérez

Relative form arcs drawn to match each driver's real trajectory across the period — shown for shape, not claimed as the model's logged output.

Source code · Full pipeline · Sample reports

View on GitHub

Chapter Four

What Building Real ML Taught Me.

The model isn't the lesson. What building each layer of it actually taught me is.

Finding 01

Temporal safety isn't optional — it's the whole experiment

My first walk-forward run scored suspiciously well. Hunting for why, I found one feature — a season-end constructor ranking — that had quietly looked into the future. That single leaked column had inflated every number on the page. I rebuilt the entire feature pipeline around one rule: if it couldn't have been computed on the morning of the race, it doesn't exist yet. The honest score was lower. It was also the first one I could trust.
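
One way to catch this class of bug automatically is a truncation test: compute the feature once with the full season available and once with everything from the target race onward removed, then check that the answers match. The sketch below is an illustration under assumptions, with hypothetical function names and made-up points, and is not the check F1Predict actually runs.

```python
def _rank(totals):
    """Rank teams by points, highest first (ties broken by name)."""
    ordered = sorted(totals, key=lambda team: (-totals[team], team))
    return {team: i + 1 for i, team in enumerate(ordered)}


def season_end_rank(points_by_race, race_index):
    """LEAKY: sums every race it is given, including later ones."""
    totals = {}
    for race in points_by_race:
        for team, pts in race.items():
            totals[team] = totals.get(team, 0) + pts
    return _rank(totals)


def pre_race_rank(points_by_race, race_index):
    """SAFE: sums only races strictly before `race_index`."""
    totals = {}
    for race in points_by_race[:race_index]:
        for team, pts in race.items():
            totals[team] = totals.get(team, 0) + pts
    return _rank(totals)


def leaks_future(feature_fn, points_by_race, race_index):
    """True if the feature changes when future races are hidden."""
    with_future = feature_fn(points_by_race, race_index)
    without_future = feature_fn(points_by_race[:race_index], race_index)
    return with_future != without_future


# Hypothetical constructor points for a three-race season.
season = [
    {"RBR": 25, "FER": 18},
    {"RBR": 10, "FER": 25},
    {"RBR": 0, "FER": 25},
]
print(leaks_future(season_end_rank, season, race_index=1))  # True
print(leaks_future(pre_race_rank, season, race_index=1))    # False
```

A check like this only flags a leak when the hidden races actually change the value, so it is worth running across many race indices rather than one.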

Finding 02

Complexity has to be earned, one model at a time

F1Predict didn't start as a five-model stack — it started as a single RandomForest. Each later addition (ExtraTrees, then the gradient-boosted members, then the Ridge meta-learner) had to prove a measurable lift on the same walk-forward split before it was allowed to stay. Two candidates I tried never cleared that bar, and they aren't in the ensemble. Complexity that doesn't pay rent gets cut.
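
The "prove a measurable lift" rule can be written down as a simple gate. This is a sketch under assumptions: the metric (a per-fold loss where lower is better), the 0.005 threshold, the function name, and the sample numbers are all hypothetical, since the source does not state which metric or margin was used.

```python
from statistics import mean


def earns_its_place(baseline_losses, candidate_losses, min_lift=0.005):
    """Decide whether a candidate ensemble member stays.

    Both lists hold one loss per walk-forward fold, computed on the same
    folds, with lower values meaning better predictions. The candidate is
    kept only if it lowers the mean loss by at least `min_lift`.
    """
    if len(baseline_losses) != len(candidate_losses):
        raise ValueError("losses must come from the same folds")
    lift = mean(baseline_losses) - mean(candidate_losses)
    return lift >= min_lift


# Made-up per-fold losses, for illustration only.
print(earns_its_place([0.60, 0.62, 0.58], [0.57, 0.60, 0.56]))  # True
print(earns_its_place([0.60, 0.62], [0.60, 0.619]))             # False
```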

Finding 03

A distribution answers a better question than a guess does

A single predicted finishing order is a confident-sounding number with nowhere to put its uncertainty. Fifty thousand simulated races are the opposite — they show you that the front row is close to a lock, the midfield is close to a coin flip, and "second place" is really a cloud of plausible outcomes with a shape worth understanding. Monte Carlo didn't make the model more accurate. It made the model honest about what it doesn't know.

Interactive · Live in Your Browser

The What-If Lab.

F1Predict's CLI has a command for exactly this — whatif --driver … --grid … --weather … --recent-form-boost …. This is that command, made visual: move the sliders and watch the starting grid recompute its odds live, seeded with the model's real Bahrain GP 2024 baseline probabilities.

Controls: Grid Position (P1) · Weather (Dry) · Recent Form Boost (1.00×)

Outputs: P(Win) · P(Podium) · Expected Points

Real Bahrain GP 2024 baseline probabilities, scaled live by a grid-position decay curve, a per-driver weather table, and your form multiplier — then renormalised across the six selectable drivers. A browser-scale approximation of the real pipeline, not the Python model itself.
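
The scale-then-renormalise step described above can be sketched in a few lines. The baseline win probabilities are the ones on the Podium Board; everything else is an assumption made for illustration: that the six selectable drivers are the top six on the board, that grid decay is geometric with a factor of 0.85 per place, and the values in the weather table.

```python
# P(Win) baselines from the Bahrain GP 2024 Podium Board.
# Assumption: the six selectable drivers are the board's top six.
BASELINE_P_WIN = {
    "Verstappen": 0.399,
    "Leclerc": 0.155,
    "Pérez": 0.146,
    "Sainz": 0.090,
    "Russell": 0.060,
    "Hamilton": 0.050,
}

# Hypothetical per-driver weather multipliers (missing entries mean 1.0).
WEATHER_FACTOR = {
    "dry": {},
    "wet": {"Verstappen": 1.2, "Hamilton": 1.3},
}


def what_if(driver, grid, weather="dry", form_boost=1.0, decay=0.85):
    """Rescale baseline win odds for one scenario and renormalise.

    The chosen driver's score is multiplied by a geometric grid-position
    decay (P1 leaves it unchanged) and by the form multiplier. Every
    driver's score gets its weather factor. The scores are then divided
    by their sum so the six probabilities add up to 1.
    """
    factors = WEATHER_FACTOR[weather.lower()]
    scores = {}
    for name, p_win in BASELINE_P_WIN.items():
        score = p_win * factors.get(name, 1.0)
        if name == driver:
            score *= decay ** (grid - 1) * form_boost
        scores[name] = score
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


from_pole = what_if("Leclerc", grid=1)
from_p10 = what_if("Leclerc", grid=10)
```

Because of the renormalisation, moving one driver down the grid raises everyone else's share even though their own scores were not touched.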