Case Study — Personal Project
KWB — Kalshi Weather Trading Bot
An automated, paper-first trading bot that prices a single Kalshi prediction market — what will NYC's high temperature be today? — by blending four weather models into a probability distribution, and bets only when it thinks the market is mispriced.
4 models
Independent weather sources blended into one distribution
~570 tests
Unit tests, plus mypy --strict and automated lint
6
Bot instances running in parallel — 1 live, 5 paper experiments
~9/day
Trading cycles, fully serverless on AWS Lambda
What it is
A single-purpose, fully serverless trading bot for one Kalshi market series: the NYC daily-high-temperature buckets, settled off the Central Park station. It blends multiple weather models into a probability distribution, prices every temperature bucket the market offers, and bets only where it sees a real edge. The decision logic is entirely deterministic — there is no LLM in the trading loop. AI was an engineering collaborator for code review and analysis, never a live trader.
How it works
Pull hourly forecasts from four independent weather sources — NWS, ECMWF, GFS, and NOAA HRRR.
Blend them into a single probability distribution over the day's high temperature, widening the uncertainty where the models disagree.
Convert that distribution into a probability for each temperature bucket the market offers.
Compare those probabilities to live market prices and place small, fractional-Kelly-sized bets only where the modeled edge clears a threshold.
Hold to settlement and book realized P&L when the official climate report finalizes the day's high.
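The blend, bucket-pricing, and sizing steps above can be sketched in a few functions. This is a minimal illustration, not the bot's actual code: it assumes a normal distribution for the daily high, an equal-weight average of the models, half-open buckets in degrees Fahrenheit, and no fees. The names `blend`, `bucket_probability`, `kelly_fraction`, `base_sigma`, `kelly_scale`, and `min_edge`, along with their default values, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bucket:
    """A bucket covering highs in [low, high); None marks an open-ended side."""
    low: Optional[float]
    high: Optional[float]


def blend(model_highs: list[float], base_sigma: float = 2.0) -> tuple[float, float]:
    """Blend model highs into (mean, sigma); sigma grows with model disagreement."""
    n = len(model_highs)
    mean = sum(model_highs) / n
    # Population variance across models measures how much they disagree.
    spread_var = sum((x - mean) ** 2 for x in model_highs) / n
    # Assumption: baseline forecast error and model spread add as variances.
    return mean, math.sqrt(base_sigma ** 2 + spread_var)


def normal_cdf(x: float, mean: float, sigma: float) -> float:
    """Normal CDF via math.erf, so no scipy is needed."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def bucket_probability(bucket: Bucket, mean: float, sigma: float) -> float:
    """Probability mass of the blended distribution that falls inside the bucket."""
    lo = 0.0 if bucket.low is None else normal_cdf(bucket.low, mean, sigma)
    hi = 1.0 if bucket.high is None else normal_cdf(bucket.high, mean, sigma)
    return hi - lo


def kelly_fraction(p: float, price: float,
                   kelly_scale: float = 0.25, min_edge: float = 0.05) -> float:
    """Bankroll fraction for a YES contract costing `price` (0 to 1) that pays 1."""
    edge = p - price
    if not 0.0 < price < 1.0 or edge < min_edge:
        return 0.0  # no bet unless the modeled edge clears the threshold
    # Full Kelly for a binary contract is (p - price) / (1 - price); scale it down.
    return kelly_scale * edge / (1.0 - price)
```

Called as `mean, sigma = blend([84.0, 86.0, 85.0, 87.0])`, the blend feeds `bucket_probability` for each bucket on offer, and each resulting probability is compared with the market price through `kelly_fraction`. When buckets tile the whole temperature range, their probabilities sum to one, which is a cheap sanity check on any pricing pass.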
What makes it interesting
The model knows which way the market is wrong
Backtesting against three years of settlements surfaced the edge the strategy runs on: the blended model reliably identifies which direction the market has mispriced a bucket. That directional signal — not raw point-forecast accuracy — is what the bot trades, and sharpening how confidently it acts on that signal is the lead experiment.
Calibrated to a single thermometer
The market settles off one sensor — the Central Park station. An early build had been calibrated against a gridded weather reanalysis instead of that exact sensor, and the two can differ by several degrees seasonally. Aligning the model to the true settlement source was one of the highest-leverage improvements in the project.
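One simple way to align forecasts with a single station is a per-month bias correction learned from past forecast and observation pairs. The sketch below is an assumption about how such a correction could look, not a description of the project's calibration; `monthly_bias`, `calibrate`, and the record layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_bias(records: list[tuple[date, float, float]]) -> dict[int, float]:
    """Mean (station_high - forecast_high) per calendar month.

    Each record is (day, forecast_high, station_high) in the same units.
    """
    errors: dict[int, list[float]] = defaultdict(list)
    for day, forecast_high, station_high in records:
        errors[day.month].append(station_high - forecast_high)
    return {month: sum(errs) / len(errs) for month, errs in errors.items()}


def calibrate(forecast_high: float, day: date, bias: dict[int, float]) -> float:
    """Shift a raw forecast by that month's bias; months without data are unchanged."""
    return forecast_high + bias.get(day.month, 0.0)
```

The key design point is that `station_high` must come from the settlement sensor itself. Fitting the same correction against a gridded product would reproduce the mismatch described above.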
Correctness caught before it cost anything
Disciplined review is why the bot runs paper-first. It caught subtle bugs early — including one where the bot was pricing the next day's market against the current day's forecast — long before real capital was ever at stake.
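A date-mismatch bug of this kind can also be guarded against in code. The following is a hypothetical sketch of such a guard, assuming both the market and the forecast carry an explicit target date; `DateMismatchError` and `assert_same_day` are illustrative names, not part of the project.

```python
from datetime import date


class DateMismatchError(ValueError):
    """Raised when a forecast and a market refer to different settlement days."""


def assert_same_day(market_date: date, forecast_date: date) -> None:
    """Refuse to price a market with a forecast made for a different day."""
    if market_date != forecast_date:
        raise DateMismatchError(
            f"market settles {market_date.isoformat()} "
            f"but forecast targets {forecast_date.isoformat()}"
        )
```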
Experiments, not hunches
Every strategy change ships as a pre-committed A/B test: paper-trading instances run against a control, with success criteria locked before results are read. A recurring discipline is refusing to overreact to small samples: more than once, a scary two-day signal turned out to be weather noise that three years of data refuted.
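Locking criteria before reading results can be made mechanical. The sketch below assumes a deliberately simple rule (a minimum number of settled days plus a minimum lift in mean daily P&L); the real experiments may use different metrics, and `SuccessCriteria` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: criteria cannot be edited after they are written
class SuccessCriteria:
    min_settled_days: int
    min_mean_pnl_lift: float  # required lift in mean daily P&L, variant minus control


def evaluate(criteria: SuccessCriteria,
             control_pnl: list[float], variant_pnl: list[float]) -> str:
    """Return 'insufficient_data', 'adopt', or 'reject' for a paper experiment."""
    n = min(len(control_pnl), len(variant_pnl))
    if n < criteria.min_settled_days:
        # Too few settled days: refuse to draw any conclusion at all.
        return "insufficient_data"
    lift = sum(variant_pnl[:n]) / n - sum(control_pnl[:n]) / n
    return "adopt" if lift >= criteria.min_mean_pnl_lift else "reject"
```

With `min_settled_days=30`, a two-day swing in either direction returns `"insufficient_data"`, which is the small-sample discipline expressed as code.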
Built like production infrastructure
- Treated as production trading infrastructure, not a prototype: ~570 unit tests, mypy --strict, and automated linting on every change.
- Fully serverless on AWS — Lambda (Python 3.12), DynamoDB, CloudWatch, and SES — deployed via SAM / CloudFormation on fixed cron cycles.
- Deliberately dependency-light: httpx, pydantic, and numpy, with the needed statistics implemented directly to avoid pulling in scipy and keep the Lambda artifact small.
- Safety designed in from day one: paper-trading by default, post-only limit orders that never cross the spread, and six independent kill switches — drawdown, daily-loss, anomaly, API-failure, manual, and inception-drawdown.
- Six instances run in parallel — one live, five paper experiments — so every new idea is proven on paper before it touches real capital.
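The six kill switches listed above can be modeled as independent checks over a snapshot of bot state, with trading halted if any one trips. This is a sketch under stated assumptions: the field names, thresholds, and the reading of "drawdown" as peak-to-current versus "inception-drawdown" as start-to-current are illustrative guesses, not the project's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotState:
    equity: float
    peak_equity: float
    starting_equity: float
    pnl_today: float
    consecutive_api_failures: int
    anomaly_flagged: bool  # set by some upstream anomaly detector (assumed)
    manual_halt: bool      # operator-controlled flag (assumed)


@dataclass(frozen=True)
class Limits:
    # All thresholds are hypothetical placeholders.
    max_drawdown: float = 0.20
    max_inception_drawdown: float = 0.30
    max_daily_loss: float = 25.0
    max_api_failures: int = 3


def tripped_switches(state: BotState, limits: Limits = Limits()) -> list[str]:
    """Names of every kill switch that is currently tripped (empty means safe)."""
    drawdown = 1.0 - state.equity / state.peak_equity
    inception_drawdown = 1.0 - state.equity / state.starting_equity
    checks = {
        "drawdown": drawdown >= limits.max_drawdown,
        "daily-loss": -state.pnl_today >= limits.max_daily_loss,
        "anomaly": state.anomaly_flagged,
        "api-failure": state.consecutive_api_failures >= limits.max_api_failures,
        "manual": state.manual_halt,
        "inception-drawdown": inception_drawdown >= limits.max_inception_drawdown,
    }
    return [name for name, tripped in checks.items() if tripped]
```

Evaluating each switch separately, rather than short-circuiting on the first one, keeps them independent and lets an alert report every condition that fired.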