Mitul Shah

Case Study — Personal Project

KWB — Kalshi Weather Trading Bot

An automated, paper-first trading bot that prices a single Kalshi prediction market — what will NYC's high temperature be today? — by blending four weather models into a probability distribution, and bets only when it thinks the market is mispriced.

  • 4 models

    Independent weather sources blended into one distribution

  • ~570 tests

    Unit tests, plus mypy --strict and automated lint

  • 6

    Bot instances running in parallel — 1 live, 5 paper experiments

  • ~9/day

    Trading cycles, fully serverless on AWS Lambda

What it is

A single-purpose, fully serverless trading bot for one Kalshi market series: the NYC daily-high-temperature buckets, settled off the Central Park station. It blends multiple weather models into a probability distribution, prices every temperature bucket the market offers, and bets only where it sees a real edge. The decision logic is entirely deterministic — there is no LLM in the trading loop. AI was an engineering collaborator for code review and analysis, never a live trader.

How it works

  1. Pull hourly forecasts from four independent weather sources — NWS, ECMWF, GFS, and NOAA HRRR.

  2. Blend them into a single probability distribution over the day's high temperature, widening the uncertainty where the models disagree.

  3. Convert that distribution into a probability for each temperature bucket the market offers.

  4. Compare those probabilities to live market prices and place small, fractional-Kelly-sized bets only where the modeled edge clears a threshold.

  5. Hold to settlement and book realized P&L when the official climate report finalizes the day's high.

What makes it interesting

  • The model knows which way the market is wrong

    Backtesting against three years of settlements surfaced the edge the strategy runs on: the blended model reliably identifies which direction the market has mispriced a bucket. That directional signal — not raw point-forecast accuracy — is what the bot trades, and sharpening how confidently it acts on that signal is the lead experiment.

  • Calibrated to a single thermometer

    The market settles off one sensor — the Central Park station. An early build had been calibrated against a gridded weather reanalysis instead of that exact sensor, and the two can differ by several degrees seasonally. Aligning the model to the true settlement source was one of the highest-leverage improvements in the project.

  • Correctness caught before it cost anything

    Disciplined review is why the bot runs paper-first. It caught subtle bugs early — including one where the bot was pricing the next day's market against the current day's forecast — long before real capital was ever at stake.

  • Experiments, not hunches

    Every strategy change ships as a pre-committed A/B test: paper-trading stacks run against a control, with success criteria locked before results are read. A recurring discipline is refusing to overreact to small samples; more than once, a scary two-day signal turned out to be weather noise that three years of data refuted.
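
The station-versus-grid mismatch described under "Calibrated to a single thermometer" is the kind of thing a per-month offset table can expose. The sketch below is one simple way to measure and apply such an offset, assuming paired daily highs from the settlement station and the gridded product are available. The function names and the monthly granularity are illustrative choices, not a description of how KWB does its calibration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_bias(pairs: list[tuple[date, float, float]]) -> dict[int, float]:
    """Mean (station - gridded) daily-high difference per calendar month.

    Each item in `pairs` is (day, station_high, gridded_high).
    """
    diffs: dict[int, list[float]] = defaultdict(list)
    for day, station_high, gridded_high in pairs:
        diffs[day.month].append(station_high - gridded_high)
    return {month: mean(values) for month, values in diffs.items()}


def to_station_scale(gridded_high: float, day: date, bias: dict[int, float]) -> float:
    """Shift a gridded value onto the station's scale; no shift if the month is unseen."""
    return gridded_high + bias.get(day.month, 0.0)
```

A seasonal pattern in the output of `monthly_bias` (for example, a positive summer offset and a negative winter one) would signal that a model tuned to the gridded series is systematically off against the sensor the market actually settles on.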

Built like production infrastructure

  • Treated as production trading infrastructure, not a prototype: ~570 unit tests, mypy --strict, and automated linting on every change.
  • Fully serverless on AWS — Lambda (Python 3.12), DynamoDB, CloudWatch, and SES — deployed via SAM / CloudFormation on fixed cron cycles.
  • Deliberately dependency-light: httpx, pydantic, and numpy, with the needed statistics implemented directly to avoid pulling in scipy and keep the Lambda artifact small.
  • Safety designed in from day one: paper-trading by default, post-only limit orders that never cross the spread, and six independent kill switches — drawdown, daily-loss, anomaly, API-failure, manual, and inception-drawdown.
  • Six instances run in parallel — one live, five paper experiments — so every new idea is proven on paper before it touches real capital.

KWB is live today, trading a deliberately capped stake and modestly in the green — but the real return on the project is the architecture, the bias for action, and a learning loop that keeps compounding.