Prediction Market Trading Bot: From Side Project to Automated System

There's a particular kind of itch that developers get when they look at a market and think: I could write something that does this better than I can manually. That's roughly where this project started — as a curiosity about prediction markets, and a question about whether a well-structured automated system could participate in them more consistently than a human refreshing a browser tab.

What Are Prediction Markets, and Why Automate Them?

Prediction markets are platforms where people trade on the outcomes of real-world events. Instead of buying stocks, you're buying contracts that resolve to $1 if something happens (say, a specific price level being hit by a certain date) and $0 if it doesn't. The market price reflects the crowd's collective probability estimate.

What makes them interesting from a systems perspective is that they behave a lot like financial markets — they have order books, bid/ask spreads, liquidity, and price dynamics — but with a defined resolution date. That structure makes them well-suited for algorithmic participation.

The two main venues this bot supports are Polymarket (a blockchain-based prediction market) and Kalshi (a US-regulated exchange). Each has its own API, its own quirks, and its own flavor of market.

The Tech Stack

The bot is built primarily in Python, which turned out to be the right choice for this kind of project. Python's ecosystem for data work, API integrations, and async programming is hard to beat, and the language's readability makes it easier to reason about what the bot is actually doing at any given moment.

FastAPI powers the dashboard server — a lightweight framework that handles REST endpoints and WebSocket connections for real-time updates. The dashboard itself is intentionally plain: HTML, CSS, and vanilla JavaScript. No build pipeline, no dependency sprawl. The goal was something that loads fast and is easy to debug at 2am when something unexpected happens.

Configuration lives in YAML files — one per venue — with a clear structure that separates concerns like market filtering, risk limits, and strategy parameters. Secrets stay in a `.env` file and never touch the YAML. Deployment is a single shell script that rsyncs the working directory to a VPS and restarts the relevant service. No Kubernetes, no CI/CD pipeline. For a project at this scale, simplicity wins.

The bot runs in two modes: paper (real market data, simulated fills, no money at risk) and live (real orders, real money, manual start required). Paper mode uses actual live prices, not historical data, so what you see there is genuinely predictive of live behavior.

How the Bot Is Structured

The core loop is straightforward: scan available markets, evaluate them against the active strategy, pass approved signals through a risk filter, then execute — or simulate — the resulting orders. This loop runs in a background thread per venue, while FastAPI serves the dashboard on top.

Every decision gets written to an append-only trade journal — a JSONL file that records each scan, each trade, each rejection, and each exit. This turns out to be the most valuable part of the whole system. When something behaves unexpectedly, the journal is where you go first.

An AI advisor rounds out the base feature set: it reads the trade journal and analytics, calls the Anthropic Claude API, and returns structured configuration suggestions. Think of it as a second opinion, not an autonomous decision-maker — at least to start.

The Eval Engine: Closing the Feedback Loop

The most significant recent addition is what we call the eval engine — a scheduled system that automates the tedious parts of daily performance review while keeping a human in the loop for every actual change.

Before the eval engine, improving the bot meant manually reading the trade journal, forming a hypothesis about what was going wrong, asking the AI advisor for suggestions, approving a change, applying it, and then watching for 30–60 minutes to see if things improved. This worked, but it was slow and required operator attention every day.

The eval engine changes that rhythm. Every night at 00:30 UTC, it runs a two-phase process automatically.

In the first phase, a set of pure Python leak detection rules scans the last 24 hours of trade data. Each rule tests a specific hypothesis — things like "are cooldowns rejecting entries far more often than entries are actually firing?" or "is a particular market family consistently losing more than it wins?" The rules are deterministic, fast, and each produces a structured finding with supporting evidence and an estimate of how much PnL the issue is likely costing per day.

In the second phase, any detected leaks are passed to the AI advisor, along with the current values of the relevant configuration knobs and a slice of recent trade data. The advisor proposes specific changes — not vague suggestions, but exact YAML path changes with before/after values and a plain-language rationale.

These proposals land in a recommendation queue and surface in a new "Eval" tab in the dashboard. The operator sees a clear diff: what the bot currently does, what the advisor wants to change, and why. One click to approve, one click to reject. No copy-pasting config values, no manual restarts.

Critically, the system enforces a strict allowlist of what the AI is even allowed to suggest. Hard risk limits — daily loss caps, maximum position sizes, paper/live mode — are completely off-limits. The advisor can only propose adjustments to strategy thresholds, cooldowns, and similar soft tuning parameters, and even those are capped at small increments per recommendation. The AI cannot propose something unsafe, and even if it tried, a second validation layer would discard it before it ever reached the queue.

Once a change is applied, a rollback monitor watches what happens. At five minutes, fifteen minutes, thirty minutes, and sixty minutes after the change, it checks whether key metrics — PnL trajectory, trade count, error rate — have moved in the wrong direction. If something looks like a regression, it automatically reverts the change and records the outcome so the same proposal won't be repeated for a week.

The result is a system that does the daily data-gathering work automatically, proposes well-reasoned changes with evidence, and protects against bad outcomes — while keeping a human in control of every decision that actually affects behavior.

Where It's Strong and Where It Falls Short

The bot's biggest strengths are consistency and transparency. It never gets tired, never skips a scan because something else came up, and it writes down everything it does. The eval engine extends that to the tuning layer — the feedback loop between "what's the bot doing?" and "what should we change?" is now tight and systematic rather than dependent on how much time the operator has that day.

Where it falls short is in the areas that still genuinely require human judgment. The eval engine will detect that a market family is losing money, but it won't tell you whether that's a tuning problem or a structural one. Adding an entirely new strategy, deciding when to go live with real money, or diagnosing something the 10 leak detection rules weren't designed to catch — those still require an operator who understands the system and is paying attention.

It's also a single-server deployment. There's no built-in redundancy or failover. For a personal or small-team project this is entirely fine, but it's worth knowing.

Automated systems are relentlessly honest. Every assumption you made about how markets work eventually surfaces as an edge case in the journal. The faster you can see that feedback — and the easier it is to act on it — the faster the system improves. That's really what the eval engine is: a way to make the feedback loop faster without making it less careful.

The other thing it taught us is that keeping humans in the loop isn't just a safety feature — it's also a learning mechanism. Every approval or rejection the operator makes is a data point about what good tuning looks like. The system gets smarter the more it's used, in part because the human using it does too.

This post describes the architecture and development experience of the market-bot project. It is not financial advice, and no specific trading strategies, performance figures, or financial details are discussed here.

What Are Prediction Markets, and Why Automate Them?

The Tech Stack

How the Bot Is Structured

The Eval Engine: Closing the Feedback Loop

Where It's Strong and Where It Falls Short

About the Author