Backtesting

Run a StrategySpec against real broker tick data with walk-forward validation, spread and slippage modelling, and an illustrative-only equity curve.

Last updated · 2026-05-13

The backtester is the same engine that runs the live droplet, fed with historical broker tick data instead of a live feed. The point is to stress your spec before money is at risk, and to surface the analytical mistakes that undermine most strategies.

Data

Backtests use real broker tick data, not synthetic candles. Ticks are aggregated at runtime into the timeframes referenced by the spec, so a 5-minute strategy sees exactly the bars the live engine sees. Coverage typically goes back two to five years per symbol; older history is available on request.
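To make the aggregation step concrete, here is a minimal sketch of grouping ticks into fixed-width bars. It is not the engine's implementation: the Bar type, the aggregate_ticks name, and the (timestamp, price) tick shape are assumptions for illustration, and bars are aligned to UTC clock boundaries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Bar:
    start: datetime
    open: float
    high: float
    low: float
    close: float


def aggregate_ticks(ticks, minutes=5):
    """Group (timestamp, price) ticks, already sorted by time, into OHLC bars.

    Hypothetical helper: timestamps are assumed to be timezone-aware.
    Intervals with no ticks produce no bar.
    """
    width = minutes * 60
    bars = []
    for ts, price in ticks:
        epoch = int(ts.timestamp())
        # Floor the tick time to the start of its bar.
        start = datetime.fromtimestamp(epoch - epoch % width, tz=timezone.utc)
        if bars and bars[-1].start == start:
            bar = bars[-1]
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
        else:
            bars.append(Bar(start, price, price, price, price))
    return bars
```

Because the bars are built from the same ticks at run time, a change of timeframe in the spec needs no separate candle dataset in this sketch; only the minutes argument changes.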

Spread and slippage

Spread is replayed from broker-reported quotes on each tick. We do not assume a fixed spread; widening around the open, the close, and news events is preserved.
Slippage is modelled as a function of order size relative to top-of-book depth. The default is conservative; you can tighten it for high-liquidity symbols.
Commissions are applied per side using the schedule of the broker connection you point the backtest at.
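The three cost components above can be sketched as follows. The functional form of the slippage term is an assumption: this page only says slippage depends on order size relative to top-of-book depth, so the linear form, the impact_coeff parameter, and both function names are hypothetical.

```python
def fill_price(side, bid, ask, order_size, top_of_book_depth, impact_coeff=0.5):
    """Estimate a fill price from the quoted bid/ask on the current tick.

    Buys start from the ask and sells from the bid, so the quoted spread is
    paid implicitly. Slippage is then added as a fraction of the spread that
    grows linearly with order_size / top_of_book_depth (an assumed form;
    impact_coeff is a hypothetical tuning parameter).
    """
    if top_of_book_depth <= 0:
        raise ValueError("top_of_book_depth must be positive")
    spread = ask - bid
    slippage = impact_coeff * spread * (order_size / top_of_book_depth)
    if side == "buy":
        return ask + slippage
    if side == "sell":
        return bid - slippage
    raise ValueError("side must be 'buy' or 'sell'")


def net_pnl_long(entry_fill, exit_fill, size, commission_per_side):
    """PnL of a long round trip after commission is charged on both sides."""
    return (exit_fill - entry_fill) * size - 2 * commission_per_side
```

Lowering impact_coeff corresponds to tightening the slippage assumption for a high-liquidity symbol; raising it makes the model more conservative.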

Walk-forward validation

A single fit-and-test split is rarely enough to argue a strategy will hold up. The backtester runs a walk-forward analysis by default:

The window is split into rolling segments — typically six in-sample months followed by two out-of-sample months.
For each segment, the engine evaluates the spec, unchanged, on the in-sample portion and then on the out-of-sample portion, and reports the metrics for each.
The aggregate report shows in-sample versus out-of-sample side by side. A strategy that wins in-sample and breaks out-of-sample is curve-fit, not predictive.
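The segmentation above can be sketched as a small date calculation. This is an illustration, not the engine's code: the function names are hypothetical, dates are assumed to fall on the first of a month, and rolling forward by the out-of-sample length (so that out-of-sample windows tile without overlap) is an assumption this page does not confirm.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def walk_forward_segments(start, end, in_sample_months=6, out_of_sample_months=2):
    """Return (is_start, is_end, oos_start, oos_end) tuples.

    Ranges are half-open: [is_start, is_end) and [oos_start, oos_end).
    A segment is kept only if its out-of-sample window ends on or before `end`.
    """
    segments = []
    is_start = start
    while True:
        is_end = add_months(is_start, in_sample_months)
        oos_end = add_months(is_end, out_of_sample_months)
        if oos_end > end:
            break
        segments.append((is_start, is_end, is_end, oos_end))
        # Assumed roll step: advance by one out-of-sample window.
        is_start = add_months(is_start, out_of_sample_months)
    return segments
```

With a one-year window starting 2023-01-01 and the default six and two months, this yields three segments whose out-of-sample windows cover July through December.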

In-sample vs out-of-sample

Treat in-sample as the rehearsal and out-of-sample as the dress rehearsal: neither is live trading, but the second is the closer approximation. If the out-of-sample equity curve is qualitatively different (say, a steady up-slope flattening into noise), the spec has fitted the historical sample rather than a persistent market structure. Re-author rather than re-tune.

Reading the equity curve

The chart on the report page shows account equity over time, including unrealised PnL. Two annotations to look for:

Drawdown shading. Every peak-to-trough drawdown is shaded; hover for depth and duration. A strategy with twenty short drawdowns is more livable than one with two long ones.
Out-of-sample boundaries. Vertical lines mark each walk-forward boundary. Watch how the curve behaves immediately after it crosses into an out-of-sample segment; that stretch is the most honest read of performance.
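For readers who want to reproduce the drawdown shading from exported equity values, here is a minimal sketch. The function name and the tuple layout are assumptions, duration is expressed in sample indices rather than clock time, and equity values are assumed to be positive.

```python
def drawdowns(equity):
    """Find peak-to-trough drawdown episodes in a list of equity values.

    Returns (peak_index, trough_index, recovery_index, depth) tuples, where
    depth is a fraction of the peak and recovery_index is None if the curve
    has not regained the peak by the end of the series.
    """
    episodes = []
    peak_i = 0
    trough_i = None
    for i in range(1, len(equity)):
        if equity[i] >= equity[peak_i]:
            # Prior peak regained: close any open episode.
            if trough_i is not None:
                depth = (equity[peak_i] - equity[trough_i]) / equity[peak_i]
                episodes.append((peak_i, trough_i, i, depth))
                trough_i = None
            peak_i = i
        elif trough_i is None or equity[i] < equity[trough_i]:
            trough_i = i
    if trough_i is not None:
        depth = (equity[peak_i] - equity[trough_i]) / equity[peak_i]
        episodes.append((peak_i, trough_i, None, depth))
    return episodes
```

Duration from peak to recovery is recovery_index minus peak_index, which makes it easy to compare many short episodes against a few long ones.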

Reading the trades list

Every individual trade appears with entry time, exit time, direction, size, MAE (maximum adverse excursion), MFE (maximum favourable excursion), and the realised PnL. The list is sortable by any column. Common uses:

Sort by MAE to find trades where the stop saved you, or where you were close to a stop-out.
Sort by holding time to detect a strategy that is two strategies in one — a fast scalp and a slow drift — which usually means the exit logic is doing too much.
Filter by session or weekday to find pockets of consistent loss that a session filter could remove.
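As a reminder of what the MAE and MFE columns measure, the sketch below computes both for a single trade. The function name and the choice to express excursions as non-negative price distances are assumptions; the report may use different units.

```python
def excursions(direction, entry_price, prices):
    """Return (mae, mfe) for one trade as non-negative price distances.

    `prices` holds the prices observed while the trade was open and must be
    non-empty. `direction` is "long" or "short".
    """
    if direction not in ("long", "short"):
        raise ValueError("direction must be 'long' or 'short'")
    sign = 1 if direction == "long" else -1
    # Signed move in the trade's favour at each observed price.
    moves = [sign * (p - entry_price) for p in prices]
    mae = max(0.0, -min(moves))  # worst move against the position
    mfe = max(0.0, max(moves))   # best move in favour of the position
    return mae, mfe
```

A trade whose MAE sits just inside the stop distance is one that came close to a stop-out, which is the pattern the MAE sort is meant to expose.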

Common backtest mistakes

Curve fitting

Tweaking a threshold until the equity curve looks perfect on history is one of the most common causes of live failure. If nudging any single parameter by one step flips the verdict, the strategy is over-fit.
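The sensitivity test described above can be sketched as a loop over single-parameter nudges. Everything here is hypothetical: evaluate stands in for whatever you use to turn a parameter set into a pass or fail verdict, and parameters are assumed to be numeric.

```python
def fragile_parameters(evaluate, params, step=1):
    """List the single-parameter nudges that flip the verdict.

    `evaluate` is a hypothetical callable mapping a parameter dict to a
    boolean verdict. Each parameter is moved by -step and +step in turn
    while the others stay fixed.
    """
    base = evaluate(params)
    flips = []
    for name, value in params.items():
        for delta in (-step, step):
            trial = dict(params)
            trial[name] = value + delta
            if evaluate(trial) != base:
                flips.append((name, value + delta))
    return flips
```

An empty result does not prove robustness; a non-empty one is a strong hint that the verdict rests on one exact value.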

Look-ahead bias

A spec that quietly references future data (for example, exiting on the close of the current bar after using its high as a signal) will look brilliant in backtest and fail as soon as it goes live. The engine raises a hard error if it detects this, but custom indicators added in the spec's free-text fields are not checked automatically.
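One common way to keep a custom indicator free of look-ahead is to evaluate the signal only on completed bars and act on the next bar. The sketch below shows that pattern; the function name and the dict-based bar shape are assumptions, and this is not how the engine's own check works.

```python
def entries_without_lookahead(bars, threshold):
    """Return (bar_index, fill_price) pairs for a simple breakout signal.

    `bars` is a list of dicts with "open" and "high" keys (hypothetical
    shape). The signal reads the high of completed bar i, and the order is
    filled at the open of bar i + 1, the first price available afterwards.
    """
    fills = []
    # Stop one bar early: the last bar has no following bar to fill on.
    for i in range(len(bars) - 1):
        if bars[i]["high"] > threshold:
            fills.append((i + 1, bars[i + 1]["open"]))
    return fills
```

The same signal filled at a price from bar i itself would assume knowledge of the bar's high before the bar had finished forming.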

Ignoring spread

Strategies that look profitable in mid-price terms can be unprofitable after spread, especially scalpers. The default spread model is realistic; do not turn it off unless you are deliberately testing the signal in isolation from microstructure costs.

Survivorship in symbol choice

Picking the symbol after seeing which ones backtest well is a survivorship trap. Decide on the symbol from a thesis, then test.

Comparing runs

Every backtest is saved with the exact spec version it ran against. Use the Compare view to diff two runs side by side. The diff shows both the spec change and the metric change, which is the only way to defensibly link a tweak to a result.

Illustrative only

Backtests are illustrative and clearly badged as such across the product. Real markets include events, liquidity regimes, and counterparties you cannot model. Past performance, simulated or otherwise, does not predict future results. Use backtesting to disqualify weak ideas, not to underwrite live risk.

When you are ready to go live, move on to Deployment.