Backtesting honestly
Most retail backtests are lies. Curve-fitting, lookahead bias, ignored spreads, fake fills. Here is how TICPOZ approaches honest validation.
I will say something uncomfortable up front. Most published retail backtests are lies. Not because the people running them are dishonest, but because the tools they use make honesty optional and the audience they post to rewards results that look good. The incentive compounds: equity curves that go up and to the right with smooth, shallow drawdowns are the ones that get shared, so those are the ones people produce. The methodology that produces such curves is almost always wrong.
This post is about the specific ways backtests deceive their authors, and what we do inside TICPOZ to make our backtests less prone to those failure modes. We are not claiming we have eliminated bias. We are claiming we have eliminated the easy biases and we surface the hard ones.
The six common lies
1. Curve-fitting
Curve-fitting is the act of tuning a strategy's parameters until they fit the historical data closely, then publishing the fit as if it were a prediction. A strategy with five parameters and a one-year history has far more freedom than the data can constrain. You can almost always find a parameter combination that turns historical noise into a beautiful equity curve. That curve has zero predictive power.
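To make that concrete, here is a minimal sketch of a parameter sweep over pure noise. Everything in it is hypothetical: the toy momentum rule, the parameter grid, the seed, and the 350/150 split are illustration choices, not anything taken from TICPOZ. The returns are random by construction, so any "edge" the sweep finds in-sample is an artifact of searching.

```python
import random


def make_noise_returns(n, seed):
    """Generate n independent Gaussian returns: pure noise, no edge."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]


def momentum_pnl(returns, lookback, threshold):
    """Toy rule: go long for one bar if the trailing sum of returns exceeds
    threshold, short if it is below -threshold. Uses only past returns."""
    pnl = 0.0
    for t in range(lookback, len(returns)):
        trailing = sum(returns[t - lookback:t])  # bars t-lookback .. t-1
        if trailing > threshold:
            pnl += returns[t]
        elif trailing < -threshold:
            pnl -= returns[t]
    return pnl


def sweep(returns, lookbacks, thresholds):
    """Evaluate every parameter combination on the same data."""
    return {
        (lb, th): momentum_pnl(returns, lb, th)
        for lb in lookbacks
        for th in thresholds
    }


returns = make_noise_returns(500, seed=7)
in_sample, out_of_sample = returns[:350], returns[350:]

results = sweep(in_sample, range(2, 30), [0.0, 0.005, 0.01, 0.02])
best_params = max(results, key=results.get)
best_in = results[best_params]
best_out = momentum_pnl(out_of_sample, *best_params)

print("best in-sample params:", best_params)
print("in-sample pnl:", best_in)
print("out-of-sample pnl with the same params:", best_out)
```

With 112 combinations to choose from, the best in-sample figure will usually look respectable even though there is nothing to find. The out-of-sample number for the same parameters is the one that tells you whether the rule learned anything.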
2. Lookahead bias
Lookahead bias is using information at time T that you could not have known at time T. The classic case is calculating an indicator using the day's high or low and then "trading" off it during that same day. Less obviously, it shows up in libraries that return rolling values with the wrong alignment, or in code that uses bar-close prices for signals but bar-open prices for fills without checking whether the signal was even knowable at the open.
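The alignment problem is easier to see in code. The sketch below is a hypothetical example (a simple moving-average rule, invented function names) that contrasts a signal which reads the current bar's close with one that only reads bars that had already closed when the current bar opened.

```python
def sma(values, window):
    """Simple moving average; entries before the window fills are None."""
    out = [None] * len(values)
    for i in range(window - 1, len(values)):
        out[i] = sum(values[i - window + 1:i + 1]) / window
    return out


def signals_with_lookahead(closes, window):
    """WRONG if traded at the open of bar i: signal[i] reads close[i],
    which is not known until bar i has finished."""
    avg = sma(closes, window)
    return [
        None if avg[i] is None else closes[i] > avg[i]
        for i in range(len(closes))
    ]


def signals_knowable_at_open(closes, window):
    """signal[i] uses only closes up to bar i-1, so it is knowable
    at the open of bar i."""
    avg = sma(closes, window)
    signals = [None] * len(closes)
    for i in range(window, len(closes)):
        signals[i] = closes[i - 1] > avg[i - 1]
    return signals


base = [1, 2, 3, 4, 5]
shocked = [1, 2, 3, 4, -100]  # same history, different final close

# The honest signal cannot react to a close that has not happened yet.
print(signals_knowable_at_open(base, 2) == signals_knowable_at_open(shocked, 2))
# The lookahead signal for the final bar changes with that bar's own close.
print(signals_with_lookahead(base, 2)[-1], signals_with_lookahead(shocked, 2)[-1])
```

A cheap test for this class of bug is the one used here: perturb the final bar and check that no signal tradeable at or before that bar's open changes.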
3. Survivorship
Survivorship bias is testing on the universe of instruments that still exist today, which excludes everything that went to zero or got delisted along the way. A "stock-picking strategy" backtested on the current S&P 500 constituents is testing on a universe whose membership was selected, in part, on the basis of having survived.
4. Ignored spread
Spread is the gap between bid and ask. Every trade you enter at market pays the spread. Strategies that hold positions for minutes and trade frequently are dominated by spread cost. A retail backtest that uses mid-price for both entry and exit overstates returns by roughly one full spread per round trip, so the error scales with trade frequency. For high-frequency strategies the bias is catastrophic: the strategy is unprofitable in live trading and the backtest never warned you.
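A worked example of the arithmetic, using made-up quotes with a constant two-pip spread. The function name and the prices are assumptions for illustration only. A long round trip buys at the ask and sells at the bid; a mid-price backtest pretends both happen at the midpoint.

```python
from decimal import Decimal


def long_round_trip_pnl(entry_bid, entry_ask, exit_bid, exit_ask, use_mid):
    """PnL per unit for a long trade, priced either at mid (optimistic)
    or at ask-in / bid-out (what a market order actually pays)."""
    if use_mid:
        entry = (entry_bid + entry_ask) / 2
        exit_price = (exit_bid + exit_ask) / 2
    else:
        entry = entry_ask      # buying lifts the ask
        exit_price = exit_bid  # selling hits the bid
    return exit_price - entry


# Hypothetical quotes: the market moves up 5 pips, spread stays at 2 pips.
entry_bid, entry_ask = Decimal("1.1000"), Decimal("1.1002")
exit_bid, exit_ask = Decimal("1.1005"), Decimal("1.1007")

mid_pnl = long_round_trip_pnl(entry_bid, entry_ask, exit_bid, exit_ask, True)
real_pnl = long_round_trip_pnl(entry_bid, entry_ask, exit_bid, exit_ask, False)
overstatement_per_trade = mid_pnl - real_pnl
overstatement_1000_trades = overstatement_per_trade * 1000

print(mid_pnl, real_pnl, overstatement_per_trade, overstatement_1000_trades)
```

The mid-price version reports 5 pips on a trade that really made 3. The 2-pip gap is the average of the entry and exit spreads, and over a thousand round trips it adds up to 2,000 pips of profit that never existed.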
5. Fake fills
Backtests assume your order fills at the price you wanted. In real markets, your fill depends on what is sitting in the order book at that instant, which execution venue the broker routes the order to, and how much liquidity exists. At news events, in thin sessions, or on small-cap instruments, the fill can be several pips away from the price your signal saw. Strategies that look profitable in backtest because they catch the first tick of a move often look unprofitable in live trading because they catch the first tick at three pips of slippage.
6. Ignored slippage on stops
This is related to fake fills but deserves its own item. A stop-loss in a backtest exits at the stop price. A stop-loss in real markets becomes a market order when the stop price prints, and that market order fills at whatever the market then offers (the current bid, for a long position), which during a gap can be twenty or thirty pips worse. Strategies that rely on tight stops are particularly vulnerable. The backtest shows a 10-pip loss; the real world delivers a 30-pip loss; the difference is the whole edge.
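Here is how "the difference is the whole edge" can play out in numbers. The win rate, average win, and the share of stops that gap are all invented for this sketch; only the 10-pip and 30-pip loss figures come from the paragraph above.

```python
from fractions import Fraction


def expectancy_pips(win_rate, avg_win_pips, stop_pips,
                    gap_rate=Fraction(0), gap_loss_pips=0):
    """Expected pips per trade. A fraction gap_rate of losing trades
    fills at gap_loss_pips instead of the nominal stop_pips."""
    loss_rate = 1 - win_rate
    normal_loss = loss_rate * (1 - gap_rate) * stop_pips
    gapped_loss = loss_rate * gap_rate * gap_loss_pips
    return win_rate * avg_win_pips - normal_loss - gapped_loss


# Hypothetical strategy: wins 40% of the time, 20 pips per win, 10-pip stop.
backtest_view = expectancy_pips(Fraction(2, 5), 20, 10)

# Same strategy if one losing trade in five gaps through to a 30-pip loss.
live_view = expectancy_pips(Fraction(2, 5), 20, 10,
                            gap_rate=Fraction(1, 5), gap_loss_pips=30)

print(backtest_view, live_view)
```

Under these assumptions the backtest shows +2 pips per trade and the live version shows -2/5 of a pip. Nothing about the signal changed; only the stop fills did.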
What we do inside TICPOZ
Our backtest engine is not magic, but it refuses several of the common shortcuts.
Real broker tick data
We pull ticks from the venue your account is connected to, at the resolution the broker records them. This is not aggregated minute-bar data with synthesized intra-bar prices. It is the actual bid-ask quotes the broker timestamped during the historical window. If the broker had a gap, our data has a gap. If the broker had unusual spreads during rollover, our data has those spreads. The backtest cannot pretend the market was nicer than it actually was.
Broker-actual spread
Every order in the backtest pays the spread that was quoted at that exact tick. If you try to enter long at 09:31:00.124, you pay the ask at 09:31:00.124, not the mid. Exit prices follow the same rule. The cost shows up in the equity curve immediately, which kills high-frequency strategies before you spend three months running them live.
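As a rough illustration of the rule, not the engine's actual code: the sketch below fills a market order against the most recent quote at or before the order timestamp. The `Tick` type, the function names, the millisecond-since-midnight timestamps, and the "latest quote at or before" lookup are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    ts_ms: int   # hypothetical: milliseconds since midnight
    bid: float
    ask: float


def quote_at(ticks, ts_ms):
    """Most recent tick at or before ts_ms. ticks must be sorted by ts_ms."""
    idx = bisect_right([t.ts_ms for t in ticks], ts_ms) - 1
    if idx < 0:
        raise ValueError("no quote available at or before this timestamp")
    return ticks[idx]


def market_fill_price(ticks, ts_ms, side):
    """Buys pay the quoted ask, sells receive the quoted bid. Never the mid."""
    quote = quote_at(ticks, ts_ms)
    if side == "buy":
        return quote.ask
    if side == "sell":
        return quote.bid
    raise ValueError("side must be 'buy' or 'sell'")


# 09:31:00.124 expressed as milliseconds since midnight.
T = 9 * 3_600_000 + 31 * 60_000 + 124

ticks = [
    Tick(T - 24, 1.1000, 1.1002),
    Tick(T, 1.1001, 1.1004),       # spread widens to 3 pips at this tick
    Tick(T + 76, 1.1003, 1.1005),
]

print(market_fill_price(ticks, T, "buy"))        # ask quoted at 09:31:00.124
print(market_fill_price(ticks, T + 26, "sell"))  # bid from the latest prior tick
```

Rebuilding the timestamp list on every lookup is wasteful; a real engine would index the ticks once. The point of the sketch is only that the fill price is read from the quote, so a wider spread at that instant costs more.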
Broker-actual fill behaviour
Our fill model is calibrated against observed fills on the same broker. For market orders in normal conditions, fills happen at the quoted ask or bid. For market orders during news or thin sessions, we widen the fill by a calibrated slippage distribution. For stop orders that trigger during a gap, the fill walks to the next quoted price after the gap, which is often worse than the stop level. None of this is "punishment for being aggressive in backtest"; it is what would have happened.
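The stop-order case is the easiest part of this to sketch. The function below is a simplified, hypothetical version for a long position only: it scans the bid series and fills at the first bid at or below the stop, which after a gap is the post-gap quote rather than the stop level. It leaves out the calibrated slippage distribution entirely.

```python
def long_stop_fill(bids, stop_price):
    """Return (index, fill_price) for a sell-stop protecting a long position:
    the first quoted bid at or below stop_price. None if never triggered."""
    for i, bid in enumerate(bids):
        if bid <= stop_price:
            return i, bid
    return None


# Hypothetical bid series with a gap between the second and third quotes.
bids = [1.1010, 1.1005, 1.0975, 1.0980]
stop = 1.1000

triggered = long_stop_fill(bids, stop)
print(triggered)  # the fill is the post-gap bid, not the stop level

untouched = long_stop_fill([1.1010, 1.1005], stop)
print(untouched)
```

In this series the price never trades at 1.1000. A backtest that exits at the stop price books a fill that no one could have obtained; the scan above books 1.0975, which is 25 pips worse.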
Walk-forward validation by default
Every backtest is split into in-sample and out-of-sample sections. Parameters are tuned only on the in-sample portion. Results are reported on the out-of-sample portion. You cannot override this without explicitly clicking through a "use full history for reporting" toggle, and when you do, the resulting metrics get a different colour and a prominent warning. The default is the honest path; cheating requires a deliberate action.
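One common way to organise such a split is a rolling walk-forward scheme, sketched below. The window sizes and the function name are hypothetical, and the post does not say whether TICPOZ uses rolling windows or a single chronological split, so treat this as one possible shape rather than a description of the engine.

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train_range, test_range) index pairs. Each test window
    immediately follows its train window, and the scheme steps forward
    by test_size so test windows never overlap each other."""
    windows = []
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size
    return windows


windows = walk_forward_windows(n_bars=10, train_size=4, test_size=2)
for train, test in windows:
    # Tune parameters on `train` only; report metrics on `test` only.
    print(list(train), "->", list(test))
```

The invariant that matters is that every test index is later than every index in its own train window, so nothing used for tuning can leak from the future.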
Mandatory out-of-sample portion
We refuse to publish a backtest result to other users — for example, in our strategy marketplace — unless at least thirty percent of the data was held out as out-of-sample. Strategies that fit beautifully in-sample but fall apart out-of-sample do not get listed. This is unpopular with strategy sellers. We are okay with that.
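A listing gate along these lines could be sketched as follows. The thirty percent floor comes from the paragraph above; everything else is an assumption, including measuring the holdout by bar count, comparing per-bar returns, and the `min_retention` threshold of one half, which is an invented placeholder for whatever "falls apart" means in practice.

```python
def listing_eligible(n_in_sample, n_out_of_sample,
                     in_sample_return_per_bar, out_of_sample_return_per_bar,
                     min_holdout_pct=30, min_retention=0.5):
    """Return True only if enough data was held out AND the out-of-sample
    per-bar return keeps at least min_retention of the in-sample figure.
    min_retention is a hypothetical threshold chosen for illustration."""
    total = n_in_sample + n_out_of_sample
    if total == 0:
        return False
    # Integer comparison avoids floating-point edge cases at exactly 30%.
    if n_out_of_sample * 100 < min_holdout_pct * total:
        return False
    if in_sample_return_per_bar <= 0:
        return False  # nothing worth listing even in-sample
    return out_of_sample_return_per_bar >= min_retention * in_sample_return_per_bar


print(listing_eligible(700, 300, 0.0010, 0.0006))  # enough holdout, edge persists
print(listing_eligible(800, 200, 0.0010, 0.0009))  # only 20% held out
print(listing_eligible(700, 300, 0.0010, 0.0001))  # edge collapses out-of-sample
```

Per-bar returns are used so that the shorter out-of-sample window is not penalised simply for containing fewer bars.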
The Illustrative badge
Every metric we display, including the metrics from in-house strategies, carries an ILLUSTRATIVE or HYPOTHETICAL label. This is not a legal fig-leaf. It is a real claim about the limits of what a backtest can prove. A backtest is evidence that a strategy would have made money in the past under specific assumptions. It is not a prediction that the strategy will make money in the future.
What we have not solved
Three honest admissions.
First, we cannot eliminate curve-fitting. We can flag suspicious-looking parameter sweeps, refuse to publish strategies whose in-sample-vs-out-of-sample gap is too large, and pre-fill our parameter dialogs with sensible defaults that discourage tweaking. But a determined user can still over-fit. The only real defence is statistical discipline by the strategy author, and we cannot enforce discipline.
Second, our tick data goes back as far as the broker keeps it, which is typically two to four years. Strategies that depend on regimes longer than that — for example, the behaviour of FX during the 2008 crisis — cannot be fully validated. We disclose the available range for every instrument; we do not extrapolate.
Third, our slippage model is calibrated against observed historical fills. Live markets can produce slippage outside any historical distribution, particularly during black-swan events. A backtest that survives our slippage model is not guaranteed to survive a flash crash. Nothing is.
The closing thought
A backtest is a hypothesis, not a result. The hypothesis is: under these specific historical conditions, with these specific data assumptions, a strategy of this shape would have produced this curve. The test of the hypothesis is the live track. There is no substitute.
Be suspicious of platforms that show you backtests without showing you live performance. Be suspicious of strategies whose live performance is invisible because they only just launched. Be suspicious of your own backtests, especially the ones that look good. The ones that look mediocre are usually closer to the truth.