Does backtesting work? / Why doesn't backtesting work? - Genius Mathematics Consultants (2024)

If a trading strategy seems to backtest successfully, why doesn’t it always work in live trading?

It’s widely acknowledged that a strategy that worked in the past may not work in the future. Market conditions change, and other participants change their algorithms and adapt to your attempts to pick them off. This means you need to continually monitor and adjust even profitable strategies.

But there’s something even more problematic about backtesting strategies, which fewer people understand clearly: a profitable backtest does not prove that a strategy “worked”, even in the past. The reason is that most backtests do not achieve any kind of “statistical significance”.

It’s trivial to tailor a strategy that works beautifully on any given piece of historical data. One simply contrives rules that fit the idiosyncratic features of that particular dataset, and then shows that they are profitable when backtested. But when no mechanism actually exists relating the signal to future price movement, the strategy will fail in live trading.

So how does one tell the difference? How can one show that a backtest is not only profitable, but statistically significant?

Statistical hypothesis testing in trading

If you’ve studied some basic statistics, you’ve probably heard of hypothesis testing.

In hypothesis testing, it’s not enough for a model to fit the data. It’s got to fit the data in a way that is “statistically significant”. This means that the model would be unlikely to fit the data as well as it does by chance, or for any reason other than the model really being valid. If the model were not valid, the fit would have to be an “unlikely coincidence”.

One proposes some hypothesis about the data, and then considers the probability (called the p-value) that agreement at least as good as that observed between the data and the hypothesis would occur by chance alone. By convention, if the p-value is less than 5%, the result is considered statistically significant.

It’s worthwhile to place backtesting within this framework of hypothesis testing to help understand what, if anything, we can really conclude from a given backtest.

Coin toss trading

Let’s keep it simple to start with. Let’s suppose we have an algorithm which predicts, at time steps \(t_1, \ldots, t_n\), whether the asset will subsequently move up (change \(\geq 0\)) or down (change \(<0\)) over some time interval \(\Delta T\). We then run a backtest and find that our algorithm was right a fraction \(x\) of the time, where \(0 \leq x \leq 1\).

If our algorithm was right more than half of the time during the backtest, what’s the probability that it was right only by chance? This is calculated using the binomial distribution. To see some numbers, let’s suppose our algorithm makes 20 predictions (\(n=20\)) and is right for 12 of them. The probability of getting at least 12 right entirely by chance is about 25%. If it’s right for 14 of them, the probability of doing at least that well by chance is about 5.8%. This is approaching statistical significance according to convention. The idea is that it’s “unlikely” that our strategy is right by chance, therefore the mechanism proposed by the strategy is likely correct. So if our algorithm got 15 or more correct during the backtest (a chance probability of about 2%), we’re in the money, right? Not so fast.
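
The figures quoted above can be reproduced with a few lines of Python. This is a minimal sketch assuming a one-sided test against a fair coin; the function name `p_value_at_least` is an illustrative choice, not something from the article.

```python
from math import comb

def p_value_at_least(correct: int, n: int) -> float:
    """Chance of getting `correct` or more right out of n fair coin-toss guesses."""
    favourable = sum(comb(n, k) for k in range(correct, n + 1))
    return favourable / 2 ** n

# 12 of 20 is about 25%, 14 of 20 about 5.8%, 15 of 20 about 2.1%.
for correct in (12, 14, 15):
    print(correct, f"{p_value_at_least(correct, 20):.1%}")
```

The sum runs over every outcome at least as good as the one observed, which is why the probabilities are "at least 12 right" rather than "exactly 12 right".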

To take an extreme example, let’s suppose that our piece of historical data was a spectacular bitcoin bull run that went up 20 times in a row. And let’s suppose that our strategy is “Bitcoin only goes up!” Then our calculation above would give a p-value of about 0.0001%, apparently proving that the strategy works! What’s gone wrong here?

When calculating the p-value for a linear regression, for example, standard statistics usually assumes that the “noise” in the data is random and normally distributed. Our analysis above made an analogous assumption, and this was our first mistake: we assumed that the actual price trajectory is like a coin toss, equally likely to go up or down at each step. But market movements are not random in this simple sense. They can, for example, be highly autocorrelated. And they can go up in a highly non-random way for quite some time, before turning around and going down.
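
To see how much autocorrelation matters, here is a small simulation sketch. It assumes a toy market in which each move repeats the previous direction with probability `persistence` (a hypothetical parameter; 0.5 reproduces a fair coin), and it measures how often the "always predict up" strategy scores 15 or more out of 20.

```python
import random

def count_ups(n: int, persistence: float, rng: random.Random) -> int:
    """Count up-moves in n steps where each move repeats the previous
    direction with probability `persistence` (0.5 gives a fair coin)."""
    up = rng.random() < 0.5
    ups = int(up)
    for _ in range(n - 1):
        if rng.random() >= persistence:
            up = not up  # the trend reverses
        ups += int(up)
    return ups

def always_up_pass_rate(persistence: float, n: int = 20, needed: int = 15,
                        trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated histories in which 'always predict up'
    gets at least `needed` of n predictions right."""
    rng = random.Random(seed)
    passes = sum(count_ups(n, persistence, rng) >= needed for _ in range(trials))
    return passes / trials

coin_toss_rate = always_up_pass_rate(persistence=0.5)
trending_rate = always_up_pass_rate(persistence=0.9)
print(f"independent moves: {coin_toss_rate:.3f}")
print(f"trending moves:    {trending_rate:.3f}")
```

With independent moves the pass rate should sit near the 2% predicted by the binomial calculation. With trending moves it is several times higher, even though the strategy contains no information at all, so the coin-toss p-value badly overstates the significance.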

Secondly, we presumably looked at the data before deciding on the strategy. If you’re allowed to look at the data first, it’s easy to contrive a strategy that exactly matches what the data happened to do. In this case, it’s not “unlikely” that our strategy is profitable by mere coincidence, because we simply chose the strategy that we could see matched the data.

Another thing that can destroy statistical validity is testing multiple models. Suppose a given model has a p-value of 0.05, that is, it has only a 5% chance of appearing correct by chance. But now suppose you test 20 different models. Suddenly it’s quite likely (about 64%, if the models are independent) that at least one of them will backtest successfully by chance alone. This sort of scenario can easily arise when a strategy is tested for many different choices of parameters, and the one that works is chosen. This is why strategy “optimization” needs to be done carefully.
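
The arithmetic behind the multiple-models problem is short enough to check directly. The sketch below assumes the models are independent and all worthless, which is a simplification; variants of one strategy with different parameters are usually correlated. The function name is illustrative.

```python
def chance_of_false_winner(p_value_threshold: float, n_models: int) -> float:
    """Probability that at least one of n independent, worthless models
    clears the significance threshold purely by luck."""
    return 1 - (1 - p_value_threshold) ** n_models

# With a 5% threshold, 20 models give roughly a 64% chance of a lucky winner.
for n_models in (1, 5, 20, 100):
    print(n_models, f"{chance_of_false_winner(0.05, n_models):.0%}")
```

One common remedy, not discussed in the article, is to tighten the per-model threshold as the number of models grows, for example by dividing 5% by the number of models tried (the Bonferroni correction).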

So how do you backtest successfully?

In practice, we wouldn’t be checking whether the asset goes up or down. Instead, we’d likely check, across all pairs of buy and sell decisions, whether the sell price minus the buy price amounted to a profit greater than buy and hold. We would then ask: what is the probability that this apparent fit occurred by chance, and the strategy doesn’t really work? If it seems unlikely that the observed fit could be a coincidence, we may be onto a winner.
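
The article does not prescribe a specific test for this. One possible sketch is a permutation test: keep the price changes fixed, shuffle the times at which the strategy was in the market, and see how often random timing does at least as well. The function names and the representation of a strategy as a list of 0/1 positions are assumptions made for illustration. Shuffling also assumes the price changes are exchangeable, which autocorrelated markets violate, so the result is only a rough guide.

```python
import random

def strategy_pnl(price_changes, positions):
    """Profit from holding `positions[i]` units over price change i."""
    return sum(change * held for change, held in zip(price_changes, positions))

def permutation_p_value(price_changes, positions, trials=10_000, seed=0):
    """Share of random re-timings of the same positions that earn at
    least as much as the actual strategy did."""
    rng = random.Random(seed)
    observed = strategy_pnl(price_changes, positions)
    shuffled = list(positions)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if strategy_pnl(price_changes, shuffled) >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (trials + 1)

# Toy data (hypothetical): a strategy that is long only on the up-moves.
changes = [1, -1, 2, -2, 1, -1, 3, -3, 1, -1, 2, -2]
perfect_timing = [1 if c > 0 else 0 for c in changes]
buy_and_hold = [1] * len(changes)
print(permutation_p_value(changes, perfect_timing))
print(permutation_p_value(changes, buy_and_hold))
```

Because the shuffle keeps the same amount of time in the market, the comparison is against random timing with equal exposure. Buy and hold has no timing to shuffle, so its p-value is exactly 1.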

On the other hand, a trader may have some external or pre-existing reason for believing that a strategy could work. In this case, they may not require the same degree of statistical significance. This is analogous to Bayesian statistics, where one incorporates a prior belief into the statistical analysis.
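
As a rough illustration of how a prior belief changes the conclusion, Bayes’ rule can be applied to the question “given a significant backtest, how likely is the strategy to be real?”. The 80% detection rate and the two priors below are hypothetical numbers chosen for the example, not figures from the article.

```python
def posterior_strategy_is_real(prior: float, power: float = 0.8,
                               false_positive_rate: float = 0.05) -> float:
    """Bayes' rule: probability the strategy is real, given that its
    backtest came out statistically significant.

    prior: belief that the strategy is real before seeing the backtest.
    power: chance a real strategy produces a significant backtest (assumed).
    false_positive_rate: chance a worthless strategy does so (the 5% level).
    """
    true_positive = power * prior
    false_positive = false_positive_rate * (1 - prior)
    return true_positive / (true_positive + false_positive)

# A strategy found by blind data mining versus one with a sound rationale.
print(f"{posterior_strategy_is_real(prior=0.01):.0%}")  # about 14%
print(f"{posterior_strategy_is_real(prior=0.50):.0%}")  # about 94%
```

The same significant backtest is weak evidence for a strategy with no rationale behind it and strong evidence for one the trader already had good reason to believe in.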

Now, HFT (high frequency trading) backtests can often achieve statistical significance much more easily because of the large amount of data and the large number of buy/sell decisions in a short space of time. More pedestrian strategies will have a harder time.
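
The effect of sample size can be seen by holding the hit rate fixed and varying the number of predictions. The sketch below reuses the coin-toss model from earlier, with all its caveats, and assumes a hypothetical strategy that is right 55% of the time.

```python
from math import comb

def binomial_p_value(correct: int, n: int) -> float:
    """Chance of `correct` or more hits in n fair coin-toss guesses."""
    favourable = sum(comb(n, k) for k in range(correct, n + 1))
    return favourable / 2 ** n

# The same 55% hit rate at three different sample sizes.
for n in (20, 200, 2000):
    correct = 55 * n // 100
    print(n, correct, f"{binomial_p_value(correct, n):.2g}")
```

At 20 predictions a 55% hit rate is indistinguishable from luck, at 200 it is still short of the 5% convention, and at 2,000 the chance explanation becomes very hard to sustain.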

So does machine learning work for trading?

People often ask whether machine learning techniques are effective for developing trading strategies. The answer is: it depends on how they’re applied. When machine learning models are fit to data, they produce certain “p-value” statistics which are vulnerable to all the issues we’ve discussed. Therefore, some care is needed to ensure the models are in fact statistically significant.
