Timeless Markets.Org
Educational only — not financial advice. Backtest results are not future results; a playbook is a process to study and test for yourself.
Build Your Playbook · Lesson 4

Testing for a real edge

The stats that separate a genuine edge from a good-looking chart.

Uses expected value, R-multiples (Van Tharp)

1 The standard of proof

A play is a claim — make it pay its way

Component 9 of the template is the honest one: does this play actually make money over a meaningful sample? A handful of remembered winners isn't evidence — memory keeps the wins and quietly deletes the losses. To trust a play, you measure it.

The good news: you only need a few numbers, and they're the same ones Van Tharp built his work around — losses and wins counted in R, the risk you took on each trade.

2 Think in R

Every result in units of risk

Record each trade's result as a multiple of its 1R risk, not in dollars. Risked $200 and made $600? That's +3R. Stopped out for the planned $200? −1R. Working in R makes trades comparable across account sizes and position sizes, and it's what lets you compute expectancy cleanly.
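As a minimal sketch of that conversion, the snippet below divides a trade's dollar result by the dollars risked on it. The function name `r_multiple` and its parameters are illustrative choices, not part of any library or of the course template.

```python
def r_multiple(pnl: float, risk: float) -> float:
    """Return a trade's result as a multiple of the amount risked (1R).

    pnl  -- dollar profit (positive) or loss (negative) on the trade
    risk -- dollars that were at risk if the stop was hit (must be > 0)
    """
    if risk <= 0:
        raise ValueError("risk must be a positive dollar amount")
    return pnl / risk


# The two cases from the text: risked $200 and made $600, then a full stop-out.
print(r_multiple(600, 200))    # 3.0
print(r_multiple(-200, 200))   # -1.0

# A bigger position with the same outcome relative to risk gives the same R.
print(r_multiple(1500, 500))   # 3.0
```

The last call shows why R makes trades comparable: a $1,500 gain on $500 of risk and a $600 gain on $200 of risk are the same +3R result.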

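Expectancy (R per trade) = (win rate × average win in R) − (loss rate × average loss in R)

The average loss goes into the formula as a positive size (1.0, not −1.0), because the formula already subtracts it.

Example: 40% win rate, average win +2.5R, average loss −1R
= (0.40 × 2.5) − (0.60 × 1.0)
= 1.00 − 0.60
= +0.40R per trade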

A positive expectancy means the play makes money per trade on average, over a large enough sample. Notice the example wins less than half the time and is still strongly profitable — because the wins are bigger than the losses. Win rate alone tells you almost nothing; expectancy is the number that matters. (Full walk-through on expected value.)
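The expectancy formula can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `expectancy` is hypothetical, results are a plain list of R-multiples, and breakeven trades (0R) are counted with the losses, which is one reasonable convention rather than a rule from the lesson.

```python
def expectancy(r_results):
    """Average R per trade, using the win-rate / loss-rate form of the formula."""
    if not r_results:
        raise ValueError("need at least one trade")

    wins = [r for r in r_results if r > 0]
    losses = [r for r in r_results if r <= 0]  # assumption: 0R counts as a loss

    n = len(r_results)
    win_rate = len(wins) / n
    loss_rate = len(losses) / n

    avg_win = sum(wins) / len(wins) if wins else 0.0
    # Average loss as a positive size, since the formula subtracts it.
    avg_loss = -sum(losses) / len(losses) if losses else 0.0

    return win_rate * avg_win - loss_rate * avg_loss


# The lesson's example: 40% winners at +2.5R, 60% losers at -1R.
sample = [2.5] * 4 + [-1.0] * 6
print(round(expectancy(sample), 2))  # 0.4
```

With this convention the result is the same as the plain average of all the R-multiples, which is a quick way to sanity-check a spreadsheet.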

3 Sample size & honesty

Enough trades, fairly counted

4 Filter by conditions

Don't average two different games together

This is the most common reason a real edge hides in the data: you pool instances that happen under different conditions. The clearest case is news. As the trade-the-news lesson shows, the same breakout has very different odds with a fresh catalyst versus without one. Pool them and the average looks mediocre; split them and one bucket may have a strong edge while the other has none.

So tag each instance by its conditions — news vs. no-news, trend vs. range, time of day — and check the stats within each bucket. Often the real discovery isn't "this play works" but "this play works under these specific conditions," which becomes component 3 of your template.
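A rough sketch of the bucket check, assuming each instance is stored as a (tag, R-multiple) pair. The function name `expectancy_by_tag` and the trade list are made up for illustration; the numbers are not real results.

```python
from collections import defaultdict


def expectancy_by_tag(trades):
    """Group (tag, r_multiple) pairs by tag and return {tag: (count, mean R)}."""
    buckets = defaultdict(list)
    for tag, r in trades:
        buckets[tag].append(r)
    return {tag: (len(rs), sum(rs) / len(rs)) for tag, rs in buckets.items()}


# Made-up illustration data: the same breakout, tagged by whether news was present.
trades = [
    ("news", 3.0), ("news", -1.0), ("news", 2.0), ("news", -1.0),
    ("no-news", -1.0), ("no-news", 1.0), ("no-news", -1.0), ("no-news", 1.0),
]

overall = sum(r for _, r in trades) / len(trades)
print(f"overall: {overall:+.3f}R over {len(trades)} trades")

for tag, (count, mean_r) in expectancy_by_tag(trades).items():
    print(f"{tag}: {mean_r:+.3f}R over {count} trades")
```

In this toy data the pooled average is +0.375R, while the news bucket averages +0.75R and the no-news bucket averages 0R. Each bucket holds only four trades, so the split shows the mechanics, not evidence of an edge.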

5 The curve-fitting trap

Beware the perfect backtest

If you keep adding rules until the historical results look flawless, you haven't found an edge — you've memorized the past. A play with ten finely-tuned conditions that worked beautifully on last year's data usually falls apart on next year's. Favour simple, robust plays with few parameters and a thesis that makes sense, and always reserve some data (or forward time) the rules never saw. If the edge only appears after heavy optimization, distrust it.
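One simple way to reserve data the rules never saw is a chronological split: tune on the earlier trades, then check the untouched later ones. The sketch below assumes the R-multiples are already sorted oldest to newest; the names `split_chronologically` and `mean_r`, the 30% holdout, and the sample numbers are all illustrative assumptions.

```python
def split_chronologically(r_results, holdout_fraction=0.3):
    """Split date-ordered results into (in_sample, out_of_sample).

    The most recent `holdout_fraction` of trades is set aside and should
    not be looked at while the rules are being developed.
    """
    if len(r_results) < 2:
        raise ValueError("need at least two trades to split")
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")

    n_holdout = max(1, round(len(r_results) * holdout_fraction))
    n_holdout = min(n_holdout, len(r_results) - 1)  # keep at least one in-sample trade
    cut = len(r_results) - n_holdout
    return r_results[:cut], r_results[cut:]


def mean_r(r_results):
    """Plain average R per trade."""
    return sum(r_results) / len(r_results)


# Made-up results, oldest first.
results = [2.0, -1.0, 3.0, -1.0, 2.5, -1.0, 1.5, -1.0, -1.0, 0.5]
in_sample, out_of_sample = split_chronologically(results)

print(f"in-sample:     {mean_r(in_sample):+.2f}R over {len(in_sample)} trades")
print(f"out-of-sample: {mean_r(out_of_sample):+.2f}R over {len(out_of_sample)} trades")
```

A large gap between the two averages is a warning sign that the rules were fitted to the earlier data. With samples this small the comparison is only suggestive.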

Your task

Pick your play's trigger, find ~20–30 historical instances, and record each as an R-multiple, tagged by conditions. Compute the win rate, average win/loss, and expectancy — overall and within the most important condition bucket. Fill in component 9. If expectancy isn't positive, the next lesson's review loop is how you fix or retire it.