Testing for a Real Edge | Build Your Playbook

1 The standard of proof

A play is a claim — make it pay its way

Component 9 of the template is the honest one: does this play actually make money over a meaningful sample? A handful of remembered winners isn't evidence — memory keeps the wins and quietly deletes the losses. To trust a play, you measure it.

The good news: you only need a few numbers, and they're the same ones Van Tharp built his work around: wins and losses counted in R, where 1R is the amount you risked on the trade.

2 Think in R

Every result in units of risk

Record each trade's result as a multiple of its 1R risk, not in dollars. Risked $200 and made $600? That's +3R. Stopped out for the planned $200? −1R. Working in R makes trades comparable across account sizes and position sizes, and it's what lets you compute expectancy cleanly.
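As a minimal sketch of that conversion, the snippet below divides a trade's dollar result by the dollars risked on it. The function name `r_multiple` and its parameters are illustrative, not part of the template.

```python
def r_multiple(pnl_dollars: float, risk_dollars: float) -> float:
    """Return a trade's result as a multiple of its planned 1R risk."""
    if risk_dollars <= 0:
        raise ValueError("risk_dollars must be positive")
    return pnl_dollars / risk_dollars


# The two examples from the text above.
print(r_multiple(600, 200))    # 3.0
print(r_multiple(-200, 200))   # -1.0

# Same R result on a position ten times the size: the trades are comparable.
print(r_multiple(6000, 2000))  # 3.0
```

Dividing by the planned risk is what removes account size and position size from the comparison.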

Expectancy (R per trade) = (win rate × average win in R) − (loss rate × average loss in R)

Example: 40% win rate, average win +2.5R, average loss −1R (entered in the formula as its size, 1.0)
= (0.40 × 2.5) − (0.60 × 1.0)
= 1.00 − 0.60
= +0.40R per trade

A positive expectancy means the play makes money per trade on average, over a large enough sample. Notice the example wins less than half the time and is still strongly profitable — because the wins are bigger than the losses. Win rate alone tells you almost nothing; expectancy is the number that matters. (Full walk-through on expected value.)
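Here is one way to compute expectancy from a list of R results, sketched under two assumptions: the function name `expectancy` is made up for illustration, and break-even trades (0R) are counted on the loss side. The second choice does not change the result, because expectancy worked out this way always equals the plain average of the R results.

```python
def expectancy(r_results):
    """Expectancy in R per trade from a list of R-multiples."""
    n = len(r_results)
    if n == 0:
        raise ValueError("need at least one trade")

    wins = [r for r in r_results if r > 0]
    losses = [r for r in r_results if r <= 0]  # assumption: 0R counts as a loss

    win_rate = len(wins) / n
    loss_rate = len(losses) / n
    avg_win = sum(wins) / len(wins) if wins else 0.0
    # Average loss as a positive size, matching the formula's minus sign.
    avg_loss = -sum(losses) / len(losses) if losses else 0.0

    return win_rate * avg_win - loss_rate * avg_loss


# The worked example: 40% winners at +2.5R, 60% losers at -1R.
sample = [2.5] * 4 + [-1.0] * 6
print(round(expectancy(sample), 2))  # 0.4
```

If you already have every trade in R, `sum(sample) / len(sample)` gives the same number. The longer form is useful because it also shows you the win rate and the average win and loss that produce it.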

3 Sample size & honesty

Enough trades, fairly counted

· Gather a real sample. A few trades prove nothing, because variance dominates. Aim for dozens of instances (backtested, paper, or small live) before you trust the stats. The rarer the play, the longer this takes.
· Count every instance, including the ones you'd skip. If you only log the trades you remember taking, you've already biased the result. Define the play's trigger and count every time it occurred in your test window.
· Separate backtest from forward test. A backtest checks the idea on history; a forward (paper or small-size) test checks that you can actually execute it in real time. Both matter: many plays look great in hindsight and fall apart live.
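To see why a few trades prove nothing, you can put a rough error bar on expectancy. The sketch below uses the standard error of the mean (sample standard deviation divided by the square root of the trade count). That is a common statistical yardstick rather than something this lesson prescribes, and the function name and sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(r_results):
    """Return (average R, standard error of that average)."""
    if len(r_results) < 2:
        raise ValueError("need at least two trades to estimate variability")
    return mean(r_results), stdev(r_results) / sqrt(len(r_results))


# Hypothetical samples with the same 40% / +2.5R / -1R profile.
small = [2.5] * 4 + [-1.0] * 6     # 10 trades
large = [2.5] * 20 + [-1.0] * 30   # 50 trades

for label, sample in (("10 trades", small), ("50 trades", large)):
    avg, se = mean_and_standard_error(sample)
    print(f"{label}: {avg:+.2f}R ± {se:.2f}R")
# 10 trades: +0.40R ± 0.57R
# 50 trades: +0.40R ± 0.24R
```

With ten trades the uncertainty is larger than the measured edge, so the result is compatible with no edge at all. With fifty it starts to mean something. Treat the error bar as a rough guide only, since trade results are rarely normally distributed.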

4 Filter by conditions

Don't average two different games together

This is the most common reason a real edge hides in the data: you pool instances that happen under different conditions. The clearest case is news. As the trade-the-news lesson shows, the same breakout has very different odds with a fresh catalyst versus without one. Pool them and the average looks mediocre; split them and one bucket may have a strong edge while the other has none.

So tag each instance by its conditions — news vs. no-news, trend vs. range, time of day — and check the stats within each bucket. Often the real discovery isn't "this play works" but "this play works under these specific conditions," which becomes component 3 of your template.
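A small sketch of that bucketing, assuming each logged instance is a dict with an `r` result and a tag such as `catalyst`. The field names, the helper `stats_by_bucket`, and the ten trades are all invented for illustration, and five trades per bucket is far too few to trust in practice.

```python
from collections import defaultdict
from statistics import mean


def stats_by_bucket(trades, key):
    """Group R results by a condition tag and summarise each bucket."""
    buckets = defaultdict(list)
    for trade in trades:
        buckets[trade[key]].append(trade["r"])
    return {
        name: {
            "n": len(rs),
            "win_rate": sum(r > 0 for r in rs) / len(rs),
            "expectancy": mean(rs),  # average R equals expectancy
        }
        for name, rs in buckets.items()
    }


# Hypothetical log of one breakout play, tagged by catalyst.
trades = (
    [{"r": r, "catalyst": "news"} for r in (2.0, -1.0, 1.5, -1.0, 2.0)]
    + [{"r": r, "catalyst": "no_news"} for r in (-1.0, 1.0, -1.0, -1.0, 0.5)]
)

print(f"pooled: {mean(t['r'] for t in trades):+.2f}R")
for name, s in stats_by_bucket(trades, "catalyst").items():
    print(f"{name}: n={s['n']} win_rate={s['win_rate']:.0%} "
          f"expectancy={s['expectancy']:+.2f}R")
```

In this made-up log the pooled figure is a mediocre +0.20R, while the split shows +0.70R with news and −0.30R without. Swapping `"catalyst"` for another tag, such as trend versus range or time of day, repeats the check along a different dimension.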

5 The curve-fitting trap

Beware the perfect backtest

If you keep adding rules until the historical results look flawless, you haven't found an edge — you've memorized the past. A play with ten finely-tuned conditions that worked beautifully on last year's data usually falls apart on next year's. Favour simple, robust plays with few parameters and a thesis that makes sense, and always reserve some data (or forward time) the rules never saw. If the edge only appears after heavy optimization, distrust it.
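One simple way to reserve data the rules never saw is a chronological split: tune on the earlier trades, then check expectancy once on the later ones. The sketch below assumes your R results are already in date order. The 30% holdout fraction, the function name `split_holdout`, and the sample numbers are arbitrary choices for illustration.

```python
from statistics import mean


def split_holdout(r_results, holdout_fraction=0.3):
    """Split date-ordered results into (in_sample, out_of_sample)."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    n_holdout = round(len(r_results) * holdout_fraction)
    cut = len(r_results) - n_holdout
    return r_results[:cut], r_results[cut:]


# Hypothetical R results, oldest first.
history = [2.0, -1.0, 1.5, -1.0, 2.5, -1.0, 1.0, -1.0, -1.0, 0.5]

in_sample, out_of_sample = split_holdout(history)
print(f"in-sample:     {mean(in_sample):+.2f}R over {len(in_sample)} trades")
print(f"out-of-sample: {mean(out_of_sample):+.2f}R over {len(out_of_sample)} trades")
```

The discipline matters more than the code: adjust rules using only the in-sample portion and look at the held-out portion once. A large drop between the two figures is the warning sign described above, though a holdout this small can also swing on variance alone.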

Your task

Pick your play's trigger, find ~20–30 historical instances, and record each as an R-multiple, tagged by conditions. Compute the win rate, average win/loss, and expectancy — overall and within the most important condition bucket. Fill in component 9. If expectancy isn't positive, the next lesson's review loop is how you fix or retire it.