The Batch Backtest Workflow: From 500 Strategies to a Go/No-Go Decision

Running individual backtests one at a time does not scale. Here is the workflow I built to run 500+ strategy/symbol combinations in parallel, filter for quality, and produce a ranked decision list — all from a single shell command.

Martin Fournier · June 14, 2026 · 4 MIN READ

When you have 20 strategies and 10 currency pairs, the naive approach is 200 manual backtest runs. Each takes 30 seconds to set up, a minute to run, and another minute to log results. At 2.5 minutes per run, that is more than 8 hours of mechanical work with zero insight gained until the very end.

That does not scale. What scales is a batch workflow that compiles, runs, filters, and ranks in one shot.

The Workflow, End to End

The pipeline lives in two components. A shell script orchestrates the lifecycle. A Java runner does the actual work. Together they form a single command:

./scripts/batch-gen.sh --count 500 --types all

Here is what that command actually does.

Phase 1: Compile

The script runs mvn compile -q on the monorepo. Quiet mode suppresses the noise. If compilation fails, the pipeline stops immediately. No point debugging a runtime error that is really a compile error.

Phase 2: Run in Parallel

The BatchStrategyRunner class takes a set of parameters: count, strategy types, bar count, initial capital, thread count. It generates random strategy parameters within the specified type constraints (trend, mean reversion, breakout, momentum) and backtests each one against historical data.

Key detail: the runner uses Java parallel streams. Thread count defaults to available processors. On a 16-core machine, 16 strategies run simultaneously, so a batch of 500 strategies completes in roughly the wall-clock time of 32 sequential runs (500 / 16, rounded up), assuming the individual runs take about equally long.
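As a rough illustration of the parallel-stream approach (this is not the actual BatchStrategyRunner source), the sketch below runs a batch on a pool with a configurable thread count. The class name, the `backtest` placeholder, and the use of a dedicated ForkJoinPool are all assumptions made for the example.

```java
import java.util.List;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch; names and structure are assumptions, not the real runner.
final class ParallelBatchSketch {

    // Placeholder for one backtest: returns a fake, seed-determined score.
    static double backtest(long seed) {
        return new Random(seed).nextGaussian();
    }

    // Runs `count` placeholder backtests on a pool of `threads` workers.
    static List<Double> runBatch(int count, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            // A parallel stream started from inside a ForkJoinPool task runs on
            // that pool's workers. This is widely relied upon, but it is an
            // implementation detail rather than a documented guarantee.
            return pool.submit(() ->
                    IntStream.range(0, count)
                            .parallel()
                            .mapToObj(i -> backtest(i))
                            .collect(Collectors.toList())
            ).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `ParallelBatchSketch.runBatch(500, Runtime.getRuntime().availableProcessors())` mirrors the default described above. Because the stream is ordered, the result at index i belongs to strategy i even though the runs finish in arbitrary order.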

Phase 3: Filter

The runner then applies selection criteria. In standard mode it emits every result. In selection mode (--min-sharpe 1.5 --min-pf 2.0 --max-dd 20 --target 10) it keeps generating and testing strategies until the target number, here 10, have passed every gate, then stops.
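The selection loop can be sketched roughly as follows. The `Result` type, the method names, and the `maxAttempts` safety cap are assumptions for illustration; the real runner may bound the search differently or not at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of selection mode; not the actual runner code.
final class SelectionSketch {

    // Metrics for one backtest. Drawdown is a positive percentage (20 means 20%).
    static final class Result {
        final double sharpe;
        final double profitFactor;
        final double maxDrawdownPct;

        Result(double sharpe, double profitFactor, double maxDrawdownPct) {
            this.sharpe = sharpe;
            this.profitFactor = profitFactor;
            this.maxDrawdownPct = maxDrawdownPct;
        }
    }

    // True when the result clears all three gates (thresholds are inclusive).
    static boolean passes(Result r, double minSharpe, double minPf, double maxDd) {
        return r.sharpe >= minSharpe
                && r.profitFactor >= minPf
                && r.maxDrawdownPct <= maxDd;
    }

    // Keeps drawing results until `target` have passed or `maxAttempts` is reached.
    // The attempt cap is an assumption, added so that strict gates cannot loop forever.
    static List<Result> select(Supplier<Result> generator, int target, int maxAttempts,
                               double minSharpe, double minPf, double maxDd) {
        List<Result> kept = new ArrayList<>();
        for (int i = 0; i < maxAttempts && kept.size() < target; i++) {
            Result r = generator.get();
            if (passes(r, minSharpe, minPf, maxDd)) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```

If the returned list is shorter than `target`, the gates were too strict for the attempt budget, which is itself useful information.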

This is the difference between "here is a dump of everything" and "here are the strategies worth your time."

Phase 4: Export

Three files land in batch-results/:

  • ranking.html: interactive dashboard built with Chart.js. Sortable table of every strategy with Sharpe, profit factor, max drawdown, and win rate. Click a row to see its equity curve.
  • ranking.json: raw data for programmatic consumption. Used by downstream tools and the Laravel dashboard.
  • summary.txt: terminal-friendly text report for quick review.

A strategies/ directory contains the top N candidates as compilable Java source files, ready to drop into the strategy catalog.
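For a sense of what the plain-text export involves, here is a minimal sketch that ranks strategies by Sharpe and writes a summary.txt. The line layout, the class and method names, and the choice to rank on Sharpe alone are assumptions; the real report format is not shown in this post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the text export; the real format may differ.
final class SummaryExportSketch {

    // Builds one line per strategy, highest Sharpe first.
    static List<String> rankLines(Map<String, Double> sharpeByName) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(sharpeByName.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> lines = new ArrayList<>();
        int rank = 1;
        for (Map.Entry<String, Double> e : entries) {
            lines.add(String.format(Locale.ROOT, "%3d  %-24s  sharpe=%.2f",
                    rank++, e.getKey(), e.getValue()));
        }
        return lines;
    }

    // Writes the lines to <dir>/summary.txt, creating the directory if needed.
    static Path write(Path dir, List<String> lines) {
        try {
            Files.createDirectories(dir);
            return Files.write(dir.resolve("summary.txt"), lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Formatting with `Locale.ROOT` keeps the decimal separator stable across machines, which matters if batches from different environments are later diffed.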

Phase 5: Review and Decide

The ranked list becomes the starting point for a Go/No-Go review. Not every high-Sharpe strategy gets deployed. The ranking surfaces candidates; domain knowledge filters them:

  • Strategies with 3 trades over 10 years get rejected regardless of Sharpe.
  • Strategies that profit only from one specific year get flagged as overfit.
  • Strategies with identical entry logic on different pairs get consolidated.
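The first two review rules are mechanical enough to sketch as pre-checks that flag candidates before a human looks at them. The thresholds passed in (for example 30 trades, or 80% of profit from one year) are assumed values for illustration, not figures from this workflow, and the final call stays with the reviewer.

```java
// Hypothetical review flags; names and thresholds are assumptions.
final class ReviewFlagsSketch {

    // Too few trades for any summary metric to be trusted.
    static boolean tooFewTrades(int tradeCount, int minTrades) {
        return tradeCount < minTrades;
    }

    // Share of total net profit contributed by the single best year.
    // Returns NaN when total profit is not positive, where a share is meaningless.
    static double bestYearShare(double[] yearlyPnl) {
        double total = 0.0;
        double best = Double.NEGATIVE_INFINITY;
        for (double pnl : yearlyPnl) {
            total += pnl;
            best = Math.max(best, pnl);
        }
        return total > 0.0 ? best / total : Double.NaN;
    }

    // Flags a strategy whose profit is concentrated in one year.
    static boolean dependsOnOneYear(double[] yearlyPnl, double maxShare) {
        double share = bestYearShare(yearlyPnl);
        return !Double.isNaN(share) && share > maxShare;
    }
}
```

For instance, `tooFewTrades(3, 30)` flags the three-trade strategy from the list above, and yearly results of 90, 5, and 5 trip `dependsOnOneYear` at a 0.8 limit because one year supplies 90% of the profit.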

The All-Combinations Variant

The batch-gen script generates random strategies. A separate tool, RunAllBatchBacktests, takes the opposite approach: it iterates over every existing strategy in StrategyCatalog × every available symbol and runs each combination.

This is useful for portfolio construction. You want to know which strategies work on which pairs, and more importantly, which strategy-pair combinations are correlated. Running them all at once produces a correlation matrix that no manual process can match.
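One plausible way to build such a matrix is Pearson correlation over per-period returns. The sketch below assumes each strategy-pair combination has been reduced to a return series aligned on the same dates; that alignment, and the class and method names, are assumptions rather than a description of RunAllBatchBacktests.

```java
// Hypothetical sketch: Pearson correlation matrix over aligned return series.
final class CorrelationSketch {

    // Pearson correlation of two equal-length series.
    // Returns NaN if either series has zero variance.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Symmetric matrix where cell [i][j] is the correlation of series i and j.
    static double[][] matrix(double[][] returns) {
        int k = returns.length;
        double[][] c = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i; j < k; j++) {
                double r = pearson(returns[i], returns[j]);
                c[i][j] = r;
                c[j][i] = r;
            }
        }
        return c;
    }
}
```

Cells near +1 mark combinations that tend to win and lose together, so holding both adds less diversification than the count of strategies suggests.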

Why This Workflow Matters

The operational cost of running one backtest is close to zero. The operational cost of running 500 backtests is also close to zero, just parallelized. The real bottleneck is not compute. It is the decision loop: looking at results, forming a judgment, and deciding what to do next.

A batch workflow does not automate judgment. It compresses the time between asking a question and seeing the answer. Instead of "run test, look at output, run next test, look at output," it becomes "run batch, review ranked list, make decisions."

It is the difference between browsing files one at a time and searching with grep. Same data, different throughput.

Practical Lessons

Start with the baseline. Before adding selection criteria, run a baseline batch with no filters. You need to know what "random" looks like to recognize what "good" looks like.
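A baseline is easiest to use once it is reduced to a few numbers. The sketch below, with hypothetical names, computes a nearest-rank percentile of the unfiltered Sharpe values and the fraction of the batch that would clear a given threshold.

```java
import java.util.Arrays;

// Hypothetical helpers for summarizing an unfiltered baseline batch.
final class BaselineSketch {

    // Nearest-rank percentile for p in (0, 100]. The input array is not modified.
    static double percentile(double[] values, double p) {
        if (values.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need non-empty values and 0 < p <= 100");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    // Fraction of values at or above the threshold.
    static double shareAtOrAbove(double[] values, double threshold) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int hits = 0;
        for (double v : values) {
            if (v >= threshold) {
                hits++;
            }
        }
        return (double) hits / values.length;
    }
}
```

If `shareAtOrAbove(baselineSharpes, 1.5)` comes back at a fraction of a percent, a target of 10 passing strategies implies thousands of generated candidates, which is worth knowing before starting a selection run.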

Always sample the equity curve. A summary metric like Sharpe hides drawdown timing. Two strategies with identical Sharpe can have radically different equity curves. Sample the curve at 500 points and include it in the export.
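A simple way to get a fixed number of points is to pick evenly spaced indices while always keeping the first and last values. This is a sketch with a hypothetical class name, and even spacing is only one possible choice.

```java
// Hypothetical sketch: downsample an equity curve to a fixed number of points.
final class EquityCurveSampler {

    // Returns at most maxPoints values taken at evenly spaced indices,
    // always including the first and last point of the original curve.
    static double[] sample(double[] curve, int maxPoints) {
        if (maxPoints < 2) {
            throw new IllegalArgumentException("maxPoints must be at least 2");
        }
        if (curve.length <= maxPoints) {
            return curve.clone(); // already short enough, return a copy
        }
        double[] out = new double[maxPoints];
        for (int i = 0; i < maxPoints; i++) {
            // Map output index i onto the full source range [0, length - 1].
            int src = (int) Math.round((double) i * (curve.length - 1) / (maxPoints - 1));
            out[i] = curve[src];
        }
        return out;
    }
}
```

Even spacing can skip over a sharp trough that falls between two sampled indices, so the sampled curve is for visual review; max drawdown should still be computed on the full series.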

Reject strategies that trade too little. A strategy that opens 2 positions in 10 years is not a strategy. Treat a minimum trade count as a hard gate, applied alongside your Sharpe threshold rather than as an afterthought.

Write the ranked list to a file, not just stdout. You will want to compare batches over time. A file is searchable, diffable, and ingestible by other tools. Screen output is ephemeral.

The One-Liner

The entire pipeline, from source code to ranked decision list, runs as:

./scripts/batch-gen.sh --target 10 --min-sharpe 1.5 --min-pf 2.0 --max-dd 20

That command compiles 11 Maven modules, generates 500 strategies, runs them in parallel across all available cores, filters for Sharpe >= 1.5, profit factor >= 2.0, and max drawdown <= 20%, stops when 10 candidates are found, exports an interactive ranking dashboard, and writes the top strategies as deployable Java files.

No manual step survives in that chain. Every hour saved on mechanical backtesting is an hour spent on the only part that matters: deciding which strategies to trade.