Binary Bars and Byte Orders: Building a High-Performance Backtest Data Pipeline
How I built a custom binary bar format with MappedByteBuffer, chased an endianness bug across Python and Java, and organized an 11-module monorepo around zero circular dependencies.
The Problem with CSVs
When you backtest 15 forex strategies across 9 pairs over 20 years of H1 data, CSV files become a bottleneck. A single year of H1 data for one pair is a few hundred kilobytes. Twenty years times 9 pairs is 180 of those files, on the order of tens of megabytes of text. Parsing it repeatedly is slow, and the text layout wastes space.
The fix: a custom binary format that loads 1 million bars in under 50 milliseconds with zero heap allocation.
The Binary Bar Format
Each bar is exactly 44 bytes, stored in a memory-mapped file:
┌───────────┬──────────┬──────────┬──────────┬──────────┬──────────┐
│ timestamp │ open     │ high     │ low      │ close    │ volume   │
│ (long)    │ (double) │ (double) │ (double) │ (double) │ (int)    │
│ 8B        │ 8B       │ 8B       │ 8B       │ 8B       │ 4B       │
└───────────┴──────────┴──────────┴──────────┴──────────┴──────────┘
44 bytes per bar, fixed-width. No variable-length fields, no parsing overhead. The file is just a flat array of records that you index into directly.
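As a quick check of the arithmetic, the sketch below derives each field's byte offset from the widths in the table and computes where bar i starts in the file. BarLayout and its methods are illustrative names, not part of the project.

```java
// Illustrative helper (hypothetical name): derives offsets from the field widths above.
final class BarLayout {
    static final int TIMESTAMP_OFFSET = 0;                            // long, 8 bytes
    static final int OPEN_OFFSET      = TIMESTAMP_OFFSET + Long.BYTES; // 8
    static final int HIGH_OFFSET      = OPEN_OFFSET + Double.BYTES;    // 16
    static final int LOW_OFFSET       = HIGH_OFFSET + Double.BYTES;    // 24
    static final int CLOSE_OFFSET     = LOW_OFFSET + Double.BYTES;     // 32
    static final int VOLUME_OFFSET    = CLOSE_OFFSET + Double.BYTES;   // 40
    static final int BAR_SIZE         = VOLUME_OFFSET + Integer.BYTES; // 44

    /** Byte position of bar i; long arithmetic avoids int overflow on large indexes. */
    static long positionOf(long barIndex) {
        return barIndex * BAR_SIZE;
    }

    /** Number of whole bars in a file of the given size. */
    static long barCount(long fileSizeBytes) {
        return fileSizeBytes / BAR_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(BAR_SIZE);              // 44
        System.out.println(positionOf(1_000_000)); // 44000000
    }
}
```

One million bars therefore occupy 44,000,000 bytes, and locating any bar is a single multiplication.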
I chose MappedByteBuffer from java.nio for the implementation. The key insight is that MappedByteBuffer lets the OS kernel handle paging -- bars are never fully loaded into heap memory. A 20-year H1 file with approximately 693,000 bars takes about 30 MB on disk and effectively zero Java heap.
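import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;

public class BarStore {
    public static final int BAR_SIZE = 44;
    private MappedByteBuffer buffer;
    // filePath, symbol, barCount, DATA_BYTE_ORDER, and the constructor are omitted here.

    public void open() throws IOException {
        // The mapping stays valid after the channel is closed.
        try (var channel = FileChannel.open(filePath, StandardOpenOption.READ)) {
            long size = channel.size();
            this.barCount = (int) (size / BAR_SIZE);
            this.buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            this.buffer.order(DATA_BYTE_ORDER);
        }
    }

    public Bar get(int i) {
        int pos = i * BAR_SIZE;                      // byte offset of bar i
        var ts = readTimestamp(buffer.getLong(pos)); // timestamp at offset 0
        return new Bar(symbol, ts,
                buffer.getDouble(pos + 8),   // open
                buffer.getDouble(pos + 16),  // high
                buffer.getDouble(pos + 24),  // low
                buffer.getDouble(pos + 32),  // close
                buffer.getInt(pos + 40));    // volume
    }
}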
Access is O(1) by index and, according to the estimate in the class Javadoc, completes in under 1 microsecond. A binary search over 693,000 bars for date range queries needs about 20 iterations, since log2(693,000) is roughly 19.4.
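A date range lookup can binary-search the timestamp column directly in the buffer, without building any Bar objects. The following is a minimal sketch rather than the project's code: BarSearch, lowerBound, and hourlyBars are hypothetical names, and it assumes bars are sorted by ascending timestamp and stored as raw epoch milliseconds.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BarSearch {
    static final int BAR_SIZE = 44;

    /** Index of the first bar with timestamp >= target, or barCount if there is none. */
    static int lowerBound(ByteBuffer buffer, int barCount, long targetMillis) {
        int lo = 0, hi = barCount;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            long ts = buffer.getLong(mid * BAR_SIZE); // timestamp sits at offset 0 of each record
            if (ts < targetMillis) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Builds an in-memory buffer of hourly bars for the demo (price fields are left at zero). */
    static ByteBuffer hourlyBars(int count, long startMillis) {
        ByteBuffer buf = ByteBuffer.allocate(count * BAR_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < count; i++) {
            buf.putLong(i * BAR_SIZE, startMillis + i * 3_600_000L);
        }
        return buf;
    }

    public static void main(String[] args) {
        long start = 1_325_376_000_000L; // 2012-01-01T00:00:00Z
        ByteBuffer bars = hourlyBars(1_000, start);
        System.out.println(lowerBound(bars, 1_000, start + 10 * 3_600_000L)); // 10
    }
}
```

Calling lowerBound once for the start of a range and once for the end gives the index window to iterate over.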
The Conversion Pipeline
Historical data comes from Dukascopy, downloaded via dukascopy-node. A Python script converts the CSVs to the binary format using struct.pack:
# '<qddddi' = little-endian, long, double x4, int
f.write(struct.pack('<qddddi', ts, o, h, l, c, v))
The conversion step lives in scripts/download-data.sh, which handles the full pipeline:
- Download CSV from Dukascopy (rate-limited, one pair at a time)
- Convert to .bars binary format
- Store in data/historical/bars/ alongside metadata
The BarStore has a main() entry point too, so you can batch-convert existing CSVs from the Java side.
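For readers who want to see what a Java-side conversion might look like, here is a minimal sketch that writes the same 44-byte little-endian layout as struct.pack('<qddddi'). It is not the project's main(): CsvToBars is a hypothetical class, the CSV column order (timestamp in epoch millis, open, high, low, close, volume) is an assumption, and the sample row is made up.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class CsvToBars {
    static final int BAR_SIZE = 44;

    /** Converts CSV rows to a .bars file and returns the number of bars written. */
    static int convert(List<String> csvLines, Path out) throws IOException {
        ByteBuffer record = ByteBuffer.allocate(BAR_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        int written = 0;
        try (FileChannel channel = FileChannel.open(out,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (String line : csvLines) {
                String row = line.trim();
                // Skip blank lines and a header row (assumed to start with a letter).
                if (row.isEmpty() || Character.isLetter(row.charAt(0))) continue;
                String[] f = row.split(",");
                record.clear();
                record.putLong(Long.parseLong(f[0]));            // timestamp
                for (int i = 1; i <= 4; i++) {
                    record.putDouble(Double.parseDouble(f[i]));  // open, high, low, close
                }
                record.putInt((int) Double.parseDouble(f[5]));   // volume, truncated if fractional
                record.flip();
                while (record.hasRemaining()) channel.write(record);
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("sample", ".bars");
        int n = convert(List.of(
                "timestamp,open,high,low,close,volume",
                "1325376000000,1.5,2.5,0.5,1.25,7"), out);       // made-up sample row
        System.out.println(n + " bars, " + Files.size(out) + " bytes"); // 1 bars, 44 bytes
        Files.delete(out);
    }
}
```

Setting the byte order on the record buffer is the step that keeps the Java writer and the Python writer interchangeable.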
The Endianness Bug
For months, this worked perfectly on my x86-64 machine. Here is why it worked: x86 uses little-endian byte order, so the native order the JVM reports there (ByteOrder.nativeOrder()) is little-endian. Python's struct.pack('<qddddi') writes little-endian. Everything was accidentally aligned. (Worth knowing: a freshly created java.nio ByteBuffer is big-endian on every platform until order() is called, so only the native order varies with the hardware.)
The problem would surface on any big-endian platform, such as a SPARC server or a network processor. The BarStore read and wrote files using a platform-dependent byte order, while the Python script used an explicit little-endian directive. The contract was implicit and fragile.
The fix was a single line declaring the byte order explicitly on the buffer:
private static final ByteOrder DATA_BYTE_ORDER = ByteOrder.LITTLE_ENDIAN;
The constant is applied in both the write() and open() paths. Now the contract with the Python pipeline is self-documenting and portable. If someone runs this on a big-endian system, the byte order is asserted explicitly rather than relying on platform coincidence.
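To see what a mismatch looks like in practice, this small sketch writes a timestamp little-endian and reads the same bytes back with each byte order. It is an illustration only; ByteOrderDemo is a hypothetical class, not project code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    /** Writes value little-endian, then reads the same 8 bytes using the given order. */
    static long writeLittleReadAs(long value, ByteOrder readOrder) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, value);
        return buf.order(readOrder).getLong(0);
    }

    public static void main(String[] args) {
        long ts = 1_325_376_000_000L; // 2012-01-01T00:00:00Z in epoch millis
        System.out.println(writeLittleReadAs(ts, ByteOrder.LITTLE_ENDIAN)); // 1325376000000

        // Reading with the wrong order yields the byte-reversed value, not the timestamp.
        long wrong = writeLittleReadAs(ts, ByteOrder.BIG_ENDIAN);
        System.out.println(wrong == Long.reverseBytes(ts)); // true

        // A newly allocated ByteBuffer reports BIG_ENDIAN on every platform.
        System.out.println(ByteBuffer.allocate(8).order()); // BIG_ENDIAN
    }
}
```

Nothing throws when the orders disagree: the read succeeds and returns a plausible-looking long, which is why this kind of bug can sit unnoticed.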
The commit message (5711fe6, June 2 2026) tells the full story: "On x86-64 Linux the default happens to be LE, so existing data was correct by accident, but the mismatch would break on any big-endian platform."
Backward Compatibility
The BarStore handles legacy files that used epoch seconds instead of milliseconds. The readTimestamp method checks a heuristic:
private static Instant readTimestamp(long raw) {
return raw > 1_000_000_000_000L
? Instant.ofEpochMilli(raw)
: Instant.ofEpochSecond(raw);
}
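Values over 1 trillion are clearly millis (1,000,000,000,000 ms falls on September 9, 2001); anything at or below that is read as seconds. The trade-off is that a millisecond timestamp from before that date would be misread as seconds, which does not matter as long as the data starts later. This lets the system read old files without a re-conversion step. A unit test (read_supportsLegacyEpochSeconds) validates this with a manually constructed binary buffer.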
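The boundary behaviour is easy to check in isolation. The sketch below copies the heuristic into a standalone class (TimestampHeuristic is a hypothetical name) and reads a seconds-based value from a hand-built little-endian buffer. It is written in the spirit of the unit test mentioned above, whose actual body is not shown in this post.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

final class TimestampHeuristic {
    /** Same rule as readTimestamp: above 10^12 is treated as millis, otherwise as seconds. */
    static Instant readTimestamp(long raw) {
        return raw > 1_000_000_000_000L
                ? Instant.ofEpochMilli(raw)
                : Instant.ofEpochSecond(raw);
    }

    public static void main(String[] args) {
        // A legacy record whose timestamp field holds epoch seconds.
        ByteBuffer legacy = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        legacy.putLong(0, 1_325_376_000L);
        System.out.println(readTimestamp(legacy.getLong(0)));  // 2012-01-01T00:00:00Z

        // The same instant stored as epoch millis decodes identically.
        System.out.println(readTimestamp(1_325_376_000_000L)); // 2012-01-01T00:00:00Z

        // The cutoff value itself, interpreted as millis.
        System.out.println(Instant.ofEpochMilli(1_000_000_000_000L)); // 2001-09-09T01:46:40Z
    }
}
```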
The Module Architecture
The full Trading Bridge project is an 11-module monorepo with a strict zero-circular-dependency rule: ten Maven modules plus a Laravel dashboard that sits outside the Maven reactor. Here are the modules:
trading-core        Domain models, Strategy interface, Indicators
trading-backtest    BacktestEngine, RunContext, reports
trading-data        BarStore, HistoricalDataLoader, OANDA client
trading-broker      Broker connectors (OANDA REST, IBKR TWS)
trading-strategies  45+ creative and imported strategies
trading-runtime     ControlPlane HTTP+WS, EventStore, promote gates
trading-examples    RunBacktest CLI, golden tests
trading-parser      StrategyQuant XML to Java conversion
trading-genetics    Genetic strategy search (offline)
trading-tui         JLine3 terminal client
dashboard/          Laravel control room (outside Maven reactor)
The dependency graph is acyclic by design. trading-core depends on no other trading-* module: it defines the domain models (Bar, Order, Strategy, Position), and every other module depends on it. This makes each module independently testable.
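One way to check a rule like this mechanically is a topological sort over the module graph: if some module can never be scheduled, there is a cycle. The sketch below uses Kahn's algorithm. ModuleGraphCheck is a hypothetical class and the edges are illustrative; only "every module depends on trading-core" comes from the description above, and the project's real edges and enforcement mechanism are not shown in this post.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ModuleGraphCheck {
    /** Returns true if the module -> dependencies map contains no cycle (Kahn's algorithm). */
    static boolean isAcyclic(Map<String, List<String>> deps) {
        Map<String, Integer> unresolved = new HashMap<>();      // open dependency count per module
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unresolved.putIfAbsent(e.getKey(), 0);
            for (String dependency : e.getValue()) {
                unresolved.merge(e.getKey(), 1, Integer::sum);
                unresolved.putIfAbsent(dependency, 0);
                dependents.computeIfAbsent(dependency, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : unresolved.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        int resolved = 0;
        while (!ready.isEmpty()) {
            String module = ready.poll();
            resolved++;
            for (String dependent : dependents.getOrDefault(module, List.of())) {
                if (unresolved.merge(dependent, -1, Integer::sum) == 0) ready.add(dependent);
            }
        }
        return resolved == unresolved.size(); // modules left over are part of a cycle
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "trading-core", List.of(),
                "trading-data", List.of("trading-core"),
                "trading-backtest", List.of("trading-core"),
                "trading-strategies", List.of("trading-core"));
        System.out.println(isAcyclic(deps)); // true
    }
}
```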
The Data Flow
Here is how the binary bars feed into the backtest engine:
- Download: dukascopy-node fetches historical H1 data as CSV
- Convert: Python struct.pack('<qddddi') writes binary .bars files
- Load: HistoricalDataLoader opens BarStore files via memory-mapped I/O
- Feed: BacktestEngine iterates bars, calling each strategy's onBar()
- Record: Results go to RunEvent JSONL and an SQLite event store
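The load and feed steps can be pictured with a very small loop. This is a simplified sketch, not the BacktestEngine: FeedLoopSketch and its nested Strategy interface are hypothetical stand-ins, and onBar here receives only a timestamp and a close price, whereas the project's real signature is not shown in this post.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

final class FeedLoopSketch {
    static final int BAR_SIZE = 44;

    /** Hypothetical stand-in for the project's Strategy interface. */
    interface Strategy {
        void onBar(long timestampMillis, double close);
    }

    /** Walks the buffer in order and hands every bar to every strategy. Returns bars fed. */
    static int feed(ByteBuffer bars, List<Strategy> strategies) {
        int count = bars.capacity() / BAR_SIZE;
        for (int i = 0; i < count; i++) {
            int pos = i * BAR_SIZE;
            long ts = bars.getLong(pos);             // timestamp at offset 0
            double close = bars.getDouble(pos + 32); // close at offset 32
            for (Strategy s : strategies) s.onBar(ts, close);
        }
        return count;
    }

    public static void main(String[] args) {
        // Three in-memory bars with made-up close prices 1.0, 2.0, 3.0.
        ByteBuffer bars = ByteBuffer.allocate(3 * BAR_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 3; i++) {
            bars.putLong(i * BAR_SIZE, 1_325_376_000_000L + i * 3_600_000L);
            bars.putDouble(i * BAR_SIZE + 32, 1.0 + i);
        }
        double[] lastClose = new double[1];
        Strategy recorder = (ts, close) -> lastClose[0] = close;
        int fed = feed(bars, List.of(recorder));
        System.out.println(fed + " bars, last close " + lastClose[0]); // 3 bars, last close 3.0
    }
}
```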
The Golden Baseline
Every backtest run is validated against a golden baseline. The canonical numbers for LondonOpenRangeBreakout on EUR/USD H1 2012 are:
- 8760 bars, 61 trades
- Total return: 0.1397% ($139.67 on $100k capital)
- Max drawdown: 0.048%
- Tolerance: +/-1% on return, +/-0.01 pp on drawdown
A smaller CI subset (744 bars, 3 trades) runs on every push so no local historical data is needed for the basic smoke test.
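As an illustration of what such a comparison can look like, the sketch below checks a run against the canonical numbers. GoldenCheck is a hypothetical class, and it makes three assumptions the post does not spell out: the +/-1% on return is relative to the baseline value, the drawdown tolerance is an absolute 0.01 percentage points, and the trade count must match exactly.

```java
final class GoldenCheck {
    static final double BASELINE_RETURN_PCT = 0.1397;
    static final double BASELINE_MAX_DD_PCT = 0.048;
    static final int BASELINE_TRADES = 61;

    /** Profit as a percentage of starting capital. */
    static double returnPct(double pnl, double capital) {
        return pnl / capital * 100.0;
    }

    static boolean matchesBaseline(double returnPct, double maxDrawdownPct, int trades) {
        // Assumed: return tolerance is 1% of the baseline value (relative).
        boolean returnOk = Math.abs(returnPct - BASELINE_RETURN_PCT)
                <= 0.01 * Math.abs(BASELINE_RETURN_PCT);
        // Assumed: drawdown tolerance is 0.01 percentage points (absolute).
        boolean drawdownOk = Math.abs(maxDrawdownPct - BASELINE_MAX_DD_PCT) <= 0.01;
        // Assumed: trade count must match exactly.
        return returnOk && drawdownOk && trades == BASELINE_TRADES;
    }

    public static void main(String[] args) {
        double r = returnPct(139.67, 100_000.0);              // about 0.1397
        System.out.println(matchesBaseline(r, 0.048, 61));    // true
        System.out.println(matchesBaseline(0.15, 0.048, 61)); // false
    }
}
```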
The Endianness Lesson
This bug was never visible in production. It was a latent time bomb that only code review caught. The lesson is simple: whenever two systems agree on a binary format, state the byte order explicitly in both. Do not rely on platform defaults, even if both systems currently run on x86.
A lot of software engineering is like this -- fixing things that are technically correct but accidentally so. The endianness fix did not change any behavior on my machine. It changed the contract from implicit to explicit, which matters when the system grows beyond one developer on one architecture.
What I Would Do Differently
I would have written the BarStore's byte order assertion on day one, before the first conversion script ran. The ByteOrder.LITTLE_ENDIAN constant is three words that would have saved a commit and a documentation note. But the debugging process -- comparing Python output bytes to Java expectations -- was itself educational and led to a robust test for legacy file support.
This is one module in a larger trading infrastructure. The backtest engine, strategy promotion pipeline with qualification gates, and broker reconciliation system each have their own stories.