
Guide: Backtests Run Detail and Compare

Inspect the verdict-first run-detail, explainability, realism, and compare surfaces behind the signed-in backtests workflow.

Deep surface guide

What this guide covers

Use this guide when a backtest has already been launched and you need to inspect the run as a governed evidence package with a clear next action, not just a return number.

Run detail is where manifest, critical state, explainability, memo drafting, realism, and execution context come together.

Compare is the ranking and decision layer. Use it when multiple runs are competing for promotion or experiment selection.

Treat memo, explainability, and workflow actions as part of the validation process, not as after-the-fact reporting.

Routes and surfaces

These are the deeper validation surfaces behind the top-level backtests workflow; a sketch of how they might map to programmatic access follows the list.

/backtests/[runId]

Run detail page

Use the main run route for verdict, manifest, context, telemetry, and workflow actions around one specific backtest.

/backtests/[runId] · overview, analytics, execution, realism, portfolio, repro, trades

Summary tabs

Move through the overview, analytics, execution, realism, portfolio, repro, and trades tabs without losing run context.

/backtests/[runId] · memo, explainability, workflow actions

Memo, explainability, and workflow actions

Prepare drafts, inspect explainability, attach experiments, and submit promotion packets from the same run surface.

/backtests/[runId] · critical state, manifest, telemetry

Critical state and telemetry

Use critical-state rails, manifest summaries, and telemetry points to understand whether the run is operationally trustworthy.

/backtests/compare

Compare

Rank runs, inspect explainability deltas, and record the winning rationale before promotion.
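If you are scripting against these surfaces rather than clicking through them, the UI routes suggest a natural shape for programmatic access. The host, endpoint paths, and response fields below are illustrative assumptions, not a documented contract; check the API Reference for the real routes.

```python
import requests

BASE = "https://api.example.com"  # hypothetical host; see the API Reference for real endpoints
RUN_ID = "run_123"                # placeholder run id

# Hypothetical REST mirror of the /backtests/[runId] surface.
run = requests.get(f"{BASE}/backtests/{RUN_ID}", timeout=30).json()

# Assumed fields, mirroring what the run-detail page shows up front.
print(run.get("verdict"))         # verdict hero
print(run.get("manifest"))        # manifest summary
print(run.get("critical_state"))  # critical-state rail

# Hypothetical mirror of the /backtests/compare surface.
compare = requests.get(
    f"{BASE}/backtests/compare",
    params={"run_ids": ",".join(["run_123", "run_456"])},
    timeout=30,
).json()
```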

Recommended validation loop

Use this sequence to keep run validation explicit before anything is promoted; a minimal code sketch of the loop follows the steps.

01

Inspect the run state first

Start with the verdict, core metrics, and carried context before you focus on any single deep-dive panel.

02

Use explainability and memo surfaces

Review explainability, prepare the memo, and capture the logic chain while the run is still under inspection.

03

Validate realism and reproducibility

Use realism, execution, repro, and telemetry detail to confirm the run is trustworthy beyond the headline outcome.

04

Decide through compare

Use compare and the downstream carry-forward actions to choose among candidates instead of promoting a single run in isolation.
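As an illustration only, the four steps above can be expressed as a single gate that refuses to move a run toward compare until each check has been recorded. The flag names are assumptions for the sketch, not the product's schema.

```python
from dataclasses import dataclass

@dataclass
class RunReview:
    # Assumed review flags; the real run object carries richer state.
    verdict_inspected: bool = False
    memo_drafted: bool = False
    explainability_reviewed: bool = False
    realism_checked: bool = False
    repro_verified: bool = False

def ready_for_compare(review: RunReview) -> bool:
    """Mirror the recommended loop: state, narrative, realism, then decide."""
    steps = [
        review.verdict_inspected,                                 # 01: run state first
        review.memo_drafted and review.explainability_reviewed,   # 02: narrative evidence
        review.realism_checked and review.repro_verified,         # 03: realism and repro
    ]
    return all(steps)                                             # 04 happens in compare

review = RunReview(verdict_inspected=True, memo_drafted=True,
                   explainability_reviewed=True, realism_checked=True,
                   repro_verified=True)
assert ready_for_compare(review)
```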

Run-detail operating controls

The run-detail shell is not just a dashboard. It is the place where review, handoff, and promotion posture stay attached to one run.

run header

Header summary cards and tabs

Start with the verdict hero, core metrics, progress, and tab selection before drilling into any single panel.

operator actions

Header actions and promotion triggers

Clone, cancel, report, repro verification, and promotion-gate actions live in the header. Treat them as governed operator controls, not convenience buttons.

workflow handoff

Compare, experiment, and packet actions

The workflow-actions panel is where a run gets compared, rerun, attached to an experiment, or submitted as a promotion packet candidate; a scripted sketch of these handoffs follows below.

narrative evidence

Memo and explainability handoffs

Use the memo panel, trust panel, and artifact conversion controls to turn run review into durable report or strategy-draft artifacts while context is still fresh.
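If the handoffs above are automated, they reduce to a small set of run-scoped actions. The endpoint shape below is a hypothetical placeholder for illustration; the real operations are the governed header and workflow-panel controls described above.

```python
import requests

BASE = "https://api.example.com"   # hypothetical host
RUN_ID = "run_123"

def run_action(action: str, payload: dict | None = None) -> dict:
    """POST a run-scoped workflow action (hypothetical endpoint shape)."""
    resp = requests.post(f"{BASE}/backtests/{RUN_ID}/actions/{action}",
                         json=payload or {}, timeout=30)
    resp.raise_for_status()
    return resp.json()

run_action("clone")                                          # header: clone
run_action("attach-experiment", {"experiment_id": "exp_7"})  # workflow handoff
run_action("submit-packet", {"note": "candidate for promotion review"})
```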

Realism, replay, and reproducibility controls

The execution, realism, repro, and trade tabs are where a promising run either holds up or breaks apart.

realism setup

Profile selection and shadow launch

Select the realism profile that matches the intended execution posture, then launch a shadow run from the same run detail page instead of context-switching.
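A scripted equivalent of that flow might look like the sketch below: pick a realism profile, then launch the shadow run against the same run id. The profile name, endpoint, and response field are assumptions.

```python
import requests

BASE = "https://api.example.com"   # hypothetical host
RUN_ID = "run_123"

# Assumed profile id; use whatever realism profiles your workspace defines.
profile = "aggressive-fills"

resp = requests.post(
    f"{BASE}/backtests/{RUN_ID}/shadow-runs",   # hypothetical endpoint
    json={"realism_profile": profile},
    timeout=30,
)
resp.raise_for_status()
shadow_id = resp.json()["shadow_run_id"]        # assumed response field
```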

shadow comparisons

Shadow run ledger and diff metrics

Review shadow run status, median absolute delta, p95 delta, and point counts before trusting that replay matches the original execution story.
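Median absolute delta and p95 delta are straightforward to reproduce from the two series if you want to sanity-check the ledger's numbers. A minimal sketch, assuming both series are already aligned on the same timestamps:

```python
import numpy as np

# Aligned sample points from the original run and its shadow replay
# (synthetic data here; in practice pull both series for the same timestamps).
original = np.array([100.0, 101.2, 99.8, 102.5, 103.1])
shadow   = np.array([100.0, 101.0, 99.9, 102.9, 102.8])

deltas = np.abs(original - shadow)

median_abs_delta = np.median(deltas)
p95_delta = np.percentile(deltas, 95)
point_count = deltas.size

print(f"median |delta| = {median_abs_delta:.4f}")
print(f"p95    |delta| = {p95_delta:.4f}  over {point_count} points")
```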

execution replay

Signals, order events, fills, and replay grids

The execution and trades tabs expose signal explanations, order events, fill quality, participation, and slippage rows that explain how the run actually behaved.
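For a concrete sense of what those rows summarize, slippage and participation per fill reduce to two small formulas. The field names here are illustrative, not the trade-export schema; map them to whatever your rows actually contain.

```python
# Illustrative fill rows; field names are assumptions, not the export schema.
fills = [
    {"side": "buy",  "ref_price": 100.00, "fill_price": 100.03,
     "qty": 500, "interval_volume": 20_000},
    {"side": "sell", "ref_price": 101.50, "fill_price": 101.46,
     "qty": 300, "interval_volume": 15_000},
]

for f in fills:
    # Slippage in basis points, signed so that positive = filled worse than reference.
    sign = 1 if f["side"] == "buy" else -1
    slippage_bps = sign * (f["fill_price"] - f["ref_price"]) / f["ref_price"] * 1e4
    # Participation: our quantity as a share of interval volume.
    participation = f["qty"] / f["interval_volume"]
    print(f'{f["side"]:>4}: slippage {slippage_bps:+.1f} bps, '
          f'participation {participation:.1%}')
```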

repro chain

Manifest, inputs, and reproducibility artifacts

Use manifest rows, inputs, reproduce scripts, repro checks, and exported artifacts together so the run can be recreated and challenged later.
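One hedged example of what a repro check can do offline: recompute digests of the exported inputs and compare them against the digests recorded in the manifest. The manifest layout here is assumed for illustration.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Assumed manifest shape: {"inputs": [{"path": "...", "sha256": "..."}]}
manifest = json.loads(Path("manifest.json").read_text())

for entry in manifest["inputs"]:
    actual = sha256_of(Path(entry["path"]))
    status = "ok" if actual == entry["sha256"] else "MISMATCH"
    print(f'{entry["path"]}: {status}')
```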

Compare decision controls

The compare page is a ranking tool, but it is also where experiment selection becomes traceable.

run filtering

Run list filters and selection limits

Use the status filter, search, density mode, and the two-to-five-run selection window to build a comparison set that is small enough to reason about.

metric ranking

Top-5 quick select and comparison metric

Switch the comparison metric between Sharpe, return, and drawdown before using quick select so the selected cohort reflects the decision you actually want to make.
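The ranking behavior is easy to reason about as code. A sketch under assumed field names: rank by the chosen metric (descending for Sharpe and return, ascending for drawdown), take the top five, and keep the final selection within the two-to-five window.

```python
# Illustrative run summaries; field names are assumptions for the sketch.
runs = [
    {"id": "run_a", "sharpe": 1.8, "return": 0.22, "drawdown": 0.11},
    {"id": "run_b", "sharpe": 2.1, "return": 0.18, "drawdown": 0.07},
    {"id": "run_c", "sharpe": 1.2, "return": 0.31, "drawdown": 0.19},
    {"id": "run_d", "sharpe": 1.9, "return": 0.25, "drawdown": 0.09},
]

def quick_select(runs: list[dict], metric: str, k: int = 5) -> list[dict]:
    # Lower drawdown is better; higher sharpe/return is better.
    reverse = metric != "drawdown"
    ranked = sorted(runs, key=lambda r: r[metric], reverse=reverse)
    selection = ranked[:k]
    if not 2 <= len(selection) <= 5:
        raise ValueError("comparison sets are limited to two to five runs")
    return selection

cohort = quick_select(runs, metric="sharpe")
print([r["id"] for r in cohort])  # run_b, run_d, run_a, run_c
```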

trust panel

Compare explainability and artifact handoff

The compare explainability panel summarizes the current winner and lets you save report or strategy-draft artifacts from the comparison context.

experiment decision

Winner selection and rationale capture

Choose the experiment, winner run, and written rationale in the decision panel so the final selection is preserved with its context instead of living only in chat or memory.
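Captured as data, the decision panel amounts to a small record tying experiment, winner, and rationale together. A sketch, with assumed field names rather than a documented schema:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class CompareDecision:
    # Assumed fields mirroring the decision panel, not a documented schema.
    experiment_id: str
    winner_run_id: str
    rationale: str
    decided_at: str

decision = CompareDecision(
    experiment_id="exp_7",
    winner_run_id="run_b",
    rationale="Best Sharpe with acceptable drawdown; shadow deltas within tolerance.",
    decided_at=datetime.now(timezone.utc).isoformat(),
)
print(json.dumps(asdict(decision), indent=2))
```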

Example validation scenarios

Use these playbooks when a promising run needs to survive scrutiny before it earns promotion.

trust test

Challenge a great-looking run with realism evidence

Open the run detail, inspect critical state and summary tabs, then move into realism and execution replay so a strong return is tested against fill quality, replay drift, and operational posture.

Expected outcome

You find out whether the run remains promotable once execution realism is included, not just whether the performance chart looks attractive.

selection process

Choose among several plausible winners

Use compare filters, metric switching, and explainability summaries to build a tight cohort, then record the winner in the decision panel with explicit rationale.

Expected outcome

The chosen run is preserved as a documented winner against real alternatives instead of becoming an informal favorite.

repro chain

Audit reproducibility before sharing or promotion

Inspect manifest rows, repro artifacts, and exported evidence from the run-detail surface before you send the run into experiments, packets, or reports.

Expected outcome

Later reviewers can recreate and challenge the run without depending on hidden local context or fragile memory.

Review and guardrails

Do not let a strong return hide weak manifest, critical-state, or realism signals.

Treat memo and explainability drafting as part of the evidence package, not optional reporting polish.

Use compare when more than one plausible run exists so the winner is documented against real alternatives.

Promotion packet actions should come after run detail, realism, and compare evidence are all coherent.

Next steps

Open Backtests

Jump into the signed-in backtests surface, open the latest verdict, and continue into compare or downstream review with the same evidence attached.

Open Compare

Use the compare surface to rank runs, inspect deltas, and record the winning rationale.

Strategy sweeps and optimizer guide

Return upstream when the run needs a broader sweep or allocation-design workflow.

Last updated

Mar 24, 2026
