Cross-Scenario Evaluation
Evaluates parameter configurations across multiple scenarios simultaneously and ranks using cross-scenario criteria.
Three ranking strategies are available:
stability-first: Sort by minimum pass rate across scenarios, then total fails, then minimum combined score. Best when you need all scenarios to pass.
score-first: Sort by minimum combined score across scenarios, then total fails. Best when you want the highest floor on quality.
balanced: Sort by geometric mean of combined scores. Best when you want a balanced tradeoff across scenarios.
This implements Lesson L4 (cross-scenario needs different ranking).
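The three strategies can be read as three different sort keys. The sketch below is an illustration only, not the library's implementation: `ScenarioOutcome`, its field names, and `rank` are hypothetical stand-ins, and it assumes that higher pass rates and combined scores are better, that fewer fails are better, and that combined scores are strictly positive (required for a geometric mean).

```python
import math
from dataclasses import dataclass


@dataclass
class ScenarioOutcome:
    """Hypothetical per-scenario summary; not the library's own type."""
    pass_rate: float   # fraction of seeds that passed
    n_fail: int        # number of failed seeds
    combined: float    # combined score, assumed > 0


def stability_first_key(outcomes):
    # min pass rate (desc) -> total fails (asc) -> min combined score (desc)
    return (
        -min(o.pass_rate for o in outcomes),
        sum(o.n_fail for o in outcomes),
        -min(o.combined for o in outcomes),
    )


def score_first_key(outcomes):
    # min combined score (desc) -> total fails (asc)
    return (-min(o.combined for o in outcomes), sum(o.n_fail for o in outcomes))


def balanced_key(outcomes):
    # geometric mean of combined scores (desc), computed via logs
    log_mean = sum(math.log(o.combined) for o in outcomes) / len(outcomes)
    return (-math.exp(log_mean),)


KEYS = {
    "stability-first": stability_first_key,
    "score-first": score_first_key,
    "balanced": balanced_key,
}


def rank(configs, strategy="stability-first"):
    """Return config names sorted best first under the given strategy."""
    if strategy not in KEYS:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return sorted(configs, key=lambda name: KEYS[strategy](configs[name]))


# Two made-up configs, each evaluated on two scenarios.
configs = {
    "A": [ScenarioOutcome(1.00, 0, 0.60), ScenarioOutcome(0.90, 10, 0.95)],
    "B": [ScenarioOutcome(0.95, 5, 0.70), ScenarioOutcome(0.95, 5, 0.72)],
}

for strategy in KEYS:
    print(strategy, rank(configs, strategy))
```

With this made-up data, stability-first and score-first both put B ahead because its worst scenario is stronger, while balanced puts A ahead because its high score on the second scenario lifts the geometric mean.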
CLI Usage
# Evaluate top configs across baseline and growth_plus
python -m calibration --phase cross-eval \
    --scenarios baseline,growth_plus \
    --configs output/baseline_stability.json \
    --seeds 100 --rank-by stability-first
Required flags:
--scenarios: Comma-separated list of scenario names
--configs: Path to stability/screening result JSON
Python API
from calibration.cross_eval import evaluate_cross_scenario, rank_cross_scenario
results = evaluate_cross_scenario(
    configs=[{"beta": 5.0, "max_M": 4}],
    scenarios=["baseline", "growth_plus"],
    n_seeds=100,
)
ranked = rank_cross_scenario(results, strategy="stability-first")
Scenario Tension
Use compute_scenario_tension to identify parameters where different
scenarios prefer different values:
from calibration.cross_eval import compute_scenario_tension
tension = compute_scenario_tension(results, ["baseline", "growth_plus"])
for param, details in tension.items():
    print(f"{param}: scenarios disagree on best value")
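To make the idea of tension concrete, here is a minimal sketch of one way such an analysis could work. It is not the logic of `compute_scenario_tension`: the function `sketch_tension`, the `rows` input shape, and the returned structure are all assumptions made for illustration, and the score gap that the real function reports is left out.

```python
from collections import defaultdict


def sketch_tension(rows, scenarios):
    """Find parameters whose best-scoring value differs between scenarios.

    rows: list of (params, scores) pairs, where params maps parameter name
    to value and scores maps scenario name to a combined score.
    This input shape is hypothetical, chosen to keep the example small.
    """
    # best[param][scenario] = (best score seen so far, value that achieved it)
    best = defaultdict(dict)
    for params, scores in rows:
        for param, value in params.items():
            for scenario in scenarios:
                current = best[param].get(scenario)
                if current is None or scores[scenario] > current[0]:
                    best[param][scenario] = (scores[scenario], value)

    tension = {}
    for param, per_scenario in best.items():
        preferred = {s: per_scenario[s][1] for s in scenarios}
        # Tension exists when the scenarios do not agree on a single value.
        if len(set(preferred.values())) > 1:
            tension[param] = preferred
    return tension


# Made-up scores: baseline favours beta=5.0, growth_plus favours beta=2.0.
rows = [
    ({"beta": 5.0, "max_M": 4}, {"baseline": 0.9, "growth_plus": 0.6}),
    ({"beta": 2.0, "max_M": 4}, {"baseline": 0.7, "growth_plus": 0.8}),
]
example = sketch_tension(rows, ["baseline", "growth_plus"])
print(example)
```

Here only `beta` is reported, because `max_M` takes the same value in every row and so cannot be a source of disagreement.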
API Reference
Cross-scenario evaluation – run configs across multiple scenarios.
Evaluates parameter configurations on all specified scenarios simultaneously and ranks using cross-scenario criteria.
- calibration.cross_eval.rank_cross_scenario(results, strategy='stability-first')
Rank configs using cross-scenario criteria.
- Parameters:
results (list[CalibrationResult]) – Results with scenario_results populated.
strategy (str) – Ranking strategy:
    "stability-first": min(pass_rates) -> total fails -> min(combined)
    "score-first": min(combined) -> total fails
    "balanced": geometric mean of combined scores
- Returns:
Sorted results (best first).
- Return type:
list[CalibrationResult]
- Raises:
ValueError – If strategy is not recognized.
- calibration.cross_eval.evaluate_cross_scenario(configs, scenarios, n_seeds=100, n_periods=1000, n_workers=10)
Evaluate configs across multiple scenarios.
- Parameters:
- Returns:
Results with scenario_results populated.
- Return type:
list[CalibrationResult]
- calibration.cross_eval.compute_scenario_tension(results, scenarios)
Analyze parameter tensions between scenarios.
Identifies parameters whose optimal value differs between scenarios, indicating a fundamental trade-off.
- Parameters:
results (
list[CalibrationResult]) – Results withscenario_resultspopulated.scenarios (
list[str]) – Scenario names to compare.
- Returns:
Per-parameter tension info: which value each scenario prefers, and the score gap.
- Return type:
dict[str, dict]
- calibration.cross_eval.run_cross_eval_phase(args, run_dir=None)
CLI entry point for cross-eval phase.