API Reference

Full autodoc reference for all calibration modules.

Analysis

Types, patterns, export, and comparison.

Result types, parameter pattern analysis, config export, and comparison.

This module provides:
  • Core result types (CalibrationResult, ComparisonResult)

  • Progress formatting helpers

  • Parameter pattern analysis for identifying best values

  • Config export (YAML) and before/after comparison

class calibration.analysis.ScenarioResult(mean_score, std_score, combined_score, pass_rate, n_fail, seed_scores)[source]#

Per-scenario results for cross-scenario evaluation.

Variables:
  • mean_score (float) – Mean score across seeds.

  • std_score (float) – Standard deviation of scores across seeds.

  • combined_score (float) – Combined score: mean * (1 - std).

  • pass_rate (float) – Fraction of seeds with zero FAIL metrics.

  • n_fail (int) – Total number of seed-level failures.

  • seed_scores (list[float]) – Individual seed scores.
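
As an illustration of how these fields relate, the sketch below aggregates per-seed outcomes into the same quantities. It is not the library's implementation: summarize_seeds and its inputs are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def summarize_seeds(seed_scores, seed_fail_counts):
    """Aggregate per-seed scores and FAIL counts (hypothetical helper)."""
    mean_score = mean(seed_scores)
    # Assumption: population standard deviation; the real code may use the sample form.
    std_score = pstdev(seed_scores)
    return {
        "mean_score": mean_score,
        "std_score": std_score,
        # Combined score as documented: mean * (1 - std).
        "combined_score": mean_score * (1 - std_score),
        # Fraction of seeds with zero FAIL metrics.
        "pass_rate": sum(1 for n in seed_fail_counts if n == 0) / len(seed_fail_counts),
        "seed_scores": list(seed_scores),
    }


summary = summarize_seeds([0.8, 0.9, 0.7, 0.8], [0, 0, 2, 0])
print(summary["combined_score"], summary["pass_rate"])
```

A higher spread across seeds lowers combined_score even when the mean is unchanged, which is why this field is used to balance accuracy against stability.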

mean_score#
std_score#
combined_score#
pass_rate#
n_fail#
seed_scores#
class calibration.analysis.CalibrationResult(params, single_score, n_pass, n_warn, n_fail, mean_score=None, std_score=None, pass_rate=None, combined_score=None, stability_result=None, seed_scores=None, seed_fails=None, scenario_results=None)[source]#

Result from calibration optimization.

Variables:
  • params (dict) – Parameter configuration.

  • single_score (float) – Validation score from single-seed run.

  • n_pass (int) – Number of metrics that passed.

  • n_warn (int) – Number of metrics with warnings.

  • n_fail (int) – Number of metrics that failed.

  • mean_score (float, optional) – Mean score across stability seeds.

  • std_score (float, optional) – Standard deviation of scores across seeds.

  • pass_rate (float, optional) – Fraction of seeds that passed (no FAIL metrics).

  • combined_score (float, optional) – Combined score balancing accuracy and stability.

  • stability_result (StabilityResult, optional) – Full stability test result.

  • seed_scores (list[float], optional) – Individual seed scores (for incremental stability).

  • seed_fails (list[int], optional) – Per-seed fail counts (for incremental stability).

  • scenario_results (dict[str, ScenarioResult], optional) – Per-scenario results for cross-scenario evaluation.

params#
single_score#
n_pass#
n_warn#
n_fail#
mean_score = None#
std_score = None#
pass_rate = None#
combined_score = None#
stability_result = None#
seed_scores = None#
seed_fails = None#
scenario_results = None#
classmethod from_cross_eval(params, scenario_results)[source]#

Create a CalibrationResult from cross-scenario evaluation data.

Computes aggregate fields from per-scenario results.

Parameters:
  • params (dict) – Parameter configuration.

  • scenario_results (dict[str, ScenarioResult]) – Per-scenario evaluation results.

Return type:

CalibrationResult

class calibration.analysis.ComparisonResult(scenario, default_metrics, calibrated_metrics, default_score, calibrated_score, improvements)[source]#

Result from before/after config comparison.

scenario#
default_metrics#
calibrated_metrics#
default_score#
calibrated_score#
improvements#
calibration.analysis.format_eta(remaining, avg_time, n_workers)[source]#

Format an ETA string from remaining items and average time.

Parameters:
  • remaining (int) – Number of remaining items.

  • avg_time (float) – Average seconds per item.

  • n_workers (int) – Number of parallel workers.

Returns:

Formatted ETA string (e.g., “5m 30s”).

Return type:

str
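
A minimal sketch of how such an ETA string could be produced, assuming work is split evenly across workers. The function name format_eta_sketch and the hour-level format are assumptions; only the “5m 30s” style comes from the description above.

```python
def format_eta_sketch(remaining, avg_time, n_workers):
    """Estimate remaining wall-clock time and format it (hypothetical helper)."""
    # Assumption: remaining items are spread evenly over the workers.
    seconds = int(remaining * avg_time / max(n_workers, 1))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


# 33 items at 100 s each over 10 workers is 330 s of wall-clock time.
print(format_eta_sketch(33, 100.0, 10))
```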

calibration.analysis.format_progress(completed, total, remaining, eta)[source]#

Format a progress line.

Parameters:
  • completed (int) – Number of completed items.

  • total (int) – Total number of items.

  • remaining (int) – Number of remaining items.

  • eta (str) – ETA string.

Returns:

Formatted progress line.

Return type:

str

calibration.analysis.analyze_parameter_patterns(results, top_n=50)[source]#

Analyze which parameter values consistently appear in top configs.

Parameters:
  • results (list[CalibrationResult]) – Screening results sorted by score (best first).

  • top_n (int) – Number of top configs to analyze.

Returns:

For each parameter, a dict mapping value -> count in top configs.

Return type:

dict[str, dict[Any, int]]
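
The returned structure can be pictured with the following sketch, which counts parameter values over plain dicts rather than CalibrationResult objects. count_top_values and the sample values are hypothetical.

```python
from collections import Counter, defaultdict


def count_top_values(param_dicts, top_n=50):
    """Count how often each parameter value occurs in the first top_n configs."""
    counts = defaultdict(Counter)
    for params in param_dicts[:top_n]:  # input is assumed sorted best first
        for name, value in params.items():
            counts[name][value] += 1
    return {name: dict(counter) for name, counter in counts.items()}


# Illustrative configs, best first.
ranked = [
    {"beta": 0.5, "max_M": 4},
    {"beta": 0.5, "max_M": 2},
    {"beta": 1.0, "max_M": 4},
]
patterns = count_top_values(ranked, top_n=2)
print(patterns)
```

A value that dominates the counts for its parameter is a candidate for fixing in later phases.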

calibration.analysis.print_parameter_patterns(patterns, top_n=50)[source]#

Print parameter pattern analysis.

Parameters:
  • patterns (dict) – Output from analyze_parameter_patterns().

  • top_n (int) – Number of top configs used for display.

calibration.analysis.export_best_config(result, scenario, path=None)[source]#

Export best calibration result as a ready-to-use YAML config.

Parameters:
  • result (CalibrationResult) – Best calibration result.

  • scenario (str) – Scenario name.

  • path (Path, optional) – Output path. Defaults to output/{scenario}_best_config.yml.

Returns:

Path to exported config file.

Return type:

Path

calibration.analysis.compare_configs(default, calibrated, scenario, seed=0, n_periods=1000)[source]#

Run default and calibrated configs side-by-side and compare.

Parameters:
  • default (dict) – Default config overrides (can be empty for engine defaults).

  • calibrated (dict) – Calibrated config params.

  • scenario (str) – Scenario name.

  • seed (int) – Random seed.

  • n_periods (int) – Simulation periods.

Returns:

Side-by-side comparison of metrics.

Return type:

ComparisonResult

calibration.analysis.print_comparison(result)[source]#

Print before/after comparison table.

Parameters:

result (ComparisonResult) – Output from compare_configs().

Morris Method

Morris Method screening (elementary effects).

Morris Method (Elementary Effects) screening for global sensitivity analysis.

This module implements the Morris Method (Morris 1991), which runs multiple One-at-a-Time (OAT) trajectories from random starting points across the parameter space. Unlike standard OAT (which depends on a single baseline), Morris provides two measures per parameter:

  • mu* (mu_star): Mean absolute elementary effect – average importance

  • sigma: Std of elementary effects – interaction/nonlinearity indicator

Classification uses dual thresholds:

INCLUDE: mu* > threshold OR sigma > threshold
FIX:     mu* <= threshold AND sigma <= threshold

This catches interaction-prone parameters that OAT would miss: a parameter with low mu* but high sigma has an effect that varies widely depending on the values of other parameters.
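
The dual-threshold rule can be sketched as follows. morris_stats and classify are hypothetical helpers, the 0.02 defaults mirror get_important(), and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def morris_stats(effects):
    """Return (mu, mu_star, sigma) for one parameter's elementary effects."""
    mu = mean(effects)                         # signed mean, can cancel out
    mu_star = mean([abs(e) for e in effects])  # mean absolute effect
    sigma = pstdev(effects)                    # assumption: population std
    return mu, mu_star, sigma


def classify(mu_star, sigma, mu_star_threshold=0.02, sigma_threshold=0.02):
    """INCLUDE if either measure exceeds its threshold, otherwise FIX."""
    if mu_star > mu_star_threshold or sigma > sigma_threshold:
        return "INCLUDE"
    return "FIX"


# One large effect among ten trajectories: low mu*, but sigma exceeds 0.02.
mu, mu_star, sigma = morris_stats([0.1] + [0.0] * 9)
print(mu_star, sigma, classify(mu_star, sigma))
```

In this example mu* alone would fix the parameter; the sigma test is what keeps it in the search.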

Supports multiple scenarios:
  • baseline: Standard BAM model (Section 3.9.1)

  • growth_plus: Endogenous productivity growth via R&D (Section 3.9.2)

  • buffer_stock: Buffer-stock consumption with R&D (Section 3.9.4)

class calibration.morris.MorrisParameterEffect(name, mu, mu_star, sigma, elementary_effects, value_scores=<factory>)[source]#

Morris method results for a single parameter.

Variables:
  • name (str) – Parameter name.

  • mu (float) – Signed mean elementary effect (can cancel out).

  • mu_star (float) – Mean absolute elementary effect (primary importance measure).

  • sigma (float) – Standard deviation of elementary effects (interaction indicator).

  • elementary_effects (list[float]) – Raw elementary effects from each trajectory.

  • value_scores (dict[Any, list[float]]) – Observed scores for each grid value across trajectories. Used for best_value estimation and grid pruning.

name#
mu#
mu_star#
sigma#
elementary_effects#
value_scores#
class calibration.morris.MorrisResult(effects, n_trajectories, n_evaluations, scenario='baseline', avg_time_per_run=0.0, n_seeds=1)[source]#

Full Morris method screening result.

Variables:
  • effects (list[MorrisParameterEffect]) – Per-parameter results.

  • n_trajectories (int) – Number of Morris trajectories used.

  • n_evaluations (int) – Number of unique configs evaluated.

  • scenario (str) – The scenario that was analyzed.

  • avg_time_per_run (float) – Average wall-clock time per simulation run (seconds).

  • n_seeds (int) – Number of seeds used per evaluation.

effects#
n_trajectories#
n_evaluations#
scenario = 'baseline'#
avg_time_per_run = 0.0#
n_seeds = 1#
property ranked#

Effects ranked by mu_star (highest first).

get_important(mu_star_threshold=0.02, sigma_threshold=0.02)[source]#

Categorize parameters using dual threshold.

A parameter is INCLUDEd if it is either important (high mu*) OR interaction-prone (high sigma). It is FIXed only if both are low.

Parameters:
  • mu_star_threshold (float) – Minimum mu* for inclusion.

  • sigma_threshold (float) – Minimum sigma for inclusion (catches interaction-prone params).

Returns:

(included, fixed) parameter name lists.

Return type:

tuple[list[str], list[str]]

to_sensitivity_result()[source]#

Convert to SensitivityResult for downstream compatibility.

Maps mu* to sensitivity and reconstructs per-value scores from trajectory observations, so that build_focused_grid and all downstream calibration code work without changes.

Returns:

Compatible result that can be passed to build_focused_grid().

Return type:

SensitivityResult

calibration.morris.run_morris_screening(scenario='baseline', grid=None, n_trajectories=10, seed=0, n_seeds=1, n_periods=1000, n_workers=10, fixed_params=None)[source]#

Run Morris Method screening analysis.

Generates multiple OAT trajectories from random starting points, evaluates all unique configs in parallel, then computes per-parameter elementary effects (mu*, sigma) for importance and interaction classification.

Parameters:
  • scenario (str) – Scenario to calibrate.

  • grid (dict, optional) – Parameter grid. Defaults to scenario-specific grid.

  • n_trajectories (int) – Number of Morris trajectories (more = more reliable estimates).

  • seed (int) – Base random seed for trajectory generation and evaluation.

  • n_seeds (int) – Number of seeds per config evaluation.

  • n_periods (int) – Number of simulation periods.

  • n_workers (int) – Number of parallel workers.

  • fixed_params (dict, optional) – Parameters to lock at specific values. These params will be included in configs but not perturbed. Use for second-pass Morris screening after locking optimized params from a previous calibration.

Returns:

Morris screening result with per-parameter mu*, sigma, and value scores.

Return type:

MorrisResult

calibration.morris.print_morris_report(result, mu_star_threshold=0.02, sigma_threshold=0.02)[source]#

Print formatted Morris method screening report.

Parameters:
  • result (MorrisResult) – Result from run_morris_screening().

  • mu_star_threshold (float) – Threshold for mu* classification.

  • sigma_threshold (float) – Threshold for sigma classification.

OAT Sensitivity

One-at-a-time sensitivity analysis and pairwise interaction testing.

One-At-a-Time (OAT) sensitivity analysis with pairwise interaction scanning.

This module provides sensitivity analysis functionality to identify which parameters have the most impact on validation scores.

Supports multiple scenarios:
  • baseline: Standard BAM model (Section 3.9.1)

  • growth_plus: Endogenous productivity growth via R&D (Section 3.9.2)

  • buffer_stock: Buffer-stock consumption with R&D (Section 3.9.4)

class calibration.sensitivity.ParameterSensitivity(name, values, scores, best_value, best_score, sensitivity, group_scores=<factory>)[source]#

Sensitivity result for a single parameter.

Variables:
  • name (str) – Parameter name.

  • values (list) – All values tested for this parameter.

  • scores (list[float]) – Validation scores for each value (averaged across seeds).

  • best_value (Any) – Value that produced the highest score.

  • best_score (float) – Highest score achieved.

  • sensitivity (float) – Score range (max - min), indicating parameter importance.

  • group_scores (dict[str, list[float]]) – Per-metric-group scores for each value. Keys are MetricGroup names (e.g., “TIME_SERIES”, “CURVES”), values are lists parallel to scores.

name#
values#
scores#
best_value#
best_score#
sensitivity#
group_scores#
class calibration.sensitivity.SensitivityResult(parameters, baseline_score, scenario='baseline', avg_time_per_run=0.0, n_seeds=1)[source]#

Full sensitivity analysis result.

Variables:
  • parameters (list[ParameterSensitivity]) – Sensitivity results for all parameters.

  • baseline_score (float) – Score with all default values.

  • scenario (str) – The scenario that was analyzed.

  • avg_time_per_run (float) – Average wall-clock time per simulation run (seconds).

  • n_seeds (int) – Number of seeds used per evaluation.

parameters#
baseline_score#
scenario = 'baseline'#
avg_time_per_run = 0.0#
n_seeds = 1#
property ranked#

Parameters ranked by sensitivity (highest first).

get_important(sensitivity_threshold=0.02)[source]#

Categorize parameters by sensitivity.

Parameters:

sensitivity_threshold (float) – Minimum sensitivity (Δ) for inclusion in grid search.

Returns:

(included, fixed) parameter name lists.

Return type:

tuple[list[str], list[str]]
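
A minimal sketch of this split, assuming sensitivity is the score range (max - min) described for ParameterSensitivity. split_by_sensitivity and the sample scores are hypothetical.

```python
def split_by_sensitivity(scores_by_param, sensitivity_threshold=0.02):
    """Split parameter names into (included, fixed) by score range."""
    included, fixed = [], []
    for name, scores in scores_by_param.items():
        sensitivity = max(scores) - min(scores)
        if sensitivity > sensitivity_threshold:
            included.append(name)
        else:
            fixed.append(name)
    return included, fixed


# Illustrative OAT scores for each tested value of two parameters.
oat_scores = {
    "beta": [0.70, 0.80, 0.75],
    "max_M": [0.79, 0.80, 0.80],
}
included, fixed = split_by_sensitivity(oat_scores)
print(included, fixed)
```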

prune_grid(grid, pruning_threshold)[source]#

Remove poorly-scoring values from grid based on OAT results.

Parameters:
  • grid (dict) – Parameter grid to prune (values per parameter).

  • pruning_threshold (float or None) – Maximum score gap from best value. Values with (best_score - score) > pruning_threshold are dropped. None disables pruning (returns grid unchanged).

Returns:

Pruned grid. Always keeps at least the best value per parameter. Unknown parameters or values are kept (conservative).

Return type:

dict[str, list[Any]]
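
The pruning rule for a single parameter can be sketched like this. prune_values is a hypothetical helper that works on parallel lists of values and scores; the handling of unknown parameters described above is not reproduced.

```python
def prune_values(values, scores, pruning_threshold):
    """Keep values whose score is within pruning_threshold of the best score."""
    if pruning_threshold is None:
        return list(values)  # pruning disabled
    best_score = max(scores)
    # The best value always survives because its gap is zero.
    return [v for v, s in zip(values, scores) if best_score - s <= pruning_threshold]


values = [0.1, 0.5, 1.0]
scores = [0.70, 0.78, 0.80]
print(prune_values(values, scores, 0.04))
```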

calibration.sensitivity.run_sensitivity_analysis(scenario='baseline', grid=None, baseline=None, seed=0, n_seeds=1, n_periods=1000, n_workers=10)[source]#

Run OAT sensitivity analysis.

Tests each parameter independently while holding others at baseline values. Supports multi-seed evaluation for more robust sensitivity measurement.

Parameters:
  • scenario (str) – Scenario to calibrate (“baseline”, “growth_plus”, or “buffer_stock”).

  • grid (dict, optional) – Parameter grid. Defaults to scenario-specific grid.

  • baseline (dict, optional) – Baseline parameter values. Defaults to scenario-specific defaults.

  • seed (int) – Base random seed (used as first seed).

  • n_seeds (int) – Number of seeds per evaluation. Seeds are [seed, seed+1, …, seed+n_seeds-1].

  • n_periods (int) – Number of simulation periods.

  • n_workers (int) – Number of parallel workers.

Returns:

Sensitivity ranking of all parameters.

Return type:

SensitivityResult

calibration.sensitivity.print_sensitivity_report(result, sensitivity_threshold=0.02)[source]#

Print formatted sensitivity analysis report with score decomposition.

Parameters:
  • result (SensitivityResult) – Result from run_sensitivity_analysis().

  • sensitivity_threshold (float) – Threshold for INCLUDE/FIX classification (informational preview).

class calibration.sensitivity.PairInteraction(param_a, param_b, value_a, value_b, individual_a_score, individual_b_score, combined_score, baseline_score, interaction_strength)[source]#

Interaction result for a pair of parameters.

param_a#
param_b#
value_a#
value_b#
individual_a_score#
individual_b_score#
combined_score#
baseline_score#
interaction_strength#
class calibration.sensitivity.PairwiseResult(interactions, scenario, baseline_score)[source]#

Full pairwise interaction analysis result.

interactions#
scenario#
baseline_score#
property ranked#

Interactions ranked by strength (highest first).

property synergies#

Positive interactions (combined > expected).

property conflicts#

Negative interactions (combined < expected).
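
The reference does not state how the expected score is computed. One common convention, used here purely as an assumption, is additivity: the expected combined score is the baseline plus the two individual gains. interaction_strength below is a hypothetical helper, not the library function.

```python
def interaction_strength(baseline_score, individual_a_score,
                         individual_b_score, combined_score):
    """Combined score minus the additive expectation (assumed definition)."""
    # Assumption: without interaction, the individual gains simply add up.
    expected = individual_a_score + individual_b_score - baseline_score
    return combined_score - expected


# Positive strength reads as a synergy, negative as a conflict.
strength = interaction_strength(0.70, 0.75, 0.74, 0.82)
print("synergy" if strength > 0 else "conflict")
```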

calibration.sensitivity.run_pairwise_analysis(params, grid, best_values, scenario='baseline', seed=0, n_seeds=3, n_periods=1000, n_workers=10)[source]#

Run pairwise interaction analysis on included parameters.

For each pair of included params, tests all value combinations while fixing others at best values. Measures interaction strength.

Parameters:
  • params (list[str]) – List of included parameter names.

  • grid (dict) – Full parameter grid.

  • best_values (dict) – Best value for each parameter (from sensitivity analysis).

  • scenario (str) – Scenario name.

  • seed (int) – Base random seed.

  • n_seeds (int) – Seeds per evaluation.

  • n_periods (int) – Simulation periods.

  • n_workers (int) – Parallel workers.

Returns:

Pairwise interaction results.

Return type:

PairwiseResult

calibration.sensitivity.print_pairwise_report(result, top_n=20)[source]#

Print formatted pairwise interaction report.

Grid Building

Grid construction, YAML loading, validation, and combination generation.

Grid building, loading, validation, and combination generation.

This module handles parameter grid operations:
  • Building focused grids from sensitivity analysis results

  • Loading grids from YAML/JSON files

  • Validating grid structure

  • Generating and counting parameter combinations

calibration.grid.build_focused_grid(sensitivity, full_grid=None, scenario='baseline', sensitivity_threshold=0.02, pruning_threshold=0.04)[source]#

Build focused grid from sensitivity analysis.

Parameters:
  • sensitivity (SensitivityResult) – Result from run_sensitivity_analysis().

  • full_grid (dict, optional) – Full parameter grid. Defaults to scenario-specific grid.

  • scenario (str) – Scenario name.

  • sensitivity_threshold (float) – Minimum sensitivity (delta) for inclusion in grid search.

  • pruning_threshold (float or None) – Maximum score gap from best value for keeping a grid value. None disables pruning.

Returns:

(grid_to_search, fixed_params)
  • INCLUDE params (delta > threshold): all grid values (pruned if enabled)

  • FIX params (delta <= threshold): fix at best value

Return type:

tuple[dict, dict]

calibration.grid.load_grid(path)[source]#

Load parameter grid from YAML/JSON file.

Light validation: check dict-of-lists structure, warn about empty values. Supports both .yaml/.yml and .json extensions.

Parameters:

path (Path) – Path to grid file.

Returns:

Parameter grid (param_name -> list of values).

Return type:

dict[str, list[Any]]

Raises:
calibration.grid.validate_grid(grid)[source]#

Light validation of grid structure.

Parameters:

grid (dict[str, list[Any]]) – Parameter grid to validate.

Returns:

List of warnings (empty = OK).

Return type:

list[str]
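
A sketch of what a light dict-of-lists check might look like. validate_grid_sketch and the warning wording are assumptions; the actual messages are not documented here.

```python
def validate_grid_sketch(grid):
    """Return a list of warning strings; an empty list means the grid looks OK."""
    if not isinstance(grid, dict):
        return ["grid must map parameter names to lists of values"]
    warnings = []
    for name, values in grid.items():
        if not isinstance(values, list):
            warnings.append(f"{name}: expected a list, got {type(values).__name__}")
        elif not values:
            warnings.append(f"{name}: empty value list")
    return warnings


print(validate_grid_sketch({"beta": [], "max_M": 4}))
```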

calibration.grid.count_combinations(grid)[source]#

Count total combinations in grid.

Parameters:

grid (dict[str, list[Any]]) – Parameter grid.

Returns:

Number of combinations in the grid.

Return type:

int

calibration.grid.generate_combinations(grid, fixed=None, constraints=None)[source]#

Generate all parameter combinations, merged with fixed params.

Parameters:
  • grid (dict[str, list[Any]]) – Parameter grid to generate combinations from.

  • fixed (dict, optional) – Fixed parameter values to merge into each combination.

  • constraints (list[callable], optional) – List of callables that take a combo dict and return bool. A combination is yielded only if ALL constraints return True. Useful for coupled params (e.g., lambda c: c['nfpf'] >= c['nfsf']).

Yields:

dict[str, Any] – Dictionary mapping parameter names to values.
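
Counting and generating combinations can be sketched with itertools.product. count_combos and iter_combos are hypothetical stand-ins; letting grid values take precedence over fixed values on a name clash is an assumption.

```python
from itertools import product
from math import prod


def count_combos(grid):
    """Size of the Cartesian product of all value lists."""
    return prod(len(values) for values in grid.values())


def iter_combos(grid, fixed=None, constraints=None):
    """Yield each combination merged with fixed params, filtered by constraints."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        # Assumption: grid values override fixed values on a name clash.
        combo = {**(fixed or {}), **dict(zip(names, values))}
        if all(check(combo) for check in (constraints or [])):
            yield combo


grid = {"nfpf": [1, 2], "nfsf": [1, 2]}
combos = list(iter_combos(
    grid,
    fixed={"beta": 0.5},
    constraints=[lambda c: c["nfpf"] >= c["nfsf"]],
))
print(count_combos(grid), len(combos))
```

Note that the count covers the unconstrained grid, so the number of combinations actually yielded can be smaller once constraints are applied.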

Screening

Single-seed grid screening with checkpointing.

Single-seed grid screening with progress tracking and checkpointing.

This module handles the grid screening phase of calibration: testing many parameter combinations quickly using a single seed, with progress reporting and checkpoint-based resumption.

calibration.screening.screen_single_seed(params, scenario, seed, n_periods)[source]#

Run single-seed validation for quick screening.

Parameters:
  • params (dict) – Parameter configuration.

  • scenario (str) – Scenario name.

  • seed (int) – Random seed.

  • n_periods (int) – Number of simulation periods.

Returns:

Result with single-seed metrics and elapsed wall-clock seconds.

Return type:

tuple[CalibrationResult, float]

calibration.screening.save_checkpoint(results, scenario, phase='screening')[source]#

Save intermediate results to a checkpoint file.

Parameters:
  • results (list[CalibrationResult]) – Results to checkpoint.

  • scenario (str) – Scenario name.

  • phase (str) – Phase name for filename.

Returns:

Path to checkpoint file.

Return type:

Path

calibration.screening.load_checkpoint(scenario, phase='screening')[source]#

Load checkpoint if it exists.

Parameters:
  • scenario (str) – Scenario name.

  • phase (str) – Phase name.

Returns:

Previously checkpointed results, or None if no checkpoint.

Return type:

list[CalibrationResult] or None

calibration.screening.delete_checkpoint(scenario, phase='screening')[source]#

Delete checkpoint file if it exists.

calibration.screening.run_screening(combinations, scenario, n_workers=10, n_periods=1000, avg_time_per_run=0.0, checkpoint_every=50, resume=False)[source]#

Screen parameter combinations with progress tracking and checkpointing.

Parameters:
  • combinations (list[dict]) – Parameter combinations to test.

  • scenario (str) – Scenario name.

  • n_workers (int) – Parallel workers.

  • n_periods (int) – Simulation periods.

  • avg_time_per_run (float) – Estimated time per run (from sensitivity). 0 = measure during warmup.

  • checkpoint_every (int) – Save checkpoint every N completions.

  • resume (bool) – If True, load checkpoint and skip already-evaluated configs.

Returns:

Results sorted by single_score (best first).

Return type:

list[CalibrationResult]
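
The resume behavior amounts to skipping combinations whose parameters already appear in the checkpoint. A minimal sketch, assuming hashable parameter values; pending_combinations is a hypothetical helper and operates on plain param dicts.

```python
def pending_combinations(combinations, checkpointed_params):
    """Return the combinations that are not yet present in the checkpoint."""
    # Dicts are unhashable, so compare them via sorted (name, value) tuples.
    done = {tuple(sorted(p.items())) for p in checkpointed_params}
    return [c for c in combinations if tuple(sorted(c.items())) not in done]


all_combos = [{"beta": 0.5, "max_M": 2}, {"beta": 0.5, "max_M": 4}]
already_done = [{"max_M": 2, "beta": 0.5}]
todo = pending_combinations(all_combos, already_done)
print(todo)
```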

Stability Testing

Tiered stability testing with configurable ranking.

Multi-seed stability testing with tiered evaluation and ranking strategies.

This module handles the stability testing phase of calibration: evaluating top candidates from screening across multiple seeds with configurable ranking strategies and tiered pruning.

calibration.stability.evaluate_stability(params, scenario, seeds, n_periods)[source]#

Run multi-seed stability test for full evaluation.

Parameters:
  • params (dict) – Parameter configuration.

  • scenario (str) – Scenario name.

  • seeds (list[int]) – List of random seeds to test.

  • n_periods (int) – Number of simulation periods.

Returns:

Result with stability metrics and combined score.

Return type:

CalibrationResult

calibration.stability.parse_stability_tiers(tiers_str)[source]#

Parse stability tiers from CLI string.

Parameters:

tiers_str (str) – Comma-separated n_configs:total_seeds pairs, e.g. “100:10,50:20,10:100”, meaning the top 100 configs at 10 seeds, the top 50 at 20 seeds, and the top 10 at 100 seeds.

Returns:

List of (n_configs, total_seeds) tuples.

Return type:

list[tuple[int, int]]
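
The documented format can be parsed along these lines. parse_tiers is a hypothetical stand-in, and error handling for malformed input is omitted.

```python
def parse_tiers(tiers_str):
    """Parse 'n_configs:total_seeds,...' into a list of integer tuples."""
    tiers = []
    for chunk in tiers_str.split(","):
        n_configs, total_seeds = chunk.strip().split(":")
        tiers.append((int(n_configs), int(total_seeds)))
    return tiers


print(parse_tiers("100:10,50:20,10:100"))
```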

calibration.stability.run_tiered_stability(candidates, scenario, tiers, n_workers=10, n_periods=1000, avg_time_per_run=0.0, rank_by='combined', k_factor=1.0)[source]#

Run incremental tiered stability testing.

Each tier runs only NEW seeds (not previously tested ones) and accumulates all seed scores for ranking.

Parameters:
  • candidates (list[CalibrationResult]) – Screening results to stability-test.

  • scenario (str) – Scenario name.

  • tiers (list[tuple[int, int]]) – List of (n_configs, total_seeds) – each tier tests the top n_configs using enough new seeds to reach total_seeds cumulative.

  • n_workers (int) – Parallel workers.

  • n_periods (int) – Simulation periods.

  • avg_time_per_run (float) – Estimated time per run for ETA.

  • rank_by (str) – Ranking strategy: “combined” (mean*(1-k*std)), “stability” (pass_rate/n_fail priority), or “mean” (mean_score only).

  • k_factor (float) – The k in the mean*(1-k*std) formula used by “combined” ranking.

Returns:

Final results sorted by ranking strategy (best first).

Return type:

list[CalibrationResult]
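
Because total_seeds is cumulative, each tier only needs the seeds that earlier tiers have not run. The sketch below works out that schedule; plan_new_seeds is hypothetical, and numbering seeds consecutively from first_seed is an assumption.

```python
def plan_new_seeds(tiers, first_seed=0):
    """For each (n_configs, total_seeds) tier, list only the seeds not yet run."""
    plan = []
    seeds_done = 0
    for n_configs, total_seeds in tiers:
        # Assumption: seeds are consecutive integers starting at first_seed.
        new_seeds = list(range(first_seed + seeds_done, first_seed + total_seeds))
        plan.append((n_configs, new_seeds))
        seeds_done = max(seeds_done, total_seeds)
    return plan


plan = plan_new_seeds([(100, 10), (50, 20), (10, 100)])
for n_configs, new_seeds in plan:
    print(n_configs, len(new_seeds))
```

With the default tiers, the final 10 configs accumulate 100 seed scores while only 80 new runs each are needed in the last tier.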

calibration.stability.run_focused_calibration(grid, fixed_params, scenario='baseline', n_workers=10, n_periods=1000, stability_tiers=None, avg_time_per_run=0.0, resume=False, rank_by='combined', k_factor=1.0)[source]#

Run calibration on focused grid with fixed params.

Parameters:
  • grid (dict) – Parameter grid to search (from build_focused_grid).

  • fixed_params (dict) – Fixed parameter values (from build_focused_grid).

  • scenario (str) – Scenario name.

  • n_workers (int) – Number of parallel workers.

  • n_periods (int) – Number of simulation periods.

  • stability_tiers (list[tuple[int, int]], optional) – Tiered stability config. Defaults to [(100, 10), (50, 20), (10, 100)].

  • avg_time_per_run (float) – Average time per simulation run (from sensitivity).

  • resume (bool) – If True, resume from checkpoint.

  • rank_by (str) – Ranking strategy for stability testing.

  • k_factor (float) – The k in the “combined” ranking formula.

Returns:

Results sorted by ranking strategy (best first).

Return type:

list[CalibrationResult]

Serialization

Save/load for all result types, timestamped output directories.

Central serialization for calibration results.

All save/load operations use a consistent JSON schema with version tracking. Timestamped output directories keep results organized across runs.

calibration.io.create_run_dir(scenario, output_dir=None)[source]#

Create timestamped output directory.

Parameters:
  • scenario (str) – Scenario name (included in directory name).

  • output_dir (Path, optional) – Parent directory. Defaults to calibration/output/.

Returns:

Path to the created directory.

Return type:

Path

calibration.io.save_sensitivity(result, path)[source]#

Save sensitivity result to JSON.

calibration.io.load_sensitivity(path)[source]#

Load sensitivity result from JSON.

calibration.io.save_morris(result, path)[source]#

Save Morris result to JSON.

calibration.io.load_morris(path)[source]#

Load Morris result from JSON.

calibration.io.save_screening(results, sensitivity, grid, fixed, patterns, scenario, path)[source]#

Save screening results to JSON.

calibration.io.load_screening(path)[source]#

Load screening results from JSON. Returns (results, avg_time_per_run).

calibration.io.save_stability(results, scenario, path)[source]#

Save stability testing results to JSON.

calibration.io.load_stability(path)[source]#

Load stability results from JSON.

calibration.io.save_pairwise(result, scenario, path)[source]#

Save pairwise interaction results to JSON.

calibration.io.load_pairwise(path)[source]#

Load pairwise results from JSON.

Reporting

Auto-generated markdown reports.

Auto-generated markdown reports for calibration results.

Each phase of the calibration pipeline generates a markdown report alongside its JSON results in the timestamped output directory.

calibration.reporting.generate_sensitivity_report(result, method, path)[source]#

Generate markdown report for sensitivity phase.

Parameters:
  • result (SensitivityResult) – Sensitivity analysis result.

  • method (str) – Sensitivity method used (“morris” or “oat”).

  • path (Path) – Output path for the markdown report.

calibration.reporting.generate_screening_report(results, grid, fixed, patterns, sensitivity, scenario, path, top_n=50)[source]#

Generate markdown report for grid screening phase.

Parameters:
  • results (list[CalibrationResult]) – Screening results (sorted by score).

  • grid (dict) – Grid parameters searched.

  • fixed (dict) – Fixed parameter values.

  • patterns (dict) – Parameter patterns from top configs.

  • sensitivity (SensitivityResult) – Sensitivity result used for grid building.

  • scenario (str) – Scenario name.

  • path (Path) – Output path for the markdown report.

  • top_n (int) – Number of top configs used for pattern analysis.

calibration.reporting.generate_stability_report(results, scenario, tiers, comparison, path)[source]#

Generate markdown report for stability phase.

Parameters:
  • results (list[CalibrationResult]) – Stability test results (sorted by ranking).

  • scenario (str) – Scenario name.

  • tiers (list[tuple[int, int]]) – Stability tiers used.

  • comparison (ComparisonResult or None) – Before/after comparison (if available).

  • path (Path) – Output path for the markdown report.

calibration.reporting.generate_full_report(sensitivity, screening_results, stability_results, comparison, scenario, tiers, path)[source]#

Generate comprehensive calibration report combining all phases.

Parameters:
  • sensitivity (SensitivityResult) – Sensitivity analysis result.

  • screening_results (list[CalibrationResult]) – Grid screening results.

  • stability_results (list[CalibrationResult]) – Stability testing results.

  • comparison (ComparisonResult or None) – Before/after comparison.

  • scenario (str) – Scenario name.

  • tiers (list[tuple[int, int]]) – Stability tiers used.

  • path (Path) – Output path for the markdown report.

Rescreen

Second-pass Morris screening after locking optimized params.

Delegates to run_morris_screening(fixed_params=...) and computes the sensitivity collapse between Phase 1 and Phase 2 Morris results.

calibration.rescreen.resolve_params(params_str)[source]#

Resolve a param group name or comma-separated param names.

Parameters:

params_str (str) – Either a PARAM_GROUPS key (e.g., “entry”, “behavioral”) or comma-separated full parameter names (e.g., “beta,max_M”).

Returns:

List of parameter names.

Return type:

list[str]

Raises:

ValueError – If the string is not a known group and doesn’t look like param names.

calibration.rescreen.load_fixed_from_result(path)[source]#

Load the #1-ranked result’s params from a stability result file.

Parameters:

path (Path) – Path to stability result JSON.

Returns:

Parameter dict from the top-ranked result.

Return type:

dict[str, Any]

calibration.rescreen.compute_sensitivity_collapse(phase1, phase2)[source]#

Compute sensitivity collapse between two Morris screenings.

Parameters:
  • phase1 (MorrisResult) – First-pass Morris result (before locking params).

  • phase2 (MorrisResult) – Second-pass Morris result (after locking params).

Returns:

Per-parameter dict with phase1_mu_star, phase2_mu_star, collapse_pct.

Return type:

dict[str, dict]
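
The exact definition of collapse_pct is not given in this reference. The sketch below assumes it is the relative drop in mu* from Phase 1 to Phase 2, expressed as a percentage; collapse_table is a hypothetical helper and takes plain name -> mu* mappings instead of MorrisResult objects.

```python
def collapse_table(phase1_mu_star, phase2_mu_star):
    """Build a per-parameter table of mu* before and after locking params."""
    table = {}
    for name, p2 in phase2_mu_star.items():
        p1 = phase1_mu_star.get(name)
        if p1 is None:
            continue  # parameter was not screened in Phase 1
        # Assumption: collapse is the relative reduction in mu*, in percent.
        collapse_pct = 100.0 * (p1 - p2) / p1 if p1 else 0.0
        table[name] = {
            "phase1_mu_star": p1,
            "phase2_mu_star": p2,
            "collapse_pct": collapse_pct,
        }
    return table


table = collapse_table({"beta": 0.08, "max_M": 0.04}, {"beta": 0.02, "max_M": 0.04})
print(table["beta"]["collapse_pct"], table["max_M"]["collapse_pct"])
```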

calibration.rescreen.run_rescreen(scenario, fix_from, params, n_trajectories=20, n_seeds=5, n_periods=1000, n_workers=10, phase1_morris=None)[source]#

Run second-pass Morris screening on a subset of params.

Parameters:
  • scenario (str) – Scenario name.

  • fix_from (Path) – Path to stability result JSON to load fixed params from.

  • params (list[str]) – Parameter names to screen (the rest are fixed).

  • n_trajectories (int) – Number of Morris trajectories.

  • n_seeds (int) – Seeds per evaluation.

  • n_periods (int) – Simulation periods.

  • n_workers (int) – Parallel workers.

  • phase1_morris (MorrisResult, optional) – Phase 1 Morris result for collapse comparison.

Returns:

(phase2_result, collapse_table)

Return type:

tuple[MorrisResult, dict]

calibration.rescreen.run_rescreen_phase(args, run_dir=None)[source]#

CLI entry point for rescreen phase.

Cost Analysis

Targeted cost analysis for parameter value substitutions.

Targeted cost analysis – measure the cost of swapping values into a base config.

Evaluates the impact of substituting preferred parameter values into an optimized base configuration. Classifies each swap by cost: FREE (<0.002), CHEAP (<0.005), MODERATE (<0.010), EXPENSIVE (>=0.010).

class calibration.cost.SwapResult(param, value, base_combined, swap_combined, delta, classification, pass_rate)[source]#

Result of swapping a single parameter value into the base config.

Variables:
  • param (str) – Parameter name.

  • value (Any) – Swapped value.

  • base_combined (float) – Base config’s combined score.

  • swap_combined (float) – Combined score with this value swapped in.

  • delta (float) – Score change (swap - base). Negative = worse.

  • classification (str) – Cost classification: FREE, CHEAP, MODERATE, or EXPENSIVE.

  • pass_rate (float) – Pass rate with swapped value.

param#
value#
base_combined#
swap_combined#
delta#
classification#
pass_rate#
calibration.cost.classify_cost(delta_abs)[source]#

Classify the absolute cost of a swap.

Parameters:

delta_abs (float) – Absolute combined score difference.

Returns:

“FREE”, “CHEAP”, “MODERATE”, or “EXPENSIVE”.

Return type:

str
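The thresholds listed in the module description can be sketched as a simple cascade. This is a minimal illustration, not the library's implementation: the name classify_cost_sketch and the example scores are hypothetical, and only the four cut-offs come from the text above.

```python
def classify_cost_sketch(delta_abs: float) -> str:
    """Map an absolute combined-score difference to a cost label.

    Hypothetical stand-in for calibration.cost.classify_cost; the
    thresholds are the ones quoted in the module description.
    """
    if delta_abs < 0.002:
        return "FREE"
    if delta_abs < 0.005:
        return "CHEAP"
    if delta_abs < 0.010:
        return "MODERATE"
    return "EXPENSIVE"


# Illustrative numbers only (not real calibration output).
base_combined = 0.850
swap_combined = 0.846
delta = swap_combined - base_combined  # swap - base; negative = worse
label = classify_cost_sketch(abs(delta))
print(label)
```

Note that the sign of delta is kept separately (as in SwapResult.delta); only its magnitude feeds the classification.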

calibration.cost.parse_swaps(swap_args)[source]#

Parse swap arguments from CLI.

Parameters:

swap_args (list[str]) – List of “param=v1,v2,v3” strings.

Returns:

Parameter -> list of values to try.

Return type:

dict[str, list]
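A rough sketch of how “param=v1,v2,v3” strings might be turned into the documented dict[str, list]. The helper names, the parameter names in the example, and the numeric coercion rule (int, then float, else string) are assumptions; the reference above does not say how values are typed.

```python
def _coerce(token: str):
    """Assumed coercion: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            continue
    return token


def parse_swaps_sketch(swap_args):
    """Hypothetical stand-in for calibration.cost.parse_swaps."""
    swaps = {}
    for arg in swap_args:
        name, sep, raw_values = arg.partition("=")
        if not sep or not name.strip() or not raw_values:
            raise ValueError(f"expected 'param=v1,v2,...', got {arg!r}")
        swaps[name.strip()] = [_coerce(v.strip()) for v in raw_values.split(",")]
    return swaps


# "beta" and "n_firms" are made-up parameter names for illustration.
parsed = parse_swaps_sketch(["beta=0.5,1.0", "n_firms=100,200"])
print(parsed)
```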

calibration.cost.run_cost_analysis(base_params, swaps, scenario, n_seeds=20, n_periods=1000, n_workers=10, base_combined=None)[source]#

Run targeted cost analysis for parameter swaps.

Parameters:
  • base_params (dict) – Base configuration (the stability winner).

  • swaps (dict[str, list]) – Parameters to swap and their candidate values.

  • scenario (str) – Scenario name.

  • n_seeds (int) – Seeds per evaluation.

  • n_periods (int) – Simulation periods.

  • n_workers (int) – Parallel workers.

  • base_combined (float, optional) – Pre-computed base combined score. If None, evaluates the base.

Returns:

Results for each swap, sorted by absolute delta.

Return type:

list[SwapResult]

calibration.cost.save_cost_results(results, scenario, path)[source]#

Save cost analysis results to JSON.

calibration.cost.run_cost_phase(args, run_dir=None)[source]#

CLI entry point for cost phase.

Cross-Scenario Evaluation#

Cross-scenario evaluation with multiple ranking strategies.

Cross-scenario evaluation – run configs across multiple scenarios.

Evaluates parameter configurations on all specified scenarios simultaneously and ranks using cross-scenario criteria.

calibration.cross_eval.rank_cross_scenario(results, strategy='stability-first')[source]#

Rank configs using cross-scenario criteria.

Parameters:
  • results (list[CalibrationResult]) – Results with scenario_results populated.

  • strategy (str) – Ranking strategy:

      - “stability-first”: min(pass_rates) -> total fails -> min(combined)
      - “score-first”: min(combined) -> total fails
      - “balanced”: geometric mean of combined scores

Returns:

Sorted results (best first).

Return type:

list[CalibrationResult]

Raises:

ValueError – If strategy is not recognized.
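The three strategies can be read as lexicographic sort keys. The sketch below assumes higher pass rates and combined scores are better, fewer fails are better, and combined scores are non-negative; the Scen class, rank_sketch, and the sample numbers are hypothetical and stand in for CalibrationResult.scenario_results.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Scen:
    """Hypothetical per-scenario summary (subset of ScenarioResult fields)."""
    combined_score: float
    pass_rate: float
    n_fail: int


def rank_sketch(results, strategy="stability-first"):
    """results: list of (name, {scenario: Scen}); returns best first."""
    if strategy not in ("stability-first", "score-first", "balanced"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    def key(item):
        scens = list(item[1].values())
        min_pass = min(s.pass_rate for s in scens)
        total_fails = sum(s.n_fail for s in scens)
        min_comb = min(s.combined_score for s in scens)
        if strategy == "stability-first":
            return (-min_pass, total_fails, -min_comb)
        if strategy == "score-first":
            return (-min_comb, total_fails)
        # "balanced": geometric mean of the combined scores
        geo = prod(s.combined_score for s in scens) ** (1 / len(scens))
        return (-geo,)

    return sorted(results, key=key)


# Made-up results for three configs on two scenarios.
results = [
    ("A", {"baseline": Scen(0.80, 1.00, 0), "growth_plus": Scen(0.60, 0.90, 2)}),
    ("B", {"baseline": Scen(0.75, 0.95, 1), "growth_plus": Scen(0.70, 0.95, 1)}),
    ("C", {"baseline": Scen(0.90, 1.00, 0), "growth_plus": Scen(0.72, 0.85, 3)}),
]
stable_order = [name for name, _ in rank_sketch(results, "stability-first")]
score_order = [name for name, _ in rank_sketch(results, "score-first")]
balanced_order = [name for name, _ in rank_sketch(results, "balanced")]
```

Using the minimum across scenarios means a config is judged by its worst scenario, so the strategies differ mainly in which worst-case quantity is compared first.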

calibration.cross_eval.evaluate_cross_scenario(configs, scenarios, n_seeds=100, n_periods=1000, n_workers=10)[source]#

Evaluate configs across multiple scenarios.

Parameters:
  • configs (list[dict]) – Parameter configurations to evaluate.

  • scenarios (list[str]) – Scenario names to evaluate on.

  • n_seeds (int) – Seeds per scenario per config.

  • n_periods (int) – Simulation periods.

  • n_workers (int) – Parallel workers.

Returns:

Results with scenario_results populated.

Return type:

list[CalibrationResult]

calibration.cross_eval.compute_scenario_tension(results, scenarios)[source]#

Analyze parameter tensions between scenarios.

Identifies params where the optimal value differs between scenarios, indicating a fundamental trade-off.

Parameters:
  • results (list[CalibrationResult]) – Results with scenario_results populated.

  • scenarios (list[str]) – Scenario names to compare.

Returns:

Per-parameter tension info: which value each scenario prefers, and the score gap.

Return type:

dict[str, dict]

calibration.cross_eval.run_cross_eval_phase(args, run_dir=None)[source]#

CLI entry point for cross-eval phase.

Structured Sweep#

Multi-stage parameter sweep with carry-forward winners.

Structured parameter sweep by category, carrying forward winners.

Each stage runs a grid of its parameters while holding everything else fixed from the base config (plus winners from prior stages). Optionally cross-evaluates against other scenarios at each stage.
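The carry-forward idea can be sketched as a loop over stages. This is a simplified illustration under stated assumptions: a single hypothetical score callable replaces the tiered stability testing that run_sweep performs, and sweep_sketch, toy_score, and the parameters x and y are invented for the example.

```python
from itertools import product


def sweep_sketch(base_params, stages, score):
    """Run each stage's grid on top of the current config, keep the winner.

    `score` is a hypothetical callable (params -> float, higher is better).
    """
    current = dict(base_params)
    winners = []
    for label, grid in stages:
        names = list(grid)
        best_params, best_score = None, float("-inf")
        # Full grid over this stage's parameters; everything else is held fixed.
        for values in product(*(grid[n] for n in names)):
            candidate = {**current, **dict(zip(names, values))}
            s = score(candidate)
            if s > best_score:
                best_params, best_score = candidate, s
        current = best_params  # the winner is carried into the next stage
        winners.append((label, {n: best_params[n] for n in names}, best_score))
    return current, winners


def toy_score(p):
    # Toy objective with its optimum at x=2, y=1.
    return -((p["x"] - 2) ** 2) - (p["y"] - 1) ** 2


final, winners = sweep_sketch(
    {"x": 0, "y": 0},
    [("A", {"x": [1, 2, 3]}), ("B", {"y": [0, 1, 2]})],
    toy_score,
)
```

Because each stage only searches its own parameters, the cost grows with the sum of the stage grid sizes rather than their product, at the price of possibly missing interactions between stages.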

calibration.sweep.parse_stage(stage_str)[source]#

Parse a single stage definition.

Format: “LABEL:param1=v1,v2,v3 param2=v4,v5”

Parameters:

stage_str (str) – Stage definition string.

Returns:

(label, param_grid)

Return type:

tuple[str, dict]
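A minimal sketch of parsing the “LABEL:param1=v1,v2,v3 param2=v4,v5” format. It assumes parameters are separated by whitespace and values by commas, as the format string suggests; the helper names and the numeric coercion are assumptions, not the documented behavior.

```python
def _coerce(token: str):
    """Assumed coercion: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            continue
    return token


def parse_stage_sketch(stage_str):
    """Hypothetical stand-in for calibration.sweep.parse_stage."""
    label, sep, body = stage_str.partition(":")
    if not sep or not label.strip():
        raise ValueError(f"expected 'LABEL:param=v1,v2 ...', got {stage_str!r}")
    grid = {}
    for chunk in body.split():
        name, eq, raw_values = chunk.partition("=")
        if not eq or not name or not raw_values:
            raise ValueError(f"bad parameter spec {chunk!r} in stage {label!r}")
        grid[name] = [_coerce(v) for v in raw_values.split(",")]
    return label.strip(), grid


# "alpha" and "beta" are made-up parameter names for illustration.
stage = parse_stage_sketch("A:alpha=0.1,0.2 beta=1,2,3")
print(stage)
```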

calibration.sweep.parse_stages(stage_args)[source]#

Parse multiple stage definitions.

Parameters:

stage_args (list[str]) – List of stage definition strings.

Returns:

List of (label, param_grid) tuples.

Return type:

list[tuple[str, dict]]

class calibration.sweep.StageResult(label, winner_params, combined_score, mean_score, pass_rate, n_candidates)[source]#

Result of a single sweep stage.

Variables:
  • label (str) – Stage label.

  • winner_params (dict[str, Any]) – Winner’s parameters.

  • combined_score (float) – Winner’s combined score.

  • mean_score (float) – Winner’s mean score.

  • pass_rate (float) – Winner’s pass rate.

  • n_candidates (int) – Number of grid combinations tested.

label#
winner_params#
combined_score#
mean_score#
pass_rate#
n_candidates#
calibration.sweep.run_sweep(base_params, stages, scenario, n_workers=10, n_periods=1000, stability_tiers=None, rank_by='combined', k_factor=1.0, cross_scenario=None)[source]#

Run structured multi-stage parameter sweep.

Parameters:
  • base_params (dict) – Starting configuration.

  • stages (list[tuple[str, dict]]) – List of (label, param_grid) stages to run in order.

  • scenario (str) – Scenario name.

  • n_workers (int) – Parallel workers.

  • n_periods (int) – Simulation periods.

  • stability_tiers (list[tuple[int, int]], optional) – Tiers for stability testing. Defaults to [(100, 10), (50, 20), (10, 100)].

  • rank_by (str) – Ranking strategy for stability.

  • k_factor (float) – k in combined score formula.

  • cross_scenario (str, optional) – If set, cross-evaluate the stage winner against this scenario.

Returns:

Per-stage results with winner params and scores.

Return type:

list[StageResult]

calibration.sweep.run_sweep_phase(args, run_dir=None)[source]#

CLI entry point for sweep phase.

Parameter Space#

Parameter grids for all three scenarios.

Parameter space definition for calibration.

This module defines the parameter grids and scenario-specific defaults used as baselines for sensitivity analysis.

Supports multiple scenarios:
  • baseline: Standard BAM model (Section 3.9.1)

  • growth_plus: Endogenous productivity growth via R&D (Section 3.9.2)

  • buffer_stock: Buffer-stock consumption with R&D (Section 3.9.4)

For grid combination generation, see calibration.grid.

calibration.parameter_space.get_parameter_grid(scenario='baseline')[source]#

Get the parameter grid for a scenario.

Parameters:

scenario (str) – Scenario name (“baseline”, “growth_plus”, or “buffer_stock”).

Returns:

Parameter grid for the scenario.

Return type:

dict

Raises:

ValueError – If scenario is not recognized.

calibration.parameter_space.get_default_values(scenario='baseline')[source]#

Get the scenario-specific parameter overrides.

Parameters:

scenario (str) – Scenario name (“baseline”, “growth_plus”, or “buffer_stock”).

Returns:

Scenario overrides. For baseline, returns an empty dict (engine defaults). For extensions, returns the extension-specific parameter defaults.

Return type:

dict

Raises:

ValueError – If scenario is not recognized.