API Reference#

Full autodoc reference for all calibration modules.

Analysis#

Types, patterns, export, and comparison.

Result types, parameter pattern analysis, config export, and comparison.

This module provides:

Core result types (CalibrationResult, ComparisonResult)
Progress formatting helpers
Parameter pattern analysis for identifying best values
Config export (YAML) and before/after comparison

class calibration.analysis.ScenarioResult(mean_score, std_score, combined_score, pass_rate, n_fail, seed_scores)[source]#

Per-scenario results for cross-scenario evaluation.

Variables:

mean_score (float) – Mean score across seeds.
std_score (float) – Standard deviation of scores across seeds.
combined_score (float) – Combined score: mean * (1 - std).
pass_rate (float) – Fraction of seeds with zero FAIL metrics.
n_fail (int) – Total number of seed-level failures.
seed_scores (list[float]) – Individual seed scores.
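
The relationship between these fields can be illustrated with a minimal sketch. The helper name summarize_seed_scores and the per-seed fail-count input are hypothetical, and the use of the population standard deviation is an assumption; the library may compute std_score differently.

```python
from statistics import mean, pstdev


def summarize_seed_scores(seed_scores, seed_fail_counts):
    """Aggregate per-seed scores into ScenarioResult-like fields (illustrative)."""
    m = mean(seed_scores)
    # Assumption: population standard deviation; the sample form is also plausible.
    s = pstdev(seed_scores)
    return {
        "mean_score": m,
        "std_score": s,
        # Combined score as documented above: mean * (1 - std).
        "combined_score": m * (1 - s),
        # Fraction of seeds with zero FAIL metrics.
        "pass_rate": sum(1 for f in seed_fail_counts if f == 0) / len(seed_scores),
        "seed_scores": list(seed_scores),
    }
```

A noisy configuration is penalized in proportion to its spread, so two configs with the same mean are separated by their stability.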

mean_score#

std_score#

combined_score#

pass_rate#

n_fail#

seed_scores#

class calibration.analysis.CalibrationResult(params, single_score, n_pass, n_warn, n_fail, mean_score=None, std_score=None, pass_rate=None, combined_score=None, stability_result=None, seed_scores=None, seed_fails=None, scenario_results=None)[source]#

Result from calibration optimization.

Variables:

params (dict) – Parameter configuration.
single_score (float) – Validation score from single-seed run.
n_pass (int) – Number of metrics that passed.
n_warn (int) – Number of metrics with warnings.
n_fail (int) – Number of metrics that failed.
mean_score (float, optional) – Mean score across stability seeds.
std_score (float, optional) – Standard deviation of scores across seeds.
pass_rate (float, optional) – Fraction of seeds that passed (no FAIL metrics).
combined_score (float, optional) – Combined score balancing accuracy and stability.
stability_result (StabilityResult, optional) – Full stability test result.
seed_scores (list[float], optional) – Individual seed scores (for incremental stability).
seed_fails (list[int], optional) – Per-seed fail counts (for incremental stability).
scenario_results (dict[str, ScenarioResult], optional) – Per-scenario results for cross-scenario evaluation.

params#

single_score#

n_pass#

n_warn#

n_fail#

mean_score = None#

std_score = None#

pass_rate = None#

combined_score = None#

stability_result = None#

seed_scores = None#

seed_fails = None#

scenario_results = None#

classmethod from_cross_eval(params, scenario_results)[source]#

Create a CalibrationResult from cross-scenario evaluation data.

Computes aggregate fields from per-scenario results.

Parameters:

params (dict) – Parameter configuration.
scenario_results (dict[str, ScenarioResult]) – Per-scenario evaluation results.

Return type:

CalibrationResult

class calibration.analysis.ComparisonResult(scenario, default_metrics, calibrated_metrics, default_score, calibrated_score, improvements)[source]#

Result from before/after config comparison.

scenario#

default_metrics#

calibrated_metrics#

default_score#

calibrated_score#

improvements#

calibration.analysis.format_eta(remaining, avg_time, n_workers)[source]#

Format an ETA string from remaining items and average time.

Parameters:

remaining (int) – Number of remaining items.
avg_time (float) – Average seconds per item.
n_workers (int) – Number of parallel workers.

Returns:

Formatted ETA string (e.g., “5m 30s”).

Return type:

str
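
As a rough illustration of how such an ETA could be derived, the sketch below assumes the remaining work is spread evenly over the workers. The function name format_eta_sketch and the exact output format (beyond the "5m 30s" example above) are assumptions, not the library's implementation.

```python
def format_eta_sketch(remaining: int, avg_time: float, n_workers: int) -> str:
    """Estimate wall-clock time left and format it compactly (illustrative)."""
    # Assumption: items are distributed evenly across parallel workers.
    total_seconds = int(remaining * avg_time / max(n_workers, 1))
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m"
    if minutes:
        return f"{minutes}m {seconds}s"
    return f"{seconds}s"
```

For example, 330 remaining items at 10 seconds each on 10 workers gives 330 seconds of wall-clock time, which this sketch formats as "5m 30s".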

calibration.analysis.format_progress(completed, total, remaining, eta)[source]#

Format a progress line.

Parameters:

completed (int) – Number of completed items.
total (int) – Total number of items.
remaining (int) – Number of remaining items.
eta (str) – ETA string.

Returns:

Formatted progress line.

Return type:

str

calibration.analysis.analyze_parameter_patterns(results, top_n=50)[source]#

Analyze which parameter values consistently appear in top configs.

Parameters:

results (list[CalibrationResult]) – Screening results sorted by score (best first).
top_n (int) – Number of top configs to analyze.

Returns:

For each parameter, a dict mapping value -> count in top configs.

Return type:

dict[str, dict[Any, int]]
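
The returned structure can be pictured with a small sketch that counts values over plain parameter dicts. The name count_value_patterns is hypothetical, and it takes params dicts directly rather than CalibrationResult objects to stay self-contained; parameter values are assumed to be hashable.

```python
from collections import Counter


def count_value_patterns(param_dicts, top_n=50):
    """Count how often each parameter value appears in the top_n configs."""
    patterns = {}
    # Input is assumed to be sorted best first, as documented above.
    for params in param_dicts[:top_n]:
        for name, value in params.items():
            patterns.setdefault(name, Counter())[value] += 1
    return {name: dict(counts) for name, counts in patterns.items()}
```

A value that dominates the counts for its parameter is a candidate for fixing in later phases.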

calibration.analysis.print_parameter_patterns(patterns, top_n=50)[source]#

Print parameter pattern analysis.

Parameters:

patterns (dict) – Output from analyze_parameter_patterns().
top_n (int) – Number of top configs used for display.

calibration.analysis.export_best_config(result, scenario, path=None)[source]#

Export best calibration result as a ready-to-use YAML config.

Parameters:

result (CalibrationResult) – Best calibration result.
scenario (str) – Scenario name.
path (Path, optional) – Output path. Defaults to output/{scenario}_best_config.yml.

Returns:

Path to exported config file.

Return type:

Path

calibration.analysis.compare_configs(default, calibrated, scenario, seed=0, n_periods=1000)[source]#

Run default and calibrated configs side-by-side and compare.

Parameters:

default (dict) – Default config overrides (can be empty for engine defaults).
calibrated (dict) – Calibrated config params.
scenario (str) – Scenario name.
seed (int) – Random seed.
n_periods (int) – Simulation periods.

Returns:

Side-by-side comparison of metrics.

Return type:

ComparisonResult

calibration.analysis.print_comparison(result)[source]#

Print before/after comparison table.

Parameters:

result (ComparisonResult) – Output from compare_configs().

Morris Method#

Morris Method screening (elementary effects).

Morris Method (Elementary Effects) screening for global sensitivity analysis.

This module implements the Morris Method (Morris 1991), which runs multiple One-at-a-Time (OAT) trajectories from random starting points across the parameter space. Unlike standard OAT (which depends on a single baseline), Morris provides two measures per parameter:

mu* (mu_star): Mean absolute elementary effect – average importance
sigma: Std of elementary effects – interaction/nonlinearity indicator

Classification uses dual thresholds:

INCLUDE: mu* > threshold OR sigma > threshold
FIX:     mu* <= threshold AND sigma <= threshold

This catches interaction-prone parameters that OAT would miss: a parameter with low mu* but high sigma means its effect varies wildly depending on other parameters’ values.
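
The two measures and the dual-threshold rule can be sketched as follows. The function names are hypothetical, the default thresholds mirror the 0.02 defaults used elsewhere in this module, and the population standard deviation is an assumption (the sample form is also common for Morris sigma).

```python
from statistics import mean, pstdev


def summarize_effects(elementary_effects):
    """Return (mu, mu_star, sigma) for one parameter's elementary effects."""
    mu = mean(elementary_effects)                        # signed mean, can cancel out
    mu_star = mean(abs(e) for e in elementary_effects)   # mean absolute effect
    sigma = pstdev(elementary_effects)                   # assumption: population std
    return mu, mu_star, sigma


def classify(mu_star, sigma, mu_star_threshold=0.02, sigma_threshold=0.02):
    """INCLUDE if either measure exceeds its threshold, otherwise FIX."""
    if mu_star > mu_star_threshold or sigma > sigma_threshold:
        return "INCLUDE"
    return "FIX"
```

A parameter whose effects are +0.04 on one trajectory and -0.04 on another has mu = 0 but mu* = 0.04, which is why mu* rather than mu is the importance measure.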

Supports multiple scenarios:

baseline: Standard BAM model (Section 3.9.1)
growth_plus: Endogenous productivity growth via R&D (Section 3.9.2)
buffer_stock: Buffer-stock consumption with R&D (Section 3.9.4)

class calibration.morris.MorrisParameterEffect(name, mu, mu_star, sigma, elementary_effects, value_scores=<factory>)[source]#

Morris method results for a single parameter.

Variables:

name (str) – Parameter name.
mu (float) – Signed mean elementary effect (can cancel out).
mu_star (float) – Mean absolute elementary effect (primary importance measure).
sigma (float) – Standard deviation of elementary effects (interaction indicator).
elementary_effects (list[float]) – Raw elementary effects from each trajectory.
value_scores (dict[Any, list[float]]) – Observed scores for each grid value across trajectories. Used for best_value estimation and grid pruning.

name#

mu#

mu_star#

sigma#

elementary_effects#

value_scores#

class calibration.morris.MorrisResult(effects, n_trajectories, n_evaluations, scenario='baseline', avg_time_per_run=0.0, n_seeds=1)[source]#

Full Morris method screening result.

Variables:

effects (list[MorrisParameterEffect]) – Per-parameter results.
n_trajectories (int) – Number of Morris trajectories used.
n_evaluations (int) – Number of unique configs evaluated.
scenario (str) – The scenario that was analyzed.
avg_time_per_run (float) – Average wall-clock time per simulation run (seconds).
n_seeds (int) – Number of seeds used per evaluation.

effects#

n_trajectories#

n_evaluations#

scenario = 'baseline'#

avg_time_per_run = 0.0#

n_seeds = 1#

property ranked#

Effects ranked by mu_star (highest first).

get_important(mu_star_threshold=0.02, sigma_threshold=0.02)[source]#

Categorize parameters using dual threshold.

A parameter is INCLUDEd if it is either important (high mu*) OR interaction-prone (high sigma). It is FIXed only if both are low.

Parameters:

mu_star_threshold (float) – Minimum mu* for inclusion.
sigma_threshold (float) – Minimum sigma for inclusion (catches interaction-prone params).

Returns:

(included, fixed) parameter name lists.

Return type:

tuple[list[str], list[str]]

to_sensitivity_result()[source]#

Convert to SensitivityResult for downstream compatibility.

Maps mu* to sensitivity, reconstructs per-value scores from trajectory observations, enabling zero changes to build_focused_grid and all downstream calibration code.

Returns:

Compatible result that can be passed to build_focused_grid().

Return type:

SensitivityResult

calibration.morris.run_morris_screening(scenario='baseline', grid=None, n_trajectories=10, seed=0, n_seeds=1, n_periods=1000, n_workers=10, fixed_params=None)[source]#

Run Morris Method screening analysis.

Generates multiple OAT trajectories from random starting points, evaluates all unique configs in parallel, then computes per-parameter elementary effects (mu*, sigma) for importance and interaction classification.

Parameters:

scenario (str) – Scenario to calibrate.
grid (dict, optional) – Parameter grid. Defaults to scenario-specific grid.
n_trajectories (int) – Number of Morris trajectories (more = more reliable estimates).
seed (int) – Base random seed for trajectory generation and evaluation.
n_seeds (int) – Number of seeds per config evaluation.
n_periods (int) – Number of simulation periods.
n_workers (int) – Number of parallel workers.
fixed_params (dict, optional) – Parameters to lock at specific values. These params will be included in configs but not perturbed. Use for second-pass Morris screening after locking optimized params from a previous calibration.

Returns:

Morris screening result with per-parameter mu*, sigma, and value scores.

Return type:

MorrisResult

calibration.morris.print_morris_report(result, mu_star_threshold=0.02, sigma_threshold=0.02)[source]#

Print formatted Morris method screening report.

Parameters:

result (MorrisResult) – Result from run_morris_screening().
mu_star_threshold (float) – Threshold for mu* classification.
sigma_threshold (float) – Threshold for sigma classification.

OAT Sensitivity#

One-at-a-time sensitivity analysis and pairwise interaction testing.

One-At-a-Time (OAT) sensitivity analysis with pairwise interaction scanning.

This module provides sensitivity analysis functionality to identify which parameters have the most impact on validation scores.

Supports multiple scenarios:

baseline: Standard BAM model (Section 3.9.1)
growth_plus: Endogenous productivity growth via R&D (Section 3.9.2)
buffer_stock: Buffer-stock consumption with R&D (Section 3.9.4)

class calibration.sensitivity.ParameterSensitivity(name, values, scores, best_value, best_score, sensitivity, group_scores=<factory>)[source]#

Sensitivity result for a single parameter.

Variables:

name (str) – Parameter name.
values (list) – All values tested for this parameter.
scores (list[float]) – Validation scores for each value (averaged across seeds).
best_value (Any) – Value that produced the highest score.
best_score (float) – Highest score achieved.
sensitivity (float) – Score range (max - min), indicating parameter importance.
group_scores (dict[str, list[float]]) – Per-metric-group scores for each value. Keys are MetricGroup names (e.g., “TIME_SERIES”, “CURVES”), values are lists parallel to scores.
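
The derived fields follow directly from the parallel values and scores lists. A minimal sketch, using a hypothetical helper name and a plain dict instead of the dataclass:

```python
def summarize_parameter(values, scores):
    """Derive best_value, best_score and sensitivity from parallel lists."""
    # Index of the highest score; values and scores are parallel lists.
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return {
        "best_value": values[best_idx],
        "best_score": scores[best_idx],
        # Sensitivity is the score range: max - min.
        "sensitivity": max(scores) - min(scores),
    }
```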

name#

values#

scores#

best_value#

best_score#

sensitivity#

group_scores#

class calibration.sensitivity.SensitivityResult(parameters, baseline_score, scenario='baseline', avg_time_per_run=0.0, n_seeds=1)[source]#

Full sensitivity analysis result.

Variables:

parameters (list[ParameterSensitivity]) – Sensitivity results for all parameters.
baseline_score (float) – Score with all default values.
scenario (str) – The scenario that was analyzed.
avg_time_per_run (float) – Average wall-clock time per simulation run (seconds).
n_seeds (int) – Number of seeds used per evaluation.

parameters#

baseline_score#

scenario = 'baseline'#

avg_time_per_run = 0.0#

n_seeds = 1#

property ranked#

Parameters ranked by sensitivity (highest first).

get_important(sensitivity_threshold=0.02)[source]#

Categorize parameters by sensitivity.

Parameters:

sensitivity_threshold (float) – Minimum sensitivity (Δ) for inclusion in grid search.

Returns:

(included, fixed) parameter name lists.

Return type:

tuple[list[str], list[str]]

prune_grid(grid, pruning_threshold)[source]#

Remove poorly-scoring values from grid based on OAT results.

Parameters:

grid (dict) – Parameter grid to prune (values per parameter).
pruning_threshold (float or None) – Maximum score gap from best value. Values with (best_score - score) > pruning_threshold are dropped. None disables pruning (returns grid unchanged).

Returns:

Pruned grid. Always keeps at least the best value per parameter. Unknown parameters or values are kept (conservative).

Return type:

dict[str, list[Any]]
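
The pruning rule can be sketched on plain dicts. Here prune_grid_sketch is a hypothetical free function, and value_scores (param -> value -> score) stands in for the OAT results held by the SensitivityResult; a non-negative threshold is assumed.

```python
def prune_grid_sketch(grid, value_scores, pruning_threshold):
    """Drop values whose score gap from the best exceeds pruning_threshold."""
    if pruning_threshold is None:
        # Pruning disabled: return the grid unchanged (as a copy).
        return {name: list(values) for name, values in grid.items()}
    pruned = {}
    for name, values in grid.items():
        scores = value_scores.get(name)
        if not scores:
            pruned[name] = list(values)  # unknown parameter: keep everything
            continue
        best = max(scores.values())
        # Unknown values are kept; known values must be within the gap.
        pruned[name] = [
            v for v in values
            if v not in scores or best - scores[v] <= pruning_threshold
        ]
    return pruned
```

With a non-negative threshold the best-scoring value always has a gap of zero, so it survives as long as it is present in the grid.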

calibration.sensitivity.run_sensitivity_analysis(scenario='baseline', grid=None, baseline=None, seed=0, n_seeds=1, n_periods=1000, n_workers=10)[source]#

Run OAT sensitivity analysis.

Tests each parameter independently while holding others at baseline values. Supports multi-seed evaluation for more robust sensitivity measurement.

Parameters:

scenario (str) – Scenario to calibrate (“baseline”, “growth_plus”, or “buffer_stock”).
grid (dict, optional) – Parameter grid. Defaults to scenario-specific grid.
baseline (dict, optional) – Baseline parameter values. Defaults to scenario-specific defaults.
seed (int) – Base random seed (used as first seed).
n_seeds (int) – Number of seeds per evaluation. Seeds are [seed, seed+1, …, seed+n_seeds-1].
n_periods (int) – Number of simulation periods.
n_workers (int) – Number of parallel workers.

Returns:

Sensitivity ranking of all parameters.

Return type:

SensitivityResult

calibration.sensitivity.print_sensitivity_report(result, sensitivity_threshold=0.02)[source]#

Print formatted sensitivity analysis report with score decomposition.

Parameters:

result (SensitivityResult) – Result from run_sensitivity_analysis().
sensitivity_threshold (float) – Threshold for INCLUDE/FIX classification (informational preview).

class calibration.sensitivity.PairInteraction(param_a, param_b, value_a, value_b, individual_a_score, individual_b_score, combined_score, baseline_score, interaction_strength)[source]#

Interaction result for a pair of parameters.

param_a#

param_b#

value_a#

value_b#

individual_a_score#

individual_b_score#

combined_score#

baseline_score#

interaction_strength#

class calibration.sensitivity.PairwiseResult(interactions, scenario, baseline_score)[source]#

Full pairwise interaction analysis result.

interactions#

scenario#

baseline_score#

property ranked#

Interactions ranked by strength (highest first).

property synergies#

Positive interactions (combined > expected).

property conflicts#

Negative interactions (combined < expected).

calibration.sensitivity.run_pairwise_analysis(params, grid, best_values, scenario='baseline', seed=0, n_seeds=3, n_periods=1000, n_workers=10)[source]#

Run pairwise interaction analysis on included parameters.

For each pair of included params, tests all value combinations while fixing others at best values. Measures interaction strength.

Parameters:

params (list[str]) – List of included parameter names.
grid (dict) – Full parameter grid.
best_values (dict) – Best value for each parameter (from sensitivity analysis).
scenario (str) – Scenario name.
seed (int) – Base random seed.
n_seeds (int) – Seeds per evaluation.
n_periods (int) – Simulation periods.
n_workers (int) – Parallel workers.

Returns:

Pairwise interaction results.

Return type:

PairwiseResult

calibration.sensitivity.print_pairwise_report(result, top_n=20)[source]#

Print formatted pairwise interaction report.

Grid Building#

Grid construction, YAML loading, validation, and combination generation.

Grid building, loading, validation, and combination generation.

This module handles parameter grid operations:

Building focused grids from sensitivity analysis results
Loading grids from YAML/JSON files
Validating grid structure
Generating and counting parameter combinations

calibration.grid.build_focused_grid(sensitivity, full_grid=None, scenario='baseline', sensitivity_threshold=0.02, pruning_threshold=0.04)[source]#

Build focused grid from sensitivity analysis.

Parameters:

sensitivity (SensitivityResult) – Result from run_sensitivity_analysis().
full_grid (dict, optional) – Full parameter grid. Defaults to scenario-specific grid.
scenario (str) – Scenario name.
sensitivity_threshold (float) – Minimum sensitivity (delta) for inclusion in grid search.
pruning_threshold (float or None) – Maximum score gap from best value for keeping a grid value. None disables pruning.

Returns:

(grid_to_search, fixed_params), where:

INCLUDE params (delta > threshold): all grid values (pruned if enabled)
FIX params (delta <= threshold): fixed at best value

Return type:

tuple[dict, dict]

calibration.grid.load_grid(path)[source]#

Load parameter grid from YAML/JSON file.

Light validation: check dict-of-lists structure, warn about empty values. Supports both .yaml/.yml and .json extensions.

Parameters:

path (Path) – Path to grid file.

Returns:

Parameter grid (param_name -> list of values).

Return type:

dict[str, list[Any]]

Raises:

ValueError – If the file contents are not a dict-of-lists structure.
FileNotFoundError – If the file does not exist.

calibration.grid.validate_grid(grid)[source]#

Light validation of grid structure.

Parameters:

grid (dict[str, list[Any]]) – Parameter grid to validate.

Returns:

List of warnings (empty = OK).

Return type:

list[str]

calibration.grid.count_combinations(grid)[source]#

Count total combinations in grid.

Parameters:

grid (dict[str, list[Any]]) – Parameter grid.

Returns:

Number of combinations in the grid.

Return type:

int

calibration.grid.generate_combinations(grid, fixed=None, constraints=None)[source]#

Generate all parameter combinations, merged with fixed params.

Parameters:

grid (dict[str, list[Any]]) – Parameter grid to generate combinations from.
fixed (dict, optional) – Fixed parameter values to merge into each combination.
constraints (list[callable], optional) – List of callables that take a combo dict and return bool. A combination is yielded only if ALL constraints return True. Useful for coupled params (e.g., lambda c: c['nfpf'] >= c['nfsf']).

Yields:

dict[str, Any] – Dictionary mapping parameter names to values.
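
A minimal sketch of counting and generating combinations with itertools.product is shown below. The function names are hypothetical, and letting grid values override same-named fixed values during the merge is an assumption rather than documented behavior.

```python
from itertools import product
from math import prod


def count_combinations_sketch(grid):
    """Number of combinations is the product of the per-parameter list lengths."""
    return prod(len(values) for values in grid.values())


def generate_combinations_sketch(grid, fixed=None, constraints=None):
    """Yield every grid combination merged with fixed params, filtered by constraints."""
    fixed = fixed or {}
    constraints = constraints or []
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        # Assumption: grid values take precedence over fixed values on a name clash.
        combo = {**fixed, **dict(zip(names, values))}
        # A combination is yielded only if ALL constraints return True.
        if all(check(combo) for check in constraints):
            yield combo
```

Because constraints filter after generation, count_combinations_sketch gives an upper bound on the number of configs actually yielded.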

Screening#

Single-seed grid screening with checkpointing.

Single-seed grid screening with progress tracking and checkpointing.

This module handles the grid screening phase of calibration: testing many parameter combinations quickly using a single seed, with progress reporting and checkpoint-based resumption.

calibration.screening.screen_single_seed(params, scenario, seed, n_periods)[source]#

Run single-seed validation for quick screening.

Parameters:

params (dict) – Parameter configuration.
scenario (str) – Scenario name.
seed (int) – Random seed.
n_periods (int) – Number of simulation periods.

Returns:

Result with single-seed metrics and elapsed wall-clock seconds.

Return type:

tuple[CalibrationResult, float]

calibration.screening.save_checkpoint(results, scenario, phase='screening')[source]#

Save intermediate results to a checkpoint file.

Parameters:

results (list[CalibrationResult]) – Results to checkpoint.
scenario (str) – Scenario name.
phase (str) – Phase name for filename.

Returns:

Path to checkpoint file.

Return type:

Path

calibration.screening.load_checkpoint(scenario, phase='screening')[source]#

Load checkpoint if it exists.

Parameters:

scenario (str) – Scenario name.
phase (str) – Phase name.

Returns:

Previously checkpointed results, or None if no checkpoint.

Return type:

list[CalibrationResult] or None

calibration.screening.delete_checkpoint(scenario, phase='screening')[source]#

Delete checkpoint file if it exists.

calibration.screening.run_screening(combinations, scenario, n_workers=10, n_periods=1000, avg_time_per_run=0.0, checkpoint_every=50, resume=False)[source]#

Screen parameter combinations with progress tracking and checkpointing.

Parameters:

combinations (list[dict]) – Parameter combinations to test.
scenario (str) – Scenario name.
n_workers (int) – Parallel workers.
n_periods (int) – Simulation periods.
avg_time_per_run (float) – Estimated time per run (from sensitivity). 0 = measure during warmup.
checkpoint_every (int) – Save checkpoint every N completions.
resume (bool) – If True, load checkpoint and skip already-evaluated configs.

Returns:

Results sorted by single_score (best first).

Return type:

list[CalibrationResult]

Stability Testing#

Tiered stability testing with configurable ranking.

Multi-seed stability testing with tiered evaluation and ranking strategies.

This module handles the stability testing phase of calibration: evaluating top candidates from screening across multiple seeds with configurable ranking strategies and tiered pruning.

calibration.stability.evaluate_stability(params, scenario, seeds, n_periods)[source]#

Run multi-seed stability test for full evaluation.

Parameters:

params (dict) – Parameter configuration.
scenario (str) – Scenario name.
seeds (list[int]) – List of random seeds to test.
n_periods (int) – Number of simulation periods.

Returns:

Result with stability metrics and combined score.

Return type:

CalibrationResult

calibration.stability.parse_stability_tiers(tiers_str)[source]#

Parse stability tiers from CLI string.

Parameters:

tiers_str (str) – Format: “100:10,50:20,10:100”, meaning top 100 x 10 seeds, top 50 x 20 seeds, top 10 x 100 seeds.

Returns:

List of (n_configs, total_seeds) tuples.

Return type:

list[tuple[int, int]]
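
The tier string format can be parsed along these lines. The name parse_tiers_sketch is hypothetical, and the sketch does no error handling beyond what int() and tuple unpacking raise on malformed input.

```python
def parse_tiers_sketch(tiers_str):
    """Parse 'n_configs:total_seeds' pairs separated by commas."""
    tiers = []
    for chunk in tiers_str.split(","):
        n_configs, total_seeds = chunk.strip().split(":")
        tiers.append((int(n_configs), int(total_seeds)))
    return tiers
```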

calibration.stability.run_tiered_stability(candidates, scenario, tiers, n_workers=10, n_periods=1000, avg_time_per_run=0.0, rank_by='combined', k_factor=1.0)[source]#

Run incremental tiered stability testing.

Each tier runs only NEW seeds (not previously tested ones) and accumulates all seed scores for ranking.

Parameters:

candidates (list[CalibrationResult]) – Screening results to stability-test.
scenario (str) – Scenario name.
tiers (list[tuple[int, int]]) – List of (n_configs, total_seeds) – each tier tests the top n_configs using enough new seeds to reach total_seeds cumulative.
n_workers (int) – Parallel workers.
n_periods (int) – Simulation periods.
avg_time_per_run (float) – Estimated time per run for ETA.
rank_by (str) – Ranking strategy: “combined” (mean*(1-k*std)), “stability” (pass_rate/n_fail priority), or “mean” (mean_score only).
k_factor (float) – Weight k on std in the mean*(1-k*std) formula (for “combined” ranking).

Returns:

Final results sorted by ranking strategy (best first).

Return type:

list[CalibrationResult]
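
The incremental scheme means each tier only pays for the seeds it has not seen yet. The sketch below plans which seeds each tier would add; the name plan_new_seeds and the numbering of seeds from 0 upward are assumptions made for illustration.

```python
def plan_new_seeds(tiers):
    """For each (n_configs, total_seeds) tier, list only the seeds not yet run."""
    plan = []
    seeds_done = 0
    for n_configs, total_seeds in tiers:
        # Only seeds beyond those already accumulated are new for this tier.
        new_seeds = list(range(seeds_done, total_seeds))
        plan.append((n_configs, new_seeds))
        seeds_done = max(seeds_done, total_seeds)
    return plan


def total_runs(tiers):
    """Total simulation runs implied by the plan."""
    return sum(n_configs * len(seeds) for n_configs, seeds in plan_new_seeds(tiers))
```

With the default tiers [(100, 10), (50, 20), (10, 100)], the tiers add 10, 10, and 80 new seeds respectively, for 100*10 + 50*10 + 10*80 = 2300 runs in total.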

calibration.stability.run_focused_calibration(grid, fixed_params, scenario='baseline', n_workers=10, n_periods=1000, stability_tiers=None, avg_time_per_run=0.0, resume=False, rank_by='combined', k_factor=1.0)[source]#

Run calibration on focused grid with fixed params.

Parameters:

grid (dict) – Parameter grid to search (from build_focused_grid).
fixed_params (dict) – Fixed parameter values (from build_focused_grid).
scenario (str) – Scenario name.
n_workers (int) – Number of parallel workers.
n_periods (int) – Number of simulation periods.
stability_tiers (list[tuple[int, int]], optional) – Tiered stability config. Defaults to [(100, 10), (50, 20), (10, 100)].
avg_time_per_run (float) – Average time per simulation run (from sensitivity).
resume (bool) – If True, resume from checkpoint.
rank_by (str) – Ranking strategy for stability testing.
k_factor (float) – Weight k on std in the mean*(1-k*std) formula (for “combined” ranking).

Returns:

Results sorted by ranking strategy (best first).

Return type:

list[CalibrationResult]

Serialization#

Save/load for all result types, timestamped output directories.

Central serialization for calibration results.

All save/load operations use a consistent JSON schema with version tracking. Timestamped output directories keep results organized across runs.

calibration.io.create_run_dir(scenario, output_dir=None)[source]#

Create timestamped output directory.

Parameters:

scenario (str) – Scenario name (included in directory name).
output_dir (Path, optional) – Parent directory. Defaults to calibration/output/.

Returns:

Path to the created directory.

Return type:

Path

calibration.io.save_sensitivity(result, path)[source]#

Save sensitivity result to JSON.

calibration.io.load_sensitivity(path)[source]#

Load sensitivity result from JSON.

calibration.io.save_morris(result, path)[source]#

Save Morris result to JSON.

calibration.io.load_morris(path)[source]#

Load Morris result from JSON.

calibration.io.save_screening(results, sensitivity, grid, fixed, patterns, scenario, path)[source]#

Save screening results to JSON.

calibration.io.load_screening(path)[source]#

Load screening results from JSON. Returns (results, avg_time_per_run).

calibration.io.save_stability(results, scenario, path)[source]#

Save stability testing results to JSON.

calibration.io.load_stability(path)[source]#

Load stability results from JSON.

calibration.io.save_pairwise(result, scenario, path)[source]#

Save pairwise interaction results to JSON.

calibration.io.load_pairwise(path)[source]#

Load pairwise results from JSON.

Reporting#

Auto-generated markdown reports.

Auto-generated markdown reports for calibration results.

Each phase of the calibration pipeline generates a markdown report alongside its JSON results in the timestamped output directory.

calibration.reporting.generate_sensitivity_report(result, method, path)[source]#

Generate markdown report for sensitivity phase.

Parameters:

result (SensitivityResult) – Sensitivity analysis result.
method (str) – Sensitivity method used (“morris” or “oat”).
path (Path) – Output path for the markdown report.

calibration.reporting.generate_screening_report(results, grid, fixed, patterns, sensitivity, scenario, path, top_n=50)[source]#

Generate markdown report for grid screening phase.

Parameters:

results (list[CalibrationResult]) – Screening results (sorted by score).
grid (dict) – Grid parameters searched.
fixed (dict) – Fixed parameter values.
patterns (dict) – Parameter patterns from top configs.
sensitivity (SensitivityResult) – Sensitivity result used for grid building.
scenario (str) – Scenario name.
path (Path) – Output path for the markdown report.
top_n (int) – Number of top configs used for pattern analysis.

calibration.reporting.generate_stability_report(results, scenario, tiers, comparison, path)[source]#

Generate markdown report for stability phase.

Parameters:

results (list[CalibrationResult]) – Stability test results (sorted by ranking).
scenario (str) – Scenario name.
tiers (list[tuple[int, int]]) – Stability tiers used.
comparison (ComparisonResult or None) – Before/after comparison (if available).
path (Path) – Output path for the markdown report.

calibration.reporting.generate_full_report(sensitivity, screening_results, stability_results, comparison, scenario, tiers, path)[source]#

Generate comprehensive calibration report combining all phases.

Parameters:

sensitivity (SensitivityResult) – Sensitivity analysis result.
screening_results (list[CalibrationResult]) – Grid screening results.
stability_results (list[CalibrationResult]) – Stability testing results.
comparison (ComparisonResult or None) – Before/after comparison.
scenario (str) – Scenario name.
tiers (list[tuple[int, int]]) – Stability tiers used.
path (Path) – Output path for the markdown report.

Rescreen#

Second-pass Morris screening after locking optimized params.

Delegates to run_morris_screening(fixed_params=...) and computes the sensitivity collapse between Phase 1 and Phase 2 Morris results.

calibration.rescreen.resolve_params(params_str)[source]#

Resolve a param group name or comma-separated param names.

Parameters:

params_str (str) – Either a PARAM_GROUPS key (e.g., “entry”, “behavioral”) or comma-separated full parameter names (e.g., “beta,max_M”).

Returns:

List of parameter names.

Return type:

list[str]

Raises:

ValueError – If the string is not a known group and doesn’t look like param names.

calibration.rescreen.load_fixed_from_result(path)[source]#

Load the #1-ranked result’s params from a stability result file.

Parameters:

path (Path) – Path to stability result JSON.

Returns:

Parameter dict from the top-ranked result.

Return type:

dict[str, Any]

calibration.rescreen.compute_sensitivity_collapse(phase1, phase2)[source]#

Compute sensitivity collapse between two Morris screenings.

Parameters:

phase1 (MorrisResult) – First-pass Morris result (before locking params).
phase2 (MorrisResult) – Second-pass Morris result (after locking params).

Returns:

Per-parameter dict with phase1_mu_star, phase2_mu_star, collapse_pct.

Return type:

dict[str, dict]

calibration.rescreen.run_rescreen(scenario, fix_from, params, n_trajectories=20, n_seeds=5, n_periods=1000, n_workers=10, phase1_morris=None)[source]#

Run second-pass Morris screening on a subset of params.

Parameters:

scenario (str) – Scenario name.
fix_from (Path) – Path to stability result JSON to load fixed params from.
params (list[str]) – Parameter names to screen (the rest are fixed).
n_trajectories (int) – Number of Morris trajectories.
n_seeds (int) – Seeds per evaluation.
n_periods (int) – Simulation periods.
n_workers (int) – Parallel workers.
phase1_morris (MorrisResult, optional) – Phase 1 Morris result for collapse comparison.

Returns:

(phase2_result, collapse_table)

Return type:

tuple[MorrisResult, dict]

calibration.rescreen.run_rescreen_phase(args, run_dir=None)[source]#

CLI entry point for rescreen phase.

Cost Analysis#

Targeted cost analysis for parameter value substitutions.

Targeted cost analysis – measure the cost of swapping values into a base config.

Evaluates the impact of substituting preferred parameter values into an optimized base configuration. Classifies each swap by cost: FREE (<0.002), CHEAP (<0.005), MODERATE (<0.010), EXPENSIVE (>=0.010).

class calibration.cost.SwapResult(param, value, base_combined, swap_combined, delta, classification, pass_rate)[source]#

Result of swapping a single parameter value into the base config.

Variables:

param (str) – Parameter name.
value (Any) – Swapped value.
base_combined (float) – Base config’s combined score.
swap_combined (float) – Combined score with this value swapped in.
delta (float) – Score change (swap - base). Negative = worse.
classification (str) – Cost classification: FREE, CHEAP, MODERATE, or EXPENSIVE.
pass_rate (float) – Pass rate with swapped value.

param#

value#

base_combined#

swap_combined#

delta#

classification#

pass_rate#

calibration.cost.classify_cost(delta_abs)[source]#

Classify the absolute cost of a swap.

Parameters:

delta_abs (float) – Absolute combined score difference.

Returns:

“FREE”, “CHEAP”, “MODERATE”, or “EXPENSIVE”.

Return type:

str
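
Using the thresholds stated in the module description (FREE < 0.002, CHEAP < 0.005, MODERATE < 0.010, EXPENSIVE >= 0.010), the classification can be sketched as below. The function name classify_cost_sketch is hypothetical.

```python
def classify_cost_sketch(delta_abs: float) -> str:
    """Map an absolute combined-score difference to a cost label."""
    if delta_abs < 0.002:
        return "FREE"
    if delta_abs < 0.005:
        return "CHEAP"
    if delta_abs < 0.010:
        return "MODERATE"
    return "EXPENSIVE"
```

The input is the absolute difference, so a swap that improves the score by 0.001 and one that worsens it by 0.001 both classify as FREE.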

calibration.cost.parse_swaps(swap_args)[source]#

Parse swap arguments from CLI.

Parameters:

swap_args (list[str]) – List of “param=v1,v2,v3” strings.

Returns:

Parameter -> list of values to try.

Return type:

dict[str, list]
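
A possible parsing approach for the “param=v1,v2,v3” format is sketched below. The name parse_swaps_sketch is hypothetical, and converting tokens with ast.literal_eval (falling back to the raw string) is an assumption about value typing, not the library's documented behavior.

```python
import ast


def parse_swaps_sketch(swap_args):
    """Turn ['param=v1,v2', ...] into {param: [v1, v2], ...}."""
    swaps = {}
    for arg in swap_args:
        name, _, raw_values = arg.partition("=")
        values = []
        for token in raw_values.split(","):
            token = token.strip()
            try:
                # Assumption: numeric and boolean literals are converted to Python values.
                values.append(ast.literal_eval(token))
            except (ValueError, SyntaxError):
                values.append(token)  # keep non-literal tokens as strings
        swaps[name.strip()] = values
    return swaps
```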

calibration.cost.run_cost_analysis(base_params, swaps, scenario, n_seeds=20, n_periods=1000, n_workers=10, base_combined=None)[source]#

Run targeted cost analysis for parameter swaps.

Parameters:

base_params (dict) – Base configuration (the stability winner).
swaps (dict[str, list]) – Parameters to swap and their candidate values.
scenario (str) – Scenario name.
n_seeds (int) – Seeds per evaluation.
n_periods (int) – Simulation periods.
n_workers (int) – Parallel workers.
base_combined (float, optional) – Pre-computed base combined score. If None, evaluates the base.

Returns:

Results for each swap, sorted by absolute delta.

Return type:

list[SwapResult]
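The bookkeeping behind each result (delta as swap minus base, classification from the absolute delta, and the final sort) can be sketched without running any simulation. Everything below is a stand-in: `SwapRecord` only mirrors the documented SwapResult fields, the scores are made-up numbers, and ascending order (cheapest swap first) is an assumption, since the reference only says “sorted by absolute delta”.

```python
from dataclasses import dataclass
from typing import Any

# Thresholds from the module description, checked in ascending order.
THRESHOLDS = [(0.002, "FREE"), (0.005, "CHEAP"), (0.010, "MODERATE")]


@dataclass
class SwapRecord:
    """Hypothetical stand-in mirroring the documented SwapResult fields."""
    param: str
    value: Any
    base_combined: float
    swap_combined: float
    delta: float
    classification: str
    pass_rate: float


def make_record(param, value, base_combined, swap_combined, pass_rate):
    delta = swap_combined - base_combined  # negative means the swap is worse
    label = next(
        (name for limit, name in THRESHOLDS if abs(delta) < limit), "EXPENSIVE"
    )
    return SwapRecord(
        param, value, base_combined, swap_combined, delta, label, pass_rate
    )


base = 0.800  # made-up base combined score
records = [
    make_record("param_a", 0.5, base, 0.785, 0.90),
    make_record("param_b", 10, base, 0.799, 1.00),
    make_record("param_a", 0.3, base, 0.797, 0.95),
]
records.sort(key=lambda r: abs(r.delta))  # assumed: smallest cost first

for r in records:
    print(f"{r.param}={r.value}: delta={r.delta:+.3f} {r.classification}")
```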

calibration.cost.save_cost_results(results, scenario, path)[source]#

Save cost analysis results to JSON.

calibration.cost.run_cost_phase(args, run_dir=None)[source]#

CLI entry point for cost phase.

Cross-Scenario Evaluation#

Cross-scenario evaluation with multiple ranking strategies.

Cross-scenario evaluation – run configs across multiple scenarios.

Evaluates parameter configurations on all specified scenarios simultaneously and ranks using cross-scenario criteria.

calibration.cross_eval.rank_cross_scenario(results, strategy='stability-first')[source]#

Rank configs using cross-scenario criteria.

Parameters:

results (list[CalibrationResult]) – Results with scenario_results populated.
strategy (str) – Ranking strategy:
- “stability-first”: min(pass_rates) -> total fails -> min(combined)
- “score-first”: min(combined) -> total fails
- “balanced”: geometric mean of combined scores

Returns:

Sorted results (best first).

Return type:

list[CalibrationResult]

Raises:

ValueError – If strategy is not recognized.
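One way to read the three strategies is as sort keys over per-scenario statistics. The sketch below works on plain dicts rather than CalibrationResult objects. The sort directions (higher pass rate and combined score are better, fewer fails are better) are assumptions, and the config names and numbers are invented.

```python
import math


def rank_sketch(results, strategy="stability-first"):
    """Rank configs best-first (hypothetical helper).

    Each result is {"name": str, "scenarios": {scenario: stats}}, where stats
    has the keys "pass_rate", "n_fail", and "combined".
    """
    def key(result):
        stats = list(result["scenarios"].values())
        min_pass = min(s["pass_rate"] for s in stats)
        total_fails = sum(s["n_fail"] for s in stats)
        combined = [s["combined"] for s in stats]
        if strategy == "stability-first":
            return (-min_pass, total_fails, -min(combined))
        if strategy == "score-first":
            return (-min(combined), total_fails)
        if strategy == "balanced":
            # Geometric mean; assumes all combined scores are positive.
            return (-(math.prod(combined) ** (1.0 / len(combined))),)
        raise ValueError(f"Unknown strategy: {strategy!r}")

    return sorted(results, key=key)


configs = [
    {"name": "A", "scenarios": {
        "baseline": {"pass_rate": 1.0, "n_fail": 0, "combined": 0.80},
        "growth_plus": {"pass_rate": 0.9, "n_fail": 2, "combined": 0.85}}},
    {"name": "B", "scenarios": {
        "baseline": {"pass_rate": 1.0, "n_fail": 0, "combined": 0.70},
        "growth_plus": {"pass_rate": 1.0, "n_fail": 0, "combined": 0.75}}},
]

for strategy in ("stability-first", "score-first", "balanced"):
    print(strategy, [c["name"] for c in rank_sketch(configs, strategy)])
```

With these numbers, config B wins under “stability-first” because its worst pass rate is higher, while A wins under the two score-driven strategies.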

calibration.cross_eval.evaluate_cross_scenario(configs, scenarios, n_seeds=100, n_periods=1000, n_workers=10)[source]#

Evaluate configs across multiple scenarios.

Parameters:

configs (list[dict]) – Parameter configurations to evaluate.
scenarios (list[str]) – Scenario names to evaluate on.
n_seeds (int) – Seeds per scenario per config.
n_periods (int) – Simulation periods.
n_workers (int) – Parallel workers.

Returns:

Results with scenario_results populated.

Return type:

list[CalibrationResult]

calibration.cross_eval.compute_scenario_tension(results, scenarios)[source]#

Analyze parameter tensions between scenarios.

Identifies parameters whose optimal value differs between scenarios, indicating a fundamental trade-off.

Parameters:

results (list[CalibrationResult]) – Results with scenario_results populated.
scenarios (list[str]) – Scenario names to compare.

Returns:

Per-parameter tension info: which value each scenario prefers, and the score gap.

Return type:

dict[str, dict]
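The reference does not spell out how the preferred value or the score gap is computed, so the following is only one plausible reading. It assumes that a scenario's preferred value is the one with the highest mean combined score in that scenario, and that the gap is the largest loss any scenario suffers when it uses another scenario's preferred value. The input shape, the function name, and the numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def tension_sketch(results, scenarios):
    """results: list of (params, {scenario: combined_score}) pairs."""
    names = sorted({name for params, _ in results for name in params})
    tension = {}
    for name in names:
        # scores[scenario][value] -> combined scores observed for that value
        scores = {s: defaultdict(list) for s in scenarios}
        for params, per_scenario in results:
            if name in params:
                for s in scenarios:
                    scores[s][params[name]].append(per_scenario[s])
        means = {s: {v: mean(xs) for v, xs in scores[s].items()} for s in scenarios}
        preferred = {s: max(means[s], key=means[s].get) for s in scenarios}
        if len(set(preferred.values())) > 1:  # scenarios disagree
            gap = max(
                means[s][preferred[s]] - means[s][preferred[other]]
                for s in scenarios
                for other in scenarios
            )
            tension[name] = {"preferred": preferred, "gap": gap}
    return tension


runs = [
    ({"h": 0.1, "k": 1}, {"baseline": 0.80, "growth_plus": 0.70}),
    ({"h": 0.2, "k": 1}, {"baseline": 0.75, "growth_plus": 0.78}),
    ({"h": 0.1, "k": 2}, {"baseline": 0.78, "growth_plus": 0.68}),
    ({"h": 0.2, "k": 2}, {"baseline": 0.73, "growth_plus": 0.76}),
]
print(tension_sketch(runs, ["baseline", "growth_plus"]))
```

In this toy data only `h` shows tension: baseline prefers 0.1 while growth_plus prefers 0.2, and both scenarios agree on `k`.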

calibration.cross_eval.run_cross_eval_phase(args, run_dir=None)[source]#

CLI entry point for cross-eval phase.

Structured Sweep#

Multi-stage parameter sweep with carry-forward winners.

Structured parameter sweep by category, carrying forward winners.

Each stage runs a grid of its parameters while holding everything else fixed from the base config (plus winners from prior stages). Optionally cross-evaluates against other scenarios at each stage.

calibration.sweep.parse_stage(stage_str)[source]#

Parse a single stage definition.

Format: “LABEL:param1=v1,v2,v3 param2=v4,v5”

Parameters:

stage_str (str) – Stage definition string.

Returns:

(label, param_grid)

Return type:

tuple[str, dict]

calibration.sweep.parse_stages(stage_args)[source]#

Parse multiple stage definitions.

Parameters:

stage_args (list[str]) – List of stage definition strings.

Returns:

List of (label, param_grid) tuples.

Return type:

list[tuple[str, dict]]
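As an illustration of the “LABEL:param1=v1,v2,v3 param2=v4,v5” format, the sketch below splits a stage string into a label and a grid. It is not the module's implementation: the function names are hypothetical, values are deliberately left as strings because the reference does not describe type conversion, and the stage labels and parameter names in the usage lines are invented.

```python
def parse_stage_sketch(stage_str):
    """Split 'LABEL:p1=a,b p2=c' into ('LABEL', {'p1': ['a', 'b'], 'p2': ['c']})."""
    label, sep, body = stage_str.partition(":")
    if not sep or not label.strip() or not body.strip():
        raise ValueError(f"expected 'LABEL:param=v1,v2 ...', got {stage_str!r}")
    grid = {}
    for token in body.split():  # parameters are separated by whitespace
        name, eq, values = token.partition("=")
        if not eq or not name or not values:
            raise ValueError(f"bad parameter spec {token!r} in {stage_str!r}")
        grid[name] = values.split(",")  # values stay as strings in this sketch
    return label.strip(), grid


def parse_stages_sketch(stage_args):
    """Parse each stage string in order, preserving stage order."""
    return [parse_stage_sketch(s) for s in stage_args]


print(parse_stages_sketch(["CREDIT:rate=0.01,0.02 mode=a,b", "LABOR:wage=1,2,3"]))
```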

class calibration.sweep.StageResult(label, winner_params, combined_score, mean_score, pass_rate, n_candidates)[source]#

Result of a single sweep stage.

Variables:

label (str) – Stage label.
winner_params (dict[str, Any]) – Winner’s parameters.
combined_score (float) – Winner’s combined score.
mean_score (float) – Winner’s mean score.
pass_rate (float) – Winner’s pass rate.
n_candidates (int) – Number of grid combinations tested.

label#

winner_params#

combined_score#

mean_score#

pass_rate#

n_candidates#

calibration.sweep.run_sweep(base_params, stages, scenario, n_workers=10, n_periods=1000, stability_tiers=None, rank_by='combined', k_factor=1.0, cross_scenario=None)[source]#

Run structured multi-stage parameter sweep.

Parameters:

base_params (dict) – Starting configuration.
stages (list[tuple[str, dict]]) – List of (label, param_grid) stages to run in order.
scenario (str) – Scenario name.
n_workers (int) – Parallel workers.
n_periods (int) – Simulation periods.
stability_tiers (list[tuple[int, int]], optional) – Tiers for stability testing. Defaults to [(100, 10), (50, 20), (10, 100)].
rank_by (str) – Ranking strategy for stability.
k_factor (float) – Value of k in the combined score formula.
cross_scenario (str, optional) – If set, cross-evaluate the stage winner against this scenario.

Returns:

Per-stage results with winner params and scores.

Return type:

list[StageResult]
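The carry-forward idea (each stage searches its own grid while earlier winners stay fixed) can be sketched with a toy scoring function in place of the simulation and stability testing. This is a simplified illustration under stated assumptions: `sweep_sketch` and `toy_score` are hypothetical, the winner is simply the highest-scoring grid point, and stability tiers, ranking options, and cross-scenario checks are left out.

```python
from itertools import product


def sweep_sketch(base_params, stages, score_fn):
    """Run stages in order, folding each stage's winner into the config."""
    current = dict(base_params)
    history = []
    for label, grid in stages:
        names = list(grid)
        # Full Cartesian grid over this stage's parameters only.
        candidates = [dict(zip(names, combo)) for combo in product(*grid.values())]
        scored = [(score_fn({**current, **cand}), cand) for cand in candidates]
        best_score, winner = max(scored, key=lambda pair: pair[0])
        current.update(winner)  # carry the winner forward to later stages
        history.append({
            "label": label,
            "winner_params": winner,
            "combined_score": best_score,
            "n_candidates": len(candidates),
        })
    return current, history


def toy_score(p):
    """Made-up objective that peaks at a=0.3, b=2."""
    return 1.0 - (p["a"] - 0.3) ** 2 - 0.01 * (p["b"] - 2) ** 2


final, history = sweep_sketch(
    {"a": 0.0, "b": 0},
    [("A", {"a": [0.1, 0.3, 0.5]}), ("B", {"b": [1, 2, 3]})],
    toy_score,
)
print(final)
for stage in history:
    print(stage)
```

Because stage B is evaluated with the stage A winner already in place, the order of stages can affect the final configuration when parameters interact.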

calibration.sweep.run_sweep_phase(args, run_dir=None)[source]#

CLI entry point for sweep phase.

Parameter Space#

Parameter grids for all three scenarios.

Parameter space definition for calibration.

This module defines the parameter grids and scenario-specific defaults used as baselines for sensitivity analysis.

Supports multiple scenarios:

baseline: Standard BAM model (Section 3.9.1)
growth_plus: Endogenous productivity growth via R&D (Section 3.9.2)
buffer_stock: Buffer-stock consumption with R&D (Section 3.9.4)

For grid combination generation, see calibration.grid.

calibration.parameter_space.get_parameter_grid(scenario='baseline')[source]#

Get the parameter grid for a scenario.

Parameters:

scenario (str) – Scenario name (“baseline”, “growth_plus”, or “buffer_stock”).

Returns:

Parameter grid for the scenario.

Return type:

dict

Raises:

ValueError – If scenario is not recognized.

calibration.parameter_space.get_default_values(scenario='baseline')[source]#

Get the scenario-specific parameter overrides.

Parameters:

scenario (str) – Scenario name (“baseline”, “growth_plus”, or “buffer_stock”).

Returns:

Scenario overrides. For baseline, returns an empty dict (the engine defaults apply). For extensions, returns extension-specific parameter defaults.

Return type:

dict

Raises:

ValueError – If scenario is not recognized.
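The lookup behaviour described for both functions (a per-scenario table, an empty override dict for baseline, and a ValueError for unknown names) might look roughly like the sketch below. All parameter names and values here are placeholders invented for illustration; they are not the module's real grids or defaults, and the function names are hypothetical.

```python
# Placeholder tables: names and values are invented, not the real grids.
_GRIDS = {
    "baseline": {"param_a": [0.1, 0.2, 0.3]},
    "growth_plus": {"param_a": [0.1, 0.2, 0.3], "param_rnd": [0.05, 0.10]},
    "buffer_stock": {"param_a": [0.1, 0.2, 0.3], "param_buffer": [1.0, 2.0]},
}
_OVERRIDES = {
    "baseline": {},  # empty: the engine defaults apply unchanged
    "growth_plus": {"param_rnd": 0.05},
    "buffer_stock": {"param_rnd": 0.05, "param_buffer": 1.0},
}


def _lookup(table, scenario):
    if scenario not in table:
        raise ValueError(
            f"Unknown scenario {scenario!r}; expected one of {sorted(table)}"
        )
    return dict(table[scenario])  # shallow copy so callers cannot mutate the table


def grid_sketch(scenario="baseline"):
    return _lookup(_GRIDS, scenario)


def defaults_sketch(scenario="baseline"):
    return _lookup(_OVERRIDES, scenario)


# Scenario overrides layered on top of (hypothetical) engine defaults.
engine_defaults = {"param_a": 0.2}
config = {**engine_defaults, **defaults_sketch("growth_plus")}
print(config)
```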