Validation & Analysis#
BAM Engine includes a validation framework for comparing simulation output against target values from Delli Gatti et al. (2011). This ensures the model reproduces the reference results and helps detect parameter configurations that deviate from expected behavior.
Running Validation#
The simplest way to validate is with run_validation():
from validation import run_validation
result = run_validation(seed=42, n_periods=1000)
print(f"Score: {result.total_score:.3f}")
print(f"Passed: {result.passed}")
print(f"Failures: {result.n_fail}")
The result object contains:
total_score: Weighted score from 0.0 (worst) to 1.0 (perfect)
passed: True if zero FAIL-status metrics
n_pass, n_warn, n_fail: Count of metrics by status
metric_results: Detailed per-metric breakdown
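One way to use these fields is as a pass/fail gate in an automated check. The sketch below is illustrative only: `FakeResult` is a hypothetical stand-in that mirrors just the attribute names listed above (the real object returned by `run_validation()` may differ), `summarize` and `exit_code` are made-up helpers rather than part of the package, and the numbers are invented for the demo.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in mirroring the documented attribute names."""

    total_score: float
    n_pass: int
    n_warn: int
    n_fail: int

    @property
    def passed(self) -> bool:
        # Documented rule: passed is True when no metric has FAIL status.
        return self.n_fail == 0


def summarize(result) -> str:
    """Build a one-line summary from the documented fields."""
    status = "PASSED" if result.passed else "FAILED"
    return (
        f"{status}: score={result.total_score:.3f} "
        f"(pass={result.n_pass}, warn={result.n_warn}, fail={result.n_fail})"
    )


def exit_code(result) -> int:
    """Return 0 for a passing run and 1 otherwise (shell convention)."""
    return 0 if result.passed else 1


# Invented values, used only to exercise the helpers.
demo = FakeResult(total_score=0.874, n_pass=18, n_warn=3, n_fail=0)
print(summarize(demo))
```

Because the helpers only read attributes, the same functions should work on any object exposing those names, assuming the real result follows the field list above.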
Validation Scenarios#
Three built-in scenarios correspond to sections of the reference book:
| Scenario | Book Section | What It Validates |
|---|---|---|
| Baseline | Section 3.9.1 | Core model: unemployment, inflation, firm dynamics, business cycles |
| Growth+ | Section 3.9.2 | R&D extension: productivity growth, firm size distribution |
| Buffer-stock | Section 3.9.4 | Buffer-stock extension: savings behavior, wealth distribution |
Run a specific scenario:
from validation import run_validation, run_growth_plus_validation
# Baseline (default)
baseline_result = run_validation(seed=42, n_periods=1000)
# Growth+ scenario
growth_result = run_growth_plus_validation(seed=42, n_periods=1000)
Each scenario has its own targets (defined in targets.yaml files) and
metric weights tuned to the phenomena that matter most for that model variant.
Understanding Scores#
Validation uses a two-layer system:
Status checks (categorical):
PASS: Metric is within acceptable range
WARN: Metric is borderline (outside target but within tolerance)
FAIL: Metric significantly deviates from target
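The three statuses can be pictured as nested bands around a target range. The following is a minimal sketch under stated assumptions, not the framework's actual rule: `classify`, its parameters, and the choice of an absolute `tolerance` added symmetrically on both sides of the range are all hypothetical.

```python
def classify(value: float, lo: float, hi: float, tolerance: float) -> str:
    """Map a metric value to PASS / WARN / FAIL.

    Assumed rule (illustrative only):
      PASS  value lies inside the target range [lo, hi]
      WARN  value lies outside the range but within `tolerance` of it
      FAIL  value lies further away than `tolerance`
    """
    if lo <= value <= hi:
        return "PASS"
    if lo - tolerance <= value <= hi + tolerance:
        return "WARN"
    return "FAIL"


# Invented target range and tolerance, for illustration only.
for v in (0.07, 0.11, 0.20):
    print(v, classify(v, lo=0.05, hi=0.10, tolerance=0.02))
```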
Scores (continuous, 0 to 1):
Each metric produces a score between 0.0 and 1.0. The total_score is a
weighted average across all metrics. Metric weights range from 0.5 (low
importance) to 5.0 (critical).
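As a worked example of the weighted average, the sketch below computes a total from three per-metric scores. The function name `weighted_total` and the sample scores and weights are invented for illustration; only the weighted-average rule itself comes from the description above.

```python
def weighted_total(scores: list[float], weights: list[float]) -> float:
    """Weighted average of per-metric scores, each expected in [0, 1]."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Invented example: one critical metric scores perfectly, while two
# low-weight metrics score poorly.
scores = [1.0, 0.5, 0.0]
weights = [5.0, 1.0, 0.5]
total = weighted_total(scores, weights)
print(f"{total:.3f}")  # (5.0 + 0.5 + 0.0) / 6.5
```

With these numbers the high-weight metric dominates: the total stays well above the unweighted mean of 0.5.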
Weight-based fail escalation: High-weight metrics have stricter WARN/FAIL thresholds. The escalation formula (\(\text{clamp}(5 - 2w, 0.5, 5.0)\)) means a weight-3.0 metric fails at deviations that would only warn for a weight-0.5 metric.
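To see what the clamp expression evaluates to, the sketch below tabulates it for a few weights. The helper names `clamp` and `escalation` are hypothetical, and this page does not spell out how the result is applied to the WARN/FAIL thresholds; reading a smaller value as a stricter threshold is an interpretation consistent with the sentence above.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def escalation(weight: float) -> float:
    """Evaluate clamp(5 - 2w, 0.5, 5.0) for a metric weight w."""
    return clamp(5.0 - 2.0 * weight, 0.5, 5.0)


for w in (0.5, 1.0, 2.0, 3.0, 5.0):
    print(f"weight={w:.1f} -> {escalation(w):.1f}")
# weight=0.5 -> 4.0
# weight=1.0 -> 3.0
# weight=2.0 -> 1.0
# weight=3.0 -> 0.5
# weight=5.0 -> 0.5
```

The value falls from 4.0 at weight 0.5 to the floor of 0.5 at weight 3.0 and stays there for heavier metrics, since 5 - 2w drops below 0.5 once w exceeds 2.25.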
Metric types:
| Type | How It Works |
|---|---|
| | Value must fall within [min, max] range |
| | Value must be within percentage of target |
| | Percentage of time series within a band |
| | Penalizes extreme values in distribution |
| | Binary pass/fail check (e.g., “economy did not collapse”) |
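For the "percentage of time series within a band" metric type, the underlying quantity can be computed as in the sketch below. The function name `pct_within_band`, the inclusive bounds, and the sample series are assumptions made for illustration; how the framework turns this percentage into a score is not covered here.

```python
def pct_within_band(series: list[float], lower: float, upper: float) -> float:
    """Percentage of observations x with lower <= x <= upper (inclusive)."""
    if not series:
        raise ValueError("series must not be empty")
    inside = sum(1 for x in series if lower <= x <= upper)
    return 100.0 * inside / len(series)


# Invented unemployment-rate series, for illustration only.
unemployment = [0.04, 0.06, 0.09, 0.12, 0.07]
pct = pct_within_band(unemployment, lower=0.05, upper=0.10)
print(f"{pct:.1f}% of periods fall inside the band")
```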
Robustness Analysis#
The robustness package tests whether results hold across multiple random seeds, parameter variations, and structural mechanism changes (Section 3.10):
# Full robustness analysis
python -m validation.robustness
# Individual parts
python -m validation.robustness --internal-only
python -m validation.robustness --sensitivity-only
python -m validation.robustness --structural-only
See also
Robustness Analysis for complete robustness analysis documentation including internal validity, sensitivity analysis, and structural experiments.
Parameter Calibration#
The calibration package finds parameters that maximize validation scores through a multi-phase pipeline: Morris screening → grid search → stability testing.
python -m calibration --scenario baseline --workers 10
See also
Parameter Calibration for the calibration tutorial in the user guide
Calibration for the full calibration reference
Visualization#
Scenario plots. Run a scenario with visualization:
from validation.scenarios.baseline import run_scenario
run_scenario(seed=0, show_plot=True)
Diagnostic dashboards. Comprehensive multi-figure analysis:
python diagnostics/baseline_diagnostics.py
python diagnostics/growth_plus_diagnostics.py
See also
Validation for the full validation reference
Scoring System for the scoring system details
Parameter Calibration for the calibration tutorial
Model Extensions for setting up model extensions before validation
Configuration for parameter definitions