Data Collection

sim.run() returns a SimulationResults object by default, containing time series data collected during the simulation. The collect parameter controls what data is captured.

Quick Example

import bamengine as bam

sim = bam.Simulation.init(seed=42)
results = sim.run(n_periods=100)

# Access data via bracket syntax
unemployment = results["Economy.unemployment_rate"]
inflation = results["Economy.inflation"]

# Or via attribute-style access
prices = results.Producer.price  # shape: (n_periods, n_firms)

# Export to pandas DataFrame (requires pandas)
df = results.to_dataframe()

Collection Options

The collect parameter accepts three forms:

Boolean (simplest):

# Collect all roles unaggregated + economy metrics (the default)
results = sim.run(n_periods=100)

# Skip collection for benchmarks or when only final state is needed
sim.run(n_periods=100, collect=False)

List (select roles):

# Collect specific roles with all their variables
# Economy metrics are always included automatically
results = sim.run(
    n_periods=100,
    collect=["Producer", "Worker"],
)

Dict (full control):

# Specify exactly what to collect (full per-agent data by default)
# Economy metrics are always included automatically
results = sim.run(
    n_periods=100,
    collect={
        "Producer": ["price", "inventory"],  # Specific variables
        "Worker": True,  # All Worker variables
        "aggregate": "mean",  # Explicit aggregation (default: None)
    },
)

Collection Settings

In dict form, the following keys are recognized:

  • Role names (e.g., "Producer", "Worker"): Values are either True (all variables) or a list of variable names.

  • "aggregate": How to aggregate across agents. Options: None (default, full per-agent data), "mean", "median", "sum", or "std".

Economy metrics (avg_price, unemployment_rate, inflation) are always collected regardless of the collect form used.
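
One way to read the three forms is as shorthand for a single dictionary-shaped spec. The sketch below is not bamengine's implementation: normalize_collect and its known_roles argument are hypothetical names, introduced only to illustrate how a boolean, a list, and a dict could map onto the same structure.

```python
from typing import Optional, Union

CollectArg = Union[bool, list, dict]


def normalize_collect(collect: CollectArg, known_roles: list) -> Optional[dict]:
    """Hypothetical helper: map any `collect` form to one dict spec.

    Returns None when collection is disabled. This only illustrates the
    three forms described above; it is not bamengine's internal code.
    """
    if collect is False:
        return None  # collect=False: nothing is recorded
    if collect is True:
        # collect=True: every known role, all variables, no aggregation
        return {"targets": {role: True for role in known_roles}, "aggregate": None}
    if isinstance(collect, list):
        # List form: the named roles with all of their variables
        return {"targets": {name: True for name in collect}, "aggregate": None}
    # Dict form: separate the reserved "aggregate" key from the targets
    targets = {key: value for key, value in collect.items() if key != "aggregate"}
    return {"targets": targets, "aggregate": collect.get("aggregate")}


roles = ["Producer", "Worker"]  # illustrative role names only
print(normalize_collect(True, roles))
print(normalize_collect(["Producer"], roles))
print(normalize_collect({"Producer": ["price"], "aggregate": "mean"}, roles))
```

Seen this way, the list and boolean forms are simply dict specs with every variable selected and no aggregation.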

Discoverability

Use sim.collectables() before running to see all available variables, and results.available() after running to see what was collected:

sim = bam.Simulation.init(seed=42)

# Before running: see what can be collected
sim.collectables()
# ['Consumer.income', 'Economy.avg_price', 'Economy.inflation',
#  'Economy.unemployment_rate', 'Producer.price', 'Producer.production', ...]

results = sim.run(n_periods=100)

# After running: see what was collected
results.available()
# ['Consumer.income', 'Economy.avg_price', 'Economy.inflation', ...]
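
Because both listings use flat "Name.variable" keys, they are easy to regroup by role. The following is a minimal sketch that assumes the listings are plain lists of strings, as the comments above suggest; group_by_owner is a hypothetical helper, not part of bamengine, and the keys are hard-coded stand-ins for a real listing.

```python
from collections import defaultdict


def group_by_owner(keys):
    """Group flat "Name.variable" keys into {name: [variables]}."""
    grouped = defaultdict(list)
    for key in keys:
        # Split only on the first dot so the variable part stays intact
        owner, variable = key.split(".", 1)
        grouped[owner].append(variable)
    return dict(grouped)


# Stand-in for the list returned by sim.collectables() or results.available()
keys = [
    "Consumer.income",
    "Economy.avg_price",
    "Economy.inflation",
    "Economy.unemployment_rate",
    "Producer.price",
    "Producer.production",
]

grouped = group_by_owner(keys)
print(grouped["Producer"])
```

The grouped mapping has the same shape as the dict form of collect (role name to list of variables), so it could serve as a starting point for a hand-trimmed spec.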

Economy Metrics

Economy metrics are 1D arrays (one value per period) and are always collected:

| Metric | Description |
| --- | --- |
| avg_price | Average market price across firms (production-weighted) |
| unemployment_rate | Fraction of households without an employer |
| inflation | Year-over-year change in average market price |

These are also available directly on the economy object during simulation:

import numpy as np

sim.ec.avg_mkt_price  # Current average price (scalar)
sim.ec.avg_mkt_price_history  # Full time series (array)
sim.ec.inflation_history  # Full inflation time series
np.mean(~sim.wrk.employed)  # Current unemployment rate (share of workers not employed)
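
To make the definitions concrete, the sketch below computes a production-weighted average price and an unemployment rate from small made-up arrays. It assumes "production-weighted" means a standard weighted mean, sum(p_i * q_i) / sum(q_i); the exact formula bamengine uses is not stated here, and the variable names are illustrative rather than bamengine attributes.

```python
import numpy as np

# Made-up data for three firms and five households (illustrative only)
price = np.array([1.0, 2.0, 4.0])
production = np.array([10.0, 30.0, 10.0])
employed = np.array([True, True, False, True, False])

# Production-weighted average price: sum(p_i * q_i) / sum(q_i)
avg_price = np.average(price, weights=production)

# Unemployment rate: share of households without an employer
unemployment_rate = np.mean(~employed)

print(avg_price, unemployment_rate)
```

With these numbers the firm selling at 2.0 dominates the average because it produces the most, which is the practical difference from an unweighted mean of prices.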

Full Per-Agent Data

By default (collect=True), role data is unaggregated: each variable is a 2D array of shape (n_periods, n_agents).

results = sim.run(n_periods=100)

# Shape: (n_periods, n_firms)
prices = results["Producer.price"]
prices = results.Producer.price  # equivalent

# Aggregate on access if needed
avg_prices = results.get("Producer", "price", aggregate="mean")
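
Conceptually, aggregating on access reduces the agent axis of a (n_periods, n_agents) array and leaves one value per period. The sketch below shows this with plain NumPy on random stand-in data; aggregate_agents is a hypothetical helper, and details such as the degrees of freedom used for "std" (NumPy defaults to the population form, ddof=0) are assumptions that may differ from bamengine.

```python
import numpy as np

rng = np.random.default_rng(0)
n_periods, n_firms = 4, 3
# Stand-in for results["Producer.price"]: one row per period, one column per firm
prices = rng.uniform(1.0, 2.0, size=(n_periods, n_firms))

# Names mirror the documented "aggregate" options
AGGREGATORS = {"mean": np.mean, "median": np.median, "sum": np.sum, "std": np.std}


def aggregate_agents(data, how=None):
    """Reduce the agent axis (axis=1); None returns the data unchanged."""
    if how is None:
        return data
    return AGGREGATORS[how](data, axis=1)


mean_price = aggregate_agents(prices, "mean")
print(prices.shape, mean_price.shape)
```

Collecting unaggregated data and reducing afterwards keeps every option open, at the cost of storing n_agents values per period instead of one.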

Relationship Data Collection

Relationships (like LoanBook) can also be collected. Unlike roles, relationships are opt-in only: they are NOT included when using collect=True.

# Collect LoanBook data along with role data
results = sim.run(
    n_periods=100,
    collect={
        "Producer": ["price"],
        "LoanBook": ["principal", "rate"],  # Relationship fields
        "aggregate": "sum",  # Sum across all active loans
    },
)

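# Access relationship data (one value per period, summed across active loans)
total_principal = results["LoanBook.principal"]
summed_rate = results.get("LoanBook", "rate")  # a sum, not an average, because aggregate="sum"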

Available aggregations for relationships:

  • "sum": Total across all edges (e.g., total outstanding principal)

  • "mean": Average value across all edges (e.g., average interest rate)

  • "std": Standard deviation across edges

  • None: Full edge data (list of variable-length arrays per period)

Non-aggregated relationship data:

When aggregate=None, relationship data cannot be stacked into 2D arrays because edge counts vary per period. Instead, data is stored as a list of arrays:

results = sim.run(
    n_periods=50,
    collect={
        "LoanBook": ["principal"],
    },
)

# List of variable-length arrays (one per period)
principal_per_period = results.relationship_data["LoanBook"]["principal"]
# e.g. principal_per_period[0] might hold 5 loans and principal_per_period[10] might hold 12

Warning

Non-aggregated relationship data cannot be included in DataFrame exports due to variable lengths. Use results.relationship_data directly or use aggregation during collection.
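
If per-edge values are still needed in tabular form, one workaround is to flatten the ragged list into long-format rows, or to reduce each period to a single number after the fact. This is a minimal sketch on made-up data; to_long_rows is a hypothetical helper, and the edge index is only the position within a period, not a stable loan identifier.

```python
def to_long_rows(per_period):
    """Flatten variable-length per-period sequences into (period, edge, value) rows."""
    rows = []
    for period, values in enumerate(per_period):
        for edge, value in enumerate(values):
            rows.append((period, edge, float(value)))
    return rows


# Stand-in for results.relationship_data["LoanBook"]["principal"]
principal_per_period = [[100.0, 50.0], [], [75.0, 20.0, 5.0]]

# Long format: one row per (period, edge) pair, so lengths no longer need to match
rows = to_long_rows(principal_per_period)

# Post-hoc aggregation: one total per period, including periods with no loans
totals = [sum(values) for values in principal_per_period]

print(rows)
print(totals)
```

The rows can then be passed to pandas.DataFrame(rows, columns=["period", "edge", "principal"]) if pandas is installed.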

Accessing Results

SimulationResults provides several ways to access data:

# Bracket syntax (flat "Name.variable" key)
results["Producer.price"]
results["Economy.unemployment_rate"]
results["LoanBook.principal"]  # if collected

# Attribute-style access
results.Producer.price
results.Economy.unemployment_rate

# get() method (supports on-the-fly aggregation)
results.get("Producer", "price")
results.get("Producer", "price", aggregate="mean")
results.get("Economy", "unemployment_rate")
results.get("LoanBook", "principal")  # if collected

# Direct access to nested dicts
results.role_data["Producer"]["price"]
results.economy_data["unemployment_rate"]
results.relationship_data["LoanBook"]["principal"]  # if collected

# Get role/relationship as DataFrame
prod_df = results.get_role_data("Producer")
loans_df = results.get_relationship_data("LoanBook")  # if collected

Exporting Data

Export collected data to pandas DataFrames or files for external analysis:

# Export all collected data to a single DataFrame
df = results.to_dataframe()

# Save to various formats (to_parquet also needs a Parquet engine such as pyarrow or fastparquet)
df.to_csv("results.csv")
df.to_parquet("results.parquet")

# Export individual roles
prod_df = results.get_role_data("Producer")
prod_df.to_csv("producer_data.csv")

Tip

For long simulations or parameter sweeps, saving to Parquet format is recommended: it is compressed, fast to read, and preserves column types.

See also

See the examples for more data collection patterns.