Data Collection
===============

``sim.run()`` returns a :class:`~bamengine.SimulationResults` object by default,
containing time series data collected during the simulation. The ``collect``
parameter controls what data is captured.

Quick Example
-------------

.. code-block:: python

    import bamengine as bam

    sim = bam.Simulation.init(seed=42)
    results = sim.run(n_periods=100)

    # Access data via bracket syntax
    unemployment = results["Economy.unemployment_rate"]
    inflation = results["Economy.inflation"]

    # Or via attribute-style access
    prices = results.Producer.price  # shape: (n_periods, n_firms)

    # Export to pandas DataFrame (requires pandas)
    df = results.to_dataframe()

Collection Options
------------------

The ``collect`` parameter accepts three forms:

**Boolean** (simplest):

.. code-block:: python

    # Collect all roles unaggregated + economy metrics (the default)
    results = sim.run(n_periods=100)

    # Skip collection for benchmarks or when only final state is needed
    sim.run(n_periods=100, collect=False)

**List** (select roles):

.. code-block:: python

    # Collect specific roles with all their variables
    # Economy metrics are always included automatically
    results = sim.run(
        n_periods=100,
        collect=["Producer", "Worker"],
    )

**Dict** (full control):

.. code-block:: python

    # Specify exactly what to collect (full per-agent data by default)
    # Economy metrics are always included automatically
    results = sim.run(
        n_periods=100,
        collect={
            "Producer": ["price", "inventory"],  # Specific variables
            "Worker": True,  # All Worker variables
            "aggregate": "mean",  # Explicit aggregation (default: None)
        },
    )

Collection Settings
-------------------

In dict form, the following keys are recognized:

* **Role names** (e.g., ``"Producer"``, ``"Worker"``): Values are either
  ``True`` (all variables) or a list of variable names.
* **"aggregate"**: How to aggregate across agents. Options: ``None``
  (default, full per-agent data), ``"mean"``, ``"median"``, ``"sum"``, or
  ``"std"``.

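
The ``aggregate`` options reduce each period's per-agent values to a single
number. The following is a minimal sketch of that reduction in plain Python,
not the library's implementation. It assumes a collected variable is laid out
as one row per period and one column per agent. The ``aggregate_rows`` helper
and the sample data are hypothetical, and ``"std"`` is assumed here to mean the
population standard deviation:

.. code-block:: python

    import statistics

    # Hypothetical stand-in for one collected variable: 3 periods x 4 agents.
    prices = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 2.0, 2.0, 2.0],
        [1.0, 1.0, 4.0, 6.0],
    ]

    # Map each documented option name to a reducer over one period's values.
    # Assumption: "std" is the population standard deviation.
    REDUCERS = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "sum": sum,
        "std": statistics.pstdev,
    }


    def aggregate_rows(rows, how=None):
        """Reduce each period's per-agent values; ``None`` keeps them as-is."""
        if how is None:
            return rows
        return [REDUCERS[how](row) for row in rows]


    mean_series = aggregate_rows(prices, "mean")  # one value per period
    sum_series = aggregate_rows(prices, "sum")

Aggregating at collection time stores only the reduced series, whereas leaving
``aggregate`` unset keeps the full per-agent rows for later analysis.
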
Economy metrics (``avg_price``, ``unemployment_rate``, ``inflation``) are
always collected regardless of the ``collect`` form used.

Discoverability
---------------

Use ``sim.collectables()`` before running to see all available variables, and
``results.available()`` after running to see what was collected:

.. code-block:: python

    sim = bam.Simulation.init(seed=42)

    # Before running: see what can be collected
    sim.collectables()
    # ['Consumer.income', 'Economy.avg_price', 'Economy.inflation',
    #  'Economy.unemployment_rate', 'Producer.price', 'Producer.production', ...]

    results = sim.run(n_periods=100)

    # After running: see what was collected
    results.available()
    # ['Consumer.income', 'Economy.avg_price', 'Economy.inflation', ...]

Economy Metrics
---------------

Economy metrics are 1D arrays (one value per period) and are always collected:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - ``avg_price``
     - Average market price across firms (production-weighted)
   * - ``unemployment_rate``
     - Fraction of households without an employer
   * - ``inflation``
     - Year-over-year change in average market price

These are also available directly on the economy object during simulation:

.. code-block:: python

    import numpy as np

    sim.ec.avg_mkt_price  # Current average price (scalar)
    sim.ec.avg_mkt_price_history  # Full time series (array)
    sim.ec.inflation_history  # Full inflation time series
    np.mean(~sim.wrk.employed)  # Current unemployment rate

Full Per-Agent Data
-------------------

By default (``collect=True``), role data is unaggregated: each variable is a
2D array of shape ``(n_periods, n_agents)``.

.. code-block:: python

    results = sim.run(n_periods=100)

    # Shape: (n_periods, n_firms)
    prices = results["Producer.price"]
    prices = results.Producer.price  # equivalent

    # Aggregate on access if needed
    avg_prices = results.get("Producer", "price", aggregate="mean")

Relationship Data Collection
----------------------------

Relationships (like ``LoanBook``) can also be collected.
Unlike roles, relationships are **opt-in only**: they are NOT included when
using ``collect=True``.

.. code-block:: python

    # Collect LoanBook data along with role data
    results = sim.run(
        n_periods=100,
        collect={
            "Producer": ["price"],
            "LoanBook": ["principal", "rate"],  # Relationship fields
            "aggregate": "sum",  # Sum across all active loans
        },
    )

    # Access relationship data
    total_principal = results["LoanBook.principal"]
    summed_rate = results.get("LoanBook", "rate")  # also summed under aggregate="sum"

**Available aggregations for relationships:**

* ``"sum"``: Total across all edges (e.g., total outstanding principal)
* ``"mean"``: Average value across all edges (e.g., average interest rate)
* ``"std"``: Standard deviation across edges
* ``None``: Full edge data (list of variable-length arrays per period)

**Non-aggregated relationship data:**

When ``aggregate=None``, relationship data cannot be stacked into 2D arrays
because edge counts vary per period. Instead, data is stored as a list of
arrays:

.. code-block:: python

    results = sim.run(
        n_periods=50,
        collect={
            "LoanBook": ["principal"],
        },
    )

    # List of variable-length arrays (one per period)
    principal_per_period = results.relationship_data["LoanBook"]["principal"]
    # principal_per_period[0] might have 5 loans, period 10 might have 12

.. warning::

    Non-aggregated relationship data cannot be included in DataFrame exports
    due to variable lengths. Use ``results.relationship_data`` directly or use
    aggregation during collection.

Accessing Results
-----------------

``SimulationResults`` provides several ways to access data:

.. code-block:: python

    # Bracket syntax (flat "Name.variable" key)
    results["Producer.price"]
    results["Economy.unemployment_rate"]
    results["LoanBook.principal"]  # if collected

    # Attribute-style access
    results.Producer.price
    results.Economy.unemployment_rate

    # get() method (supports on-the-fly aggregation)
    results.get("Producer", "price")
    results.get("Producer", "price", aggregate="mean")
    results.get("Economy", "unemployment_rate")
    results.get("LoanBook", "principal")  # if collected

    # Direct access to nested dicts
    results.role_data["Producer"]["price"]
    results.economy_data["unemployment_rate"]
    results.relationship_data["LoanBook"]["principal"]  # if collected

    # Get role/relationship as DataFrame
    prod_df = results.get_role_data("Producer")
    loans_df = results.get_relationship_data("LoanBook")  # if collected

Exporting Data
--------------

Export collected data to pandas DataFrames or files for external analysis:

.. code-block:: python

    # Export all collected data to a single DataFrame
    df = results.to_dataframe()

    # Save to various formats (requires pandas)
    df.to_csv("results.csv")
    df.to_parquet("results.parquet")

    # Export individual roles
    prod_df = results.get_role_data("Producer")
    prod_df.to_csv("producer_data.csv")

.. tip::

    For long simulations or parameter sweeps, saving to Parquet format is
    recommended: it is compressed, fast to read, and preserves column types.

.. seealso::

    See the :doc:`examples ` for more data collection patterns.
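
Non-aggregated relationship data, which the warning above excludes from
DataFrame exports, can still be written out by flattening it into long-format
rows. The following is a minimal sketch using only the standard library. Plain
lists stand in for ``results.relationship_data["LoanBook"]["principal"]``, and
the ``flatten_edges`` helper, the sample values, and the column names are
hypothetical rather than part of bamengine:

.. code-block:: python

    import csv
    import io

    # Hypothetical stand-in: one variable-length list of principals per period.
    principal_per_period = [
        [100.0, 250.0],
        [],
        [80.0, 40.0, 15.0],
    ]


    def flatten_edges(per_period):
        """Yield (period, edge_index, value) rows from variable-length data."""
        for period, values in enumerate(per_period):
            for edge_index, value in enumerate(values):
                yield period, edge_index, value


    rows = list(flatten_edges(principal_per_period))

    # Write the rows as CSV text; swap the buffer for an open file to save to disk.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["period", "edge", "principal"])
    writer.writerows(rows)
    csv_text = buffer.getvalue()

Because each row carries its own period index, periods with different numbers
of loans (including none) fit in the same table without padding.
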