Data Collection
===============

``sim.run()`` returns a :class:`~bamengine.SimulationResults` object by default,
containing time series data collected during the simulation. The ``collect``
parameter controls what data is captured.

Quick Example
-------------

.. code-block:: python

   import bamengine as bam

   sim = bam.Simulation.init(seed=42)
   results = sim.run(n_periods=100)

   # Access data via bracket syntax
   unemployment = results["Economy.unemployment_rate"]
   inflation = results["Economy.inflation"]

   # Or via attribute-style access
   prices = results.Producer.price  # shape: (n_periods, n_firms)

   # Export to pandas DataFrame (requires pandas)
   df = results.to_dataframe()

Collection Options
------------------

The ``collect`` parameter accepts three forms:

**Boolean** (simplest):

.. code-block:: python

   # Collect all roles unaggregated + economy metrics (the default)
   results = sim.run(n_periods=100)

   # Skip collection for benchmarks or when only final state is needed
   sim.run(n_periods=100, collect=False)

**List** (select roles):

.. code-block:: python

   # Collect specific roles with all their variables
   # Economy metrics are always included automatically
   results = sim.run(
       n_periods=100,
       collect=["Producer", "Worker"],
   )

**Dict** (full control):

.. code-block:: python

   # Specify exactly what to collect (full per-agent data by default)
   # Economy metrics are always included automatically
   results = sim.run(
       n_periods=100,
       collect={
           "Producer": ["price", "inventory"],  # Specific variables
           "Worker": True,  # All Worker variables
           "aggregate": "mean",  # Explicit aggregation (default: None)
       },
   )

Collection Settings
-------------------

In dict form, the following keys are recognized:

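* **Role names** (e.g., ``"Producer"``, ``"Worker"``): Values are either ``True``
  (all variables) or a list of variable names.
* **Relationship names** (e.g., ``"LoanBook"``): Values are a list of field
  names. Relationships are collected only when requested explicitly.
* **"aggregate"**: How to aggregate across agents. Applies to every collected
  variable. Options: ``None`` (default, full per-agent data), ``"mean"``,
  ``"median"``, ``"sum"``, or ``"std"``.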

Economy metrics (``avg_price``, ``unemployment_rate``, ``inflation``) are
collected whenever collection is enabled, regardless of which ``collect`` form
is used.
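
To make the ``aggregate`` options concrete, the following sketch applies the
corresponding NumPy reductions to a synthetic ``(n_periods, n_agents)`` array.
It does not call bamengine. It assumes that aggregation reduces over the agent
axis, and it uses NumPy's default population standard deviation, which may
differ from what the library computes.

.. code-block:: python

   import numpy as np

   # Synthetic per-agent data: 3 periods (rows) x 4 agents (columns).
   prices = np.array(
       [
           [1.0, 2.0, 3.0, 4.0],
           [2.0, 2.0, 2.0, 2.0],
           [0.0, 1.0, 1.0, 6.0],
       ]
   )

   # Assumed mapping from each "aggregate" option to a reduction over agents.
   reducers = {
       "mean": np.mean,
       "median": np.median,
       "sum": np.sum,
       "std": np.std,  # population std (ddof=0); the library's choice may differ
   }

   # axis=1 collapses the agent dimension, leaving one value per period.
   aggregated = {name: fn(prices, axis=1) for name, fn in reducers.items()}

   for name, series in aggregated.items():
       print(name, series.shape, series)

Each aggregated series has shape ``(n_periods,)``, the same layout as the
economy metrics.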

Discoverability
---------------

Use ``sim.collectables()`` before running to see all available variables,
and ``results.available()`` after running to see what was collected:

.. code-block:: python

   sim = bam.Simulation.init(seed=42)

   # Before running: see what can be collected
   sim.collectables()
   # ['Consumer.income', 'Economy.avg_price', 'Economy.inflation',
   #  'Economy.unemployment_rate', 'Producer.price', 'Producer.production', ...]

   results = sim.run(n_periods=100)

   # After running: see what was collected
   results.available()
   # ['Consumer.income', 'Economy.avg_price', 'Economy.inflation', ...]
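
The flat ``"Name.variable"`` strings can be regrouped by owner, which is a
convenient starting point for a dict-form ``collect`` argument. The sketch
below is plain Python and uses a hard-coded list in place of the output of
``sim.collectables()``. ``group_by_owner`` is a hypothetical helper, not part
of bamengine, and the resulting dict should be checked against the role and
relationship names your simulation actually exposes.

.. code-block:: python

   from collections import defaultdict

   # Illustrative names in the "Owner.variable" format shown above.
   available = [
       "Consumer.income",
       "Economy.avg_price",
       "Economy.inflation",
       "Producer.price",
       "Producer.production",
   ]


   def group_by_owner(names):
       """Group flat "Owner.variable" names into {owner: [variables]}."""
       grouped = defaultdict(list)
       for name in names:
           owner, _, variable = name.partition(".")
           grouped[owner].append(variable)
       return dict(grouped)


   grouped = group_by_owner(available)

   # Economy metrics are collected automatically, so leave them out of the spec.
   collect_spec = {
       owner: variables for owner, variables in grouped.items() if owner != "Economy"
   }
   print(collect_spec)
   # {'Consumer': ['income'], 'Producer': ['price', 'production']}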

Economy Metrics
---------------

Economy metrics are 1D arrays (one value per period) and are always collected:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metric
     - Description
   * - ``avg_price``
     - Average market price across firms (production-weighted)
   * - ``unemployment_rate``
     - Fraction of households without an employer
   * - ``inflation``
     - Year-over-year change in average market price

These are also available directly on the economy object during simulation:

.. code-block:: python

   import numpy as np

   sim.ec.avg_mkt_price  # Current average price (scalar)
   sim.ec.avg_mkt_price_history  # Full time series (array)
   sim.ec.inflation_history  # Full inflation time series
   np.mean(~sim.wrk.employed)  # Current unemployment rate
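
As a rough illustration of the first two definitions in the table, the sketch
below computes a production-weighted average price and an unemployment rate
from small synthetic arrays. The array names are made up for the example, and
the exact formulas bamengine uses internally may differ in detail.

.. code-block:: python

   import numpy as np

   # Synthetic single-period state: 4 firms and 5 workers.
   price = np.array([1.0, 2.0, 3.0, 4.0])
   production = np.array([10.0, 0.0, 30.0, 10.0])
   employed = np.array([True, True, False, True, False])

   # Production-weighted average price: sum(p * q) / sum(q).
   avg_price = np.average(price, weights=production)

   # Unemployment rate: fraction of workers without an employer.
   unemployment_rate = np.mean(~employed)

   print(avg_price, unemployment_rate)

Firms with zero production carry zero weight, so they do not affect the
weighted average.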

Full Per-Agent Data
-------------------

By default (``collect=True``), role data is unaggregated: each variable is a
2D array of shape ``(n_periods, n_agents)``.

.. code-block:: python

   results = sim.run(n_periods=100)

   # Shape: (n_periods, n_firms)
   prices = results["Producer.price"]
   prices = results.Producer.price  # equivalent

   # Aggregate on access if needed
   avg_prices = results.get("Producer", "price", aggregate="mean")
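
Because unaggregated role data is an ordinary 2D array, standard NumPy
indexing applies. The sketch below uses a random array as a stand-in for
``results["Producer.price"]`` and relies only on the stated layout (rows are
periods, columns are agents). The variable names are illustrative.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   # Synthetic stand-in for results["Producer.price"]: 5 periods x 3 firms.
   prices = rng.uniform(1.0, 2.0, size=(5, 3))

   cross_section = prices[-1]  # all firms in the final period, shape (3,)
   firm_series = prices[:, 0]  # firm 0 across all periods, shape (5,)

   # Per-period statistics: reduce over the agent axis.
   mean_by_period = prices.mean(axis=1)
   dispersion = prices.std(axis=1) / mean_by_period  # coefficient of variation

   print(cross_section.shape, firm_series.shape, dispersion.shape)

Keeping the full per-agent array makes distributional statistics such as the
dispersion above available after the run, at the cost of more memory than a
pre-aggregated series.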

Relationship Data Collection
----------------------------

Relationships (like ``LoanBook``) can also be collected. Unlike roles,
relationships are **opt-in only**: they are NOT included when using
``collect=True``.

.. code-block:: python

   # Collect LoanBook data along with role data
   results = sim.run(
       n_periods=100,
       collect={
           "Producer": ["price"],
           "LoanBook": ["principal", "rate"],  # Relationship fields
           "aggregate": "sum",  # Sum across all active loans
       },
   )

   # Access relationship data (every variable was aggregated with "sum")
   total_principal = results["LoanBook.principal"]
   rate_sum = results.get("LoanBook", "rate")  # a sum, not an average rate

**Available aggregations for relationships:**

* ``"sum"``: Total across all edges (e.g., total outstanding principal)
* ``"mean"``: Average value across all edges (e.g., average interest rate)
* ``"std"``: Standard deviation across edges
* ``None``: Full edge data (list of variable-length arrays per period)

**Non-aggregated relationship data:**

When ``aggregate=None``, relationship data cannot be stacked into 2D arrays
because edge counts vary per period. Instead, data is stored as a list of
arrays:

.. code-block:: python

   results = sim.run(
       n_periods=50,
       collect={
           "LoanBook": ["principal"],
       },
   )

   # List of variable-length arrays (one per period)
   principal_per_period = results.relationship_data["LoanBook"]["principal"]
   # principal_per_period[0] might hold 5 loans, principal_per_period[10] might hold 12

.. warning::

   Non-aggregated relationship data cannot be included in DataFrame exports
   due to variable lengths. Use ``results.relationship_data`` directly or
   use aggregation during collection.
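
If the raw edge data has already been collected, it can still be reduced to
one value per period after the fact. The following sketch uses a hand-written
list of arrays as a stand-in for
``results.relationship_data["LoanBook"]["principal"]``. It assumes only the
list-of-arrays layout described above and handles periods with no active
loans explicitly.

.. code-block:: python

   import numpy as np

   # Synthetic stand-in: one array per period, with a varying number of loans.
   principal_per_period = [
       np.array([100.0, 50.0]),
       np.array([]),  # a period with no active loans
       np.array([80.0, 20.0, 20.0]),
   ]

   n_loans = np.array([arr.size for arr in principal_per_period])
   total = np.array([arr.sum() for arr in principal_per_period])

   # The mean of an empty array is undefined, so report NaN for empty periods.
   mean = np.array(
       [arr.mean() if arr.size else np.nan for arr in principal_per_period]
   )

   print(n_loans, total, mean)

The resulting 1D arrays have one entry per period and can be placed in a
DataFrame column like any other aggregated series.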

Accessing Results
-----------------

``SimulationResults`` provides several ways to access data:

.. code-block:: python

   # Bracket syntax (flat "Name.variable" key)
   results["Producer.price"]
   results["Economy.unemployment_rate"]
   results["LoanBook.principal"]  # if collected

   # Attribute-style access
   results.Producer.price
   results.Economy.unemployment_rate

   # get() method (supports on-the-fly aggregation)
   results.get("Producer", "price")
   results.get("Producer", "price", aggregate="mean")
   results.get("Economy", "unemployment_rate")
   results.get("LoanBook", "principal")  # if collected

   # Direct access to nested dicts
   results.role_data["Producer"]["price"]
   results.economy_data["unemployment_rate"]
   results.relationship_data["LoanBook"]["principal"]  # if collected

   # Get role/relationship as DataFrame
   prod_df = results.get_role_data("Producer")
   loans_df = results.get_relationship_data("LoanBook")  # if collected

Exporting Data
--------------

Export collected data to pandas DataFrames or files for external analysis:

.. code-block:: python

   # Export all collected data to a single DataFrame
   df = results.to_dataframe()

   # Save to various formats (to_parquet also requires pyarrow or fastparquet)
   df.to_csv("results.csv")
   df.to_parquet("results.parquet")

   # Export individual roles
   prod_df = results.get_role_data("Producer")
   prod_df.to_csv("producer_data.csv")

.. tip::

   For long simulations or parameter sweeps, saving to Parquet format is
   recommended: it is compressed, fast to read, and preserves column types.
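
When pandas is not available, per-agent data can also be written with the
standard library. The sketch below flattens a small synthetic
``(n_periods, n_agents)`` table into a long layout with one row per period and
agent. The column names and the long layout are choices made for this example
and are not necessarily what ``to_dataframe()`` produces.

.. code-block:: python

   import csv
   import io

   # Synthetic stand-in for a (n_periods, n_agents) variable, as nested lists.
   prices = [
       [1.0, 1.5],
       [1.1, 1.4],
       [1.2, 1.3],
   ]

   # An in-memory buffer keeps the example self-contained; for a real file use
   # open("some_file.csv", "w", newline="") instead.
   buffer = io.StringIO()
   writer = csv.writer(buffer)
   writer.writerow(["period", "agent", "price"])
   for period, row in enumerate(prices):
       for agent, value in enumerate(row):
           writer.writerow([period, agent, value])

   lines = buffer.getvalue().splitlines()
   print(lines[0])  # period,agent,price
   print(len(lines))  # 7 (header + 3 periods x 2 agents)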

.. seealso::

   See the :doc:`examples </auto_examples/index>` for more data collection patterns.