Finance workflow example
========================

This example illustrates a practical workflow for return-based information
dynamics. In finance, ``xyz`` is often most useful when framed as either:

- a **hedge-fund research** toolkit for testing whether a signal contains
  incremental predictive information about future returns;
- a **market microstructure** toolkit for measuring directional information flow
  between order-book state variables, trading activity, and short-horizon price
  changes.

Problem setup
-------------

- ``Y``: target return (asset or portfolio).
- ``X``: candidate driver (factor, market index, signal).
- ``Z``: controls/confounders (other factors, volatility proxies, sector index).
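
As a minimal sketch of how these three roles might be laid out in a single
array, the snippet below converts price series to log returns and stacks them
as columns ``[Y, X, Z]``. The helpers ``log_returns`` and ``build_dataset`` and
the synthetic prices are assumptions made for illustration; they are not part
of ``xyz``.

.. code-block:: python

   import numpy as np

   def log_returns(prices):
       """Convert a 1-D price series into log returns (one element shorter)."""
       prices = np.asarray(prices, dtype=float)
       return np.diff(np.log(prices))

   def build_dataset(target_prices, driver_prices, control_prices):
       """Stack equally long return series as columns [Y, X, Z]."""
       columns = [
           log_returns(p) for p in (target_prices, driver_prices, control_prices)
       ]
       if len({c.shape[0] for c in columns}) != 1:
           raise ValueError("all price series must have the same length")
       data = np.column_stack(columns)
       # Refuse to continue on NaN/inf rather than silently dropping rows,
       # because dropping rows would break the time alignment between columns.
       if not np.isfinite(data).all():
           raise ValueError("returns contain NaN or inf; clean the prices first")
       return data

   # Synthetic geometric random walks stand in for real price histories.
   rng = np.random.default_rng(0)
   shocks = 0.01 * rng.standard_normal((501, 3))
   prices = 100.0 * np.exp(np.cumsum(shocks, axis=0))

   data = build_dataset(prices[:, 0], prices[:, 1], prices[:, 2])
   print(data.shape)  # (500, 3)

Each row holds contemporaneous observations. The sketch assumes that lagging
is handled later by whichever estimator consumes the array.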

Hedge-fund research framing
---------------------------

For discretionary or systematic research, the core variables often look like
this:

- ``Y``: next-period return for an asset, spread, or portfolio sleeve.
- ``X``: candidate alpha source such as carry, momentum, revisions, macro
  surprises, option-implied variables, or cross-asset moves.
- ``Z``: benchmark risk controls such as market beta, sector indices, rates,
  credit spreads, volatility proxies, or competing signals.

Typical questions include:

- Does a candidate signal still add information after controlling for known
  factors?
- Is an apparent lead-lag effect still present after conditioning on the
  target's own past?
- Are two features redundant, uniquely informative, or synergistic when
  predicting the same target?
- Does a relationship persist across rolling windows or disappear outside of a
  specific regime?
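
One way to make the first two questions precise is to write them as
conditional mutual informations. The notation here is an assumption chosen for
illustration: :math:`Y_t^{(\ell)}` denotes the last :math:`\ell` values of
``Y`` up to time :math:`t`, and likewise for ``X`` and ``Z``. The embedding
conventions used by individual ``xyz`` estimators may differ.

.. math::

   T_{X \to Y} = I\bigl(Y_{t+1};\, X_t^{(\ell)} \,\big|\, Y_t^{(\ell)}\bigr)

   T_{X \to Y \mid Z} = I\bigl(Y_{t+1};\, X_t^{(\ell)} \,\big|\, Y_t^{(\ell)},\, Z_t^{(\ell)}\bigr)

For a scalar target and jointly Gaussian variables, the first quantity reduces
to a ratio of residual variances from two nested linear predictors of
:math:`Y_{t+1}`:

.. math::

   T_{X \to Y} = \frac{1}{2} \ln
   \frac{\sigma^2\bigl(Y_{t+1} \mid Y_t^{(\ell)}\bigr)}
        {\sigma^2\bigl(Y_{t+1} \mid Y_t^{(\ell)},\, X_t^{(\ell)}\bigr)}

Both quantities are zero when ``X`` adds nothing beyond the conditioning set,
which is why they suit the "still add information" and "still present"
questions above.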

Suggested sequence
------------------

1. **Baseline linear inference**

   - Fit ``GaussianTransferEntropy`` and ``GaussianPartialTransferEntropy``.
   - Use ``p_value_`` for quick significance screening.

2. **Nonparametric confirmation**

   - Fit ``KSGTransferEntropy`` / ``KSGPartialTransferEntropy``.
   - Check robustness over the nearest-neighbor count ``k`` (for example 3, 5, 8).

3. **Storage diagnostics**

   - Fit ``KSGSelfEntropy`` (or its Gaussian or kernel variants) to measure how
     much of ``Y`` is predictable from its own past.

4. **Sensitivity analysis**

   - Vary the lags and the embedding size.
   - Compare distance metrics for the KSG estimators (Chebyshev vs. Euclidean)
     if needed.
   - Validate on rolling windows for nonstationarity.
   - Treat significance testing as part of the workflow, not an optional extra.
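
The rolling-window check in step 4 can be sketched without the library. The
code below is a NumPy-only illustration and is not how ``xyz`` implements its
estimators: it assumes a linear-Gaussian model, under which transfer entropy
equals half the log ratio of residual variances from two nested least-squares
fits. The function names (``gaussian_te``, ``rolling_te``) and the synthetic
data are hypothetical.

.. code-block:: python

   import numpy as np

   def lagged_design(series, lags):
       """Columns series[t-1], ..., series[t-lags] for t = lags, ..., n-1."""
       n = series.shape[0]
       return np.column_stack(
           [series[lags - j : n - j] for j in range(1, lags + 1)]
       )

   def residual_variance(target, design):
       """Variance of least-squares residuals, with an intercept added."""
       design = np.column_stack([np.ones(target.shape[0]), design])
       coef, *_ = np.linalg.lstsq(design, target, rcond=None)
       return (target - design @ coef).var()

   def gaussian_te(x, y, lags=1):
       """Transfer entropy X -> Y in nats under a linear-Gaussian assumption."""
       x = np.asarray(x, dtype=float)
       y = np.asarray(y, dtype=float)
       future = y[lags:]
       own_past = lagged_design(y, lags)
       full_past = np.column_stack([own_past, lagged_design(x, lags)])
       restricted = residual_variance(future, own_past)
       full = residual_variance(future, full_past)
       return 0.5 * np.log(restricted / full)

   def rolling_te(x, y, window, step, lags=1):
       """Evaluate gaussian_te on consecutive windows of the two series."""
       starts = range(0, len(y) - window + 1, step)
       return np.array(
           [gaussian_te(x[s : s + window], y[s : s + window], lags) for s in starts]
       )

   # Synthetic example: x leads y by one step, so TE(x -> y) should dominate.
   rng = np.random.default_rng(0)
   n = 2000
   x = rng.standard_normal(n)
   y = np.zeros(n)
   y[1:] = 0.5 * x[:-1] + rng.standard_normal(n - 1)

   forward = rolling_te(x, y, window=500, step=250)
   reverse = rolling_te(y, x, window=500, step=250)
   print("x -> y per window:", forward.round(3))
   print("y -> x per window:", reverse.round(3))

In-sample estimates of this kind are never negative, so small positive values
in the reverse direction reflect estimation bias rather than information flow.
This is one reason the workflow treats significance testing as mandatory.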

Useful interpretations
----------------------

- ``GaussianTransferEntropy`` / ``KSGTransferEntropy``:
  "Does signal ``X`` improve prediction of future ``Y`` beyond ``Y``'s own
  history?"
- ``GaussianPartialTransferEntropy`` / ``KSGPartialTransferEntropy``:
  "Does that directed relationship survive after controlling for known risk
  drivers ``Z``?"
- ``KSGSelfEntropy``:
  "How much of the target is explained by its own lagged state?"
- ``MVKSGCondMutualInformation``:
  "Is this feature still informative after controlling for existing factors?"
- ``MVKSGPartialInformationDecomposition``:
  "Are two signals redundant, unique, or synergistic?"

Code sketch: hedge-fund research
--------------------------------

.. code-block:: python

   import numpy as np
   from xyz import (
       GaussianPartialTransferEntropy,
       KSGPartialTransferEntropy,
       KSGSelfEntropy,
   )

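   # data columns: [target_return, candidate_signal, control_factor]
   # Placeholder data: independent Gaussian noise, so no genuine information
   # flow is expected; replace with aligned return series in practice.
   rng = np.random.default_rng(0)
   data = rng.standard_normal((3000, 3))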

   gpte = GaussianPartialTransferEntropy(
       driver_indices=[1],
       target_indices=[0],
       conditioning_indices=[2],
       lags=1,
   ).fit(data)

   kpte = KSGPartialTransferEntropy(
       driver_indices=[1],
       target_indices=[0],
       conditioning_indices=[2],
       lags=1,
       k=3,
       metric="chebyshev",
   ).fit(data)

   kse = KSGSelfEntropy(target_indices=[0], lags=1, k=3).fit(data[:, [0]])

   print("Gaussian PTE:", gpte.transfer_entropy_, "p=", gpte.p_value_)
   print("KSG PTE:", kpte.transfer_entropy_)
   print("KSG SE:", kse.self_entropy_)

Market microstructure framing
-----------------------------

At higher frequency, replace asset-level factors with order-book and trade state
variables:

- ``Y``: next mid-price move, short-horizon return, spread change, or volatility
  burst.
- ``X``: order-flow imbalance, trade sign, queue depletion, cancellation bursts,
  depth imbalance, or venue-specific activity.
- ``Z``: own-price history, spread regime, market state, auction flags, or
  broad benchmark features.

This framing is useful for questions such as:

- whether order-flow imbalance contains incremental predictive information about
  the next price move;
- whether one venue systematically leads another in a fragmented market;
- whether a richer book feature adds unique information beyond simple trade
  imbalance;
- whether variables that naturally fall into a few buckets are easier to study
  with discrete estimators before moving to the continuous KSG variants.
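
When the raw features are continuous, they have to be coded into integer
states before a discrete estimator can use them. The sketch below shows two
common codings with plain NumPy: a sign-based coding for price changes and an
equal-count quantile coding for a continuous feature. The helper names
(``sign_states``, ``quantile_states``) and the synthetic inputs are
assumptions for illustration, not ``xyz`` functions.

.. code-block:: python

   import numpy as np

   def sign_states(changes, tol=0.0):
       """Code changes as 0=down, 1=flat, 2=up; |change| <= tol counts as flat."""
       changes = np.asarray(changes, dtype=float)
       states = np.ones(changes.shape, dtype=int)
       states[changes < -tol] = 0
       states[changes > tol] = 2
       return states

   def quantile_states(values, n_bins=3):
       """Code values into n_bins roughly equal-count bins labelled 0..n_bins-1."""
       values = np.asarray(values, dtype=float)
       # Interior quantiles only: n_bins - 1 edges split the data into n_bins bins.
       edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
       return np.digitize(values, edges)

   print(sign_states([-0.5, 0.0, 0.25]))   # [0 1 2]
   print(quantile_states(np.arange(9)))    # [0 0 0 1 1 1 2 2 2]

   # Synthetic stand-ins for mid-price changes (in ticks) and order-flow imbalance.
   rng = np.random.default_rng(0)
   mid_changes = rng.integers(-1, 2, size=1000)
   imbalance = rng.uniform(-1.0, 1.0, size=1000)
   states = np.column_stack([sign_states(mid_changes), quantile_states(imbalance)])

Quantile edges computed on the full sample use information from the future.
For out-of-sample work, it is safer to estimate the edges on a training window
and reuse them.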

Code sketch: discrete microstructure states
-------------------------------------------

.. code-block:: python

   import numpy as np
   from xyz import DiscretePartialTransferEntropy, DiscreteSelfEntropy

   # data columns: [midprice_state, imbalance_state, spread_state]
   # each column can already be integer-coded, e.g. 0=down, 1=flat, 2=up
   # Placeholder data: independent uniform states in {0, 1, 2}.
   rng = np.random.default_rng(0)
   states = rng.integers(0, 3, size=(5000, 3))

   dpte = DiscretePartialTransferEntropy(
       driver_indices=[1],
       target_indices=[0],
       conditioning_indices=[2],
       lags=1,
       tau=1,
       delay=1,
       quantize=False,
   ).fit(states)

   dse = DiscreteSelfEntropy(
       target_indices=[0],
       lags=1,
       tau=1,
       quantize=False,
   ).fit(states[:, [0]])

   print("Discrete PTE:", dpte.transfer_entropy_)
   print("Discrete SE:", dse.self_entropy_)

Practical notes
---------------

- Start simple: Gaussian estimators are often the best first screen when sample
  size is limited and the goal is to rank relationships quickly.
- Use KSG estimators when you suspect nonlinear, heavy-tailed, or
  regime-dependent dependencies.
- Use ``InteractionDelaySearchCV`` and ``RagwitzEmbeddingSearchCV`` when the
  timing of the interaction is part of the research question.
- Use ``SurrogatePermutationTest`` before trusting a weak TE edge in a noisy
  financial system.
- Prefer rolling or event-conditioned analyses over a single full-sample fit on
  nonstationary market data.
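
To illustrate the idea behind surrogate testing, the sketch below builds a
null distribution by circularly shifting the driver, which keeps the driver's
own autocorrelation but breaks its alignment with the target. It is a generic
NumPy illustration and makes no claim about how ``SurrogatePermutationTest``
works internally. The function names, the circular-shift scheme, and the
lag-1 Gaussian statistic are all assumptions.

.. code-block:: python

   import numpy as np

   def lag1_gaussian_te(x, y):
       """Lag-1 TE X -> Y in nats from the partial correlation of
       y[t+1] and x[t] given y[t] (linear-Gaussian assumption)."""
       future, past_y, past_x = y[1:], y[:-1], x[:-1]
       design = np.column_stack([np.ones(past_y.shape[0]), past_y])
       # Remove the part of each variable explained by the target's own past.
       res_f = future - design @ np.linalg.lstsq(design, future, rcond=None)[0]
       res_x = past_x - design @ np.linalg.lstsq(design, past_x, rcond=None)[0]
       rho = np.corrcoef(res_f, res_x)[0, 1]
       return -0.5 * np.log1p(-rho**2)

   def circular_shift_p_value(x, y, statistic, n_surrogates=200, min_shift=10, seed=0):
       """One-sided p-value of statistic(x, y) against circular-shift surrogates."""
       x = np.asarray(x, dtype=float)
       y = np.asarray(y, dtype=float)
       if len(x) <= 2 * min_shift:
           raise ValueError("series too short for the requested min_shift")
       rng = np.random.default_rng(seed)
       observed = statistic(x, y)
       # Shifts stay away from 0 and len(x) so surrogates are clearly misaligned.
       shifts = rng.integers(min_shift, len(x) - min_shift, size=n_surrogates)
       null = np.array([statistic(np.roll(x, s), y) for s in shifts])
       # Add-one correction keeps the p-value strictly positive.
       p_value = (1 + np.sum(null >= observed)) / (1 + n_surrogates)
       return observed, p_value

   # Synthetic example with a weak one-step lead from x to y.
   rng = np.random.default_rng(1)
   n = 1000
   x = rng.standard_normal(n)
   y = np.zeros(n)
   y[1:] = 0.3 * x[:-1] + rng.standard_normal(n - 1)

   observed, p_value = circular_shift_p_value(x, y, lag1_gaussian_te)
   print("TE estimate:", round(float(observed), 4), "p-value:", round(float(p_value), 4))

With ``n_surrogates`` surrogates, the smallest attainable p-value is
``1 / (n_surrogates + 1)``, so the surrogate count bounds how strict a
threshold the test can support.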