Finance workflow example#

This example illustrates a practical workflow for return-based information dynamics. In finance, xyz is usually most useful when framed in one of two ways:

  • a hedge-fund research toolkit for testing whether a signal contains incremental predictive information about future returns;

  • a market microstructure toolkit for measuring directional information flow between order-book state variables, trading activity, and short-horizon price changes.

Problem setup#

  • Y: target return (asset or portfolio).

  • X: candidate driver (factor, market index, signal).

  • Z: controls/confounders (other factors, volatility proxies, sector index).

Hedge-fund research framing#

For discretionary or systematic research, the core objects often look like:

  • Y: next-period return for an asset, spread, or portfolio sleeve.

  • X: candidate alpha source such as carry, momentum, revisions, macro surprises, option-implied variables, or cross-asset moves.

  • Z: benchmark risk controls such as market beta, sector indices, rates, credit spreads, volatility proxies, or competing signals.

Typical questions include:

  • Does a candidate signal still add information after controlling for known factors?

  • Is an apparent lead-lag effect still present after conditioning on the target’s own past?

  • Are two features redundant, uniquely informative, or synergistic when predicting the same target?

  • Does a relationship persist across rolling windows or disappear outside of a specific regime?

Suggested sequence#

  1. Baseline linear inference

    • Fit GaussianTransferEntropy and GaussianPartialTransferEntropy.

    • Use p_value_ for quick significance screening.

  2. Nonparametric confirmation

    • Fit KSGTransferEntropy / KSGPartialTransferEntropy.

    • Check robustness over k (for example 3, 5, 8).

  3. Storage diagnostics

    • Fit KSGSelfEntropy (or Gaussian/Kernel variants) for persistence in Y.

  4. Sensitivity analysis

    • Vary lags/embedding size.

    • Compare metrics (Chebyshev vs Euclidean if needed).

    • Validate on rolling windows for nonstationarity.

    • Treat significance testing as part of the workflow, not an optional extra.
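To make step 1 concrete, here is a minimal pure-NumPy sketch of what a linear-Gaussian transfer entropy measures (the `gaussian_te` helper is hypothetical, not part of xyz): TE from X to Y is half the log ratio of residual variances when regressing Y's next value on its own past alone versus its past plus X's past.

```python
import numpy as np

def gaussian_te(x, y, lag=1):
    """Linear-Gaussian transfer entropy X -> Y in nats, lag-1 embedding.

    TE = 0.5 * ln( var(resid of y_{t+1} ~ y_t) / var(resid of y_{t+1} ~ y_t, x_t) ).
    """
    y_future, y_past, x_past = y[lag:], y[:-lag], x[:-lag]

    def resid_var(target, *regressors):
        # OLS residual variance with an intercept
        A = np.column_stack([np.ones_like(target), *regressors])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.var(target - A @ beta)

    restricted = resid_var(y_future, y_past)          # Y's own history only
    full = resid_var(y_future, y_past, x_past)        # history plus driver
    return 0.5 * np.log(restricted / full)

# toy system: Y is driven by lagged X, with no feedback
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + 0.1 * rng.standard_normal()

print(gaussian_te(x, y))  # clearly positive: X carries information about future Y
print(gaussian_te(y, x))  # near zero: no feedback from Y to X
```

Because the full regression nests the restricted one, the estimate is non-negative by construction; the p_value_ screening in step 1 is what separates a genuinely positive edge from this finite-sample floor.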

Useful interpretations#

  • GaussianTransferEntropy / KSGTransferEntropy: “Does signal X improve prediction of future Y beyond Y’s own history?”

  • GaussianPartialTransferEntropy / KSGPartialTransferEntropy: “Does that directed relationship survive after controlling for known risk drivers Z?”

  • KSGSelfEntropy: “How much of the target is explained by its own lagged state?”

  • MVKSGCondMutualInformation: “Is this feature still informative after controlling for existing factors?”

  • MVKSGPartialInformationDecomposition: “Are two signals redundant, unique, or synergistic?”
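The conditional-mutual-information question above has a closed form in the Gaussian case, I(X; Y | Z) = -0.5 ln(1 - rho²), where rho is the partial correlation of X and Y given Z. A minimal NumPy sketch (the `gaussian_cmi` helper is hypothetical, not part of xyz) shows how a strong raw correlation can collapse once a shared factor is conditioned out:

```python
import numpy as np

def gaussian_cmi(x, y, z):
    """Gaussian conditional mutual information I(X; Y | Z) in nats,
    via the partial correlation of X and Y given Z."""
    def residual(v, z):
        # regress v on z (with intercept) and keep the residual
        A = np.column_stack([np.ones_like(z), z])
        beta, *_ = np.linalg.lstsq(A, v, rcond=None)
        return v - A @ beta

    rho = np.corrcoef(residual(x, z), residual(y, z))[0, 1]
    return -0.5 * np.log(1.0 - rho**2)

rng = np.random.default_rng(1)
z = rng.standard_normal(4000)            # shared risk factor
x = z + 0.1 * rng.standard_normal(4000)  # candidate signal = factor + noise
y = z + 0.1 * rng.standard_normal(4000)  # target return = factor + noise

print(np.corrcoef(x, y)[0, 1])  # raw correlation is high
print(gaussian_cmi(x, y, z))    # near zero: the factor explains it all
```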

Code sketch: hedge-fund research#

import numpy as np
from xyz import (
    GaussianPartialTransferEntropy,
    KSGPartialTransferEntropy,
    KSGSelfEntropy,
)

# data columns: [target_return, candidate_signal, control_factor]
data = np.random.randn(3000, 3)

gpte = GaussianPartialTransferEntropy(
    driver_indices=[1],
    target_indices=[0],
    conditioning_indices=[2],
    lags=1,
).fit(data)

kpte = KSGPartialTransferEntropy(
    driver_indices=[1],
    target_indices=[0],
    conditioning_indices=[2],
    lags=1,
    k=3,
    metric="chebyshev",
).fit(data)

kse = KSGSelfEntropy(target_indices=[0], lags=1, k=3).fit(data[:, [0]])

print("Gaussian PTE:", gpte.transfer_entropy_, "p=", gpte.p_value_)
print("KSG PTE:", kpte.transfer_entropy_)
print("KSG SE:", kse.self_entropy_)

Market microstructure framing#

At higher frequency, replace asset-level factors with order-book and trade state variables:

  • Y: next mid-price move, short-horizon return, spread change, or volatility burst.

  • X: order-flow imbalance, trade sign, queue depletion, cancellation bursts, depth imbalance, or venue-specific activity.

  • Z: own-price history, spread regime, market state, auction flags, or broad benchmark features.

This framing is useful for questions such as:

  • whether order-flow imbalance contains incremental predictive information about the next price move;

  • whether one venue systematically leads another in a fragmented market;

  • whether a richer book feature adds unique information beyond simple trade imbalance;

  • whether naturally bucketed states (for example tick direction or spread regime) are easier to study with discrete estimators before moving to continuous KSG variants.
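For the last point, a plug-in estimate of discrete transfer entropy is easy to sketch in plain NumPy (the `discrete_te` helper is hypothetical, not the xyz API): count the joint states, then average the log ratio of the full conditional to the history-only conditional.

```python
import numpy as np
from collections import Counter

def discrete_te(x, y):
    """Plug-in transfer entropy X -> Y (lag 1, nats) for integer-coded states.

    TE = sum_{y1,y0,x0} p(y1,y0,x0) * log[ p(y1|y0,x0) / p(y1|y0) ].
    """
    y1, y0, x0 = y[1:], y[:-1], x[:-1]
    n = len(y1)
    c_yyx = Counter(zip(y1, y0, x0))   # joint (future, past, driver)
    c_yx = Counter(zip(y0, x0))
    c_yy = Counter(zip(y1, y0))
    c_y = Counter(y0)
    te = 0.0
    for (a, b, c), cnt in c_yyx.items():
        cond_full = cnt / c_yx[(b, c)]            # p(y1 | y0, x0)
        cond_restricted = c_yy[(a, b)] / c_y[b]   # p(y1 | y0)
        te += (cnt / n) * np.log(cond_full / cond_restricted)
    return te

# toy book: the next mid-price state copies the imbalance state 70% of the time
rng = np.random.default_rng(2)
imbalance = rng.integers(0, 3, 20000)     # 0=sell-heavy, 1=flat, 2=buy-heavy
noise = rng.integers(0, 3, 20000)
src = np.where(rng.random(20000) < 0.7, imbalance, noise)
mid = np.roll(src, 1)                     # price reacts one step after the driver

print(discrete_te(imbalance, mid))  # positive: imbalance leads the mid price
print(discrete_te(mid, imbalance))  # ~0: no reverse flow
```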

Code sketch: discrete microstructure states#

import numpy as np
from xyz import DiscretePartialTransferEntropy, DiscreteSelfEntropy

# data columns: [midprice_state, imbalance_state, spread_state]
# each column can already be integer-coded, e.g. 0=down, 1=flat, 2=up
states = np.random.randint(0, 3, size=(5000, 3))

dpte = DiscretePartialTransferEntropy(
    driver_indices=[1],
    target_indices=[0],
    conditioning_indices=[2],
    lags=1,
    tau=1,
    delay=1,
    quantize=False,
).fit(states)

dse = DiscreteSelfEntropy(
    target_indices=[0],
    lags=1,
    tau=1,
    quantize=False,
).fit(states[:, [0]])

print("Discrete PTE:", dpte.transfer_entropy_)
print("Discrete SE:", dse.self_entropy_)

Practical notes#

  • Start simple: Gaussian estimators are often the best first screen when sample size is limited and the goal is ranking relationships quickly.

  • Use KSG estimators when you suspect nonlinear, heavy-tailed, or regime-shaped dependencies.

  • Use InteractionDelaySearchCV and RagwitzEmbeddingSearchCV when the timing of the interaction is part of the research question.

  • Use SurrogatePermutationTest before trusting a weak TE edge in a noisy financial system.

  • Prefer rolling or event-conditioned analyses over a single full-sample fit on nonstationary market data.
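The surrogate-testing idea in the notes above can be sketched without the library: break the temporal alignment between driver and target (here with circular time shifts, which roughly preserve the driver's autocorrelation), recompute the statistic, and compare the observed value to the surrogate distribution. Both helpers below are hypothetical, and a simple linear-Gaussian TE stands in for whatever estimator is actually under test:

```python
import numpy as np

def linear_te(x, y):
    """Linear-Gaussian TE X -> Y at lag 1 (nats): log ratio of residual variances."""
    yf, yp, xp = y[1:], y[:-1], x[:-1]
    def rvar(t, *regs):
        A = np.column_stack([np.ones_like(t), *regs])
        b, *_ = np.linalg.lstsq(A, t, rcond=None)
        return np.var(t - A @ b)
    return 0.5 * np.log(rvar(yf, yp) / rvar(yf, yp, xp))

def surrogate_pvalue(x, y, n_surrogates=200, seed=0):
    """Time-shift surrogates: circularly rotate the driver to destroy its
    alignment with the target, then count how often the null beats the data."""
    rng = np.random.default_rng(seed)
    observed = linear_te(x, y)
    lo, hi = len(x) // 10, len(x) - len(x) // 10   # avoid near-zero shifts
    shifts = rng.integers(lo, hi, n_surrogates)
    null = np.array([linear_te(np.roll(x, s), y) for s in shifts])
    return observed, (np.sum(null >= observed) + 1) / (n_surrogates + 1)

# toy system with a genuine (weak) directed edge X -> Y
rng = np.random.default_rng(3)
x = rng.standard_normal(2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.6 * y[t - 1] + 0.3 * x[t - 1] + rng.standard_normal()

te, p = surrogate_pvalue(x, y)
print(f"TE = {te:.3f}, surrogate p = {p:.3f}")  # small p: the weak edge survives
```

The same recipe applies to any of the estimators above: only the statistic inside the surrogate loop changes.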