Theory and notation#

This page defines the notation used throughout xyz and summarizes the mathematical quantities estimated by the library.

Why this matters#

The same high-level quantity, such as transfer entropy, can be estimated under very different assumptions:

  • a linear-Gaussian model,

  • a nonparametric nearest-neighbor model,

  • a fixed-radius kernel approximation,

  • or a discrete/binned empirical distribution.

The estimators in xyz are therefore best understood as different numerical approximations to the same information-theoretic functionals.

Notation#

Let \(Y_t\) be the target process at time \(t\). Its embedded past is written as

\[Y_t^- = \left[Y_{t-\tau}, Y_{t-2\tau}, \ldots, Y_{t-d\tau}\right],\]

where \(d\) is the embedding dimension and \(\tau\) is the embedding spacing. Likewise:

  • \(X_t^-\) denotes the embedded past of a driver process,

  • \(Z_t^-\) denotes the embedded past of one or more conditioning processes,

  • \(u\) denotes an interaction delay between source and target when a delay-specific TE estimator is used.

In xyz, these state vectors are assembled by the delay-embedding helpers in xyz.preprocessing.

Units#

Unless otherwise stated, values are reported in nats:

\[1\ \text{nat} = \log_2(e)\ \text{bits} \approx 1.4427\ \text{bits}.\]

Core information quantities#

Entropy#

For a continuous random vector \(Y \in \mathbb{R}^d\),

\[H(Y) = - \int p(y)\log p(y)\,dy.\]

If \(Y\) is Gaussian with covariance matrix \(\Sigma\),

\[H(Y) = \frac{1}{2}\log\!\left((2\pi e)^d \det(\Sigma)\right).\]

This is the quantity estimated by xyz.MVNEntropy.

Conditional entropy#

For two random variables \(X\) and \(Y\),

\[H(Y \mid X) = H(X, Y) - H(X).\]

In a regression-based Gaussian setting, this can be expressed via the covariance of the residual process:

\[H(Y \mid X) = \frac{1}{2}\log\!\left((2\pi e)^d \det(\Sigma_{\varepsilon})\right),\]

where \(\varepsilon\) are the residuals from regressing \(Y\) on \(X\).

Mutual information#

Mutual information measures statistical dependence:

\[I(X;Y) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y \mid X).\]

It is symmetric in \(X\) and \(Y\) and nonnegative in the population. In finite samples, nonparametric estimators can produce small negative values because of estimation variance.

Conditional mutual information#

Conditional mutual information measures dependence that remains after adjusting for a third variable:

\[I(X;Y \mid Z) = H(X \mid Z) + H(Y \mid Z) - H(X,Y \mid Z).\]

This is the core building block of transfer entropy.

Transfer entropy#

Bivariate transfer entropy from \(X\) to \(Y\) quantifies predictive information flow from the past of \(X\) to the present of \(Y\) beyond the information already contained in the past of \(Y\):

\[TE_{X \to Y} = I(X_t^-; Y_t \mid Y_t^-) = H(Y_t \mid Y_t^-) - H(Y_t \mid Y_t^-, X_t^-).\]

If a separate interaction delay \(u\) is used, the source state can be written more explicitly as

\[X_{t,u}^- = \left[X_{t-u}, X_{t-u-\tau}, \ldots, X_{t-u-(d-1)\tau}\right].\]

This distinction is important in TRENTOOL-style delay reconstruction.

Partial transfer entropy#

Partial transfer entropy adjusts for additional confounding processes \(Z_t^-\):

\[PTE_{X \to Y \mid Z} = I(X_t^-; Y_t \mid Y_t^-, Z_t^-) = H(Y_t \mid Y_t^-, Z_t^-) - H(Y_t \mid Y_t^-, X_t^-, Z_t^-).\]

This is the natural choice when the apparent effect of \(X\) on \(Y\) could be mediated or confounded by known controls.

Self-entropy / information storage#

xyz uses the term self-entropy for information storage:

\[SE_Y = I(Y_t; Y_t^-) = H(Y_t) - H(Y_t \mid Y_t^-).\]

This quantifies how much of the present of a process is predictable from its own past.

Estimator families in xyz#

Gaussian / linear#

These estimators assume the relevant distributions are well approximated by linear regressions with Gaussian residuals. They are fast, interpretable, and provide analytical F-tests for TE, PTE, and self-entropy.

KSG / nearest-neighbor#

These estimators are nonparametric and approximate entropies from nearest-neighbor distances. They are more flexible than Gaussian estimators, especially for nonlinear dependence, but require more data and more careful parameter tuning.

Kernel / fixed-radius#

These estimators replace the fixed-\(k\) neighborhood of KSG with a fixed radius \(r\). They are intuitive and useful for sensitivity analysis, but their performance can change substantially with the chosen radius.

Discrete / binning#

These estimators quantize the data and estimate probabilities from empirical frequencies. They are especially appropriate for symbolic or truly discrete state spaces, but can become sparse in high-dimensional embeddings.

How to choose an estimator family#

Family

Best when

Main strengths

Main risks

Gaussian

Dynamics are approximately linear and homoscedastic

Fast, stable, interpretable, analytical significance

Misses nonlinear structure

KSG

Nonlinear dependence is plausible and sample size is adequate

Flexible, widely used, closest to TRENTOOL-style continuous TE

Higher variance, more tuning, more expensive

Kernel

A local geometric neighborhood view is desirable

Simple radius interpretation, useful for robustness sweeps

Highly sensitive to r

Discrete

Data are symbolic, categorical, or deliberately quantized

Conceptually simple, easy to interpret

Binning bias and state-space sparsity

ITS / TSTOOL / TRENTOOL alignment#

The continuous nearest-neighbor estimators in xyz follow the same broad strategy as ITS/TSTOOL/TRENTOOL:

  1. find a neighborhood in the highest-dimensional joint space,

  2. project that neighborhood into lower-dimensional marginal spaces,

  3. use projected counts to estimate entropy differences with reduced bias.

For the TE/PTE/SE parity tests, xyz excludes self-matches in the projected count stage, mirroring the ITS range_search(..., past=0) behavior.

The TRENTOOL workflow then layers additional methodology on top of those core estimators: ACT-aware trial selection, Ragwitz embedding search, interaction delay reconstruction, surrogate testing, and group-level harmonization. Those workflow components are the bridge between low-level estimator parity and a full causal-analysis pipeline.