Discrete (binning) estimators#

The discrete family is intended for symbolic processes or for continuous data that you deliberately quantize into a small number of states.

Implemented classes#

  • xyz.DiscreteTransferEntropy

  • xyz.DiscretePartialTransferEntropy

  • xyz.DiscreteSelfEntropy

Mathematics#

These estimators compute empirical probabilities from counts of repeated states in embedded observation matrices.

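If \(Y_t\) is the current target state, \(Y_t^-\) is its embedded past, and \(X_t^-\) and \(Z_t^-\) are the embedded pasts of the driver and conditioning processes, then transfer entropy (TE), partial transfer entropy (PTE), and self entropy (SE) are: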

\[TE_{X \to Y} = H(Y_t \mid Y_t^-) - H(Y_t \mid Y_t^-, X_t^-),\]
\[PTE_{X \to Y \mid Z} = H(Y_t \mid Y_t^-, Z_t^-) - H(Y_t \mid Y_t^-, X_t^-, Z_t^-),\]
\[SE_Y = H(Y_t) - H(Y_t \mid Y_t^-).\]
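
For counting purposes it can help to expand each conditional entropy with the standard identity \(H(A \mid B) = H(A, B) - H(B)\), so that every term becomes a joint entropy over one table of states. This is a textbook rewrite of the definitions above, not a description of how xyz organizes the computation internally:

\[TE_{X \to Y} = H(Y_t, Y_t^-) - H(Y_t^-) - H(Y_t, Y_t^-, X_t^-) + H(Y_t^-, X_t^-),\]
\[SE_Y = H(Y_t) - H(Y_t, Y_t^-) + H(Y_t^-).\]

The PTE expansion follows the same pattern with \(Z_t^-\) added to every conditioning set.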

Each entropy term is evaluated from empirical frequencies:

\[\hat{H}(Y) = -\sum_y \hat{p}(y)\log \hat{p}(y).\]
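
As a minimal sketch of this plug-in formula, the helper below counts distinct rows of a state matrix and turns the counts into an entropy. The name `plugin_entropy` is hypothetical and not part of xyz, and the natural logarithm (nats) is an assumption, since the base of the logarithm is not stated here.

```python
import numpy as np

def plugin_entropy(states):
    """Plug-in entropy (in nats) of the rows of an integer state array."""
    states = np.asarray(states)
    if states.ndim == 1:
        states = states[:, None]  # treat a 1-D series as a single column
    # Count how often each distinct row (joint state) occurs.
    _, counts = np.unique(states, axis=0, return_counts=True)
    p = counts / counts.sum()  # empirical probabilities
    return float(-np.sum(p * np.log(p)))

coin = np.array([0, 1, 0, 1, 1, 0, 0, 1])  # four zeros, four ones
print(plugin_entropy(coin))  # log(2), about 0.693 nats
```

Because only observed states enter the sum, there is never a `log(0)` term to handle.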

Quantization#

When quantize=True, xyz applies MATLAB-compatible uniform quantization with c bins before counting discrete states. This is useful for ITS-style parity and for exploratory symbolic analysis, but it also introduces a modeling choice: the result now depends on the quantization scheme.
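
The exact binning rule xyz uses is not spelled out here. Purely as an illustration, one common form of uniform quantization splits the observed range of each column into c equal-width bins and replaces every sample with its bin index. The function `uniform_quantize` below is hypothetical, and its edge handling may differ from the MATLAB-compatible scheme.

```python
import numpy as np

def uniform_quantize(x, c):
    """Map each column of x to integer labels 0..c-1 using equal-width bins.

    Illustrative only: an assumed scheme, not xyz's implementation.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    width = (hi - lo) / c
    width[width == 0] = 1.0  # constant column: every sample falls in bin 0
    labels = np.floor((x - lo) / width).astype(int)
    # The column maximum sits exactly on the top edge; fold it into the last bin.
    return np.clip(labels, 0, c - 1)

demo = uniform_quantize(np.linspace(0.0, 1.0, 9), 4).ravel()
print(demo)  # [0 0 1 1 2 2 3 3 3]
```

Equal-width bins are sensitive to outliers, because a single extreme sample stretches the range and pushes most of the data into a few bins. That is one concrete way the quantization scheme can change the result.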

Why use discrete estimators#

  • They are natural for genuinely discrete state spaces.

  • They are easy to interpret because they reduce everything to frequency tables.

  • They are often a useful pedagogical baseline for understanding TE and PTE.

When to use them#

Use the discrete family when:

  • your data are already categorical or symbolic,

  • you want to compare multiple coarse quantizations of a continuous process,

  • or you want a transparent state-counting baseline before moving to KSG or Gaussian estimators.

Typical use cases#

  • Symbolic dynamics and regime switching.

  • Discretized market states, such as up/flat/down returns.

  • Binned neural or physiological activity states.
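
To make the up/flat/down case concrete, here is a small sketch that turns a return series into three symbols. The function `symbolize_returns` and its `threshold` parameter are hypothetical; the width of the "flat" band is a modeling choice that has to come from the application.

```python
import numpy as np

def symbolize_returns(returns, threshold):
    """Encode returns as 0 = down, 1 = flat, 2 = up."""
    returns = np.asarray(returns, dtype=float)
    symbols = np.ones(returns.shape, dtype=int)  # start with everything "flat"
    symbols[returns < -threshold] = 0            # clearly negative moves
    symbols[returns > threshold] = 2             # clearly positive moves
    return symbols

r = np.array([0.012, -0.0003, -0.02, 0.0004, 0.007])
print(symbolize_returns(r, threshold=0.001))  # [2 1 0 1 2]
```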

How to use them#

import numpy as np
from xyz import (
    DiscretePartialTransferEntropy,
    DiscreteSelfEntropy,
    DiscreteTransferEntropy,
)

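# Synthetic continuous data with a fixed seed so the run is reproducible.
# Columns: 0 = driver, 1 = target, 2 = conditioning variable.
rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 3))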

te = DiscreteTransferEntropy(
    driver_indices=[0],
    target_indices=[1],
    lags=1,
    c=8,
    quantize=True,
).fit(data)

pte = DiscretePartialTransferEntropy(
    driver_indices=[0],
    target_indices=[1],
    conditioning_indices=[2],
    lags=1,
    c=8,
    quantize=True,  # data are continuous, so bin them as in the TE call above
).fit(data)

se = DiscreteSelfEntropy(target_indices=[1], lags=2, c=8, quantize=True).fit(data)

print(te.transfer_entropy_)
print(pte.transfer_entropy_)
print(se.self_entropy_)

Practical advice#

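  • If c is too small, distinct states are merged and real structure can be averaged away.

  • If c is too large, the frequency tables become sparse and the estimates unstable; plug-in estimates from sparse tables also tend to be biased upward.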

  • If estimates change dramatically with the number of bins, report that sensitivity rather than hiding it.

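  • In higher-dimensional embeddings, the number of possible joint states grows exponentially with the number of embedded variables (up to c^m for m variables with c bins each), so KSG or Gaussian estimators may become more reliable.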
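
To put rough numbers on the last point, the sketch below compares the largest possible number of joint states with a fixed sample size. The helper `max_joint_states` is hypothetical, and how many variables actually enter the joint state for a given `lags` setting depends on xyz's embedding, which is not described here.

```python
def max_joint_states(c, n_variables):
    """Upper bound on distinct joint states: c levels for each of n_variables."""
    return c ** n_variables

n_samples = 2000
for m in (3, 4, 5):
    states = max_joint_states(8, m)
    # Average number of samples available per possible joint state.
    print(f"m={m}  states={states}  samples per state={n_samples / states:.2f}")
```

With 8 bins and 2000 samples, three embedded variables leave about 3.9 samples per possible state, while five leave about 0.06, so most cells of the table would be empty.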

Interactive example#

The plot below shows discrete TE as a function of the number of quantization bins in a synthetic lagged system. This is a useful diagnostic: an effect that shows up at only one bin count, or in a narrow range of bin counts, is often an artifact of the quantization rather than a robust result.
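
The same kind of sweep can be sketched in plain NumPy. The code below is a stand-in written for illustration, not xyz's implementation: `joint_entropy`, `quantize`, and `discrete_te` are hypothetical helpers, the binning is equal-width over the observed range, and both pasts use a single lag. The system is chosen so that y depends on the previous value of x and not the other way around.

```python
import numpy as np

def joint_entropy(*columns):
    """Plug-in entropy (nats) of the joint state formed by integer columns."""
    states = np.column_stack(columns)
    _, counts = np.unique(states, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def quantize(x, c):
    """Equal-width bins over the observed range, labels 0..c-1."""
    edges = np.linspace(x.min(), x.max(), c + 1)
    # Use interior edges only, so the labels run from 0 to c-1.
    return np.digitize(x, edges[1:-1])

def discrete_te(x, y, c):
    """Plug-in TE from x to y with a one-step past for both series."""
    xq, yq = quantize(x, c), quantize(y, c)
    y_now, y_past, x_past = yq[1:], yq[:-1], xq[:-1]
    # TE = H(Y_t, Y^-) - H(Y^-) - H(Y_t, Y^-, X^-) + H(Y^-, X^-)
    return (joint_entropy(y_now, y_past) - joint_entropy(y_past)
            - joint_entropy(y_now, y_past, x_past)
            + joint_entropy(y_past, x_past))

rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
y = np.empty(n)
y[0] = rng.standard_normal()
# y is driven by the previous value of x plus independent noise.
y[1:] = 0.8 * x[:-1] + 0.6 * rng.standard_normal(n - 1)

results = {}
for c in (2, 4, 8, 16):
    forward = discrete_te(x, y, c)   # true coupling direction
    reverse = discrete_te(y, x, c)   # no coupling in this direction
    results[c] = (forward, reverse)
    print(f"c={c:2d}  TE x->y={forward:.3f}  TE y->x={reverse:.3f}")
```

The reverse direction has no true coupling in this system, so whatever value it reports is estimation bias. Watching how that value changes with c gives a rough floor against which the forward estimate can be judged.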