# kNN / KSG estimators
The KSG family provides nonparametric estimators for entropy, mutual
information, conditional mutual information, and time-series information flow.
This is the estimator family in xyz that is conceptually closest to the
continuous nearest-neighbor machinery used by ITS and TRENTOOL.
## Implemented classes

- `xyz.KSGMutualInformation`
- `xyz.KSGEntropy`
- `xyz.MVKSGCondEntropy`
- `xyz.MVKSGCondMutualInformation`
- `xyz.DirectKSGConditionalMutualInformation` (direct k-NN CMI \(I(X;Y \mid Z)\))
- `xyz.MVKSGTransferEntropy`
- `xyz.KSGTransferEntropy`
- `xyz.KSGPartialTransferEntropy`
- `xyz.KSGSelfEntropy`
- `xyz.MVKSGPartialInformationDecomposition`
## Core mathematical idea
KSG estimators build on the Kozachenko-Leonenko nearest-neighbor entropy estimator. For each sample:
- find the distance \(\varepsilon_i\) to the \(k\)-th nearest neighbor in a joint space,
- project the same radius into lower-dimensional marginal spaces,
- count how many samples fall inside the projected neighborhoods,
- combine those counts with digamma functions to estimate entropy differences.
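The steps above can be sketched for plain entropy estimation. The following is a minimal, brute-force Kozachenko-Leonenko sketch under the Chebyshev (max) norm — not the xyz implementation, and `kl_entropy` is a hypothetical helper name:

```python
import numpy as np

def _psi(n):
    # digamma at a positive integer n: -gamma + harmonic(n - 1)
    return -0.5772156649015329 + np.sum(1.0 / np.arange(1, n))

def kl_entropy(x, k=3):
    # Kozachenko-Leonenko entropy estimate in nats, Chebyshev (max) norm.
    # Brute-force O(n^2) distances; real implementations use k-d trees.
    n, d = x.shape
    dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    np.fill_diagonal(dist, np.inf)          # exclude self-neighbors
    eps = np.sort(dist, axis=1)[:, k - 1]   # distance to k-th neighbor
    return _psi(n) - _psi(k) + d * np.mean(np.log(2.0 * eps))

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))
h = kl_entropy(x, k=3)  # true N(0,1) entropy is 0.5*log(2*pi*e) ~ 1.42 nats
```

For a standard normal sample of this size the estimate lands close to the analytic value, which is a quick sanity check when porting or tuning the estimator.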
For mutual information, the classic KSG form is

\[
\hat{I}(X;Y) = \psi(k) + \psi(N) - \frac{1}{N}\sum_{i=1}^{N}\left[\psi(n_x(i)+1) + \psi(n_y(i)+1)\right],
\]

where \(n_x(i)\) and \(n_y(i)\) are projected neighbor counts.
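This formula can be illustrated directly with brute-force distances — a sketch under the definitions above, not the xyz implementation, with `ksg_mi` as a hypothetical name:

```python
import numpy as np

def _psi(n):
    # digamma at a positive integer n: -gamma + harmonic(n - 1)
    return -0.5772156649015329 + np.sum(1.0 / np.arange(1, n))

def ksg_mi(x, y, k=3):
    # KSG (algorithm 1) mutual information in nats, Chebyshev norm.
    n = x.shape[0]
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=2)
    dj = np.maximum(dx, dy)                     # joint-space distance
    np.fill_diagonal(dj, np.inf)
    eps = np.sort(dj, axis=1)[:, k - 1]         # k-th neighbor in joint space
    nx = np.sum(dx < eps[:, None], axis=1) - 1  # projected counts, minus self
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    avg = np.mean([_psi(a + 1) + _psi(b + 1) for a, b in zip(nx, ny)])
    return _psi(k) + _psi(n) - avg

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))
y = x + 0.5 * rng.standard_normal((1000, 1))
mi = ksg_mi(x, y, k=3)  # true MI is 0.5*log(1 + 1/0.25) ~ 0.80 nats
```

Note that the counts use strict inequality against \(\varepsilon_i\) and exclude the sample itself, matching the classic KSG convention.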
For transfer entropy, the same principle is applied to embedded state spaces:

\[
T_{X \to Y} = I(X_t^-;\, Y_t \mid Y_t^-) = H(Y_t \mid Y_t^-) - H(Y_t \mid Y_t^-, X_t^-).
\]
In practice, xyz estimates this from neighborhood counts in
\((Y_t, Y_t^-)\),
\((Y_t, Y_t^-, X_t^-)\),
and the corresponding reduced conditioning spaces.
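A minimal sketch of this counting scheme, using one-step embeddings and digamma averages over the spaces listed above (Frenzel-Pompe-style conditional MI counts; `ksg_te` is a hypothetical helper, not the xyz API):

```python
import numpy as np

def _psi(n):
    # digamma at a positive integer n: -gamma + harmonic(n - 1)
    return -0.5772156649015329 + np.sum(1.0 / np.arange(1, n))

def ksg_te(x, y, k=3):
    # TE x -> y as I(X_{t-1}; Y_t | Y_{t-1}) via k-NN counts,
    # Chebyshev norm, one-step embeddings (lags = tau = delay = 1).
    yt, yp, xp = y[1:], y[:-1], x[:-1]
    d_yt = np.abs(yt[:, None] - yt[None, :])
    d_yp = np.abs(yp[:, None] - yp[None, :])
    d_xp = np.abs(xp[:, None] - xp[None, :])
    dj = np.maximum(np.maximum(d_yt, d_yp), d_xp)  # full joint space
    np.fill_diagonal(dj, np.inf)
    r = np.sort(dj, axis=1)[:, k - 1][:, None]
    n_typ = np.sum(np.maximum(d_yt, d_yp) < r, axis=1) - 1  # (Y_t, Y^-)
    n_pxp = np.sum(np.maximum(d_yp, d_xp) < r, axis=1) - 1  # (Y^-, X^-)
    n_p = np.sum(d_yp < r, axis=1) - 1                      # (Y^-)
    return _psi(k) + np.mean([_psi(c + 1) - _psi(a + 1) - _psi(b + 1)
                              for a, b, c in zip(n_typ, n_pxp, n_p)])

# coupled pair: x drives y with one step of delay
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.6 * x[t - 1] + 0.3 * rng.standard_normal()
te_xy = ksg_te(x, y)  # should be clearly positive
te_yx = ksg_te(y, x)  # should be near zero
```

The asymmetry between `te_xy` and `te_yx` is what distinguishes transfer entropy from symmetric dependence measures such as mutual information.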
## Why use KSG

- It is nonparametric and can capture nonlinear dependence missed by Gaussian estimators.
- It preserves the information-theoretic interpretation directly in continuous spaces.
- It is the most natural choice when you want scientific comparability with ITS/TRENTOOL-style continuous TE.
## When to prefer KSG

Use KSG when:

- you suspect nonlinear coupling,
- you have enough samples to support neighborhood estimation,
- and you care more about flexible dependence detection than about speed.
## Typical use cases

- Neuroscience: directed functional connectivity in nonlinear neural signals.
- Finance: nonlinear information flow between assets, factors, or volatility states.
- Dynamical systems: causal coupling between chaotic or weakly nonlinear processes.
## How to use the KSG estimators

```python
import numpy as np

from xyz import (
    KSGEntropy,
    KSGMutualInformation,
    KSGPartialTransferEntropy,
    KSGTransferEntropy,
)

# pairwise MI between a signal and a noisy copy
X = np.random.randn(1000, 1)
Y = X + 0.2 * np.random.randn(1000, 1)
mi = KSGMutualInformation(k=3).fit(X, Y).score()

# transfer entropy from column 0 (driver) to column 1 (target)
data = np.random.randn(1500, 3)
te = KSGTransferEntropy(
    driver_indices=[0],
    target_indices=[1],
    lags=1,
    tau=1,
    delay=1,
    k=3,
    metric="chebyshev",
).fit(data)

# partial TE, conditioning out column 2
pte = KSGPartialTransferEntropy(
    driver_indices=[0],
    target_indices=[1],
    conditioning_indices=[2],
    lags=1,
    k=3,
).fit(data)

print(mi)
print(te.transfer_entropy_, te.conditional_entropy_)
print(pte.transfer_entropy_)
```
## Parameter guidance

- `k`: smaller values reduce smoothing bias but increase variance; typical values are 3 to 10.
- `metric`: use `"chebyshev"` for ITS-style comparability.
- `lags` and `tau`: larger embeddings can represent richer dynamics, but quickly increase the dimensionality burden.
- `delay`: use this explicitly when scanning interaction delays.
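One way to act on the guidance for `k` is to sweep it and check that estimates stay stable. The sketch below does this with a self-contained brute-force KSG MI estimate (a hypothetical `ksg_mi` helper, not the xyz API):

```python
import numpy as np

def _psi(n):
    # digamma at a positive integer n: -gamma + harmonic(n - 1)
    return -0.5772156649015329 + np.sum(1.0 / np.arange(1, n))

def ksg_mi(x, y, k):
    # brute-force KSG (algorithm 1) MI in nats, Chebyshev norm, 1-D inputs
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dj = np.maximum(dx, dy)
    np.fill_diagonal(dj, np.inf)
    eps = np.sort(dj, axis=1)[:, k - 1]
    nx = np.sum(dx < eps[:, None], axis=1) - 1
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    return _psi(k) + _psi(n) - np.mean(
        [_psi(a + 1) + _psi(b + 1) for a, b in zip(nx, ny)])

rng = np.random.default_rng(1)
x = rng.standard_normal(800)
y = x + 0.5 * rng.standard_normal(800)
estimates = {k: ksg_mi(x, y, k) for k in (3, 5, 8, 10)}
spread = max(estimates.values()) - min(estimates.values())
# a usable estimate should vary only mildly across this k range
```

If the spread across a sensible range of `k` is large relative to the estimate itself, treat the result with suspicion and revisit sample size or embedding dimension.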
## Important caveats

- Finite-sample estimates can be slightly negative.
- Performance deteriorates as the effective embedding dimension grows.
- Results can be sensitive to repeated samples, ties, and insufficient neighborhood support.
## ITS parity note

For the TE/PTE/SE parity tests in this project, self-neighbors are excluded
from projected counts, matching the ITS behavior of
`range_search(..., past=0)`.
## Interactive example
The plot below shows KSG transfer entropy as a function of the neighborhood size \(k\) for a nonlinear driver-target system. The point is not that a single \(k\) is universally best, but that useful estimates should remain qualitatively stable across a sensible range.