# kNN / KSG estimators
The KSG family provides nonparametric estimators for entropy, mutual
information, conditional mutual information, and time-series information flow.
This is the estimator family in xyz that is conceptually closest to the
continuous nearest-neighbor machinery used by ITS and TRENTOOL.
## Implemented classes

- `xyz.KSGMutualInformation`
- `xyz.KSGEntropy`
- `xyz.MVKSGCondEntropy`
- `xyz.MVKSGCondMutualInformation`
- `xyz.DirectKSGConditionalMutualInformation` (direct k-NN CMI \(I(X;Y \mid Z)\))
- `xyz.MVKSGTransferEntropy`
- `xyz.KSGTransferEntropy`
- `xyz.KSGPartialTransferEntropy`
- `xyz.KSGSelfEntropy`
- `xyz.MVKSGPartialInformationDecomposition`
## Core mathematical idea
KSG estimators build on the Kozachenko-Leonenko nearest-neighbor entropy estimator. For each sample:
- find the distance \(\varepsilon_i\) to the \(k\)-th nearest neighbor in a joint space,
- project the same radius into lower-dimensional marginal spaces,
- count how many samples fall inside the projected neighborhoods,
- combine those counts with digamma functions to estimate entropy differences.
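The steps above can be sketched for plain entropy estimation. The following is a minimal, brute-force Kozachenko-Leonenko sketch under the Chebyshev (max) norm — not the xyz implementation, and `kl_entropy` is a hypothetical helper name:

```python
import numpy as np

def _psi(n):
    # digamma at a positive integer n: -gamma + harmonic(n - 1)
    return -0.5772156649015329 + np.sum(1.0 / np.arange(1, n))

def kl_entropy(x, k=3):
    # Kozachenko-Leonenko entropy estimate in nats, Chebyshev (max) norm.
    # Brute-force O(n^2) distances; real implementations use k-d trees.
    n, d = x.shape
    dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    np.fill_diagonal(dist, np.inf)          # exclude self-neighbors
    eps = np.sort(dist, axis=1)[:, k - 1]   # distance to k-th neighbor
    return _psi(n) - _psi(k) + d * np.mean(np.log(2.0 * eps))

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))
h = kl_entropy(x, k=3)  # true N(0,1) entropy is 0.5*log(2*pi*e) ~ 1.42 nats
```

For a standard normal sample of this size the estimate lands close to the analytic value, which is a quick sanity check when porting or tuning the estimator.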
For mutual information, the classic KSG form is

\[
\hat{I}(X;Y) = \psi(k) + \psi(N) - \frac{1}{N}\sum_{i=1}^{N}\left[\psi(n_x(i)+1) + \psi(n_y(i)+1)\right],
\]

where \(n_x(i)\) and \(n_y(i)\) are projected neighbor counts.
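This formula can be illustrated directly with brute-force distances — a sketch under the definitions above, not the xyz implementation, with `ksg_mi` as a hypothetical name:

```python
import numpy as np

def _psi(n):
    # digamma at a positive integer n: -gamma + harmonic(n - 1)
    return -0.5772156649015329 + np.sum(1.0 / np.arange(1, n))

def ksg_mi(x, y, k=3):
    # KSG (algorithm 1) mutual information in nats, Chebyshev norm.
    n = x.shape[0]
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=2)
    dj = np.maximum(dx, dy)                     # joint-space distance
    np.fill_diagonal(dj, np.inf)
    eps = np.sort(dj, axis=1)[:, k - 1]         # k-th neighbor in joint space
    nx = np.sum(dx < eps[:, None], axis=1) - 1  # projected counts, minus self
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    avg = np.mean([_psi(a + 1) + _psi(b + 1) for a, b in zip(nx, ny)])
    return _psi(k) + _psi(n) - avg

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))
y = x + 0.5 * rng.standard_normal((1000, 1))
mi = ksg_mi(x, y, k=3)  # true MI is 0.5*log(1 + 1/0.25) ~ 0.80 nats
```

Note that the counts use strict inequality against \(\varepsilon_i\) and exclude the sample itself, matching the classic KSG convention.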
For transfer entropy, the same principle is applied to embedded state spaces:

\[
T_{X \to Y} = I(X_t^-;\, Y_t \mid Y_t^-) = H(Y_t \mid Y_t^-) - H(Y_t \mid Y_t^-, X_t^-).
\]
In practice, xyz estimates this from neighborhood counts in
\((Y_t, Y_t^-)\),
\((Y_t, Y_t^-, X_t^-)\),
and the corresponding reduced conditioning spaces.
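A minimal sketch of this counting scheme, using one-step embeddings and digamma averages over the spaces listed above (Frenzel-Pompe-style conditional MI counts; `ksg_te` is a hypothetical helper, not the xyz API):

```python
import numpy as np

def _psi(n):
    # digamma at a positive integer n: -gamma + harmonic(n - 1)
    return -0.5772156649015329 + np.sum(1.0 / np.arange(1, n))

def ksg_te(x, y, k=3):
    # TE x -> y as I(X_{t-1}; Y_t | Y_{t-1}) via k-NN counts,
    # Chebyshev norm, one-step embeddings (lags = tau = delay = 1).
    yt, yp, xp = y[1:], y[:-1], x[:-1]
    d_yt = np.abs(yt[:, None] - yt[None, :])
    d_yp = np.abs(yp[:, None] - yp[None, :])
    d_xp = np.abs(xp[:, None] - xp[None, :])
    dj = np.maximum(np.maximum(d_yt, d_yp), d_xp)  # full joint space
    np.fill_diagonal(dj, np.inf)
    r = np.sort(dj, axis=1)[:, k - 1][:, None]
    n_typ = np.sum(np.maximum(d_yt, d_yp) < r, axis=1) - 1  # (Y_t, Y^-)
    n_pxp = np.sum(np.maximum(d_yp, d_xp) < r, axis=1) - 1  # (Y^-, X^-)
    n_p = np.sum(d_yp < r, axis=1) - 1                      # (Y^-)
    return _psi(k) + np.mean([_psi(c + 1) - _psi(a + 1) - _psi(b + 1)
                              for a, b, c in zip(n_typ, n_pxp, n_p)])

# coupled pair: x drives y with one step of delay
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.6 * x[t - 1] + 0.3 * rng.standard_normal()
te_xy = ksg_te(x, y)  # should be clearly positive
te_yx = ksg_te(y, x)  # should be near zero
```

The asymmetry between `te_xy` and `te_yx` is what distinguishes transfer entropy from symmetric dependence measures such as mutual information.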
## Why use KSG

- It is nonparametric and can capture nonlinear dependence missed by Gaussian estimators.
- It preserves the information-theoretic interpretation directly in continuous spaces.
- It is the most natural choice when you want scientific comparability with ITS/TRENTOOL-style continuous TE.
## When to prefer KSG

Use KSG when:

- you suspect nonlinear coupling,
- you have enough samples to support neighborhood estimation,
- and you care more about flexible dependence detection than about speed.
## Typical use cases

- Neuroscience: directed functional connectivity in nonlinear neural signals.
- Finance: nonlinear information flow between assets, factors, or volatility states.
- Dynamical systems: causal coupling between chaotic or weakly nonlinear processes.
## How to use the KSG estimators

```python
import numpy as np

from xyz import (
    KSGEntropy,
    KSGMutualInformation,
    KSGPartialTransferEntropy,
    KSGTransferEntropy,
)

# pairwise MI between a signal and a noisy copy
X = np.random.randn(1000, 1)
Y = X + 0.2 * np.random.randn(1000, 1)
mi = KSGMutualInformation(k=3).fit(X, Y).score()

# transfer entropy from column 0 (driver) to column 1 (target)
data = np.random.randn(1500, 3)
te = KSGTransferEntropy(
    driver_indices=[0],
    target_indices=[1],
    lags=1,
    tau=1,
    delay=1,
    k=3,
    metric="chebyshev",
).fit(data)

# partial TE, conditioning out column 2
pte = KSGPartialTransferEntropy(
    driver_indices=[0],
    target_indices=[1],
    conditioning_indices=[2],
    lags=1,
    k=3,
).fit(data)

print(mi)
print(te.transfer_entropy_, te.conditional_entropy_)
print(pte.transfer_entropy_)
```
## Parameter guidance

- `k`: smaller values reduce smoothing bias but increase variance; typical values are 3 to 10.
- `metric`: use `"chebyshev"` for ITS-style comparability.
- `lags` and `tau`: larger embeddings can represent richer dynamics, but quickly increase the dimensionality burden.
- `delay`: use this explicitly when scanning interaction delays.
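One way to act on the guidance for `k` is to sweep it and check that estimates stay stable. The sketch below does this with a self-contained brute-force KSG MI estimate (a hypothetical `ksg_mi` helper, not the xyz API):

```python
import numpy as np

def _psi(n):
    # digamma at a positive integer n: -gamma + harmonic(n - 1)
    return -0.5772156649015329 + np.sum(1.0 / np.arange(1, n))

def ksg_mi(x, y, k):
    # brute-force KSG (algorithm 1) MI in nats, Chebyshev norm, 1-D inputs
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dj = np.maximum(dx, dy)
    np.fill_diagonal(dj, np.inf)
    eps = np.sort(dj, axis=1)[:, k - 1]
    nx = np.sum(dx < eps[:, None], axis=1) - 1
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    return _psi(k) + _psi(n) - np.mean(
        [_psi(a + 1) + _psi(b + 1) for a, b in zip(nx, ny)])

rng = np.random.default_rng(1)
x = rng.standard_normal(800)
y = x + 0.5 * rng.standard_normal(800)
estimates = {k: ksg_mi(x, y, k) for k in (3, 5, 8, 10)}
spread = max(estimates.values()) - min(estimates.values())
# a usable estimate should vary only mildly across this k range
```

If the spread across a sensible range of `k` is large relative to the estimate itself, treat the result with suspicion and revisit sample size or embedding dimension.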
## Important caveats

- Finite-sample estimates can be slightly negative.
- Performance deteriorates as the effective embedding dimension grows.
- Results can be sensitive to repeated samples, ties, and insufficient neighborhood support.
## ITS parity note

For the TE/PTE/SE parity tests in this project, self-neighbors are excluded
from projected counts, matching the ITS behavior of
`range_search(..., past=0)`.
## Interactive example
The plot below shows KSG transfer entropy as a function of the neighborhood size \(k\) for a nonlinear driver-target system. The point is not that a single \(k\) is universally best, but that useful estimates should remain qualitatively stable across a sensible range.