kNN / KSG estimators
====================

The KSG family provides nonparametric estimators for entropy, mutual
information, conditional mutual information, and time-series information flow.
This is the estimator family in ``xyz`` that is conceptually closest to the
continuous nearest-neighbor machinery used by ITS and TRENTOOL.

Implemented classes
-------------------

- ``xyz.KSGMutualInformation``
- ``xyz.KSGEntropy``
- ``xyz.MVKSGCondEntropy``
- ``xyz.MVKSGCondMutualInformation``
- ``xyz.DirectKSGConditionalMutualInformation`` (direct k-NN CMI :math:`I(X;Y|Z)`)
- ``xyz.MVKSGTransferEntropy``
- ``xyz.KSGTransferEntropy``
- ``xyz.KSGPartialTransferEntropy``
- ``xyz.KSGSelfEntropy``
- ``xyz.MVKSGPartialInformationDecomposition``

Core mathematical idea
----------------------

KSG estimators build on the Kozachenko-Leonenko nearest-neighbor entropy
estimator. For each sample:

1. find the distance :math:`\varepsilon_i` to the :math:`k`-th nearest
   neighbor in a joint space,
2. project the same radius into lower-dimensional marginal spaces,
3. count how many samples fall inside the projected neighborhoods,
4. combine those counts with digamma functions to estimate entropy differences.

For mutual information, the classic KSG form is

.. math::

   \hat{I}(X;Y)
   = \psi(k) + \psi(N)
     - \frac{1}{N}\sum_{i=1}^{N}\left[\psi(n_x(i)) + \psi(n_y(i))\right],

where :math:`n_x(i)` and :math:`n_y(i)` are projected neighbor counts.

For transfer entropy, the same principle is applied to embedded state spaces:

.. math::

   TE_{X \to Y}
   = H(Y_t \mid Y_t^-)
   - H(Y_t \mid Y_t^-, X_t^-).

In practice, ``xyz`` estimates this from neighborhood counts in

- :math:`(Y_t, Y_t^-)`,
- :math:`(Y_t, Y_t^-, X_t^-)`,
- and the corresponding reduced conditioning spaces.

Why use KSG
-----------

- It is nonparametric and can capture nonlinear dependence missed by Gaussian
  estimators.
- It preserves the information-theoretic interpretation directly in continuous
  spaces.
- It is the most natural choice when you want scientific comparability with
  ITS/TRENTOOL-style continuous TE.

When to prefer KSG
------------------

Use KSG when:

- you suspect nonlinear coupling,
- you have enough samples to support neighborhood estimation,
- and you care more about flexible dependence detection than about speed.

Typical use cases
-----------------

- Neuroscience: directed functional connectivity in nonlinear neural signals.
- Finance: nonlinear information flow between assets, factors, or volatility
  states.
- Dynamical systems: causal coupling between chaotic or weakly nonlinear
  processes.

How to use the KSG estimators
-----------------------------

.. code-block:: python

   import numpy as np
   from xyz import (
       KSGEntropy,
       KSGMutualInformation,
       KSGPartialTransferEntropy,
       KSGTransferEntropy,
   )

   X = np.random.randn(1000, 1)
   Y = X + 0.2 * np.random.randn(1000, 1)
   mi = KSGMutualInformation(k=3).fit(X, Y).score()

   data = np.random.randn(1500, 3)
   te = KSGTransferEntropy(
       driver_indices=[0],
       target_indices=[1],
       lags=1,
       tau=1,
       delay=1,
       k=3,
       metric="chebyshev",
   ).fit(data)

   pte = KSGPartialTransferEntropy(
       driver_indices=[0],
       target_indices=[1],
       conditioning_indices=[2],
       lags=1,
       k=3,
   ).fit(data)

   print(mi)
   print(te.transfer_entropy_, te.conditional_entropy_)
   print(pte.transfer_entropy_)

Parameter guidance
------------------

- ``k``:
  smaller values reduce smoothing bias but increase variance; typical values
  are 3 to 10.
- ``metric``:
  use ``"chebyshev"`` for ITS-style comparability.
- ``lags`` and ``tau``:
  larger embeddings can represent richer dynamics, but quickly increase the
  dimensionality burden.
- ``delay``:
  use this explicitly when scanning interaction delays.

Important caveats
-----------------

- Finite-sample estimates can be slightly negative.
- Performance deteriorates as the effective embedding dimension grows.
- Results can be sensitive to repeated samples, ties, and insufficient
  neighborhood support.

ITS parity note
---------------

For the TE/PTE/SE parity tests in this project, self-neighbors are excluded
from projected counts, matching the ITS behavior of
``range_search(..., past=0)``.

Interactive example
-------------------

The plot below shows KSG transfer entropy as a function of the neighborhood
size :math:`k` for a nonlinear driver-target system. The point is not that a
single :math:`k` is universally best, but that useful estimates should remain
qualitatively stable across a sensible range.

.. plotly-exec::

   import numpy as np
   import plotly.graph_objects as go
   from xyz import KSGTransferEntropy

   rng = np.random.default_rng(12)
   n = 700
   driver = rng.normal(size=n)
   target = np.zeros(n)
   for t in range(1, n):
       target[t] = 0.35 * target[t - 1] + 0.25 * np.tanh(1.5 * driver[t - 1]) + 0.10 * rng.normal()

   data = np.column_stack([target, driver])
   ks = [2, 3, 4, 5, 6, 8, 10]
   te_vals = []
   for k in ks:
       est = KSGTransferEntropy(
           driver_indices=[1],
           target_indices=[0],
           lags=1,
           k=k,
           metric="chebyshev",
       ).fit(data)
       te_vals.append(est.transfer_entropy_)

   fig = go.Figure()
   fig.add_trace(
       go.Scatter(
           x=ks,
           y=te_vals,
           mode="lines+markers",
           name="KSG TE",
           line=dict(width=3),
       )
   )
   fig.update_layout(
       title="KSG transfer entropy sensitivity to neighborhood size",
       xaxis_title="k nearest neighbors",
       yaxis_title="Transfer entropy (nats)",
       template="plotly_white",
       height=420,
       margin=dict(l=40, r=20, t=60, b=40),
   )