Modules#

xyz._continuous#

class xyz._continuous.DirectKSGConditionalMutualInformation(k: int = 3, metric: str = 'chebyshev')[source]#

Bases: MVKSGInfoTheoryEstimator

Direct kNN conditional mutual information estimator.

score_attr_: str | None = 'conditional_mutual_information_'#
fit(X, y, Z)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – First variable.

  • y (array-like) – Second variable.

  • Z (array-like) – Conditioning variable(s).

Returns:

Fitted estimator.

Return type:

self

set_fit_request(*, Z: bool | None | str = '$UNCHANGED$') DirectKSGConditionalMutualInformation#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

class xyz._continuous.GaussianCopulaConditionalMutualInformation[source]#

Bases: MVInfoTheoryEstimator

Conditional mutual information after a Gaussian-copula transform.

score_attr_: str | None = 'conditional_mutual_information_'#
fit(X, y, Z)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – First variable.

  • y (array-like) – Second variable.

  • Z (array-like) – Conditioning variable(s).

Returns:

Fitted estimator.

Return type:

self

set_fit_request(*, Z: bool | None | str = '$UNCHANGED$') GaussianCopulaConditionalMutualInformation#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

class xyz._continuous.GaussianCopulaMutualInformation[source]#

Bases: MVInfoTheoryEstimator

Mutual information after a Gaussian-copula marginal transform.

Ranks each variable to uniform scores, then applies the inverse Gaussian CDF; MI is computed on the transformed data (nonparametric in the marginals, Gaussian in the dependence structure).
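
The copula step itself can be sketched with NumPy and SciPy (an illustrative sketch of the transform described above, not the package's internal code; the helper name gaussian_copula_transform is hypothetical, and scipy.special.ndtri is the inverse standard-normal CDF):

```python
import numpy as np
from scipy.special import ndtri  # inverse standard-normal CDF

def gaussian_copula_transform(X):
    """Map each column: values -> ranks -> (0, 1) uniforms -> normal scores."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0) + 1  # 1..n per column
    u = ranks / (n + 1.0)                          # strictly inside (0, 1)
    return ndtri(u)

rng = np.random.default_rng(0)
X = rng.lognormal(size=(1000, 2))   # heavy-tailed marginals
G = gaussian_copula_transform(X)
print(G.mean(axis=0), G.std(axis=0))  # approximately 0 and 1 per column
```

After the transform, a Gaussian estimator applied to G sees only the dependence structure; the marginal shapes no longer enter the estimate.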

Parameters:

None

mutual_information_#

Fitted MI in nats (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import GaussianCopulaMutualInformation
>>> rng = np.random.default_rng(404)
>>> x = rng.normal(size=(500, 1))
>>> y = 0.6 * x + 0.3 * rng.normal(size=(500, 1))
>>> est = GaussianCopulaMutualInformation().fit(x, y)
>>> est.mutual_information_ > 0
True
score_attr_: str | None = 'mutual_information_'#
fit(X, y)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like) – Second variable.

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.GaussianCopulaTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, extra_conditioning: str | None = None)[source]#

Bases: _GaussianTEBase

Transfer entropy after a Gaussian-copula marginal transform.

score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.GaussianPartialTransferEntropy(driver_indices, target_indices, conditioning_indices, lags: int = 1, tau: int = 1, delay: int = 1, extra_conditioning: str | None = None)[source]#

Bases: _GaussianTEBase

Linear-Gaussian partial transfer entropy estimator.

score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.GaussianSelfEntropy(target_indices, lags: int = 1, tau: int = 1)[source]#

Bases: _GaussianTEBase

Linear-Gaussian information storage estimator.

score_attr_: str | None = 'self_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.GaussianTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, extra_conditioning: str | None = None)[source]#

Bases: _GaussianTEBase

Linear-Gaussian bivariate transfer entropy estimator.

score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.KSGEntropy(k: int = 3, metric: str = 'chebyshev')[source]#

Bases: InfoTheoryMixin, InfoTheoryEstimator

Kozachenko–Leonenko k-NN differential entropy estimator.

Estimates \(H(X)\) from the distance to the k-th nearest neighbor and the log-volume of the unit ball for the chosen metric.
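
The estimator can be sketched in a few lines with SciPy's k-d tree and digamma function (a minimal illustration of the formula just described under the Chebyshev metric; kl_entropy is a hypothetical helper, not this class's implementation):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(X, k=3):
    """Kozachenko-Leonenko differential entropy (nats), Chebyshev metric."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # distance to the k-th neighbour; the query returns the point itself first
    r = cKDTree(X).query(X, k=k + 1, p=np.inf)[0][:, -1]
    log_ball = d * np.log(2.0)  # log-volume of the unit max-norm ball
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(r))

rng = np.random.default_rng(42)
X = rng.normal(0.0, 2.0, size=(10_000, 1))
h = kl_entropy(X, k=3)
theoretical = 0.5 * np.log(2 * np.pi * np.e * 4.0)  # N(0, 4) closed form
print(h, theoretical)
```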

Parameters:
  • k (int, optional) – Number of neighbors. Default is 3.

  • metric (str, optional) – Distance metric ("chebyshev" or "euclidean"). Default is "chebyshev".

entropy_#

Fitted differential entropy in nats (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import KSGEntropy
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(0, 2, (10000, 1))
>>> est = KSGEntropy(k=3).fit(X)
>>> theoretical = 0.5 * np.log(2 * np.pi * np.e * 4)
>>> abs(est.entropy_ - theoretical) < 0.1
True
score_attr_: str | None = 'entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.KSGMutualInformation(k: int = 3, algorithm: int = 1, metric: str = 'chebyshev')[source]#

Bases: InfoTheoryMixin, InfoTheoryEstimator

Kraskov–Stögbauer–Grassberger (KSG) k-NN mutual information estimator.

Estimates \(I(X;Y)\) from k-nearest neighbor distances in the joint and marginal spaces. algorithm=1 (default) uses the stricter radius; algorithm=2 uses the larger radius (often more stable).
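
Algorithm 1 can be sketched directly from its definition: take the Chebyshev distance to the k-th neighbor in the joint space, count marginal neighbors strictly inside that radius, and combine with digamma terms. This is a hedged sketch (the helper ksg_mi_1 is hypothetical, and the small radius offset is a crude way to enforce the strict inequality), not the package's implementation:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi_1(x, y, k=3):
    """KSG algorithm-1 MI (nats), Chebyshev metric throughout."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # radius: distance to the k-th joint-space neighbour
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # marginal neighbour counts strictly inside that radius (self excluded)
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

rng = np.random.default_rng(42)
x = rng.normal(size=(1000, 1))
y_ind = rng.normal(size=(1000, 1))
y_dep = 0.7 * x + 0.3 * rng.normal(size=(1000, 1))
mi_ind, mi_dep = ksg_mi_1(x, y_ind), ksg_mi_1(x, y_dep)
print(mi_ind, mi_dep)
```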

Parameters:
  • k (int, optional) – Number of neighbors. Default is 3.

  • algorithm ({1, 2}, optional) – KSG variant. Default is 1.

  • metric (str, optional) – Distance metric ("chebyshev" or "euclidean"). Default is "chebyshev".

mutual_information_#

Fitted MI in nats (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import KSGMutualInformation
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(1000, 1))
>>> y = rng.normal(size=(1000, 1))
>>> mi_ind = KSGMutualInformation(k=3).fit(X, y).score()
>>> y_corr = 0.7 * X + 0.3 * rng.normal(size=(1000, 1))
>>> mi_corr = KSGMutualInformation(k=3).fit(X, y_corr).score()
>>> mi_corr > mi_ind
True
score_attr_: str | None = 'mutual_information_'#
fit(X, y)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like) – Second variable.

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.KSGPartialTransferEntropy(driver_indices, target_indices, conditioning_indices, lags: int = 1, tau: int = 1, delay: int = 1, k: int = 3, metric: str = 'chebyshev', extra_conditioning: str | None = None)[source]#

Bases: _KSGTEBase

KSG partial transfer entropy estimator.

score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.KSGSelfEntropy(target_indices, lags: int = 1, tau: int = 1, k: int = 3, metric: str = 'chebyshev')[source]#

Bases: _KSGTEBase

KSG information storage / self-entropy estimator.

score_attr_: str | None = 'self_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.KSGTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, k: int = 3, metric: str = 'chebyshev', extra_conditioning: str | None = None)[source]#

Bases: _KSGTEBase

KSG (k-NN) bivariate transfer entropy estimator.

Estimates \(TE_{X \to Y}\) from time-series data using k-nearest neighbor conditional entropy. Expects array X with shape (n_trials, n_samples, n_features) or 2D equivalent; driver and target are column indices.

Parameters:
  • driver_indices (array-like) – Column index(es) of the driver variable(s).

  • target_indices (array-like) – Column index(es) of the target variable(s).

  • lags (int, optional) – Number of past lags for embedding. Default is 1.

  • tau (int, optional) – Lag step (samples). Default is 1.

  • delay (int, optional) – Delay from driver to target. Default is 1.

  • k (int, optional) – Number of neighbors. Default is 3.

  • metric (str, optional) – Distance metric. Default is "chebyshev".

  • extra_conditioning (str or None, optional) – Optional extra conditioning (e.g. "Faes_Method"). Default is None.

transfer_entropy_#

Fitted TE in nats (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import KSGTransferEntropy
>>> rng = np.random.default_rng(42)
>>> trials = []
>>> for _ in range(4):
...     driver = rng.normal(size=180)
...     target = np.zeros(180)
...     for t in range(1, 180):
...         target[t] = 0.4 * target[t-1] + 0.5 * driver[t-1] + 0.1 * rng.normal()
...     trials.append(np.column_stack([target, driver]))
>>> X = np.stack(trials)
>>> est = KSGTransferEntropy(driver_indices=[1], target_indices=[0], lags=1, k=3).fit(X)
>>> est.transfer_entropy_ > 0
True
score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.KernelPartialTransferEntropy(driver_indices, target_indices, conditioning_indices, lags: int = 1, tau: int = 1, delay: int = 1, r: float = 0.5, metric: str = 'chebyshev', extra_conditioning: str | None = None)[source]#

Bases: _KernelTEBase

Kernel partial transfer entropy estimator.

score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.KernelSelfEntropy(target_indices, lags: int = 1, tau: int = 1, r: float = 0.5, metric: str = 'chebyshev')[source]#

Bases: _KernelTEBase

Kernel information storage / self-entropy estimator.

score_attr_: str | None = 'self_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.KernelTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, r: float = 0.5, metric: str = 'chebyshev', extra_conditioning: str | None = None)[source]#

Bases: _KernelTEBase

Kernel bivariate transfer entropy estimator.

score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.MVCondEntropy[source]#

Bases: MVInfoTheoryEstimator

Conditional entropy \(H(Y|X)\) under a linear-Gaussian model.

Fits \(Y \approx X\beta\) and uses the residual covariance to compute \(H(Y|X)\) as the entropy of the residual.
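
The idea can be sketched with an ordinary least-squares fit and the Gaussian entropy of the residual covariance (an illustrative sketch under the stated linear-Gaussian assumption; gaussian_cond_entropy is a hypothetical helper, not this class's code):

```python
import numpy as np

def gaussian_cond_entropy(X, Y):
    """H(Y|X) in nats from the residual covariance of a least-squares fit."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    Xc = np.column_stack([np.ones(len(X)), X])      # intercept + predictors
    beta, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
    resid = Y - Xc @ beta
    C = np.atleast_2d(np.cov(resid, rowvar=False))
    m = Y.shape[1]
    return 0.5 * np.linalg.slogdet(C)[1] + 0.5 * m * np.log(2 * np.pi * np.e)

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
Y = X[:, :1] + 0.3 * rng.normal(size=(500, 1))
h = gaussian_cond_entropy(X, Y)
closed_form = 0.5 * np.log(2 * np.pi * np.e * 0.09)  # residual ~ N(0, 0.09)
print(h, closed_form)
```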

Parameters:

None

conditional_entropy_#

Fitted \(H(Y|X)\) in nats (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import MVCondEntropy
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(500, 2))
>>> y = X[:, :1] + 0.3 * rng.normal(size=(500, 1))
>>> est = MVCondEntropy().fit(X, y)
>>> np.isfinite(est.conditional_entropy_)
True
score_attr_: str | None = 'conditional_entropy_'#
fit(X, y)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like) – Target variable(s) whose conditional entropy is estimated.

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.MVExponentialEntropy[source]#

Bases: MVInfoTheoryEstimator

Placeholder for a future multivariate exponential entropy estimator.

fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

score(X=None, y=None)[source]#

Return the estimator’s primary fitted scalar quantity.

If additional positional/keyword arguments are passed, refits the estimator on that data and then returns the score.

class xyz._continuous.MVKSGCondEntropy(k: int = 3, metric: str = 'chebyshev')[source]#

Bases: MVKSGInfoTheoryEstimator

Multivariate conditional entropy \(H(Y|X)\) via KSG \(H(X,Y) - H(X)\).

score_attr_: str | None = 'conditional_entropy_'#
fit(X, y)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like) – Target variable(s) whose conditional entropy is estimated.

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.MVKSGCondMutualInformation(k: int = 3, metric: str = 'chebyshev')[source]#

Bases: MVKSGInfoTheoryEstimator

Conditional mutual information I(X;Y|Z) via KSG identities.

score_attr_: str | None = 'conditional_mutual_information_'#
fit(X, y, Z)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – First variable.

  • y (array-like) – Second variable.

  • Z (array-like) – Conditioning variable(s).

Returns:

Fitted estimator.

Return type:

self

set_fit_request(*, Z: bool | None | str = '$UNCHANGED$') MVKSGCondMutualInformation#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

class xyz._continuous.MVKSGPartialInformationDecomposition(k: int = 3, metric: str = 'chebyshev')[source]#

Bases: MVKSGInfoTheoryEstimator

Partial information decomposition (PID) of two sources onto target via KSG MI.

Decomposes \(I(X_1,X_2; Y)\) into unique, redundant, and synergistic terms.

fit(X1, X2, y)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X1 (array-like) – First source variable.

  • X2 (array-like) – Second source variable.

  • y (array-like) – Target variable.

Returns:

Fitted estimator.

Return type:

self

score(X1=None, X2=None, y=None)[source]#

Return the estimator’s primary fitted scalar quantity.

If additional positional/keyword arguments are passed, refits the estimator on that data and then returns the score.

set_fit_request(*, X1: bool | None | str = '$UNCHANGED$', X2: bool | None | str = '$UNCHANGED$') MVKSGPartialInformationDecomposition#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X1 (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X1 parameter in fit.

  • X2 (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X2 parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, X1: bool | None | str = '$UNCHANGED$', X2: bool | None | str = '$UNCHANGED$') MVKSGPartialInformationDecomposition#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X1 (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X1 parameter in score.

  • X2 (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X2 parameter in score.

Returns:

self – The updated object.

Return type:

object

class xyz._continuous.MVKSGTransferEntropy(k: int = 3, metric: str = 'chebyshev', lag: int = 1)[source]#

Bases: MVKSGInfoTheoryEstimator

Multivariate transfer entropy as \(I(X_{t-\tau}; Y_t | Y_{t-\tau})\) (single lag).

score_attr_: str | None = 'transfer_entropy_'#
fit(X, y)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like) – Target time series.

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.MVLNEntropy[source]#

Bases: MVInfoTheoryEstimator

Differential entropy for log-normal observations.

Transforms \(X \mapsto \log X\) and uses Gaussian entropy plus \(\mathbb{E}[\log X]\) correction. Data must be strictly positive.
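
The correction follows from the change of variables \(H(X) = H(\log X) + \mathbb{E}[\sum_i \log X_i]\), which can be checked against the 1-D log-normal closed form. A hedged sketch (lognormal_entropy is a hypothetical helper, not this class's code):

```python
import numpy as np

def lognormal_entropy(X):
    """Entropy (nats) of positive data under a log-normal model:
    H(X) = H(log X) + E[sum_i log X_i]."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("all observations must be strictly positive")
    G = np.log(X)
    m = G.shape[1]
    C = np.atleast_2d(np.cov(G, rowvar=False))
    gauss = 0.5 * np.linalg.slogdet(C)[1] + 0.5 * m * np.log(2 * np.pi * np.e)
    return gauss + G.mean(axis=0).sum()

rng = np.random.default_rng(42)
X = rng.lognormal(mean=0.5, sigma=0.8, size=(20_000, 1))
h = lognormal_entropy(X)
closed_form = 0.5 + 0.5 * np.log(2 * np.pi * np.e * 0.8**2)  # 1-D log-normal
print(h, closed_form)
```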

Parameters:

None

entropy_#

Fitted entropy in nats (after fit()).

Type:

float

Raises:

ValueError – If any observation is non-positive.

score_attr_: str | None = 'entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.MVNEntropy[source]#

Bases: MVInfoTheoryEstimator

Differential entropy under a multivariate Gaussian assumption.

For covariance \(C\), \(H = \frac{1}{2}\log\det(C) + \frac{M}{2}\log(2\pi e)\).

Parameters:

None

entropy_#

Fitted differential entropy in nats (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import MVNEntropy
>>> rng = np.random.default_rng(42)
>>> A = rng.normal(size=(500, 3))
>>> est = MVNEntropy().fit(A)
>>> np.isfinite(est.score())
True
score_attr_: str | None = 'entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.MVNMutualInformation[source]#

Bases: MVInfoTheoryEstimator

Mutual information under a multivariate Gaussian assumption.

\(I(X;Y) = H(Y) - H(Y|X)\) with Gaussian entropy and conditional entropy.
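
Equivalently \(I(X;Y) = H(X) + H(Y) - H(X,Y)\), so under the Gaussian model the \(2\pi e\) terms cancel and only log-determinants of covariances remain. A hedged numerical sketch of that identity (gaussian_mi is a hypothetical helper, not this class's code), checked against the bivariate closed form \(-\frac{1}{2}\ln(1-\rho^2)\):

```python
import numpy as np

def gaussian_mi(X, Y):
    """I(X;Y) in nats under a joint-Gaussian model:
    I = 0.5 * (logdet C_X + logdet C_Y - logdet C_[X,Y])."""
    def logdet(A):
        return np.linalg.slogdet(np.atleast_2d(np.cov(A, rowvar=False)))[1]
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return 0.5 * (logdet(X) + logdet(Y) - logdet(np.hstack([X, Y])))

rng = np.random.default_rng(42)
x = rng.normal(size=(50_000, 1))
y = x + rng.normal(size=(50_000, 1))     # correlation rho = 1/sqrt(2)
mi = gaussian_mi(x, y)
closed_form = -0.5 * np.log(1 - 0.5)     # -0.5 * ln(1 - rho^2)
print(mi, closed_form)
```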

Parameters:

None

mutual_information_#

Fitted \(I(X;Y)\) in nats (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import MVNMutualInformation
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(500, 2))
>>> y = X[:, :1] + 0.3 * rng.normal(size=(500, 1))
>>> est = MVNMutualInformation().fit(X, y)
>>> np.isfinite(est.mutual_information_)
True
score_attr_: str | None = 'mutual_information_'#
fit(X, y)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like) – Second variable.

Returns:

Fitted estimator.

Return type:

self

class xyz._continuous.MVParetoEntropy[source]#

Bases: MVInfoTheoryEstimator

Placeholder for a future multivariate Pareto entropy estimator.

fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

score(X=None, y=None)[source]#

Return the estimator’s primary fitted scalar quantity.

If additional positional/keyword arguments are passed, refits the estimator on that data and then returns the score.

xyz._discrete#

class xyz._discrete.DiscreteInfoTheoryEstimator[source]#

Bases: InfoTheoryMixin, InfoTheoryEstimator, ABC

Base class for discrete (binned) information-theoretic estimators.

Subclasses operate on discretized time series and use histogram-based entropy and conditional entropy to compute transfer entropy, partial transfer entropy, or self-entropy.

class xyz._discrete.DiscretePartialTransferEntropy(driver_indices, target_indices, conditioning_indices, lags: int = 1, tau: int = 1, delay: int = 1, c: int = 8, quantize: bool = True, extra_conditioning: str | None = None)[source]#

Bases: DiscreteInfoTheoryEstimator

Discrete partial transfer entropy (conditioning on side variables).

Estimates PTE from driver to target conditioned on conditioning_indices using binned entropy, i.e. the information transfer from driver to target that is not explained by the conditioning variable(s).

Parameters:
  • driver_indices (array-like) – Column index(es) of the driver.

  • target_indices (array-like) – Column index(es) of the target.

  • conditioning_indices (array-like) – Column index(es) to condition on.

  • lags (int, optional) – Number of past lags. Default is 1.

  • tau (int, optional) – Lag step. Default is 1.

  • delay (int, optional) – Delay from driver to target. Default is 1.

  • c (int, optional) – Number of bins. Default is 8.

  • quantize (bool, optional) – If True, bin continuous data. Default is True.

  • extra_conditioning (str or None, optional) – Optional extra conditioning. Default is None.

transfer_entropy_#

Fitted partial transfer entropy (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import DiscretePartialTransferEntropy
>>> rng = np.random.default_rng(42)
>>> X = np.column_stack([rng.random(200), rng.random(200), rng.random(200)])  # target, driver, conditioning
>>> est = DiscretePartialTransferEntropy(
...     driver_indices=[1], target_indices=[0], conditioning_indices=[2],
...     lags=1, c=6, quantize=True,
... )
>>> est = est.fit(X)
>>> np.isfinite(est.transfer_entropy_)
True
score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._discrete.DiscreteSelfEntropy(target_indices, lags: int = 1, tau: int = 1, c: int = 8, quantize: bool = True)[source]#

Bases: DiscreteInfoTheoryEstimator

Discrete information storage (self-entropy).

Estimates \(S_Y = H(Y_t) - H(Y_t | Y_{t-1:t-l})\), the information in the target’s past about its present, using binned entropy.
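
For lags=1 this reduces to the mutual information between the binned series and its own lag, \(I(Y_t; Y_{t-1})\), which is short enough to sketch in full (an illustrative sketch with a hypothetical helper discrete_self_entropy, not this class's implementation; quantile binning stands in for the package's binning scheme):

```python
import numpy as np

def discrete_self_entropy(y, c=6):
    """Information storage I(Y_t; Y_{t-1}) in nats after quantile binning (lags=1)."""
    y = np.asarray(y, dtype=float).ravel()
    edges = np.quantile(y, np.linspace(0, 1, c + 1))[1:-1]
    s = np.digitize(y, edges)               # symbols 0..c-1
    joint = np.zeros((c, c))
    np.add.at(joint, (s[:-1], s[1:]), 1.0)  # (past, present) counts
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])))

rng = np.random.default_rng(42)
noise = rng.normal(size=2_000)
ar = np.zeros(2_000)
for t in range(1, 2_000):            # AR(1): the past predicts the present
    ar[t] = 0.9 * ar[t - 1] + 0.1 * noise[t]
s_ar, s_noise = discrete_self_entropy(ar), discrete_self_entropy(noise)
print(s_ar, s_noise)
```

A strongly autocorrelated series stores far more information than white noise, which should sit near zero up to finite-sample bias.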

Parameters:
  • target_indices (array-like) – Column index(es) of the target variable.

  • lags (int, optional) – Number of past lags. Default is 1.

  • tau (int, optional) – Lag step. Default is 1.

  • c (int, optional) – Number of bins. Default is 8.

  • quantize (bool, optional) – If True, bin continuous data. Default is True.

self_entropy_#

Fitted self-entropy (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import DiscreteSelfEntropy
>>> rng = np.random.default_rng(42)
>>> X = rng.random((200, 1))  # single series
>>> est = DiscreteSelfEntropy(target_indices=[0], lags=2, c=6, quantize=True)
>>> est = est.fit(X)
>>> np.isfinite(est.self_entropy_)
True
score_attr_: str | None = 'self_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

class xyz._discrete.DiscreteTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, c: int = 8, quantize: bool = True, extra_conditioning: str | None = None)[source]#

Bases: DiscreteInfoTheoryEstimator

Discrete (binned) bivariate transfer entropy.

Estimates \(TE_{X \to Y} = H(Y_t | Y_{t-1:t-l}) - H(Y_t | Y_{t-1:t-l}, X_{t-d:t-d-l})\) using histogram-based entropy after binning into c bins.
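
The two conditional entropies expand into four joint entropies over binned symbols, which makes a compact sketch possible (illustrative only, for lags=1 with quantile binning; the helpers _entropy and discrete_te are hypothetical, not the class's implementation):

```python
import numpy as np

def _entropy(*symbols):
    """Joint Shannon entropy (nats) of aligned discrete symbol arrays."""
    _, counts = np.unique(np.stack(symbols, axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def discrete_te(driver, target, c=6):
    """Binned TE_{X->Y} for lags=1: H(Yt | Yp) - H(Yt | Yp, Xp)."""
    def binned(v):
        v = np.asarray(v, dtype=float).ravel()
        return np.digitize(v, np.quantile(v, np.linspace(0, 1, c + 1))[1:-1])
    x, y = binned(driver), binned(target)
    yt, yp, xp = y[1:], y[:-1], x[:-1]
    # expand the conditional entropies into four joint entropies
    return _entropy(yt, yp) - _entropy(yp) - _entropy(yt, yp, xp) + _entropy(yp, xp)

rng = np.random.default_rng(42)
driver = rng.normal(size=3_000)
target = np.zeros(3_000)
for t in range(1, 3_000):            # the driver's past feeds the target
    target[t] = 0.3 * target[t - 1] + 0.8 * driver[t - 1] + 0.2 * rng.normal()
te_coupled = discrete_te(driver, target)
te_null = discrete_te(rng.normal(size=3_000), target)
print(te_coupled, te_null)
```

TE from the true driver should clearly exceed TE from an unrelated series, which stays near zero up to finite-sample bias.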

Parameters:
  • driver_indices (array-like) – Column index(es) of the driver variable(s).

  • target_indices (array-like) – Column index(es) of the target variable(s).

  • lags (int, optional) – Number of past lags. Default is 1.

  • tau (int, optional) – Lag step (samples). Default is 1.

  • delay (int, optional) – Delay from driver to target. Default is 1.

  • c (int, optional) – Number of bins for discretization. Default is 8.

  • quantize (bool, optional) – If True, bin continuous data; if False, assume input is already discrete. Default is True.

  • extra_conditioning (str or None, optional) – Optional extra conditioning (e.g. Faes method). Default is None.

transfer_entropy_#

Fitted transfer entropy estimate (after fit()).

Type:

float

Examples

>>> import numpy as np
>>> from xyz import DiscreteTransferEntropy
>>> rng = np.random.default_rng(42)
>>> X = np.column_stack([rng.random(200), rng.random(200)])  # (n_samples, 2): target, driver
>>> est = DiscreteTransferEntropy(driver_indices=[1], target_indices=[0], lags=1, c=6, quantize=True)
>>> est.fit(X)
>>> np.isfinite(est.transfer_entropy_)
True
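As a cross-check, the definition above reduces to a difference of plug-in joint entropies. A minimal numpy sketch for lags=1, delay=1 (illustrative only; the function names here are not part of the package API):

```python
import numpy as np

def plugin_entropy(states):
    """Shannon entropy (nats) of the rows of a 2D symbol array."""
    _, counts = np.unique(states, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def discrete_te(x, y, c=6):
    """Plug-in TE x -> y with one lag:
    [H(Y_t, Y_{t-1}) - H(Y_{t-1})] - [H(Y_t, Y_{t-1}, X_{t-1}) - H(Y_{t-1}, X_{t-1})]."""
    # Equal-width binning into c symbols per variable.
    bx = np.digitize(x, np.linspace(x.min(), x.max(), c + 1)[1:-1])
    by = np.digitize(y, np.linspace(y.min(), y.max(), c + 1)[1:-1])
    yt, yp, xp = by[1:], by[:-1], bx[:-1]  # present target, target past, driver past
    return (plugin_entropy(np.column_stack([yt, yp]))
            - plugin_entropy(yp[:, None])
            - plugin_entropy(np.column_stack([yt, yp, xp]))
            + plugin_entropy(np.column_stack([yp, xp])))

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.roll(x, 1) + 0.1 * rng.normal(size=2000)  # y strongly driven by x at delay 1
te = discrete_te(x, y)  # clearly positive for this coupled pair
```

Note that plug-in estimates like this carry an upward bias that grows with the joint state space (c to the power of the embedding size), which is why small c values are used in the examples.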
score_attr_: str | None = 'transfer_entropy_'#
fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

xyz.univariate#

xyz.univariate.entropy_linear(A: ndarray) float[source]#

Linear Gaussian estimate of differential (Shannon) entropy.

Assumes the data are multivariate Gaussian. For covariance \(C\), the differential entropy in nats is:

\[H = \frac{1}{2} \log \det(C) + \frac{M}{2} \log(2\pi e)\]

where \(M\) is the number of variables.

Parameters:

A (np.ndarray) – Multivariate data, shape (n_samples, n_features) (N×M).

Returns:

Estimated differential entropy in nats.

Return type:

float

Examples

>>> import numpy as np
>>> from xyz.univariate import entropy_linear
>>> rng = np.random.default_rng(42)
>>> A = rng.normal(size=(500, 3))
>>> h = entropy_linear(A)
>>> np.isfinite(h)
True
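The closed form above is easy to verify numerically; a short sketch (not the package implementation):

```python
import numpy as np

def gaussian_entropy(A):
    """Differential entropy (nats) of A under a multivariate Gaussian assumption:
    H = 0.5 * log det(C) + 0.5 * M * log(2 * pi * e)."""
    C = np.atleast_2d(np.cov(A, rowvar=False))
    M = C.shape[0]
    _, logdet = np.linalg.slogdet(C)  # stable log-determinant
    return 0.5 * logdet + 0.5 * M * np.log(2 * np.pi * np.e)

rng = np.random.default_rng(0)
h = gaussian_entropy(rng.normal(size=(20000, 3)))
# For a 3D standard normal the exact value is 1.5 * log(2*pi*e), roughly 4.257
```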
xyz.univariate.entropy_kernel(Y: ndarray, r: float, metric: str = 'chebyshev') float[source]#

Kernel (step-kernel) estimate of differential entropy.

Estimates the density at each sample from the fraction of other samples within radius \(r\) (a step kernel) and returns the mean negative log-density. By default uses the Chebyshev (max-norm) distance.

Parameters:
  • Y (np.ndarray) – Data, shape (n_samples, n_features).

  • r (float) – Radius for the step kernel.

  • metric (str, optional) – Distance metric for pairwise distances (e.g. "chebyshev" or "euclidean"). Default is "chebyshev".

Returns:

Estimated differential entropy in nats.

Return type:

float

Examples

>>> import numpy as np
>>> from xyz.univariate import entropy_kernel
>>> rng = np.random.default_rng(42)
>>> Y = rng.normal(size=(500, 2))
>>> h = entropy_kernel(Y, 0.1)
>>> np.isfinite(h)
True
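The step-kernel estimate can be sketched directly from pairwise distances (illustrative; the handling of isolated points below is an assumption, not necessarily what this function does):

```python
import numpy as np

def kernel_entropy_sketch(Y, r):
    """Entropy from a step-kernel density estimate with Chebyshev distance:
    p(y_i) ~ (neighbours within r) / ((N - 1) * (2r)^d)."""
    Y = np.atleast_2d(Y)
    n, d = Y.shape
    dist = np.abs(Y[:, None, :] - Y[None, :, :]).max(axis=-1)  # pairwise Chebyshev
    counts = (dist <= r).sum(axis=1) - 1   # exclude the point itself
    counts = np.maximum(counts, 1)         # avoid log(0) at isolated points
    p = counts / ((n - 1) * (2 * r) ** d)
    return -np.mean(np.log(p))

rng = np.random.default_rng(42)
h = kernel_entropy_sketch(rng.normal(size=(500, 2)), 0.5)
# True entropy of a 2D standard normal is log(2*pi*e), roughly 2.84
```

The choice of r trades bias for variance: very small radii leave many points without neighbours, while large radii oversmooth the density.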
xyz.univariate.entropy_binning(Y, c, quantize, log_base: str = 'nat')[source]#

Binning (histogram) estimate of Shannon entropy.

Discretizes each column into c bins and computes entropy from the empirical joint distribution. If quantize is True, continuous data are binned into equal-width bins; otherwise Y is assumed to be already quantized.

Parameters:
  • Y (array-like) – Data matrix, shape (n_samples, n_features).

  • c (int) – Number of bins per dimension.

  • quantize (bool) – If True, bin continuous values with xyz.utils.quantize(). If False, treat Y as already quantized (integer bin labels).

  • log_base (str, optional) – Logarithm base; currently only "nat" is supported.

Returns:

Estimated Shannon entropy in nats.

Return type:

float

Examples

Used internally by discrete estimators; for continuous entropy prefer xyz.KSGEntropy or xyz.MVNEntropy.
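The plug-in computation can be sketched with a joint histogram (illustrative; not the package source):

```python
import numpy as np

def binned_entropy(Y, c):
    """Plug-in Shannon entropy (nats) after equal-width binning of each column."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    binned = np.column_stack([
        np.digitize(col, np.linspace(col.min(), col.max(), c + 1)[1:-1])
        for col in Y.T
    ])
    _, counts = np.unique(binned, axis=0, return_counts=True)  # joint cell counts
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
h = binned_entropy(rng.random(10000), 8)  # uniform data: close to log(8), about 2.079
```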

xyz.utils#

xyz.utils.quantize(y, c)[source]#

Bin continuous values into c equal-width bins.

Uses the range of y to define bin edges; returns 1-based bin indices (as from numpy.digitize()).

Parameters:
  • y (array-like) – 1D array of continuous values.

  • c (int) – Number of bins.

Returns:

Integer bin index for each element (1 to c).

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from xyz.utils import quantize
>>> y = np.array([0.1, 0.5, 1.0, 1.5, 2.0])
>>> quantize(y, 3)
array([1, 1, 2, 3, 3])
xyz.utils.cov(X)[source]#

Compute the sample covariance matrix, always as a 2D array.

Parameters:

X (array-like) – Data, shape (n_samples,) or (n_samples, n_features). For 1D input, treated as a single feature.

Returns:

Covariance matrix, shape (n_features, n_features) (at least 2D).

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from xyz.utils import cov
>>> X = np.random.randn(100, 2)
>>> C = cov(X)
>>> C.shape
(2, 2)
xyz.utils.buildvectors(Y, j, V=None)[source]#

Build observation matrix for entropy computation from lagged variables.

First column is the current target \(Y_{\cdot,j}\); subsequent columns are lagged variables specified by V (variable index, lag).

Parameters:
  • Y (np.ndarray) – Multivariate time series, shape (n_samples, n_features) (N×M).

  • j (int) – Column index of the target variable (0-based).

  • V (np.ndarray or None, optional) – Candidate lags, shape (n_candidates, 2). Column 0: variable index (0-based); column 1: lag in samples. If None, returns only the target column.

Returns:

Matrix with current target as first column and lagged variables as subsequent columns. Rows start at index \(L_{max}\) so all lags are valid.

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from xyz.utils import buildvectors
>>> Y = np.random.randn(100, 3)
>>> # Target column 1, with lags: (var 0, lag 1), (var 2, lag 2)
>>> B = buildvectors(Y, 1, np.array([[0, 1], [2, 2]]))
>>> B.shape[1]
3
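The construction can be reproduced with plain indexing (a sketch under the conventions stated above; \(L_{max}\) is the largest lag in V):

```python
import numpy as np

def build_lagged(Y, j, V):
    """First column: current target Y[t, j]; then one column Y[t - lag, var]
    per (var, lag) row of V. Rows start at t = lmax so every lag is valid."""
    lmax = int(V[:, 1].max())
    t = np.arange(lmax, Y.shape[0])
    cols = [Y[t, j]] + [Y[t - lag, var] for var, lag in V]
    return np.column_stack(cols)

Y = np.arange(12.0).reshape(6, 2)   # toy series: Y[t, m] = 2*t + m
V = np.array([[0, 1], [1, 2]])      # (var 0, lag 1), (var 1, lag 2)
B = build_lagged(Y, 0, V)           # shape (4, 3); first row [4., 2., 1.]
```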

xyz.base#

class xyz.base.InfoTheoryEstimator[source]#

Bases: BaseEstimator, ABC

Base class for all information-theoretic estimators in xyz.

Subclasses follow scikit-learn conventions: parameter-only __init__, fit(...) -> self, fitted attributes with a trailing underscore, and a stable score method that returns the primary scalar quantity (e.g. entropy, mutual information, transfer entropy).

score_attr_#

Name of the fitted attribute used as the default score.

Type:

str or None

Examples

Concrete estimators (e.g. xyz.KSGMutualInformation) are fitted on data and expose the estimate via score():

>>> import numpy as np
>>> from xyz import KSGMutualInformation
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(500, 1))
>>> y = 0.7 * X + 0.3 * rng.normal(size=(500, 1))
>>> est = KSGMutualInformation(k=3).fit(X, y)
>>> mi = est.score()
>>> np.isfinite(mi)
True
score_attr_: str | None = None#
class xyz.base.InfoTheoryMixin[source]#

Bases: ABC

Mixin defining the minimal estimator protocol used across the package.

Requires fit(); score() may optionally refit when given data.

abstractmethod fit(X, y=None)[source]#

Fit estimator-specific internal state from data.

Parameters:
  • X (array-like) – Input data (e.g. driver/target time series or observations).

  • y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).

Returns:

Fitted estimator.

Return type:

self

score(*args, **kwargs)[source]#

Return the estimator’s primary fitted scalar quantity.

If additional positional/keyword arguments are passed, refits the estimator on that data and then returns the score.

xyz.preprocessing#

xyz.preprocessing.as_2d_array(X) ndarray[source]#

Coerce input to a 2D array with shape (n_samples, n_features).

Parameters:

X (array-like) – 1D or 2D array.

Returns:

Shape (n_samples, n_features); 1D input becomes (n, 1).

Return type:

np.ndarray

Raises:

ValueError – If X.ndim is not 1 or 2.

Examples

>>> import numpy as np
>>> from xyz.preprocessing import as_2d_array
>>> as_2d_array(np.array([1, 2, 3])).shape
(3, 1)
>>> as_2d_array(np.random.randn(10, 2)).shape
(10, 2)
xyz.preprocessing.as_trial_array(X) ndarray[source]#

Coerce input to trial format (n_trials, n_samples, n_features).

Parameters:

X (array-like) – 1D, 2D, or 3D array. 1D → (1, n, 1); 2D → (1, n_samples, n_features).

Returns:

Shape (n_trials, n_samples, n_features).

Return type:

np.ndarray

Raises:

ValueError – If X.ndim is not 1, 2, or 3.

Examples

>>> import numpy as np
>>> from xyz.preprocessing import as_trial_array
>>> X = np.random.randn(50, 2)
>>> T = as_trial_array(X)
>>> T.shape
(1, 50, 2)
xyz.preprocessing.iter_trials(X)[source]#

Iterate over trials, yielding arrays of shape (n_samples, n_features).

Parameters:

X (array-like) – Data in any format accepted by as_trial_array().

Yields:

np.ndarray – One array per trial.

Examples

>>> import numpy as np
>>> from xyz.preprocessing import iter_trials
>>> X = np.random.randn(2, 30, 2)  # 2 trials
>>> for trial in iter_trials(X):
...     print(trial.shape)
(30, 2)
(30, 2)
xyz.preprocessing.estimate_autocorrelation_decay(x, max_lag: int = 1000, threshold: float | None = None) int[source]#

Estimate autocorrelation decay time (ACT) in samples.

Returns the first positive lag \(\tau\) at which the normalized autocorrelation \(r(\tau) \le\) threshold. Default threshold is \(e^{-1}\), a common proxy for the decay time.

Parameters:
  • x (array-like) – 1D time series.

  • max_lag (int, optional) – Maximum lag to consider. Default is 1000.

  • threshold (float or None, optional) – Stop when autocorrelation falls below this. Default is \(e^{-1}\).

Returns:

Estimated ACT (samples).

Return type:

int

Examples

>>> import numpy as np
>>> from xyz.preprocessing import estimate_autocorrelation_decay
>>> x = np.cumsum(np.random.randn(500))
>>> act = estimate_autocorrelation_decay(x, max_lag=100)
>>> 1 <= act <= 101
True
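The criterion can be sketched in a few lines (illustrative; returning max_lag + 1 for a series that never decays is an assumption consistent with the example above):

```python
import numpy as np

def act_sketch(x, max_lag=100, threshold=np.exp(-1)):
    """First positive lag where the normalized autocorrelation drops to <= threshold."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    for lag in range(1, max_lag + 1):
        r = np.dot(x[:-lag], x[lag:]) / denom  # biased but standard normalization
        if r <= threshold:
            return lag
    return max_lag + 1  # never decayed within max_lag

rng = np.random.default_rng(0)
white = rng.normal(size=1000)  # decorrelates immediately
smooth = np.convolve(white, np.ones(20) / 20, mode="valid")  # ~MA(20): slow decay
```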
xyz.preprocessing.estimate_trial_acts(X, target_index: int, max_lag: int = 1000) ndarray[source]#

Estimate autocorrelation decay time per trial for one target column.

Parameters:
  • X (array-like) – Data in trial format (n_trials, n_samples, n_features).

  • target_index (int) – Column index of the target for ACT estimation.

  • max_lag (int, optional) – Maximum lag for ACT estimation. Default is 1000.

Returns:

One ACT value per trial, shape (n_trials,).

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from xyz.preprocessing import as_trial_array, estimate_trial_acts
>>> X = np.random.randn(3, 200, 2)
>>> acts = estimate_trial_acts(X, target_index=0, max_lag=50)
>>> acts.shape
(3,)
xyz.preprocessing.select_trials_by_act(X, target_index: int, *, max_lag: int = 1000, act_threshold: int | None = None, min_trials: int = 1) tuple[ndarray, ndarray][source]#

Select trials whose autocorrelation decay time is below a threshold.

Parameters:
  • X (array-like) – Data in trial format.

  • target_index (int) – Column index of the target for ACT estimation.

  • max_lag (int, optional) – Max lag for ACT. Default is 1000.

  • act_threshold (int or None, optional) – Keep only trials with ACT <= this. If None, no filtering (all trials returned).

  • min_trials (int, optional) – Minimum number of trials that must remain after filtering. Default is 1.

Returns:

  • selected_trials (np.ndarray) – Subset of trials with ACT <= act_threshold (or all if act_threshold is None).

  • acts (np.ndarray) – ACT value for each original trial.

Raises:

ValueError – If filtering would leave fewer than min_trials.

Examples

>>> import numpy as np
>>> from xyz.preprocessing import as_trial_array, select_trials_by_act
>>> X = np.random.randn(4, 300, 2)
>>> trials, acts = select_trials_by_act(X, 0, act_threshold=50, min_trials=2)
>>> trials.shape[0] <= 4 and len(acts) == 4
True
xyz.preprocessing.build_te_observations(X, *, target_index: int, lags: int, tau: int = 1, delay: int = 1, driver_index: int | None = None, driver_indices: Iterable[int] | None = None, conditioning_indices: Iterable[int] | None = None, extra_conditioning: str | None = None) dict[str, ndarray][source]#

Build transfer-entropy state-space matrices from trial data.

Constructs present/past blocks for target, driver(s), and optional conditioning variables across trials. Used internally by TE estimators.

Parameters:
  • X (array-like) – Data in trial format (n_trials, n_samples, n_features).

  • target_index (int) – Column index of the target.

  • lags (int) – Number of past lags (embedding dimension).

  • tau (int, optional) – Lag step (samples). Default is 1.

  • delay (int, optional) – Delay from driver to target (samples). Default is 1.

  • driver_index (int or None, optional) – Single driver column index.

  • driver_indices (iterable of int or None, optional) – Driver column indices; alternative to driver_index for multiple drivers.

  • conditioning_indices (iterable of int or None, optional) – Column indices for conditioning (e.g. for PTE).

  • extra_conditioning (str or None, optional) – If "Faes_Method" or "faes", include current driver in conditioning.

Returns:

Keys: "y_present", "y_past", "x_past", "z_past", "faes_current", "trial_ids". Values are concatenated over trials.

Return type:

dict

Raises:

ValueError – If lags, tau, or delay < 1, or no valid samples remain.

Examples

>>> import numpy as np
>>> from xyz.preprocessing import build_te_observations, as_trial_array
>>> X = np.random.randn(1, 200, 2)
>>> out = build_te_observations(X, target_index=0, lags=2, driver_index=1)
>>> out["y_present"].shape[0] == out["x_past"].shape[0]
True
xyz.preprocessing.ragwitz_prediction_error(x, *, dim: int, tau: int, k_neighbors: int = 4, theiler_t: int = 0, prediction_horizon: int = 1, metric: str = 'chebyshev') float[source]#

Ragwitz criterion: local prediction error for embedding (dim, tau).

Embeds the 1D series with dimension dim and spacing tau, finds \(k\) nearest neighbors in embedding space, and returns the mean squared error of predicting the future value from neighbors (Theiler window and metric as specified).

Parameters:
  • x (array-like) – 1D time series.

  • dim (int) – Embedding dimension.

  • tau (int) – Embedding delay (samples).

  • k_neighbors (int, optional) – Number of neighbors for local prediction. Default is 4.

  • theiler_t (int, optional) – Theiler window (exclude neighbors within this time index). Default is 0.

  • prediction_horizon (int, optional) – Steps ahead to predict. Default is 1.

  • metric (str, optional) – Distance metric (e.g. "chebyshev", "euclidean"). Default is "chebyshev".

Returns:

Mean squared prediction error.

Return type:

float

Raises:

ValueError – If dim, tau, or prediction_horizon < 1, or series too short.

Examples

>>> import numpy as np
>>> from xyz.preprocessing import ragwitz_prediction_error
>>> rng = np.random.default_rng(7)
>>> x = np.cumsum(rng.normal(size=300))
>>> err = ragwitz_prediction_error(x, dim=2, tau=1, k_neighbors=4)
>>> err >= 0
True
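The criterion can be sketched with a brute-force neighbour search (no Theiler window here; the zero-order local predictor below is one common choice and an assumption about this implementation):

```python
import numpy as np

def ragwitz_error_sketch(x, dim, tau, k=4, horizon=1):
    """MSE of predicting x[t + horizon] as the mean future of the k nearest
    embedding neighbours (Chebyshev metric)."""
    x = np.asarray(x, dtype=float)
    idx = np.arange((dim - 1) * tau, len(x) - horizon)
    E = np.column_stack([x[idx - m * tau] for m in range(dim)])  # delay embedding
    future = x[idx + horizon]
    dist = np.abs(E[:, None, :] - E[None, :, :]).max(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]
    pred = future[nbrs].mean(axis=1)
    return float(np.mean((future - pred) ** 2))

rng = np.random.default_rng(7)
x = np.cumsum(rng.normal(size=400))  # random walk: locally predictable
err = ragwitz_error_sketch(x, dim=2, tau=1)
```

Scanning (dim, tau) candidates and keeping the pair with the smallest error is exactly what RagwitzEmbeddingSearchCV automates below.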

xyz.model_selection#

class xyz.model_selection.RagwitzEmbeddingSearchCV(estimator, *, target_index: int, dimensions=(1, 2, 3), taus=(1, 2, 3), k_neighbors: int = 4, theiler_t: int = 0, prediction_horizon: int = 1, metric: str = 'chebyshev', act_threshold: int | None = None, max_act_lag: int = 1000, min_trials: int = 1, refit: bool = True, n_jobs: int | None = 1)[source]#

Bases: MetaEstimatorMixin, BaseEstimator

Search embedding (dimension, tau) using the Ragwitz prediction-error criterion.

Evaluates (dim, tau) candidates via ragwitz_prediction_error() and selects the pair that minimizes mean prediction error across trials. Optionally filters trials by autocorrelation decay time (ACT).

Parameters:
  • estimator (object) – TE estimator to tune (e.g. xyz.KSGTransferEntropy).

  • target_index (int) – Column index of the target variable.

  • dimensions (tuple of int, optional) – Embedding dimensions to try. Default is (1, 2, 3).

  • taus (tuple of int, optional) – Embedding delays (samples) to try. Default is (1, 2, 3).

  • k_neighbors (int, optional) – k for local prediction in Ragwitz criterion. Default is 4.

  • theiler_t (int, optional) – Theiler window. Default is 0.

  • prediction_horizon (int, optional) – Steps ahead for prediction. Default is 1.

  • metric (str, optional) – Distance metric. Default is "chebyshev".

  • act_threshold (int or None, optional) – If set, keep only trials with ACT <= this. Default is None.

  • max_act_lag (int, optional) – Max lag for ACT estimation. Default is 1000.

  • min_trials (int, optional) – Minimum trials after ACT filtering. Default is 1.

  • refit (bool, optional) – If True, fit best_estimator_ with best params. Default is True.

  • n_jobs (int or None, optional) – Parallel jobs. Default is 1.

best_params_#

Best lags and tau.

Type:

dict

best_score_#

Best criterion score (negative prediction error).

Type:

float

best_estimator_#

Fitted estimator with best params (if refit=True).

Type:

estimator

cv_results_#

params, mean_test_score, mean_prediction_error.

Type:

dict

Examples

>>> import numpy as np
>>> from xyz import KSGTransferEntropy, RagwitzEmbeddingSearchCV
>>> rng = np.random.default_rng(7)
>>> trials = []
>>> for _ in range(4):
...     driver = rng.normal(size=250)
...     target = np.zeros(250)
...     for t in range(3, 250):
...         target[t] = 0.55 * target[t-1] + 0.25 * target[t-3] + 0.2 * driver[t-1] + 0.1 * rng.normal()
...     trials.append(np.column_stack([target, driver]))
>>> X = np.stack(trials)
>>> search = RagwitzEmbeddingSearchCV(
...     KSGTransferEntropy(driver_indices=[1], target_indices=[0], k=3),
...     target_index=0, dimensions=(1, 2, 3), taus=(1, 2),
... ).fit(X)
>>> "lags" in search.best_params_ and "tau" in search.best_params_
True
fit(X, y=None)[source]#
score(X=None, y=None)[source]#
class xyz.model_selection.InteractionDelaySearchCV(estimator, *, delays, refit: bool = True, tie_break: str = 'smallest', n_jobs: int | None = 1)[source]#

Bases: MetaEstimatorMixin, BaseEstimator

Search interaction delay for a TE estimator over a set of candidate delays.

Fits the estimator for each delay and selects the delay that maximizes the TE score (or minimizes, depending on estimator). Optionally refits the best estimator.

Parameters:
  • estimator (object) – TE estimator with a delay parameter.

  • delays (array-like) – Candidate delay values (samples) to try.

  • refit (bool, optional) – If True, fit best_estimator_ with best delay. Default is True.

  • tie_break (str, optional) – "smallest" or "largest" when multiple delays tie. Default is "smallest".

  • n_jobs (int or None, optional) – Parallel jobs. Default is 1.

best_delay_#

Selected delay.

Type:

int

best_score_#

TE score at best delay.

Type:

float

best_estimator_#

Fitted estimator at best delay (if refit=True).

Type:

estimator

delay_curve_#

Mapping delay -> score.

Type:

dict

cv_results_#

params, mean_test_score.

Type:

dict

fit(X, y=None)[source]#
score(X=None, y=None)[source]#
class xyz.model_selection.EnsembleTransferEntropy(estimator)[source]#

Bases: MetaEstimatorMixin, BaseEstimator

Wrapper that fits a TE estimator on multi-trial data.

Passes trial-shaped data to the underlying estimator so it can respect trial boundaries (e.g. for KSG within-trial neighbor search).

Parameters:

estimator (object) – TE estimator with fit(X, y=None) and score().

estimator_#

Fitted clone of the wrapped estimator.

Type:

estimator

score_#

TE score from the fitted estimator.

Type:

float

Examples

>>> import numpy as np
>>> from xyz import EnsembleTransferEntropy, KSGTransferEntropy
>>> X = np.random.randn(3, 200, 2)  # 3 trials
>>> meta = EnsembleTransferEntropy(
...     KSGTransferEntropy(driver_indices=[1], target_indices=[0], lags=1),
... ).fit(X)
>>> np.isfinite(meta.score())
True
fit(X, y=None)[source]#
score(X=None, y=None)[source]#
class xyz.model_selection.GroupTEAnalysis(estimator, *, target_index: int, dimensions=(1, 2, 3), taus=(1, 2, 3), aggregation: str = 'mean')[source]#

Bases: MetaEstimatorMixin, BaseEstimator

Group-level TE: Ragwitz search per subject, then common embedding and aggregate.

For each dataset in datasets, runs RagwitzEmbeddingSearchCV to find best (lags, tau). Then takes the maximum dimension and tau across subjects, refits each subject with that common embedding, and aggregates scores (mean or median).

Parameters:
  • estimator (object) – TE estimator to use (e.g. xyz.KSGTransferEntropy).

  • target_index (int) – Column index of the target.

  • dimensions (tuple of int, optional) – Ragwitz dimension candidates. Default is (1, 2, 3).

  • taus (tuple of int, optional) – Ragwitz tau candidates. Default is (1, 2, 3).

  • aggregation (str, optional) – "mean" or "median" for group score. Default is "mean".

common_params_#

Common lags and tau used for all subjects.

Type:

dict

subject_scores_#

TE score per subject.

Type:

np.ndarray

group_score_#

Aggregated (mean or median) group score.

Type:

float

embedding_searches_#

RagwitzEmbeddingSearchCV result per subject.

Type:

list

fit(datasets, y=None)[source]#
score(X=None, y=None)[source]#
set_fit_request(*, datasets: bool | None | str = '$UNCHANGED$') GroupTEAnalysis#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

datasets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for datasets parameter in fit.

Returns:

self – The updated object.

Return type:

object

class xyz.model_selection.GreedySourceSelectionTransferEntropy(estimator, *, candidate_sources, max_sources: int | None = None, min_improvement: float = 0.0, n_jobs: int | None = 1, refit: bool = True)[source]#

Bases: MetaEstimatorMixin, BaseEstimator

Greedy forward selection of driver sources for partial TE.

Starts from the estimator’s existing conditioning set and adds driver sources one at a time from candidate_sources, keeping at each step the candidate that increases the TE score the most. Stops when the best candidate improves the score by no more than min_improvement, or when max_sources sources have been selected.

Parameters:
  • estimator (object) – Partial TE estimator with driver_indices and conditioning_indices.

  • candidate_sources (array-like) – Column indices of candidate driver sources.

  • max_sources (int or None, optional) – Maximum number of sources to add. None = no limit. Default is None.

  • min_improvement (float, optional) – Stop if improvement is <= this. Default is 0.0.

  • n_jobs (int or None, optional) – Parallel jobs for evaluating candidate sets. Default is 1.

  • refit (bool, optional) – If True, best_estimator_ is fitted with selected sources. Default is True.

selected_sources_#

Indices of selected driver sources.

Type:

list of int

selection_history_#

History of (sources, score) along the greedy path.

Type:

list

best_estimator_#

Fitted estimator with selected sources (if refit=True).

Type:

estimator

fit(X, y=None)[source]#
score(X=None, y=None)[source]#

xyz.stats#

xyz.stats.fdr_bh(p_values, alpha: float = 0.05) ndarray[source]#

Benjamini–Hochberg false-discovery-rate correction.

Rejects hypotheses with p-value \(\le\) the adaptive threshold so that the expected FDR is controlled at level alpha.

Parameters:
  • p_values (array-like) – P-values (any shape).

  • alpha (float, optional) – Target FDR level. Default is 0.05.

Returns:

Boolean array of same shape as p_values: True where the null is rejected.

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from xyz.stats import fdr_bh
>>> p = np.array([0.001, 0.02, 0.04, 0.15])
>>> fdr_bh(p, alpha=0.05)
array([ True,  True, False, False])
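The step-up rule can be sketched as follows (illustrative, not the package source):

```python
import numpy as np

def bh_reject(p, alpha=0.05):
    """Benjamini-Hochberg step-up: with p sorted ascending, find the largest k
    with p_(k) <= (k / m) * alpha and reject the k smallest p-values."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.nonzero(below)[0].max())  # largest index meeting its threshold
        reject[order[:k + 1]] = True
    return reject

bh_reject([0.001, 0.02, 0.04, 0.15], alpha=0.05)
# 0.04 is not rejected: its threshold is (3/4) * 0.05 = 0.0375
```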
xyz.stats.bonferroni(p_values, alpha: float = 0.05) ndarray[source]#

Bonferroni correction for multiple testing.

Rejects where \(p_i \le \alpha / m\) with \(m\) the number of tests. Controls family-wise error rate at level alpha.

Parameters:
  • p_values (array-like) – P-values (any shape).

  • alpha (float, optional) – Family-wise error rate. Default is 0.05.

Returns:

Boolean array: True where the null is rejected.

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> from xyz.stats import bonferroni
>>> p = np.array([0.001, 0.02, 0.04])
>>> bonferroni(p, alpha=0.05)
array([ True, False, False])
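The rule is a single comparison per test; a one-line sketch:

```python
import numpy as np

def bonferroni_reject(p, alpha=0.05):
    """Reject H_i where p_i <= alpha / m, with m the number of tests."""
    p = np.asarray(p, dtype=float)
    return p <= alpha / p.size

bonferroni_reject([0.001, 0.02, 0.04])  # per-test threshold 0.05 / 3, about 0.0167
```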
xyz.stats.generate_surrogates(X, *, method: str = 'trial_shuffle', n_surrogates: int = 100, block_length: int | None = None, random_state=None, driver_index: int | None = None) list[ndarray][source]#

Generate surrogate datasets for transfer-entropy null testing.

Surrogates break the driver–target relationship while preserving marginal structure. Used with SurrogatePermutationTest to assess significance.

Parameters:
  • X (array-like) – Data, shape (n_trials, n_samples, n_features) or equivalent (see xyz.preprocessing.as_trial_array()).

  • method (str, optional) – One of "trial_shuffle" (shuffle driver across trials), "block_resample", "block_reverse", "swap_neighbors", "time_shift". Default is "trial_shuffle".

  • n_surrogates (int, optional) – Number of surrogate datasets. Default is 100.

  • block_length (int or None, optional) – Used by block_* and time_shift methods. Default is None.

  • random_state (int, array-like or None, optional) – Random seed or generator.

  • driver_index (int or None, optional) – Column index of the driver variable (for methods that permute the driver). Default is 0 if None.

Returns:

List of surrogate arrays; each has the same shape as the trial representation of X.

Return type:

list of np.ndarray

Examples

>>> import numpy as np
>>> from xyz import generate_surrogates
>>> rng = np.random.default_rng(5)
>>> X = rng.normal(size=(3, 40, 2))
>>> surrogates = generate_surrogates(X, method="trial_shuffle", n_surrogates=5, random_state=0, driver_index=1)
>>> len(surrogates)
5
>>> surrogates[0].shape == X.shape
True
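The "trial_shuffle" method can be sketched as follows (illustrative; the package may draw permutations differently):

```python
import numpy as np

def trial_shuffle_surrogates(X, driver_index, n_surrogates, seed=None):
    """Permute the driver column across trials, keeping every other column
    fixed, so driver-target coupling is destroyed while each trial's
    driver dynamics stay intact."""
    rng = np.random.default_rng(seed)
    surrogates = []
    for _ in range(n_surrogates):
        perm = rng.permutation(X.shape[0])   # reorder trials
        S = X.copy()
        S[:, :, driver_index] = X[perm, :, driver_index]
        surrogates.append(S)
    return surrogates

rng = np.random.default_rng(5)
X = rng.normal(size=(3, 40, 2))
surr = trial_shuffle_surrogates(X, driver_index=1, n_surrogates=5, seed=0)
```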
class xyz.stats.BootstrapEstimate(estimator, *, n_bootstrap: int = 100, method: str = 'iid', block_length: int | None = None, ci: float = 0.95, random_state=None, n_jobs: int | None = 1)[source]#

Bases: MetaEstimatorMixin, BaseEstimator

Bootstrap confidence intervals for information-theoretic estimators.

Fits the wrapped estimator on the original data and on bootstrap resamples to obtain a distribution of the score and a confidence interval.

Parameters:
  • estimator (object) – An xyz estimator with fit and score (e.g. transfer entropy, mutual information).

  • n_bootstrap (int, optional) – Number of bootstrap samples. Default is 100.

  • method (str, optional) – Resampling: "iid" (sample rows with replacement), "trial" (resample trials), "block" (block bootstrap within trials). Default is "iid".

  • block_length (int or None, optional) – Block length for method="block". Default is None (auto).

  • ci (float, optional) – Confidence level for the interval (e.g. 0.95). Default is 0.95.

  • random_state (int, array-like or None, optional) – Random seed or generator.

  • n_jobs (int or None, optional) – Number of parallel jobs. Default is 1.

estimate_#

Point estimate from the original data.

Type:

float

ci_low_, ci_high_

Lower and upper bounds of the confidence interval.

Type:

float

bootstrap_distribution_#

Bootstrap scores.

Type:

np.ndarray

standard_error_#

Standard error of the bootstrap distribution.

Type:

float

Examples

>>> import numpy as np
>>> from xyz import BootstrapEstimate, GaussianCopulaMutualInformation
>>> rng = np.random.default_rng(404)
>>> x = rng.normal(size=(500, 1))
>>> y = 0.6 * x + 0.3 * rng.normal(size=(500, 1))
>>> bootstrap = BootstrapEstimate(
...     GaussianCopulaMutualInformation(),
...     n_bootstrap=24, method="iid", random_state=0, n_jobs=2,
... ).fit(x, y)
>>> bootstrap.ci_low_ <= bootstrap.estimate_ <= bootstrap.ci_high_
True
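The method="iid" path can be sketched with any callable statistic (percentile interval; the interval type the package uses is an assumption):

```python
import numpy as np

def bootstrap_ci(X, y, statistic, n_bootstrap=200, ci=0.95, seed=None):
    """IID bootstrap: resample rows with replacement, collect the statistic,
    and take a percentile confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(X)
    scores = np.array([
        statistic(X[idx], y[idx])
        for idx in (rng.integers(0, n, size=n) for _ in range(n_bootstrap))
    ])
    lo, hi = np.percentile(scores, [50 * (1 - ci), 50 * (1 + ci)])
    return statistic(X, y), lo, hi

rng = np.random.default_rng(404)
x = rng.normal(size=500)
y = 0.6 * x + 0.3 * rng.normal(size=500)
corr = lambda a, b: np.corrcoef(a, b)[0, 1]  # stand-in for an information estimator
est, lo, hi = bootstrap_ci(x, y, corr, seed=0)
```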
fit(X, y=None)[source]#
score(X=None, y=None)[source]#
class xyz.stats.SurrogatePermutationTest(estimator, *, n_permutations: int = 100, surrogate_method: str = 'trial_shuffle', alpha: float = 0.05, correction: str = 'fdr_bh', shift_test: bool = False, shift_method: str = 'time_shift', random_state=None, n_jobs: int | None = 1)[source]#

Bases: MetaEstimatorMixin, BaseEstimator

Permutation-based significance testing for transfer-entropy estimators.

Fits the estimator on the observed data and on surrogate data (driver shuffled/perturbed) to build a null distribution and compute a p-value.

Parameters:
  • estimator (object) – TE estimator with fit and score (e.g. xyz.KSGTransferEntropy).

  • n_permutations (int, optional) – Number of surrogates. Default is 100.

  • surrogate_method (str, optional) – Method passed to generate_surrogates(). Default is "trial_shuffle".

  • alpha (float, optional) – Significance level. Default is 0.05.

  • correction (str or None, optional) – Multiple-test correction: "fdr_bh", "bonferroni", or "none". Default is "fdr_bh".

  • shift_test (bool, optional) – If True, also run a time-shift test. Default is False.

  • shift_method (str, optional) – Surrogate method for the shift test. Default is "time_shift".

  • random_state (int, array-like or None, optional) – Random seed or generator.

  • n_jobs (int or None, optional) – Number of parallel jobs. Default is 1.

observed_score_#

Score on the original data.

Type:

float

null_distribution_#

Scores on surrogates.

Type:

np.ndarray

p_values_#

P-value(s).

Type:

np.ndarray

significant_#

True where p_value <= alpha.

Type:

bool or np.ndarray

corrected_significant_#

After multiple-test correction.

Type:

bool or np.ndarray

Examples

>>> import numpy as np
>>> from xyz import SurrogatePermutationTest, KSGTransferEntropy
>>> rng = np.random.default_rng(42)
>>> trials = []
>>> for _ in range(4):
...     driver = rng.normal(size=120)
...     target = np.zeros(120)
...     for t in range(1, 120):
...         target[t] = 0.4 * target[t-1] + 0.5 * driver[t-1] + 0.1 * rng.normal()
...     trials.append(np.column_stack([target, driver]))
>>> X = np.stack(trials)
>>> test = SurrogatePermutationTest(
...     KSGTransferEntropy(driver_indices=[1], target_indices=[0], lags=1, k=3),
...     n_permutations=12, surrogate_method="trial_shuffle", alpha=0.1, random_state=0,
... ).fit(X)
>>> np.isfinite(test.observed_score_)
True
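The permutation p-value behind such a test is conventionally computed with a +1 correction so it can never be exactly zero (an assumption about this implementation):

```python
import numpy as np

def permutation_pvalue(observed, null_scores):
    """One-sided p-value: fraction of surrogate scores >= the observed score,
    with +1 in numerator and denominator."""
    null_scores = np.asarray(null_scores, dtype=float)
    return (1 + np.sum(null_scores >= observed)) / (1 + null_scores.size)

permutation_pvalue(0.8, [0.1, 0.05, 0.2, 0.9])  # (1 + 1) / (1 + 4) = 0.4
```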
fit(X, y=None)[source]#
score(X=None, y=None)[source]#