Modules#
xyz._continuous#
- class xyz._continuous.DirectKSGConditionalMutualInformation(k: int = 3, metric: str = 'chebyshev')[source]#
Bases: MVKSGInfoTheoryEstimator
Direct kNN conditional mutual information estimator.
- fit(X, y, Z)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
Z (array-like) – Conditioning data.
- Returns:
Fitted estimator.
- Return type:
self
- set_fit_request(*, Z: bool | None | str = '$UNCHANGED$') DirectKSGConditionalMutualInformation#
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class xyz._continuous.GaussianCopulaConditionalMutualInformation[source]#
Bases: MVInfoTheoryEstimator
Conditional mutual information after a Gaussian-copula transform.
- fit(X, y, Z)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
Z (array-like) – Conditioning data.
- Returns:
Fitted estimator.
- Return type:
self
- set_fit_request(*, Z: bool | None | str = '$UNCHANGED$') GaussianCopulaConditionalMutualInformation#
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class xyz._continuous.GaussianCopulaMutualInformation[source]#
Bases: MVInfoTheoryEstimator
Mutual information after a Gaussian-copula marginal transform.
Ranks each variable to uniforms then applies inverse Gaussian CDF; computes MI on the transformed data (nonparametric in marginals, Gaussian in dependence).
- Parameters:
None
Examples
>>> import numpy as np
>>> from xyz import GaussianCopulaMutualInformation
>>> rng = np.random.default_rng(404)
>>> x = rng.normal(size=(500, 1))
>>> y = 0.6 * x + 0.3 * rng.normal(size=(500, 1))
>>> est = GaussianCopulaMutualInformation().fit(x, y)
>>> est.mutual_information_ > 0
True
- fit(X, y)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
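The copula transform described above can be sketched directly with numpy/scipy. This is illustrative only: `copula_transform` is a hypothetical helper (with a rank denominator of n + 1 as one common convention), not part of xyz.

```python
# A minimal numpy/scipy sketch of the Gaussian-copula transform: rank each
# column to uniforms, apply the inverse Gaussian CDF, then use the Gaussian
# MI formula -0.5*log(1 - rho^2) on the transformed scores.
import numpy as np
from scipy.stats import norm, rankdata

def copula_transform(a):
    """Map each column to standard-normal scores via empirical ranks."""
    a = np.asarray(a, dtype=float)
    u = rankdata(a, axis=0) / (a.shape[0] + 1)  # uniforms strictly in (0, 1)
    return norm.ppf(u)

rng = np.random.default_rng(404)
x = rng.normal(size=(500, 1))
y = 0.6 * x + 0.3 * rng.normal(size=(500, 1))

gx, gy = copula_transform(x), copula_transform(y)
rho = np.corrcoef(gx[:, 0], gy[:, 0])[0, 1]
mi = -0.5 * np.log(1 - rho ** 2)  # Gaussian MI of the transformed scores
```

Because only ranks enter the transform, the estimate is invariant to monotone rescaling of each marginal.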
- class xyz._continuous.GaussianCopulaTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, extra_conditioning: str | None = None)[source]#
Bases: _GaussianTEBase
Transfer entropy after a Gaussian-copula marginal transform.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.GaussianPartialTransferEntropy(driver_indices, target_indices, conditioning_indices, lags: int = 1, tau: int = 1, delay: int = 1, extra_conditioning: str | None = None)[source]#
Bases: _GaussianTEBase
Linear-Gaussian partial transfer entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.GaussianSelfEntropy(target_indices, lags: int = 1, tau: int = 1)[source]#
Bases: _GaussianTEBase
Linear-Gaussian information storage estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.GaussianTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, extra_conditioning: str | None = None)[source]#
Bases: _GaussianTEBase
Linear-Gaussian bivariate transfer entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.KSGEntropy(k: int = 3, metric: str = 'chebyshev')[source]#
Bases: InfoTheoryMixin, InfoTheoryEstimator
Kozachenko–Leonenko k-NN differential entropy estimator.
Estimates \(H(X)\) from the distance to the k-th nearest neighbor and the log-volume of the unit ball for the chosen metric.
- Parameters:
Examples
>>> import numpy as np
>>> from xyz import KSGEntropy
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(0, 2, (10000, 1))
>>> est = KSGEntropy(k=3).fit(X)
>>> theoretical = 0.5 * np.log(2 * np.pi * np.e * 4)
>>> abs(est.entropy_ - theoretical) < 0.1
True
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
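The Kozachenko–Leonenko construction described above can be sketched with scipy (illustrative, not the package's implementation). With the Chebyshev metric the unit-ball volume is 2^d, giving H ≈ ψ(N) − ψ(k) + d·log 2 + d·mean(log εᵢ), where εᵢ is the distance from point i to its k-th nearest neighbor.

```python
# Sketch of a Kozachenko-Leonenko k-NN entropy estimate (Chebyshev metric).
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(X, k=3):
    X = np.asarray(X, dtype=float)       # expects (n_samples, n_features)
    n, d = X.shape
    # k + 1 neighbors because the query returns each point itself at distance 0
    eps = cKDTree(X).query(X, k=k + 1, p=np.inf)[0][:, k]
    return digamma(n) - digamma(k) + d * np.log(2.0) + d * np.mean(np.log(eps))

rng = np.random.default_rng(42)
X = rng.normal(0, 2, size=(10000, 1))
theoretical = 0.5 * np.log(2 * np.pi * np.e * 4)  # 1D Gaussian, sigma^2 = 4
err = abs(kl_entropy(X, k=3) - theoretical)
```

Small k lowers bias but raises variance; k = 3 is the usual compromise.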
- class xyz._continuous.KSGMutualInformation(k: int = 3, algorithm: int = 1, metric: str = 'chebyshev')[source]#
Bases: InfoTheoryMixin, InfoTheoryEstimator
Kraskov–Stögbauer–Grassberger (KSG) k-NN mutual information estimator.
Estimates \(I(X;Y)\) from k-nearest neighbor distances in the joint and marginal spaces.
algorithm=1 (default) uses the stricter radius; algorithm=2 uses the larger radius (often more stable).
- Parameters:
Examples
>>> import numpy as np
>>> from xyz import KSGMutualInformation
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(1000, 1))
>>> y = rng.normal(size=(1000, 1))
>>> mi_ind = KSGMutualInformation(k=3).fit(X, y).score()
>>> y_corr = 0.7 * X + 0.3 * rng.normal(size=(1000, 1))
>>> mi_corr = KSGMutualInformation(k=3).fit(X, y_corr).score()
>>> mi_corr > mi_ind
True
- fit(X, y)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
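The algorithm-1 identity can be sketched with a brute-force O(N²) computation (illustrative, not the library's code): I ≈ ψ(k) + ψ(N) − mean(ψ(nₓ + 1) + ψ(n_y + 1)), where nₓ, n_y count marginal neighbors strictly inside the joint Chebyshev distance to the k-th neighbor.

```python
# Brute-force sketch of KSG algorithm 1 (for illustration; real
# implementations use k-d trees instead of full distance matrices).
import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    x = np.asarray(x).reshape(len(x), -1)
    y = np.asarray(y).reshape(len(y), -1)
    n = len(x)
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=-1)
    dz = np.maximum(dx, dy)                     # joint Chebyshev distance
    eps = np.sort(dz, axis=1)[:, k]             # index 0 is the point itself
    nx = np.sum(dx < eps[:, None], axis=1) - 1  # strict radius, exclude self
    ny = np.sum(dy < eps[:, None], axis=1) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 1))
mi_ind = ksg_mi(X, rng.normal(size=(1000, 1)), k=3)
mi_corr = ksg_mi(X, 0.7 * X + 0.3 * rng.normal(size=(1000, 1)), k=3)
```

For the correlated pair the true MI is −0.5·log(1 − ρ²) ≈ 0.93 nats, and the estimate lands close to it, while the independent pair stays near zero.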
- class xyz._continuous.KSGPartialTransferEntropy(driver_indices, target_indices, conditioning_indices, lags: int = 1, tau: int = 1, delay: int = 1, k: int = 3, metric: str = 'chebyshev', extra_conditioning: str | None = None)[source]#
Bases: _KSGTEBase
KSG partial transfer entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.KSGSelfEntropy(target_indices, lags: int = 1, tau: int = 1, k: int = 3, metric: str = 'chebyshev')[source]#
Bases: _KSGTEBase
KSG information storage / self-entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.KSGTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, k: int = 3, metric: str = 'chebyshev', extra_conditioning: str | None = None)[source]#
Bases: _KSGTEBase
KSG (k-NN) bivariate transfer entropy estimator.
Estimates \(TE_{X \to Y}\) from time-series data using k-nearest neighbor conditional entropy. Expects array X with shape (n_trials, n_samples, n_features) or 2D equivalent; driver and target are column indices.
- Parameters:
driver_indices (array-like) – Column index(es) of the driver variable(s).
target_indices (array-like) – Column index(es) of the target variable(s).
lags (int, optional) – Number of past lags for embedding. Default is 1.
tau (int, optional) – Lag step (samples). Default is 1.
delay (int, optional) – Delay from driver to target. Default is 1.
k (int, optional) – Number of neighbors. Default is 3.
metric (str, optional) – Distance metric. Default is "chebyshev".
extra_conditioning (str or None, optional) – Optional extra conditioning (e.g. "Faes_Method"). Default is None.
Examples
>>> import numpy as np
>>> from xyz import KSGTransferEntropy
>>> rng = np.random.default_rng(42)
>>> trials = []
>>> for _ in range(4):
...     driver = rng.normal(size=180)
...     target = np.zeros(180)
...     for t in range(1, 180):
...         target[t] = 0.4 * target[t-1] + 0.5 * driver[t-1] + 0.1 * rng.normal()
...     trials.append(np.column_stack([target, driver]))
>>> X = np.stack(trials)
>>> est = KSGTransferEntropy(driver_indices=[1], target_indices=[0], lags=1, k=3).fit(X)
>>> est.transfer_entropy_ > 0
True
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.KernelPartialTransferEntropy(driver_indices, target_indices, conditioning_indices, lags: int = 1, tau: int = 1, delay: int = 1, r: float = 0.5, metric: str = 'chebyshev', extra_conditioning: str | None = None)[source]#
Bases: _KernelTEBase
Kernel partial transfer entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.KernelSelfEntropy(target_indices, lags: int = 1, tau: int = 1, r: float = 0.5, metric: str = 'chebyshev')[source]#
Bases: _KernelTEBase
Kernel information storage / self-entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.KernelTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, r: float = 0.5, metric: str = 'chebyshev', extra_conditioning: str | None = None)[source]#
Bases: _KernelTEBase
Kernel bivariate transfer entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.MVCondEntropy[source]#
Bases: MVInfoTheoryEstimator
Conditional entropy \(H(Y|X)\) under a linear-Gaussian model.
Fits \(Y \approx X\beta\) and uses the residual covariance to compute \(H(Y|X)\) as the entropy of the residual.
- Parameters:
None
Examples
>>> import numpy as np
>>> from xyz import MVCondEntropy
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(500, 2))
>>> y = X[:, :1] + 0.3 * rng.normal(size=(500, 1))
>>> est = MVCondEntropy().fit(X, y)
>>> np.isfinite(est.conditional_entropy_)
True
- fit(X, y)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
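The regression-then-residual-entropy recipe described above can be sketched with numpy (illustrative, not the package's code): least-squares fit Y ≈ Xβ, then the Gaussian entropy of the residual covariance.

```python
# Sketch of linear-Gaussian conditional entropy H(Y|X).
import numpy as np

def gaussian_entropy(C):
    C = np.atleast_2d(C)
    m = C.shape[0]
    return 0.5 * np.linalg.slogdet(C)[1] + 0.5 * m * np.log(2 * np.pi * np.e)

def cond_entropy_linear(X, y):
    X = np.asarray(X)
    y = np.asarray(y).reshape(len(y), -1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit Y ~ X @ beta
    resid = y - X @ beta
    return gaussian_entropy(np.cov(resid, rowvar=False))

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = X[:, :1] + 0.3 * rng.normal(size=(500, 1))
h_cond = cond_entropy_linear(X, y)
h_marg = gaussian_entropy(np.cov(y, rowvar=False))  # unconditional H(Y)
```

Conditioning cannot increase entropy, so h_cond comes out below h_marg whenever X explains part of Y.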
- class xyz._continuous.MVExponentialEntropy[source]#
Bases: MVInfoTheoryEstimator
Placeholder for a future multivariate exponential entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.MVKSGCondEntropy(k: int = 3, metric: str = 'chebyshev')[source]#
Bases: MVKSGInfoTheoryEstimator
Multivariate conditional entropy \(H(Y|X)\) via KSG \(H(X,Y) - H(X)\).
- fit(X, y)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.MVKSGCondMutualInformation(k: int = 3, metric: str = 'chebyshev')[source]#
Bases: MVKSGInfoTheoryEstimator
Conditional mutual information I(X;Y|Z) via KSG identities.
- fit(X, y, Z)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
Z (array-like) – Conditioning data.
- Returns:
Fitted estimator.
- Return type:
self
- set_fit_request(*, Z: bool | None | str = '$UNCHANGED$') MVKSGCondMutualInformation#
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class xyz._continuous.MVKSGPartialInformationDecomposition(k: int = 3, metric: str = 'chebyshev')[source]#
Bases: MVKSGInfoTheoryEstimator
Partial information decomposition (PID) of two sources onto target via KSG MI.
Decomposes \(I(X_1,X_2; Y)\) into unique, redundant, and synergistic terms.
- fit(X1, X2, y)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X1 (array-like) – First source observations.
X2 (array-like) – Second source observations.
y (array-like) – Target observations.
- Returns:
Fitted estimator.
- Return type:
self
- score(X1=None, X2=None, y=None)[source]#
Return the estimator’s primary fitted scalar quantity.
If additional positional/keyword arguments are passed, refits the estimator on that data and then returns the score.
- set_fit_request(*, X1: bool | None | str = '$UNCHANGED$', X2: bool | None | str = '$UNCHANGED$') MVKSGPartialInformationDecomposition#
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- Returns:
self – The updated object.
- Return type:
- set_score_request(*, X1: bool | None | str = '$UNCHANGED$', X2: bool | None | str = '$UNCHANGED$') MVKSGPartialInformationDecomposition#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- Returns:
self – The updated object.
- Return type:
- class xyz._continuous.MVKSGTransferEntropy(k: int = 3, metric: str = 'chebyshev', lag: int = 1)[source]#
Bases: MVKSGInfoTheoryEstimator
Multivariate transfer entropy as \(I(X_{t-\tau}; Y_t | Y_{t-\tau})\) (single lag).
- fit(X, y)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
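For Gaussian data the conditional MI above reduces to log-determinants of covariance blocks, I(A;B|C) = ½(log|Σ_AC| + log|Σ_BC| − log|Σ_C| − log|Σ_ABC|). A hedged sketch of single-lag TE built on that identity (not the xyz internals, which use KSG neighbor counts):

```python
# Single-lag transfer entropy via the Gaussian conditional-MI identity.
import numpy as np

def gaussian_cmi(a, b, c):
    def logdet(*cols):
        m = np.column_stack(cols)
        return np.linalg.slogdet(np.atleast_2d(np.cov(m, rowvar=False)))[1]
    return 0.5 * (logdet(a, c) + logdet(b, c) - logdet(c) - logdet(a, b, c))

rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):                 # y is driven by the past of x
    y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + 0.3 * rng.normal()

te_xy = gaussian_cmi(x[:-1], y[1:], y[:-1])  # X -> Y: clearly positive
te_yx = gaussian_cmi(y[:-1], x[1:], x[:-1])  # Y -> X: near zero
```

The asymmetry te_xy ≫ te_yx recovers the simulated direction of coupling.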
- class xyz._continuous.MVLNEntropy[source]#
Bases: MVInfoTheoryEstimator
Differential entropy for log-normal observations.
Transforms \(X \mapsto \log X\) and uses Gaussian entropy plus \(\mathbb{E}[\log X]\) correction. Data must be strictly positive.
- Parameters:
None
- Raises:
ValueError – If any observation is non-positive.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.MVNEntropy[source]#
Bases: MVInfoTheoryEstimator
Differential entropy under a multivariate Gaussian assumption.
For covariance \(C\), \(H = \frac{1}{2}\log\det(C) + \frac{M}{2}\log(2\pi e)\).
- Parameters:
None
Examples
>>> import numpy as np
>>> from xyz import MVNEntropy
>>> rng = np.random.default_rng(42)
>>> A = rng.normal(size=(500, 3))
>>> est = MVNEntropy().fit(A)
>>> np.isfinite(est.score())
True
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.MVNMutualInformation[source]#
Bases: MVInfoTheoryEstimator
Mutual information under a multivariate Gaussian assumption.
\(I(X;Y) = H(Y) - H(Y|X)\) with Gaussian entropy and conditional entropy.
- Parameters:
None
Examples
>>> import numpy as np
>>> from xyz import MVNMutualInformation
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(500, 2))
>>> y = X[:, :1] + 0.3 * rng.normal(size=(500, 1))
>>> est = MVNMutualInformation().fit(X, y)
>>> np.isfinite(est.mutual_information_)
True
- fit(X, y)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._continuous.MVParetoEntropy[source]#
Bases: MVInfoTheoryEstimator
Placeholder for a future multivariate Pareto entropy estimator.
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
xyz._discrete#
- class xyz._discrete.DiscreteInfoTheoryEstimator[source]#
Bases: InfoTheoryMixin, InfoTheoryEstimator, ABC
Base class for discrete (binned) information-theoretic estimators.
Subclasses operate on discretized time series and use histogram-based entropy and conditional entropy to compute transfer entropy, partial transfer entropy, or self-entropy.
- class xyz._discrete.DiscretePartialTransferEntropy(driver_indices, target_indices, conditioning_indices, lags: int = 1, tau: int = 1, delay: int = 1, c: int = 8, quantize: bool = True, extra_conditioning: str | None = None)[source]#
Bases: DiscreteInfoTheoryEstimator
Discrete partial transfer entropy (conditioning on side variables).
Estimates PTE from driver to target conditioned on conditioning_indices using binned entropy, i.e. the information transfer excluding that explained by the conditioning variable(s).
- Parameters:
driver_indices (array-like) – Column index(es) of the driver.
target_indices (array-like) – Column index(es) of the target.
conditioning_indices (array-like) – Column index(es) to condition on.
lags (int, optional) – Number of past lags. Default is 1.
tau (int, optional) – Lag step. Default is 1.
delay (int, optional) – Delay from driver to target. Default is 1.
c (int, optional) – Number of bins. Default is 8.
quantize (bool, optional) – If True, bin continuous data. Default is True.
extra_conditioning (str or None, optional) – Optional extra conditioning. Default is None.
Examples
>>> import numpy as np
>>> from xyz import DiscretePartialTransferEntropy
>>> rng = np.random.default_rng(42)
>>> X = np.column_stack([rng.random(200), rng.random(200), rng.random(200)])  # target, driver, conditioning
>>> est = DiscretePartialTransferEntropy(
...     driver_indices=[1], target_indices=[0], conditioning_indices=[2],
...     lags=1, c=6, quantize=True,
... )
>>> est = est.fit(X)
>>> np.isfinite(est.transfer_entropy_)
True
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._discrete.DiscreteSelfEntropy(target_indices, lags: int = 1, tau: int = 1, c: int = 8, quantize: bool = True)[source]#
Bases: DiscreteInfoTheoryEstimator
Discrete information storage (self-entropy).
Estimates \(S_Y = H(Y_t) - H(Y_t | Y_{t-1:t-l})\), the information in the target’s past about its present, using binned entropy.
- Parameters:
Examples
>>> import numpy as np
>>> from xyz import DiscreteSelfEntropy
>>> rng = np.random.default_rng(42)
>>> X = rng.random((200, 1))  # single series
>>> est = DiscreteSelfEntropy(target_indices=[0], lags=2, c=6, quantize=True)
>>> est = est.fit(X)
>>> np.isfinite(est.self_entropy_)
True
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
- class xyz._discrete.DiscreteTransferEntropy(driver_indices, target_indices, lags: int = 1, tau: int = 1, delay: int = 1, c: int = 8, quantize: bool = True, extra_conditioning: str | None = None)[source]#
Bases: DiscreteInfoTheoryEstimator
Discrete (binned) bivariate transfer entropy.
Estimates \(TE_{X \to Y} = H(Y_t | Y_{t-1:t-l}) - H(Y_t | Y_{t-1:t-l}, X_{t-d:t-d-l})\) using histogram-based entropy after binning into c bins.
- Parameters:
driver_indices (array-like) – Column index(es) of the driver variable(s).
target_indices (array-like) – Column index(es) of the target variable(s).
lags (int, optional) – Number of past lags. Default is 1.
tau (int, optional) – Lag step (samples). Default is 1.
delay (int, optional) – Delay from driver to target. Default is 1.
c (int, optional) – Number of bins for discretization. Default is 8.
quantize (bool, optional) – If True, bin continuous data; if False, assume input is already discrete. Default is True.
extra_conditioning (str or None, optional) – Optional extra conditioning (e.g. Faes method). Default is None.
Examples
>>> import numpy as np
>>> from xyz import DiscreteTransferEntropy
>>> rng = np.random.default_rng(42)
>>> X = np.column_stack([rng.random(200), rng.random(200)])  # (n_samples, 2): target, driver
>>> est = DiscreteTransferEntropy(driver_indices=[1], target_indices=[0], lags=1, c=6, quantize=True)
>>> est = est.fit(X)
>>> np.isfinite(est.transfer_entropy_)
True
- fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
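The binned TE formula above can be sketched as a plug-in computation (illustrative, not the package's implementation): quantize both series, then TE = H(Y_t | Y_{t-1}) − H(Y_t | Y_{t-1}, X_{t-1}) from histogram entropies.

```python
# Plug-in histogram transfer entropy with one lag.
import numpy as np

def joint_entropy(*cols):
    _, counts = np.unique(np.column_stack(cols), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def binned_te(x, y, c=4):
    edges = lambda v: np.linspace(v.min(), v.max(), c + 1)[1:-1]
    xb, yb = np.digitize(x, edges(x)), np.digitize(y, edges(y))
    yt, yp, xp = yb[1:], yb[:-1], xb[:-1]
    h_past = joint_entropy(yt, yp) - joint_entropy(yp)
    h_both = joint_entropy(yt, yp, xp) - joint_entropy(yp, xp)
    return h_past - h_both   # a plug-in conditional MI, hence >= 0

rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.2 * rng.normal()

te_coupled = binned_te(x, y)
te_indep = binned_te(rng.normal(size=n), rng.normal(size=n))
```

Plug-in estimates carry a small positive bias of order (number of cells)/n, which is why te_indep is small but not exactly zero.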
xyz.univariate#
- xyz.univariate.entropy_linear(A: ndarray) float[source]#
Linear Gaussian estimate of differential (Shannon) entropy.
Assumes the data are multivariate Gaussian. For covariance \(C\), the differential entropy in nats is:
\[H = \frac{1}{2} \log \det(C) + \frac{M}{2} \log(2\pi e)\]
where \(M\) is the number of variables.
- Parameters:
A (np.ndarray) – Multivariate data, shape (n_samples, n_features) (N×M).
- Returns:
Estimated differential entropy in nats.
- Return type:
Examples
>>> import numpy as np
>>> from xyz.univariate import entropy_linear
>>> rng = np.random.default_rng(42)
>>> A = rng.normal(size=(500, 3))
>>> h = entropy_linear(A)
>>> np.isfinite(h)
True
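The closed form above can be cross-checked with plain numpy (a sketch, independent of the library): for independent N(0, 4) coordinates the true entropy is the sum of the 1D Gaussian entropies.

```python
# Direct check of H = 0.5*log det(C) + (M/2)*log(2*pi*e) against theory.
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(0, 2, size=(50000, 2))          # two independent N(0, 4) columns
C = np.cov(A, rowvar=False)
h = 0.5 * np.linalg.slogdet(C)[1] + (A.shape[1] / 2) * np.log(2 * np.pi * np.e)
h_true = 2 * 0.5 * np.log(2 * np.pi * np.e * 4)
```

With 50,000 samples the sample covariance makes h agree with h_true to a few hundredths of a nat.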
- xyz.univariate.entropy_kernel(Y: ndarray, r: float, metric: str = 'chebyshev') float[source]#
Kernel (step-kernel) estimate of differential entropy.
Uses the mean log-probability of pairs within radius \(r\) under the chosen distance. By default uses a step kernel with Chebyshev (max-norm) distance.
- Parameters:
- Returns:
Estimated differential entropy in nats.
- Return type:
Examples
>>> import numpy as np
>>> from xyz.univariate import entropy_kernel
>>> rng = np.random.default_rng(42)
>>> Y = rng.normal(size=(500, 2))
>>> h = entropy_kernel(Y, 0.1)
>>> np.isfinite(h)
True
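The step-kernel idea described above can be sketched as minus the mean log fraction of other points within Chebyshev radius r (illustrative only; this sketch omits the additive kernel-volume term, which cancels when entropies are differenced as in transfer entropy, and skips points with zero neighbors to avoid log(0)):

```python
# Step-kernel entropy sketch with the Chebyshev (max-norm) distance.
import numpy as np

def kernel_entropy_sketch(Y, r):
    Y = np.asarray(Y, dtype=float)
    d = np.max(np.abs(Y[:, None, :] - Y[None, :, :]), axis=-1)
    counts = np.sum(d <= r, axis=1) - 1          # exclude the point itself
    frac = counts[counts > 0] / (len(Y) - 1)     # neighbor fraction per point
    return -np.mean(np.log(frac))

rng = np.random.default_rng(42)
Y = rng.normal(size=(500, 2))
h = kernel_entropy_sketch(Y, 0.5)
```

The radius r trades bias for variance much like k does in the k-NN estimators.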
- xyz.univariate.entropy_binning(Y, c, quantize, log_base: str = 'nat')[source]#
Binning (histogram) estimate of Shannon entropy.
Discretizes each column into c bins and computes entropy from the empirical distribution. If quantize is True, data are binned into equal-width bins; otherwise Y is assumed already quantized.
- Parameters:
Y (array-like) – Data matrix, shape
(n_samples, n_features).c (int) – Number of bins per dimension.
quantize (bool) – If True, bin continuous values with xyz.utils.quantize(); if False, treat Y as already quantized (values in 0..c-1).
log_base (str, optional) – Logarithm base; currently only "nat" is supported.
- Returns:
Estimated entropy in nats.
- Return type:
Examples
Used internally by discrete estimators; for continuous entropy prefer xyz.KSGEntropy or xyz.MVNEntropy.
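The plug-in histogram entropy underlying this function can be sketched as follows (an illustration, not the package's code):

```python
# Equal-width binning followed by the empirical Shannon entropy, in nats.
import numpy as np

def binned_entropy(y, c):
    edges = np.linspace(y.min(), y.max(), c + 1)[1:-1]  # c equal-width bins
    p = np.bincount(np.digitize(y, edges), minlength=c) / len(y)
    p = p[p > 0]                                        # drop empty bins
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(42)
y = rng.uniform(size=100_000)
h = binned_entropy(y, 8)   # uniform data: close to log(8)
```

For uniform data every bin gets probability ≈ 1/c, so the estimate approaches log(c), the maximum for c bins.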
xyz.utils#
- xyz.utils.quantize(y, c)[source]#
Bin continuous values into c equal-width bins.
Uses the range of y to define bin edges; returns 1-based bin indices (as from numpy.digitize()).
- Parameters:
y (array-like) – 1D array of continuous values.
c (int) – Number of bins.
- Returns:
Integer bin index for each element (1 to c).
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> from xyz.utils import quantize
>>> y = np.array([0.1, 0.5, 1.0, 1.5, 2.0])
>>> quantize(y, 3)
array([1, 1, 2, 2, 3])
- xyz.utils.cov(X)[source]#
Compute the sample covariance matrix, always as a 2D array.
- Parameters:
X (array-like) – Data, shape (n_samples,) or (n_samples, n_features). For 1D input, treated as a single feature.
- Returns:
Covariance matrix, shape (n_features, n_features) (at least 2D).
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> from xyz.utils import cov
>>> X = np.random.randn(100, 2)
>>> C = cov(X)
>>> C.shape
(2, 2)
- xyz.utils.buildvectors(Y, j, V=None)[source]#
Build observation matrix for entropy computation from lagged variables.
First column is the current target \(Y_{\cdot,j}\); subsequent columns are lagged variables specified by V (variable index, lag).
- Parameters:
Y (np.ndarray) – Multivariate time series, shape (n_samples, n_features) (N×M).
j (int) – Column index of the target variable (0-based).
V (np.ndarray or None, optional) – Candidate lags, shape (n_candidates, 2). Column 0: variable index (0-based); column 1: lag in samples. If None, returns only the target column.
- Returns:
Matrix with current target as first column and lagged variables as subsequent columns. Rows start at index \(L_{max}\) so all lags are valid.
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> from xyz.utils import buildvectors
>>> Y = np.random.randn(100, 3)
>>> # Target column 1, with lags: (var 0, lag 1), (var 2, lag 2)
>>> B = buildvectors(Y, 1, np.array([[0, 1], [2, 2]]))
>>> B.shape[1]
3
xyz.base#
- class xyz.base.InfoTheoryEstimator[source]#
Bases: BaseEstimator, ABC
Base class for all information-theoretic estimators in xyz.
Subclasses follow scikit-learn conventions: parameter-only __init__, fit(...) -> self, fitted attributes with a trailing underscore, and a stable score method that returns the primary scalar quantity (e.g. entropy, mutual information, transfer entropy).
Examples
Concrete estimators (e.g. xyz.KSGMutualInformation) are fitted on data and expose the estimate via score():
>>> import numpy as np
>>> from xyz import KSGMutualInformation
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(500, 1))
>>> y = 0.7 * X + 0.3 * rng.normal(size=(500, 1))
>>> est = KSGMutualInformation(k=3).fit(X, y)
>>> mi = est.score()
>>> np.isfinite(mi)
True
- class xyz.base.InfoTheoryMixin[source]#
Bases:
ABC
Mixin defining the minimal estimator protocol used across the package.
Requires fit(); score() may optionally refit when given data.
- abstractmethod fit(X, y=None)[source]#
Fit estimator-specific internal state from data.
- Parameters:
X (array-like) – Input data (e.g. driver/target time series or observations).
y (array-like or None, optional) – Optional second argument (e.g. target for MI, or conditioning).
- Returns:
Fitted estimator.
- Return type:
self
xyz.preprocessing#
- xyz.preprocessing.as_2d_array(X) ndarray[source]#
Coerce input to a 2D array with shape (n_samples, n_features).
- Parameters:
X (array-like) – 1D or 2D array.
- Returns:
Shape (n_samples, n_features); 1D input becomes (n, 1).
- Return type:
np.ndarray
- Raises:
ValueError – If X.ndim is not 1 or 2.
Examples
>>> import numpy as np
>>> from xyz.preprocessing import as_2d_array
>>> as_2d_array(np.array([1, 2, 3])).shape
(3, 1)
>>> as_2d_array(np.random.randn(10, 2)).shape
(10, 2)
- xyz.preprocessing.as_trial_array(X) ndarray[source]#
Coerce input to trial format (n_trials, n_samples, n_features).
- Parameters:
X (array-like) – 1D, 2D, or 3D array. 1D → (1, n, 1); 2D → (1, n_samples, n_features).
- Returns:
Shape (n_trials, n_samples, n_features).
- Return type:
np.ndarray
- Raises:
ValueError – If X.ndim is not 1, 2, or 3.
Examples
>>> import numpy as np
>>> from xyz.preprocessing import as_trial_array
>>> X = np.random.randn(50, 2)
>>> T = as_trial_array(X)
>>> T.shape
(1, 50, 2)
- xyz.preprocessing.iter_trials(X)[source]#
Iterate over trials, yielding arrays of shape (n_samples, n_features).
- Parameters:
X (array-like) – Data in any format accepted by as_trial_array().
- Yields:
np.ndarray – One array per trial.
Examples
>>> import numpy as np
>>> from xyz.preprocessing import iter_trials
>>> X = np.random.randn(2, 30, 2)  # 2 trials
>>> for trial in iter_trials(X):
...     print(trial.shape)
(30, 2)
(30, 2)
- xyz.preprocessing.estimate_autocorrelation_decay(x, max_lag: int = 1000, threshold: float | None = None) int[source]#
Estimate autocorrelation decay time (ACT) in samples.
Returns the first positive lag \(\tau\) at which the normalized autocorrelation \(r(\tau) \le\) threshold. Default threshold is \(e^{-1}\), a common proxy for the decay time.
- Parameters:
x (array-like) – 1D time series.
max_lag (int, optional) – Maximum lag to scan. Default is 1000.
threshold (float or None, optional) – Autocorrelation threshold. Default is None, which uses \(e^{-1}\).
- Returns:
Estimated ACT (samples).
- Return type:
int
Examples
>>> import numpy as np
>>> from xyz.preprocessing import estimate_autocorrelation_decay
>>> x = np.cumsum(np.random.randn(500))
>>> act = estimate_autocorrelation_decay(x, max_lag=100)
>>> 1 <= act <= 101
True
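The decay-time estimate described above can be sketched in plain NumPy. This is an illustrative simplification (no bias correction, no early termination tricks), and act_sketch is a hypothetical name, not the package API:

```python
import numpy as np

def act_sketch(x, max_lag=1000, threshold=None):
    """First positive lag where the normalized autocorrelation drops to threshold."""
    if threshold is None:
        threshold = np.exp(-1.0)                 # default: 1/e decay
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    for tau in range(1, min(max_lag, len(x) - 1) + 1):
        r = np.dot(x[:-tau], x[tau:]) / denom    # normalized autocorrelation r(tau)
        if r <= threshold:
            return tau
    return max_lag + 1                           # never decayed within max_lag

rng = np.random.default_rng(0)
act = act_sketch(np.cumsum(rng.normal(size=500)), max_lag=100)
```

For white noise the estimate is essentially 1 (the autocorrelation drops immediately), while a random walk decays slowly and may hit the max_lag cap.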
- xyz.preprocessing.estimate_trial_acts(X, target_index: int, max_lag: int = 1000) ndarray[source]#
Estimate autocorrelation decay time per trial for one target column.
- Parameters:
X (array-like) – Data in trial format (see as_trial_array()).
target_index (int) – Column index of the target variable.
max_lag (int, optional) – Maximum lag for estimate_autocorrelation_decay(). Default is 1000.
- Returns:
One ACT value per trial, shape (n_trials,).
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> from xyz.preprocessing import as_trial_array, estimate_trial_acts
>>> X = np.random.randn(3, 200, 2)
>>> acts = estimate_trial_acts(X, target_index=0, max_lag=50)
>>> acts.shape
(3,)
- xyz.preprocessing.select_trials_by_act(X, target_index: int, *, max_lag: int = 1000, act_threshold: int | None = None, min_trials: int = 1) tuple[ndarray, ndarray][source]#
Select trials whose autocorrelation decay time is below a threshold.
- Parameters:
X (array-like) – Data in trial format.
target_index (int) – Column index of the target for ACT estimation.
max_lag (int, optional) – Max lag for ACT. Default is 1000.
act_threshold (int or None, optional) – Keep only trials with ACT <= this. If None, no filtering (all trials returned).
min_trials (int, optional) – Minimum number of trials that must remain after filtering. Default is 1.
- Returns:
selected_trials (np.ndarray) – Subset of trials with ACT <= act_threshold (or all if act_threshold is None).
acts (np.ndarray) – ACT value for each original trial.
- Raises:
ValueError – If filtering would leave fewer than min_trials.
Examples
>>> import numpy as np
>>> from xyz.preprocessing import as_trial_array, select_trials_by_act
>>> X = np.random.randn(4, 300, 2)
>>> trials, acts = select_trials_by_act(X, 0, act_threshold=50, min_trials=2)
>>> trials.shape[0] <= 4 and len(acts) == 4
True
- xyz.preprocessing.build_te_observations(X, *, target_index: int, lags: int, tau: int = 1, delay: int = 1, driver_index: int | None = None, driver_indices: Iterable[int] | None = None, conditioning_indices: Iterable[int] | None = None, extra_conditioning: str | None = None) dict[str, ndarray][source]#
Build transfer-entropy state-space matrices from trial data.
Constructs present/past blocks for target, driver(s), and optional conditioning variables across trials. Used internally by TE estimators.
- Parameters:
X (array-like) – Data in trial format (n_trials, n_samples, n_features).
target_index (int) – Column index of the target.
lags (int) – Number of past lags (embedding dimension).
tau (int, optional) – Lag step (samples). Default is 1.
delay (int, optional) – Delay from driver to target (samples). Default is 1.
driver_index (int, optional) – Single driver column index.
driver_indices (iterable of int, optional) – Driver column indices.
conditioning_indices (iterable of int or None, optional) – Column indices for conditioning (e.g. for PTE).
extra_conditioning (str or None, optional) – If "Faes_Method" or "faes", include the current driver in the conditioning set.
- Returns:
Keys: "y_present", "y_past", "x_past", "z_past", "faes_current", "trial_ids". Values are concatenated over trials.
- Return type:
dict of str to np.ndarray
- Raises:
ValueError – If lags, tau, or delay < 1, or no valid samples remain.
Examples
>>> import numpy as np
>>> from xyz.preprocessing import build_te_observations, as_trial_array
>>> X = np.random.randn(1, 200, 2)
>>> out = build_te_observations(X, target_index=0, lags=2, driver_index=1)
>>> out["y_present"].shape[0] == out["x_past"].shape[0]
True
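As a rough sketch of what the present/past blocks contain for a single trial, under assumed index conventions (te_blocks is a hypothetical name; the real function additionally handles multiple trials, conditioning variables, and the Faes option):

```python
import numpy as np

def te_blocks(X, target_index, driver_index, lags, tau=1, delay=1):
    """Single-trial sketch of TE state-space blocks (illustrative only)."""
    y = X[:, target_index]
    x = X[:, driver_index]
    n = len(y)
    # First sample index where all lagged/delayed values exist
    start = max(lags * tau, delay + (lags - 1) * tau)
    t = np.arange(start, n)
    y_present = y[t]
    # Target past: columns y[t - tau], ..., y[t - lags*tau]
    y_past = np.column_stack([y[t - k * tau] for k in range(1, lags + 1)])
    # Driver past: columns x[t - delay], ..., x[t - delay - (lags-1)*tau]
    x_past = np.column_stack([x[t - delay - k * tau] for k in range(lags)])
    return {"y_present": y_present, "y_past": y_past, "x_past": x_past}

X = np.random.randn(200, 2)
out = te_blocks(X, target_index=0, driver_index=1, lags=2)
```

All blocks share the same number of rows (here 198: two initial samples are dropped so every lag is valid), which is what the shape check in the example above verifies.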
- xyz.preprocessing.ragwitz_prediction_error(x, *, dim: int, tau: int, k_neighbors: int = 4, theiler_t: int = 0, prediction_horizon: int = 1, metric: str = 'chebyshev') float[source]#
Ragwitz criterion: local prediction error for embedding (dim, tau).
Embeds the 1D series with dimension dim and spacing tau, finds \(k\) nearest neighbors in embedding space, and returns the mean squared error of predicting the future value from neighbors (Theiler window and metric as specified).
- Parameters:
x (array-like) – 1D time series.
dim (int) – Embedding dimension.
tau (int) – Embedding delay (samples).
k_neighbors (int, optional) – Number of neighbors for local prediction. Default is 4.
theiler_t (int, optional) – Theiler window (exclude neighbors within this time index). Default is 0.
prediction_horizon (int, optional) – Steps ahead to predict. Default is 1.
metric (str, optional) – Distance metric (e.g. "chebyshev", "euclidean"). Default is "chebyshev".
- Returns:
Mean squared prediction error.
- Return type:
float
- Raises:
ValueError – If dim, tau, or prediction_horizon < 1, or series too short.
Examples
>>> import numpy as np
>>> from xyz.preprocessing import ragwitz_prediction_error
>>> rng = np.random.default_rng(7)
>>> x = np.cumsum(rng.normal(size=300))
>>> err = ragwitz_prediction_error(x, dim=2, tau=1, k_neighbors=4)
>>> err >= 0
True
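The criterion can be sketched with a brute-force NumPy implementation: delay-embed the series, find each point's k nearest neighbors under the Chebyshev metric, predict the future value as the neighbor average, and report the MSE. This is illustrative only (no Theiler window), and ragwitz_sketch is a hypothetical name, not the package API:

```python
import numpy as np

def ragwitz_sketch(x, dim, tau, k=4, horizon=1):
    """Local-neighbor prediction error for embedding (dim, tau); Chebyshev metric."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (dim - 1) * tau - horizon
    # Delay-embedding vectors: E[i] = (x[i], x[i+tau], ..., x[i+(dim-1)*tau])
    E = np.column_stack([x[j * tau: j * tau + n_vec] for j in range(dim)])
    future = x[(dim - 1) * tau + horizon: (dim - 1) * tau + horizon + n_vec]
    # Pairwise Chebyshev distances (brute force)
    D = np.max(np.abs(E[:, None, :] - E[None, :, :]), axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude self-matches
    nn = np.argsort(D, axis=1)[:, :k]    # k nearest neighbors per point
    pred = future[nn].mean(axis=1)       # predict the future as neighbor average
    return float(np.mean((future - pred) ** 2))

rng = np.random.default_rng(7)
err = ragwitz_sketch(np.cumsum(rng.normal(size=300)), dim=2, tau=1)
```

The search in RagwitzEmbeddingSearchCV below simply evaluates this error over a grid of (dim, tau) pairs and keeps the minimizer.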
xyz.model_selection#
- class xyz.model_selection.RagwitzEmbeddingSearchCV(estimator, *, target_index: int, dimensions=(1, 2, 3), taus=(1, 2, 3), k_neighbors: int = 4, theiler_t: int = 0, prediction_horizon: int = 1, metric: str = 'chebyshev', act_threshold: int | None = None, max_act_lag: int = 1000, min_trials: int = 1, refit: bool = True, n_jobs: int | None = 1)[source]#
Bases:
MetaEstimatorMixin, BaseEstimator
Search embedding (dimension, tau) using the Ragwitz prediction-error criterion.
Evaluates (dim, tau) candidates via ragwitz_prediction_error() and selects the pair that minimizes mean prediction error across trials. Optionally filters trials by autocorrelation decay time (ACT).
- Parameters:
estimator (object) – TE estimator to tune (e.g. xyz.KSGTransferEntropy).
target_index (int) – Column index of the target variable.
dimensions (tuple of int, optional) – Embedding dimensions to try. Default is (1, 2, 3).
taus (tuple of int, optional) – Embedding delays (samples) to try. Default is (1, 2, 3).
k_neighbors (int, optional) – k for local prediction in Ragwitz criterion. Default is 4.
theiler_t (int, optional) – Theiler window. Default is 0.
prediction_horizon (int, optional) – Steps ahead for prediction. Default is 1.
metric (str, optional) – Distance metric. Default is "chebyshev".
act_threshold (int or None, optional) – If set, keep only trials with ACT <= this. Default is None.
max_act_lag (int, optional) – Max lag for ACT estimation. Default is 1000.
min_trials (int, optional) – Minimum trials after ACT filtering. Default is 1.
refit (bool, optional) – If True, fit best_estimator_ with best params. Default is True.
n_jobs (int or None, optional) – Parallel jobs. Default is 1.
- best_estimator_#
Fitted estimator with best params (if refit=True).
- Type:
estimator
Examples
>>> import numpy as np
>>> from xyz import KSGTransferEntropy, RagwitzEmbeddingSearchCV
>>> rng = np.random.default_rng(7)
>>> trials = []
>>> for _ in range(4):
...     driver = rng.normal(size=250)
...     target = np.zeros(250)
...     for t in range(3, 250):
...         target[t] = 0.55 * target[t-1] + 0.25 * target[t-3] + 0.2 * driver[t-1] + 0.1 * rng.normal()
...     trials.append(np.column_stack([target, driver]))
>>> X = np.stack(trials)
>>> search = RagwitzEmbeddingSearchCV(
...     KSGTransferEntropy(driver_indices=[1], target_indices=[0], k=3),
...     target_index=0, dimensions=(1, 2, 3), taus=(1, 2),
... ).fit(X)
>>> "lags" in search.best_params_ and "tau" in search.best_params_
True
- class xyz.model_selection.InteractionDelaySearchCV(estimator, *, delays, refit: bool = True, tie_break: str = 'smallest', n_jobs: int | None = 1)[source]#
Bases:
MetaEstimatorMixin, BaseEstimator
Search interaction delay for a TE estimator over a set of candidate delays.
Fits the estimator for each delay and selects the delay that maximizes the TE score (or minimizes, depending on estimator). Optionally refits the best estimator.
- Parameters:
estimator (object) – TE estimator with a delay parameter.
delays (array-like) – Candidate delay values (samples) to try.
refit (bool, optional) – If True, fit best_estimator_ with best delay. Default is True.
tie_break (str, optional) – "smallest" or "largest" when multiple delays tie. Default is "smallest".
n_jobs (int or None, optional) – Parallel jobs. Default is 1.
- best_estimator_#
Fitted estimator at best delay (if refit=True).
- Type:
estimator
- class xyz.model_selection.EnsembleTransferEntropy(estimator)[source]#
Bases:
MetaEstimatorMixin, BaseEstimator
Wrapper that fits a TE estimator on multi-trial data.
Passes trial-shaped data to the underlying estimator so it can respect trial boundaries (e.g. for KSG within-trial neighbor search).
- Parameters:
estimator (object) – TE estimator with fit(X, y=None) and score().
- estimator_#
Fitted clone of the wrapped estimator.
- Type:
estimator
Examples
>>> import numpy as np
>>> from xyz import EnsembleTransferEntropy, KSGTransferEntropy
>>> X = np.random.randn(3, 200, 2)  # 3 trials
>>> meta = EnsembleTransferEntropy(
...     KSGTransferEntropy(driver_indices=[1], target_indices=[0], lags=1),
... ).fit(X)
>>> np.isfinite(meta.score())
True
- class xyz.model_selection.GroupTEAnalysis(estimator, *, target_index: int, dimensions=(1, 2, 3), taus=(1, 2, 3), aggregation: str = 'mean')[source]#
Bases:
MetaEstimatorMixin, BaseEstimator
Group-level TE: Ragwitz search per subject, then common embedding and aggregate.
For each dataset in datasets, runs RagwitzEmbeddingSearchCV to find the best (lags, tau). Then takes the maximum dimension and tau across subjects, refits each subject with that common embedding, and aggregates scores (mean or median).
- Parameters:
estimator (object) – TE estimator to use (e.g. xyz.KSGTransferEntropy).
target_index (int) – Column index of the target.
dimensions (tuple of int, optional) – Ragwitz dimension candidates. Default is (1, 2, 3).
taus (tuple of int, optional) – Ragwitz tau candidates. Default is (1, 2, 3).
aggregation (str, optional) – "mean" or "median" for group score. Default is "mean".
- subject_scores_#
TE score per subject.
- Type:
np.ndarray
- set_fit_request(*, datasets: bool | None | str = '$UNCHANGED$') GroupTEAnalysis#
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class xyz.model_selection.GreedySourceSelectionTransferEntropy(estimator, *, candidate_sources, max_sources: int | None = None, min_improvement: float = 0.0, n_jobs: int | None = 1, refit: bool = True)[source]#
Bases:
MetaEstimatorMixin, BaseEstimator
Greedy forward selection of driver sources for partial TE.
Starts from the estimator’s existing conditioning set and adds driver sources one at a time from candidate_sources, keeping the one that increases the TE score most. Stops when no improvement remains or max_sources is reached.
- Parameters:
estimator (object) – Partial TE estimator with driver_indices and conditioning_indices.
candidate_sources (array-like) – Column indices of candidate driver sources.
max_sources (int or None, optional) – Maximum number of sources to add. None = no limit. Default is None.
min_improvement (float, optional) – Stop if improvement is <= this. Default is 0.0.
n_jobs (int or None, optional) – Parallel jobs for evaluating candidate sets. Default is 1.
refit (bool, optional) – If True, best_estimator_ is fitted with selected sources. Default is True.
- best_estimator_#
Fitted estimator with selected sources (if refit=True).
- Type:
estimator
xyz.stats#
- xyz.stats.fdr_bh(p_values, alpha: float = 0.05) ndarray[source]#
Benjamini–Hochberg false-discovery-rate correction.
Rejects hypotheses with p-value \(\le\) the adaptive threshold so that the expected FDR is controlled at level alpha.
- Parameters:
p_values (array-like) – P-values (any shape).
alpha (float, optional) – Target FDR level. Default is 0.05.
- Returns:
Boolean array of same shape as p_values: True where the null is rejected.
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> from xyz.stats import fdr_bh
>>> p = np.array([0.001, 0.02, 0.03, 0.15])
>>> fdr_bh(p, alpha=0.05)
array([ True,  True,  True, False])
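The BH procedure itself is short: sort the p-values, find the largest rank \(k\) with \(p_{(k)} \le \frac{k}{m}\alpha\), and reject everything up to \(p_{(k)}\). A plain-NumPy sketch (illustrative, not the package implementation; fdr_bh_sketch is a hypothetical name):

```python
import numpy as np

def fdr_bh_sketch(p, alpha=0.05):
    """Benjamini–Hochberg: reject all p <= p_(k) for the largest rank k
    satisfying p_(k) <= (k/m) * alpha. Illustrative sketch only."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p.ravel())
    ranked = p.ravel()[order]
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest qualifying rank (0-based)
        reject[order[: k + 1]] = True      # reject everything up to p_(k)
    return reject.reshape(p.shape)

mask = fdr_bh_sketch(np.array([0.001, 0.01, 0.03, 0.2]), alpha=0.05)
# → [True, True, True, False]: 0.03 <= (3/4)*0.05 = 0.0375, so k = 3
```

Note the "step-up" behavior: a p-value above its own threshold can still be rejected if a larger-ranked p-value qualifies.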
- xyz.stats.bonferroni(p_values, alpha: float = 0.05) ndarray[source]#
Bonferroni correction for multiple testing.
Rejects where \(p_i \le \alpha / m\) with \(m\) the number of tests. Controls family-wise error rate at level alpha.
- Parameters:
p_values (array-like) – P-values (any shape).
alpha (float, optional) – Family-wise error rate. Default is 0.05.
- Returns:
Boolean array: True where the null is rejected.
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> from xyz.stats import bonferroni
>>> p = np.array([0.001, 0.02, 0.04])
>>> bonferroni(p, alpha=0.05)
array([ True, False, False])
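Unlike BH, the Bonferroni rule is a single vectorized comparison; a one-line sketch (illustrative, bonferroni_sketch is a hypothetical name):

```python
import numpy as np

def bonferroni_sketch(p, alpha=0.05):
    """Reject where p_i <= alpha / m, with m the number of tests (sketch)."""
    p = np.asarray(p, dtype=float)
    return p <= alpha / p.size

# With m = 3 tests, the per-test threshold is 0.05 / 3 ≈ 0.0167
mask = bonferroni_sketch(np.array([0.001, 0.02, 0.04]), alpha=0.05)
```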
- xyz.stats.generate_surrogates(X, *, method: str = 'trial_shuffle', n_surrogates: int = 100, block_length: int | None = None, random_state=None, driver_index: int | None = None) list[ndarray][source]#
Generate surrogate datasets for transfer-entropy null testing.
Surrogates break the driver–target relationship while preserving marginal structure. Used with SurrogatePermutationTest to assess significance.
- Parameters:
X (array-like) – Data, shape (n_trials, n_samples, n_features) or equivalent (see xyz.preprocessing.as_trial_array()).
method (str, optional) – One of "trial_shuffle" (shuffle driver across trials), "block_resample", "block_reverse", "swap_neighbors", "time_shift". Default is "trial_shuffle".
n_surrogates (int, optional) – Number of surrogate datasets. Default is 100.
block_length (int or None, optional) – Used by block_* and time_shift methods. Default is None.
random_state (int, array-like or None, optional) – Random seed or generator.
driver_index (int or None, optional) – Column index of the driver variable (for methods that permute the driver). Default is 0 if None.
- Returns:
List of surrogate arrays; each has the same shape as the trial representation of X.
- Return type:
list of np.ndarray
Examples
>>> import numpy as np
>>> from xyz import generate_surrogates
>>> rng = np.random.default_rng(5)
>>> X = rng.normal(size=(3, 40, 2))
>>> surrogates = generate_surrogates(X, method="trial_shuffle", n_surrogates=5, random_state=0, driver_index=1)
>>> len(surrogates)
5
>>> surrogates[0].shape == X.shape
True
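The idea behind "trial_shuffle" can be sketched directly: permute the driver column across trials while leaving the target untouched, so within-trial driver structure survives but the driver–target pairing is destroyed. An illustrative sketch only (trial_shuffle_sketch is a hypothetical name, not the package function):

```python
import numpy as np

def trial_shuffle_sketch(X, driver_index, n_surrogates=5, seed=0):
    """Permute the driver column across trials, leaving the target intact."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_surrogates):
        S = X.copy()
        perm = rng.permutation(X.shape[0])             # reassign driver trials
        S[:, :, driver_index] = X[perm, :, driver_index]
        out.append(S)
    return out

rng = np.random.default_rng(5)
X = rng.normal(size=(3, 40, 2))
surrogates = trial_shuffle_sketch(X, driver_index=1)
```

Each surrogate keeps the original shape and marginals; only the trial-to-trial alignment between driver and target changes.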
- class xyz.stats.BootstrapEstimate(estimator, *, n_bootstrap: int = 100, method: str = 'iid', block_length: int | None = None, ci: float = 0.95, random_state=None, n_jobs: int | None = 1)[source]#
Bases:
MetaEstimatorMixin, BaseEstimator
Bootstrap confidence intervals for information-theoretic estimators.
Fits the wrapped estimator on the original data and on bootstrap resamples to obtain a distribution of the score and a confidence interval.
- Parameters:
estimator (object) – An xyz estimator with fit and score (e.g. transfer entropy, mutual information).
n_bootstrap (int, optional) – Number of bootstrap samples. Default is 100.
method (str, optional) – Resampling: "iid" (sample rows with replacement), "trial" (resample trials), "block" (block bootstrap within trials). Default is "iid".
block_length (int or None, optional) – Block length for method="block". Default is None (auto).
ci (float, optional) – Confidence level for the interval (e.g. 0.95). Default is 0.95.
random_state (int, array-like or None, optional) – Random seed or generator.
n_jobs (int or None, optional) – Number of parallel jobs. Default is 1.
- ci_low_, ci_high_
Lower and upper bounds of the confidence interval.
- Type:
float
- bootstrap_distribution_#
Bootstrap scores.
- Type:
np.ndarray
Examples
>>> import numpy as np
>>> from xyz import BootstrapEstimate, GaussianCopulaMutualInformation
>>> rng = np.random.default_rng(404)
>>> x = rng.normal(size=(500, 1))
>>> y = 0.6 * x + 0.3 * rng.normal(size=(500, 1))
>>> bootstrap = BootstrapEstimate(
...     GaussianCopulaMutualInformation(),
...     n_bootstrap=24, method="iid", random_state=0, n_jobs=2,
... ).fit(x, y)
>>> bootstrap.ci_low_ <= bootstrap.estimate_ <= bootstrap.ci_high_
True
- class xyz.stats.SurrogatePermutationTest(estimator, *, n_permutations: int = 100, surrogate_method: str = 'trial_shuffle', alpha: float = 0.05, correction: str = 'fdr_bh', shift_test: bool = False, shift_method: str = 'time_shift', random_state=None, n_jobs: int | None = 1)[source]#
Bases:
MetaEstimatorMixin, BaseEstimator
Permutation-based significance testing for transfer-entropy estimators.
Fits the estimator on the observed data and on surrogate data (driver shuffled/perturbed) to build a null distribution and compute a p-value.
- Parameters:
estimator (object) – TE estimator with fit and score (e.g. xyz.KSGTransferEntropy).
n_permutations (int, optional) – Number of surrogates. Default is 100.
surrogate_method (str, optional) – Method passed to generate_surrogates(). Default is "trial_shuffle".
alpha (float, optional) – Significance level. Default is 0.05.
correction (str or None, optional) – Multiple-test correction: "fdr_bh", "bonferroni", or "none". Default is "fdr_bh".
shift_test (bool, optional) – If True, also run a time-shift test. Default is False.
shift_method (str, optional) – Surrogate method for the shift test. Default is "time_shift".
random_state (int, array-like or None, optional) – Random seed or generator.
n_jobs (int or None, optional) – Number of parallel jobs. Default is 1.
- null_distribution_#
Scores on surrogates.
- Type:
np.ndarray
- p_values_#
P-value(s).
- Type:
np.ndarray
Examples
>>> import numpy as np
>>> from xyz import SurrogatePermutationTest, KSGTransferEntropy
>>> rng = np.random.default_rng(42)
>>> trials = []
>>> for _ in range(4):
...     driver = rng.normal(size=120)
...     target = np.zeros(120)
...     for t in range(1, 120):
...         target[t] = 0.4 * target[t-1] + 0.5 * driver[t-1] + 0.1 * rng.normal()
...     trials.append(np.column_stack([target, driver]))
>>> X = np.stack(trials)
>>> test = SurrogatePermutationTest(
...     KSGTransferEntropy(driver_indices=[1], target_indices=[0], lags=1, k=3),
...     n_permutations=12, surrogate_method="trial_shuffle", alpha=0.1, random_state=0,
... ).fit(X)
>>> np.isfinite(test.observed_score_)
True