Workflows and meta-estimators#

Beyond single-call estimators, xyz provides meta-estimators and workflows for model selection, uncertainty quantification, and multivariate source selection.

Bootstrap confidence intervals#

BootstrapEstimate wraps any estimator and returns a point estimate plus a bootstrap distribution and confidence interval.

  • method "iid": resample rows with replacement (suitable for non–time-series or when trials are exchangeable).

  • method "trial": resample whole trials with replacement (multi-trial data); requires at least two trials.

  • method "block": block bootstrap along time within each trial; use block_length to control block size.

Fitted attributes: estimate_, bootstrap_distribution_, ci_low_, ci_high_, standard_error_. Use n_jobs to parallelize bootstrap replicates.

Example:

from xyz import BootstrapEstimate, GaussianTransferEntropy

bootstrap = BootstrapEstimate(
    GaussianTransferEntropy(driver_indices=[1], target_indices=[0], lags=1),
    n_bootstrap=200,
    method="trial",
    ci=0.95,
    n_jobs=2,
    random_state=0,
).fit(data)
print(bootstrap.estimate_, bootstrap.ci_low_, bootstrap.ci_high_)

Greedy source selection (multivariate TE)#

GreedySourceSelectionTransferEntropy performs forward selection of driver variables using a partial transfer entropy estimator. You provide a base estimator with driver_indices and conditioning_indices, a list of candidate_sources (column indices), and optional max_sources and min_improvement. The meta-estimator repeatedly adds the source that most improves the TE score, and stops when no candidate adds at least min_improvement.

Fitted attributes: selected_sources_, best_estimator_, best_score_, selection_history_. Supports n_jobs for evaluating candidate sets in parallel.

Example:

from xyz import GaussianPartialTransferEntropy, GreedySourceSelectionTransferEntropy

selector = GreedySourceSelectionTransferEntropy(
    GaussianPartialTransferEntropy(
        driver_indices=[1],
        target_indices=[0],
        conditioning_indices=[],
        lags=1,
    ),
    candidate_sources=[1, 2, 3],
    max_sources=3,
    min_improvement=0.01,
).fit(data)
# selector.selected_sources_ = [1, 2]  # chosen drivers
# selector.best_estimator_.driver_indices == [1, 2]

Multivariate drivers in TE estimators#

All time-series TE estimators accept driver_indices as a list of column indices. When multiple indices are given, the embedded driver past is the concatenation of the embedded pasts of each driver variable (same lags, tau, delay for all). This allows a single TE model to include several source variables without running greedy selection.

Example:

from xyz import GaussianTransferEntropy

# data columns: 0=target, 1=driver1, 2=driver2, 3=noise
te = GaussianTransferEntropy(
    driver_indices=[1, 2],
    target_indices=[0],
    lags=1,
).fit(data)

Parallelization (n_jobs)#

The following support an n_jobs parameter for parallel execution (default 1; use -1 for all cores with joblib):

  • RagwitzEmbeddingSearchCV: parallel over (dimension, tau) candidates.

  • InteractionDelaySearchCV: parallel over delay candidates.

  • SurrogatePermutationTest: parallel over surrogate fits.

  • BootstrapEstimate: parallel over bootstrap replicates.

  • GreedySourceSelectionTransferEntropy: parallel over candidate source sets.

Results are deterministic for a fixed random_state when applicable.