Model selection workflow
========================

This page demonstrates how the sklearn-style meta-estimators in ``xyz`` can be
used to search embedding settings and interaction delays before running a
final TE analysis.

Why this workflow exists
------------------------

TRENTOOL-style TE analysis is not only about the low-level estimator. It also
depends on:

- choosing a sensible embedding dimension,
- choosing an embedding spacing,
- and choosing a plausible interaction delay.

The ``xyz`` search classes make these choices explicit and reproducible in a
Pythonic, scikit-learn-like form.

Example: embedding and delay search
-----------------------------------

.. code-block:: python

    import numpy as np

    from xyz import (
        GaussianTransferEntropy,
        InteractionDelaySearchCV,
        RagwitzEmbeddingSearchCV,
    )

    rng = np.random.default_rng(123)
    n = 700
    driver = rng.normal(size=n)
    target = np.zeros(n)
    for t in range(2, n):
        target[t] = (
            0.45 * target[t - 1]
            + 0.20 * target[t - 2]
            + 0.35 * driver[t - 2]
            + 0.1 * rng.normal()
        )
    data = np.column_stack([target, driver])

    base = GaussianTransferEntropy(driver_indices=[1], target_indices=[0], lags=1)

    embedding = RagwitzEmbeddingSearchCV(
        base,
        target_index=0,
        dimensions=(1, 2, 3),
        taus=(1, 2, 3),
    ).fit(data)

    delay = InteractionDelaySearchCV(
        base.set_params(**embedding.best_params_),
        delays=(1, 2, 3, 4, 5),
    ).fit(data)

    print(embedding.best_params_, embedding.best_score_)
    print(delay.best_delay_, delay.best_score_)

Interactive example
-------------------

The two figures below show:

1. a heatmap of the Ragwitz-style embedding search surface,
2. a delay profile after fixing the best embedding.

.. plotly-exec::

    import numpy as np
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    from xyz import (
        GaussianTransferEntropy,
        InteractionDelaySearchCV,
        RagwitzEmbeddingSearchCV,
    )

    rng = np.random.default_rng(22)
    n = 800
    true_delay = 3
    driver = rng.normal(size=n)
    target = np.zeros(n)
    for t in range(true_delay, n):
        target[t] = (
            0.45 * target[t - 1]
            + 0.18 * target[t - 2]
            + 0.30 * driver[t - true_delay]
            + 0.10 * rng.normal()
        )
    data = np.column_stack([target, driver])

    base = GaussianTransferEntropy(driver_indices=[1], target_indices=[0], lags=1)

    dimensions = (1, 2, 3, 4)
    taus = (1, 2, 3)
    embedding = RagwitzEmbeddingSearchCV(
        base,
        target_index=0,
        dimensions=dimensions,
        taus=taus,
    ).fit(data)
    error_grid = np.asarray(
        embedding.cv_results_["mean_prediction_error"]
    ).reshape(len(dimensions), len(taus))

    delay = InteractionDelaySearchCV(
        base.set_params(**embedding.best_params_),
        delays=(1, 2, 3, 4, 5, 6),
    ).fit(data)
    delay_x = delay.te_by_delay_[:, 0]
    delay_y = delay.te_by_delay_[:, 1]

    fig = make_subplots(
        rows=1,
        cols=2,
        subplot_titles=("Embedding search surface", "Delay reconstruction"),
    )
    fig.add_trace(
        go.Heatmap(
            x=list(taus),
            y=list(dimensions),
            z=error_grid,
            colorbar=dict(title="Prediction error"),
            hovertemplate="tau=%{x}<br>lags=%{y}<br>error=%{z:.4f}",
        ),
        row=1,
        col=1,
    )
    fig.add_trace(
        go.Scatter(
            x=delay_x,
            y=delay_y,
            mode="lines+markers",
            name="TE by delay",
        ),
        row=1,
        col=2,
    )
    fig.add_vline(
        x=true_delay,
        line_dash="dash",
        annotation_text="True delay",
        row=1,
        col=2,
    )
    fig.update_xaxes(title_text="tau", row=1, col=1)
    fig.update_yaxes(title_text="lags", row=1, col=1)
    fig.update_xaxes(title_text="delay", row=1, col=2)
    fig.update_yaxes(title_text="transfer entropy (nats)", row=1, col=2)
    fig.update_layout(
        title="Model selection workflow for transfer entropy",
        template="plotly_white",
        height=460,
        margin=dict(l=40, r=20, t=70, b=40),
    )

Interpretation
--------------

- A smooth embedding surface is usually easier to trust than a highly erratic
  one.
- Delay reconstruction is most convincing when the TE profile has a clear and
  interpretable maximum.
- In real data, do not rely on model selection alone; combine it with
  significance testing and domain knowledge.