Discrete (binning) estimators ============================= The discrete family is intended for symbolic processes or for continuous data that you deliberately quantize into a small number of states. Implemented classes ------------------- - ``xyz.DiscreteTransferEntropy`` - ``xyz.DiscretePartialTransferEntropy`` - ``xyz.DiscreteSelfEntropy`` Mathematics ----------- These estimators compute empirical probabilities from counts of repeated states in embedded observation matrices. If :math:`Y_t` is the current target state and :math:`Y_t^-` is its embedded past, then: .. math:: TE_{X \to Y} = H(Y_t \mid Y_t^-) - H(Y_t \mid Y_t^-, X_t^-), .. math:: PTE_{X \to Y \mid Z} = H(Y_t \mid Y_t^-, Z_t^-) - H(Y_t \mid Y_t^-, X_t^-, Z_t^-), .. math:: SE_Y = H(Y_t) - H(Y_t \mid Y_t^-). Each entropy term is evaluated from empirical frequencies: .. math:: \hat{H}(Y) = -\sum_y \hat{p}(y)\log \hat{p}(y). Quantization ------------ When ``quantize=True``, ``xyz`` applies MATLAB-compatible uniform quantization with ``c`` bins before counting discrete states. This is useful for ITS-style parity and for exploratory symbolic analysis, but it also introduces a modeling choice: the result now depends on the quantization scheme. Why use discrete estimators --------------------------- - They are natural for genuinely discrete state spaces. - They are easy to interpret because they reduce everything to frequency tables. - They are often a useful pedagogical baseline for understanding TE and PTE. When to use them ---------------- Use the discrete family when: - your data are already categorical or symbolic, - you want to compare multiple coarse quantizations of a continuous process, - or you want a transparent state-counting baseline before moving to KSG or Gaussian estimators. Typical use cases ----------------- - Symbolic dynamics and regime switching. - Discretized market states, such as up/flat/down returns. - Binned neural or physiological activity states. How to use them --------------- .. code-block:: python import numpy as np from xyz import ( DiscretePartialTransferEntropy, DiscreteSelfEntropy, DiscreteTransferEntropy, ) data = np.random.randn(2000, 3) te = DiscreteTransferEntropy( driver_indices=[0], target_indices=[1], lags=1, c=8, quantize=True, ).fit(data) pte = DiscretePartialTransferEntropy( driver_indices=[0], target_indices=[1], conditioning_indices=[2], lags=1, c=8, ).fit(data) se = DiscreteSelfEntropy(target_indices=[1], lags=2, c=8).fit(data) print(te.transfer_entropy_) print(pte.transfer_entropy_) print(se.self_entropy_) Practical advice ---------------- - ``c`` too small merges distinct states and may underfit. - ``c`` too large creates sparse tables and unstable estimates. - If estimates change dramatically with the number of bins, report that sensitivity rather than hiding it. - In higher-dimensional embeddings, the discrete state space grows quickly, so KSG or Gaussian estimators may become more reliable. Interactive example ------------------- The plot below shows discrete TE as a function of the number of quantization bins in a synthetic lagged system. This is a useful diagnostic because an estimate that is only present for one narrow bin count is often not robust. .. plotly-exec:: import numpy as np import plotly.graph_objects as go from xyz import DiscreteTransferEntropy rng = np.random.default_rng(16) n = 900 driver = rng.normal(size=n) target = np.zeros(n) for t in range(1, n): target[t] = 0.55 * target[t - 1] + 0.40 * driver[t - 1] + 0.10 * rng.normal() data = np.column_stack([target, driver]) bins = [3, 4, 5, 6, 8, 10, 12] te_vals = [] for c in bins: est = DiscreteTransferEntropy( driver_indices=[1], target_indices=[0], lags=1, c=c, quantize=True, ).fit(data) te_vals.append(est.transfer_entropy_) fig = go.Figure() fig.add_trace( go.Bar( x=bins, y=te_vals, name="Discrete TE", ) ) fig.update_layout( title="Discrete transfer entropy across quantization granularities", xaxis_title="Number of bins c", yaxis_title="Transfer entropy (nats)", template="plotly_white", height=420, margin=dict(l=40, r=20, t=60, b=40), )