ELICIT Validation Program

Methodology & Significance

How the operator is defined, how it was tested, and what the results mean

The Operator

The Cx operator measures cohesive self-referential integration of information in any multi-channel time-series system. It is computed as the product of two components:

$$\mathrm{Cx} = \Phi \times C^2$$

where $\Phi$ measures differentiation (how many independent dimensions of information a system maintains) and $C^2$ measures relational stability (how consistently those dimensions relate to each other across time).

$\Phi$ — Differentiation (Normalized Eigenvalue Entropy)

For a window $W$ with $n$ signal channels, compute the cross-correlation matrix $R$. Extract eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and normalize:

$$p_i = \frac{\lambda_i}{\sum_j \lambda_j}, \qquad H = -\sum_i p_i \log p_i, \qquad \Phi = \frac{H}{\log n} \in [0,1]$$

High $\Phi$ means eigenvalues are distributed uniformly — signal dimensions are equally informative, indicating rich differentiation. Low $\Phi$ means one eigenvalue dominates — one mode drives the system. Grounded in Cover & Thomas (1991, Ch. 2, 16): eigenvalue entropy as a measure of distributional uniformity in information theory.

$C^2$ — Relational Coherence (Inverse Frobenius Distance)

$$C^2(W_t) = \frac{1}{(1 + \|R_t - R_{t-1}\|_F)^2}$$

where $\|\cdot\|_F$ is the Frobenius norm. High $C^2$ means the relational structure between channels is stable across consecutive windows — the system is coherent. Low $C^2$ means relational structure is shifting — the system is losing coherence. The squared form derives from the NPR two-frame geometry requiring a squared coherence term analogous to $c^2$ in spacetime.

Signal Requirements

The operator requires a minimum of 3 channels (4–6 recommended) that are spatially or functionally distinct measurements of the same system. Single-source derived statistics (e.g., mean, std, RMSSD all computed from one RR-interval series) are invalid — they cannot generate the cross-channel structure the operator measures. This is why the MIT-BIH cardiac HRV analysis produced a null result: it violated the multi-channel independence requirement.

Windowing Protocol

Window size is domain-dependent, matched to the characteristic timescale of each system. Step size is typically 1/3 to 1/2 of window length. All windows require ≥70% valid (non-NaN) observations.

Domain	Substrate	Window	Channels	N
EEG Seizure	Electromagnetic	10 seconds	8 (10-20 scalp)	4,574
Ocean Currents	Liquid fluid	Variable	4 (depth layers)	7,103
Financial Markets	Information	20–60 trading days	7 (sector ETFs)	852
Jet Stream	Neutral gas	30 days	5 (latitude bands)	204
Seismic Network	Solid elastic	1–6 hours	6 (stations)	239
Solar Wind	Plasma	72 hours	4 (B-field)	65

What the p-values Mean

A p-value measures the probability of observing results at least as extreme as those measured, assuming the null hypothesis (no real effect) is true. Convention:

$p < .05$ — statistically significant
$p < .01$ — strong evidence
$p < .001$ — very strong evidence

These are the filed p-values, computed against a parametric null that assumes independent, white-noise channels — too generous for signals carrying red-noise structure. Re-tested against spectrum-matched (Ebisuzaki phase-randomized) surrogate nulls, the filed values recalibrate upward by roughly 7–8×, and no domain clears $p < .001$ after calibration. The jet stream remains the strongest — it survives at about the 1% level on both p-value and effect size. The solar wind survives only marginally. The financial-markets result does not survive a block-permutation null ($p = 0.19$) and has been retired from the confirmed set. We therefore make no combined-probability claim; we report each domain’s calibrated status and the honest nulls, and we lead with the recalibrated numbers.

What We Did Not Do

No parameters were adjusted between domains. The identical operator — same code, same $\Phi$ and $C^2$ definitions — was applied to each dataset independently. No domain-specific tuning, no post-hoc correction, no feature engineering. The operator receives a matrix of time-aligned numerical channels and returns a scalar. It does not know what it is measuring.

This rules out overfitting. An operator tuned to detect seizures in EEG data has no reason to also detect atmospheric blocking in jet stream data or pre-crash signatures in financial markets — unless the mathematical structure it measures is genuinely domain-general.

Information-Theoretic Foundations

The operator’s design is grounded in formal information theory (Cover & Thomas, 1991):

Data Processing Inequality

Derived Cx cannot artifactually contain more information about regime state than the raw data. Any structure the operator finds was present in the original signal.

Entropy Rate & Conditional Structure

$C^2$ measures the conditional entropy rate of the correlation structure — how much new relational information each window adds. Stable systems have low conditional entropy; transitioning systems have high conditional entropy.

Stein’s Lemma & Error Bounds

Type II error probability decays exponentially with exponent $D(p\|q)$ (KL divergence), governing how reliably the operator distinguishes regime states. Sanov’s theorem bounds the probability of empirical misclassification at approximately $2^{-n \cdot D(p\|q)}$.

Calibrated Confidence

What we claim and what we don’t

Statistical significance tells you the results are not random. It does not tell you the theory is correct. We maintain calibrated confidence levels throughout our work:

94% confidence in internal coherence — the framework is logically self-consistent.

83% confidence in physics compatibility — the framework does not contradict established physics.

~45% confidence in literal truth — the framework describes reality as it actually is.

These numbers are published in every paper. The gap between 94% internal coherence and 45% literal truth is where the real scientific work remains to be done.

Reproducibility

Every dataset is publicly available. The complete validation runs on a standard laptop in under two hours. Source datasets: NASA OMNI2 (CDAWeb), CHB-MIT Scalp EEG (PhysioNet), NOAA NCEP/NCAR Reanalysis (PSL), SCEDC CI Network, SPDR Sector ETFs (public market data), Argo Float Program (Argo GDAC). No proprietary data. No special hardware.

See the six confirmed domains → · The mathematics → · Research hub →