Authors: James Tannahill, Plocamium Holdings
Status: Draft v1.0 (2026-06-11). Results current as of the 2026-06-07 fit_study/validation runs and the 2026-06-08 LF-1 anchored test.

Abstract: We present the Plocamium Signal Index (PSI), a composite z-score intelligence system that characterizes narrative structure and models entity-relationship formation in the healthcare sector. PSI combines five orthogonal signals: Attention Cascade Intensity (Hawkes process), Narrative Embedding Drift (Titan cosine distance), Graph Spectral Shift (Laplacian eigenvalues), Sentiment-Momentum Divergence (OLS residual), and Source Concentration (HHI). These pass through a six-layer processing pipeline that includes PCA orthogonalization, Bayesian Online Changepoint Detection, inverse-variance weighting, and Kalman smoothing. PSI's link-prediction layer uses a reproducible eight-model ensemble (Jaccard, Adamic-Adar, Common Neighbors, Preferential Attachment, Logistic Regression, GCN, Node2Vec, VGAE) with auto-tuned weights. Cross-validated ROC-AUC and the corresponding train-vs-holdout fit gap are reported in Section 6 from the 2026-06-07 fit_study run (ensemble holdout AUC 0.745, fit-gap 0.078, in the OVERFIT band). Leave-one-out ablation shows that the production link-prediction ensemble is predictively concentrated in a single model (Node2Vec, ~99% of contribution); the eight-model framing is a reproducibility and robustness property rather than a performance one. PSI is deployed in production on a ~400-entity healthcare co-occurrence graph (150-entity pruned display subset) fed by ~140 curated RSS feeds across 12 sector lanes plus API news sources. We report a deliberately honest evaluation. The only internally significant relationship is that the composite is a forward-attention contra-indicator (Section 6.3, permutation p-value reported as 0.0), but that outcome is endogenous to the coverage stream and is not external validation. On the one target with a clean external anchor (SEC-filing-dated deal events, Section 6.4), PSI does not lead events better than a naive mention-spike baseline (hit rate 10.8% vs 49.4%). PSI's demonstrated value is thus structural enrichment, not event forecasting; this paper documents both what is and is not supported, per a pre-registered protocol.

Keywords: healthcare intelligence, entity networks, link prediction, graph neural networks, knowledge graph embeddings, Bayesian changepoint detection, Kalman filtering, composite indices


1. Introduction

The healthcare sector generates thousands of news signals daily across regulatory actions, M&A transactions, clinical trial outcomes, executive movements, and policy changes. Traditional intelligence systems process these signals reactively — Bloomberg reports the deal, Reuters covers the announcement, sentiment scores measure the reaction.

We propose the Plocamium Signal Index (PSI), an intelligence system that characterizes the information structure around healthcare entities. PSI was designed on the hypothesis that events in healthcare M&A, regulatory action, and strategic partnerships are preceded by observable changes in information flow — attention cascades, narrative shifts, graph restructuring, sentiment-momentum divergence, source concentration — days to weeks before formal announcements. That forecasting hypothesis was tested externally and did not hold (§6.4: on SEC-filing-dated deal events, PSI does not beat a naive mention-spike baseline). PSI's demonstrated value is therefore as a structural and enrichment system — quantifying and explaining an entity's information profile — rather than as an event forecaster. This paper reports that distinction honestly; §6 documents both what is and is not supported.

1.1 Contributions

  1. A framework of five orthogonal signals combining self-exciting processes (Hawkes), embedding drift (Titan), spectral analysis (Laplacian), regression residuals (OLS), and concentration metrics (HHI) for healthcare entity monitoring
  2. Six-layer processing pipeline with PCA orthogonalization, BOCPD regime detection, inverse-variance weighting, Kalman smoothing, James-Stein cold-start handling, and percentile calibration
  3. Reproducible eight-model link-prediction ensemble (TransE excluded for non-reproducibility) with auto-tuned weights and leave-one-out ablation; headline AUC + fit-gap measured by the monthly fit_study Fargate job
  4. Diagnostic-aligned production ensemble: eight models (Jaccard, Adamic-Adar, Common Neighbors, Preferential Attachment, Logistic Regression, GCN, Node2Vec, VGAE), Dirichlet-search weight optimizer, leave-one-out ablation reporting marginal contribution per model — deployed scoring matches the fit_study diagnostic exactly
  5. Production deployment on a live ~400-entity healthcare graph (150-entity display subset) processing ~140 real-time feeds with SEC EDGAR integration
  6. Leave-one-out ablation quantifying each model's marginal contribution — and honestly surfacing that the production ensemble is predictively one-model (Node2Vec)

2. Related Work

2.1 Temporal Point Processes for Event Detection

Hawkes (1971) introduced self-exciting point processes, which have since been widely applied to earthquake modeling. We adapt the Hawkes framework to model entity mention arrivals, where each mention increases the probability of subsequent mentions (self-excitation). Our contribution is using the residual between observed and expected intensity as a surprise signal: entities with positive residuals are receiving more attention than their own momentum predicts.

2.2 Bayesian Online Changepoint Detection

Adams and MacKay (2007) proposed BOCPD for real-time regime detection using conjugate-exponential models. We implement BOCPD with a Normal-Inverse-Gamma conjugate prior to detect regime changes in the PSI composite score, applying a 7-day transition window for smooth regime classification. Our implementation uses cumulative probability mass at short run lengths rather than the raw changepoint probability, which we found to be more stable after normalization.

2.3 State Space Models for Signal Smoothing

Harvey (1989) developed the local linear trend model for time series decomposition. We apply the Rauch-Tung-Striebel backward smoother to decompose each entity's PSI score into trend (reported), mean-reverting (alerts), and noise (discarded) components, with native confidence intervals from the smoothed state covariance.

2.4 Graph Representation Learning for Link Prediction

Kipf and Welling (2016) introduced the Variational Graph Autoencoder for unsupervised link prediction. Grover and Leskovec (2016) proposed Node2Vec for learning node embeddings via biased random walks. Bordes et al. (2013) introduced TransE for knowledge graph completion. Our contribution is ensembling these with classical heuristics and feature-based ML. (Note: §6.2 and the Phase-0 baseline panel show that on this graph the ensemble is predictively concentrated in a single model and is at parity with a gradient-boosted baseline; the ensemble is a reproducibility/robustness construct, not a demonstrated performance gain over the best simple model.)

2.5 Shrinkage Estimation

James and Stein (1961) showed that shrinking individual estimates toward a grand mean reduces total estimation error when estimating three or more means simultaneously. We apply James-Stein shrinkage to cold-start entity PSI scores, shrinking toward BICS sector means with a linear confidence ramp.


3. Data

3.1 Signal Sources

  • ~140 RSS feeds across 12 content lanes (Healthcare M&A, Industrials, GCC/LATAM, Government, Tech, etc.; count fluctuates as the feed-health system retires failing feeds — 140 as of 2026-06-11)
  • NewsAPI.ai (Event Registry) — 10 articles per query, 6-hour lookback
  • Mediastack — supplementary global news coverage
  • Total daily signal volume: ~24,000 signals/day

3.2 Entity Graph

  • 6,371 entities extracted via AWS Comprehend (targeted sentiment)
  • 400 entities / 12,791 edges in the backend co-occurrence graph (threshold >= 2 shared articles; as of 2026-06-11), pruned to a 150-entity visualization subset
  • Entity enrichment: BICS 4-level classification (Sector → Industry Group → Industry → Sub-Industry), geography, market cap tier, deal context

3.3 External Data

  • SEC EDGAR EFTS: 8-K material events, SC 13D activist disclosures, S-1 IPO registrations
  • Search velocity: Mention acceleration computed from the signal stream
  • Amazon Titan embeddings: 1,536-dimensional text embeddings for narrative drift measurement

4. Methodology

4.1 Five Orthogonal Signals

4.1.1 Attention Cascade Intensity (ACI)

We model entity mention arrivals as a self-exciting Hawkes process. For entity e, the conditional intensity function is:

λ(t) = μ_e + α Σ_{t_i < t} β exp(-β(t - t_i))

where μ_e is the baseline mention rate estimated from the trailing 14-day window, α is the self-excitation parameter (fixed at 0.5), and β is the decay rate (fixed at 1.0 day⁻¹). The ACI signal for entity e on day d is the standardized residual:

ACI_e(d) = (observed_count - expected_count) / σ_e

where expected_count is the integral of λ(t) over day d and σ_e is the historical standard deviation of residuals. Positive ACI indicates attention exceeding the entity's own self-exciting baseline — a signal that exogenous information is driving coverage beyond endogenous momentum.
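The residual above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the function names are hypothetical, and the expected count is taken as the closed-form integral of λ(t) over one day using only mentions that arrived before the day started (an assumption; the text does not say how same-day excitation is handled).

```python
import math

ALPHA = 0.5  # self-excitation parameter, fixed value from the text
BETA = 1.0   # decay rate per day, fixed value from the text


def hawkes_intensity(t, mu, history):
    """lambda(t) = mu + alpha * sum over t_i < t of beta * exp(-beta * (t - t_i))."""
    return mu + ALPHA * sum(
        BETA * math.exp(-BETA * (t - ti)) for ti in history if ti < t
    )


def expected_count(day_start, mu, history):
    """Integral of lambda(t) over [day_start, day_start + 1).

    Assumption of this sketch: only events strictly before day_start excite
    the day; excitation from same-day arrivals is ignored.
    """
    past = [ti for ti in history if ti < day_start]
    excite = sum(
        math.exp(-BETA * (day_start - ti)) - math.exp(-BETA * (day_start + 1.0 - ti))
        for ti in past
    )
    return mu * 1.0 + ALPHA * excite


def aci(observed_count, day_start, mu, history, sigma):
    """Standardized residual between observed and expected daily mentions."""
    return (observed_count - expected_count(day_start, mu, history)) / sigma
```

With an empty history the expected count reduces to the baseline rate μ_e, so ACI is simply the z-scored excess over baseline.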

4.1.2 Narrative Embedding Drift (NED)

For each entity e, we compute the centroid of Amazon Titan (1,536-dim) embeddings from all articles mentioning e in the trailing 7-day window. NED is the cosine distance between the current week's centroid and the prior week's centroid:

NED_e(d) = 1 - cos(c_e^{current}, c_e^{prior})

High NED indicates that the narrative context surrounding an entity has shifted — the same entity is being discussed in materially different terms. For emerging entities (first appearance), NED is set to 1.0 (maximum drift), reflecting complete novelty. NED captures qualitative narrative change that volume-based signals miss entirely: an entity can maintain constant mention volume while undergoing a complete narrative reframing.
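A minimal sketch of the NED computation follows, using plain Python lists in place of the 1,536-dimensional Titan vectors. The function names are hypothetical, and the sketch assumes non-zero centroids.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cos(a, b); assumes neither vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ned(current_vectors, prior_vectors):
    """Narrative drift between this week's and last week's centroids."""
    if not prior_vectors:
        # Emerging entity with no prior week: fixed at 1.0, as in the text.
        return 1.0
    return cosine_distance(centroid(current_vectors), centroid(prior_vectors))
```

Note that cosine distance can in principle reach 2.0 for opposed vectors, so the 1.0 assigned to emerging entities corresponds to orthogonal centroids.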

4.1.3 Graph Spectral Shift (GSS)

We construct the entity co-occurrence graph G_d = (V, E_d) where edges are weighted by co-mention frequency in the trailing 14-day window. The normalized Laplacian is:

L = I - D^{-1/2} A D^{-1/2}

where A is the adjacency matrix and D is the degree matrix. We compute the top-k eigenvalues (k = min(50, |V|)) of L for consecutive days and measure:

GSS(d) = ||λ(d) - λ(d-1)||_2

For per-entity attribution, we compute the change in each entity's eigenvector centrality between consecutive snapshots. GSS captures structural reorganization of the entity network; the design hypothesis is that mergers, new alliances, and cluster dissolution manifest as spectral shifts before they appear in headlines.
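The graph-level GSS can be sketched with NumPy as below. The helper names are hypothetical. The sketch assumes both daily snapshots share the same node set and ordering, and it leaves isolated nodes with a diagonal entry of 1 in the Laplacian; the text specifies neither choice.

```python
import numpy as np


def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric weighted adjacency matrix."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def top_k_eigenvalues(A, k=50):
    """The k largest eigenvalues of the normalized Laplacian, descending."""
    vals = np.linalg.eigvalsh(normalized_laplacian(A))  # returned ascending
    k = min(k, len(vals))
    return vals[::-1][:k]


def gss(A_today, A_yesterday, k=50):
    """Euclidean distance between consecutive top-k spectra."""
    diff = top_k_eigenvalues(A_today, k) - top_k_eigenvalues(A_yesterday, k)
    return float(np.linalg.norm(diff))
```

As a small check, a three-node path has spectrum {0, 1, 2} and a triangle has {0, 1.5, 1.5}, so closing the path into a triangle gives a GSS of sqrt(0.5).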

4.1.4 Sentiment-Momentum Divergence (SMD)

For each entity e, we fit an OLS regression of sentiment on momentum (trailing 7-day mention velocity):

sentiment_e(d) = β_0 + β_1 · momentum_e(d) + ε_e(d)

The SMD signal is the standardized residual ε̂_e(d). Positive SMD indicates sentiment exceeding what momentum alone would predict (coverage is more positive than attention warrants). Negative SMD indicates sentiment lagging momentum (rising attention with deteriorating narrative). SMD is intended as a contrarian signal: the design hypothesis is that extreme positive SMD precedes corrections, while extreme negative SMD (attention rising, sentiment falling) precedes adverse events such as regulatory actions or earnings misses. Per-signal predictive value has not yet been measured (§7.2).
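A minimal sketch of the SMD residual, using closed-form simple linear regression from the standard library. The function name is hypothetical, and standardizing by the population standard deviation of the residuals is an assumption; the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def smd_residuals(sentiment, momentum):
    """Standardized OLS residuals of sentiment regressed on momentum."""
    mx, my = mean(momentum), mean(sentiment)
    sxx = sum((x - mx) ** 2 for x in momentum)
    sxy = sum((x - mx) * (y - my) for x, y in zip(momentum, sentiment))
    beta_1 = sxy / sxx            # slope
    beta_0 = my - beta_1 * mx     # intercept
    resid = [y - (beta_0 + beta_1 * x) for x, y in zip(momentum, sentiment)]
    sd = pstdev(resid)
    if sd < 1e-12:
        # Perfect fit: no divergence to report.
        return [0.0] * len(resid)
    return [r / sd for r in resid]
```

A positive value on a given day means sentiment sits above the fitted line for that day's momentum; a negative value means it sits below.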

4.1.5 Source Concentration (SC)

We compute the Herfindahl-Hirschman Index across source domains for each entity's mentions:

SC_e(d) = Σ_i s_i^2

where s_i is the share of mentions from source domain i. SC ranges from 1/N (perfect diversification) to 1.0 (single source). High SC indicates information asymmetry — when a story is concentrated in one or two sources, it may represent a leak, exclusive, or planted narrative rather than broad consensus. SC serves as a signal quality modifier: high-SC signals should be treated with higher uncertainty.
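The index is straightforward to compute from a list of source domains, one per mention. The function name below is hypothetical.

```python
from collections import Counter


def source_concentration(domains):
    """Herfindahl-Hirschman Index over source-domain shares of mentions."""
    counts = Counter(domains)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())
```

For example, four mentions split evenly across two domains give 0.5, while four mentions from four distinct domains give the 1/N floor of 0.25.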

4.2 Six-Layer Processing Pipeline

Layer 1: PCA Orthogonalization

The five raw signals are orthogonalized via PCA to remove residual correlation. We apply the Kaiser criterion (retain components with eigenvalue > 1) followed by VIF rejection (remove any component with Variance Inflation Factor > 5). This ensures that the composite score is not dominated by correlated signal pairs. The retained principal components are rotated back to the original signal space for interpretability.

Layer 2: BOCPD Regime Detection

We implement Adams and MacKay (2007) with a Normal-Inverse-Gamma conjugate prior on the composite signal stream. The prior parameters are:

  • μ_0 = 0 (centered z-score)
  • κ_0 = 1, α_0 = 1, β_0 = 1 (weakly informative)
  • Hazard function: constant 1/250 (expected regime length ~250 days)

Regime classification uses cumulative probability mass at short run lengths (< 7 days). A 7-day transition window smooths regime boundaries to prevent flip-flopping. Regimes are labeled: QUIET, NORMAL, ELEVATED, HIGH, EXTREME.
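The run-length recursion with the priors listed above can be sketched as follows. This is a compact illustration of the Adams and MacKay recursion under stated assumptions, not the production implementation: function names are hypothetical, the predictive is the standard Student-t for a Normal-Inverse-Gamma prior, and "short run lengths" is read as run lengths 0 through 6.

```python
import math

HAZARD = 1.0 / 250.0            # constant hazard from the text
PRIOR = (0.0, 1.0, 1.0, 1.0)    # mu_0, kappa_0, alpha_0, beta_0 from the text


def student_t_pdf(x, mu, kappa, alpha, beta):
    """Posterior predictive density under a Normal-Inverse-Gamma posterior."""
    nu = 2.0 * alpha
    scale2 = beta * (kappa + 1.0) / (alpha * kappa)
    z2 = (x - mu) ** 2 / scale2
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi * scale2))
    return math.exp(log_c - (nu + 1.0) / 2.0 * math.log1p(z2 / nu))


def nig_update(x, mu, kappa, alpha, beta):
    """Conjugate update of the NIG parameters with one observation."""
    return ((kappa * mu + x) / (kappa + 1.0), kappa + 1.0, alpha + 0.5,
            beta + kappa * (x - mu) ** 2 / (2.0 * (kappa + 1.0)))


def bocpd_short_run_mass(xs, short=7):
    """Posterior mass on run lengths < short after each observation."""
    run_probs = [1.0]      # P(run length = r), starting with r = 0
    params = [PRIOR]       # one NIG parameter tuple per run length
    out = []
    for x in xs:
        pred = [student_t_pdf(x, *p) for p in params]
        growth = [r * p * (1.0 - HAZARD) for r, p in zip(run_probs, pred)]
        change = sum(r * p * HAZARD for r, p in zip(run_probs, pred))
        run_probs = [change] + growth
        total = sum(run_probs)
        run_probs = [r / total for r in run_probs]
        params = [PRIOR] + [nig_update(x, *p) for p in params]
        out.append(sum(run_probs[:short]))
    return out
```

With a constant hazard the normalized mass at run length 0 always equals the hazard itself, so it carries no information about the data; this is one reason to read cumulative mass over short run lengths instead. The first few outputs are trivially 1.0 because every possible run length is still short.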

Layer 3: Inverse-Variance Weighted Composite

Each signal component receives weight inversely proportional to its trailing variance:

w_i = (1/σ_i^2) / Σ_j (1/σ_j^2)

This auto-updating scheme prioritizes stable, informative signals and downweights noisy components. Weights are recalculated daily on a 30-day trailing window.
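The weighting formula can be sketched as below. The function name and the dict-of-lists input are hypothetical, and the use of the sample variance is an assumption. A component with zero trailing variance would need a floor that this sketch does not include.

```python
from statistics import variance


def inverse_variance_weights(history, window=30):
    """w_i = (1 / var_i) / sum_j (1 / var_j) over the trailing window.

    history maps a component name to its list of daily values, oldest first.
    """
    inv = {name: 1.0 / variance(vals[-window:]) for name, vals in history.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}
```

For example, a component with four times the variance of another receives one quarter of its weight, so two such components are weighted 0.8 and 0.2.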

Layer 4: Kalman Smoothing

We apply Harvey's (1989) local linear trend model:

State:    x_t = x_{t-1} + v_{t-1} + η_t,  η_t ~ N(0, Q_level)
Velocity: v_t = v_{t-1} + ζ_t,              ζ_t ~ N(0, Q_trend)
Obs:      y_t = x_t + ε_t,                  ε_t ~ N(0, R)

The Rauch-Tung-Striebel backward pass decomposes each entity's score into:

  • Trend (smoothed state): reported as the PSI score
  • Mean-reverting component (innovation residual): used for alert generation
  • Noise (observation error): discarded

Confidence intervals are derived directly from the smoothed state covariance matrix P_{t|T}.
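A minimal NumPy sketch of the forward filter and Rauch-Tung-Striebel backward pass for the local linear trend model follows. The function name, the noise variances, and the initial state covariance are illustrative assumptions; the text does not give the production values.

```python
import numpy as np


def rts_local_linear_trend(y, q_level=0.01, q_trend=0.001, r=1.0):
    """Return the smoothed level and a 95% half-width per time step."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # level += velocity
    H = np.array([[1.0, 0.0]])               # only the level is observed
    Q = np.diag([q_level, q_trend])
    n = len(y)
    x_pred = np.zeros((n, 2)); P_pred = np.zeros((n, 2, 2))
    x_filt = np.zeros((n, 2)); P_filt = np.zeros((n, 2, 2))
    x, P = np.array([y[0], 0.0]), np.eye(2) * 10.0   # assumed loose prior
    for t in range(n):
        if t > 0:                                    # predict step
            x, P = F @ x, F @ P @ F.T + Q
        x_pred[t], P_pred[t] = x, P
        S = (H @ P @ H.T)[0, 0] + r                  # innovation variance
        K = (P @ H.T)[:, 0] / S                      # Kalman gain
        x = x + K * (y[t] - x[0])                    # update step
        P = P - np.outer(K, H @ P)
        x_filt[t], P_filt[t] = x, P
    x_s, P_s = x_filt.copy(), P_filt.copy()
    for t in range(n - 2, -1, -1):                   # RTS backward pass
        C = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t + 1])
        x_s[t] = x_filt[t] + C @ (x_s[t + 1] - x_pred[t + 1])
        P_s[t] = P_filt[t] + C @ (P_s[t + 1] - P_pred[t + 1]) @ C.T
    trend = x_s[:, 0]
    half_width = 1.96 * np.sqrt(P_s[:, 0, 0])
    return trend, half_width
```

The half-width is read from the level entry of the smoothed covariance P_{t|T}, which is how the interval is described above.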

Layer 5: Cold-Start Handling

Entities with fewer than 30 observations receive James-Stein shrinkage toward their BICS sector mean:

PSI_shrunk = (1 - B) · PSI_raw + B · PSI_sector

where B is the James-Stein shrinkage factor and confidence ramps linearly from 0 at n=0 to 1 at n=30. Entities with fewer than 10 observations are suppressed entirely (score = 0, flagged as insufficient data).
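The cold-start rule can be sketched as below. The text does not spell out how the shrinkage factor B is combined with the confidence ramp, so this sketch makes the simplifying assumption that B is one minus the linear confidence; a full James-Stein factor would also depend on variance terms not given here. The function name and status strings are hypothetical.

```python
def cold_start_psi(psi_raw, psi_sector, n_obs, full_at=30, suppress_below=10):
    """Shrink a cold-start PSI score toward its sector mean.

    Assumption of this sketch: B = 1 - confidence, where confidence ramps
    linearly from 0 at n_obs = 0 to 1 at n_obs = full_at.
    """
    if n_obs < suppress_below:
        return 0.0, "INSUFFICIENT_DATA"
    confidence = min(n_obs / full_at, 1.0)
    b = 1.0 - confidence
    return (1.0 - b) * psi_raw + b * psi_sector, "OK"
```

Under this assumption an entity with 15 observations sits halfway between its raw score and its sector mean, and an entity with 30 or more keeps its raw score.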

Layer 6: Output

The final PSI score is a z-score with regime classification:

Regime | Z-Score Range | Interpretation
QUIET | < -1 | Below-normal activity; entity fading from coverage
NORMAL | -1 to +1 | Baseline activity; no actionable signal
ELEVATED | +1 to +2 | Above-normal activity; worth monitoring
HIGH | +2 to +3 | Significant disruption; likely event precursor
EXTREME | > +3 | Rare signal intensity; immediate attention required

For the first 30 days of system operation, percentile calibration supplements the z-score thresholds to account for limited distributional data.
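The z-score thresholds map to regime labels as in the sketch below. The function name is hypothetical. The treatment of exact boundary values is an assumption: only "QUIET < -1" and "EXTREME > +3" are stated as strict, so the sketch assigns z = +1 and z = +2 to the higher band.

```python
def classify_regime(z):
    """Map a PSI z-score to its regime label."""
    if z < -1.0:
        return "QUIET"
    if z < 1.0:
        return "NORMAL"
    if z < 2.0:
        return "ELEVATED"
    if z <= 3.0:
        return "HIGH"
    return "EXTREME"
```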

4.3 Link Prediction Ensemble

4.3.1 Classical Heuristics

We compute four classical link prediction scores for all non-adjacent entity pairs:

  • Jaccard Coefficient: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
  • Adamic-Adar Index: Σ_{w ∈ N(u) ∩ N(v)} 1/log|N(w)|
  • Common Neighbors: |N(u) ∩ N(v)|
  • Preferential Attachment: |N(u)| · |N(v)|

These heuristics capture structural proximity from different perspectives — local neighborhood overlap (Jaccard, CN), weighted overlap penalizing high-degree intermediaries (Adamic-Adar), and global popularity (PA).
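The four scores can be computed from a neighbor-set representation of the graph, as in this sketch. The function name and the dict-of-sets input are hypothetical.

```python
import math


def link_heuristics(adj, u, v):
    """Classical link-prediction scores for a candidate pair (u, v).

    adj maps each node to the set of its neighbors.
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # A common neighbor has degree >= 2, so log|N(w)| is positive.
        "adamic_adar": sum(
            1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1
        ),
        "preferential_attachment": len(nu) * len(nv),
    }
```

Adamic-Adar gives a shared neighbor of degree 2 a weight of 1/ln 2, and a shared neighbor of degree 3 a smaller weight of 1/ln 3, which is the penalty on high-degree intermediaries described above.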

4.3.2 Feature-Based ML

A logistic regression model trained on 10 pair features:

  1. Sector match (binary: same BICS sector)
  2. PSI momentum difference (|PSI_u - PSI_v|)
  3. PSI momentum product (PSI_u × PSI_v)
  4. Degree product (deg_u × deg_v)
  5. Degree sum (deg_u + deg_v)
  6. PSI score difference
  7. PSI score product
  8. Mention ratio (min mentions / max mentions)
  9. Common neighbors count
  10. Jaccard coefficient

Training uses 80/20 stratified split with class-balanced sampling.

4.3.3 Graph Convolutional Network

A 2-layer GCN encoder following Kipf and Welling (2017):

H^{(1)} = ReLU(Â X W^{(0)})
Z = Â H^{(1)} W^{(1)}

where  = D̃^{-1/2} à D̃^{-1/2} is the normalized adjacency with self-loops and X is the node feature matrix (PSI scores, degree, sector encoding). Link scores are computed via cosine similarity: score(u,v) = cos(z_u, z_v). Training minimizes binary cross-entropy on held-out edges.

4.3.4 Node2Vec (Grover & Leskovec 2016)

Biased random walks with return parameter p=1 and in-out parameter q=0.5. In the Node2Vec parameterization, q < 1 biases walks outward toward DFS-like exploration (q > 1 would keep them local and BFS-like). Walk parameters:

  • Walk length: 30
  • Walks per node: 10
  • Embedding dimension: 32
  • Context window: 5

Training uses skip-gram with negative sampling SGD. Link scores are computed as the dot product of learned embeddings.
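The second-order walk that feeds the skip-gram step can be sketched as below, following the transition weights defined by Grover and Leskovec. The function name and the unweighted dict-of-sets graph are assumptions of the sketch; the production graph is weighted by co-mention frequency.

```python
import random


def node2vec_walk(adj, start, length=30, p=1.0, q=0.5, rng=None):
    """One biased random walk over an unweighted graph.

    adj maps each node to the set of its neighbors.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))   # first step is uniform
            continue
        prev = walk[-2]
        weights = [
            1.0 / p if x == prev            # return to the previous node
            else 1.0 if x in adj[prev]      # stay within prev's neighborhood
            else 1.0 / q                    # move further away from prev
            for x in nbrs
        ]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

With q = 0.5 a step that moves away from the previous node carries weight 2, twice the weight of a step that stays in its neighborhood.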

4.3.5 Variational Graph Autoencoder (Kipf & Welling 2016)

A 2-layer GCN encoder produces mean (μ) and log-variance (log σ²) vectors for each node:

μ = GCN_μ(X, A)
log σ² = GCN_σ(X, A)
z = μ + σ ⊙ ε,  ε ~ N(0, I)

The decoder reconstructs the adjacency matrix via inner product: Â = σ(Z Z^T). Training maximizes the ELBO:

L = E_q[log p(A|Z)] - KL[q(Z|X,A) || p(Z)]

4.3.6 TransE Knowledge Graph Embeddings (Bordes et al. 2013)

Entities and relations are embedded in ℝ^d such that h + r ≈ t for valid triples (h, r, t). We define four relation types derived from graph context:

  1. co_occurrence — entities co-mentioned in articles
  2. same_sector — entities sharing BICS sector classification
  3. deal_related — entities connected by M&A/partnership signals
  4. geographic_peer — entities in the same geographic market

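Training minimizes the margin-based ranking loss with negative sampling. Link prediction scores are computed as -||h + r - t|| for each candidate relation type, taking the maximum. TransE is described for completeness; it was removed from the deployed ensemble for non-reproducibility (§7.1) and is not one of the eight models combined in §4.3.7.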

4.3.7 Ensemble

The eight model scores are combined via weighted average:

score_ensemble(u,v) = Σ_i w_i · score_i(u,v)

Weights are optimized through a two-phase process:

  1. Dirichlet search: Sample 200 random weight vectors from Dir(α=1) and evaluate on validation edges
  2. Perturbation refinement: Take the best Dirichlet sample and perturb each weight by ±0.05, selecting improvements
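The two-phase search can be sketched with the standard library as follows. The function names are hypothetical, model scores are assumed to be already on a comparable scale, and renormalizing after each perturbation so the weights still sum to 1 is an assumption the text does not state.

```python
import random


def roc_auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble(weights, model_scores):
    """Weighted average of per-model scores; one row per candidate pair."""
    return [sum(w * s for w, s in zip(weights, row)) for row in model_scores]


def tune_weights(model_scores, labels, n_samples=200, step=0.05, seed=0):
    rng = random.Random(seed)
    k = len(model_scores[0])

    def auc(w):
        return roc_auc(ensemble(w, model_scores), labels)

    # Phase 1: Dir(1) samples, drawn as normalized Gamma(1, 1) variates.
    best_w, best_auc = None, -1.0
    for _ in range(n_samples):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
        w = [x / sum(g) for x in g]
        a = auc(w)
        if a > best_auc:
            best_w, best_auc = w, a
    # Phase 2: perturb each weight by +/- step, keep strict improvements.
    for i in range(k):
        for delta in (step, -step):
            cand = list(best_w)
            cand[i] = max(cand[i] + delta, 0.0)
            total = sum(cand)
            if total == 0.0:
                continue
            cand = [c / total for c in cand]
            a = auc(cand)
            if a > best_auc:
                best_w, best_auc = cand, a
    return best_w, best_auc
```

Because AUC depends only on the ranking of pairs, many weight vectors tie at the same AUC; a search of this kind keeps the first one it finds, which is one way weight can end up concentrated on a single model.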

Post-processing filters:

  • BICS sector filter: Both entities must have sector classification
  • Entity deduplication: Substring pairs excluded (e.g., "UnitedHealth" / "UnitedHealth Group")
  • Confidence threshold: Model agreement (how many of 8 models rank the pair in top-20)

4.4 Interpretation Layer

Each entity receives a natural-language interpretation generated via a template-based system enhanced by Bedrock Haiku:

  1. What — which signal is the primary driver of the current PSI score (highest absolute z-component)
  2. Why — peer context including sector percentile rank and temporal direction (rising/falling/stable over 7 days)
  3. Action — regime-appropriate recommendation calibrated to the entity's current state

Interpretations are regenerated daily and surfaced on the intelligence dashboard alongside the entity card metrics.


5. Experimental Setup

5.1 Link Prediction Evaluation

  • Holdout: 20% of existing edges removed for testing
  • Negative sampling: Equal number of non-edges sampled as negative examples
  • Metric: AUC (Area Under ROC Curve)
  • Baseline: Random prediction (AUC = 0.5)

5.2 PSI Validation Gates (Day 30 Auto-Fire)

Five statistical validation gates execute automatically at the 30-day mark:

  1. Permutation test (p < 0.01, 1,000 shuffles) — verifies that PSI scores contain more information than random label assignment
  2. Subsample stability (3 non-overlapping 10-day periods) — verifies that signal structure is consistent across time
  3. Walk-forward cross-validation (>60% positive IC windows) — verifies that PSI has predictive information content in rolling windows
  4. Factor regression (controlling news volume + sector momentum) — verifies that PSI adds information beyond naive volume and sector effects
  5. Decay analysis (ACF half-life characterization) — measures how quickly PSI signals mean-revert, informing optimal lookback windows
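The first gate can be sketched as a two-sided permutation test on the Pearson correlation between PSI and the forward target. The function names are hypothetical. The sketch uses the common add-one correction, so its p-value is never exactly zero; a run that reports p = 0.0 presumably divides the raw count by the number of shuffles instead.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(psi, forward_growth, n_shuffles=1000, seed=0):
    """Share of label shuffles whose |correlation| matches or beats the actual."""
    rng = random.Random(seed)
    actual = pearson(psi, forward_growth)
    shuffled = list(forward_growth)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson(psi, shuffled)) >= abs(actual):
            extreme += 1
    return actual, (extreme + 1) / (n_shuffles + 1)
```

With 1,000 shuffles the smallest p-value this version can return is 1/1001, which sets the resolution of the gate against its 0.01 threshold.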

5.3 Event Study Design

  • Scan published articles for M&A keywords across 27 event types (acquisition, merger, IPO, activist stake, regulatory approval, partnership, divestiture, etc.)
  • Measure PSI in 6 time windows: pre-14d, pre-7d, pre-3d, event day, post-3d, post-7d
  • Primary metric: percentage of events where PSI was ELEVATED or higher before formal announcement
  • Secondary metric: average PSI lead time (days before event that PSI first reached ELEVATED)

6. Results

6.1 Link Prediction Performance

The production link-prediction layer runs eight reproducible models on the ~400-entity production graph (built by global_chart_render from per-article co-occurrence >= 2). Per-model holdout ROC-AUC and the full Dirichlet-weighted ensemble are measured monthly by the psi-fit-study Fargate job and reported here from models/psi/fit_study.json.

The table below is taken from the first clean first-Sunday automated run, run_id 2026-06-07T09:01:29Z (72 days of history, 400-entity universe, 5 CV folds, 7-day holdout). Holdout ROC-AUC and fit-gap are from fit_study.json; the Weight column is the production Dirichlet weight from the daily link_predictions.json optimizer.

Model | Holdout ROC-AUC | Fit-Gap | Weight
Logistic Regression | 0.756 | 0.069 | 0.00
Adamic-Adar | 0.662 | 0.092 | 0.00
Common Neighbors | 0.661 | 0.093 | 0.00
Jaccard | 0.669 | 0.088 | 0.00
GCN | 0.614 | 0.095 | 0.00
Node2Vec | 0.651 | 0.081 | 1.00
VGAE | 0.500 | 0.000 | 0.00
Preferential Attachment | 0.602 | 0.101 | 0.00
Ensemble (Dirichlet-weighted) | 0.745 | 0.078 | 1.00

Two honest observations from this run:

  1. The run landed in the OVERFIT band. Ensemble fit-gap is 0.078 (overall.status = "OVERFIT"), up sharply from the prior two runs (trend: 0.025 → 0.026 → 0.078) and above the staging-validated 2026-05-20 projection (LR@full 0.823 / ensemble 0.812 / fit_gap 0.026, HEALTHY). The worst single fit is preferential_attachment (fit-gap 0.101); VGAE degenerated to 0.500 (no signal) this run.
  2. fit_study's best single model and the production weight diverge. In the monthly CV-holdout evaluation, Logistic Regression is the strongest model (holdout 0.756) and the ensemble (0.745) falls slightly below it. Yet the daily production optimizer, fit on the separate psi_compute link-prediction eval set, assigns weight 1.00 to Node2Vec and 0.00 to everything else. These are two different evaluation sets answering the same question, and they disagree on which model carries the signal. This is recorded honestly rather than reconciled away; §6.2 quantifies the production picture.

6.2 Ablation Study

Per-model marginal contribution is measured by leave-one-out: for each of the eight models, the ensemble is re-optimized on the remaining seven models and the resulting AUC is compared to the full ensemble AUC.

Leave-one-out results from ablation_study.json (2026-06-08 run; full-ensemble AUC 0.822 on 1,524 positive / 7,482 negative held-out edges):

Model | Full AUC | LOO AUC | Drop | Pct Contribution
Node2Vec | 0.822 | 0.006 | 0.816 | 99.3%
Logistic Regression | 0.822 | 0.727 | 0.095 | 11.5%
GCN | 0.822 | 0.731 | 0.091 | 11.1%
VGAE | 0.822 | 0.733 | 0.089 | 10.8%
Adamic-Adar | 0.822 | 0.735 | 0.087 | 10.6%
Common Neighbors | 0.822 | 0.735 | 0.087 | 10.6%
Jaccard | 0.822 | 0.736 | 0.086 | 10.5%
Preferential Attachment | 0.822 | 0.737 | 0.085 | 10.4%

The previous ablate-by-zeroing logic in psi_compute/ablation.py reported auc_drop = 0 for any model whose full-ensemble weight was already zero (an artifact of the daily Dirichlet search). Leave-one-out, landed in A3, reports each model's marginal value honestly.

The honest reading: production link prediction is effectively a one-model ensemble. Removing Node2Vec collapses the AUC from 0.822 to 0.006, a drop equal to 99.3% of the full-ensemble AUC. The other seven models are near-redundant with one another: removing any one of them costs ~0.085–0.095, and the re-optimizer re-fits the remaining models to ~0.73 in every case. All eight models are run and scored (the "8-model reproducible ensemble" claim is structurally true), but predictively the ensemble is Node2Vec.
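The Drop and Pct Contribution columns follow from simple arithmetic on the two AUC values, sketched here with a hypothetical helper name.

```python
def loo_contribution(full_auc, loo_auc):
    """Return (AUC drop, drop as a percentage of the full-ensemble AUC)."""
    drop = full_auc - loo_auc
    return drop, 100.0 * drop / full_auc
```

For Node2Vec this gives a drop of 0.816 and 99.3%. Each percentage is measured independently against the full ensemble, so the column is not a partition and does not sum to 100%. Recomputing the other rows from the rounded AUCs can differ from the table in the last digit, presumably because the table was built from unrounded values.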

6.3 PSI Validation Gates

The PSI validation Lambda runs weekly (Sun 08:00 UTC) against the per-day PSI history archive. It evaluates PSI signal against forward 7-day log growth in entity mention count across five gates: permutation test, subsample stability, walk-forward cross-validation, factor regression, and decay.

All runs cited here are on the single post-GNN-blend regime (MIN_VALIDATION_DATE = 2026-04-27). Earlier weekly runs read across the 2026-04-16 → 2026-04-26 break (validate Lambda was failing, no daily writes) and mixed pre-Apr-15 sparse graph-cap regime records (~280 KB/day) with post-Apr-27 GNN-blended records (~1.8 MB/day); numbers from those mixed runs should not be cited — they violated the IID assumption. The table below is the 2026-06-07 run, the first on which the corpus reached N ≥ 31 clean records and the decay gate became evaluable.

Validation date: 2026-06-07
N records: 34
N dates: 41
Date range: 2026-04-27 → 2026-05-30
Gates passed: 2 / 5

Per-gate results:

  • Permutation test — PASS; actual correlation -0.8270, p-value 0.0, threshold 0.01, n_observations 34. (Flipped from FAIL at the 2026-05-28 run: the negative correlation has both strengthened, -0.47 → -0.83, and become statistically significant.)
  • Subsample stability — FAIL; std of period means 0.04375 (3 periods, means: -0.0415, -0.0221, +0.0421). (Flipped from PASS: period 3 turned positive while periods 1–2 stayed negative, so the sign is no longer stable across sub-windows.)
  • Walk-forward CV — FAIL; 50% of rolling IC windows positive (2 windows: IC +0.1139 positive, IC -0.8325 negative, threshold 0.6). (Flipped from PASS: a second window is now evaluable and is strongly negative.)
  • Factor regression — FAIL; PSI beta -0.6529, alpha 0.2112, R-squared 0.7101, controlled for news_volume + sector_momentum. (Beta strengthened -0.35 → -0.65 and R² rose 0.22 → 0.71: PSI is a more robust contra-indicator once news volume and sector momentum are controlled for.)
  • Decay — PASS; half-life 7 days. First run where the decay gate is evaluable (34 records ≥ 31 required); it passes on the first measurement.

Honest interpretation. The negative correlation is the substantive finding, not a failure mode: PSI captures attention peaking ahead of mean reversion in next-7-day mention growth, consistent with a momentum-reversal dynamic rather than a momentum-continuation signal. This is a different claim from the link-prediction AUC reported in §6.1. The 0.85 AUC figure (now removed from the Abstract) was measured on graph link prediction (whether two entities will form a co-occurrence edge), not on forward-attention prediction. PSI's information content for attention forecasting and its information content for relationship prediction are distinct properties; conflating them would misrepresent what has been validated.

The decay gate requires N >= 31 daily records; the 2026-06-07 run (N = 34) is the first to clear that threshold, and the gate passes (half-life 7 days).

The cleaned-corpus run continues weekly; each future Sunday invocation operates on a window growing by one clean day, so the gate panel will tighten as the single-regime corpus lengthens toward the ~40-day mark.

6.4 Event Study Results

The event study (models/psi/event_study.json, 2026-06-08 run) tests whether PSI was ELEVATED in the days before a deal/announcement event. From the published corpus it extracted 171 events spanning 232 entity-event pairs (after dropping 2,618 non-PSI entities, 316 duplicate pairs, and 27 events that predate PSI coverage, which begins 2026-03-23). For each pair PSI is sampled in six windows: pre-14d, pre-7d, pre-3d, event day, post-3d, post-7d.

Metric | Value
Entity-event pairs | 232
Pairs with ELEVATED PSI before the event | 0 / 232 (0.0%)
Mean pre-event PSI | -1.0

This is not yet an interpretable result, and should not be cited as a null. The 0/232 hit rate is dominated by coverage, not signal: most pre-event windows resolve to NO_DATA because the event corpus largely predates dense daily PSI history (the GNN-blend regime only begins 2026-04-27; see §6.3). Where data exists, pre-event PSI sits near the -1.0 floor — consistent with the §6.3 finding that PSI is depressed (not elevated) around attention peaks, but the sample of events with complete pre-event windows is currently too small to separate that effect from missing data. The event study becomes evaluable only once the clean-regime corpus is long enough to contain events with fully populated 14-day lookbacks — estimated Day 90+ of the post-Apr-27 regime (see §8.3, §9 limitation 5).

External anchored test (2026-06-08) — the definitive result. The internal event study above is circular: it measures PSI against the same coverage stream PSI is built from. We therefore ran a pre-registered, externally anchored test (models/psi/lf1_anchored_test.json): deal events dated by their SEC EDGAR filing timestamp (8-K Item 1.01/2.01, SC-13D, S-4, DEFM14A) — exogenous to the news stream — joined to PSI entities by resolved CIK. Pre-registered rule: an entity's deal is "flagged" if PSI was ELEVATED/CRITICAL within ~20 calendar days before the filing; success = beat a mention-spike baseline (bootstrap-CI margin) and median lead ≥ 3 days.

Result: PSI does NOT lead deal events. PSI hit-rate 10.8% (CI 4.8–18.1%) vs a naive mention-spike baseline of 49.4% (CI 39.8–60.2%) — non-overlapping, PASS: False (n=83 evaluable clean-regime events). A trivial attention trigger anticipates ~half of deal events; PSI's regime flag catches ~1 in 9. PSI's transformation of attention into a composite regime actively loses the predictive signal present in raw mention counts. This corroborates §6.2 (the link-prediction layer is one-model and at GBM-parity on a forward holdout) and §6.3 (PSI is a forward-attention contra-indicator). The forecasting hypothesis for PSI is refuted on the one target with a clean external anchor. PSI's validated value is enrichment — explaining why an entity matters — not event forecasting.
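A percentile bootstrap of the kind used for the hit-rate intervals can be sketched as below. This is an illustration, not the pre-registered procedure: the function name is hypothetical, the per-event flags are reconstructed as 9 of 83 for PSI and 41 of 83 for the baseline (counts inferred from the reported 10.8% and 49.4%, not taken from the results file), and the two series are resampled independently because the pairing of flags by event is not given in the text.

```python
import random


def bootstrap_ci(flags, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a hit rate over 0/1 event flags."""
    rng = random.Random(seed)
    n = len(flags)
    rates = sorted(sum(rng.choices(flags, k=n)) / n for _ in range(n_boot))
    lo = rates[int((1.0 - level) / 2.0 * n_boot)]
    hi = rates[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lo, hi


# Inferred counts (assumption): 9/83 is about 10.8%, 41/83 is about 49.4%.
psi_flags = [1] * 9 + [0] * 74
baseline_flags = [1] * 41 + [0] * 42

psi_lo, psi_hi = bootstrap_ci(psi_flags)
base_lo, base_hi = bootstrap_ci(baseline_flags, seed=1)
```

The comparison of interest is whether psi_hi falls below base_lo, which is the non-overlap condition quoted in the result.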

6.5 EDGAR Signal Integration

The EDGAR layer (models/psi/edgar_signals.json, 2026-06-07 run) cross-references PSI entities against recent SEC filings: 200 entities scanned, 155 (77.5%) with at least one filing in the lookback window.

The integration is wired and producing per-entity signals, but the current signal quality is low and is reported here honestly rather than as a validated result. The top-scoring "entities" are generic terms — Iran, President, China — matched to unrelated registrants (e.g. Iran → a net-lease REIT, President → a SPAC), with form_type and filing description frequently empty and material_filings at 0. The per-entity score is therefore driven by raw filing volume, not filing materiality or genuine entity identity. The bottleneck is entity resolution: PSI entities are extracted from news narrative (including geopolitical and role nouns) and do not map cleanly to SEC registrant names or CIKs.

Update (2026-06-08): a high-precision entity→CIK resolver now exists (lambdas/psi_compute/entity_resolver.py; 0.47% false-positive rate, ~1,000 entities mapped), so EDGAR signals can be keyed on resolved CIKs rather than the substring match. The resolver also exposed a structural ceiling: only ~10–25% of PSI entities resolve to any CIK, because the universe is dominated by non-company nouns (people, places, geopolitics). This is consistent with the entity triage (only ~25% of the graph is market-linkable) and reframes EDGAR/deal data as usable for a minority enrichment slice, not a universe-wide forecasting signal.


7. Ablation Analysis

7.1 Model Ablation

The per-model leave-one-out ablation is reported in §6.2. The deployed ensemble is the eight reproducible models (four heuristics + Logistic + GCN + Node2Vec + VGAE); TransE was removed in A3 for non-reproducibility and is not ablated here. The headline result: Node2Vec carries ~99% of the contribution, the other seven models are near-redundant — see §6.2 for the full table.

7.2 Signal Ablation

Not yet computed. Unlike the model ablation (§6.2), which is produced daily by the link-prediction pipeline, there is currently no artifact that decomposes the composite PSI into its five constituent signals, namely ACI (Hawkes self-excitation), NED (embedding drift), GSS (spectral shift), SMD (sentiment-momentum divergence), and SC (source concentration), and measures each one's marginal contribution. The intended output table, with every cell still pending, is:

Signal                      Variance Explained   Correlation w/ PSI   IC (rank)
ACI (Hawkes)                pending              pending              pending
NED (Embedding Drift)       pending              pending              pending
GSS (Spectral Shift)        pending              pending              pending
SMD (Sent-Mom Divergence)   pending              pending              pending
SC (Source Concentration)   pending              pending              pending

Filling this table honestly requires a per-component analysis run over the daily PSI history that (a) computes each component's rank IC against forward 7-day mention growth, (b) computes each component's correlation with the composite, and (c) attributes variance via PCA or a leave-one-signal-out refit of the James-Stein composite. That computation does not exist yet; the numbers are deliberately left as pending rather than estimated. This is the natural companion to the signal-decomposition work and is the most defensible next addition to the validation suite.
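A minimal sketch of steps (a) to (c) follows, run on synthetic data only. It is not the missing artifact and produces no PSI results. Three things are assumptions: the composite is a plain equal-weight mean standing in for the James-Stein composite, the variance-attribution step is represented only by its leave-one-signal-out variant, and the function and field names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

SIGNALS = ["ACI", "NED", "GSS", "SMD", "SC"]

def signal_ablation(components, forward_growth):
    """components: (n_obs, 5) z-scored signals; forward_growth: (n_obs,) target."""
    # Assumption: equal-weight mean as a stand-in for the James-Stein composite.
    composite = components.mean(axis=1)
    full_ic, _ = spearmanr(composite, forward_growth)
    table = {}
    for j, name in enumerate(SIGNALS):
        # (a) rank IC: Spearman correlation of the signal with the forward target.
        rank_ic, _ = spearmanr(components[:, j], forward_growth)
        # (b) Pearson correlation of the signal with the composite.
        corr = np.corrcoef(components[:, j], composite)[0, 1]
        # (c) leave-one-signal-out: rebuild the composite without signal j.
        reduced = np.delete(components, j, axis=1).mean(axis=1)
        reduced_ic, _ = spearmanr(reduced, forward_growth)
        table[name] = {
            "rank_ic": float(rank_ic),
            "corr_with_composite": float(corr),
            "ic_change_when_dropped": float(reduced_ic - full_ic),
        }
    return table

# Synthetic data, NOT PSI history: only the first column carries (negative) signal.
rng = np.random.default_rng(0)
components = rng.standard_normal((200, 5))
forward_growth = -0.5 * components[:, 0] + rng.standard_normal(200)

table = signal_ablation(components, forward_growth)
for name, row in table.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

On real history, the same loop would be run per day or per entity-day, and the equal-weight mean would be replaced by a refit of the production composite with one signal removed.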

7.3 Discussion

Models. The link-prediction ensemble is, predictively, a one-model system: the §6.2 leave-one-out ablation shows Node2Vec carrying ~99% of the ensemble AUC, with the remaining seven models near-redundant (each LOO drop ~0.085–0.095 because the optimizer re-fits the survivors). The honest implication is that the "eight-model ensemble" is a robustness and reproducibility claim, not a performance one — seven of the eight could be dropped from production scoring with negligible AUC loss, though they are cheap to compute and retaining them guards against Node2Vec degrading on a future graph. The fit_study diagnostic (§6.1) tells a different story — there Logistic Regression is the strongest single model — which underlines that "most critical model" is eval-set dependent and should not be over-read from either run alone.

Signals. The five-signal ablation (§7.2) is not yet computed, so no claim is made here about which of ACI/NED/GSS/SMD/SC is most critical. What §6.3 does establish is a property of the composite: it is a statistically significant forward-attention contra-indicator (permutation p≈0, factor-regression psi_beta -0.65), not a continuation signal. Determining which constituent signals drive that behavior is exactly what §7.2 is meant to answer and remains open.

Net. The defensible, evidence-backed claims today are narrow: (1) a reproducible link-prediction layer whose predictive content is concentrated in Node2Vec, and (2) a composite PSI that is a significant negative predictor of forward attention. Broader claims about per-signal contribution and event lead-time are not yet supported by the artifacts and are marked pending rather than asserted.


8. Defensibility and Temporal Moat

Scope note (2026-06-08). This section argues a forecasting moat. The external anchored test (§6.4) refutes the forecasting hypothesis for PSI: it does not lead deal events better than a naive baseline, and the link-prediction layer is at GBM-parity (§6.2, Phase 0). The defensible moat is therefore the earned graph + multi-signal enrichment asset (§8.1, §8.2), not event prediction. The §8.3 capability timeline below is aspirational and partly contradicted by evidence; read it as a research agenda, not a validated roadmap. Claims of a "deal probability model" should be treated as unproven; the first attempt (LF-1) failed.

8.1 Signal Combination Novelty

To our knowledge, no published system combines Hawkes processes, spectral graph analysis, BOCPD, and Kalman smoothing for healthcare intelligence. Individual components are well-studied; the specific combination and domain application are novel. The closest related work in financial signal processing (e.g., Bloomberg's event-driven analytics) operates on price and volume data rather than narrative structure.

8.2 Graph Topology as Earned Data

The entity co-occurrence graph required months of continuous ingestion from ~140 feeds. With a 400-node backend graph (12,791 edges as of 2026-06-11; 150-entity display subset) and daily temporal history, this graph cannot be replicated from static data. The temporal dimension — how edges form, strengthen, weaken, and dissolve over time — represents information that can only be accumulated through sustained operation.

8.3 Compounding Temporal Advantage

System capability increases non-linearly with data accumulation:

Milestone   Capability Unlocked
Day 1       Signal computation only
Day 7       Lead-lag detection activates
Day 30      Validation gates fire; link prediction gets temporal features
Day 90      Event study reaches statistical significance (but see §6.4: the externally-anchored event study already ran and PSI failed to beat a naive baseline)
Day 180     Deal probability model viable (first attempt, LF-1, already refuted; not on track)
Day 365     Seasonal pattern detection, annual cycle modeling

Each additional day of operation adds to the baseline rate estimates (Hawkes), the embedding history (NED), the spectral trajectory (GSS), and the regression training data (SMD), creating a widening moat against systems that start later.

8.4 Ensemble Redundancy

Eight models from three methodological families (classical heuristics, ML, deep graph) provide robustness. High model agreement (7/8 or 8/8) on a prediction qualitatively differs from any single model's output. In production, we report a confidence tier based on model agreement:

  • Strong consensus (6-8 models agree): High-confidence prediction
  • Moderate consensus (4-5 models agree): Medium-confidence prediction
  • Weak/no consensus (1-3 models agree): Suppressed from output
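The tiering rule above can be sketched as a small function. The document does not define what it means for a model to "agree", so the 0.5 probability threshold below is an assumption, as are the function name and the example scores (which are invented, not production output).

```python
def consensus_tier(model_probs, threshold=0.5):
    """Map per-model link probabilities to (votes, tier).

    Assumption: a model "agrees" when its probability is >= threshold.
    """
    votes = sum(p >= threshold for p in model_probs.values())
    if votes >= 6:
        return votes, "strong"      # high-confidence prediction
    if votes >= 4:
        return votes, "moderate"    # medium-confidence prediction
    return votes, "suppressed"      # weak or no consensus: not shown

# Invented scores for one candidate edge across the eight deployed models.
example = {
    "Jaccard": 0.71,
    "Adamic-Adar": 0.66,
    "Common Neighbors": 0.58,
    "Preferential Attachment": 0.41,
    "Logistic Regression": 0.77,
    "GCN": 0.62,
    "Node2Vec": 0.91,
    "VGAE": 0.35,
}

print(consensus_tier(example))
```

Vote counting treats all eight models equally, whereas the §6.2 ablation concentrates predictive content in Node2Vec. The tier is therefore best read as a robustness indicator rather than a calibrated confidence.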

9. Limitations

  1. Data dependency: PSI requires continuous news flow; coverage gaps (weekends, holidays) produce signal dropouts that the Kalman smoother partially but not fully compensates for
  2. Healthcare focus: Methodology validated only on healthcare entities; generalization to other sectors (defense, energy, technology) is hypothesized but unproven
  3. Cold start: New entities require 10+ observations before scoring and 30+ for full confidence; rapidly emerging entities may be underweighted during their most informative period
  4. Causal claims: PSI identifies structural conditions preceding events but does not establish causation; elevated PSI may reflect information leakage, genuine signal, or coincidental attention patterns
  5. Backtest limitation: Currently ~45 days of clean single-regime live data (since 2026-04-27, as of 2026-06-11; ~80 days including the pre-break regime that cannot be used for IID validation); long-horizon validation pending. The §6.4 anchored test is a short-window result (n=83 evaluable clean-regime events); longer-horizon event studies need more deal events to accumulate within the clean-regime window (estimated Day 90+).
  6. Graph sparsity: The 150-entity pruned graph excludes long-tail entities that may carry important signals; the co-occurrence threshold (>= 2 shared articles) trades recall for precision
  7. Embedding model dependency: NED relies on Amazon Titan embeddings; model updates or API changes could shift the embedding space and invalidate historical centroids
  8. Single-language limitation: All signal processing operates on English-language sources; healthcare intelligence in non-English markets requires separate ingestion infrastructure
  9. Pre-A3 TransE state: Until 2026-05-28, the production link-prediction ensemble included TransE (a knowledge-graph embedding model) at a non-trivial weight despite TransE being verified chaotically non-reproducible (1e-12 init perturbation → 0.64 output delta). A3 removed TransE from production and aligned the deployed ensemble with the eight-model fit_study diagnostic. Pre-A3 published probabilities should be treated as approximate; post-A3 probabilities are reproducible against the deployed weights.
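The recall-for-precision trade in limitation 6 comes from the co-occurrence threshold, which can be illustrated in a few lines. This is a sketch under stated assumptions, not the production graph builder: the function name is hypothetical, the entity labels are placeholders, and edge weight is taken to be the raw count of shared articles.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(articles, min_shared=2):
    """Build undirected edges between entities sharing >= min_shared articles."""
    counts = Counter()
    for entities in articles:
        # De-duplicate within an article and sort so (a, b) == (b, a).
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_shared}

# Placeholder entity labels; each inner list is one article's extracted entities.
articles = [
    ["A", "B"],
    ["A", "B", "C"],
    ["C", "D"],
]

edges = cooccurrence_edges(articles)
print(edges)
```

With the threshold at 2, only the A–B pair survives. The single-article pairs (A–C, B–C, C–D) are dropped, which is how long-tail entities fall out of the pruned graph.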

10. Conclusion

PSI assembles statistical signal processing, graph learning, and NLP into a single operational pipeline that runs daily at production scale (~400-entity graph, 1,900+ scored entities, clean single-regime history since 2026-04-27). This draft reports what the artifacts currently support — and, deliberately, no more.

What is established. Two claims are evidence-backed today:

  1. A reproducible eight-model link-prediction layer (§6.1, §6.2). Its predictive content is concentrated in Node2Vec, which carries ~99% of the leave-one-out ensemble AUC; the ensemble framing is a robustness and reproducibility property, not a performance one.
  2. A composite PSI that is a statistically significant contra-indicator of forward 7-day attention (§6.3): permutation test p≈0 on a -0.83 correlation, factor-regression psi_beta -0.65 after controlling for news volume and sector momentum, with the decay gate now passing (half-life 7 days) on the first N≥31 clean-corpus run.

What is not yet established. The per-signal ablation (§7.2) is uncomputed; event lead-time is unsupported, since the one externally anchored event study (§6.4) ran and PSI failed to beat the naive baseline; the EDGAR integration (§6.5) is wired but entity-resolution-bound and currently filing-volume noise. Earlier headline figures (the 0.85 graph-link-prediction AUC) describe a different question than forward-attention prediction and have been removed from the abstract to avoid conflation. The most recent fit_study also landed in the OVERFIT band (ensemble fit-gap 0.078), so even the link-prediction headline should be read as a single monthly run, not a stable benchmark.

The honest position is that PSI is a working, defensible enrichment system — a reproducible relationship-prediction layer plus a five-signal characterization of an entity's information structure — but not an event forecaster. Its one internally-significant relationship (the forward-attention contra-indicator, §6.3) is endogenous and not external validation; the single test with a clean external anchor (§6.4, SEC-filing-dated deal events) refuted the forecasting hypothesis (PSI 10.8% vs naive baseline 49.4%). The validation framework's value is precisely that it surfaced this — separating a real enrichment asset from an unsupported forecasting claim, rather than obscuring the difference.

Future work, in priority order:

  1. Lean into the validated direction: ship the graph + signals as enrichment/retrieval (the "why an entity matters" use case). The first such product outcome shipped 2026-06-11: the mention-spike baseline itself (the rule that beat PSI in §6.4) is now a watchlist alert trigger, with its pre-registered constants unchanged.
  2. Key EDGAR/deal data on the new entity→CIK resolver to strengthen that enrichment layer.
  3. Compute the §7.2 per-signal ablation.
  4. If forecasting is revisited, do it only against a fresh external anchor (e.g. regulatory/FDA events) under the same pre-registration discipline, not by scaling the GNN, which Phase 0 showed is at GBM-parity.

The previously-planned "deal-probability model" is removed from the roadmap: its premise was tested early (LF-1) and did not hold.


References

  • Adams, R.P. and MacKay, D.J.C. (2007). "Bayesian Online Changepoint Detection." arXiv:0710.3742.
  • Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. (2013). "Translating Embeddings for Modeling Multi-relational Data." NIPS 2013.
  • Grover, A. and Leskovec, J. (2016). "node2vec: Scalable Feature Learning for Networks." KDD 2016.
  • Harvey, A.C. (1989). "Forecasting, Structural Time Series Models and the Kalman Filter." Cambridge University Press.
  • Hawkes, A.G. (1971). "Spectra of some self-exciting and mutually exciting point processes." Biometrika, 58(1), 83-90.
  • James, W. and Stein, C. (1961). "Estimation with Quadratic Loss." Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 361-379.
  • Kipf, T.N. and Welling, M. (2016). "Variational Graph Auto-Encoders." NIPS Workshop on Bayesian Deep Learning.
  • Kipf, T.N. and Welling, M. (2017). "Semi-Supervised Classification with Graph Convolutional Networks." ICLR 2017.
  • Sun, Z., Deng, Z., Nie, J., and Tang, J. (2019). "RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space." ICLR 2019.

Appendix A: System Architecture

Four AWS Lambda functions (arm64, Python 3.12):

Lambda           Purpose                                                   Schedule               Timeout
psi-compute      5 signals, PCA, BOCPD, Kalman, James-Stein, EDGAR boost   Daily 13:30 UTC        900s
card-compute     Entity intelligence cards with 5 metrics                  Async (post-compute)   900s
psi-validate     5 statistical validation gates                            Weekly                 900s
psi-enrichment   Link prediction, EDGAR scan, velocity, event study,       Async (post-compute)   900s
                 lead-lag, peer-relative, interpretations

All Lambdas share a common layer with NumPy, SciPy, scikit-learn, and graph libraries. State is persisted in S3 (signal history) and DynamoDB (entity metadata, scores).

Appendix B: Reproducibility

All code is available at github.com/jtannahill/plocamium-content-engine. The system comprises 16 Python modules totaling ~4,000 lines of signal processing, graph learning, and statistical analysis code, with 145+ unit tests.

Key modules:

  • psi_signals.py: five signal computations (ACI, NED, GSS, SMD, SC)
  • psi_pipeline.py: six-layer processing pipeline (PCA, BOCPD, IVW, Kalman, James-Stein, output)
  • link_prediction.py: eight-model reproducible ensemble with auto-weight optimization (leave-one-out ablation)
  • psi_validation.py: five statistical validation gates
  • psi_enrichment.py: EDGAR integration, event study, lead-lag analysis

Appendix C: Entity Intelligence Card Example

The card-compute Lambda has produced its first batches; cards_latest.json (2026-06-08) holds 87 live entity cards. A real card, rendered from production data, follows. The original draft showed a hypothetical "UnitedHealth Group" card; it is replaced here with an actual one to avoid presenting fabricated numbers as output.

┌─────────────────────────────────────────┐
│  ENTITY: FDA                            │
│  PSI Score: -0.70 (NORMAL)              │
│  Momentum: 60.4   Degree: 46            │
│  Network position: 0.159                │
├─────────────────────────────────────────┤
│  ACI (Attention cascade):   -0.79       │
│  NED (Narrative drift):      0.00       │
│  GSS (Spectral shift):     464.70       │
│  SMD (Sent-Mom divergence):  0.00       │
│  SC  (Source concentration): 0.91       │
├─────────────────────────────────────────┤
│  Dominant signal: spectral_shift        │
└─────────────────────────────────────────┘

Two honest observations from the live cards, both consistent with limitations already noted:

  1. Entity quality. The top production entities are news nouns — FDA, Iran, China, "President", "Strait of Hormuz" — not the healthcare companies the card format was designed around. This is the same entity-resolution gap flagged in §6.5: PSI extracts entities from narrative, and the corpus is currently geopolitics-heavy.
  2. GSS is not yet a per-entity contribution. The spectral-shift value (464.70) is identical across FDA, Iran, and other entities, and therefore dominates dominant_signal for nearly every card. It reads as a shared, un-normalized global spectral metric rather than an entity-specific contribution. The per-component normalization needed to make this column comparable is the same work blocking the §7.2 signal ablation, and the card interpretation text is intentionally omitted until it lands.
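The effect described in observation 2 can be shown with a small cross-sectional normalization sketch. This is not the card-compute logic and not the pending per-component normalization, whose design is still open. The FDA row uses the card values shown above. The other two rows are invented placeholders that share the same GSS value, and the function name and z-scoring choice are assumptions.

```python
import numpy as np

SIGNALS = ["ACI", "NED", "GSS", "SMD", "SC"]

def dominant_signals(cards, normalize=True):
    """Pick each entity's dominant signal, optionally after cross-entity z-scoring."""
    names = list(cards)
    X = np.array([[cards[n][s] for s in SIGNALS] for n in names], dtype=float)
    if normalize:
        # A column identical across entities carries no per-entity information.
        constant = X.max(axis=0) == X.min(axis=0)
        std = X.std(axis=0)
        std[constant] = 1.0            # avoid dividing by (near-)zero
        X = (X - X.mean(axis=0)) / std
        X[:, constant] = 0.0           # constant columns contribute nothing
    return {n: SIGNALS[int(np.argmax(np.abs(X[i])))] for i, n in enumerate(names)}

cards = {
    # FDA values are taken from the card above.
    "FDA": {"ACI": -0.79, "NED": 0.00, "GSS": 464.70, "SMD": 0.00, "SC": 0.91},
    # Invented placeholder rows sharing the same global GSS value.
    "entity_b": {"ACI": 1.20, "NED": 0.00, "GSS": 464.70, "SMD": 0.00, "SC": 0.40},
    "entity_c": {"ACI": 0.10, "NED": 0.00, "GSS": 464.70, "SMD": 0.00, "SC": 0.85},
}

print("raw:       ", dominant_signals(cards, normalize=False))
print("normalized:", dominant_signals(cards, normalize=True))
```

On raw values GSS dominates every card because of its scale. After cross-sectional z-scoring the shared GSS column drops to zero and the dominant signal varies by entity. With only three toy rows the z-scores themselves are not meaningful, so the example illustrates the mechanism only.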