An honest evaluation of a composite intelligence system: what the artifacts support, and what they refute.
Authors: James Tannahill, Plocamium Holdings
Status: Draft v1.0, 2026-06-11. Results current as of the 2026-06-07 fit_study/validation runs and the 2026-06-08 LF-1 anchored test.
Abstract: We present the Plocamium Signal Index (PSI), a composite z-score intelligence system that characterizes narrative structure and models entity-relationship formation in the healthcare sector. PSI combines five orthogonal signals — Attention Cascade Intensity (Hawkes process), Narrative Embedding Drift (Titan cosine distance), Graph Spectral Shift (Laplacian eigenvalues), Sentiment-Momentum Divergence (OLS residual), and Source Concentration (HHI) — through a six-layer processing pipeline including PCA orthogonalization, Bayesian Online Changepoint Detection, inverse-variance weighting, and Kalman smoothing. PSI's link-prediction layer uses a reproducible eight-model ensemble (Jaccard, Adamic-Adar, Common Neighbors, Preferential Attachment, Logistic Regression, GCN, Node2Vec, VGAE) with auto-tuned weights. Cross-validated ROC-AUC and the corresponding overfit-defensibility (train-vs-holdout fit gap) are reported in Section 6, from the 2026-06-07 fit_study run (ensemble holdout AUC 0.745, fit-gap 0.078 — OVERFIT band). Leave-one-out ablation shows the production link-prediction ensemble is, predictively, concentrated in a single model (Node2Vec, ~99% of contribution); the eight-model framing is a reproducibility and robustness property rather than a performance one. Deployed in production on a ~400-entity healthcare co-occurrence graph (150-entity pruned display subset) fed by ~140 curated RSS feeds across 12 sector lanes plus API news sources. We report a deliberately honest evaluation: the only internally significant relationship is that the composite is a forward-attention contra-indicator (Section 6.3, permutation p≈0) — but that outcome is endogenous to the coverage stream and not external validation. On the one target with a clean external anchor (SEC-filing-dated deal events, Section 6.4), PSI does not lead events better than a naive mention-spike baseline (10.8% vs 49.4%). 
PSI's demonstrated value is thus structural enrichment, not event forecasting; this paper documents both what is and is not supported, per a pre-registered protocol.
Keywords: healthcare intelligence, entity networks, link prediction, graph neural networks, knowledge graph embeddings, Bayesian changepoint detection, Kalman filtering, composite indices
The healthcare sector generates thousands of news signals daily across regulatory actions, M&A transactions, clinical trial outcomes, executive movements, and policy changes. Traditional intelligence systems process these signals reactively — Bloomberg reports the deal, Reuters covers the announcement, sentiment scores measure the reaction.
We propose the Plocamium Signal Index (PSI), an intelligence system that characterizes the information structure around healthcare entities. PSI was designed on the hypothesis that events in healthcare M&A, regulatory action, and strategic partnerships are preceded by observable changes in information flow — attention cascades, narrative shifts, graph restructuring, sentiment-momentum divergence, source concentration — days to weeks before formal announcements. That forecasting hypothesis was tested externally and did not hold (§6.4: on SEC-filing-dated deal events, PSI does not beat a naive mention-spike baseline). PSI's demonstrated value is therefore as a structural and enrichment system — quantifying and explaining an entity's information profile — rather than as an event forecaster. This paper reports that distinction honestly; §6 documents both what is and is not supported.
Hawkes (1971) introduced self-exciting point processes for earthquake modeling. We adapt the Hawkes framework to model entity mention arrivals, where each mention increases the probability of subsequent mentions (self-excitation). Our contribution is using the residual between observed and expected intensity as a surprise signal — entities with positive residuals are receiving more attention than their own momentum predicts.
Adams and MacKay (2007) proposed BOCPD for real-time regime detection using conjugate-exponential models. We implement BOCPD with a Normal-Inverse-Gamma conjugate prior to detect regime changes in the PSI composite score, applying a 7-day transition window for smooth regime classification. Our implementation uses cumulative probability mass at short run lengths rather than the raw changepoint probability, which we found to be more stable after normalization.
Harvey (1989) developed the local linear trend model for time series decomposition. We apply the Rauch-Tung-Striebel backward smoother to decompose each entity's PSI score into trend (reported), mean-reverting (alerts), and noise (discarded) components, with native confidence intervals from the smoothed state covariance.
Kipf and Welling (2016) introduced the Variational Graph Autoencoder for unsupervised link prediction. Grover and Leskovec (2016) proposed Node2Vec for learning node embeddings via biased random walks. Bordes et al. (2013) introduced TransE for knowledge graph completion. Our contribution is ensembling these with classical heuristics and feature-based ML. (Note: §6.2 and the Phase-0 baseline panel show that on this graph the ensemble is predictively concentrated in a single model and is at parity with a gradient-boosted baseline — the ensemble is a reproducibility/robustness construct, not a demonstrated performance gain over the best simple model.)
James and Stein (1961) showed that shrinking individual estimates toward a grand mean reduces total estimation error when estimating three or more means simultaneously. We apply James-Stein shrinkage to cold-start entity PSI scores, shrinking toward BICS sector means with a linear confidence ramp.
We model entity mention arrivals as a self-exciting Hawkes process. For entity e, the conditional intensity function is:
λ(t) = μ_e + α Σ_{t_i < t} β exp(-β(t - t_i))
where μ_e is the baseline mention rate estimated from the trailing 14-day window, α is the self-excitation parameter (fixed at 0.5), and β is the decay rate (fixed at 1.0 day⁻¹). The ACI signal for entity e on day d is the standardized residual:
ACI_e(d) = (observed_count - expected_count) / σ_e
where expected_count is the integral of λ(t) over day d and σ_e is the historical standard deviation of residuals. Positive ACI indicates attention exceeding the entity's own self-exciting baseline — a signal that exogenous information is driving coverage beyond endogenous momentum.
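A minimal sketch of the ACI computation, assuming the expected count conditions only on mentions that arrived before the start of day d (the text does not say how same-day mentions are treated). The function names and the closed-form daily integral are illustrative, not the production code.

```python
import math

ALPHA = 0.5  # self-excitation parameter, fixed per the text
BETA = 1.0   # decay rate in 1/day, fixed per the text

def expected_count(mu, past_times, day_start):
    """Integrate lambda(t) over [day_start, day_start + 1).

    ASSUMPTION: only mentions strictly before day_start excite the day.
    """
    total = mu  # baseline contributes mu * (1 day)
    for t_i in past_times:
        if t_i < day_start:
            lag = day_start - t_i
            # Closed form of the integral of alpha * beta * exp(-beta * (t - t_i))
            # for t running over the one-day window.
            total += ALPHA * (math.exp(-BETA * lag) - math.exp(-BETA * (lag + 1.0)))
    return total

def aci(observed, mu, past_times, day_start, sigma):
    """Standardized residual between observed and expected daily mentions."""
    return (observed - expected_count(mu, past_times, day_start)) / sigma
```

With no prior mentions the expected count reduces to the baseline rate, so an entity with μ_e = 2, five observed mentions, and σ_e = 1.5 scores ACI = 2.0.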
For each entity e, we compute the centroid of Amazon Titan (1,536-dim) embeddings from all articles mentioning e in the trailing 7-day window. NED is the cosine distance between the current week's centroid and the prior week's centroid:
NED_e(d) = 1 - cos(c_e^{current}, c_e^{prior})
High NED indicates that the narrative context surrounding an entity has shifted — the same entity is being discussed in materially different terms. For emerging entities (first appearance), NED is set to 1.0 (maximum drift), reflecting complete novelty. NED captures qualitative narrative change that volume-based signals miss entirely: an entity can maintain constant mention volume while undergoing a complete narrative reframing.
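The NED definition can be sketched in a few lines, assuming embeddings arrive as plain lists of floats (1,536 values each in production, shortened here). The helper names are hypothetical, and zero-norm centroids are not handled.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def ned(current_vectors, prior_vectors):
    """Cosine distance between this week's and last week's centroids."""
    if not prior_vectors:
        return 1.0  # first appearance: maximum drift, per the text
    return cosine_distance(centroid(current_vectors), centroid(prior_vectors))
```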
We construct the entity co-occurrence graph G_d = (V, E_d) where edges are weighted by co-mention frequency in the trailing 14-day window. The normalized Laplacian is:
L = I - D^{-1/2} A D^{-1/2}
where A is the adjacency matrix and D is the degree matrix. We compute the top-k eigenvalues (k = min(50, |V|)) of L for consecutive days and measure:
GSS(d) = ||λ(d) - λ(d-1)||_2
For per-entity attribution, we compute the change in each entity's eigenvector centrality between consecutive snapshots. GSS captures structural reorganization of the entity network — mergers, new alliances, and cluster dissolution manifest as spectral shifts before they appear in headlines.
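A minimal sketch of the global GSS value, assuming dense NumPy adjacency matrices over the same vertex set in the same order on both days (the text does not describe how entering or leaving entities are aligned). Isolated nodes are given a zero normalization factor here, which is one of several possible conventions.

```python
import numpy as np

def normalized_laplacian(adjacency):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric weighted adjacency matrix."""
    a = np.asarray(adjacency, dtype=float)
    degree = a.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])  # ASSUMPTION: isolated nodes get 0
    return np.eye(len(a)) - inv_sqrt[:, None] * a * inv_sqrt[None, :]

def top_eigenvalues(adjacency, k=50):
    """Largest min(k, |V|) eigenvalues of the normalized Laplacian, descending."""
    values = np.linalg.eigvalsh(normalized_laplacian(adjacency))  # ascending order
    return values[::-1][: min(k, len(values))]

def gss(adj_today, adj_yesterday, k=50):
    """L2 norm of the change in the leading Laplacian spectrum."""
    diff = top_eigenvalues(adj_today, k) - top_eigenvalues(adj_yesterday, k)
    return float(np.linalg.norm(diff))
```

As a check on a toy graph, closing a three-node path into a triangle moves the spectrum from (2, 1, 0) to (1.5, 1.5, 0), giving GSS = √0.5 ≈ 0.707.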
For each entity e, we fit an OLS regression of sentiment on momentum (trailing 7-day mention velocity):
sentiment_e(d) = β_0 + β_1 · momentum_e(d) + ε_e(d)
The SMD signal is the standardized residual ε̂_e(d). Positive SMD indicates sentiment exceeding what momentum alone would predict (the market is more positive than attention warrants). Negative SMD indicates sentiment lagging momentum (rising attention with deteriorating narrative). SMD functions as a contrarian signal: extreme positive SMD often precedes corrections, while extreme negative SMD (attention rising, sentiment falling) often precedes adverse events such as regulatory actions or earnings misses.
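The regression and standardization can be sketched with the closed-form single-regressor OLS solution. The function name is hypothetical, and scaling residuals by their population standard deviation is an assumption; the text does not specify the scaling.

```python
import statistics

def smd_series(sentiment, momentum):
    """Standardized OLS residuals of sentiment regressed on momentum."""
    mean_x = statistics.fmean(momentum)
    mean_y = statistics.fmean(sentiment)
    sxx = sum((x - mean_x) ** 2 for x in momentum)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(momentum, sentiment))
    beta_1 = sxy / sxx
    beta_0 = mean_y - beta_1 * mean_x
    residuals = [y - (beta_0 + beta_1 * x) for x, y in zip(momentum, sentiment)]
    scale = statistics.pstdev(residuals)  # ASSUMPTION: population std of residuals
    return [r / scale for r in residuals]
```

A positive entry means sentiment sits above the fitted line for that day's momentum; a negative entry means it sits below.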
We compute the Herfindahl-Hirschman Index across source domains for each entity's mentions:
SC_e(d) = Σ_i s_i^2
where s_i is the share of mentions from source domain i. SC ranges from 1/N (perfect diversification) to 1.0 (single source). High SC indicates information asymmetry — when a story is concentrated in one or two sources, it may represent a leak, exclusive, or planted narrative rather than broad consensus. SC serves as a signal quality modifier: high-SC signals should be treated with higher uncertainty.
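As a worked example of the index, the sketch below computes SC from a list of source domains, one entry per mention. The function name and input shape are illustrative assumptions.

```python
from collections import Counter

def source_concentration(domains):
    """Herfindahl-Hirschman Index over the source domains of an entity's mentions."""
    counts = Counter(domains)
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())
```

Four mentions split 2/1/1 across three domains give shares of 0.5, 0.25, and 0.25, so SC = 0.25 + 0.0625 + 0.0625 = 0.375; four mentions from four distinct domains give the 1/N floor of 0.25.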
The five raw signals are orthogonalized via PCA to remove residual correlation. We apply the Kaiser criterion (retain components with eigenvalue > 1) followed by VIF rejection (remove any component with Variance Inflation Factor > 5). This ensures that the composite score is not dominated by correlated signal pairs. The retained principal components are rotated back to the original signal space for interpretability.
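A minimal sketch of the Kaiser-criterion step only, assuming the five signals are stacked as columns of a NumPy array with one row per observation. VIF rejection and the rotation back to signal space are not shown, and the function name is hypothetical.

```python
import numpy as np

def kaiser_components(signals):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    x = np.asarray(signals, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each signal
    corr = np.corrcoef(x, rowvar=False)
    values, vectors = np.linalg.eigh(corr)            # ascending eigenvalues
    order = np.argsort(values)[::-1]
    values, vectors = values[order], vectors[:, order]
    keep = values > 1.0                               # Kaiser criterion
    return z @ vectors[:, keep], values
```

Because the eigenvalues of a correlation matrix sum to the number of signals, a threshold of 1 keeps only components that explain more variance than a single standardized signal would on its own.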
We implement Adams and MacKay (2007) with a Normal-Inverse-Gamma conjugate prior on the composite signal stream. The prior parameters are:
Regime classification uses cumulative probability mass at short run lengths (< 7 days). A 7-day transition window smooths regime boundaries to prevent flip-flopping. Regimes are labeled: QUIET, NORMAL, ELEVATED, HIGH, EXTREME.
Each signal component receives weight inversely proportional to its trailing variance:
w_i = (1/σ_i^2) / Σ_j (1/σ_j^2)
This auto-updating scheme prioritizes stable, informative signals and downweights noisy components. Weights are recalculated daily on a 30-day trailing window.
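The weighting rule can be sketched as follows, assuming each component's trailing 30-day values are available as a list. The function name is hypothetical, and a component with zero trailing variance would need separate handling.

```python
import statistics

def inverse_variance_weights(history):
    """Map component name -> weight proportional to 1 / trailing sample variance."""
    inverse = {name: 1.0 / statistics.variance(values) for name, values in history.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}
```

For two components with trailing variances 2 and 8, the weights are 0.8 and 0.2: the component with a quarter of the variance receives four times the weight.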
We apply Harvey's (1989) local linear trend model:
State: x_t = x_{t-1} + v_{t-1} + η_t, η_t ~ N(0, Q_level)
Velocity: v_t = v_{t-1} + ζ_t, ζ_t ~ N(0, Q_trend)
Obs: y_t = x_t + ε_t, ε_t ~ N(0, R)
The Rauch-Tung-Striebel backward pass decomposes each entity's score into:
- Trend (smoothed state): reported as the PSI score
- Mean-reverting component (innovation residual): used for alert generation
- Noise (observation error): discarded
Confidence intervals are derived directly from the smoothed state covariance matrix P_{t|T}.
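A compact sketch of the forward Kalman filter and RTS backward pass for the local linear trend model above. The diffuse initial covariance, the 1.96 multiplier for a 95% interval, and the function name are assumptions; the noise variances Q_level, Q_trend, and R are passed in rather than estimated.

```python
import numpy as np

def rts_local_linear_trend(y, q_level, q_trend, r):
    """Return the smoothed level and its 95% half-width for each time step."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # level += velocity; velocity persists
    H = np.array([[1.0, 0.0]])              # only the level is observed
    Q = np.diag([q_level, q_trend])
    m_pred, P_pred = np.zeros((n, 2)), np.zeros((n, 2, 2))
    m_filt, P_filt = np.zeros((n, 2)), np.zeros((n, 2, 2))
    m, P = np.array([y[0], 0.0]), np.eye(2) * 1e3  # ASSUMPTION: diffuse prior
    for t in range(n):
        if t > 0:
            m, P = F @ m, F @ P @ F.T + Q           # predict
        m_pred[t], P_pred[t] = m, P
        S = (H @ P @ H.T)[0, 0] + r                 # innovation variance
        K = (P @ H.T)[:, 0] / S                     # Kalman gain
        m = m + K * (y[t] - m[0])                   # update with the innovation
        P = P - np.outer(K, H @ P)
        m_filt[t], P_filt[t] = m, P
    m_s, P_s = m_filt.copy(), P_filt.copy()
    for t in range(n - 2, -1, -1):                  # RTS backward pass
        G = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t + 1])
        m_s[t] = m_filt[t] + G @ (m_s[t + 1] - m_pred[t + 1])
        P_s[t] = P_filt[t] + G @ (P_s[t + 1] - P_pred[t + 1]) @ G.T
    return m_s[:, 0], 1.96 * np.sqrt(P_s[:, 0, 0])
```

The first return value corresponds to the trend component and the second to the half-width of the interval derived from P_{t|T}.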
Entities with fewer than 30 observations receive James-Stein shrinkage toward their BICS sector mean:
PSI_shrunk = (1 - B) · PSI_raw + B · PSI_sector
where B is the James-Stein shrinkage factor and confidence ramps linearly from 0 at n=0 to 1 at n=30. Entities with fewer than 10 observations are suppressed entirely (score = 0, flagged as insufficient data).
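The cold-start rule can be sketched as below. The text does not state how B relates to the confidence ramp, so this sketch assumes B = 1 − confidence, a deliberate simplification rather than the James-Stein factor itself; the function name and status strings are hypothetical.

```python
def shrink_psi(psi_raw, psi_sector, n_obs, ramp_n=30, min_n=10):
    """Shrink a cold-start score toward its sector mean; suppress sparse entities."""
    if n_obs < min_n:
        return 0.0, "insufficient_data"      # suppressed entirely, per the text
    confidence = min(n_obs / ramp_n, 1.0)    # linear ramp: 0 at n=0, 1 at n=30
    b = 1.0 - confidence                     # ASSUMPTION: B taken as 1 - confidence
    return (1.0 - b) * psi_raw + b * psi_sector, "ok"
```

Under that assumption, an entity with 15 observations, a raw score of 2.0, and a sector mean of 0.5 is reported at 1.25, halfway between the two.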
The final PSI score is a z-score with regime classification:
| Regime | Z-Score Range | Interpretation |
|---|---|---|
| QUIET | < -1 | Below-normal activity; entity fading from coverage |
| NORMAL | -1 to +1 | Baseline activity; no actionable signal |
| ELEVATED | +1 to +2 | Above-normal activity; worth monitoring |
| HIGH | +2 to +3 | Significant disruption; likely event precursor |
| EXTREME | > +3 | Rare signal intensity; immediate attention required |
For the first 30 days of system operation, percentile calibration supplements the z-score thresholds to account for limited distributional data.
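The z-score bands in the regime table can be written as a simple lookup. This sketch covers only the table mapping, not the BOCPD-based smoothing, and it assumes values on a shared boundary fall into the lower regime, which is consistent with the strict inequalities shown for QUIET and EXTREME.

```python
def classify_regime(z):
    """Map a PSI z-score to its regime label per the table's bands."""
    if z < -1.0:
        return "QUIET"
    if z <= 1.0:
        return "NORMAL"
    if z <= 2.0:      # ASSUMPTION: shared boundaries go to the lower regime
        return "ELEVATED"
    if z <= 3.0:
        return "HIGH"
    return "EXTREME"
```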
We compute four classical link prediction scores for all non-adjacent entity pairs:
These heuristics capture structural proximity from different perspectives — local neighborhood overlap (Jaccard, CN), weighted overlap penalizing high-degree intermediaries (Adamic-Adar), and global popularity (PA).
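The four heuristics can be sketched on an unweighted adjacency structure, assuming the graph is held as a dict mapping each entity to the set of its neighbors. The function name is hypothetical and edge weights are ignored.

```python
import math

def heuristic_scores(adj, u, v):
    """Classical link-prediction scores for a candidate pair (u, v)."""
    neighbors_u, neighbors_v = adj[u], adj[v]
    common = neighbors_u & neighbors_v
    union = neighbors_u | neighbors_v
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Each shared neighbor is down-weighted by the log of its degree.
        "adamic_adar": sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1),
        "preferential_attachment": len(neighbors_u) * len(neighbors_v),
    }
```

For two entities with neighbor sets {c, d} and {c, d, e}, where c has degree 2 and d has degree 3, the scores are CN = 2, Jaccard = 2/3, Adamic-Adar = 1/ln 2 + 1/ln 3 ≈ 2.353, and PA = 6.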
A logistic regression model trained on 10 pair features:
Training uses 80/20 stratified split with class-balanced sampling.
A 2-layer GCN encoder following Kipf and Welling (2017):
H^{(1)} = ReLU(Â X W^{(0)})
Z = Â H^{(1)} W^{(1)}
where  = D̃^{-1/2} à D̃^{-1/2} is the normalized adjacency with self-loops and X is the node feature matrix (PSI scores, degree, sector encoding). Link scores are computed via cosine similarity: score(u,v) = cos(z_u, z_v). Training minimizes binary cross-entropy on held-out edges.
Biased random walks with return parameter p=1 and in-out parameter q=0.5 (in the Node2Vec parameterization, q < 1 biases walks toward DFS-like outward exploration). Walk parameters:
Training uses skip-gram with negative sampling SGD. Link scores are computed as the dot product of learned embeddings.
A 2-layer GCN encoder produces mean (μ) and log-variance (log σ²) vectors for each node:
μ = GCN_μ(X, A)
log σ² = GCN_σ(X, A)
z = μ + σ ⊙ ε, ε ~ N(0, I)
The decoder reconstructs the adjacency matrix via inner product: Â = σ(Z Z^T). Training maximizes the ELBO:
L = E_q[log p(A|Z)] - KL[q(Z|X,A) || p(Z)]
Entities and relations are embedded in ℝ^d such that h + r ≈ t for valid triples (h, r, t). We define four relation types derived from graph context:
Training minimizes the margin-based ranking loss with negative sampling. Link prediction scores are computed as -||h + r - t|| for each candidate relation type, taking the maximum.
The eight model scores are combined via weighted average:
score_ensemble(u,v) = Σ_i w_i · score_i(u,v)
Weights are optimized through a two-phase process:
1. Dirichlet search: Sample 200 random weight vectors from Dir(α=1) and evaluate on validation edges
2. Perturbation refinement: Take the best Dirichlet sample and perturb each weight by ±0.05, selecting improvements
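The two-phase search can be sketched in pure Python. Several details here are assumptions not stated in the text: validation quality is measured by ROC-AUC, perturbed weights are clipped at zero and renormalized to sum to one, and refinement makes a single greedy pass over the weights. All function names are hypothetical.

```python
import random

def auc(scores, labels):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble(weights, model_scores):
    """Weighted average of per-model score lists, one value per candidate edge."""
    n_edges = len(model_scores[0])
    return [sum(w * m[i] for w, m in zip(weights, model_scores)) for i in range(n_edges)]

def sample_dirichlet(k, rng):
    """Draw from Dir(alpha=1) by normalizing independent Gamma(1, 1) variates."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def tune_weights(model_scores, labels, n_samples=200, step=0.05, seed=0):
    rng = random.Random(seed)
    k = len(model_scores)

    def score(w):
        return auc(ensemble(w, model_scores), labels)

    # Phase 1: keep the best of n_samples random weight vectors.
    best = max((sample_dirichlet(k, rng) for _ in range(n_samples)), key=score)
    best_auc = score(best)
    # Phase 2: perturb each weight by +/- step and keep any improvement.
    for i in range(k):
        for delta in (step, -step):
            cand = list(best)
            cand[i] = max(cand[i] + delta, 0.0)   # ASSUMPTION: clip at zero
            total = sum(cand)
            if total == 0.0:
                continue
            cand = [c / total for c in cand]      # ASSUMPTION: renormalize
            cand_auc = score(cand)
            if cand_auc > best_auc:
                best, best_auc = cand, cand_auc
    return best, best_auc
```

A search of this kind can place nearly all weight on one model whenever that model alone separates the validation edges, which matches the concentration reported in §6.2.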
Post-processing filters:
- BICS sector filter: Both entities must have sector classification
- Entity deduplication: Substring pairs excluded (e.g., "UnitedHealth" / "UnitedHealth Group")
- Confidence threshold: Model agreement (how many of 8 models rank the pair in top-20)
Each entity receives a natural-language interpretation generated via a template-based system enhanced by Bedrock Haiku:
Interpretations are regenerated daily and surfaced on the intelligence dashboard alongside the entity card metrics.
Five statistical validation gates execute automatically at the 30-day mark:
The production link-prediction layer runs eight reproducible models on the
~400-entity production graph (built by global_chart_render from per-article
co-occurrence >= 2). Per-model holdout ROC-AUC and the full Dirichlet-weighted
ensemble are measured monthly by the psi-fit-study Fargate job and reported
here from models/psi/fit_study.json.
Filled from the first clean first-Sunday auto run, run_id
2026-06-07T09:01:29Z (72 days of history, 400-entity universe, 5 CV folds,
7-day holdout). Holdout ROC-AUC and fit-gap are from fit_study.json;
the Weight column is the production Dirichlet weight from the daily
link_predictions.json optimizer.
| Model | Holdout ROC-AUC | Fit-Gap | Weight |
|---|---|---|---|
| Logistic Regression | 0.756 | 0.069 | 0.00 |
| Adamic-Adar | 0.662 | 0.092 | 0.00 |
| Common Neighbors | 0.661 | 0.093 | 0.00 |
| Jaccard | 0.669 | 0.088 | 0.00 |
| GCN | 0.614 | 0.095 | 0.00 |
| Node2Vec | 0.651 | 0.081 | 1.00 |
| VGAE | 0.500 | 0.000 | 0.00 |
| Preferential Attachment | 0.602 | 0.101 | 0.00 |
| Ensemble (Dirichlet-weighted) | 0.745 | 0.078 | 1.00 |
Two honest observations from this run:
- The ensemble fit-gap is 0.078 (overall.status = "OVERFIT"), up sharply from the prior two runs (trend: 0.025 → 0.026 → 0.078) and above the staging-validated 2026-05-20 projection (LR@full 0.823 / ensemble 0.812 / fit_gap 0.026, HEALTHY). The worst single fit is preferential_attachment (fit-gap 0.101); VGAE degenerated to 0.500 (no signal) this run.
- On the fit_study holdout, Logistic Regression is the strongest single model (0.756), yet the production Dirichlet optimizer, which runs on the psi_compute link-prediction eval set, assigns weight 1.00 to Node2Vec and 0.00 to everything else. These are two different evaluation sets answering the same question, and they disagree on which model carries the signal. This is recorded honestly rather than reconciled away; §6.2 quantifies the production picture.

Per-model marginal contribution is measured by leave-one-out: for each of the eight models, the ensemble is re-optimized on the remaining seven features and the resulting AUC is compared to the full ensemble AUC.
Leave-one-out results from ablation_study.json (2026-06-08 run; full-ensemble
AUC 0.822 on 1,524 positive / 7,482 negative held-out edges):
| Model | Full AUC | LOO AUC | Drop | Pct Contribution |
|---|---|---|---|---|
| Node2Vec | 0.822 | 0.006 | 0.816 | 99.3% |
| Logistic Regression | 0.822 | 0.727 | 0.095 | 11.5% |
| GCN | 0.822 | 0.731 | 0.091 | 11.1% |
| VGAE | 0.822 | 0.733 | 0.089 | 10.8% |
| Adamic-Adar | 0.822 | 0.735 | 0.087 | 10.6% |
| Common Neighbors | 0.822 | 0.735 | 0.087 | 10.6% |
| Jaccard | 0.822 | 0.736 | 0.086 | 10.5% |
| Preferential Attachment | 0.822 | 0.737 | 0.085 | 10.4% |
The previous ablate-by-zeroing logic in psi_compute/ablation.py reported
auc_drop = 0 for any model whose full-ensemble weight was already zero
(an artifact of the daily Dirichlet search). Leave-one-out, landed in A3,
reports each model's marginal value honestly.
The honest reading: production link prediction is effectively a one-model ensemble. Removing Node2Vec collapses the AUC from 0.822 to 0.006 — it carries 99.3% of the contribution. The other seven models are near-redundant: each leave-one-out drop is ~0.085–0.095, because once Node2Vec is removed the re-optimizer simply re-fits the remaining heuristics back to ~0.73. All eight models are run and scored (the "8-model reproducible ensemble" claim is structurally true), but predictively the ensemble is Node2Vec.
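The Drop and Pct Contribution columns can be reproduced from the published AUCs. The definition used here, drop divided by full-ensemble AUC, is inferred from the table (0.816 / 0.822 ≈ 99.3%) rather than stated in the text, and small differences in the last digit are expected because the inputs are rounded to three decimals.

```python
FULL_AUC = 0.822
LOO_AUC = {
    "Node2Vec": 0.006,
    "Logistic Regression": 0.727,
    "GCN": 0.731,
    "VGAE": 0.733,
    "Adamic-Adar": 0.735,
    "Common Neighbors": 0.735,
    "Jaccard": 0.736,
    "Preferential Attachment": 0.737,
}

def contribution(full_auc, loo_auc):
    """Return (AUC drop, drop as a percentage of the full-ensemble AUC)."""
    drop = full_auc - loo_auc
    return drop, 100.0 * drop / full_auc  # ASSUMPTION: inferred column definition

rows = {name: contribution(FULL_AUC, value) for name, value in LOO_AUC.items()}
total_pct = sum(pct for _, pct in rows.values())
```

Under this definition the percentages are not shares of a whole: each is measured against the same full-ensemble AUC, so the column sums to well over 100%.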
The PSI validation Lambda runs weekly (Sun 08:00 UTC) against the per-day PSI history archive. It evaluates PSI signal against forward 7-day log growth in entity mention count across five gates: permutation test, subsample stability, walk-forward cross-validation, factor regression, and decay.
All runs cited here are on the single post-GNN-blend regime
(MIN_VALIDATION_DATE = 2026-04-27). Earlier weekly runs read across the
2026-04-16 → 2026-04-26 break (validate Lambda was failing, no daily writes)
and mixed pre-Apr-15 sparse graph-cap regime records (~280 KB/day) with
post-Apr-27 GNN-blended records (~1.8 MB/day); numbers from those mixed runs
should not be cited — they violated the IID assumption. The table below is the
2026-06-07 run, the first on which the corpus reached N ≥ 31 clean records and
the decay gate became evaluable.
| Validation date | 2026-06-07 |
|---|---|
| N records | 34 |
| N dates | 41 |
| Date range | 2026-04-27 → 2026-05-30 |
| Gates passed | 2 / 5 |
Per-gate results:
Honest interpretation. The negative correlation is the substantive finding, not a failure mode: PSI captures attention peaking ahead of mean reversion in next-7-day mention growth, consistent with a momentum-reversal dynamic rather than a momentum-continuation signal. This is a different claim than the link-prediction AUC reported in §6.1. The 0.85 AUC figure (now removed from the Abstract) was measured on graph link-prediction (whether two entities will form a co-occurrence edge), not on forward-attention prediction. PSI's information content for attention forecasting and its information content for relationship prediction are distinct properties; conflating them would misrepresent what has been validated.
The decay gate requires N >= 31 daily records; the 2026-06-07 run (N = 34) is the first to clear that threshold, and the gate passes (half-life 7 days).
The cleaned-corpus run continues weekly; each future Sunday invocation operates on a window growing by one clean day, so the gate panel will tighten as the single-regime corpus lengthens toward the ~40-day mark.
The event study (models/psi/event_study.json, 2026-06-08 run) tests whether
PSI was ELEVATED in the days before a deal/announcement event. From the
published corpus it extracted 171 events spanning 232 entity-event pairs (after
dropping 2,618 non-PSI entities, 316 duplicate pairs, and 27 events that
predate PSI coverage, which begins 2026-03-23). For each pair PSI is sampled in
six windows: pre-14d, pre-7d, pre-3d, event day, post-3d, post-7d.
| Metric | Value |
|---|---|
| Entity-event pairs | 232 |
| Pairs with ELEVATED PSI before the event | 0 / 232 (0.0%) |
| Mean pre-event PSI | -1.0 |
This is not yet an interpretable result, and should not be cited as a null.
The 0/232 hit rate is dominated by coverage, not signal: most pre-event windows
resolve to NO_DATA because the event corpus largely predates dense daily PSI
history (the GNN-blend regime only begins 2026-04-27; see §6.3). Where data
exists, pre-event PSI sits near the -1.0 floor — consistent with the §6.3
finding that PSI is depressed (not elevated) around attention peaks, but the
sample of events with complete pre-event windows is currently too small to
separate that effect from missing data. The event study becomes evaluable only
once the clean-regime corpus is long enough to contain events with fully
populated 14-day lookbacks — estimated Day 90+ of the post-Apr-27 regime
(see §8.3, §9 limitation 5).
External anchored test (2026-06-08) — the definitive result. The internal
event study above is circular: it measures PSI against the same coverage stream
PSI is built from. We therefore ran a pre-registered, externally anchored test
(models/psi/lf1_anchored_test.json): deal events dated by their SEC EDGAR
filing timestamp (8-K Item 1.01/2.01, SC-13D, S-4, DEFM14A) — exogenous to the
news stream — joined to PSI entities by resolved CIK. Pre-registered rule: an
entity's deal is "flagged" if PSI was ELEVATED/CRITICAL within ~20 calendar days
before the filing; success = beat a mention-spike baseline (bootstrap-CI margin)
and median lead ≥ 3 days.
Result: PSI does NOT lead deal events. PSI hit-rate 10.8% (CI 4.8–18.1%) vs a naive mention-spike baseline of 49.4% (CI 39.8–60.2%) — non-overlapping, PASS: False (n=83 evaluable clean-regime events). A trivial attention trigger anticipates ~half of deal events; PSI's regime flag catches ~1 in 9. PSI's transformation of attention into a composite regime actively loses the predictive signal present in raw mention counts. This corroborates §6.2 (the link-prediction layer is one-model and at GBM-parity on a forward holdout) and §6.3 (PSI is a forward-attention contra-indicator). The forecasting hypothesis for PSI is refuted on the one target with a clean external anchor. PSI's validated value is enrichment — explaining why an entity matters — not event forecasting.
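The non-overlap of the two intervals can be illustrated with a percentile bootstrap. The hit counts below (9 of 83 for PSI, 41 of 83 for the baseline) are back-calculated from the reported percentages and are assumptions, as is the resampling scheme; the pre-registered procedure in lf1_anchored_test.json may differ in detail.

```python
import random

def bootstrap_ci(hits, n, n_boot=2000, seed=0, level=0.95):
    """Percentile bootstrap interval for a hit rate of `hits` out of `n` events."""
    rng = random.Random(seed)
    outcomes = [1] * hits + [0] * (n - hits)
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lower = rates[int((1.0 - level) / 2.0 * n_boot)]
    upper = rates[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lower, upper

N_EVENTS = 83
psi_low, psi_high = bootstrap_ci(9, N_EVENTS)     # ASSUMPTION: 9/83 = 10.8%
base_low, base_high = bootstrap_ci(41, N_EVENTS)  # ASSUMPTION: 41/83 = 49.4%
intervals_overlap = psi_high >= base_low
```

With rates this far apart on 83 events, the PSI interval's upper bound falls below the baseline interval's lower bound, which is the sense in which the intervals are reported as non-overlapping.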
The EDGAR layer (models/psi/edgar_signals.json, 2026-06-07 run) cross-
references PSI entities against recent SEC filings: 200 entities scanned, 155
(77.5%) with at least one filing in the lookback window.
The integration is wired and producing per-entity signals, but the current
signal quality is low and is reported here honestly rather than as a validated
result. The top-scoring "entities" are generic terms — Iran, President, China —
matched to unrelated registrants (e.g. Iran → a net-lease REIT, President →
a SPAC), with form_type and filing description frequently empty and
material_filings at 0. The per-entity score is therefore driven by raw filing
volume, not filing materiality or genuine entity identity. The bottleneck
is entity resolution: PSI entities are extracted from news narrative (including
geopolitical and role nouns) and do not map cleanly to SEC registrant names or
CIKs.
Update (2026-06-08): a high-precision entity→CIK resolver now exists
(lambdas/psi_compute/entity_resolver.py; 0.47% false-positive rate, ~1,000
entities mapped), so EDGAR signals can be keyed on resolved CIKs rather than the
substring match. The resolver also exposed a structural ceiling: only ~10–25%
of PSI entities resolve to any CIK — the universe is dominated by
non-company nouns (people, places, geopolitics). This is consistent with the
entity triage (only ~25% of the graph is market-linkable) and reframes EDGAR/
deal data as usable for a minority enrichment slice, not a universe-wide
forecasting signal.
The per-model leave-one-out ablation is reported in §6.2. The deployed ensemble is the eight reproducible models (four heuristics + Logistic + GCN + Node2Vec + VGAE); TransE was removed in A3 for non-reproducibility and is not ablated here. The headline result: Node2Vec carries ~99% of the contribution, the other seven models are near-redundant — see §6.2 for the full table.
Not yet computed. Unlike the model ablation (§6.2), which is produced daily by the link-prediction pipeline, there is currently no artifact that decomposes the composite PSI into its five constituent signals — ACI (Hawkes self-excitation), NED (embedding drift), GSS (spectral shift), SMD (sentiment-momentum divergence), and SC (source concentration) — and measures each one's marginal contribution. The intended methodology is:
| Signal | Variance Explained | Correlation w/ PSI | IC (rank) |
|---|---|---|---|
| ACI (Hawkes) | pending | pending | pending |
| NED (Embedding Drift) | pending | pending | pending |
| GSS (Spectral Shift) | pending | pending | pending |
| SMD (Sent-Mom Divergence) | pending | pending | pending |
| SC (Source Concentration) | pending | pending | pending |
To fill this honestly requires a per-component analysis run over the daily PSI
history that (a) regresses each component against forward 7-day mention growth
to obtain its rank IC, (b) computes each component's correlation with the
composite, and (c) attributes variance via a PCA or leave-one-signal-out on the
James-Stein composite. That computation does not exist yet; the numbers are
deliberately left as pending rather than estimated. This is the natural
companion to the signal-decomposition work and is the most defensible next
addition to the validation suite.
Models. The link-prediction ensemble is, predictively, a one-model system: the §6.2 leave-one-out ablation shows Node2Vec carrying ~99% of the ensemble AUC, with the remaining seven models near-redundant (each LOO drop ~0.085–0.095 because the optimizer re-fits the survivors). The honest implication is that the "eight-model ensemble" is a robustness and reproducibility claim, not a performance one — seven of the eight could be dropped from production scoring with negligible AUC loss, though they are cheap to compute and retaining them guards against Node2Vec degrading on a future graph. The fit_study diagnostic (§6.1) tells a different story — there Logistic Regression is the strongest single model — which underlines that "most critical model" is eval-set dependent and should not be over-read from either run alone.
Signals. The five-signal ablation (§7.2) is not yet computed, so no claim is made here about which of ACI/NED/GSS/SMD/SC is most critical. What §6.3 does establish is a property of the composite: it is a statistically significant forward-attention contra-indicator (permutation p≈0, factor-regression psi_beta -0.65), not a continuation signal. Determining which constituent signals drive that behavior is exactly what §7.2 is meant to answer and remains open.
Net. The defensible, evidence-backed claims today are narrow: (1) a reproducible link-prediction layer whose predictive content is concentrated in Node2Vec, and (2) a composite PSI that is a significant negative predictor of forward attention. Broader claims about per-signal contribution and event lead-time are not yet supported by the artifacts and are marked pending rather than asserted.
Scope note (2026-06-08). This section argues a forecasting moat. The external anchored test (§6.4) refutes the forecasting hypothesis for PSI: it does not lead deal events better than a naive baseline, and the link-prediction layer is at GBM-parity (§6.2, Phase 0). The defensible moat is therefore the earned graph + multi-signal enrichment asset (§8.1, §8.2), not event prediction. The §8.3 capability timeline below is aspirational and partly contradicted by evidence; read it as a research agenda, not a validated roadmap. Claims of a "deal probability model" should be treated as unproven; the first attempt (LF-1) failed.
No published system combines Hawkes processes, spectral graph analysis, BOCPD, and Kalman smoothing for healthcare intelligence. Individual components are well-studied; the specific combination and domain application are novel. The closest related work in financial signal processing (e.g., Bloomberg's event-driven analytics) operates on price and volume data rather than narrative structure.
The entity co-occurrence graph required months of continuous ingestion from ~140 feeds. With a 400-node backend graph (12,791 edges as of 2026-06-11; 150-entity display subset) and daily temporal history, this graph cannot be replicated from static data. The temporal dimension — how edges form, strengthen, weaken, and dissolve over time — represents information that can only be accumulated through sustained operation.
System capability increases non-linearly with data accumulation:
| Milestone | Capability Unlocked |
|---|---|
| Day 1 | Signal computation only |
| Day 7 | Lead-lag detection activates |
| Day 30 | Validation gates fire, link prediction gets temporal features |
| Day 90 | Event study reaches statistical significance — but see §6.4: the externally-anchored event study already ran and PSI failed to beat a naive baseline |
| Day 180 | Deal probability model viable — first attempt (LF-1) already refuted; not on track |
| Day 365 | Seasonal pattern detection, annual cycle modeling |
Each additional day of operation adds to the baseline rate estimates (Hawkes), the embedding history (NED), the spectral trajectory (GSS), and the regression training data (SMD), creating a widening moat against systems that start later.
Eight models from three methodological families (classical heuristics, ML, deep graph) provide robustness. High model agreement (7/8 or 8/8) on a prediction qualitatively differs from any single model's output. In production, we report a confidence tier based on model agreement:
PSI assembles statistical signal processing, graph learning, and NLP into a single operational pipeline that runs daily at production scale (~400-entity graph, 1,900+ scored entities, clean single-regime history since 2026-04-27). This draft reports what the artifacts currently support — and, deliberately, no more.
What is established. Two claims are evidence-backed today:
What is not yet established. The per-signal ablation (§7.2) is uncomputed; the event study (§6.4) is coverage-limited and not yet interpretable; the EDGAR integration (§6.5) is wired but entity-resolution-bound and currently filing-volume noise. Earlier headline figures (the 0.85 graph-link-prediction AUC) describe a different question than forward-attention prediction and have been removed from the abstract to avoid conflation. The most recent fit_study also landed in the OVERFIT band (ensemble fit-gap 0.078), so even the link-prediction headline should be read as a single monthly run, not a stable benchmark.
The honest position is that PSI is a working, defensible enrichment system — a reproducible relationship-prediction layer plus a five-signal characterization of an entity's information structure — but not an event forecaster. Its one internally-significant relationship (the forward-attention contra-indicator, §6.3) is endogenous and not external validation; the single test with a clean external anchor (§6.4, SEC-filing-dated deal events) refuted the forecasting hypothesis (PSI 10.8% vs naive baseline 49.4%). The validation framework's value is precisely that it surfaced this — separating a real enrichment asset from an unsupported forecasting claim, rather than obscuring the difference.
Future work, in priority order: (1) lean into the validated direction — ship the graph + signals as enrichment/retrieval (the "why an entity matters" use case); the first such product outcome shipped 2026-06-11: the mention-spike baseline itself (the rule that beat PSI in §6.4) is now a watchlist alert trigger, with its pre-registered constants unchanged; (2) key EDGAR/deal data on the new entity→CIK resolver to strengthen that enrichment layer; (3) compute the §7.2 per-signal ablation; (4) if forecasting is revisited, do it only against a fresh external anchor (e.g. regulatory/FDA events) under the same pre-registration discipline — not by scaling the GNN, which Phase 0 showed is at GBM-parity. The previously-planned "deal-probability model" is removed from the roadmap: its premise was tested early (LF-1) and did not hold.
Four AWS Lambda functions (arm64, Python 3.12):
| Lambda | Purpose | Schedule | Timeout |
|---|---|---|---|
| psi-compute | 5 signals, PCA, BOCPD, Kalman, James-Stein, EDGAR boost | Daily 13:30 UTC | 900s |
| card-compute | Entity intelligence cards with 5 metrics | Async (post-compute) | 900s |
| psi-validate | 5 statistical validation gates | Weekly | 900s |
| psi-enrichment | Link prediction, EDGAR scan, velocity, event study, lead-lag, peer-relative, interpretations | Async (post-compute) | 900s |
All Lambdas share a common layer with NumPy, SciPy, scikit-learn, and graph libraries. State is persisted in S3 (signal history) and DynamoDB (entity metadata, scores).
All code is available at github.com/jtannahill/plocamium-content-engine. The system comprises 16 Python modules totaling ~4,000 lines of signal processing, graph learning, and statistical analysis code, with 145+ unit tests.
Key modules:
- psi_signals.py — Five signal computations (ACI, NED, GSS, SMD, SC)
- psi_pipeline.py — Six-layer processing pipeline (PCA, BOCPD, IVW, Kalman, James-Stein, output)
- link_prediction.py — Eight-model reproducible ensemble with auto-weight optimization (leave-one-out ablation)
- psi_validation.py — Five statistical validation gates
- psi_enrichment.py — EDGAR integration, event study, lead-lag analysis
The card-compute Lambda has produced its first batches; cards_latest.json
(2026-06-08) holds 87 live entity cards. A real card, rendered from production
data, follows. The original draft showed a hypothetical "UnitedHealth Group"
card; it is replaced here with an actual one to avoid presenting fabricated
numbers as output.
┌─────────────────────────────────────────┐
│ ENTITY: FDA │
│ PSI Score: -0.70 (NORMAL) │
│ Momentum: 60.4 Degree: 46 │
│ Network position: 0.159 │
├─────────────────────────────────────────┤
│ ACI (Attention cascade): -0.79 │
│ NED (Narrative drift): 0.00 │
│ GSS (Spectral shift): 464.70 │
│ SMD (Sent-Mom divergence): 0.00 │
│ SC (Source concentration): 0.91 │
├─────────────────────────────────────────┤
│ Dominant signal: spectral_shift │
└─────────────────────────────────────────┘
Two honest observations from the live cards, both consistent with limitations already noted:
- spectral_shift (GSS) is the dominant_signal for nearly every card. It reads as a shared,
un-normalized global spectral metric rather than an entity-specific
contribution. The per-component normalization needed to make this column
comparable is the same work blocking the §7.2 signal ablation, and the card
interpretation text is intentionally omitted until it lands.