Why SIMCA Isn't Enough: Building Real-Time AI for Tablet Compression

February 24, 2026 · 8 min read

Vilmer Frost — Founder, BatchCortex

SIMCA is the gold standard for multivariate batch analysis. Sartorius built an industry on it. Every QA director in solid oral dosage knows PCA, knows Hotelling's T², knows SPE residuals. If you run a tablet press and you're doing any kind of multivariate statistical process control, you're probably using SIMCA or something like it.

I'm not here to say SIMCA is bad. The math is sound. The regulatory acceptance is established. PCA-based MSPC has been saving batches for decades.

But there's a gap — and it's growing.


The Problem With Batch-Level Averages

Here's what SIMCA typically sees: batch-level summary statistics. The mean compression force across a run. The average tablet weight. The standard deviation of hardness over 60 minutes.

Here's what it doesn't see: Station 23 drifting 0.8σ/hour while the other 44 stations hold steady.

A 45-station rotary tablet press at 100,000 tablets per hour generates between 45,000 and 450,000 sensor readings per second at typical sampling rates of 1–10 kHz. When you average that down to batch-level statistics, you lose the per-station, per-second resolution that actually tells you what's going wrong.

A single station with a worn punch creates a signal that's invisible in the batch average. The cross-station mean barely moves. The batch-level PCA model stays green. But Station 23 is producing out-of-spec tablets — and your operators won't catch it until the next manual sample in 15–30 minutes.

By then, you've produced 25,000–50,000 tablets between checks. At a 0.02% sampling rate, the other 99.98% are unchecked.


What We Actually Need

The question isn't “is this batch going wrong?” — SIMCA answers that reasonably well, after the fact.

The question is: which station, which failure mode, and how long until spec breach?

That requires three things SIMCA doesn't provide out of the box:

  1. Per-station, per-second monitoring — not batch averages
  2. Temporal pattern recognition — catching drift patterns that unfold over minutes, not just point-in-time deviations
  3. Cross-station correlation analysis — distinguishing “Station 23 worn punch” from “feeder blockage affecting all stations”

The 11-Layer Ensemble

At BatchCortex, we built an 11-layer ML ensemble specifically for tablet compression monitoring. Each layer catches what the others miss.

Layer 1 — SPC: CUSUM and EWMA

Before any ML, we run classical Statistical Process Control — but with a twist.

Standard Shewhart charts (the ±3σ control limits most facilities use) need approximately 44 subgroups to detect a 1σ shift. CUSUM (Cumulative Sum) detects the same shift in about 10 subgroups. That's 18 minutes of early warning your operators don't have today.

We pair CUSUM (k=0.5, h=4.0) with EWMA (λ=0.2, L=3.0) for complementary detection. CUSUM catches sustained shifts. EWMA tracks gradual drift with exponentially weighted memory — 20% current value, 80% history — with time-varying control limits that tighten as the process stabilizes.

Both reset between batches to prevent cross-contamination of process state.
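The pairing can be sketched in a few lines. This is a minimal illustration using the constants quoted above (k=0.5, h=4.0, λ=0.2, L=3.0), not our production code:

```python
import math

def cusum_ewma_monitor(readings, mu0, sigma, k=0.5, h=4.0, lam=0.2, L=3.0):
    """Tabular CUSUM and EWMA run side by side on one sensor stream.

    k and h are in sigma units (reference value, decision interval);
    lam is the EWMA weight, L the control-limit multiplier. State is
    per-batch: instantiate fresh at batch start to avoid cross-contamination.
    """
    c_pos = c_neg = 0.0              # one-sided CUSUM accumulators
    z = mu0                          # EWMA state starts at the target mean
    alarms = []
    for t, x in enumerate(readings, start=1):
        u = (x - mu0) / sigma                # standardize the reading
        c_pos = max(0.0, c_pos + u - k)      # accumulates upward shifts
        c_neg = max(0.0, c_neg - u - k)      # accumulates downward shifts
        cusum_alarm = c_pos > h or c_neg > h

        z = lam * x + (1 - lam) * z          # 20% current value, 80% history
        # time-varying limits: wide at start, tightening as t grows
        var_t = (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * t))
        ewma_alarm = abs(z - mu0) > L * sigma * math.sqrt(var_t)

        alarms.append((cusum_alarm, ewma_alarm))
    return alarms

# 20 in-control readings, then a sustained +1 sigma shift
flags = cusum_ewma_monitor([10.0] * 20 + [11.0] * 30, mu0=10.0, sigma=1.0)
first_cusum = next(i for i, (c, _) in enumerate(flags) if c)
# first_cusum == 28: the 9th reading after the shift, vs ~44 for Shewhart
```

The one-sided accumulators are why CUSUM beats Shewhart on small sustained shifts: every reading past the reference value k adds to the evidence instead of being judged in isolation.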

Layer 2 — PCA: T² and SPE

This is SIMCA's home territory — and we use it too. But with a critical difference.

Our PCA model operates on a 47-dimensional feature vector extracted per reading, not per batch. Those 47 features include:

  • 25 station statistics (5 sensors × 5 statistics: mean, std, min, max, range across all 45 stations)
  • 5 machine-level sensor values (turret speed, feeder speed, room temperature, humidity, dust extraction pressure)
  • Pre/main compression force ratio
  • Rolling statistics at multiple window sizes (50, 200, 500 readings) for trend detection
  • Cross-station correlation metrics (mean correlation, minimum correlation, correlation std, maximum deviation station ID)
  • SPC state variables (CUSUM accumulator, EWMA value)

We retain components capturing 95% of variance (typically 25–28 components out of 47), then monitor both T² (in-model deviations — known patterns going wrong) and SPE/Q-residuals (novel failures the model hasn't seen).
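Mechanically, T² and SPE fall straight out of the loading matrix. A self-contained numpy sketch on synthetic data (the real model is trained on actual press features; the injected fault here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# training data: 500 "normal" readings of a 47-dimensional feature vector
# (synthetic stand-in for the feature layout described above)
X = rng.normal(size=(500, 47))
mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd

# PCA via covariance eigendecomposition; keep components for 95% variance
eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.95)) + 1
P, lam = eigvecs[:, :k], eigvals[:k]          # loadings, component variances

def t2_spe(x):
    """Hotelling's T^2 (in-model deviation) and SPE/Q residual for one reading."""
    xs = (x - mu) / sd
    t = P.T @ xs                     # scores in the retained subspace
    t2 = float(np.sum(t**2 / lam))   # variance-scaled distance in score space
    resid = xs - P @ t               # the part the model cannot reconstruct
    return t2, float(resid @ resid)

t2_norm, spe_norm = t2_spe(X[0])                  # a normal reading
x_odd = X[0].copy(); x_odd[5] += 8 * sd[5]        # inject a single-feature fault
t2_odd, spe_odd = t2_spe(x_odd)                   # T^2 rises sharply
```

The per-feature contribution behind an alarm is just the elementwise breakdown of those same sums, which is what makes the alert explainable to an inspector.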

The contribution plot from PCA tells you exactly which features drove the alarm. This isn't just a score — it's the SHAP-equivalent explanation that EU GMP Annex 22 requires for model explainability. Every alert comes with a per-feature breakdown an inspector can read.

Layer 3 — Isolation Forest

PCA assumes the data is roughly Gaussian. Manufacturing data often isn't. Isolation Forest handles the multimodal, non-linear anomalies that PCA misses.

We use scikit-learn's implementation with 100 estimators and automatic contamination estimation. Inference time is under 5ms — fast enough for real-time monitoring without adding latency.
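For instance, a two-regime process (two turret speeds, say) defeats a single-Gaussian notion of "normal" but is handled naturally by an isolation forest. A toy scikit-learn sketch on synthetic data, with the parameters quoted above:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# two tight operating regimes for a 2-feature slice: bimodal data whose
# grand mean sits in empty space between the modes
regime_a = rng.normal(loc=[10.0, 5.0], scale=0.3, size=(500, 2))
regime_b = rng.normal(loc=[14.0, 9.0], scale=0.3, size=(500, 2))
X_train = np.vstack([regime_a, regime_b])

# 100 trees, automatic contamination, as described above
forest = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
forest.fit(X_train)

# a reading between the regimes looks fine to a Gaussian model (close to the
# grand mean) but sits in empty space, so the forest isolates it quickly;
# decision_function is lower (more negative) for more anomalous points
between = np.array([[12.0, 7.0]])
score = forest.decision_function(between)
```

`predict` returns +1 for inliers and -1 for anomalies, so the layer's output maps cleanly onto the ensemble's normalized anomaly score.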

Layer 4 — TCN Autoencoder

Here's where temporal awareness enters. A Temporal Convolutional Network with dilated causal convolutions (dilation factors [1, 2, 4, 8, 16], kernel size 3) processes sequences of 50 readings.

The key innovation: causal convolutions mean the model only sees past data, never future data. This is critical for real-time deployment — you can't peek ahead in a live production environment.

The TCN autoencoder learns to reconstruct normal temporal patterns. When reconstruction error spikes, something novel is happening in the temporal structure — a drift pattern, an oscillation, a step change that the point-in-time models (PCA, IF) might not catch.

90,000 parameters. Under 10ms inference. Small enough to run on edge hardware.
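The causality property is easy to demonstrate without any deep learning framework. A numpy sketch of one dilated causal convolution (the trained TCN stacks these at dilations 1–16):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: output at t sees only x[t], x[t-d], x[t-2d], ...

    x: (T,) signal; w: (K,) kernel. The signal is left-padded with zeros so
    no future sample ever leaks into the output - the real-time constraint.
    """
    K = len(w)
    pad = dilation * (K - 1)
    xp = np.concatenate([np.zeros(pad), x])   # pad the PAST only
    return np.array([
        sum(w[k] * xp[pad + t - k * dilation] for k in range(K))
        for t in range(len(x))
    ])

# for a stack with kernel size 3: receptive field = 1 + (K-1) * sum(dilations)
dilations = [1, 2, 4, 8, 16]
receptive_field = 1 + (3 - 1) * sum(dilations)   # = 63 readings

# causality check: perturbing a future sample never changes earlier outputs
rng = np.random.default_rng(0)
x = rng.normal(size=50)
w = np.array([0.5, 0.3, 0.2])
y1 = causal_dilated_conv(x, w, dilation=4)
x2 = x.copy(); x2[30] += 100.0                   # change reading 30 only
y2 = causal_dilated_conv(x2, w, dilation=4)
# y1[:30] == y2[:30]: nothing before t=30 is affected
```

That last check is exactly the property a live deployment depends on: the model's output at any moment is reproducible from the data available at that moment.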

Layer 5 — LSTM Autoencoder

The LSTM captures longer-range dependencies that the TCN's fixed receptive field might miss. A 2-layer encoder-decoder architecture (378,000 parameters) learns the normal temporal dynamics across the full feature space.

Where TCN excels at local patterns (the shape of a compression cycle), LSTM excels at process-level trends (gradual bearing wear that develops over 60+ minutes with growing oscillation at 0.3–0.5 Hz).
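Showing the LSTM itself needs a framework, but the bearing-wear signature it targets is easy to visualize: a slow oscillation superimposed on the force trace. A plain FFT (not the LSTM, just the signature) picks out the peak; the 10 Hz effective sampling rate here is an assumption for the sketch:

```python
import numpy as np

fs = 10.0                              # assumed effective sampling rate (Hz)
t = np.arange(0, 600, 1 / fs)          # 10 minutes of force readings

rng = np.random.default_rng(3)
healthy = 12.0 + 0.05 * rng.normal(size=t.size)
# bearing-wear signature: a 0.4 Hz oscillation on top of the normal trace
# (constant amplitude here for simplicity; in practice it grows over ~60 min)
worn = healthy + 0.4 * np.sin(2 * np.pi * 0.4 * t)

def dominant_freq(x, fs):
    """Frequency of the largest non-DC peak in the spectrum."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    return np.fft.rfftfreq(x.size, d=1 / fs)[spec.argmax()]

f_worn = dominant_freq(worn, fs)       # lands in the 0.3-0.5 Hz band
```

The LSTM learns this kind of structure implicitly from reconstruction error, which is what lets it flag the pattern while the amplitude is still far below any static limit.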

Layer 6 — Meta-Scorer

The ensemble isn't a vote. It's a weighted combination with CUSUM-based change detection on the combined score.

Layer weights: SPC 0.25, PCA 0.25, IF 0.15, TCN 0.20, LSTM 0.15.

SPC and PCA carry the most weight because they're the most interpretable — and interpretability matters in a regulated environment. The deep learning layers add sensitivity to patterns the statistical methods miss, but they don't override the classical foundations.

The meta-scorer applies CUSUM to the combined score and classifies each reading as NORMAL, WARNING, or CRITICAL. A WARNING fires when the combined score crosses the first threshold. A CRITICAL fires when it crosses the second — and triggers the LLM deviation report generator.
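In pseudocode terms, the combination step looks like this. The weights are the ones quoted above; the 0.5/0.8 thresholds are illustrative placeholders, and the CUSUM stage on the combined score is omitted for brevity:

```python
# Layer weights from the text; thresholds below are NOT the production values
WEIGHTS = {"spc": 0.25, "pca": 0.25, "iforest": 0.15, "tcn": 0.20, "lstm": 0.15}
WARNING_T, CRITICAL_T = 0.5, 0.8       # illustrative placeholders

def classify(layer_scores, warning=WARNING_T, critical=CRITICAL_T):
    """Combine per-layer anomaly scores (each normalized to [0, 1]) and classify."""
    combined = sum(WEIGHTS[name] * layer_scores[name] for name in WEIGHTS)
    if combined >= critical:
        return combined, "CRITICAL"    # would also trigger the report generator
    if combined >= warning:
        return combined, "WARNING"
    return combined, "NORMAL"

# the statistical layers agree, deep layers partially agree -> WARNING
score, state = classify(
    {"spc": 0.9, "pca": 0.8, "iforest": 0.2, "tcn": 0.7, "lstm": 0.3}
)
# score = 0.64, state = "WARNING"
```

Because SPC and PCA hold half the total weight, no combination of deep-layer scores alone can push a reading to CRITICAL, which is the interpretability guarantee described above expressed as arithmetic.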


What This Catches That SIMCA Doesn't

We simulate and detect six pharmaceutical failure modes, each with distinct signatures:

  • Punch wear: single-station gradual drift over 20–40 min. Detected by cross-station correlation drop + per-station trend.
  • Feeder blockage: exponential decay cascading to all stations over 15 min. Detected by multi-station simultaneous deviation + fill depth drop.
  • Granulation moisture: variance increase + compression ratio shift over 45 min. Detected by SPC variance alarm + PCA loading shift.
  • Turret bearing wear: growing oscillation at 0.3–0.5 Hz over 60 min. Detected by TCN temporal pattern + LSTM frequency detection.
  • Sudden equipment failure: spike to 25+ kN in under 10 seconds. Detected by all layers firing simultaneously.
  • Weight drift: linear drift in tablet weight over 40 min. Detected by CUSUM trend detection + EWMA tracking.

The critical distinction: cross-station correlation analysis. When Station 23's compression force diverges while other stations hold steady, the cross-station correlation drops from >0.90 to ~0.71. That specific pattern means “single-station mechanical issue” — almost certainly punch wear or tooling damage.

When all stations drop simultaneously, the correlation stays high but the means shift. That pattern means “upstream process issue” — feeder blockage, granulation moisture change, or environmental drift.

No batch-level PCA model makes this distinction. It requires per-station, per-second data.
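Here is a synthetic illustration of the two signatures, assuming per-station compression-force traces. The numbers are made up; only the orderings matter:

```python
import numpy as np

rng = np.random.default_rng(2)
T, S = 200, 45                       # 200 readings, 45 stations
t = np.linspace(0.0, 1.0, T)

# shared process signal all 45 stations follow, plus small per-station noise
shared = 0.5 * np.sin(6 * np.pi * t)
base = 12.0 + shared[:, None] + rng.normal(scale=0.05, size=(T, S))

def min_station_corr(forces):
    """Mean correlation of each station against the other 44; return the
    minimum and which station it belongs to."""
    c = np.corrcoef(forces.T)        # (45, 45) station correlation matrix
    np.fill_diagonal(c, np.nan)
    per_station = np.nanmean(c, axis=1)
    return float(per_station.min()), int(per_station.argmin())

# single-station fault: station 23 drifts away from the shared signal
single = base.copy()
single[:, 23] += np.linspace(0.0, 1.5, T)        # gradual drift (punch wear)

# upstream fault: every station moves together (feeder blockage)
upstream = base + np.linspace(0.0, -1.5, T)[:, None]

healthy_min, _ = min_station_corr(base)
fault_min, fault_station = min_station_corr(single)
upstream_min, _ = min_station_corr(upstream)
# fault_station is 23 and fault_min drops sharply; upstream_min stays high
```

The same statistic both detects the fault and localizes it: the argmin of the per-station correlation vector is the station ID that goes into the alert.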


The Regulatory Reality

We built this for EU GMP Annex 22 from day one. That draft (published July 2025) explicitly permits only static, deterministic AI/ML models for critical GMP applications. Dynamic models, probabilistic models, generative AI, and LLMs are excluded from direct process control.

Our architecture respects this completely:

  • All 11 ML layers are trained offline, frozen at deployment, exported to fixed model files (pkl, pt). No online learning. No model drift.
  • The LLM deviation report generator (Mistral AI, temperature 0.1) fires only on anomaly events to generate investigation documentation. It never touches process control.
  • Every AI recommendation requires human approval via electronic signature before any action is taken. Ghost Operator architecture: AI drafts, human decides.
  • Full audit trail with ALCOA+ data integrity — every reading, every model output, every operator action is hash-chained and immutable.
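A hash-chained trail is simple to sketch with the standard library. This toy version (real records would also carry timestamps, user IDs, and signature references) shows why editing history breaks every later link:

```python
import hashlib, json

def append_record(chain, payload):
    """Append a record whose hash covers the payload AND the previous hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"payload": payload, "prev": prev}, sort_keys=True)
    chain.append({"payload": payload, "prev": prev,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(chain):
    """Recompute every link; returns False if any record was altered."""
    prev = "0" * 64
    for rec in chain:
        body = json.dumps({"payload": rec["payload"], "prev": prev},
                          sort_keys=True)
        if (rec["prev"] != prev
                or hashlib.sha256(body.encode()).hexdigest() != rec["hash"]):
            return False
        prev = rec["hash"]
    return True

chain = []
append_record(chain, {"station": 23, "force_kN": 11.8, "state": "WARNING"})
append_record(chain, {"action": "operator_ack", "signature": "VF"})
ok_before = verify(chain)                 # True: chain is intact
chain[0]["payload"]["force_kN"] = 9.9     # tamper with history
ok_after = verify(chain)                  # False: the first link no longer hashes
```

Because each hash covers its predecessor, rewriting any reading silently invalidates everything recorded after it, which is the property an auditor actually checks.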

This isn't compliance added as an afterthought. The regulatory constraints shaped the architecture.


Same Math. Deeper Intelligence.

We use the same PCA, T², and SPE monitoring that SIMCA pioneered. We don't reject the statistical foundation — we build on it.

But a modern tablet press generates too much data, with too much per-station granularity, changing too fast, for batch-level multivariate analysis alone. The gap between what a Shewhart chart catches and what an 11-layer ensemble with per-station cross-correlation catches is 18 minutes of early warning.

In those 18 minutes, a press running at 100,000 tablets per hour produces 30,000 tablets. At typical batch values, that's the difference between a $14,000 deviation investigation and a $5 million batch failure.

SIMCA isn't wrong. It's incomplete.


BatchCortex is a GMP-compliant AI batch monitoring platform built for pharmaceutical tablet compression. We're accepting pilot partners at batchcortex.com.

Presenting at ISPE Europe Annual Conference, Copenhagen — April 20–22, 2026.

See the 11-layer ensemble in action

Join our pilot programme and catch failures 18 minutes earlier than today.

Join the Waitlist