Fusion Engine
The Fusion Engine is the analytical core of Ruptura. It combines normalised telemetry from all three signal sources (metrics, logs, traces) into 10 composite KPI signals, runs the adaptive ensemble, and produces the Fused Rupture Index™ for every tracked workload.
Responsibilities
- Composite signal computation — recalculate all 10 signals on every new data point
- Adaptive per-workload baselines — after 24h of observation, thresholds become relative to each workload's Welford baseline
- Rupture detection — compute
metricR,logR,traceRand fuse intoFusedR - Topology-based contagion — build real service dependency graph from OTLP trace spans
- XAI trace — record which signals and pipelines contributed to each prediction
Signal computation order
Raw metrics → stress → fatigue → pressure → humidity → velocity
Raw errors → mood
Trace spans → contagion (topology-based; falls back to errors×cpu)
Log lines → sentiment → (informs contagion)
stress + variance → entropy
all signals → health_score (additive-penalty model)
Signals are computed in dependency order: stress must be ready before fatigue and pressure.
Three-source fusion
metricR = CA-ILR ratio (5-min burst / 60-min stable slope)
logR = burst_rate / log_baseline (fires when error/warn > 3σ)
traceR = span_error_rate × (p99_latency / latency_baseline)
FusedR = weighted combination of available signals
(each source contributes independently — FusedR requires ≥2 sources)
The Fused Rupture Index is more resistant to false positives than any single-source index. A metric spike alone won't push FusedR to critical if logs and traces are healthy.
Adaptive ensemble inside the fusion engine
type EnsembleState struct {
Weights map[string]float64 // ca_ilr, arima, holt_winters, mad, ewma
Errors map[string]float64 // sliding 1-hour MAE per model
Updated time.Time
}
Every 60 seconds, the fusion engine:
- Collects actual vs predicted values for each model over the past 1 hour
- Computes MAE per model:
MAE_i = mean(|actual − predicted_i|) - Normalises inverse MAE to produce new weights
- Stores updated weights in
EnsembleState
The final prediction passed to the rupture detector is the weighted sum:
prediction(t) = Σ_i weight_i(t) × prediction_i(t)
Rupture detector (CA-ILR)
type RuptureDetector struct {
stableILR *ILR // 60-min window — long-term baseline
burstILR *ILR // 5-min window — micro-acceleration
}
func (d *RuptureDetector) R() float64 {
denom := math.Abs(d.stableILR.Alpha)
if denom < 1e-6 { denom = 1e-6 }
return math.Abs(d.burstILR.Alpha) / denom
}
Each ILR runs Welford-style O(1) incremental updates — no history stored, constant memory per workload.
WorkloadRef — treatment unit
All signals are keyed by WorkloadRef (namespace/kind/name), not by host. When multiple pods from the same Deployment send metrics, they are merged using these aggregation rules:
| Signal | Aggregation | Rationale |
|---|---|---|
| Stress | max | worst pod defines workload stress |
| Fatigue | max | accumulated burden follows the most fatigued pod |
| Mood | min | workload mood is as low as the saddest pod |
| Pressure | max | highest pressure pod sets the alarm |
| Humidity | mean | spread errors across all pods |
| Contagion | max | if any pod is contagious, the workload is |
| Resilience | min | weakest pod limits overall resilience |
| Entropy | mean | disorder is a workload-wide property |
| Velocity | mean | aggregate rate of change |
| HealthScore | min | weakest pod governs workload health |
Storage
BadgerDB embedded — no external database:
| Granularity | Retention | Use |
|---|---|---|
| 15 s | 7 days | Raw metrics |
| 1 min | 30 days | Aggregated metrics + logs |
| 5 min | 1 year | KPI snapshot history |
| 1 h | 400 days | Long-term compliance |
FlushSnapshots() is called on SIGTERM — no data loss on graceful shutdown.
XAI — explainability trace
For every rupture, the fusion engine records a full explain trace:
{
"rupture_id": "r_abc123",
"workload": { "namespace": "default", "kind": "Deployment", "name": "payment-api" },
"fused_rupture_index": 4.2,
"model_contributions": {
"ca_ilr": { "weight": 0.35, "prediction": 4.4 },
"arima": { "weight": 0.22, "prediction": 4.0 },
"holt_winters": { "weight": 0.18, "prediction": 3.9 }
},
"top_factor": "fatigue",
"signal_values": {
"stress": 0.72, "fatigue": 0.81, "mood": 0.31,
"pressure": 0.65, "contagion": 0.58, "health_score": 43.0
},
"primary_pipeline": "metric"
}
Retrieve the human-readable narrative at GET /api/v2/explain/{rupture_id}/narrative.