Ruptura
The Predictive Action Layer for Cloud-Native Infrastructure.
Ruptura detects Kubernetes workload failures before they become outages — using the Fused Rupture Index™, 10 composite KPI signals with adaptive per-workload baselines, and an action engine that responds automatically with configurable safety gates.
Why Ruptura?
| Traditional Observability | Ruptura |
|---|---|
| Threshold alerts fire after the fact | Fused Rupture Index™ detects divergence hours early |
| Global thresholds — batch jobs always "stressed" | Adaptive per-workload baselines after ~45 min |
| "host-123 CPU 78%" — what does it mean? | "payment-api is exhausted — fatigue accumulation, cascade from db" |
| Manual incident response | Tier-1 actions (scale, restart, rollback) with safety gates |
| 5+ tools: Prometheus + Grafana + Alertmanager + Loki + PagerDuty | One helm install, two pods, no external database |
| Numbers, no reasoning | Narrative explain — structured causal chain |
v7 Architecture
┌────────────────────────────────────────────────────────────┐
│  ruptura-system                                            │
│                                                            │
│  ┌───────────────────────┐    ┌────────────────────────┐   │
│  │  ruptura-engine       │    │  ruptura-ui            │   │
│  │  (Go binary)          │    │  (Svelte 4 + nginx)    │   │
│  │                       │    │                        │   │
│  │  :8080  REST API      │◄───│  nginx proxies /api/   │   │
│  │  :4317  OTLP ingest   │    │  injects Bearer token  │   │
│  │                       │    │  :80  dashboard UI     │   │
│  └───────────────────────┘    └────────────────────────┘   │
│  NodePort 31468               NodePort 31469               │
│  NodePort 31470 (OTLP)                                     │
└────────────────────────────────────────────────────────────┘
| Port | Purpose |
|---|---|
| 31468 | Engine REST API (/api/v2/*) |
| 31469 | Svelte dashboard |
| 31470 | OTLP ingest + Prometheus remote-write |
Core Concepts
Fused Rupture Index™
FusedR = weighted_average(metricR, logR, traceR)
requires ≥ 2 sources — one noisy signal cannot trigger critical
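
The fusion rule above can be sketched in a few lines of Python. This is only an illustration: the weights in `ASSUMED_WEIGHTS` are hypothetical (the real weights are not stated here), and returning `None` when fewer than two sources report is one possible reading of the "≥ 2 sources" requirement, not necessarily what the engine does.

```python
from typing import Dict, Optional

# Hypothetical weights for illustration; the real values are not documented here.
ASSUMED_WEIGHTS: Dict[str, float] = {"metric": 0.5, "log": 0.25, "trace": 0.25}


def fused_r(metric_r: Optional[float] = None,
            log_r: Optional[float] = None,
            trace_r: Optional[float] = None,
            weights: Dict[str, float] = ASSUMED_WEIGHTS) -> Optional[float]:
    """Weighted average over the sources that actually reported a value.

    Returns None when fewer than two sources are present, so a single
    noisy signal never yields a fused score (an assumed interpretation).
    """
    candidates = (("metric", metric_r), ("log", log_r), ("trace", trace_r))
    present = {name: value for name, value in candidates if value is not None}
    if len(present) < 2:
        return None
    # Renormalise over the sources that are present so the result stays an average.
    total_weight = sum(weights[name] for name in present)
    return sum(weights[name] * value for name, value in present.items()) / total_weight
```

Renormalising over the present sources keeps the score on the same scale whether two or three sources report, which is why the thresholds in the table below can be applied unchanged.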
| FusedR | State | Action |
|---|---|---|
| < 1.5 | Stable | None |
| ≥ 1.5 and < 3.0 | Warning | Tier-3 alert |
| ≥ 3.0 and < 5.0 | Critical | Tier-2 suggested |
| ≥ 5.0 | Emergency | Tier-1 automated |
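
As a quick reference, the state table can be written as a lookup function. This sketch treats each lower bound as inclusive, matching the "≥" triggers in the Action Engine section; the function name `classify` is illustrative only.

```python
from typing import Tuple


def classify(fused_r: float) -> Tuple[str, str]:
    """Map a FusedR value to (state, action) using the thresholds in the table."""
    if fused_r >= 5.0:
        return ("Emergency", "Tier-1 automated")
    if fused_r >= 3.0:
        return ("Critical", "Tier-2 suggested")
    if fused_r >= 1.5:
        return ("Warning", "Tier-3 alert")
    return ("Stable", "None")
```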
10 Composite KPI Signals
| Signal | Display Name | Measures |
|---|---|---|
| stress | CPU Pressure | CPU + latency burst |
| fatigue | Memory Pressure | Cumulative baseline deviation |
| mood | Trend | Log error/warn sentiment |
| pressure | Load Index | Memory + disk saturation |
| humidity | Saturation | Forecast variance |
| contagion | Blast Radius | Error propagation from upstream |
| resilience | Resilience | Recovery speed after spikes |
| entropy | Entropy | Signal disorder |
| velocity | Velocity | Request rate acceleration |
| throughput | Throughput | Data volume per cycle |
Adaptive Ensemble — 5 models, no configuration
| Model | Strengths |
|---|---|
| CA-ILR (dual-scale) | O(1) update · detects acceleration |
| ARIMA | Stationary series with trends |
| Holt-Winters | Seasonal patterns |
| MAD | Robust to outliers |
| EWMA | Reacts quickly to recent shifts |
Weights recomputed every 60s from live prediction error — no tuning needed.
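
How the weights are derived from prediction error is not specified here. One common approach, shown below purely as an assumption, is inverse-error weighting: each model's weight is proportional to the reciprocal of its recent error, then normalised to sum to 1. The function names and the `recent_errors` input are hypothetical.

```python
from typing import Dict


def recompute_weights(recent_errors: Dict[str, float],
                      eps: float = 1e-9) -> Dict[str, float]:
    """Inverse-error weighting (an assumed scheme, not the documented one).

    recent_errors maps a model name to its recent mean absolute prediction
    error; lower error yields a higher weight. eps avoids division by zero.
    """
    inverse = {model: 1.0 / (error + eps) for model, error in recent_errors.items()}
    total = sum(inverse.values())
    return {model: value / total for model, value in inverse.items()}


def ensemble_forecast(forecasts: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Blend the per-model forecasts using the current weights."""
    return sum(weights[model] * forecasts[model] for model in forecasts)
```

In this sketch a model whose error is three times larger than another's receives one third of its weight, so a model that fits the current workload well quickly dominates the blend without any manual tuning.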
Action Engine
| Tier | Trigger | Mode |
|---|---|---|
| Tier-1 | FusedR ≥ 5.0 + confidence ≥ 0.85 | Auto (scale/restart/cordon) |
| Tier-2 | FusedR ≥ 3.0 + confidence ≥ 0.60 | Suggested — approve via API or CLI |
| Tier-3 | FusedR ≥ 1.5 | Alert only (Slack / PagerDuty / webhook) |
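
The trigger column can be read as a cascade, sketched below. One behaviour here is an assumption: when FusedR is high but confidence falls short of a tier's requirement, this sketch falls through to the next lower tier rather than taking no action.

```python
from typing import Optional


def select_tier(fused_r: float, confidence: float) -> Optional[str]:
    """Pick the highest tier whose FusedR and confidence triggers are both met."""
    if fused_r >= 5.0 and confidence >= 0.85:
        return "Tier-1"   # automated action
    if fused_r >= 3.0 and confidence >= 0.60:
        return "Tier-2"   # suggested, needs approval
    if fused_r >= 1.5:
        return "Tier-3"   # alert only, no confidence requirement listed
    return None           # stable: nothing to do
```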
Safety gates: per-target rate limit (6/hour), 300s cooldown, namespace allowlist, emergency stop.
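
To make the gates concrete, here is a minimal in-memory sketch that combines all four checks. The class name, method names, and the sliding-window interpretation of "6/hour" are assumptions for illustration; the engine's actual implementation may differ.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set, Tuple


class SafetyGate:
    """Illustrative gate: emergency stop, namespace allowlist, rate limit, cooldown."""

    def __init__(self, allowed_namespaces: Set[str],
                 max_per_hour: int = 6, cooldown_s: float = 300.0) -> None:
        self.allowed_namespaces = allowed_namespaces
        self.max_per_hour = max_per_hour
        self.cooldown_s = cooldown_s
        self.emergency_stop = False
        # Timestamps (seconds) of permitted actions, kept per (namespace, target).
        self._history: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, namespace: str, target: str, now: float) -> bool:
        """Return True and record the action only if every gate passes."""
        if self.emergency_stop:
            return False
        if namespace not in self.allowed_namespaces:
            return False
        history = self._history[(namespace, target)]
        # Drop entries that have left the one-hour sliding window.
        while history and now - history[0] >= 3600.0:
            history.popleft()
        if len(history) >= self.max_per_hour:
            return False
        if history and now - history[-1] < self.cooldown_s:
            return False
        history.append(now)
        return True
```

A caller would pass a monotonic timestamp as `now` and skip the action whenever `allow` returns `False`. Checking the emergency stop first means a single flag halts every automated action regardless of the other gates.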
Current Release
v7.1.0 — Security hardening, atomic compaction, Fleet UX, ruptura-ctl v1.2.0.
| Change | Detail |
|---|---|
| Auth fail-closed | RUPTURA_API_KEY required — no silent open access |
| Atomic compaction | Storage rollups are crash-safe — no double-averaging on restart |
| /api/v2/metrics public | Prometheus scraping no longer requires an API key |
| Fleet UX | Signal mini-bars, null guards, 3-tab detail, empty states with calibration status |
| SRE-friendly labels | FusedRuptureIndex → Risk Score, fatigue → Memory Pressure, contagion → Blast Radius |
| ruptura-ctl v1.2.0 | Watch mode (-w 5), context subcommand, emergency-stop confirmation, server version check |
| Lab setup | Civo/k3s one-shot deploy — 6 synthetic test apps covering all failure modes |
| Tenant isolation | Autopilot: namespace filter applied to ALL GET endpoints (was fleet/actions only) |
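
The sketch below illustrates the two auth changes from a client's point of view: `/api/v2/metrics` is requested without credentials, while other requests carry a key. It rests on assumptions: that the key is sent as an `Authorization: Bearer` header (inferred from the nginx note in the architecture diagram), that the key is available in a `RUPTURA_API_KEY` environment variable on the client, and that `node.example` stands in for your node address. It only builds the request objects and sends nothing.

```python
import os
import urllib.request
from typing import Optional


def build_request(base_url: str, path: str,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding a Bearer header only when a key is given."""
    request = urllib.request.Request(base_url.rstrip("/") + path)
    if api_key is not None:
        # Assumed header scheme; verify against the API reference before use.
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


# Placeholder host; 31468 is the engine REST API NodePort listed above.
BASE_URL = "http://node.example:31468"

# Public since v7.1.0: no key needed for Prometheus scraping.
metrics_request = build_request(BASE_URL, "/api/v2/metrics")

# Any other /api/v2/* call would pass the key, read here from the environment.
api_key = os.environ.get("RUPTURA_API_KEY")
```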