Action Engine

The Ruptura Action Engine translates rupture detections into concrete remediation steps. It supports three execution tiers, four integration targets, and a suite of safety gates to prevent runaway automation.

Execution tiers

| Tier | Mode | Trigger | Who acts |
|------|------|---------|----------|
| Tier-1 | Automatic | FusedR ≥ 5.0 + confidence ≥ 0.85 | Ruptura (no human needed) |
| Tier-2 | Suggested | FusedR ≥ 3.0 + confidence ≥ 0.60 | Human approves via API |
| Tier-3 | Alert only | FusedR ≥ 1.5 | Human decides |
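
The thresholds in the table above can be read as a simple cascade. The following is a minimal sketch, assuming the tiers are checked from most to least automated and that Tier-3 has no confidence floor (the table lists none); `classify_tier` is a hypothetical helper name, not part of Ruptura's API.

```python
from typing import Optional

# (tier, minimum FusedR, minimum confidence), copied from the tier table.
# The evaluation order is an assumption of this sketch.
TIERS = [
    (1, 5.0, 0.85),  # Tier-1: automatic
    (2, 3.0, 0.60),  # Tier-2: suggested, needs human approval
    (3, 1.5, 0.00),  # Tier-3: alert only, no confidence floor listed
]


def classify_tier(fused_r: float, confidence: float) -> Optional[int]:
    """Return the most automated tier whose thresholds are met, or None."""
    for tier, min_r, min_conf in TIERS:
        if fused_r >= min_r and confidence >= min_conf:
            return tier
    return None
```

Under these assumptions, a rupture with a high FusedR but low confidence falls through to a less automated tier rather than being dropped.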

Configure the execution mode via RUPTURA_ACTIONS_EXECUTION_MODE or in ruptura.yaml:

actions:
  execution_mode: suggest   # shadow | suggest | auto

Recommended for a first deployment: start with suggest and review Ruptura's recommendations for a week before enabling auto.

Integration targets

Kubernetes

# Available K8s actions
- scale:    increase replica count on target Deployment
- restart:  rolling restart of target Deployment
- cordon:   mark Node unschedulable
- drain:    evict all Pods from a Node
- isolate:  apply NetworkPolicy to block ingress/egress

The action engine uses the same service account as Ruptura itself (see RBAC in the Helm chart). In all editions, the ClusterRole grants get/list/watch on Deployments, StatefulSets, Pods, and Nodes. When edition: autopilot is set, the Helm chart automatically adds the patch/update verbs on Deployments, StatefulSets, DaemonSets, and Nodes, so no manual ClusterRole editing is required.
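
To show why the patch verb is enough for several of these actions, here is a sketch of the patch bodies involved. It relies on standard Kubernetes behavior: `kubectl rollout restart` stamps the pod template with the `kubectl.kubernetes.io/restartedAt` annotation, scaling sets `spec.replicas`, and cordoning sets `spec.unschedulable` on the Node. Whether Ruptura builds its patches this way is an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def scale_patch(replicas: int) -> dict:
    """Patch body that sets the replica count on a Deployment."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return {"spec": {"replicas": replicas}}


def restart_patch(now: datetime) -> dict:
    """Patch body that triggers a rolling restart by changing the pod template."""
    stamp = now.astimezone(timezone.utc).isoformat()
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {"kubectl.kubernetes.io/restartedAt": stamp}
                }
            }
        }
    }


def cordon_patch() -> dict:
    """Patch body that marks a Node unschedulable."""
    return {"spec": {"unschedulable": True}}
```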

Webhook

Send an HTTP POST to any URL with the rupture payload. Useful for triggering CI/CD pipelines, Slack notifications, or custom scripts.
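
A receiving endpoint can be very small. The sketch below uses only the Python standard library and assumes the webhook body is a JSON object carrying `host` and `r` fields like the action objects shown later on this page; the actual payload schema is not documented here, so treat those field names, the port, and the `RuptureHook` class as illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(body: bytes) -> str:
    """One-line summary of a rupture payload (field names are assumed)."""
    payload = json.loads(body)
    host = payload.get("host", "unknown")
    return f"rupture on {host} (r={payload.get('r')})"


class RuptureHook(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            print(summarize(self.rfile.read(length)))
            self.send_response(204)  # accepted, nothing to return
        except (ValueError, AttributeError):
            self.send_response(400)  # not JSON, or not a JSON object
        self.end_headers()


def serve(port: int = 9000) -> None:
    """Block forever, handling webhook POSTs on localhost."""
    HTTPServer(("127.0.0.1", port), RuptureHook).serve_forever()
```

Calling `serve()` starts the listener; point the webhook target at `http://127.0.0.1:9000/` to try it locally.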

Alertmanager

Raise or resolve alerts in Prometheus Alertmanager. Ruptura generates compatible alert payloads with labels, annotations, and generatorURL.
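
For orientation, this is roughly what an Alertmanager-compatible alert looks like. The top-level keys (`labels`, `annotations`, `startsAt`, `endsAt`, `generatorURL`) follow Alertmanager's v2 alert format; the label and annotation names, and the `build_alert` helper itself, are assumptions rather than Ruptura's actual output.

```python
from datetime import datetime, timezone
from typing import Optional


def build_alert(
    workload: str,
    fused_r: float,
    starts_at: datetime,
    generator_url: str,
    ends_at: Optional[datetime] = None,
) -> dict:
    """Build one alert object; pass ends_at to mark the alert resolved."""
    alert = {
        # Label names are illustrative, not Ruptura's documented schema.
        "labels": {"alertname": "RupturaRupture", "workload": workload},
        "annotations": {"summary": f"FusedR {fused_r:.1f} on {workload}"},
        "startsAt": starts_at.astimezone(timezone.utc).isoformat(),
        "generatorURL": generator_url,
    }
    if ends_at is not None:
        alert["endsAt"] = ends_at.astimezone(timezone.utc).isoformat()
    return alert
```

Alertmanager accepts a JSON array of such objects on its `/api/v2/alerts` endpoint; an alert whose `endsAt` is in the past is treated as resolved.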

PagerDuty

Create or update PagerDuty incidents with severity, rupture context, and a link to the narrative explain.
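
One way to get create-or-update behavior is PagerDuty's Events API v2, where repeated events with the same `dedup_key` attach to the same incident. The sketch below builds such an event body. That Ruptura uses this API, the FusedR-to-severity mapping, and the `dedup_key` format are all assumptions made for illustration.

```python
def severity_for(fused_r: float) -> str:
    """Map FusedR to a PagerDuty severity (assumed mapping, reusing tier thresholds)."""
    if fused_r >= 5.0:
        return "critical"
    if fused_r >= 3.0:
        return "error"
    return "warning"


def build_event(
    routing_key: str,
    workload: str,
    fused_r: float,
    explain_url: str,
    resolved: bool = False,
) -> dict:
    """Build an Events API v2 body that triggers or resolves an incident."""
    event = {
        "routing_key": routing_key,
        "event_action": "resolve" if resolved else "trigger",
        # Same key on every event for a workload, so updates land on one incident.
        "dedup_key": f"ruptura:{workload}",
    }
    if not resolved:
        event["payload"] = {
            "summary": f"Rupture on {workload} (FusedR {fused_r:.1f})",
            "source": workload,
            "severity": severity_for(fused_r),
        }
        event["links"] = [{"href": explain_url, "text": "Narrative explain"}]
    return event
```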

Safety gates

Ruptura enforces multiple safety gates before executing any Tier-1 action:

| Gate | Default | Description |
|------|---------|-------------|
| Rate limit | 6 / hour | Max Tier-1 actions per target per hour |
| Cooldown | 300 s | Minimum gap between two actions on the same target |
| Namespace allowlist | [] (all blocked) | Only act on pods in listed namespaces |
| Confidence threshold | 0.85 | Ensemble confidence required for auto-execution |
| Emergency stop | off | POST /api/v2/actions/emergency-stop halts all Tier-1 globally |

Configuration:

actions:
  execution_mode: auto
  safety:
    rate_limit_per_hour: 6
    cooldown_seconds: 300
    namespace_allowlist:
      - production
      - staging
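
To make the interaction between the gates concrete, here is a minimal in-memory sketch. It assumes targets are named `namespace/Kind/name` as in the examples on this page, that the rate limit is a sliding one-hour window, and that gates are checked in the order shown; none of this is confirmed as Ruptura's internal logic, and `SafetyGates` is a hypothetical class.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SafetyGates:
    rate_limit_per_hour: int = 6
    cooldown_seconds: float = 300
    namespace_allowlist: tuple = ()  # empty means every namespace is blocked
    confidence_threshold: float = 0.85
    emergency_stop: bool = False
    _history: dict = field(default_factory=lambda: defaultdict(deque), repr=False)

    def check(self, target: str, confidence: float, now: float) -> Optional[str]:
        """Return None if every gate passes, else the name of the first failing gate."""
        if self.emergency_stop:
            return "emergency_stop"
        if target.split("/", 1)[0] not in self.namespace_allowlist:
            return "namespace_allowlist"
        if confidence < self.confidence_threshold:
            return "confidence_threshold"
        history = self._history[target]
        while history and now - history[0] >= 3600:
            history.popleft()  # forget actions older than one hour
        if history and now - history[-1] < self.cooldown_seconds:
            return "cooldown"
        if len(history) >= self.rate_limit_per_hour:
            return "rate_limit"
        history.append(now)  # record the action that is about to run
        return None
```

With the defaults, the cooldown alone would allow up to twelve actions per hour on one target, so the rate limit of six is the tighter bound for sustained ruptures.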

Action lifecycle

FusedR ≥ threshold (workload enters Warning/Critical/Emergency)
        │
        ▼
Safety gates evaluated
        │
   ┌────┴────┐
  Pass      Fail → Log + skip
   │
   ▼
execution_mode?
   ├── shadow  → Log action, do nothing
   ├── suggest → Enqueue in /api/v2/actions (pending approval, max 256 entries)
   └── auto    → Execute immediately (Tier-1) or queue (Tier-2)
        │
        ▼
Emit metric: rpt_actions_total{type,tier,outcome}
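
The branching in the diagram above can be sketched as a single function. This is an illustration only: the outcome strings are made up, and the behavior when the pending queue already holds 256 entries is not specified on this page, so the sketch simply refuses new entries and labels that choice as an assumption.

```python
from collections import deque

MAX_PENDING = 256  # queue bound taken from the lifecycle diagram


def dispatch(mode: str, tier: int, gates_pass: bool, pending: deque, action: dict) -> str:
    """Return the outcome for one action under the given execution mode."""
    if mode not in ("shadow", "suggest", "auto"):
        raise ValueError(f"unknown execution_mode: {mode}")
    if not gates_pass:
        return "skipped"  # diagram: Fail -> Log + skip
    if mode == "shadow":
        return "logged"  # record what would have happened, do nothing
    if mode == "auto" and tier == 1:
        return "executed"  # Tier-1 runs immediately
    # suggest mode, or a non-Tier-1 action under auto: wait for approval.
    if len(pending) >= MAX_PENDING:
        return "dropped"  # assumption: overflow handling is not documented
    pending.append(action)
    return "queued"
```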

Approving a suggested action

GET /api/v2/actions returns actions with snake_case fields and a computed state field:

[
  {
    "id": "abc123-alert",
    "host": "production/Deployment/order-service",
    "action_type": "alert",
    "tier": 2,
    "confidence": 0.75,
    "r": 3.8,
    "approved": false,
    "executed": false,
    "state": "pending",
    "description": "alert on production/Deployment/order-service",
    "timestamp": "2026-05-27T10:00:00Z"
  }
]

# List pending actions
curl -H "Authorization: Bearer $API_KEY" \
  http://localhost:8080/api/v2/actions

# Approve (use the id returned by the list call)
curl -X POST -H "Authorization: Bearer $API_KEY" \
  http://localhost:8080/api/v2/actions/abc123-alert/approve

# Reject
curl -X POST -H "Authorization: Bearer $API_KEY" \
  http://localhost:8080/api/v2/actions/abc123-alert/reject

# Emergency stop all Tier-1 auto-actions
curl -X POST -H "Authorization: Bearer $API_KEY" \
  http://localhost:8080/api/v2/actions/emergency-stop
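
The same calls can be scripted. The sketch below uses only the standard library and builds the requests without sending them, so it can be inspected safely; the base URL matches the curl examples, while the helper names are hypothetical. It filters on the `state` field rather than assuming the list endpoint returns pending actions only.

```python
import urllib.request

BASE_URL = "http://localhost:8080/api/v2/actions"


def list_request(api_key: str) -> urllib.request.Request:
    """GET request for the action list."""
    return urllib.request.Request(
        BASE_URL, method="GET", headers={"Authorization": f"Bearer {api_key}"}
    )


def decision_request(api_key: str, action_id: str, verb: str) -> urllib.request.Request:
    """POST request that approves or rejects one action."""
    if verb not in ("approve", "reject"):
        raise ValueError("verb must be 'approve' or 'reject'")
    return urllib.request.Request(
        f"{BASE_URL}/{action_id}/{verb}",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def pending_ids(actions: list) -> list:
    """Pick the ids of actions still waiting for a decision."""
    return [a["id"] for a in actions if a.get("state") == "pending"]
```

Pass a built request to `urllib.request.urlopen` to actually send it, and decode the list response with `json.load` before handing it to `pending_ids`.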

Edition gate

Ruptura ships in two editions controlled by the RUPTURA_EDITION environment variable:

| Edition | Approve endpoint | Tier-1 auto-execution |
|---------|------------------|-----------------------|
| community (default) | Returns 402 Payment Required | Disabled |
| autopilot | Full approval flow | Enabled |

In community mode, all action recommendations remain visible via GET /api/v2/actions, so you can read what Ruptura would do. Only the execution step is gated.

Set the edition in Helm:

# helm/values.yaml
edition: autopilot

Or at runtime:

RUPTURA_EDITION=autopilot ./ruptura

Attempting to approve in community mode returns:

{
  "error": "action execution requires the Autopilot edition",
  "upgrade": "set RUPTURA_EDITION=autopilot to enable automated and manual action approval"
}
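
A client can turn the 402 response into a readable message instead of a generic failure. This is a small sketch: the `error` and `upgrade` keys come from the response body above, while the success status range and the function name are assumptions.

```python
import json


def describe_approve_response(status: int, body: bytes) -> str:
    """Summarize the result of an approve call for a human operator."""
    if status == 402:
        detail = json.loads(body)
        return f"edition gate: {detail['error']} ({detail['upgrade']})"
    if 200 <= status < 300:  # assumption: any 2xx means the approval was accepted
        return "approved"
    return f"unexpected status {status}"
```

With `urllib.request.urlopen`, a 402 surfaces as `urllib.error.HTTPError`; its `code` attribute and `read()` method supply the two arguments.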

Maintenance windows

To suppress action dispatch during planned deploys (and avoid false alarms), create a suppression window:

curl -X POST \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "workload": "default/Deployment/order-processor",
    "start": "2026-05-01T14:00:00Z",
    "end": "2026-05-01T14:30:00Z",
    "reason": "rolling deploy v2.4.1"
  }' \
  http://localhost:8080/api/v2/suppressions

During the window, ruptures are still recorded and the narrative explain is updated; only action dispatch is suppressed. After the window ends, Ruptura compares pre- and post-window baselines and reports the health delta.
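
When suppressions are created from a deploy script, it helps to generate the timestamps rather than type them. The sketch below builds the request body shown above from a start time and a duration, and checks whether a given instant falls inside a window. Treating the end of the window as exclusive is an assumption, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamps, as in the example request


def suppression_payload(workload: str, start: datetime, minutes: int, reason: str) -> dict:
    """Build the JSON body for POST /api/v2/suppressions."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(minutes=minutes)
    return {
        "workload": workload,
        "start": start_utc.strftime(TIME_FORMAT),
        "end": end_utc.strftime(TIME_FORMAT),
        "reason": reason,
    }


def is_suppressed(window: dict, at: datetime) -> bool:
    """True if the timezone-aware instant `at` lies in [start, end)."""
    start = datetime.strptime(window["start"], TIME_FORMAT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(window["end"], TIME_FORMAT).replace(tzinfo=timezone.utc)
    return start <= at < end
```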