STPA, or How to Stop Agents From Yeeting Production
We can have our agents and eat them too?
In April 2026, a Cursor agent running Anthropic’s Claude Opus 4.6 deleted the production database for PocketOS—a B2B software platform handling reservations and payments for car rental businesses—in nine seconds. It took the backups with it, because Railway, their infrastructure provider, stores volume-level backups in the same volume as the data they’re protecting. The most recent recoverable backup was three months old.
You’ve probably read about this already if you’re in the chronically online software enthusiast club. While I find it interesting that Railway and Cursor (perhaps the entire software industry at this point) don’t seem to be overly concerned with agentic safety and access control (Simon Willison’s take on AI safety seems right to me here), I don’t want to talk about that. I’m more interested in simple ways this could have been prevented at the design stage.
On the positive side, my solution is simple, replicable, and can be done with a pencil and paper in ten minutes before you have your agent write any code. The downside? We’d have to care about correctness and safety slightly more than we do now.
A Brief Primer on STPA
Systems-Theoretic Process Analysis (STPA) is fairly simple and mechanical. The core premise is that accidents are control problems, not failure problems. You don’t need a component to break for a system to end up in a catastrophic state. Sometimes everything works exactly as designed, and that’s the disaster. (Note: if you want a non-blog-level description of this, or expect some level of detail, MIT’s PSASS has resources including a free handbook on STPA.)
STPA models a system as a hierarchy of controllers sending control actions to controlled processes, with feedback flowing back up. A controller has a process model—its internal beliefs about system state—and a control algorithm that turns those beliefs into actions. Accidents often happen when the process model is wrong: the controller believes it’s in one state, acts accordingly, and is actually in another.
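To make the process-model idea concrete, here’s a toy sketch in Python of a controller acting on a stale process model. Everything here is illustrative; the class and state names are mine, not from any real agent framework:

```python
# A toy controller whose process model has drifted from reality.
# All names are illustrative; this is not any real agent framework.

from dataclasses import dataclass

@dataclass
class Controller:
    process_model: dict  # the controller's beliefs about system state

    def control_algorithm(self) -> str:
        # The controller acts on its beliefs, not on the world.
        if self.process_model["env"] == "staging":
            return "DELETE_VOLUME"  # "safe" under the believed state
        return "ASK_HUMAN"

actual_state = {"env": "production"}                  # the world
agent = Controller(process_model={"env": "staging"})  # the belief

# Nothing is broken: the algorithm is correct given its inputs.
# The accident lives entirely in the gap between these two dicts.
print(agent.control_algorithm())  # DELETE_VOLUME, issued against production
```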
The main artifact of an STPA analysis is a table of Unsafe Control Actions (UCAs). For each control action a controller can issue, you ask whether there’s a context in which issuing it causes harm. The four types:
- Provided when it shouldn’t be — the action fires in a context where it causes harm
- Not provided when it should be — the action fails to fire when it was needed
- Wrong timing or order — too early, too late, out of sequence
- Wrong duration — applied for too long or cut short
The PocketOS incident is a Type 1 UCA.
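If it helps to see the mechanics, here’s the same exercise transcribed into a few lines of Python. The action names and hazard sets are hypothetical stand-ins; a human fills this table in during design review, nothing is derived automatically:

```python
# The four UCA types, plus a pencil-and-paper UCA table as plain data.
# Action names and hazard sets are hypothetical stand-ins.

UCA_TYPES = {
    1: "provided when it shouldn't be",
    2: "not provided when it should be",
    3: "wrong timing or order",
    4: "wrong duration",
}

# For each control action: which UCA types have at least one plausible
# hazardous context?
uca_table = {
    "read_files":         set(),  # low consequence, reversible
    "edit_staging_code":  {1},
    "delete_staging_vol": {1},
    "delete_prod_vol":    {1},    # Type 1 only; no plausible Type 2
}

for action, types in uca_table.items():
    labels = [f"Type {t} ({UCA_TYPES[t]})" for t in sorted(types)] or ["none"]
    print(f"{action}: {', '.join(labels)}")
```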
Drawing the Control Structure
Before you analyze UCAs, you draw the control structure. Here’s the relevant slice of PocketOS’s architecture, approximately as it existed on April 24th:
┌─────────────────────────────────────────────────────────┐
│ CONTROL STRUCTURE │
│ │
│ [Human Developer] │
│ │ │
│ │ assign task (staging scope) │
│ │ approve / reject │
│ ▼ │
│ [AI Agent — Cursor + Claude Opus 4.6] │
│ │ │
│ ├─read files, edit code, run tests─▼ │
│ │ [Staging Env] │
│ │ │ │
│ │ logs, errors │
│ │ │ │
│ │◄──────────────────────────────┘ │
│ │ │
│ ├──── ⚠ DELETE staging volume ──| │
│ │ │ │
│ └──── ✗ DELETE prod volume ─────▼ │
│ [Railway API] │
│ │ │
│ ┌─────┴─────┐ │
│ ▼ ▼ │
│ [Prod DB [Staging DB │
│ + Backups] + Backups] │
└─────────────────────────────────────────────────────────┘
Two things jump out immediately.
First, the feedback loop between the human and the agent’s destructive actions is broken. There’s no confirmation gate between “agent decides to delete” and “Railway executes the delete.” In STPA terms, the human controller has no timely feedback channel for irreversible actions issued by the agent. That’s a structural hazard independent of anything the agent does: once the agent issues a destructive control action, nothing the human does can stop it.
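Fixing that is boring plumbing, not research. Here’s a minimal sketch of the missing gate, assuming a hypothetical backend callable in place of the real Railway client; the point is that the approval prompt sits between decision and execution, outside anything the agent’s reasoning can touch:

```python
# A confirmation gate between "agent decides" and "infrastructure executes".
# The backend callable and the IRREVERSIBLE set are illustrative assumptions.

IRREVERSIBLE = {"delete_volume", "drop_database", "modify_schema"}

class GatedClient:
    def __init__(self, backend):
        self.backend = backend  # e.g. a function wrapping the real API

    def execute(self, action: str, target: str):
        if action in IRREVERSIBLE:
            # Restore the human's feedback channel before the action
            # becomes unrecoverable.
            answer = input(f"Agent requests {action} on {target!r}. Approve? [y/N] ")
            if answer.strip().lower() != "y":
                raise PermissionError(f"{action} on {target!r} rejected by human")
        return self.backend(action, target)
```

A dozen lines, and the agent can still do its job. It just can’t finish this particular job alone.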
Second, and this is the point I really want to sit with: the agent’s control action set includes DELETE_PROD_VOLUME. Ask yourself honestly—how often does a coding agent legitimately need to delete the production database? The answer may be non-zero, but I can’t imagine a case that is both non-zero and so urgent that there’s no time to ask a human engineer to do it.
The UCA Table Nobody Built
Here’s what the UCA table looks like if you actually build it:
| Control Action | Controller | UCA Type | Hazardous Condition |
|---|---|---|---|
| Read files / list dirs | Agent | — | Not a hazard; low consequence, reversible |
| Edit code in staging | Agent | Type 1 | If applied to production files |
| Delete staging volume | Agent | Type 1 | If issued without explicit human instruction |
| Delete prod volume | Agent | Type 1 | If issued in any context |
| Modify prod schema | Agent | Type 1 | If issued without review gate |
| Use arbitrary API token | Agent | Type 1 | If token scope exceeds task scope |
The DELETE_PROD_VOLUME row is interesting because it almost can’t be a Type 2 UCA. There is no scenario where the agent failing to delete production is the hazard—no workflow where a legitimate task requires this and its omission causes harm. That asymmetry is a diagnostic. When a control action can generate Type 1 UCAs but essentially no Type 2 UCAs, the control action probably shouldn’t exist in the controller’s repertoire at all.
If a Type 2 UCA existed here, we might consider mitigations short of removing the control action entirely. Scoped tokens would have been a start; and if the control action doesn’t need to occur immediately, we can imagine an API that requires human confirmation before executing it.
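The confirmation gate looked like the sketch above; here’s what scope enforcement might look like on the API side. The token shape and the delete_volume endpoint are invented for illustration; the real point is that the check lives in the API, where the agent’s reasoning can’t reach it:

```python
# Scope enforcement on the API side. The Token shape and delete_volume
# endpoint are invented for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    scopes: frozenset  # e.g. frozenset({"staging"})

def delete_volume(token: Token, env: str) -> None:
    # The agent never sees this check; it can only fail it.
    if env not in token.scopes:
        raise PermissionError(f"token not scoped for {env!r}")
    print(f"deleting {env} volume...")

staging_token = Token(scopes=frozenset({"staging"}))
delete_volume(staging_token, "staging")       # allowed
# delete_volume(staging_token, "production")  # PermissionError
```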
What This Actually Looks Like in Practice
The methodology here is not exotic. Draw the control structure—controllers, controlled processes, control actions, feedback loops. List every control action the agent can issue. For each one, ask:
- How often does a legitimate workflow require this action?
- What is the blast radius if it fires in the wrong context?
- Is that blast radius reversible?
If the answer to (1) is “rarely or never” and (3) is “no,” the action shouldn’t be in the capability set. Not rate-limited. Not gated. Absent.
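You can even write the decision rule down as code, if only to make the point that it’s mechanical. The boolean inputs are human judgment calls answering the questions above; nothing here is clever:

```python
# The capability-set triage rule from the three questions above.
# Question 2 (blast radius) tells you how hard to scrutinize an action;
# questions 1 and 3 decide whether it exists at all.

def belongs_in_capability_set(needed_often: bool, reversible: bool) -> bool:
    # Rarely or never needed, and irreversible: the action is absent.
    # Not rate-limited. Not gated. Absent.
    return needed_often or reversible

print(belongs_in_capability_set(needed_often=False, reversible=False))  # delete prod volume -> False
print(belongs_in_capability_set(needed_often=True,  reversible=True))   # read files -> True
```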
This eliminates a class of risk that no amount of model capability or prompt engineering can touch, because it operates at the architecture layer rather than the inference layer. The agent cannot confuse its process model about whether it’s in staging or production if it has no token that works on production. There is no reasoning error that leads to DELETE_PROD_VOLUME if DELETE_PROD_VOLUME is not an action the agent can take.
The PocketOS agent wasn’t compromised. It wasn’t adversarially manipulated. It was doing its job, encountered an obstacle, and reached for a tool it had. STPA would have looked at that tool and asked a question that takes about thirty seconds: does a coding agent working in staging need this? When the answer is no, you don’t give it the tool. That’s the whole analysis.