
Rare Agent Work · Security Engineering Edition
Rev 1.0 · Updated March 18, 2026
Six threat surfaces, twelve controls, and early-preview NemoClaw adoption caveats
Harden OpenClaw-style agent deployments against common threat classes - from plaintext secret exposure to indirect prompt injection - with controls mapped to testable failure modes.
What this report gives you
The finding that changes your next decision
“The most common agent security failure is not a sophisticated attack. It is an indirect prompt injection via retrieved content combined with plaintext secrets the runtime can read. The attacker does not need infrastructure access; they need influence over one document or payload your agent ingests.”
This report is right for you if any of these are true
Why this report exists
OpenClaw-style agent stacks optimize for developer velocity, which is useful for prototyping and insufficient by itself for production workflows. The six threat surfaces in this report are the ones security reviewers repeatedly ask teams to prove they have controlled: plaintext secrets accessible to the runtime, unrestricted tool execution, indirect prompt injection via retrieved content, missing audit trails, missing infrastructure-level cost enforcement, and shared compute blast radius. This report maps each surface to a testable control, provides a 12-item go/no-go checklist, and explains how to evaluate early-preview NemoClaw controls without treating them as a production guarantee.
Honest disqualification. If none of the above matches you, this report was written for you.
Six documented attack surfaces with specific incident classes — not a generic AI risk list. Each surface maps to a control and a test.
Why raw environment variables are insufficient for agent runtimes and the three-property secrets architecture that blocks common exfiltration paths.
Input sanitization, privilege separation, output validation, and anomaly detection — why any single layer is insufficient and how to implement all four.
Go/no-go criteria with test evidence requirements for each control. If you cannot produce test evidence, the item is not complete.
How early-preview NemoClaw intends to address key threat surfaces, plus the evidence teams should collect before relying on those controls.
SOC 2, HIPAA, and FINRA control mapping for NemoClaw deployments — with explicit distinctions between documented architecture and certification.
All 6 sections — scroll down to read.
Six threat surfaces with specific failure classes, including runtime-visible secret exposure that many teams leave undefended.
The security threat model for OpenClaw differs fundamentally from the threat model for a static API call to a language model. A static model call receives input, produces output, and terminates. OpenClaw agents persist across sessions, execute tools with real-world consequences, retrieve content from external sources, and operate with delegated authority to act on behalf of users. Each of these properties creates an attack surface that does not exist in static model deployments.
Threat Surface 1: Secrets accessible to the agent runtime
The common bare deployment pattern stores API keys as environment variables accessible to the Python or Node process running the agent. That creates a simple exploitation pattern: an indirect prompt injection via a retrieved document instructs the agent to include environment variable values in its response or tool output. If the runtime can read the raw key, the model can be induced to mishandle it. No exception is thrown. No alert fires.
Threat Surface 2: Unrestricted tool execution
OpenClaw agents can be granted access to tools — web search, code execution, file system access, database queries, email, API calls — and by default, there is no per-tool permission enforcement at the runtime layer. A user with access to an agent inherits all tools that agent has been given. Documented incident: an agent given file system read access for document indexing was prompted to read files in adjacent directories outside its intended scope. No permission boundary prevented this.
Threat Surface 3: Indirect prompt injection via retrieved content
Direct prompt injection via user input is relatively easy to defend against. Indirect injection — where adversarial instructions are embedded in content the agent retrieves from external sources — is the higher-severity attack surface and the one most teams leave entirely undefended. Documented incident: an agent summarizing competitor web pages encountered a page with hidden text instructions to output the agent's system prompt. The agent complied, exposing deployment configuration and internal tooling details.
Threat Surfaces 4–6 — missing audit trails, no infrastructure-level cost enforcement, and shared compute blast radius — are documented in the full report with the same specificity: exact failure mode, documented incident class, and the specific control that addresses each.
The three-property secrets architecture that blocks indirect prompt injection key exfiltration, with implementation details for each secrets manager option.
The repeatable failure pattern requires three conditions: the agent retrieves content from external sources, the attacker can inject content into those sources, and API keys are accessible within the agent runtime. When all three conditions are met, the attacker embeds instructions in retrieved content directing the agent to include secret values in its response or tool output. If the runtime can read the raw key, there is a path to exfiltration. No infrastructure alert fires. No exception is thrown.
The three-property secrets architecture that blocks it:
Property 1: Runtime inaccessibility. The agent runtime should not have access to the raw value of any secret. Instead of OPENAI_API_KEY=sk-... in the environment, the agent calls a secrets management endpoint that returns a short-lived capability token. The raw key never appears in the agent's environment — there is nothing to exfiltrate.
Property 2: Automatic rotation. Secrets rotate automatically without manual intervention. Model API keys: 30 days. Integration tokens: 90 days. Database credentials: 7 days. Rotation should not cause service interruption and should require no manual action.
Property 3: Access logging. Every secret access — which secret, which service, which timestamp — is logged and monitored for anomalies. A spike in secret access requests is an early indicator of a compromise attempt or a runaway session.
The full report includes implementation details for AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, Azure Key Vault, and any verified stack-native secret layer — with honest tradeoffs for each option.
4 more sections in this report
What unlocks with purchase:
The Four-Layer Prompt Injection Defense Stack
Why single-layer prompt injection defense always has a bypass — and how the four-layer stack prevents the classes that each individual layer misses.
External Audit Logging: Why Agent Memory Cannot Be the Source of Truth
External audit logging: the write-once storage policy, the 60-second delivery requirement, and why agent memory cannot be the incident investigation source of truth.
The 12-Item Pre-Production Hardening Checklist
12 go/no-go items with test evidence requirements — what enterprise procurement asks for and that most teams cannot produce on first request.
NemoClaw as the Control Plane: Architecture and Compliance Posture
NemoClaw evaluation: intended component-to-threat-surface mapping and SOC 2, HIPAA, and FINRA control-mapping caveats.
One-time purchase · Instant access · No subscription
Why single-layer prompt injection defense always has a bypass — and how the four-layer stack prevents the classes that each individual layer misses.
Every documented prompt injection defense has a bypass. Input validation fails against sufficiently obfuscated injections. Output validation misses attacks that use the agent's capabilities in technically correct but unintended ways. System prompt hardening degrades over long context windows. No single layer provides reliable defense. The correct approach is defense-in-depth.
Layer 1: Input sanitization — Validate and sanitize all text that enters the agent's context from external sources. Strip HTML from retrieved content. Detect and flag content containing patterns common in injection attacks. Rate-limit the volume of external content that can enter a single context window.
Layer 2: Privilege separation — The agent that retrieves external content should not have the same tool permissions as the agent that takes action. A retrieval agent has read-only access. An action agent receives a sanitized summary — not raw retrieved content. A successful injection in retrieved content only compromises the retrieval agent's limited capabilities.
Layer 3: Output validation — Before any agent output is returned to the user or passed to another system, validate that it does not contain environment variable names, API key formats, system prompt fragments, or instructions addressed to external systems. Flag for human review rather than silently dropping — silent dropping masks ongoing attacks.
Layer 4: Audit and anomaly detection — A monitoring system that baselines normal agent behavior and alerts on deviations is the last line of defense and the one most likely to catch attacks that bypass all other layers. Even a successful injection leaves traces: unusual tool call sequences, unexpected external requests, atypically high token consumption before a consequential action.
External audit logging: the write-once storage policy, the 60-second delivery requirement, and why agent memory cannot be the incident investigation source of truth.
OpenClaw's built-in logging captures conversation history, tool call inputs and outputs, and session metadata — stored in agent memory. This creates three investigation-critical gaps.
Gap 1: Mutability. The agent can modify its own memory. Logs stored in agent memory are unreliable for incident investigation — they can be altered by the same attack that caused the incident.
Gap 2: Inaccessibility. Logs stored in agent memory are not accessible to a SIEM, log aggregation system, or compliance auditor without specific export configuration that most teams never implement.
Gap 3: Format mismatch. The default log format optimizes for agent context, not forensic analysis. It lacks the structured fields, consistent schema, and timestamp precision that incident investigation and compliance audit require.
The required audit architecture: Every tool call, external request, memory operation, model invocation, and authentication event logged with structured JSON, consistent schema, and millisecond-precision timestamps. Logs shipped to an external destination immediately on creation — not buffered in agent memory first. Write-once or append-only storage with hash verification. Minimum 90-day retention; 1 year for regulated industries.
Documented incident: a team investigating a suspected data exfiltration found that the relevant session logs had been overwritten during a memory compaction routine. The incident could not be fully reconstructed. The audit failure was as damaging as the incident itself in the enterprise procurement review that followed.
12 go/no-go items with test evidence requirements — what enterprise procurement asks for and that most teams cannot produce on first request.
These 12 items are the go/no-go criteria before any OpenClaw deployment goes to production with real user data or real-world consequences. Every item has a test evidence requirement — not an assertion, not a vendor claim, not a documentation reference.
8–12 cover rollback procedures, compliance posture documentation, adversarial prompt test sets (minimum 20 inputs across 4 attack categories), cost monitoring alerts, and incident response runbook review — each with the same test evidence standard.
The procurement reality: Enterprise buyers now ask for this checklist completed with test evidence on first submission. Teams that can produce it move 3x faster through procurement than teams that provide assertions and documentation links.
NemoClaw evaluation: intended component-to-threat-surface mapping and SOC 2, HIPAA, and FINRA control-mapping caveats.
NemoClaw is an early-preview control-plane direction for OpenClaw-style deployments. Treat each component as a control to verify, not a production guarantee. The useful exercise is mapping intended controls to the six threat surfaces.
The component-to-threat-surface mapping: - Isolated compute namespace with NetworkPolicy → shared compute blast radius - Vault-integrated secrets management → runtime-visible secret exposure - RBAC + SSO integration → unrestricted tool execution - Prompt sanitization pipeline → indirect prompt injection - Tamper-evident external audit log → missing audit trail - Gateway-level token budget enforcement → cost explosion from runaway sessions
Compliance posture for regulated industries:
SOC 2 Type II: Access controls (CC6.1), logical and physical access restrictions (CC6.2, CC6.3), change management (CC8.1), risk assessment (CC3.1, CC3.2), and monitoring (CC7.1, CC7.2). A NemoClaw-based architecture may support these controls as it matures; SOC 2 certification requires implementation evidence and an independent audit.
HIPAA: Private inference routing ensures that PHI processed by the agent runtime does not transit a shared public API surface. The external audit logging supports controls under 45 CFR § 164.312(b). A Business Associate Agreement with your cloud provider is still required separately.
FINRA: The tamper-evident external audit log supports record-keeping requirements under FINRA Rule 4511. The audit architecture is designed to be defensible under regulatory examination — not just internally documented.
Every claim in this report traces to a verifiable source.
Last reviewed March 18, 2026
Who wrote this, what evidence shaped it, and how the recommendations are framed.
Author: Rare Agent Work Team · Written and maintained by the Rare Agent Work Team.
Proof 1
Six threat surfaces are mapped to concrete failure classes: exposed secrets, unrestricted tools, indirect prompt injection, missing audit trails, missing cost controls, and shared compute blast radius.
Proof 2
12-item pre-production checklist includes test evidence requirements for each control — not assertions, not documentation references.
Proof 3
NemoClaw section treats the stack as early preview and maps intended controls to the evidence teams should verify before relying on them.
Powered by Claude — trained on this report's content. Your first question is free.
Ask anything about implementation, setup, or how to apply the concepts in this report. Your first question is free — then we'll ask you to sign in.
Powered by Claude · First question free
When the report isn't enough
Architecture review, implementation rescue, and strategy calls for teams with real blockers. Every intake is read by a human before any next step.