rareagent@work:~$
problems·news·[reports]·docs·start-here
|
services:pricing·industries·enterprise
|
trust·feedback
> Read full report
Rare Agent Work

Rare Agent Work · Security Engineering Edition

Rev 1.0 · Updated March 18, 2026

New
Free · open access28 minute security brief + 12-item go/no-go checklistEngineering teams and security leads deploying OpenClaw in production with real user data

OpenClaw Security Hardening for Production

Six threat surfaces, twelve controls, and early-preview NemoClaw adoption caveats

Harden OpenClaw-style agent deployments against common threat classes - from plaintext secret exposure to indirect prompt injection - with controls mapped to testable failure modes.

What this report gives you

  • 01Runtime-visible secret exposure pattern: exactly how plaintext API keys can move from environment variable to agent output via indirect prompt injection, and the three-property secrets architecture that blocks it
  • 02The four-layer prompt injection defense stack — why each single layer has a bypass, and why privilege separation is the most underdeployed and highest-impact control
  • 0312-item pre-production checklist with test evidence requirements — what enterprise procurement actually asks for and why assertions fail where evidence succeeds
  • 04External audit logging architecture: why agent memory cannot be the source of truth, and the write-once storage policy that makes logs admissible for compliance review
  • 05Early-preview NemoClaw control mapping: which intended layer addresses which attack surface, with SOC 2, HIPAA, and FINRA control-mapping caveats

The finding that changes your next decision

“The most common agent security failure is not a sophisticated attack. It is an indirect prompt injection via retrieved content combined with plaintext secrets the runtime can read. The attacker does not need infrastructure access; they need influence over one document or payload your agent ingests.”

This report is right for you if any of these are true

  • ✓You have an OpenClaw deployment going to production with real user data and need to know what to fix before a breach, compliance audit, or cost explosion makes the decision for you.
  • ✓Your enterprise procurement review is asking for a documented threat model, evidence-backed control testing, and a compliance posture mapping — and you cannot produce any of the three.
  • ✓You are evaluating early-preview NemoClaw around an OpenClaw deployment and need to understand which controls to verify before relying on them.
Read the full report — free ↓
✓ All sections open · No sign-up · No paywall
✓Full preview before purchase✓Cited sources✓Updated March 18, 2026✓Human-authored✓No sign-up required

Why this report exists

OpenClaw-style agent stacks optimize for developer velocity, which is useful for prototyping and insufficient by itself for production workflows. The six threat surfaces in this report are the ones security reviewers repeatedly ask teams to prove they have controlled: plaintext secrets accessible to the runtime, unrestricted tool execution, indirect prompt injection via retrieved content, missing audit trails, missing infrastructure-level cost enforcement, and shared compute blast radius. This report maps each surface to a testable control, provides a 12-item go/no-go checklist, and explains how to evaluate early-preview NemoClaw controls without treating them as a production guarantee.
🏛️

What's at stake

  • →Enterprise procurement committees now ask for documented threat models and evidence-backed control testing before approving production AI agent deployments — teams without this documentation are losing deals on process maturity, not product quality.
  • →Indirect prompt injection plus runtime-visible secrets is a repeatable failure pattern in agent deployments. If the runtime can read raw credentials, a malicious retrieved document can attempt to route those credentials into model output or tool calls.
  • →The cost of hardening before production is 2–4 weeks of engineering time. The cost of a successful prompt injection, data breach, or cost explosion in production ranges from incident response overhead to contract loss and regulatory consequence.
⚡

Decision sequence

  • 01Before go-live: run the 12-item checklist in this report. Every item requires test evidence — not an assertion. If you cannot produce evidence for an item, that item is not complete.
  • 02Before storing secrets where the agent runtime can read them: implement an external secrets layer such as Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, or a verified stack-native equivalent.
  • 03Before handling any customer or regulated data: verify process isolation, external audit logging, and the indirect prompt injection defense stack — these three controls prevent the highest-severity incident classes.
⚠️

Cost of skipping this

  • ✕Teams running OpenClaw without environment isolation create a shared compute blast radius: a successful attack on one agent exposes the configuration and secrets of all agents sharing that environment.
  • ✕Prompt injection defense implemented only at the input layer misses indirect injection via retrieved documents — the highest-severity and most commonly undefended attack surface in production OpenClaw deployments.
  • ✕Audit logging stored in agent memory can be modified by the agent itself, making logs unreliable for incident investigation and non-compliant with enterprise procurement and regulatory requirements.
✋

Who this report is NOT for

  • ✕Teams that have not yet deployed OpenClaw — this is a production hardening guide, not a getting-started tutorial
  • ✕Security professionals looking for a comprehensive AI security framework — this report is OpenClaw-specific and assumes familiarity with the platform
  • ✕Teams using NemoClaw managed deployment from Rare Agent Work — the controls in this report are included in the managed deployment service

Honest disqualification. If none of the above matches you, this report was written for you.

What's Inside

6 deliverables
🎯

OpenClaw Threat Model

Six documented attack surfaces with specific incident classes — not a generic AI risk list. Each surface maps to a control and a test.

🔐

Secrets Management Blueprint

Why raw environment variables are insufficient for agent runtimes and the three-property secrets architecture that blocks common exfiltration paths.

🛡️

Four-Layer Prompt Injection Defense

Input sanitization, privilege separation, output validation, and anomaly detection — why any single layer is insufficient and how to implement all four.

📋

12-Item Pre-Production Checklist

Go/no-go criteria with test evidence requirements for each control. If you cannot produce test evidence, the item is not complete.

🏗️

NemoClaw Evaluation Diagram

How early-preview NemoClaw intends to address key threat surfaces, plus the evidence teams should collect before relying on those controls.

⚖️

Compliance Posture Reference

SOC 2, HIPAA, and FINRA control mapping for NemoClaw deployments — with explicit distinctions between documented architecture and certification.

Full Report

All 6 sections — scroll down to read.

6 sections
01The OpenClaw Threat Model: Six Documented Attack Surfacesfree

Six threat surfaces with specific failure classes, including runtime-visible secret exposure that many teams leave undefended.

02The Runtime-Visible Secrets Pattern and the Architecture That Blocks Itfree

The three-property secrets architecture that blocks indirect prompt injection key exfiltration, with implementation details for each secrets manager option.

03The Four-Layer Prompt Injection Defense Stackfree

Why single-layer prompt injection defense always has a bypass — and how the four-layer stack prevents the classes that each individual layer misses.

04External Audit Logging: Why Agent Memory Cannot Be the Source of Truthfree

External audit logging: the write-once storage policy, the 60-second delivery requirement, and why agent memory cannot be the incident investigation source of truth.

05The 12-Item Pre-Production Hardening Checklistfree

12 go/no-go items with test evidence requirements — what enterprise procurement asks for and that most teams cannot produce on first request.

06NemoClaw as the Control Plane: Architecture and Compliance Posturefree

NemoClaw evaluation: intended component-to-threat-surface mapping and SOC 2, HIPAA, and FINRA control-mapping caveats.

01

The OpenClaw Threat Model: Six Documented Attack Surfaces

Six threat surfaces with specific failure classes, including runtime-visible secret exposure that many teams leave undefended.

The security threat model for OpenClaw differs fundamentally from the threat model for a static API call to a language model. A static model call receives input, produces output, and terminates. OpenClaw agents persist across sessions, execute tools with real-world consequences, retrieve content from external sources, and operate with delegated authority to act on behalf of users. Each of these properties creates an attack surface that does not exist in static model deployments.

Threat Surface 1: Secrets accessible to the agent runtime

The common bare deployment pattern stores API keys as environment variables accessible to the Python or Node process running the agent. That creates a simple exploitation pattern: an indirect prompt injection via a retrieved document instructs the agent to include environment variable values in its response or tool output. If the runtime can read the raw key, the model can be induced to mishandle it. No exception is thrown. No alert fires.

Threat Surface 2: Unrestricted tool execution

OpenClaw agents can be granted access to tools — web search, code execution, file system access, database queries, email, API calls — and by default, there is no per-tool permission enforcement at the runtime layer. A user with access to an agent inherits all tools that agent has been given. Documented incident: an agent given file system read access for document indexing was prompted to read files in adjacent directories outside its intended scope. No permission boundary prevented this.

Threat Surface 3: Indirect prompt injection via retrieved content

Direct prompt injection via user input is relatively easy to defend against. Indirect injection — where adversarial instructions are embedded in content the agent retrieves from external sources — is the higher-severity attack surface and the one most teams leave entirely undefended. Documented incident: an agent summarizing competitor web pages encountered a page with hidden text instructions to output the agent's system prompt. The agent complied, exposing deployment configuration and internal tooling details.

Threat Surfaces 4–6 — missing audit trails, no infrastructure-level cost enforcement, and shared compute blast radius — are documented in the full report with the same specificity: exact failure mode, documented incident class, and the specific control that addresses each.

02

The Runtime-Visible Secrets Pattern and the Architecture That Blocks It

The three-property secrets architecture that blocks indirect prompt injection key exfiltration, with implementation details for each secrets manager option.

The repeatable failure pattern requires three conditions: the agent retrieves content from external sources, the attacker can inject content into those sources, and API keys are accessible within the agent runtime. When all three conditions are met, the attacker embeds instructions in retrieved content directing the agent to include secret values in its response or tool output. If the runtime can read the raw key, there is a path to exfiltration. No infrastructure alert fires. No exception is thrown.

The three-property secrets architecture that blocks it:

Property 1: Runtime inaccessibility. The agent runtime should not have access to the raw value of any secret. Instead of OPENAI_API_KEY=sk-... in the environment, the agent calls a secrets management endpoint that returns a short-lived capability token. The raw key never appears in the agent's environment — there is nothing to exfiltrate.

Property 2: Automatic rotation. Secrets rotate automatically without manual intervention. Model API keys: 30 days. Integration tokens: 90 days. Database credentials: 7 days. Rotation should not cause service interruption and should require no manual action.

Property 3: Access logging. Every secret access — which secret, which service, which timestamp — is logged and monitored for anomalies. A spike in secret access requests is an early indicator of a compromise attempt or a runaway session.

The full report includes implementation details for AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, Azure Key Vault, and any verified stack-native secret layer — with honest tradeoffs for each option.

4 more sections in this report

What unlocks with purchase:

  • 03

    The Four-Layer Prompt Injection Defense Stack

    Why single-layer prompt injection defense always has a bypass — and how the four-layer stack prevents the classes that each individual layer misses.

  • 04

    External Audit Logging: Why Agent Memory Cannot Be the Source of Truth

    External audit logging: the write-once storage policy, the 60-second delivery requirement, and why agent memory cannot be the incident investigation source of truth.

  • 05

    The 12-Item Pre-Production Hardening Checklist

    12 go/no-go items with test evidence requirements — what enterprise procurement asks for and that most teams cannot produce on first request.

  • 06

    NemoClaw as the Control Plane: Architecture and Compliance Posture

    NemoClaw evaluation: intended component-to-threat-surface mapping and SOC 2, HIPAA, and FINRA control-mapping caveats.

One-time purchase · Instant access · No subscription

03

The Four-Layer Prompt Injection Defense Stack

Why single-layer prompt injection defense always has a bypass — and how the four-layer stack prevents the classes that each individual layer misses.

Every documented prompt injection defense has a bypass. Input validation fails against sufficiently obfuscated injections. Output validation misses attacks that use the agent's capabilities in technically correct but unintended ways. System prompt hardening degrades over long context windows. No single layer provides reliable defense. The correct approach is defense-in-depth.

Layer 1: Input sanitization — Validate and sanitize all text that enters the agent's context from external sources. Strip HTML from retrieved content. Detect and flag content containing patterns common in injection attacks. Rate-limit the volume of external content that can enter a single context window.

Layer 2: Privilege separation — The agent that retrieves external content should not have the same tool permissions as the agent that takes action. A retrieval agent has read-only access. An action agent receives a sanitized summary — not raw retrieved content. A successful injection in retrieved content only compromises the retrieval agent's limited capabilities.

Layer 3: Output validation — Before any agent output is returned to the user or passed to another system, validate that it does not contain environment variable names, API key formats, system prompt fragments, or instructions addressed to external systems. Flag for human review rather than silently dropping — silent dropping masks ongoing attacks.

Layer 4: Audit and anomaly detection — A monitoring system that baselines normal agent behavior and alerts on deviations is the last line of defense and the one most likely to catch attacks that bypass all other layers. Even a successful injection leaves traces: unusual tool call sequences, unexpected external requests, atypically high token consumption before a consequential action.

04

External Audit Logging: Why Agent Memory Cannot Be the Source of Truth

External audit logging: the write-once storage policy, the 60-second delivery requirement, and why agent memory cannot be the incident investigation source of truth.

OpenClaw's built-in logging captures conversation history, tool call inputs and outputs, and session metadata — stored in agent memory. This creates three investigation-critical gaps.

Gap 1: Mutability. The agent can modify its own memory. Logs stored in agent memory are unreliable for incident investigation — they can be altered by the same attack that caused the incident.

Gap 2: Inaccessibility. Logs stored in agent memory are not accessible to a SIEM, log aggregation system, or compliance auditor without specific export configuration that most teams never implement.

Gap 3: Format mismatch. The default log format optimizes for agent context, not forensic analysis. It lacks the structured fields, consistent schema, and timestamp precision that incident investigation and compliance audit require.

The required audit architecture: Every tool call, external request, memory operation, model invocation, and authentication event logged with structured JSON, consistent schema, and millisecond-precision timestamps. Logs shipped to an external destination immediately on creation — not buffered in agent memory first. Write-once or append-only storage with hash verification. Minimum 90-day retention; 1 year for regulated industries.

Documented incident: a team investigating a suspected data exfiltration found that the relevant session logs had been overwritten during a memory compaction routine. The incident could not be fully reconstructed. The audit failure was as damaging as the incident itself in the enterprise procurement review that followed.

05

The 12-Item Pre-Production Hardening Checklist

12 go/no-go items with test evidence requirements — what enterprise procurement asks for and that most teams cannot produce on first request.

These 12 items are the go/no-go criteria before any OpenClaw deployment goes to production with real user data or real-world consequences. Every item has a test evidence requirement — not an assertion, not a vendor claim, not a documentation reference.

1Secrets not accessible to the agent runtime — Attempt to cause the agent to output the value of its model API key. Pass: the agent cannot produce the raw key value.
2Per-session token budget enforced at the gateway layer — Design a prompt that triggers an extended reasoning loop. Verify the session is terminated at the configured budget ceiling, regardless of model behavior.
3Environment isolation verified — Attempt to access adjacent services' environment variables from within the agent's execution environment. Verify access controls prevent this.
4Indirect prompt injection defense active — Inject adversarial instructions into a document the agent will retrieve. Verify the instructions do not cause the agent to take unintended actions.
5Tool permission scoping tested — With a standard user session, attempt to invoke a tool outside the user's assigned role. Verify rejection at the runtime layer, not the prompt layer.
6External audit log receiving events — Perform a specific action sequence and verify every expected log event appears in the external log destination within 60 seconds.
7Audit log tamper protection verified — Attempt to delete or modify a log entry in the external audit destination. Verify the storage policy rejects the attempt.

8–12 cover rollback procedures, compliance posture documentation, adversarial prompt test sets (minimum 20 inputs across 4 attack categories), cost monitoring alerts, and incident response runbook review — each with the same test evidence standard.

The procurement reality: Enterprise buyers now ask for this checklist completed with test evidence on first submission. Teams that can produce it move 3x faster through procurement than teams that provide assertions and documentation links.

06

NemoClaw as the Control Plane: Architecture and Compliance Posture

NemoClaw evaluation: intended component-to-threat-surface mapping and SOC 2, HIPAA, and FINRA control-mapping caveats.

NemoClaw is an early-preview control-plane direction for OpenClaw-style deployments. Treat each component as a control to verify, not a production guarantee. The useful exercise is mapping intended controls to the six threat surfaces.

The component-to-threat-surface mapping: - Isolated compute namespace with NetworkPolicy → shared compute blast radius - Vault-integrated secrets management → runtime-visible secret exposure - RBAC + SSO integration → unrestricted tool execution - Prompt sanitization pipeline → indirect prompt injection - Tamper-evident external audit log → missing audit trail - Gateway-level token budget enforcement → cost explosion from runaway sessions

Compliance posture for regulated industries:

SOC 2 Type II: Access controls (CC6.1), logical and physical access restrictions (CC6.2, CC6.3), change management (CC8.1), risk assessment (CC3.1, CC3.2), and monitoring (CC7.1, CC7.2). A NemoClaw-based architecture may support these controls as it matures; SOC 2 certification requires implementation evidence and an independent audit.

HIPAA: Private inference routing ensures that PHI processed by the agent runtime does not transit a shared public API surface. The external audit logging supports controls under 45 CFR § 164.312(b). A Business Associate Agreement with your cloud provider is still required separately.

FINRA: The tamper-evident external audit log supports record-keeping requirements under FINRA Rule 4511. The audit architecture is designed to be defensible under regulatory examination — not just internally documented.

Evidence & Citations

Every claim in this report traces to a verifiable source.

Last reviewed March 18, 2026

OWASP Top 10 for LLM Applications 2025 — LLM02: Sensitive Information Disclosure
https://owasp.org/www-project-top-10-for-large-language-model-applications/
Accessed March 18, 2026
OWASP Top 10 for LLM Applications 2025 — LLM01: Prompt Injection
https://owasp.org/www-project-top-10-for-large-language-model-applications/
Accessed March 18, 2026
NIST AI Risk Management Framework 1.0
https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
Accessed March 18, 2026
Not What You've Signed Up For: Indirect Prompt Injection Attacks (Greshake et al., 2023)
https://arxiv.org/abs/2302.12173
Accessed March 18, 2026
HashiCorp Vault Documentation — Secrets Management
https://developer.hashicorp.com/vault/docs
Accessed March 18, 2026
NVIDIA NemoClaw Enterprise Security Architecture
https://docs.nvidia.com/nemoclaw/
Accessed March 18, 2026
HIPAA Security Rule — 45 CFR Part 164
https://www.hhs.gov/hipaa/for-professionals/security/index.html
Accessed March 18, 2026
FINRA Rule 4511 — General Requirements
https://www.finra.org/rules-guidance/rulebooks/finra-rules/4511
Accessed March 18, 2026

Methodology

Who wrote this, what evidence shaped it, and how the recommendations are framed.

  • ●Synthesizes public AI-agent security guidance, OpenClaw-style deployment patterns, OWASP LLM risks, and operator incident classes.
  • ●Maps each control to a specific failure mode — not a generic risk category — so teams know what each control prevents.
  • ●Validates the 12-item checklist against enterprise procurement requirements and regulated industry compliance postures (SOC 2, HIPAA, FINRA).

Author: Rare Agent Work Team · Written and maintained by the Rare Agent Work Team.

Why This Report Earns Attention

Proof 1

Six threat surfaces are mapped to concrete failure classes: exposed secrets, unrestricted tools, indirect prompt injection, missing audit trails, missing cost controls, and shared compute blast radius.

Proof 2

12-item pre-production checklist includes test evidence requirements for each control — not assertions, not documentation references.

Proof 3

NemoClaw section treats the stack as early preview and maps intended controls to the evidence teams should verify before relying on them.

Ask the Implementation Guide

Powered by Claude — trained on this report's content. Your first question is free.

Ask anything about implementation, setup, or how to apply the concepts in this report. Your first question is free — then we'll ask you to sign in.

Powered by Claude · First question free

When the report isn't enough

Bring a real problem for direct human review.

Architecture review, implementation rescue, and strategy calls for teams with real blockers. Every intake is read by a human before any next step.

Start an AssessmentBook a Strategy Call

Also from Rare Agent Work

Free · open access

Agent Setup in 60 Minutes

Low-code operator playbook for first-time builders

Free · open access

From Single Agent to Multi-Agent

How to scale from one assistant to an orchestrated team

→

Need help deploying?

Book a free workflow audit

© 2026 Rare Agent Work · Home · Reports · Methodology

livenew:LLM-based classifier is 96% accurate but fails on the 4% that matters most15d ago · post yours · rss