Securing Agentic AI: What OpenClaw gets wrong and how to do it right

Attackers have always loved shells, and now shellfish is on the menu.

What is OpenClaw?

Officially introduced in late January 2026 by Peter Steinberger, OpenClaw is an open-source, self-hosted artificial intelligence system that runs an agentic loop to serve as a powerful AI assistant.

It offers an abstraction from traditional “conversation turn” chatbots that aims to empower agents to plan tasks, make decisions, manage environments, and self-improve over time.

OpenClaw provides an AI model task-planning capabilities through features like:

Generating sub-agents to handle limited-scope tasks
Code execution within the installation environment
Scheduling jobs for future execution
Self-modification of personality and role definition files
Downloadable plugins and “skills” for tasks defined by other users

These capabilities enable OpenClaw to break down complex tasks and fulfil user requirements independently (in some cases, even executing while users sleep). Although the toolbox offers powerful functionality for automated tasks, the attack surface it introduces is near-limitless.

This article at a glance

OpenClaw, an agentic orchestration framework, has sent shockwaves through the hobbyist and professional AI automation crowd looking to reshape the future of digital assistants.
The runtime enables AI agents to adapt to users’ tasks and independently solve problems
OpenClaw is insecure out-of-the-box. Straying far from basic chatbot capabilities quickly exposes the system to significant risk
Secure OpenClaw deployments would require rearchitecting much of the system’s independent control flows and require organizations to tightly manage its capabilities and data flows

Agentic AI threat models

AI introduces novel considerations to traditional application threat modeling. Unlike traditional applications, in which the control flow is defined by the application components, AI systems are controlled by the data itself within the model’s context window. To use an analogy, if traditional applications are a series of pipes with well-defined paths and valves to control the flow of data moving through the system, then AI applications are pipes that can completely change their own layout depending on the data flowing inside of them.

For the first time in widespread application security, control flow and data are identical. AI security controls revolve around implementing constraints around the AI flows on a software architecture level. When data controls the environment’s execution pipeline, system architects are challenged to design structured trust boundaries that limit the model’s access to privileged resources based on the trust level of the data itself.

For example, an agent exposed to untrusted input should no longer have access to high-privilege tool calls, API keys (which should arguably never be exposed to agents in the first place), or other high-privilege agents.

OpenClaw is fundamentally an unstructured agentic system and lacks security levers to manage trusted and untrusted data, functions, and state. Although OpenClaw can spawn sub-agents to manage tasks, this functionality is not advertised as a security feature and does not prevent sub-agents from passing poisoned responses back to the orchestrator agent.

As bleak as the present landscape is, OpenClaw is not without hope —the security architecture model it can most easily adapt to is an Orchestration Tree design pattern, in which a centralized control agent divvies tasks to sub-agents with known privilege levels and restricted message passing between agents in different trust zones. But as it stands, the ambiguous “open-permission” model of OpenClaw violates Zero Trust principles and is vulnerable by design.

OpenClaw's security risks

The technology industry has long battled with security-functionality tradeoffs, and OpenClaw resides on the far end of that spectrum. It provides powerful tools for agents to execute tasks independently, but any number of those tasks could contain malicious instructions. Every instance of OpenClaw accessing data with an unknown trust level presents a foothold for threat actors.

Even if that data is served from a “secure” API, the provenance of the data itself is what determines its trustworthiness in AI systems. OpenClaw provides few, if any, tools to manage the AI’s access to privileged resources when exposed to data that might be polluted. Furthermore, OpenClaw’s plugin functionality enables its operators to inadvertently fall into these architectural traps.

OpenClaw empowers other users to publish “Skills” comprised of writeups and code files to empower agents with new capabilities. Skills are rarely developed with proper data segmentation, and the greater number of skills an agent installs, the greater its attack surface. For instance, several skills enable agents to interact with external services with unknown data provenance.

In other cases, NCC Group’s investigation team discovered blatantly malicious skills that gave remote attackers command line access to the OpenClaw server. NCC Group reported the malicious skill to OpenClaw, but this risk demonstrates the need for clear auditing of any skill installed to the platform.

The week this article was written, OpenClaw integrated VirusTotal scanning, which purports to leverage LLMs to detect malicious packages. Unfortunately, these LLM-powered scans fail to properly distinguish detailed skill instructions from prompt injection and report numerous false positives. OpenClaw operators may quickly begin to ignore the VirusTotal results due to their unreliability.

OpenClaw also enables persistence of malicious context through its self-writeable SOUL, IDENTITY, USER, and other memory-related files. These features are intended to enable the system to better define its operating parameters over time, but also serve as an ever-present data sink for threat actors to permanently alter the trustworthiness of the model.

In summary, OpenClaw does not yet appear to offer the architectural defense patterns AI-powered applications rely on to resist natural language attacks. Defense teams should tread cautiously before adopting the platform in any serious capacity, and doubly so when interacting with plugins, skills, or data not entirely controlled by the organization.

Approach OpenClaw with caution

Organizations that insist on implementing OpenClaw should design rigid constraints around the data the model can access and how that data is managed within the model context window. For example, suppose a user asks the model to find all urgent emails the user has received today. In order to prevent email-based prompt injection, the model should delegate email review to a sub-agent (if possible, one sub-agent per email) with access to no tool calls or code execution. Those sub-agents should return a Boolean true/false value to the primary agent indicating whether its email is urgent, and deterministic code should validate that the returned value is truly a Boolean rather than a hidden prompt injection.

In this system, the worst-case scenario is that an email is inappropriately marked as urgent. These architectural constraints limit the flexibility of OpenClaw, but are necessary to implement a secure agentic environment.

OpenClaw developers have certainly tried to implement certain security restrictions, including (off by default) sandboxing restrictions. Although sandboxing provides some constraints on the execution environment itself, sandboxing does not protect our software’s most critical asset: its data. OpenClaw’s documentation itself states that sandboxing exists to limit impact when “the model does something dumb,” not as a powerful security restriction.

More interestingly, OpenClaw provides an orchestration system named Lobster intended to enable intervention in agentic workflows and reduce the blast radius of execution-gone-wrong. Although Lobster demonstrates that the developers are considering the right solutions (execution pipelines), it does not equip operators or skill developers to sufficiently isolate low-privilege operations or pass safe datatypes between trust zones and is disabled by default.

Lobster has promising potential to be adapted into a secure AI orchestration system with sufficient technical effort.

In conclusion

OpenClaw is an impressive platform that does not yet offer the rigid security tooling necessary to secure agentic systems. However, the example cases and discussions it generates are valuable to the security industry when demonstrating the types of risk agentic AI introduces to otherwise-secure environments. Organizations looking to deploy OpenClaw should consider the risks and technical debts necessary to sufficiently constrain the system’s behavior and protect privileged resources from attacker-controlled data.

Many organizations face severe challenges when implementing agentic systems. NCC Group’s AI Red Team and Threat Modeling experts have a proven track record of critical AI findings. The security controls our teams recommend can mitigate prompt injection and other AI vulnerabilities once and for all before they become a risk in your production systems.

David Brauchler III

Technical Director, NCC Group NA

David Brauchler III is an NCC Group Technical Director based in Dallas, Texas. He is an adjunct professor for the Cyber Security graduate program at Southern Methodist University with a master's degree in Security Engineering and the Offensive Security Certified Professional (OSCP) certification

David published Analyzing AI Application Threat Models, which introduced new Models-As-Threat-Actors (MATA) methodology to the AI security industry and provided a new trust flow centric approach to evaluating risk in AI/ML-integrated environments. He has also released several new threat vector categories, AI/ML security controls, and recommendations to maximize the effectiveness of AI penetration tests.

FAQs

Q: Can I trust skills published to ClawHub?

A: Not without manually auditing their contents, even if the VirusTotal scan comes up clean. Even if the skill itself is safe, any untrusted data introduced by external resources is both unscannable and cannot easily be audited in advance.

Q: Is there any way to deploy a secure OpenClaw instance?

A: At this time, a secure configuration would require significant technical intervention to ensure that the system enforces proper trust boundaries around application data. In effect, every organization that seeks to leverage OpenClaw would have to customize its configuration to match their business case.

Q: What if we add more guardrails or prompt injection detection models?

A: Neither guardrails nor watchdog models have proven to reliably block prompt injection (NCC Group’s AI Red Team currently has a 100% success rate in bypassing these controls). Instead, applications should be designed from an architectural level to mitigate the impact of prompt injection. Our AI Threat Modeling experts can help your team design these control flows catered to your production environment.

Secure your AI systems before you're in a pinch.

Our AI security experts are dedicated to researching novel threats and the controls needed to counter them. Reach out today to safely unlock AI's potential for your business.

Get in touch