Hélain Zimmermann

AI and Security: How Agents Like OpenClaw Can Be Exploited

The rise of autonomous AI agents has been one of the defining trends of early 2026. Among them, OpenClaw has captured enormous attention: an open-source agent framework that lets anyone deploy a personal AI assistant capable of reading emails, scheduling meetings, making purchases, and interacting with dozens of services on your behalf. Within weeks of its initial release, thousands of instances were running worldwide. The problem? Most of them were deployed with little regard for security.

As Forbes contributor Mark Kraynak put it bluntly: "OpenClaw showed the future of AI security, and it's going to be rough." He is not wrong. The security implications of giving an autonomous software agent your API keys, email credentials, and payment information, then exposing it to the internet, are severe. And yet, that is exactly what tens of thousands of early adopters did.

What Happened: Exposed Instances and Leaked Keys

Security researchers at BitSight conducted a sweep of publicly accessible OpenClaw instances in late January 2026. What they found was alarming: over 8,000 instances were reachable from the open internet, many with default configurations that exposed admin panels, conversation histories, and, critically, stored API keys in plaintext.

The root cause was a combination of factors. OpenClaw's default setup guide assumed a local-only deployment. But users, eager to access their agent from mobile devices or share it with family members, punched holes in their firewalls or deployed on cloud VMs without configuring authentication. The framework's web UI, designed for convenience, did not enforce authentication out of the box in early versions.

Within days of the BitSight report, infostealers specifically targeting OpenClaw configuration files began circulating. These malware variants scanned infected machines for OpenClaw's config.yaml and credentials.json files, exfiltrating API keys for OpenAI, Anthropic, Google, and various SaaS platforms. The stolen keys were then used for cryptomining, spam campaigns, and unauthorized access to victims' accounts.
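A quick self-audit can catch the conditions these infostealers exploit. The sketch below is illustrative, not an official OpenClaw tool: it checks a config file for permissive file modes and for strings that look like plaintext API keys. The key patterns are rough assumptions based on common provider formats, not an exhaustive list.

```python
import os
import re
import stat

# Heuristic patterns for provider-style API keys (illustrative, not exhaustive).
KEY_PATTERN = re.compile(r"\b(sk-[A-Za-z0-9]{20,}|AIza[A-Za-z0-9_\-]{30,})\b")

def audit_config_file(path: str) -> list[str]:
    """Return a list of warnings for a single agent config file."""
    warnings = []
    mode = os.stat(path).st_mode
    # Flag files readable by group or others (anyone on the machine).
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        warnings.append(f"{path}: readable by other users")
    with open(path, encoding="utf-8", errors="ignore") as f:
        if KEY_PATTERN.search(f.read()):
            warnings.append(f"{path}: appears to contain a plaintext API key")
    return warnings
```

Running this over files like config.yaml and credentials.json will not stop malware already on the machine, but it flags the low-hanging fruit: secrets stored in the clear with lax permissions.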

Why AI Agents Are a Fundamentally Different Attack Surface

Traditional software vulnerabilities are concerning enough, but AI agents introduce a qualitatively different risk profile. Three properties make them uniquely dangerous when compromised.

First, agents hold credentials. An AI agent that can send emails on your behalf needs your email password or OAuth token. One that can make purchases needs payment credentials. One that can manage your calendar needs access to your Google or Microsoft account. A compromised agent is not just a data breach; it is a skeleton key to your entire digital life. This is particularly concerning given how much personal data LLM-based systems already handle.

Second, agents can act. Unlike a stolen database, a compromised agent can actively do things: send phishing emails from your account, authorize transactions, modify files, or interact with other people while impersonating you. The blast radius of a compromised agent is bounded only by the permissions you granted it.

Third, agents persist state. OpenClaw and similar frameworks maintain conversation history, learned preferences, and task context. This memory is highly valuable for social engineering. An attacker who gains access to your agent's memory knows your communication style, your contacts, your habits, and your pending tasks. They can craft highly targeted attacks or impersonate you convincingly. Much of this context is stored in vector databases that index past interactions for quick retrieval.

The Attack Taxonomy

Based on reported incidents and security research, attacks on AI agents like OpenClaw fall into several categories.

Exposed Endpoints

The most straightforward attack. Users deploy their agent on a server with a public IP, either intentionally or through misconfiguration. Attackers scan for known OpenClaw ports and endpoints, gain access, and extract credentials or hijack the agent. This is the digital equivalent of leaving your front door open, except the house contains keys to every other building you own.

Malicious Browser Extensions

Several browser extensions appeared on the Chrome Web Store claiming to enhance OpenClaw's functionality, offering features like "better memory management" or "enhanced web search." In reality, these extensions injected JavaScript that intercepted communications between the user's browser and their OpenClaw instance, siphoning credentials and conversation data to attacker-controlled servers. Google removed at least four such extensions in January, but the pattern is likely to continue.

Supply Chain Attacks on Plugins

OpenClaw's plugin ecosystem is one of its strengths, but also a vulnerability. Plugins are community-contributed modules that extend the agent's capabilities: connecting to new services, adding tools, or modifying behavior. The plugin installation process in early versions performed no code review, signature verification, or sandboxing. A malicious plugin could access everything the agent could access. Several proof-of-concept attacks demonstrated plugins that silently forwarded all agent actions to an external server.
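A minimal mitigation is digest pinning: refuse to install any plugin archive whose SHA-256 does not match a trusted value published out of band. The sketch below is a generic illustration, not OpenClaw's actual mechanism; the manifest, plugin name, and digest are hypothetical.

```python
import hashlib

# Hypothetical manifest mapping plugin archives to trusted SHA-256 digests,
# e.g. published by the author alongside a signed release announcement.
TRUSTED_DIGESTS = {
    "calendar-sync-1.2.0.zip":
        "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def verify_plugin(archive_name: str, data: bytes) -> bool:
    """Allow installation only if the archive's digest matches the pinned value."""
    expected = TRUSTED_DIGESTS.get(archive_name)
    if expected is None:
        return False  # unknown plugins are rejected outright
    return hashlib.sha256(data).hexdigest() == expected
```

Digest pinning stops silent tampering with a known release, but it is no substitute for code review: a pinned digest of malicious code is still malicious code.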

Prompt Injection via External Content

When an OpenClaw agent reads an email or browses a webpage, the content it ingests can contain hidden instructions, a technique known as indirect prompt injection. An attacker could send you an email with invisible text instructing your agent to forward all future emails to an external address. Because the agent processes natural language, it can be tricked in ways that traditional software cannot. Agents built on multimodal models face an even wider injection surface, since images and documents can also carry hidden payloads.
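There is no reliable filter for indirect prompt injection today, but a crude screen over untrusted content can at least route suspicious items to human review. The patterns below are my own illustrative examples; a determined attacker will rephrase around any fixed list, so treat a match as a signal, never a guarantee of safety.

```python
import re

# Imperative phrasings commonly seen in injection payloads.
# Crude heuristic only: a miss does NOT mean the content is safe.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"forward (all|every) (future )?(emails?|messages?)",
    r"do not (tell|inform|notify) (the )?user",
]

def flag_untrusted_content(text: str) -> list[str]:
    """Return the suspicious patterns found in externally sourced text."""
    lowered = text.lower()
    return [p for p in SUSPICIOUS_PATTERNS if re.search(p, lowered)]
```

A more robust posture pairs screening with architecture: never let content fetched from the outside world trigger privileged actions without explicit user confirmation.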

Practical Security Advice

If you are running an AI agent, whether OpenClaw or any similar framework, here are concrete steps to reduce your risk.

Do not expose your agent to the public internet. If you need remote access, use a VPN or SSH tunnel. Never open ports directly. This single step would have prevented the majority of reported incidents.
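In practice, local-only deployment means binding the agent's web UI to the loopback interface so remote hosts simply cannot reach it. A minimal sketch using Python's standard library (the handler and port are placeholders, not OpenClaw's real server code):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class AgentUIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"agent UI placeholder")

def make_local_server(port: int = 8080) -> HTTPServer:
    """Bind to 127.0.0.1 only: the server is unreachable from other machines.
    For remote access, tunnel in instead of opening the port, e.g.
    ssh -L 8080:127.0.0.1:8080 user@host (illustrative)."""
    return HTTPServer(("127.0.0.1", port), AgentUIHandler)
```

The same principle applies to any framework: binding to 0.0.0.0 on a cloud VM is what turned thousands of OpenClaw deployments into open doors.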

Use sandboxing and least privilege. Create dedicated service accounts with minimal permissions for your agent. If it only needs to read your calendar, do not give it write access. If it only needs to send emails to specific contacts, restrict its scope. OpenClaw's newer versions support granular permission profiles; use them.

Rotate API keys regularly. Treat any API key your agent has access to as potentially compromised. Set up automated rotation where possible. Many API providers now support short-lived tokens; prefer these over long-lived keys.

Audit every extension and plugin. Before installing any plugin, review its source code. If you cannot read the code, do not install it. Stick to plugins from verified authors with public repositories. The OpenClaw community has started a plugin audit initiative; support it by contributing reviews.

Monitor your agent's actions. Enable detailed logging and periodically review what your agent has been doing. Look for unexpected API calls, unfamiliar contacts, or actions you did not request. Several monitoring tools have emerged specifically for AI agent oversight.
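A baseline review can be as simple as diffing the agent's action log against the set of actions you expect it to take. The JSON-lines log format and action names below are hypothetical, for illustration; adapt them to whatever your framework actually emits.

```python
import json

# Actions you expect your agent to perform (your baseline, maintained by hand).
EXPECTED_ACTIONS = {"calendar.read", "email.send", "web.search"}

def unexpected_actions(log_lines: list[str]) -> list[dict]:
    """Scan a JSON-lines action log and return entries whose
    action falls outside the expected baseline."""
    anomalies = []
    for line in log_lines:
        entry = json.loads(line)
        if entry.get("action") not in EXPECTED_ACTIONS:
            anomalies.append(entry)
    return anomalies
```

An entry like a payment authorization you never requested should jump out immediately from such a report, which is exactly the kind of signal the monitoring tools mentioned above automate.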

Keep your instance updated. The OpenClaw team has been responsive to security disclosures, shipping patches within days. But patches only help if you apply them. Enable automatic updates or check for new releases weekly.

The Bigger Picture

OpenClaw's security growing pains are not unique to OpenClaw. They are a preview of what the entire AI agent ecosystem will face as these tools go mainstream. Every major tech company is building agent capabilities: Google's Project Mariner, Apple's rumored Siri overhaul, Microsoft's Copilot agents. Each will face the same fundamental tension: agents need access to be useful, but access creates risk.

The security community is only beginning to develop frameworks for agent-specific threats. Traditional security models assume software does what it is programmed to do. Agents, by design, exhibit emergent behavior: they interpret instructions, make judgments, and take actions that were not explicitly coded. Securing a system whose behavior is not fully predictable requires new approaches.

We need standardized permission models for AI agents, similar to how mobile operating systems control app permissions. We need cryptographic attestation for agent plugins, like code signing for traditional software. We need audit trails that are tamper-resistant and human-readable. And we need a cultural shift where deploying an AI agent is treated with the same seriousness as deploying a production server, because that is exactly what it is. The finance sector, where autonomous agents already handle real money, offers a useful case study in what happens when these safeguards lag behind deployment.

OpenClaw's early stumbles are a warning. The question is whether we will heed it before the stakes get higher.
