Securing Agentic Workflows: What Every Developer Team Needs to Know

AI coding agents are transforming how developers work — automating tests, generating code, and executing tasks with minimal input. But that power comes with a serious, often underestimated security risk. These agents run with the same system permissions as the user, meaning a single successful attack can have the same impact as a compromised developer machine.

The primary threat is indirect prompt injection: malicious instructions hidden in content the agent ingests — a poisoned repository, a crafted pull request, a .cursorrules file, or a rogue MCP server response. If an agent acts on those instructions, the consequences can range from data theft to full system compromise.

Based on guidance from the NVIDIA AI Red Team, here's what organizations should do about it.

The Three Non-Negotiables

These controls are considered mandatory because they block the most serious and commonly observed attacks.

1. Block outbound network access by default. Unrestricted network access lets an attacker exfiltrate secrets (SSH keys, API tokens, .env files) or establish a remote shell. Sandbox processes should require manual approval for any outbound connection. A tightly scoped allowlist — enforced at the proxy, IP, or port level — reduces approval fatigue while keeping the blast radius small. DNS resolution should be limited to trusted resolvers to close DNS-based exfiltration paths.

2. Prevent file writes outside the active workspace. Files like ~/.zshrc execute automatically and can be abused for remote code execution and sandbox escape. Config files like ~/.gitconfig can be modified to redirect Git operations to attacker-controlled servers. Backdoored binaries can be dropped into ~/.local/bin. Write operations outside the workspace must be blocked at the OS level, not the application level, so they can't be bypassed through subprocesses or indirection.

3. Protect all agent configuration files unconditionally. Hooks, MCP configurations, skills, and files like CLAUDE.md or copilot-instructions.md shape agent behavior and often run outside the sandbox. A poisoned hooks config in a Git repository affects every developer who clones it. These files must be read-only to the agent, with no user approval override possible — only direct manual edits by the user are acceptable.
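To make the first control concrete, here is a minimal sketch of the allow-or-ask decision behind a default-deny egress policy. The rule entries, the EgressRule type, and the function names are hypothetical, chosen only for illustration; in practice the decision would be enforced by a proxy or firewall outside the agent's reach, not by code the agent can modify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    host: str  # exact hostname, or a "*.example.com" style subdomain wildcard
    port: int


# Hypothetical allowlist; real entries depend on what your tasks actually need.
ALLOWLIST = (
    EgressRule("pypi.org", 443),
    EgressRule("*.githubusercontent.com", 443),
)


def host_matches(pattern: str, host: str) -> bool:
    """Compare a hostname against an exact name or a subdomain wildcard."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("*."):
        # Keep the leading dot so lookalikes such as
        # "evilgithubusercontent.com" do not match.
        return host.endswith(pattern[1:])
    return host == pattern


def egress_decision(host: str, port: int) -> str:
    """Return "allow" for allowlisted destinations, otherwise "ask"."""
    for rule in ALLOWLIST:
        if rule.port == port and host_matches(rule.host, host):
            return "allow"
    return "ask"  # default: require manual approval
```

Matching on both host and port keeps an allowlisted package index from doubling as an approved SSH destination.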

A tiered policy model works well here: enterprise-level blocks that can't be overridden, read-write freedom within the workspace, a narrow allowlist for legitimate out-of-workspace operations, and default-deny for everything else.
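The tier ordering can be sketched as a small decision function. Every path below is a made-up example, and the function only illustrates the order in which the tiers are checked. It does not resolve symlinks, and it is no substitute for OS-level enforcement.

```python
from pathlib import PurePosixPath

# All paths are hypothetical examples, not a recommended list.
WORKSPACE = PurePosixPath("/home/dev/project")

# Tier 1: enterprise-level blocks that nothing below can override.
ENTERPRISE_DENY = (
    PurePosixPath("/home/dev/.zshrc"),
    PurePosixPath("/home/dev/.gitconfig"),
    PurePosixPath("/home/dev/project/CLAUDE.md"),
    PurePosixPath("/home/dev/project/.cursorrules"),
)

# Tier 3: narrow allowlist for legitimate out-of-workspace writes.
EXTERNAL_ALLOW = (PurePosixPath("/tmp/agent-scratch"),)


def _within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path is root itself or lies underneath it."""
    return path == root or root in path.parents


def write_decision(raw_path: str) -> str:
    path = PurePosixPath(raw_path)
    # Reject relative paths and ".." segments instead of trying to normalize.
    if not path.is_absolute() or ".." in path.parts:
        return "deny"
    if any(_within(path, blocked) for blocked in ENTERPRISE_DENY):
        return "deny"   # tier 1: cannot be overridden
    if _within(path, WORKSPACE):
        return "allow"  # tier 2: read-write freedom inside the workspace
    if any(_within(path, root) for root in EXTERNAL_ALLOW):
        return "allow"  # tier 3: narrow allowlist
    return "deny"       # tier 4: default-deny


print(write_decision("/home/dev/project/src/main.py"))
print(write_decision("/home/dev/project/CLAUDE.md"))
```

Checking the enterprise blocks first is what keeps an agent configuration file protected even though it lives inside the otherwise writable workspace.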

Strongly Recommended: Closing the Remaining Gaps

The mandatory controls handle the most critical risks, but several attack vectors remain.

Sandbox the entire IDE, not just shell tools. Many systems only sandbox command-line invocations, leaving hooks, MCP startup scripts, and skill runners exposed. All agent operations should be covered by the same restrictions.

Use full virtualization. Sandboxes that share the host kernel (Docker containers, macOS Seatbelt, Linux Bubblewrap) leave kernel vulnerabilities in scope for a compromised agent. Running agents inside a VM, microVM, or Kata container gives the agent its own guest kernel, so a kernel exploit no longer lands directly on the host. The performance overhead is usually modest compared to the cost of LLM calls themselves.

Restrict reads outside the workspace too. Unrestricted read access lets an attacker enumerate the developer's machine — finding secrets, keys, and intellectual property. Access to external paths should be limited to what's strictly needed during sandbox initialization, then locked down thereafter.

Never cache approvals. Approving a sensitive action once should not unlock it for all future executions. A single legitimate approval of a write to ~/.zshrc becomes a door an attacker can walk through later. Every sensitive action needs fresh confirmation.
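One way to picture this is an approval that is bound to a single action and consumed on use. The ApprovalGate class below is a hypothetical sketch, not the mechanism of any particular tool, and it identifies actions by a plain string purely for brevity.

```python
import secrets


class ApprovalGate:
    """Issues approvals that cover exactly one action and expire on first use."""

    def __init__(self) -> None:
        self._pending: dict[str, str] = {}  # token -> approved action

    def approve(self, action: str) -> str:
        """Record a user's approval of one specific action."""
        token = secrets.token_hex(16)
        self._pending[token] = action
        return token

    def consume(self, token: str, action: str) -> bool:
        """Check an approval and discard it, whether or not it matches."""
        approved = self._pending.pop(token, None)
        return approved is not None and approved == action


gate = ApprovalGate()
token = gate.approve("write ~/.zshrc")
print(gate.consume(token, "write ~/.zshrc"))  # True: first use succeeds
print(gate.consume(token, "write ~/.zshrc"))  # False: approval already spent
```

Because the token is removed on every check, a later injected instruction cannot replay an approval the developer gave earlier.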

Inject secrets explicitly; don't inherit them. Developer environments carry a wide range of credentials that sandboxed processes often inherit by default. Instead, start the sandbox with minimal credentials and inject only what the current task requires, ideally via a credential broker that issues short-lived tokens rather than exposing long-lived keys in environment variables.
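As a rough illustration, the sandboxed process can be started from an explicitly built environment rather than a copy of the developer's. The BASE_ENV_KEYS list, the function names, and the TASK_TOKEN variable are assumptions made for this sketch; a real setup would fetch the task secret from a credential broker.

```python
import subprocess

# Hypothetical baseline of non-secret variables the task needs to run at all.
BASE_ENV_KEYS = ("PATH", "LANG")


def build_sandbox_env(parent_env: dict, task_secrets: dict) -> dict:
    """Start from an empty environment and add only what is listed."""
    env = {key: parent_env[key] for key in BASE_ENV_KEYS if key in parent_env}
    env.update(task_secrets)  # only the credentials this task requires
    return env


def run_task(argv: list, parent_env: dict, task_secrets: dict):
    """Run a command with the minimal environment instead of os.environ."""
    return subprocess.run(
        argv,
        env=build_sandbox_env(parent_env, task_secrets),
        capture_output=True,
        text=True,
        check=True,
    )


parent = {"PATH": "/usr/bin", "AWS_SECRET_ACCESS_KEY": "x", "GITHUB_TOKEN": "y"}
print(build_sandbox_env(parent, {"TASK_TOKEN": "short-lived"}))
```

Passing env= to subprocess.run replaces the inherited environment entirely, so anything not copied in on purpose is absent from the child process.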

Manage the sandbox lifecycle. Long-running sandboxes accumulate downloaded dependencies, generated scripts, cached tokens, and leftover intellectual property. This makes a stale sandbox a high-value target. Use ephemeral environments that are destroyed after each task, or establish a schedule for wiping and recreating them in a known-good state.
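The ephemeral pattern can be sketched with a temporary directory standing in for the sandbox. A real deployment would create and destroy a VM or container instead, and the ephemeral_workspace helper is a hypothetical name used only here.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_workspace(template_dir):
    """Copy a known-good template into a fresh directory, then destroy it."""
    with tempfile.TemporaryDirectory(prefix="agent-task-") as tmp:
        work = Path(tmp) / "workspace"
        shutil.copytree(template_dir, work)
        yield work
    # Leaving the with-block deletes the directory and everything the task
    # left behind, even if the task raised an exception.


# Usage: build a tiny template, run one "task", and confirm nothing survives.
with tempfile.TemporaryDirectory() as template:
    (Path(template) / "README").write_text("seed")
    with ephemeral_workspace(template) as ws:
        (ws / "token.cache").write_text("leftover secret")
        leftover = ws / "token.cache"
    print(leftover.exists())
```

Each task starts from the same template, so cached tokens and generated scripts from one run are never visible to the next.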

The Bigger Picture

The core challenge with agentic security is balancing automation with oversight. Requiring manual approval for everything creates habituation — developers start clicking through without reading. The goal is a system where human review is reserved for genuinely unusual or risky actions, while routine operations proceed safely within well-defined boundaries.

As agentic tools gain new capabilities and integrations, the attack surface grows with them. These controls aren't a one-time setup — they need to be revisited as the tools evolve.

Based on security guidance published by the NVIDIA AI Red Team.