G360 Technologies


When AI Agents Get Their Own Secrets

A Credential Boundary Is Becoming Part of Agent Architecture

A developer assigns an AI coding agent a task: update a module, run tests, and check whether a private package dependency still works. To complete the task, the agent may need repository access, package registry credentials, development scripts, MCP tools, and sometimes internal APIs. The question is no longer only what the agent can generate. It is what the agent can access while it is working.

That access model is starting to change. In May 2026, GitHub introduced a dedicated "Agents" secrets and variables scope for the Copilot cloud agent. The change separates agent credentials from GitHub Actions, Codespaces, and Dependabot secrets, creating a distinct credential surface for agent workloads.

AI agents are becoming operational actors inside software environments. They can inspect repositories, run commands, call tools, connect to MCP servers, and use credentials supplied by the organization. That makes credential management a governance problem, not only an engineering convenience.

The GitHub change is narrow but important. It does not solve agent identity across the enterprise. It does create a practical control boundary: credentials intended for an AI agent can now be managed separately from CI/CD, development environments, and dependency automation. The larger pattern is that agent governance is moving from policy language to control surfaces. Agent access is becoming something organizations need to scope, review, audit, and revoke as its own category.

Context

Traditional automation credentials were designed for bounded systems. A CI/CD runner follows a pipeline. A service account supports a known application. A developer credential belongs to a human workflow. AI agents are different because they can interpret a delegated goal, choose tools, retry failed steps, call APIs, interact with MCP servers, and change course during a session. Access is no longer tied only to a predetermined workflow.

That difference creates a credential-management problem. If agents reuse CI/CD secrets, developer-local credentials, broad service accounts, or shared automation tokens, organizations lose clean separation between what the credential was intended for and how it was actually used. The failure mode is not only secret leakage. It is credential inheritance, confused-deputy behavior, weak attribution, and unclear review boundaries.

This is why agent-specific credential boundaries matter. OWASP's Agentic Top 10 identifies identity and privilege abuse as a core agentic failure mode, including credential inheritance and confused-deputy risks. NIST's NCCoE concept paper on software and AI agent identity also identifies shared API keys and service account credentials as anti-patterns for agent deployments. The issue is no longer theoretical. Platforms are starting to ship controls that separate agent credentials from other credential classes.

How the Mechanism Works

The emerging pattern is separation. Agent credentials are being pulled away from CI/CD secrets, developer credentials, generic service accounts, and broad environment variables so they can be scoped and reviewed on their own terms.

GitHub's Agents scope is the most concrete recent example. Repository administrators can define secrets and variables specifically for the Copilot cloud agent. Organizations can also define agent secrets centrally and restrict them to all repositories, private and internal repositories, or selected repositories.
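
Separation only pays off if the credential classes stay disjoint. Below is a minimal sketch of the kind of inventory check the new scope makes possible; nothing here calls a real GitHub API, and all secret names are hypothetical.

```python
# Flag secrets that appear in more than one credential class. Scope names
# mirror GitHub's four secret surfaces; everything else is illustrative.
from collections import defaultdict

def audit_secret_scopes(secrets: dict[str, set[str]]) -> list[str]:
    """`secrets` maps a class ("agents", "actions", ...) to its secret names."""
    seen = defaultdict(set)
    for scope, names in secrets.items():
        for name in names:
            seen[name].add(scope)
    return [f"{name} defined in {sorted(scopes)}"
            for name, scopes in seen.items() if len(scopes) > 1]

findings = audit_secret_scopes({
    "actions": {"DEPLOY_TOKEN", "REGISTRY_TOKEN"},
    "agents":  {"REGISTRY_TOKEN", "COPILOT_MCP_JIRA_TOKEN"},
})
print(findings)  # a shared REGISTRY_TOKEN is exactly the mixing the scope avoids
```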

GitHub states that the Copilot cloud agent does not receive Actions, Codespaces, or Dependabot secrets and variables. It receives Agents secrets and variables. That creates a meaningful boundary. A token meant for an agent to access an internal package registry does not need to live beside deployment credentials. A value used by an MCP server does not need to share the same namespace as a workflow secret. The agent's credential surface becomes separately visible.

GitHub also adds a limited MCP routing boundary. Secrets and variables prefixed with COPILOT_MCP_ are available only to MCP servers, while other Agents secrets and variables may be exposed as environment variables in the agent's development environment. This is not full per-tool authorization, but it is a concrete example of credentials being routed to the tool layer rather than broadly exposed.

A second boundary appears around workflow execution. GitHub warns that Actions workflows do not automatically run when Copilot pushes changes to a pull request unless approved. This matters because workflows may have access to Actions secrets. If unreviewed agent-written code could trigger privileged workflows automatically, the separation between agent credentials and CI/CD credentials would weaken.

GitHub's Agent Tasks REST API, released May 13, 2026, adds another access management question. Copilot Business and Enterprise users can now start Copilot cloud agent tasks programmatically using classic PATs, fine-grained PATs, and OAuth tokens. GitHub App installation access token support is listed as coming later. That means enterprises need to govern not only what credentials an agent can use, but also who or what can start the agent.

Other platforms address the same problem at different layers. OpenAI Codex cloud environments make secrets available only during setup and remove them before the agent phase, separating environment preparation from agent execution. Google Cloud's Agent Identity uses per-agent identities, SPIFFE identifiers, IAM policies, Principal Access Boundary, VPC Service Controls, credential vaulting, and audit logging. Microsoft Entra Agent ID treats agents as directory-governed identity objects with lifecycle and access governance.

Analysis: Why This Matters Now

Agents are moving closer to real systems. Once they can run commands, connect to tools, open pull requests, use package registries, interact with MCP servers, or act through APIs, access control becomes part of the agent's architecture.

The GitHub release matters because it turns a governance principle into an administrative surface. Security teams can now ask a more precise question: which secrets are available to the agent, in which repositories, and for what purpose? That is more useful than asking whether "AI has access" in a general sense.

It also changes the access review model. A CI/CD secret has one expected use pattern; an agent secret has another. CI/CD secrets may support deployment, artifact publishing, cloud operations, or environment promotion. Agent secrets may support coding tasks, test setup, MCP connections, or internal development resources. Mixing those credentials makes the review harder because the same secret store serves multiple operational purposes.
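
To make the review boundary concrete, here is a toy version of the prefix routing rule described above. It mirrors the documented behavior, not GitHub's implementation, and the secret names are hypothetical.

```python
# Values prefixed COPILOT_MCP_ go only to MCP server configuration; everything
# else becomes an environment variable in the agent's development environment.
def route_agent_secrets(secrets: dict[str, str]) -> tuple[dict, dict]:
    mcp_values, agent_env = {}, {}
    for name, value in secrets.items():
        if name.startswith("COPILOT_MCP_"):
            mcp_values[name] = value      # visible only to MCP servers
        else:
            agent_env[name] = value       # exposed in the agent environment
    return mcp_values, agent_env

mcp, env = route_agent_secrets({
    "COPILOT_MCP_JIRA_TOKEN": "redacted",
    "NPM_REGISTRY_TOKEN": "redacted",
})
print(sorted(mcp), sorted(env))
```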


The Database Layer Is Absorbing Agent Infrastructure

A team builds an internal agent to answer customer support questions and update account records. The first version uses an operational database for customer data, a vector database for semantic retrieval, an embedding API for indexing, a reranking service for better answers, and Redis for short-term session state. The prototype works. In production, the hard part is keeping the agent's data path consistent, fresh, authorized, observable, and fast enough across systems that were never designed as one control plane. This pressure is pushing agent infrastructure closer to the data layer.

Agent systems are changing what databases and data platforms are expected to do. Capabilities that previously sat in application code or separate retrieval services, including embedding generation, vector search, hybrid retrieval, reranking, persistent memory, and retrieval observability, are increasingly being implemented inside databases, warehouses, search platforms, and operational data services.

This does not remove architecture work. It changes where the complexity lives. Instead of stitching together an embedding service, vector database, reranker, memory store, and operational database, teams are beginning to evaluate whether parts of that stack can be handled inside the same systems that already store, govern, replicate, and monitor enterprise data.

The common RAG and agent architecture that emerged between 2022 and 2024 was assembled from multiple specialized systems. An application stored business data in a relational or document database, called an external embedding model, wrote vectors into a separate vector database, called a reranking model after retrieval, and often used another store for session state or agent memory. The architecture helped teams move quickly, but it introduced predictable operational problems. Source data and vector indexes could drift out of sync. Embeddings could become stale after records changed. Retrieval latency accumulated across network calls. Access control had to be repeated across the database, vector store, application layer, and model service. Observability was fragmented across several logs and dashboards.

The Shift in 2026

In 2026, database and data platform providers are addressing these problems directly. Oracle, MongoDB, Snowflake, Databricks, Google, Microsoft, and AWS are all adding or expanding capabilities around in-database embeddings, native vector search, hybrid retrieval, reranking, long-term memory, and retrieval observability.

The specific implementations differ. Oracle AI Database 26ai supports in-database embedding generation through ONNX model loading and SQL-level vector functions. MongoDB Atlas has introduced automated embeddings and native reranking through its Voyage AI integration. Snowflake Cortex Search combines vector search, keyword search, and semantic reranking inside Snowflake-managed search services. Databricks Vector Search ties vector indexes to Delta tables, Unity Catalog, access controls, and automatic sync. Google Vertex AI RAG Engine uses managed Vector Search 2.0 backends, while Google's Agent Platform Memory Bank exposes managed APIs for long-term agent memory. Microsoft is extending Azure AI Search and SQL vector capabilities, and AWS is expanding retrieval, reranking, multimodal search, structured retrieval, and hybrid search across Bedrock Knowledge Bases and ElastiCache.

The common pattern is clear: the infrastructure needed by agents is moving closer to operational and analytical data systems.
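
The drift window in that older 2022-2024 write path is easy to see when sketched as code. All client objects below are hypothetical stand-ins rather than a specific vendor SDK.

```python
# Schematic of the multi-system indexing path, with the failure window marked.
def index_record(db, embed_api, vector_db, record: dict) -> None:
    db.insert("documents", record)           # step 1: source-of-truth write
    vec = embed_api.embed(record["body"])    # step 2: network call to embedding service
    vector_db.upsert(record["id"], vec)      # step 3: separate vector store write
    # If step 2 or 3 fails after step 1 commits, the source row exists with a
    # stale or missing vector, and neither system knows. In-database embedding
    # collapses these steps into one transactional boundary.
```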

How the Mechanism Works

The first part of the shift is embedding generation inside or adjacent to the database. In the older pattern, an application detected new or changed data, sent it to an embedding API, received a vector, and wrote that vector into a vector database. If one step failed, the source data and vector index could diverge. Oracle AI Database 26ai changes this pattern by allowing ONNX embedding models to be loaded into the database and invoked through SQL. MongoDB Atlas automated embeddings similarly move embedding generation closer to the database write path. Pinecone and Weaviate represent a related pattern in vector-native systems, where embedding and indexing can be performed through integrated inference or configured vectorizer modules. This can reduce one source of drift, but it does not eliminate embedding lifecycle management. Teams still need to decide when data should be re-embedded, how model changes should be handled, and whether index rebuilds are needed after schema, chunking, or policy updates.

The second part is hybrid retrieval as a native query operation. Agent retrieval usually needs more than semantic similarity. A useful query may need vector similarity, keyword matching, structured filters, tenant boundaries, timestamps, access level, and document metadata. Snowflake Cortex Search combines vector search, keyword search, and semantic reranking. Databricks Vector Search supports hybrid keyword and similarity search with filtering and reranking. Oracle's hybrid vector search combines vector similarity with text search. MongoDB Atlas can combine vector search, BM25-style search, and metadata filtering against operational collections. PostgreSQL deployments using pgvector and full-text search follow the same architectural direction inside Postgres.

The third part is reranking inside the retrieval path. Reranking improves retrieval quality by taking an initial result set and rescoring it with a more expensive model. In earlier architectures, this often required another external API call after vector search. MongoDB Atlas exposes Voyage AI reranking through its platform. Weaviate supports reranking modules inside the query path. Pinecone provides managed reranking models through its retrieval APIs. Snowflake, Databricks, and AWS Bedrock Knowledge Bases also expose reranking as part of managed retrieval flows. As a result, relevance tuning, latency, cost, and observability become part of data platform operations, not only application logic.

The fourth part, and perhaps the most sensitive, is agent memory. Agent memory is different from ordinary chat history. It may include session state, long-term user preferences, learned facts, prior decisions, tool-use context, and summaries that survive across conversations. Google's Agent Platform Memory Bank exposes APIs for sessions, generated memories, uploaded memories, retrieval, and deletion. MongoDB Atlas supports LangGraph long-term memory and checkpointing patterns. Oracle has demonstrated agent memory patterns where episodic memory, semantic memory, and procedural memory operate inside one database boundary. This creates a distinction enterprises need to preserve. Session-scoped memory should support short-term continuity inside a task or conversation. Persistent memory should survive across sessions and may require retention rules, deletion workflows, user scoping, and auditability.
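
For teams on the Postgres route named above, here is a minimal sketch of hybrid retrieval as a single query. It assumes a hypothetical documents table with a pgvector embedding column and a tsvector column; the schema, filters, and limits are illustrative.

```python
# Hybrid retrieval inside Postgres: vector similarity (pgvector's <=> cosine
# distance) plus full-text matching plus a structured tenant filter.
import psycopg

HYBRID_QUERY = """
SELECT id, body,
       embedding <=> %(qvec)s::vector                        AS vec_dist,
       ts_rank(body_tsv, plainto_tsquery('english', %(q)s))  AS kw_rank
FROM documents
WHERE tenant_id = %(tenant)s                                 -- structured filter
  AND body_tsv @@ plainto_tsquery('english', %(q)s)          -- keyword match
ORDER BY embedding <=> %(qvec)s::vector
LIMIT 20;
"""

def to_pgvector(v: list[float]) -> str:
    return "[" + ",".join(str(x) for x in v) + "]"   # pgvector's text format

def hybrid_search(conn: psycopg.Connection, query_vec: list[float],
                  query_text: str, tenant_id: str):
    with conn.cursor() as cur:
        cur.execute(HYBRID_QUERY, {"qvec": to_pgvector(query_vec),
                                   "q": query_text, "tenant": tenant_id})
        return cur.fetchall()
```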

The fifth part is retrieval observability and lifecycle control moving into platform services. Snowflake Cortex Search request monitoring captures


Why Coding Agents Need More Than Containers

A developer asks a coding agent to fix a failing test. The agent reads the repository, edits a source file, runs the test suite, installs a missing package, follows an error in a build script, and tries again. For the developer, that looks like useful automation. For a security team, it is a process tree with shell access, filesystem writes, package-manager access, and possible network access. The model is only one part of the risk. The runtime around the model is where the agent's permissions become real.

Coding agents are starting to break the assumptions that traditional sandboxing was built around. They do not run one known binary with one fixed permission profile. They operate through loops, tools, shells, dependencies, repositories, and local developer context. That is why recent engineering work around coding-agent sandboxes is converging on one practical question: how do you let an agent work without giving it the host? The answer is becoming a layered runtime boundary, where filesystem rules, process controls, network policy, credential handling, and approval gates define what the agent can actually do.

The failure mode is no longer theoretical. Recent vulnerabilities in agent frameworks have shown how prompt injection can become an execution risk when the surrounding tool layer exposes unsafe paths. In one case, prompt-controlled input reached a code execution path. In another, a function intended to transfer files from a containerized environment to the host gave the model a host file-write primitive. That second case is especially important. It was not a traditional container escape. The sandbox boundary was weakened by the tool architecture around it. If an agent can call a helper that writes to the host, the runtime has effectively handed the model a bridge across the boundary.

This is the core problem for coding-agent sandboxes. Agents need enough access to be productive. They need to inspect code, run commands, install dependencies, execute tests, and modify files. At the same time, those same capabilities can be used to exfiltrate data, poison build configuration, modify hooks, abuse credentials, or reach internal services. OWASP's Agentic AI Top 10 treats unexpected code execution as a major agentic risk area, with sandboxing, input validation, and allowlisting positioned as core controls rather than optional hardening. That reflects the broader shift: once agents can act inside a runtime, security has to move closer to the place where those actions happen.

How the Mechanism Works

The most useful recent architecture is OpenAI's Codex Windows sandbox. It matters because it shows what happens when an AI coding agent has to work inside a real developer environment, and existing operating-system primitives do not fit cleanly. OpenAI evaluated AppContainer, Windows Sandbox, and Mandatory Integrity Control. Each failed for a related reason. Coding agents cannot declare all required capabilities in advance, cannot easily operate inside a separate disposable desktop when they need to work on a real checkout, and cannot rely on broad workspace relabeling without changing the trust model of the developer's project.

The shipped Codex design combines several Windows primitives instead of relying on one sandbox feature. In elevated mode, Codex creates dedicated local sandbox users. It uses restricted tokens to reduce the privileges available to sandboxed processes. It applies filesystem ACLs so the sandbox can write only where intended. It uses Windows Firewall rules scoped to the sandbox user to enforce network restrictions. It also relies on a command-runner binary that launches child processes under the correct restricted context.
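
The command-runner idea generalizes beyond Windows. Below is a schematic sketch of the pattern using POSIX primitives; it illustrates restricted child execution, not OpenAI's implementation, and the paths and limits are hypothetical.

```python
# Every child starts with a scrubbed environment, a confined working directory,
# and resource limits that its own descendants inherit.
import resource
import subprocess

SANDBOX_ROOT = "/tmp/agent-workspace"          # hypothetical confined checkout
SAFE_ENV = {"PATH": "/usr/bin:/bin", "HOME": SANDBOX_ROOT}

def _apply_limits() -> None:
    # rlimits are inherited by every descendant of the child process.
    resource.setrlimit(resource.RLIMIT_CPU, (60, 60))            # CPU seconds
    resource.setrlimit(resource.RLIMIT_NPROC, (128, 128))        # cap process count
    resource.setrlimit(resource.RLIMIT_FSIZE, (64 << 20,) * 2)   # max file size

def run_in_sandbox(argv: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(
        argv,
        cwd=SANDBOX_ROOT,
        env=SAFE_ENV,                 # no inherited credentials or tokens
        preexec_fn=_apply_limits,     # applied in the child before exec
        capture_output=True,
        timeout=120,
    )
```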

That process-tree point is the important part. Coding agents do not execute one command and stop. They spawn shells, which spawn package managers, which spawn scripts, which spawn other tools. The sandbox has to follow the full chain. If a child process can escape the original restriction set, the model does not need a sophisticated exploit. It only needs one command path that leaves the boundary behind.

Network control is another hard layer. Environment-level controls, such as proxy variables or blocking stubs earlier in the path, are not enough on their own. Processes can ignore environment variables, bypass path lookups, or open sockets directly. Stronger designs move enforcement into the operating system, firewall, proxy, container, or VM layer rather than relying only on process behavior.

Docker Sandboxes take a different route. Instead of composing Windows primitives around a local process tree, Docker places the agent inside an isolated microVM. Each sandbox gets its own Docker daemon, filesystem, and network. This lets the agent install packages, build containers, and run development tasks without mounting the host Docker socket into the agent environment. The difference matters because a container with access to the host Docker socket can indirectly control host-level Docker resources. A microVM with its own daemon gives the agent a more complete development environment while keeping that control plane separate from the host.

Other coding-agent tools are applying known local primitives to the same problem, including macOS Seatbelt, Linux controls such as Landlock and seccomp, and Docker or Podman-based isolation. These implementations are useful signals that the same pattern is spreading: coding agents need a runtime boundary that travels with the commands they execute.

Across these approaches, the runtime has to answer six practical questions. What can the agent read? What can the agent write? What commands can it run? Do child processes inherit the same limits? Can it reach the network? Can it access credentials or host services? Those questions define the real authority of the agent.

Analysis: Why This Matters Now

The key change is that coding agents are becoming local execution systems. The security boundary can no longer sit only at the model interface. It has to exist where actions happen.

The most interesting engineering tradeoff is not whether to sandbox. It is where to place the boundary. A local OS sandbox preserves the developer experience. The agent can work in the real checkout, use local tools, and operate close to the normal development loop. The cost is platform-specific complexity. OpenAI's Windows design needed dedicated users, restricted


MCP Tool Calls Are Becoming a Security Signal for AI Agents

An AI agent is asked to prepare a customer renewal summary. To complete the task, it calls a file-reading tool, queries an account database, summarizes the result, and then invokes an email tool. Each tool call may look legitimate. The security question is whether the full sequence still looks legitimate once the calls, arguments, responses, identities, and data movement are viewed together. That is where MCP tool-call traffic becomes relevant.

Recent MCP security research is beginning to treat agent tool-call sessions as a monitorable attack surface. The central idea is straightforward: even when an agent's internal reasoning is opaque, its tool use leaves structured traces through tool names, arguments, responses, errors, identity context, and call order. A May 2026 paper titled "MCPShield: Content-Aware Attack Detection for LLM Agent Tool-Call Traffic" focuses directly on this layer. It frames MCP sessions as traffic that can be modeled, inspected, and classified for abnormal behavior, including data exfiltration, malicious tool use, recursive tool injection, and suspicious tool-call sequences.

Model Context Protocol, or MCP, gives AI agents a standard way to discover and call external tools. Under the MCP tools specification, servers expose tools with names, descriptions, input schemas, optional output schemas, and annotations. Clients can list available tools through tools/list and invoke a selected tool through tools/call. This creates a structured interaction layer between the agent and external systems. A tool call is not just a natural language instruction. It is a protocol event with a method, parameters, tool name, arguments, response content, and sometimes structured output or errors.

That matters for security because agent attacks often become visible at this boundary. A prompt injection may begin in text, a poisoned tool description may influence planning, or a compromised tool may return malicious output. But when the agent acts, those decisions often pass through tool calls. Multiple MCP security efforts are converging on the same broad concern from different angles: tool trust, tool poisoning, information flow, registry integrity, and runtime anomaly detection. The May 2026 MCPShield paper is the most directly relevant to this article because it treats MCP tool-call traffic itself as the detection object. The enterprise question that follows is operational: not only which tools exist, but which tools are invoked, under which identity, with what parameters, and through which chain of agent behavior.

How the Mechanism Works

At the protocol level, MCP tool-call traffic is represented through JSON-RPC messages. A tools/list request allows a client to discover available tools. A tools/call request invokes a specific tool by name and passes an arguments object. The response can include content blocks, structured content, and error indicators. This gives defenders several observable fields:

- Which tool was called
- What arguments were passed
- What response content came back
- Whether the tool returned an error
- What tool was called before and after
- Which agent or client initiated the call
- Which user, session, or authorization context was attached
- Whether data from one tool appears to influence later tool calls

The May 2026 MCPShield paper models this as a session-level detection problem. In that model, each tool call becomes a node in a graph. Edges capture call order and data-flow relationships between calls.
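
Here is a toy rendering of that session-level idea, reusing the read_file, query_db, send_email example from this article. It is not MCPShield's implementation; the taint rule is deliberately crude.

```python
# Each tool call is an ordered event; a simple taint check flags sessions
# where sensitive reads appear to feed an outbound send.
SENSITIVE_SOURCES = {"read_file", "query_db"}
OUTBOUND_SINKS = {"send_email"}

def flag_session(calls: list[dict]) -> list[str]:
    """`calls` is an ordered list of {'tool': str, 'args': dict} events."""
    findings, tainted = [], False
    for i, call in enumerate(calls):
        if call["tool"] in SENSITIVE_SOURCES:
            tainted = True
        if call["tool"] in OUTBOUND_SINKS and tainted:
            chain = " -> ".join(c["tool"] for c in calls[: i + 1])
            findings.append(f"possible exfiltration path: {chain}")
    return findings

print(flag_session([
    {"tool": "read_file", "args": {"path": "/finance/renewals.csv"}},
    {"tool": "query_db", "args": {"table": "accounts"}},
    {"tool": "send_email", "args": {"to": "external@example.com"}},
]))
```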

The system then uses content-aware features, including embeddings over serialized tool arguments and responses, to classify whether a session appears benign or malicious. The important technical shift is that detection is not based only on tool metadata. A metadata-only view might know that an agent called read_file, query_db, and send_email, but it may miss the meaning of the arguments and outputs. A content-aware view inspects the actual strings, fields, and returned values moving through the session.

For example, a single send_email call may be normal. A sequence where the agent first reads a sensitive file, then queries customer records, then sends the combined output externally is different. The detection surface is the relationship between calls, not just the presence of one dangerous tool.

A related attack path is recursive tool injection. A malicious or compromised tool response may not only return bad data, but also influence what the agent does next. In that case, the suspicious behavior may appear across the sequence: the returned content from one call shapes the arguments, destination, or tool choice in a later call.

Related work expands this picture. MCPTox focuses on tool poisoning through malicious metadata or descriptions. MindGuard proposes decision-dependence graphs based on model attention patterns to detect and attribute poisoned tool selection. The formal MCP security framework describes information flow tracking as a way to detect cross-server data exfiltration. These approaches differ, but they share a common concern: agent tool use produces observable patterns that can reveal misuse, compromise, or unsafe delegation.

Analysis

This matters now because MCP is turning agent tool use into a standardized operational layer. When agents rely on tools to read files, query systems, send messages, update records, or execute code, the tool boundary becomes one of the few places where security teams can observe concrete behavior. The model's internal reasoning may remain unavailable. The prompt that influenced the agent may be incomplete, hidden in retrieved content, or spread across multiple sources. The tool-call layer is more concrete. It shows what the agent attempted to do.

The natural interception point is the MCP client, gateway, proxy, or logging layer, depending on how the organization runs agent infrastructure. A gateway-style pattern is especially relevant because it can inspect tool calls before they reach MCP servers and inspect responses before they return to the model. That does not make the gateway a complete defense, but it gives teams a place to apply policy, preserve traces, and detect abnormal tool behavior.

Recent disclosures also show that this layer can be risky in both directions. Tool-call traffic can support monitoring, but raw tool-call logging can expose sensitive data if arguments contain credentials, secrets, customer data, or file paths. CVE-2026-42282, involving n8n-MCP, described sensitive MCP tools/call arguments being logged in HTTP mode before


When Governance Becomes a Data-Flow Problem

The Evidence Question

An enterprise can publish an AI policy, assign an oversight committee, and adopt a governance framework, yet still fail a simple operational question: where did the data go, who could access it, how long was it kept, and what evidence exists to prove those answers? That gap between what governance documents say and what systems can actually demonstrate is becoming the central problem in enterprise AI compliance. Across federal procurement, state regulation, standards development, and litigation, the same questions keep surfacing. And they all resolve to the same operational layer: data-flow mapping, retention boundaries, and access controls.

The Short Version

Between March and April 2026, GSA published a draft procurement clause with specific data ownership, segregation, and disclosure requirements for federal AI contractors. NIST launched a new AI risk management profile for critical infrastructure. The White House released a national AI policy framework recommending federal preemption of state laws. A federal court allowed AI hiring bias claims to proceed in Mobley v. Workday. The authorities are different, the mandates are different, but the operational question is the same: can the organization produce evidence of how AI data is handled?

What the GSA Clause Requires

The clearest source of operational specificity is GSA's draft GSAR 552.239-7001, published March 6, 2026. It applies to any GSA Schedule contract involving AI capabilities and reaches any contractor using AI tools in government contract performance.

The data ownership terms: Government Data, defined to include all inputs and outputs in the government context, belongs to the government. Contractors cannot use it to train, fine-tune, or improve models, or to inform business decisions. At contract end, all Government Data must be securely deleted, and the contractor must certify deletion in writing.

The processing evidence requirements: for systems using intermediary processing such as reasoning, retrieval, or agentic workflows, GSAR 552.239-7001 requires summarized intermediate processing actions and decision points, model routing decisions with accompanying rationale, and data retrieval methods with complete source attribution, including direct links and relevant excerpts from materials used in generation. That means governance is tied to reconstructing what data entered the system, what happened to it, and what sources contributed to the output.

The retention requirements: all relevant logs, forensic images, and incident artifacts must be preserved for a minimum of 90 calendar days after a security incident involving Government Data.

The access-control requirements: GSAR 552.239-7001 mandates "eyes-off" handling, restricting human review of Government Data except where strictly necessary. Any human access must be logged, justified, limited to the minimum necessary, and visible to the government. Government Data must be logically segregated from non-government customer data through access controls, policy enforcement points, labeling, and encryption.

The disclosure timelines are tight: 30 days to identify all AI systems used in performance, 7 days to report material changes affecting bias or safety guardrails, and 72 hours to report security incidents to CISA. OMB has declared compliance with the clause "material to contract eligibility and payment," language that could trigger False Claims Act liability.
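
As a sketch of what a processing-evidence record could look like in practice, consider the structure below. The field names are illustrative assumptions, not language from the clause or any specific product.

```python
# One evidence record per request: intermediate actions, routing rationale,
# and source attribution, serialized for retention and review.
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class ProcessingEvidence:
    request_id: str
    intermediate_actions: list[str]        # summarized steps and decision points
    model_route: str                       # which model handled the request
    routing_rationale: str                 # why it was routed there
    sources: list[dict] = field(default_factory=list)  # {"uri": ..., "excerpt": ...}
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = ProcessingEvidence(
    request_id="req-0042",
    intermediate_actions=["retrieved 3 documents", "reranked", "generated summary"],
    model_route="general-purpose-v2",
    routing_rationale="no restricted-data classification on inputs",
    sources=[{"uri": "https://intranet.example/policy/17", "excerpt": "..."}],
)
print(json.dumps(asdict(record), indent=2))
```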

GSAR 552.239-7001 is currently in draft (deferred from MAS Refresh 31 to Refresh 32 after industry pushback from BSA, the U.S. Chamber of Commerce, and multiple law firms), but the direction is established.

Where the Same Pattern Appears Elsewhere

NIST's April 7 concept note for a Trustworthy AI in Critical Infrastructure Profile extends governance requirements to operational technology. It covers all 16 critical infrastructure sectors and explicitly includes use cases such as AI-powered digital twins, autonomous robots with deterministic fail-safe controllers, and AI-enabled compliance monitoring. The profile will define trustworthiness requirements that operators must communicate across their supply chains, meaning governance evidence will need to flow beyond the organization into vendor and partner relationships.

In Mobley v. Workday, Judge Rita Lin's March 6 ruling allowed core age-discrimination claims against an AI hiring system to proceed under the ADEA. Baker Botts' analysis frames the implication: employers using AI-assisted screening should be prepared to explain what the system does, how it is configured, and what monitoring exists to detect disparate impact. The exact discovery expectations are not yet standardized, but the direction points toward operational evidence about data flows, not policy statements about fairness.

Why This Matters Now

The compliance timeline is compressing. Colorado's AI Act takes effect June 30, 2026. The EU AI Act's transparency and high-risk rules begin August 2, 2026. California's ADMT regulations take effect January 1, 2027. GSAR 552.239-7001, once finalized, will apply via mass modification with a 60-day acceptance window.

These regimes do not align cleanly. GSAR 552.239-7001 requires that AI systems "must not refuse to produce data outputs or conduct analyses based on the Contractor's or Service Provider's discretionary policies." The EU AI Act requires providers of high-risk systems to implement safeguards against harmful outputs. An organization operating under both faces a compliance conflict that policy language cannot resolve. It requires architectural workload segregation.

Federal preemption of state AI laws has been recommended by the White House but not legislated, which means enterprises must comply with state requirements that may later be overridden. That uncertainty makes data-flow controls more operationally valuable, not less. Mapping where AI data goes, enforcing retention boundaries, and producing access evidence are jurisdiction-neutral capabilities. An organization that builds these once can configure them to satisfy GSA requirements, Colorado's impact assessments, the EU AI Act's high-risk obligations, and future federal legislation with the same underlying infrastructure. The alternative, separate compliance programs per jurisdiction, does not scale.

What Remains Uncertain

GSAR 552.239-7001 is in draft and the final language may change after substantial industry feedback. But the operational requirements around data ownership, processing evidence, and access control reflect a direction that is unlikely to reverse. Whether federal preemption passes Congress is unknown. Colorado enforcement begins in two months. Organizations cannot wait for legislative clarity. The NIST CI Profile is a concept note, not a finished standard. Its use cases signal where governance is heading for operational technology, but specific control requirements


The Prompt Is No Longer the Unit of Design

The Architecture Question

During Google's Agent Bake-Off, a team let their AI agent calculate compound interest directly. The model hallucinated the math. Google describes what followed as "massive validation errors," and the root cause was not a bad prompt. The cause was that a probabilistic model was performing a task that required deterministic execution. The team that won the same challenge used the model to extract parameters and orchestrate the workflow, but routed every calculation to conventional code. The difference was not better prompting. It was a different system architecture.

That distinction, between what the model should reason about and what code should execute, is the engineering question now running through Google's and Anthropic's recent agent guidance. The answer is reshaping how production agent systems are designed.

The Short Version

Between January and April 2026, Google published a series of engineering guidance documents that reframe agent development as a systems architecture discipline. Google Research's multi-agent scaling study quantifies when coordination helps and when it hurts. The Agent Bake-Off distills five patterns from live competition. The Agent Development Kit provides eight canonical design patterns. And the Gemini CLI subagents feature turns agent topology into declarative configuration files. Anthropic's "Building Effective Agents" guidance reaches a similar conclusion from the opposite direction: start with the simplest architecture that works and add complexity only when the task requires it.

The convergent argument: production reliability comes from system decomposition, deterministic execution boundaries, and protocol-based integration, not from better prompts for a single monolithic agent.

What the Bake-Off Found and How the Mechanism Works

Google's Agent Bake-Off (April 14, 2026) distilled five engineering patterns from teams competing on production-style challenges. Google's framing is direct: prompting a single large agent to handle intent extraction, retrieval, and reasoning all at once is "a fast track to hallucinations and latency spikes." Google documents "instruction dilution" as the primary failure mode, where accumulated context degrades the model's ability to follow strict formatting or logic. The core mechanism is decomposition plus bounded coordination.

Decompose into specialist micro-agents. A supervisor handles intent and planning. Specialists handle bounded execution. Each specialist operates in its own context with its own tools and returns a consolidated result to the orchestrator. The orchestrator never sees the full execution trace, only the output. This keeps the primary context lean and prevents one specialist's intermediate work from degrading the next interaction.

Route precision tasks to deterministic code paths. The banking challenge failure in the scenario above is the pattern in miniature. The agent's role is to extract parameters and orchestrate. Conventional code or SQL performs the final computation. This applies to any task where exactness matters: financial calculations, data validation, schema enforcement, and unit conversions. It is a system boundary between probabilistic and deterministic execution.
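
A minimal sketch of that boundary, using the compound-interest case: the model's job is reduced to parameter extraction (stubbed here), and plain code performs the arithmetic.

```python
# Deterministic math: A = P * (1 + r/n)^(n*t). The model never computes it.
def compound_interest(principal: float, annual_rate: float,
                      periods_per_year: int, years: float) -> float:
    return principal * (1 + annual_rate / periods_per_year) ** (periods_per_year * years)

def extract_parameters(user_text: str) -> dict:
    # Stand-in for an LLM call that returns a validated schema.
    return {"principal": 10_000.0, "annual_rate": 0.05,
            "periods_per_year": 12, "years": 10.0}

def handle_request(user_text: str) -> float:
    params = extract_parameters(user_text)   # model's job: parse intent into fields
    return compound_interest(**params)       # code's job: exact arithmetic

print(round(handle_request("10k at 5% monthly compounding for 10 years"), 2))
# 16470.09
```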

Integrate open protocols over custom glue. Google explicitly recommends MCP for tool integration and A2A for agent-to-agent coordination rather than bespoke wrappers for every integration.

Treat multimodality as a native architectural feature. Teams that bolted image processing onto text-only architectures produced worse results than those that integrated multimodal models as a core design element.

Test against real-world failure modes. Move beyond demo-quality evaluation to adversarial inputs and failure recovery.

Google's ADK translates these principles into eight reusable design patterns: sequential pipeline, coordinator/dispatcher, parallel fan-out, evaluator-optimizer loop, group chat, hierarchical delegation, custom orchestration, and human-in-the-loop. Each maps to a coordination topology rather than a prompt structure.

Gemini CLI's subagents feature (April 15, 2026) makes these patterns configurable through declarative files. Each subagent is defined as a Markdown file with YAML frontmatter specifying name, description, tools, model, temperature, max_turns, and timeout. Tool access is scoped per subagent. Different subagents can connect to different MCP servers without sharing state. The specialist is no longer a part of a larger prompt. It is a deployable, versionable artifact that can be code-reviewed, committed to a repository, and shared across teams.

What the Scaling Study Quantified

Google Research's "Towards a Science of Scaling Agent Systems" (December 2025) tested the decomposition argument empirically. The study evaluated 180 configurations across five architectures, three LLM families, and four benchmarks, with standardized tools and token budgets to isolate architectural effects.

Error amplification is topology-dependent. Independent agents operating without validation amplified errors up to 17.2x. Centralized coordination, where an orchestrator validates outputs before passing them along, contained amplification to 4.4x.

Benefits are task-contingent. Centralized coordination improved performance by 80.9% on parallelizable tasks like financial data aggregation. On sequential reasoning tasks, every multi-agent variant degraded performance by 39 to 70%. The agents spent their token budget on coordination overhead rather than problem-solving.

Capability saturation sets a ceiling. Adding coordination overhead produces negative returns when a single agent already performs above approximately 45% on a task.

Google Research also built a predictive model using task properties (sequential dependencies, tool density, decomposability) that identifies the optimal architecture for 87% of unseen configurations. Architecture selection can be a principled engineering decision based on task analysis, not a guess.

Anthropic's "Building Effective Agents" guidance reinforces the caution embedded in these findings. Anthropic distinguishes workflows (LLMs orchestrated through predefined code paths) from agents (LLMs dynamically directing their own process) and recommends starting with workflows wherever possible. Anthropic explicitly warns against framework abstraction: "Incorrect assumptions about what's under the hood are a common source of customer error." The recommendation is to increase complexity only when the task demonstrably requires it.
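
A toy decision rule that echoes those findings: the thresholds come from the numbers reported above, but the function is illustrative, not the paper's predictive model.

```python
# Coordination helps parallelizable tasks, hurts sequential reasoning, and
# adds nothing once a single agent is already strong.
def choose_architecture(parallelizable: bool,
                        sequential_reasoning: bool,
                        single_agent_score: float) -> str:
    if single_agent_score >= 0.45:
        return "single agent (capability saturation: coordination is overhead)"
    if sequential_reasoning:
        return "single agent (multi-agent variants degraded these tasks 39-70%)"
    if parallelizable:
        return "centralized coordination (orchestrator validates specialist output)"
    return "start simple; add structure only if the task demonstrably needs it"

print(choose_architecture(parallelizable=True,
                          sequential_reasoning=False,
                          single_agent_score=0.30))
```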

Why This Matters for Engineering Teams

The shift changes what skills agent engineering requires. Prompt engineering remains relevant for individual agent behavior, but the higher-order decisions are now systems decisions: decomposition strategy, coordination topology, deterministic/probabilistic boundaries, tool-access scoping, and protocol integration.

The deterministic/probabilistic boundary is the most underappreciated part of this shift. Google's Bake-Off results make clear that allowing a model to perform calculations, validation, or data lookups that could be handled by code is an engineering failure, not a prompt failure. Identifying which parts of a workflow should be deterministic and routing them to code paths is a systems design


When Subagents Turn Agent Design Into an Operating Model Decision

The Configuration Question

A team starts with one coding agent and one long prompt. It works well enough for simple tasks, but the session grows, tool calls pile up, and each new request carries the weight of everything that came before. Then the team splits the work. One agent investigates the codebase, another handles repetitive edits, a third runs a narrow review. The question stops being what prompt to write and becomes how many agents to run, what each is allowed to do, and how their work is coordinated. As of April 2026, that question has a product-level answer.

The Short Version

On April 15, 2026, Google introduced subagents in Gemini CLI (v0.38.1). Google describes them as specialized agents that operate alongside the primary session with their own context windows, system instructions, tools, and MCP server access, then return a consolidated result to the main agent.

The update changes the agent structure from an implementation detail into a configurable operating choice. Once work is split across isolated specialists, teams are no longer managing a single model session. They are managing delegation, coordination, tool boundaries, and concurrency.

What Led Here

Google's subagents release followed the engineering guidance it had published one day earlier. In its Agent Bake-Off post, Google argued that production-ready agents should move away from one large agent handling intent extraction, retrieval, and reasoning all at once, and instead decompose work into specialized subagents managed by a supervisor. Google framed the pattern as a way to reduce hallucinations, lower latency, and make systems easier to maintain. The Gemini CLI update operationalized that advice in a shipping product.

Under the Hood

A subagent in Gemini CLI is exposed to the main agent as a tool. When the main agent calls it, the task is delegated. The subagent runs in its own context loop and returns a single consolidated response. The intermediate steps, potentially dozens of tool calls, file reads, or test runs, never enter the main agent's context.

This is the core isolation model. Each subagent gets its own context window, system prompt, and conversation history. The orchestrator sees results, not execution traces. That keeps the main session lean and prevents intermediate output from one task from degrading the next.

Tool access is scoped through YAML frontmatter in the Markdown definition file. Subagents can receive a restricted tool list, wildcard patterns (mcp_* for all MCP tools, mcp_server_* for a specific server), or inline MCP servers isolated to that agent. If tools are not specified, the subagent inherits everything from the parent session. Tool isolation is opt-in, not default.

Custom instructions live in the Markdown body, which becomes the subagent's system prompt. Configuration fields include name, description, tools, model, temperature, max_turns (default 30), and timeout_mins (default 10). Definitions can be committed to a repository at the project level or stored globally at the user level. Each subagent becomes a versionable, shareable specialist role.

Delegation happens automatically (the main agent routes based on the subagent's description) or explicitly (via @agent_name syntax). Subagents cannot call other subagents, which prevents recursion. Remote subagents communicate through the Agent-to-Agent protocol, meaning a specialist can run on another machine or in another environment.
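
Putting those fields together, a definition file could look roughly like the sketch below. The field names and defaults follow the description above; the agent name, tool identifiers, model string, and exact syntax are assumptions, so treat this as illustrative rather than Google's documented example.

```markdown
---
name: code-reviewer
description: Reviews changed files for correctness and style; read-only.
tools: ["read_file", "mcp_lint_*"]   # scoped; omitting this field inherits every parent tool
model: gemini-2.5-pro
temperature: 0.2
max_turns: 30          # documented default
timeout_mins: 10       # documented default
---
You are a review specialist. Examine only the files you are given, report
correctness and style findings, and return one consolidated summary. Do not
edit files or call tools outside your list.
```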

Parallel execution is supported, and Google explicitly warns that parallel subagents performing heavy code edits "can lead to conflicts and agents overwriting one another" and that parallel execution "will lead to usage limits being hit faster." The GitHub issue tracker for the feature states that v1 "does not solve more complex concerns like agents having conflicts."

On the security side, Gemini CLI v0.36.0 introduced native macOS Seatbelt and Windows sandboxing for subagent security. Six built-in Seatbelt profiles control write access, network access, and read scope at different restriction levels. Different subagents within the same session can operate under different security profiles. JIT context injection delivers context dynamically at invocation rather than carrying it as static state.

Why This Matters Now

The significance is not that multi-agent patterns exist as a concept. What changed is that Google moved the pattern into a shipping product with explicit configuration, scoped tools, isolated context, parallel execution, and documented operational warnings.

That changes the practical unit of deployment. A team adopting subagents is no longer tuning one assistant. It is defining a topology of roles, permissions, and execution paths. Which tasks deserve a separate agent? What tool access should each have? When is parallelism worth the coordination overhead? What should the orchestrator retain versus summarize? These are design decisions, and they now have a concrete configuration surface.

Google's Bake-Off guidance frames the motivation directly: prompting a single large agent to handle everything at once is "a fast track to hallucinations and latency spikes." Decomposition into specialists with deterministic execution where needed is the engineering response. The subagents feature is the product implementation of that argument.

What This Changes for Operations

Agent topology becomes something teams must actively govern. Permissions are no longer global. Each subagent can have its own tool access, MCP connections, and security profile. That is a real improvement over a single agent with access to everything, but only if isolation is explicitly configured. Omitting the tools field from a subagent definition causes it to inherit the parent's full tool set. The secure path requires deliberate configuration.

Cost visibility is partially addressed. Gemini CLI's /stats command now distinguishes requests by role (main agent, subagent, utility). Per-subagent bounds (maxTurns, maxExecutionTime) provide individual limits. But there is no aggregate cost ceiling across all subagents in a session. Parallel execution multiplies token consumption without a documented mechanism to cap total spend.

Observability is the most notable gap. The orchestrator receives summaries, not traces. The full execution history of a subagent's work lives inside that subagent's context loop, not in the main session. Gemini CLI does not ship a dedicated observability framework for subagent execution chains. For teams running multiple subagents in parallel, understanding what happened across the full delegation requires


When the Model Writes the Exploit

The Timing Problem

OpenBSD is one of the most security-hardened operating systems in the world. Its TCP stack has been reviewed by experienced security engineers, tested by fuzzers, and audited repeatedly over decades. For 27 years, a vulnerability in its Selective Acknowledgement implementation went undetected through all of it.

An AI model found it. It then identified a second bug in the same code path, determined how to chain the two through a signed integer overflow on 32-bit sequence numbers, and produced a proof-of-concept that remotely crashes any OpenBSD machine responding over TCP. The campaign cost under $20,000. No human guided the process after the initial prompt.

That is one of three fully disclosed results from Anthropic's Claude Mythos Preview, a frontier model Anthropic chose not to release publicly. Instead, the company built a restricted defensive consortium, gave access to roughly fifty organizations, and committed $100 million in usage credits. Anthropic considers the model capable enough to deploy for defense and risky enough to withhold from broad availability.

The Short Version

This is not a story about AI helping with security research. AI has been doing that for some time. On April 7, 2026, Anthropic announced a model that can carry out substantial parts of the vulnerability lifecycle, from discovery through exploitation, with limited human involvement. That compresses the timeline between finding a flaw and having a working attack, and it puts pressure on enterprise processes that were designed around the assumption that exploitation takes longer than discovery.

Anthropic paired the announcement with Project Glasswing, a controlled defensive program with partners including AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, and Palo Alto Networks, plus roughly forty organizations that maintain critical infrastructure software. Post-credit pricing is $25/$125 per million input/output tokens. Logan Graham, Anthropic's head of offensive cyber research, told NBC News that comparable capabilities could be broadly distributed within six to twelve months, including from non-U.S. companies. Reuters reported concern from banking-sector experts about implications for legacy-heavy financial environments. The U.S. Treasury Secretary convened a meeting with systemically important banks, treating AI-driven cyber risk as a systemic stability concern. On April 14, SANS, CSA, OWASP, and [un]prompted jointly released an emergency briefing arguing that the discovery-to-exploit timeline has compressed from weeks to hours.

Under the Hood

The scaffold Anthropic describes is straightforward. A container runs in isolation with the target project and source code. Mythos Preview receives a one-paragraph prompt asking it to find a security vulnerability, then operates in a loop: reading code, forming hypotheses, running the software to test them, adding debug instrumentation as needed, and repeating until it produces a bug report with a proof-of-concept or concludes there is nothing to find. Anthropic ran many agents in parallel, each on a different file, pre-ranked by the model based on the likely attack surface. A validation agent filtered findings for severity, discarding bugs that were technically real but operationally trivial.

Three vulnerabilities have been disclosed in full because the fixes have shipped.

The OpenBSD SACK bug (27 years old). TCP sequence numbers are 32-bit integers compared using (int)(a - b) < 0, which is correct when values are within 2^31 of each other. Nothing in the code prevented an attacker from placing a SACK block start roughly 2^31 away from the real window. At that distance, the subtraction overflows the sign bit in both comparisons simultaneously, and the kernel concludes the attacker's start is both below the hole and above the highest acknowledged byte at the same time. The kernel deletes the only SACK hole list entry, writes through the resulting null pointer, and crashes. Remote denial of service, no authentication required.
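
The wraparound is easy to reproduce. Here is a minimal Python illustration of the two comparisons flipping at once; the sequence values are made up, and the helper simulates C's signed 32-bit arithmetic.

```python
# Simulate the C comparison (int)(a - b) < 0 under 32-bit two's complement.
def lt32(a: int, b: int) -> bool:
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000   # sign bit set after wrap

hole_start = 1_000                                 # left edge of the SACK hole
rcv_high   = 2_000                                 # highest acknowledged byte
attacker   = (hole_start + 2**31) & 0xFFFFFFFF     # SACK start ~2^31 away

# Both checks succeed at once, which the kernel code assumed was impossible:
print(lt32(attacker, hole_start))   # True -> treated as below the hole
print(lt32(rcv_high, attacker))     # True -> treated as above the highest ACKed byte
```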

The FFmpeg H.264 bug (16 years old). A slice ownership table uses 16-bit entries, while the slice counter is 32-bit. Initialization via memset(…, -1, …) fills every entry with 65,535 as a sentinel. A frame crafted with 65,536 slices causes slice 65,535 to collide with the sentinel. The decoder treats a nonexistent neighbor as belonging to the current slice, writes out of bounds, and crashes. Introduced in 2003, made exploitable in a 2010 refactor, and missed by five million fuzzer runs on the relevant code path.

The FreeBSD NFS bug (CVE-2026-4747, 17 years old). The RPCSEC_GSS authentication handler copies packet data into a 128-byte stack buffer with a length check that allows up to 400 bytes, leaving 304 bytes of overflow. The compiler skips the stack canary because the buffer is int32_t[32] rather than a character array. FreeBSD does not randomize the kernel load address. The remaining obstacle, a 16-byte GSS handle match, is bypassed through an unauthenticated NFSv4 EXCHANGE_ID call that returns the host UUID and boot time. Anthropic says the model assembled a twenty-gadget ROP chain across multiple packets without human involvement, delivering full root access to an unauthenticated remote attacker.

Anthropic claims thousands more findings across every major OS and browser, including privilege escalation, JIT heap sprays, KASLR bypasses, and authentication bypasses. Fewer than one percent are patched. SHA-3 hashes of undisclosed findings serve as accountability commitments, with details to follow within 135 days of maintainer notification. In 198 manually reviewed reports, security contractors agreed with the model's severity assessment 89 percent of the time. On a Firefox JavaScript engine exploit task, Anthropic says Opus 4.6 produced two working exploits from several hundred attempts while Mythos Preview produced 181 and achieved register control in 29 more. On an internal OSS-Fuzz evaluation across 7,000 entry points, Opus 4.6 managed one tier-3 crash; Mythos Preview achieved ten full control-flow hijacks on fully patched targets.

What the Outside Record Says

Anthropic's claims do not stand alone, but they are not fully corroborated. AISLE, an independent security firm with over 180 externally validated CVEs across more than thirty projects, isolated the vulnerable code from Anthropic's showcased findings and ran it through eight smaller, cheaper models in single zero-shot calls with no scaffold or tooling. Every model detected the FreeBSD overflow, including a 3.6-billion-parameter model at $0.11 per million


The Enterprise AI Brief | Issue 8

Inside This Issue

The Threat Room
When the Model Writes the Exploit
Anthropic says its unreleased Mythos Preview model found and exploited high-severity vulnerabilities across every major operating system and browser, then chose to restrict access rather than release it. Independent researchers reproduced much of the discovery work using models costing a fraction of a cent per thousand tokens. The article examines what that split between cheap discovery and frontier exploitation means for enterprise patching programs that were built around a slower cycle.
→ Read the full article

The Operations Room
When Subagents Turn Agent Design Into an Operating Model Decision
Google's Gemini CLI now lets agents delegate work to specialist subagents, each with its own context, tools, and security profile. The feature looks like a developer convenience. In practice, it turns agent architecture into an operating model decision, with new questions about permissions, cost, parallel conflict, and observability that most teams have not had to answer before.
→ Read the full article

The Engineering Room
The Prompt Is No Longer the Unit of Design
Google's Agent Bake-Off found that teams relying on carefully crafted single-agent prompts consistently lost to teams that decomposed work across specialists with scoped tools and deterministic code paths. A companion study of 180 configurations quantifies the tradeoffs: 80.9% improvement on parallel tasks, 39-70% degradation on sequential reasoning, and 17.2x error amplification without orchestrator validation. The article maps what changes when agent engineering becomes a systems design discipline.
→ Read the full article

The Governance Room
When Governance Becomes a Data-Flow Problem
GSA's draft AI procurement clause spells out what governance evidence actually looks like: processing logs with routing rationale, source attribution with direct links, 90-day incident preservation, eyes-off access restrictions, logical data segregation, and written deletion certification. The article maps how those requirements connect to NIST's new critical infrastructure profile, state AI laws, and a federal hiring-bias ruling that are all converging on the same operational layer.
→ Read the full article


California’s 2026 AI Laws: When a Documentation Gap Becomes a Reportable Incident

Key Takeaways

- Effective January 1, 2026, frontier AI developers face enforceable safety, transparency, and cybersecurity obligations under California law.
- Cybersecurity control failures can trigger critical safety incident reporting with 15-day deadlines.
- Enterprises buying from frontier AI vendors should expect new due diligence, contract clauses, and attestation requirements.

A foundation model is deployed with new fine-tuning. The model behaves as expected. Weeks later, an internal researcher flags that access controls around unreleased model weights are weaker than documented. Under California's 2026 AI regime, that gap is no longer a quiet fix. If it results in unauthorized access, exfiltration, or other defined incident conditions, it becomes a critical safety incident with a 15-day reporting deadline, civil penalties, and audit trails.

Beginning January 1, 2026, California's Transparency in Frontier Artificial Intelligence Act and companion statutes shift AI governance from voluntary principles to enforceable operational requirements. The laws apply to a narrow group: frontier AI developers whose training compute exceeds 10^26 floating-point or integer operations, with additional obligations for developers that meet the statute's "large frontier developer" criteria, including revenue thresholds.

Who This Applies To

This framework primarily affects large frontier developers and has limited immediate scope. However, it sets expectations that downstream enterprises will likely mirror in vendor governance and procurement requirements.

For covered developers, internal-use testing and monitoring are no longer technical hygiene. They are regulated, evidence-producing activities. Failures in cybersecurity controls and model weight security can trigger incident reporting and penalties even when no malicious intent exists.

What Developers Must Produce

The law requires documented artifacts tied to deployment and subject to enforcement.

Safety and security protocol. A public document describing how the developer identifies dangerous capabilities, assesses risk thresholds, evaluates mitigations, and secures unreleased model weights. It must include criteria for determining substantial modifications and when new assessments are triggered.

Transparency reports. Published before or at deployment. Large frontier developers must include catastrophic risk assessments, third-party evaluations, and compliance descriptions.

Frontier AI Framework. Required for large frontier developers. Documents governance structures, lifecycle risk management, and alignment with recognized standards. Updated annually or within 30 days of material changes.

What Triggers Reporting

The law defines catastrophic risk using explicit harm thresholds: large-scale loss of life or property damage exceeding one billion dollars. The statute separately enumerates the categories of critical safety incidents. Most critical safety incidents must be reported to the Attorney General within 15 days. Events posing imminent risk of death or serious injury require disclosure within 24 hours.

Why the Coupling of Safety and Cybersecurity Matters

California's framework treats model weight security, internal access governance, and shutdown capabilities as safety-bound controls. These are not merely infrastructure concerns. They are controls explicitly tied to statutory safety obligations, and failures carry compliance consequences.

Access logging, segregation of duties, insider threat controls, and exfiltration prevention are directly linked to statutory risk definitions. A control weakness that would previously have been an IT finding can now constitute a compliance-triggering event if it leads to unauthorized access or other defined incidents.

Internal use is explicitly covered and subject to audit. Testing, monitoring, and reporting obligations apply to dangerous capabilities that arise from employee use, not just public deployment. This means internal experimentation with frontier models produces compliance artifacts, not just research notes.

Developers must document procedures for incident monitoring and for promptly shutting down copies of models they own and control.

Operational Changes for Covered Developers

Documentation becomes operational. Safety protocols and frameworks must stay aligned with real system behavior. Gaps between documentation and practice can become violations.

Incident response expands. Processes must account for regulatory reporting timelines alongside technical containment.

Whistleblower infrastructure is required. Anonymous reporting systems and defined response processes create new coordination requirements across legal, security, and engineering teams.

Model lifecycle tracking gains compliance consequences. Fine-tuning, retraining, and capability expansion may constitute substantial modifications triggering new assessments. How regulators will interpret frequent, routine changes remains unclear.

Starting in 2030, large frontier developers must undergo annual independent third-party audits.

Downstream Implications for Enterprise Buyers

Most enterprises will not meet the compute thresholds that trigger direct coverage. But the framework will shape how they evaluate and contract with AI vendors.

Vendor due diligence expands. Procurement and security teams will need to assess whether vendors are subject to California's requirements and whether their published safety protocols and transparency reports are current. Gaps in vendor documentation become risk factors in sourcing decisions.

Contractual flow-down becomes standard. Enterprises will likely require vendors to represent compliance with applicable safety and transparency obligations, notify buyers of critical safety incidents, and provide audit summaries or attestations. These clauses mirror patterns established under GDPR and SOC 2 regimes. Example language: "Vendor shall notify Buyer within 48 hours of any critical safety incident as defined under California Business and Professions Code Chapter 25.1, and shall provide Buyer with copies of all transparency reports and audit summaries upon request."

Internal governance benchmarks shift. Even where not legally required, enterprises may adopt elements of California's framework as internal policy: documented safety protocols for high-risk AI use cases, defined thresholds for escalation, and audit trails for model deployment decisions. The framework provides a reference architecture for AI governance that extends beyond its direct scope.

Security, legal, and procurement teams should expect vendor questionnaires, contract templates, and risk assessment frameworks to incorporate California's definitions and reporting categories within the next 12 to 18 months.

Open Questions

Substantial modification thresholds. The protocol must define criteria, but how regulators will interpret frequent fine-tuning or capability expansions is not yet established.

Extraterritorial application. The law does not limit applicability to entities physically located in California. Global providers may need to treat California requirements as a baseline.

Enforcement priorities. The Attorney General is tasked with oversight, but application patterns across different developer profiles are not yet established.

Regime alignment. The European Union's AI Act defines harm and risk using different metrics, creating potential duplication in compliance strategies.

Further Reading

- California Business and Professions Code Chapter 25.1 (SB 53)
- Governor of California AI legislation announcements
- White and Case analysis of California frontier AI laws
- Sheppard Mullin overview of