Zero Trust for AI Agents

By Puneet Ghanshani, Chief Architect, Microsoft

Autonomous AI agents have moved from science fiction to real-world business workflows. Unlike basic automation or standalone generative models, agentic AI systems perceive their environment, plan multi-step tasks, and act across digital tools with minimal human intervention.

In finance or healthcare settings, for example, an agentic system might autonomously monitor market movements or analyse medical scans, then execute trades or flag critical test results—all under your architectural guardrails.

To unlock their full potential, you need a secure, scalable integration layer that safely connects intelligent agents to enterprise systems.

Integration Architecture

The integration layer typically follows a client‑server design, where agent clients request services and backend servers provide them from databases, data lakes, or business applications. As organisations add more connections, the attack surface grows. Each new connection (or endpoint) increases potential vulnerabilities such as injection or credential theft

Secure deployment isn’t just an IT concern—it’s a board‑level issue that shapes how confidently organisations can adopt AI‑led automation.

Let’s understand the threats and attack vectors and mitigations for integration layer for Agentic AI.

Key Threats and Attack Vectors

Prompt Injection and Adversarial Inputs: Because AI agents take natural language instructions, malicious actors may attempt to manipulate them with crafted inputs (so-called prompt injection or jailbreaking attacks). An attacker could try to trick a customer service agent into revealing confidential info or executing an unintended action by including hidden commands in user input. Prompt injection remains the top LLM security risk per OWASP’s 2025 LLM Top 10 report and as agentic systems face broader injection surface (like web pages, or inter-agent messages), addressing this threat is even more important.

Unintended Agent Behaviour: By design, agentic AI has some autonomy, which means it might at times take actions that developers didn’t fully anticipate. This could be benign (creative problem-solving), or it could lead to errors. For instance, an AI operations agent might decide to restart a server to fix an issue, but if not properly constrained, that could disrupt other services; or an AI agent can place incorrect trade orders due to flawed reward functions.

Overprivileged or Compromised Agent: If an AI agent (a client) is given excessive permissions or if its credentials are compromised, it becomes a powerful tool for attackers. A malicious insider or external hacker who hijacks an agent’s identity could potentially access vast data stores or execute automated actions rapidly.

Model Tampering and Data Poisoning: As organisations develop custom AI agents (potentially fine-tuning models on proprietary data), protecting those models, and model training pipelines from tampering is essential. An attacker who can manipulate the model (for instance, poisoning the training data or altering model parameters) might induce behaviour changes. OWASP’s GenAI risks list also ranks data poisoning as a top integrity threat for LLM lifecycles (pre-training, fine-tuning, embedding).

Preventive and Detect Measures

Preventive Measures:

  • Filter incoming prompts through a Prompt Shield that strips or neutralizes suspicious payloads before they reach the model. This will reduce the attack surface for prompt injection and adversarial inputs
  • Leverage agent framework (e.g. Semantic Kernel integration) to declare and enforce task-specific constraints and guardrails, ensuring agents only execute approved operations.
  • Model each agent’s capabilities as explicit bounded contexts. Enforce these boundaries via built-in guardrails and require human approval for any high-impact or out-of-scope actions until operational confidence is proven.
  • Within the identity and access layer, assign each agent only the minimal roles and privileges needed for its domain, preventing over-privileged operations.
  • Authenticate every component and agent using OAuth 2.1 tokens and mutual TLS, tying authorization to centralized IAM policies for fine-grained access control.
  • Validate model binaries with checksums or hashes before deployment to avoid model tampering

Detective Measures:

  • Hook the agent’s telemetry into your SIEM, like Microsoft Defender, and anomaly-detection systems. Configure alerts on any spike of out-of-pattern agent requests or unexpected context switches.
  • Within the data-ingestion layer, continuously audit and validate external data sources for integrity and provenance.
  • Embed emergency stop controls like fail-safes and kill-switches so administrators can instantly isolate or shut down any misbehaving agent.

Governance

Standardising secure AI integration requires clear organisational processes.

Organisations must:

  • Form AI risk committees and designate security champions
  • Define clear policies for agent provisioning, testing, and lifecycle management
  • Integrate AI oversight into existing GRC workflows so every new agent passes through due diligence

Planning and practicing these response steps will make the organisation more resilient.

Conclusion

Autonomy without control is a recipe for chaos. By baking in strong authentication, least-privilege authorization, robust monitoring, and clear governance, we can turn AI agents from wildcards into trusted collaborators —positioning your organisation at the forefront of secure, and intelligent automation.