cloud knowledge is becoming agent infrastructure

AWS launched Agent Toolkit for AWS this month, and the interesting part is not that another cloud vendor has an AI thing. Of course they do. “We have agents” is the new “we have Kubernetes.”

The interesting part is what AWS chose to package: skills, a managed MCP server, plugins, documentation search, IAM guardrails, CloudWatch metrics, CloudTrail audit logs, and sandboxed execution.

That is not just a developer convenience.

That is cloud knowledge becoming infrastructure.

And honestly, it makes sense. If you have ever watched a general coding agent try to build something non-trivial on AWS, you know the feeling. It gets 70% of the shape right, then confidently invents the wrong property name or picks a weird IAM policy.

Useful? Yes.

general agents are bad at local truth

Cloud work is full of local truth: one ecosystem, one account, one region, one service release, one policy boundary, one organization’s weird deployment history.

A foundation model can know a lot about AWS in the abstract. It can explain Lambda, ECS, IAM, VPCs, CloudFormation, Step Functions, Bedrock, and the usual suspects. That is useful background knowledge.

But real cloud work is not a trivia contest.

Real cloud work asks questions like:

  • Is this service available in this region?
  • Which IAM action is actually required for this operation?
  • What changed in this API last month?
  • Does this architecture follow current Well-Architected guidance?
  • Which logs prove the agent made this call?
  • Can we restrict the agent to read-only actions even if the human has broader credentials?
  • Can it run a multi-step diagnostic without touching the developer’s filesystem?

That is where general model knowledge gets thin.

The agent needs current docs, tested procedures, service-specific workflows, and policy-aware execution. It needs the boring stuff, which is where the leverage usually lives.

skills are runbooks for agents

AWS calls one part of the toolkit “Agent Skills.” These are curated packages of instructions, scripts, and reference material that help agents complete AWS tasks.

I like this framing more than I expected.

For humans, we have runbooks, golden paths, architecture decision records, service templates, onboarding docs, and those half-maintained wiki pages everyone swears they will clean up after the launch. Agents need the same thing, just in a shape they can load and follow.

The important shift is that cloud expertise is no longer only documentation a human reads. It becomes executable guidance an agent can discover at runtime.

That matters because context is a budget. Every token spent rediscovering how to configure a data pipeline, write a CloudFormation template, or troubleshoot a deployment is a token not spent understanding the actual application.

This is basically platform engineering again.

Golden paths for humans became skills for agents.

the managed mcp server is the real signal

The AWS MCP Server part is even more interesting.

MCP has been discussed a lot as a way to connect agents to tools. That is true, but incomplete. Once a tool connection can touch cloud resources, MCP becomes an operational control surface.

AWS is not only saying “your agent can call AWS APIs.” It is saying the call path can have IAM controls, CloudWatch metrics, CloudTrail logs, current documentation access, and sandboxed execution.

That is the grown-up version of agent tooling. The scary version is easy to imagine: a coding agent running AWS CLI commands through a local terminal, using the same broad credentials as the developer, with no clean way to distinguish agent actions from human actions later.

That might be fine for a toy account. It is not fine for serious engineering.

If agents are going to operate in cloud environments, we need to answer the same questions we ask for any other actor in the system:

  • What identity does it use?
  • What can it read?
  • What can it change?
  • Which actions require approval?
  • Where are the logs?
  • How do we tell agent actions apart from human actions?
  • What happens when it is wrong?

The moment an agent can mutate infrastructure, it stops being a fancy autocomplete and starts being an actor in your control plane.

cloud providers are productizing context

Cloud providers used to compete mostly on services, regions, pricing, managed abstractions, and ecosystem. They still do. But agent-era cloud providers will also compete on how well they package context for automation.

Not just “can the model write Terraform?”

More like:

  • Can the agent understand current service capabilities?
  • Can it choose the right service for a workload?
  • Can it follow a tested migration path?
  • Can it diagnose failures using the right telemetry?
  • Can it stay inside permission boundaries?
  • Can the organization audit what it did?
  • Can the platform team curate which capabilities are available?

That is a different product surface.

The cloud console was built for humans. SDKs and CLIs were built for developers and automation. Managed MCP servers, agent skills, and plugins are being built for agents. Same cloud. New consumer.

This is why “cloud knowledge is becoming agent infrastructure” is the right mental model. Docs, best practices, troubleshooting flows, API references, and policy hooks are becoming part of the runtime environment that determines whether an agent is useful or dangerous.

this will become a platform team responsibility

The naive version of agent adoption is every engineer installing whatever plugin helps them move faster. That will happen first.

Then someone will ask a very reasonable question: which agents in this company can access production cloud accounts?

That is when the platform team gets involved.

The platform answer probably cannot be “no agents ever.” The useful answer is more likely:

  • approved agent plugins
  • approved MCP servers
  • separate read-only and write-capable profiles
  • IAM policies that distinguish agent-initiated actions
  • audit trails that show the agent’s work
  • sandboxed execution for multi-step scripts
  • explicit approval gates for infrastructure mutation
  • team-owned skills for internal platforms

In other words, the same governance pattern we already use elsewhere.
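The list above can be sketched as a small decision function. This is a rough illustration under stated assumptions, not how any AWS product works: the server names, the `Request` shape, and the `decide` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical names for MCP servers the platform team has approved.
APPROVED_SERVERS = frozenset({"aws-managed-mcp", "internal-platform-mcp"})


@dataclass(frozen=True)
class Request:
    server: str    # which MCP server the agent wants to call through
    profile: str   # "read-only" or "write" (hypothetical profile names)
    mutates: bool  # whether the call would change infrastructure


def decide(req: Request) -> Decision:
    """Apply the governance rules in order: server, then profile, then approval."""
    if req.server not in APPROVED_SERVERS:
        return Decision.DENY
    if not req.mutates:
        return Decision.ALLOW
    if req.profile != "write":
        # A read-only profile never gets to mutate, approved server or not.
        return Decision.DENY
    # Write-capable profile plus mutation: route to a human approval gate.
    return Decision.NEEDS_APPROVAL
```

The useful property is that mutation never returns a plain `ALLOW`: the best a write-capable profile can get is `NEEDS_APPROVAL`.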

We do not let every service invent its own deployment system forever. Agent access to cloud resources will go through the same maturation curve, probably faster because the blast radius is obvious.

your company’s knowledge matters too

Vendor-maintained cloud knowledge is only half the story. AWS can teach an agent how AWS works. It cannot teach the agent why your company uses one account structure, why that IAM role exists, or why nobody touches the legacy batch job during month-end close.

That knowledge belongs to the organization.

So the next step is obvious: companies will build their own skills and catalogs around internal platforms. “Deploy a service” will not mean a generic ECS tutorial. It will mean your organization’s deployment path, naming conventions, security constraints, rollback rules, and approval model.

The agent should not improvise your operating model. It should inherit it.

That is where the platform team can create real leverage: turning the organization’s best operational knowledge into agent-readable infrastructure.

what i would do first

If I were responsible for cloud platform engineering today, I would start small.

First, define a read-only agent path. Let agents inspect documentation, deployment metadata, metrics, audit events, and stack status before allowing mutation.
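A minimal sketch of what a read-only path could look like as a local guard, assuming the agent's planned calls are expressed as IAM-style action names. The allowlist and the helper names are my own illustration, not part of the toolkit, and real enforcement would still belong in IAM rather than in client code.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist for a read-only agent profile. The service prefixes
# are real IAM namespaces; which patterns to allow is an assumption here.
READ_ONLY_PATTERNS = (
    "cloudformation:Describe*",
    "cloudformation:List*",
    "cloudwatch:Get*",
    "cloudwatch:List*",
    "cloudtrail:LookupEvents",
    "logs:FilterLogEvents",
)


def is_read_only(action: str) -> bool:
    """Return True if the action matches one of the read-only patterns."""
    return any(fnmatchcase(action, pattern) for pattern in READ_ONLY_PATTERNS)


def filter_plan(actions: list[str]) -> tuple[list[str], list[str]]:
    """Split a planned list of actions into (allowed, blocked)."""
    allowed = [a for a in actions if is_read_only(a)]
    blocked = [a for a in actions if not is_read_only(a)]
    return allowed, blocked
```

For example, `filter_plan(["cloudformation:DescribeStacks", "cloudformation:DeleteStack"])` keeps the describe call and blocks the delete. Anything not explicitly listed is blocked, which is the point of starting from an allowlist.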

Second, separate human credentials from agent behavior. Even when the same human starts the task, the system should know which actions were agent-initiated.
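One possible way to make that distinction visible, assuming the agent works through its own assumed-role session, is to encode "this was an agent" in the role session name, which shows up in the identity recorded for assumed-role calls. The `agent.<tool>.<human>` convention below is hypothetical; only the character and length limits come from STS.

```python
import re

# STS role session names are limited to 2-64 characters from [\w+=,.@-].
# Anything outside that set is replaced with "-".
_DISALLOWED = re.compile(r"[^\w+=,.@-]", re.ASCII)
MAX_LEN = 64
AGENT_PREFIX = "agent."  # hypothetical naming convention


def agent_session_name(tool: str, human: str) -> str:
    """Build a session name that marks the session as agent-initiated."""
    raw = f"{AGENT_PREFIX}{tool}.{human}"
    return _DISALLOWED.sub("-", raw)[:MAX_LEN]


def is_agent_session(session_name: str) -> bool:
    """Classify a session name pulled from an audit record."""
    return session_name.startswith(AGENT_PREFIX)
```

With this convention, `agent_session_name("code bot", "jane@example.com")` yields `agent.code-bot.jane@example.com`, so a later audit query can separate agent sessions from the same person's interactive ones.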

Third, create a tiny internal skill catalog. Not everything. Just the boring workflows people repeat all the time: create a service skeleton, diagnose a failed deployment, prepare a rollback plan, check basic platform standards.
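A tiny catalog does not need much machinery. Here is a sketch, assuming each skill has a cheap one-line summary the agent can scan and a full runbook that is loaded only when the skill is chosen. The `Skill` shape, the entry names, and the placeholder runbook text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    summary: str       # short text the agent can scan cheaply
    instructions: str  # full runbook, read only once the skill is selected


PLACEHOLDER = "(team-owned runbook text goes here)"

# Hypothetical entries mirroring the boring, repeated workflows.
CATALOG = (
    Skill("create-service-skeleton", "Scaffold a new service", PLACEHOLDER),
    Skill("diagnose-failed-deployment", "Triage a failed deployment", PLACEHOLDER),
    Skill("prepare-rollback-plan", "Draft a rollback plan", PLACEHOLDER),
    Skill("check-platform-standards", "Check basic platform standards", PLACEHOLDER),
)


def index(catalog: tuple[Skill, ...]) -> list[str]:
    """Return the compact listing an agent sees before picking a skill."""
    return [f"{skill.name}: {skill.summary}" for skill in catalog]


def load_skill(catalog: tuple[Skill, ...], name: str) -> Skill:
    """Return the full skill by name, or raise KeyError if it is not listed."""
    for skill in catalog:
        if skill.name == name:
            return skill
    raise KeyError(name)
```

Keeping the index separate from the instructions is the design choice that matters: the agent spends a few tokens on the listing and pays for a full runbook only when it commits to one.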

Fourth, require visible reasoning for write actions. The agent should show the docs, telemetry, policy, or skill it used before proposing infrastructure changes.
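As a sketch of that rule, a write proposal could simply be rejected before review unless it cites at least one source. The `ChangeProposal` structure and the `ready_for_review` check are assumptions for illustration, not an existing API.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeProposal:
    action: str   # IAM-style action the agent wants to run
    summary: str  # plain-language description of the change
    # References to the docs, telemetry, policy, or skill the agent relied on.
    evidence: list[str] = field(default_factory=list)


def ready_for_review(proposal: ChangeProposal) -> bool:
    """A write proposal is reviewable only if it cites a non-empty source."""
    return any(item.strip() for item in proposal.evidence)
```

A proposal with an empty `evidence` list, or one filled with blank strings, never reaches a human reviewer, which keeps "trust me" changes out of the approval queue.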

None of this is glamorous. Good. The boring version is the version that survives contact with production.

the punchline

Agent Toolkit for AWS is not just AWS chasing the AI wave. It is a sign that cloud providers understand the next layer of competition: agents need trusted, current, governed cloud context.

General coding agents will keep getting better. But cloud work has too much fast-moving, service-specific, organization-specific truth to rely on general knowledge alone.

The future is not an agent guessing its way through infrastructure from a terminal with broad credentials. The future is agents operating through curated skills, managed tool servers, policy boundaries, audit logs, and organization-specific workflows.

That sounds less magical than the demo version.

It also sounds much more useful.

Cloud knowledge used to live in docs, tickets, runbooks, staff engineers’ heads, and the occasional Slack thread from 2021. Now it is becoming something agents can load, execute, and be governed by.

That is infrastructure.

And like all infrastructure, somebody has to own it.

This post is licensed under CC BY 4.0 by the author.