server-side sharded watch is Kubernetes admitting the control plane has a data-scale problem

Kubernetes scalability conversations usually start with the obvious stuff.

How many nodes? How many pods? How large is the cluster? How much etcd pain can one organization spiritually endure before someone says “maybe we should split this thing” in a meeting and everyone pretends they were already thinking it?

Fair questions.

But I think Kubernetes v1.36's server-side sharded list and watch points at a quieter, more interesting problem:

the control plane is becoming a data distribution system, and the hard part is no longer only storing objects. It is feeding every client that wants to continuously know what changed.

That sounds boring. Good. Boring infrastructure primitives are where the real architecture leaks out.

too many watches

watch was always the magic trick

One of the reasons Kubernetes feels so powerful is the watch model.

Controllers do not constantly ask, “hey, did anything happen?” like an anxious intern refreshing a dashboard. They list the current state, then watch for changes. The API server becomes the coordination point for a lot of little loops trying to move reality toward desired state.
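If you have never watched the trick up close, it is a two-step: list once to get current state plus a resourceVersion marking "now", then watch from that version. A minimal client-go sketch (error handling trimmed; namespace and kubeconfig path are just illustrative defaults):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Step 1. List: fetch current state plus a resourceVersion marking "now".
	pods, err := client.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Step 2. Watch: stream every change that happens after that version.
	w, err := client.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{
		ResourceVersion: pods.ResourceVersion,
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		fmt.Printf("%s: %v\n", event.Type, event.Object)
	}
}
```

Real controllers use informers, which layer caching, resync, and reconnect handling on top of exactly this pair of calls.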

That model is elegant.

It is also everywhere.

Your deployment controller watches Deployments and ReplicaSets. Your autoscaler watches workloads and metrics-adjacent signals. Your policy engine watches resources. Your GitOps controller watches cluster state. Your service mesh watches endpoints and config. Your observability stack watches things. Your custom controllers watch things. Your shiny AI platform controller, written during a suspiciously optimistic sprint, also watches things.

Eventually the cluster has fewer “users” than “watchers.”

And that is the part people undercount.

A mature Kubernetes environment is not just a pile of workloads. It is a pile of clients trying to keep a local mental model of the cluster.

the API server is not just an API anymore

We still call it the Kubernetes API server, which is technically correct, but incomplete.

At small scale, it feels like an API. You send requests. You get responses. Nice.

At serious scale, it behaves more like a shared event distribution system with strong consistency expectations, historical state, authorization checks, fan-out pressure, and a very opinionated data model sitting behind it.

That is why server-side sharded list/watch matters.

The simple version: instead of forcing a client to list or watch a large resource set through one giant stream of objects, the server can split the work into shards. Clients can consume partitions of the dataset, and the system can distribute load more intelligently.
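To be concrete about what "shard" means, the underlying idea (not the real wire protocol, which I will not pretend to quote from memory) is plain keyspace partitioning: hash each object's key into one of N buckets so different consumers can each take a slice of the stream instead of all of it. A toy sketch:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps an object's namespace/name key to one of `shards` buckets.
// Toy illustration only; the actual Kubernetes API surface will differ.
func shardFor(namespace, name string, shards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(namespace + "/" + name))
	return h.Sum32() % shards
}

func main() {
	const shards = 4
	for _, key := range [][2]string{
		{"default", "web-7d4b9"}, {"payments", "api-66f8c"}, {"default", "cache-0"},
	} {
		fmt.Printf("%s/%s -> shard %d\n", key[0], key[1], shardFor(key[0], key[1], shards))
	}
}
```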

The exact implementation details are less important than the admission behind the feature:

large Kubernetes clusters have a data-scale problem at the watch layer.

Not a “Kubernetes is broken” problem. More like a “Kubernetes succeeded so hard that its coordination model is now carrying everybody’s automation habits” problem.

That is a very different vibe.

controllers are cheap until they are not

The platform engineering era made controllers feel cheap.

Need policy? Add a controller. Need sync? Add a controller. Need drift correction? Add a controller. Need to turn a YAML wish into some cloud-side reality? Controller. Need your internal platform to look declarative? Another controller.

I like controllers. They are one of the best ideas in modern infrastructure. But they are not free.

Every controller needs to observe. Every observer consumes API server capacity, cache memory, network bandwidth, authorization checks, and operational attention. A single controller can be harmless. A platform full of controllers becomes an ecosystem of little data subscribers.

This is where the accounting gets fuzzy.

Teams are usually pretty good at counting pods and nodes. They are less good at counting control-plane pressure created by automation.

A GitOps tool might look like “just one more platform component.” A policy engine might look like “just one more safety layer.” An operator installed by a vendor might look like “just how the product works.”

Individually, sure.

Together, they become a read-amplification machine pointed at the API server.

AI agents will make this worse, obviously

I do not mean “obviously” as in panic. I mean it as in: look at the pattern.

AI coding agents, platform agents, remediation bots, incident assistants, internal developer portals, MCP-style tools, and automation loops all want context. They want to inspect state. They want to understand what exists before acting. They want to subscribe to change, detect drift, summarize, explain, fix, and sometimes confidently invent a root cause because the logs looked lonely.

Some of those systems will talk directly to Kubernetes. Some will talk through platform APIs. Some will sit behind gateways. But the demand shape is the same: more machine clients wanting fresher operational state.

That means control-plane scalability becomes less about human kubectl usage and more about automated consumers.

The future cluster is not busy because one engineer ran kubectl get pods -A too many times.

It is busy because fifty systems are trying to keep themselves synchronized with reality.

Server-side sharded watch is the kind of primitive you need when “watching the world” becomes normal behavior.

this changes what platform teams should measure

If the control plane is a data product, platform teams need better product metrics for it.

Not just cluster size. Not just API server CPU. Not just etcd latency after everything is already sad.

I would want to know things like:

  • which clients are opening the most watches?
  • which resource types create the most list/watch pressure?
  • which controllers reconnect too aggressively?
  • how many clients are watching cluster-wide scopes when they only need a namespace or two?
  • which internal tools repeatedly list everything because nobody designed a narrower contract?
  • what happens to watch latency during deployments, outages, or large reconciliations?
  • which teams are adding control-plane load as a hidden dependency of their product?

That last one matters.

A platform feature that adds ten pods is easy to reason about. A platform feature that adds ten high-cardinality watchers across multiple clusters is harder to see, but sometimes more important.

This is the same kind of hidden tax we keep rediscovering in distributed systems: the expensive part is not always where the YAML makes noise.
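Several of those questions already have partial answers in metrics the API server exports today. A starting point in PromQL (metric names drift across Kubernetes versions, so verify against what your apiserver actually serves; per-client attribution usually needs audit logs or API Priority and Fairness flow schemas on top):

```promql
# Which resource types have the most registered watchers right now?
topk(10, sum by (group, version, kind) (apiserver_registered_watchers))

# Which resources generate the most watch-event fan-out?
topk(10, sum by (group, version, kind) (rate(apiserver_watch_events_total[5m])))

# Where is LIST pressure coming from, by resource and scope?
topk(10, sum by (resource, scope) (rate(apiserver_request_total{verb="LIST"}[5m])))
```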

sharding is not a license to be lazy

There is a trap with scalability features.

A system gets a better primitive, and everyone treats it as permission to continue the same behavior with slightly more confidence.

That would be the wrong lesson here.

Server-side sharded watch is useful because it gives Kubernetes a better way to serve large-scale clients. But it should also make platform teams more honest about their client design.

If your controller only needs a subset of objects, do not watch the universe. If your tool can tolerate stale summaries, do not demand live cluster truth every second. If your platform API can provide a curated view, do not leak raw Kubernetes watches to every consumer. If your automation acts on changes, make sure it has backpressure, jitter, retry discipline, and boring failure behavior.
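In client-go terms, most of that discipline is just options you were allowed to set all along. A sketch of a better-behaved client (the namespace, label selector, and resync period are illustrative; the processing loop that drains the queue is omitted):

```go
package main

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

func run(client kubernetes.Interface, stop <-chan struct{}) {
	// Scope the informer: one namespace, one label selector. This is the
	// difference between "watch the universe" and "watch what you own".
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute, // resync period; 0 disables periodic resync
		informers.WithNamespace("payments"),
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.LabelSelector = "app.kubernetes.io/managed-by=payments-operator"
		}),
	)

	// Rate-limited queue: failed items come back with exponential backoff
	// instead of an immediate retry storm against the API server.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
	})

	factory.Start(stop)
	factory.WaitForCacheSync(stop)
}
```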

Basically: do not turn the API server into Kafka because you were too busy to design an event contract.

Kubernetes watch is a great primitive. It is not a substitute for thinking.

the real control-plane problem is social too

The awkward part is that this is not only technical.

Control-plane pressure is often created by organizational boundaries.

One team installs an operator. Another adds policy. Another adds observability. Another adds security scanning. Another adds an internal platform abstraction. Nobody owns the combined shape until the API server starts sweating.

Then suddenly everyone discovers they are “just a client.”

This is why mature platform engineering needs ownership over control-plane consumption, not only control-plane availability. The platform team should not merely keep Kubernetes alive. It should define what good citizenship looks like for clients that depend on Kubernetes state.

That means documentation, defaults, metrics, review patterns, and sometimes saying no to tools that treat the API server like an infinite free database.

Not because platform teams enjoy being annoying.

Because shared control planes become tragedy-of-the-commons machines when every client optimizes locally.
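And Kubernetes already ships a knob for the "saying no" part: API Priority and Fairness. A hedged sketch, with made-up names, that fences a chatty agent into its own bounded priority level so it degrades alone instead of starving everyone (check the flowcontrol API version your cluster serves):

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: vendor-agents            # hypothetical name
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 5  # a small slice of apiserver concurrency
    limitResponse:
      type: Reject               # shed excess load instead of queueing forever
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: vendor-agents
spec:
  priorityLevelConfiguration:
    name: vendor-agents
  distinguisherMethod:
    type: ByUser
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: chatty-agent       # hypothetical client identity
            namespace: vendor-system
      resourceRules:
        - verbs: ["list", "watch"]
          apiGroups: ["*"]
          resources: ["*"]
          clusterScope: true
          namespaces: ["*"]
```

The point is not this exact config. The point is that consumption limits become a platform artifact somebody owns and reviews.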

my take

Server-side sharded list/watch is not the flashiest Kubernetes feature. It will not produce a thousand conference keynotes with lasers.

But it is one of those features that reveals where the real system is going.

Kubernetes is not just scheduling containers anymore. It is the coordination substrate for platforms, policies, agents, operators, and automation loops. That means the API server is not merely serving requests. It is distributing operational truth.

And once operational truth has many subscribers, data-scale problems show up.

So yes, sharded watch is a scalability feature.

But it is also a warning label.

If your platform keeps adding automation, controllers, agents, and “smart” tools, you are also adding readers of reality. Those readers have cost. They have failure modes. They have ownership questions.

The cluster does not only run workloads.

It runs everybody’s need to know what the workloads are doing.

That may be the more interesting scalability problem now.