Kubernetes v1.36: Server-Side Sharded List/Watch Alpha

Kubernetes clusters are getting massive, and controllers are choking on the data. Version 1.36 offers an alpha fix, but is it enough to tame the beast?

Key Takeaways

  • Kubernetes v1.36 introduces alpha support for server-side sharded list and watch to improve controller scalability in large clusters.
  • The feature filters events at the API server, reducing per-replica processing overhead for controllers watching high-cardinality resources.
  • This aims to decrease CPU, memory, and network usage for controllers, ultimately lowering operational costs for large deployments.

And just like that, another scaling bottleneck gets an alpha patch. Kubernetes v1.36 rolls out server-side sharded list and watch—KEP-5866, if you’re into that sort of thing—and the PR people are calling it a game-changer. But let’s pump the brakes, shall we? After two decades of watching Silicon Valley chase shiny new objects, I’ve learned that ‘alpha’ and ‘game-changer’ rarely coexist outside a marketing department’s fever dream.

The core problem is familiar: as your Kubernetes cluster balloons to tens of thousands of nodes, the controllers that keep everything humming along start to drown. Every replica of a horizontally scaled controller still receives the full stream of events from the API server. Each one chews up CPU, gobbles memory, and burns network bandwidth deserializing mountains of data, only to chuck most of it in the digital trash because it’s not that replica’s problem. And here’s the kicker: scaling out the controller doesn’t make this per-replica cost go down; it just multiplies the total.

When Client-Side Just Doesn’t Cut It Anymore

Look, some controllers already try to deal with this mess by doing a bit of client-side sharding. Think kube-state-metrics. Each replica gets a slice of the keyspace, and it dutifully throws away anything it doesn’t own. It’s a functional workaround, sure. But it doesn’t—and this is the critical part—actually reduce the sheer volume of data that’s sloshing around between the API server and your controllers. You’re still paying the network and CPU tax on every single event, even the ones destined for the bit bucket.

  • N replicas × full event stream: every replica decodes and processes everything, then discards what it doesn’t need.
  • Network bandwidth scales with the number of replicas, not the actual workload per shard.
  • CPU cycles spent on deserialization become pure waste for the majority of objects.
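To make that waste concrete, here is a minimal Go sketch of client-side sharding. The modulo-over-FNV-1a ownership rule and the names shardOf and simulate are assumptions made for illustration; this is not how kube-state-metrics or any particular controller actually assigns keys.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardOf maps a key to one of n shards using FNV-1a modulo n.
// Assumption: illustrative ownership rule only; real controllers may differ.
func shardOf(key string, n int) int {
	h := fnv.New64a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum64() % uint64(n))
}

// simulate returns, per replica, how many events were decoded and how many
// were actually kept when every replica receives the full event stream.
func simulate(keys []string, n int) (decoded, kept []int) {
	decoded = make([]int, n)
	kept = make([]int, n)
	for r := 0; r < n; r++ {
		for _, k := range keys {
			decoded[r]++ // every replica pays to decode every event
			if shardOf(k, n) == r {
				kept[r]++ // only the owning replica does useful work
			}
		}
	}
	return decoded, kept
}

func main() {
	keys := []string{"pod-a", "pod-b", "pod-c", "pod-d", "pod-e", "pod-f"}
	decoded, kept := simulate(keys, 3)
	fmt.Println("decoded per replica:", decoded)
	fmt.Println("kept per replica:   ", kept)
}
```

Each replica decodes all six events, yet the kept counts across replicas only add up to six. Adding replicas shrinks the useful slice per replica while the decoding bill stays the same for each of them.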

This is where the new alpha feature, server-side sharded list and watch, swoops in. The idea is to shove that filtering job upstream, right into the API server itself. Each controller replica will tell the API server which specific hash range of resources it’s responsible for, and presto—the API server will only bother sending events that actually belong to that slice. Sweet, right? Or at least, potentially less painful.

So, How Does This Magic Actually Work?

Under the hood, it’s not exactly magic, but it is clever. The feature adds a shardSelector field to ListOptions. Your clients—specifically, the controllers—can then tell the API server which hash range they care about using a neat little function, shardRange(). It looks something like this:

shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')

The API server then takes whatever field you point it at (currently, object.metadata.uid or object.metadata.namespace are supported) and pipes it through a deterministic 64-bit FNV-1a hash. If that hash falls within the specified [start, end) range, the object gets sent. Crucially, the hash is the same no matter which API server replica is doing the work, which means this is designed to play nice with your existing multi-replica API server setup. No more “it works on my machine” API server lottery.
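The range check itself is simple enough to sketch with Go’s standard hash/fnv package. The helper names hashKey and inShard are hypothetical, and the exact bytes the API server feeds into the hash are an assumption here, so treat the printed values as an illustration of the technique rather than what a real server would compute for a given object.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashKey returns the 64-bit FNV-1a hash of key.
// Assumption: the API server may encode the field differently before
// hashing, so these values are illustrative only.
func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

// inShard reports whether hash h lies in the half-open range [start, end).
func inShard(h, start, end uint64) bool {
	return h >= start && h < end
}

func main() {
	const mid = uint64(0x8000000000000000) // boundary between the two halves
	for _, uid := range []string{"uid-1", "uid-2", "uid-3"} {
		h := hashKey(uid)
		fmt.Printf("%s -> 0x%016x lowerHalf=%v\n", uid, h, inShard(h, 0, mid))
	}
}
```

One caveat with this sketch: an exclusive upper bound of 2^64 does not fit in a uint64, so checking the top-most range would need a wider type or a special case.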

For controllers that typically use informers—and let’s be honest, that’s most of them—the integration involves tweaking those ListOptions with WithTweakListOptions. You’ll hand over your shardSelector there. To split the hash space for a two-replica deployment, Replica 0 might get the lower half ('0x0000000000000000' to '0x8000000000000000') and Replica 1 gets the upper half ('0x8000000000000000' to '0x10000000000000000'). You can even get fancy and define non-contiguous ranges if your workload demands it, though I’m not sure why you’d want to.

And how do you know if the API server actually listened? When it honors a shard selector, the list response includes a shardInfo field in the response metadata that echoes back the applied selector. If that field is missing, assume the worst: the server didn’t honor your selector, and you’re back to square one, sifting through everything yourself. The client has to be ready to handle the full, unfiltered firehose if the server doesn’t play along.
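That fallback can be sketched in a few lines of Go. Everything here is a stand-in: listMeta is a hypothetical type rather than the real client-go response metadata, and owns is whatever ownership rule your replica already uses for client-side sharding.

```go
package main

import "fmt"

// listMeta is a hypothetical stand-in for the list response metadata.
// ShardInfo is nil when the server did not echo back a selector.
type listMeta struct {
	ShardInfo *string
}

// ownedKeys returns the keys this replica should process. If the server
// applied the selector, the list is already filtered; otherwise the replica
// falls back to filtering client-side with the supplied ownership rule.
func ownedKeys(meta listMeta, keys []string, owns func(string) bool) []string {
	if meta.ShardInfo != nil {
		return keys // server-side filtering already happened
	}
	owned := make([]string, 0, len(keys))
	for _, k := range keys {
		if owns(k) {
			owned = append(owned, k)
		}
	}
	return owned
}

func main() {
	keys := []string{"uid-1", "uid-2", "uid-3"}
	owns := func(k string) bool { return k != "uid-2" } // placeholder ownership rule

	// Server ignored the selector: filter locally.
	fmt.Println(ownedKeys(listMeta{}, keys, owns))

	// Server echoed the selector back: trust the filtered list.
	applied := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"
	fmt.Println(ownedKeys(listMeta{ShardInfo: &applied}, keys, owns))
}
```

Keeping the client-side filter around costs little, and it means a replica behaves correctly against an API server that has the feature gate off.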

Who’s Actually Making Money Here?

This is the million-dollar question, isn’t it? Or rather, the terabyte-of-data-saved question. On the surface, this is about efficiency. Reduced CPU, memory, and network chatter—that all translates to lower cloud bills, or at least less strain on your on-prem hardware. For operators managing truly gargantuan Kubernetes clusters, this could be the difference between a stable, performant system and a hobbled mess.

Who benefits directly? Controller authors, mainly. If you’re writing a controller that watches a lot of high-cardinality resources—think Pods, Services, custom resources at scale—this feature lightens your load significantly. It’s a win for the developers who have to deal with these scaling headaches firsthand. And by extension, it’s a win for the companies running those large clusters, saving them operational costs.

But let’s not pretend this is some altruistic gift from the Kubernetes gods. It’s a necessary evolution driven by the undeniable reality of how large these clusters are becoming. The pressure is on to keep Kubernetes itself from becoming a bottleneck, and features like this are how the project stays relevant. The real money, of course, is in the platforms and services built on top of Kubernetes. Making Kubernetes more scalable indirectly makes those businesses more scalable and profitable. This is less about a single company cashing in and more about the entire ecosystem finding ways to keep pace with its own success.

This feature is currently alpha, meaning it’s experimental. You’ll need to enable the ShardedListAndWatch feature gate on your API server. The call to action? Feedback. They’re explicitly looking for controller authors and operators running massive clusters to kick the tires and report back. Join the #sig-api-machinery channel on Kubernetes Slack if you’ve got thoughts. Just try not to drown in the watch events while you’re at it.


Frequently Asked Questions

What does server-side sharded list and watch actually do? It filters events at the API server level, so controller replicas only receive data they are responsible for, reducing CPU, memory, and network load.

Is this feature production-ready? No, it’s currently in alpha and requires enabling a specific feature gate. It’s intended for testing and feedback from large-scale deployments.

Will this fix all Kubernetes scaling problems? It addresses a specific scaling bottleneck related to high-cardinality resource watching for controllers, but Kubernetes has many other scaling dimensions.

Written by Jordan Kim

Infrastructure reporter. Covers CNCF projects, cloud-native ecosystems, and OSS-backed platforms.

Originally reported by Kubernetes Blog
