Stop Wasting CPU Cores on Sidecars: Why Pod-Level Resource Managers Finally Solve a Real Problem
Admin User
Author
I spent three hours last week debugging why our ML inference pipeline was getting throttled on what should have been an oversized node. The culprit? A metrics-exporter sidecar that was hogging an entire dedicated CPU core because our Kubernetes setup required every container in the pod to request exclusive resources if we wanted NUMA alignment for our main workload.
That's when it hit me: we were throwing away performance and money to satisfy Kubernetes' rigid per-container resource model. Every sidecar, no matter how lightweight, had to claim exclusive CPU slices. This felt wrong—like having to buy premium gasoline for your entire car when only the engine actually needs it.
The Pod-Level Resource Managers feature, alpha in Kubernetes v1.36, feels like someone finally listened to people building real, performance-sensitive systems in production.
The Problem Nobody Talks About
Here's the gap that's been driving me crazy: modern Kubernetes pods don't exist in isolation. You've got your main application, sure, but you also have service mesh proxies, logging agents, monitoring sidecars, and backup services. Each one is a legitimate necessity.
But when you're running something latency-critical—a database, an ML workload, a real-time trading system—you need those exclusive, NUMA-aligned resources to guarantee performance. Before this feature, that meant declaring guaranteed QoS for every single container, including the tiny sidecar that just forwards logs.
It's inefficient. It's wasteful. It feels broken.
How Pod-Level Resource Managers Changes the Game
The core insight here is elegant: shift the abstraction from "every container gets its own resource allocation" to "the pod declares a total budget, and we're smarter about how those resources are distributed."
With feature gates enabled (PodLevelResources and PodLevelResourceManagers), the kubelet can now create hybrid allocation models. Your main application container gets exclusive, NUMA-aligned CPUs and memory. Your sidecars? They share a leftover "pod shared pool" that's still bounded by your overall pod limits, but they don't need exclusive allocations.
The Topology Manager handles this at either pod or container scope, depending on your needs. Pod scope means all containers benefit from single NUMA alignment. Container scope means you're selective: the ML container gets aligned, while the service mesh sidecar just runs in the node's shared pool.
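To make the moving parts concrete, here is a minimal sketch of the kubelet side of this setup. The two feature gate names come from the announcement; the remaining fields are standard KubeletConfiguration settings. Pairing them with the static CPU Manager policy, and the specific reserved CPU IDs, are my assumptions rather than something the post spells out. Memory Manager settings are left out, and the gates may also need to be enabled on control plane components.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true          # gate named in the announcement
  PodLevelResourceManagers: true   # gate named in the announcement
# Assumption: exclusive CPUs still come from the static CPU Manager policy.
cpuManagerPolicy: static
# The static policy needs some CPUs reserved for system daemons;
# the IDs here are illustrative only.
reservedSystemCPUs: "0-1"
topologyManagerPolicy: single-numa-node
# "pod" aligns the whole pod budget; "container" aligns containers individually.
topologyManagerScope: pod
```

Switching `topologyManagerScope` to `container` is what would give the selective behavior described above, at least as I read the post.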
What This Actually Means for Our Infrastructure
Let me break down what I'd actually do differently in my clusters:
For latency-sensitive databases, this is a game-changer. We can allocate 6 exclusive CPUs to PostgreSQL from a pod budget of 8 CPUs, leaving 2 CPUs in a shared pool for the backup agent and metrics exporter. With pod scope, everything stays on the same NUMA node, PostgreSQL isn't throttled, and we're not wasting dedicated cores on auxiliary work.
The ML training case is similarly practical. Our GPU workload gets its pinned resources and NUMA alignment. The Istio sidecar? Let it float in the node's shared pool. It doesn't need the same guarantees.
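A rough sketch of what that ML pod might look like follows. The pod name, image names, and sizes are all hypothetical, and the proxy is declared by hand only for illustration (Istio normally injects it). It is also my assumption that GPUs stay a container-level request. As the post describes it, whether the proxy ends up in the pod shared pool or the node's shared pool depends on the kubelet's Topology Manager scope, not on anything in this manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-aligned                         # hypothetical name
spec:
  # Pod-level budget covering both containers
  resources:
    requests:
      cpu: "10"
      memory: "32Gi"
    limits:
      cpu: "10"
      memory: "32Gi"
  containers:
    - name: inference
      image: registry.example.com/inference:v1    # hypothetical image
      # Integer CPUs with requests == limits: candidate for exclusive, aligned cores
      resources:
        requests:
          cpu: "8"
          memory: "28Gi"
          nvidia.com/gpu: 1
        limits:
          cpu: "8"
          memory: "28Gi"
          nvidia.com/gpu: 1
    - name: mesh-proxy
      image: registry.example.com/mesh-proxy:v1   # hypothetical stand-in for the Istio sidecar
      # No container-level resources: no exclusive allocation requested
```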
But here's what concerns me: this is alpha, and the announcement post leaves the observability story incomplete. How do I actually see, in production, what's consuming from the pod shared pool versus the exclusive containers? CloudEvents? Metrics? I need visibility to trust this.
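Until the feature documents its own signals, one partial stopgap is the CFS throttling counters that cAdvisor already exposes. Below is a sketch of a Prometheus rule file built on them. The group name, alert name, threshold, and durations are hypothetical. It is also an assumption on my part that throttling of shared-pool containers shows up in these series at all; with a pod-level quota it may only appear on the pod-level cgroup series (empty `container` label). Nothing here tells you which pool a container belongs to.

```yaml
groups:
  - name: pod-shared-pool-throttling        # hypothetical group name
    rules:
      - alert: PodCPUThrottlingHigh         # hypothetical alert name
        # Fraction of CFS periods in which the cgroup was throttled, per series
        expr: |
          sum by (namespace, pod, container) (
            rate(container_cpu_cfs_throttled_periods_total{pod="database-optimized"}[5m])
          )
          /
          sum by (namespace, pod, container) (
            rate(container_cpu_cfs_periods_total{pod="database-optimized"}[5m])
          )
          > 0.25
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU throttling above 25% in {{ $labels.pod }}/{{ $labels.container }}"
```

If the exclusive container really has its CFS quota disabled, I'd expect its ratio to stay at zero, which would at least be a sanity check that the allocation took effect.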
The Configuration Reality Check
apiVersion: v1
kind: Pod
metadata:
  name: database-optimized
spec:
  # Pod declares the total budget
  resources:
    requests:
      cpu: "8"
      memory: "16Gi"
    limits:
      cpu: "8"
      memory: "16Gi"
  containers:
    - name: postgres
      image: postgres:15
      # This gets exclusive resources from the pod budget
      resources:
        requests:
          cpu: "6"
          memory: "12Gi"
        limits:
          cpu: "6"
          memory: "12Gi"
    - name: pg-backup
      image: backup-agent:v1
      # No resource requests = runs in pod shared pool
      # Still bounded by overall pod limits
The key here is the asymmetry. Your main container is guaranteed exclusive slices. Your sidecars share what's left, but can't exceed the pod total. That's the efficiency win.
What I'm Still Thinking About
I need to understand the CPU CFS quota enforcement better before I'd enable this in production. Exclusive containers get their CFS quotas disabled (so no throttling), while shared pool containers have pod-level CFS quotas enforced. That's good, but what about burstiness? Can the shared pool containers briefly exceed their share if exclusive containers aren't using theirs?
Also, this requires the Topology Manager to be configured with one of the best-effort, restricted, or single-numa-node policies. I'd use single-numa-node, but I'm curious about the failure modes: what happens if the pod can no longer fit on a single NUMA node?
The observability gaps feel like the real blocker for adoption right now. I'm not enabling alpha features in production without clear visibility into how they're behaving.
Your Turn
Have you built multi-container pods on Kubernetes where resource isolation became a real problem? Are you running workloads latency-sensitive enough to care about NUMA alignment? I'd genuinely like to hear what your pain points are before this feature hits stable.
Source: This post was inspired by "Kubernetes v1.36: Pod-Level Resource Managers (Alpha)" by Kubernetes Blog.