Stop Letting Your Kubernetes Nodes Blow Up: Why Memory QoS Finally Makes Sense
Admin User
Author
I spent three hours debugging an OOM killer rampage on a production node last month. The system had plenty of RAM available, or so I thought, but every Burstable pod was hoarding memory like it was going out of style, and when the kernel finally had enough, it started killing processes with surgical randomness. I was staring at kubelet logs wondering why my memory requests and limits weren't preventing this chaos.
Then I realized: I'd enabled MemoryQoS in v1.27 and forgotten about it. The feature was setting memory.min (hard reservation) on everything, which meant the kernel couldn't actually use that memory for anything else, even under normal pressure. It was like booking every seat in a theater but leaving them empty. When I finally looked at the actual cgroup configuration, I understood why my node was suffocating.
Kubernetes v1.36 just fixed the exact problem I ran into, and honestly, this deserves way more attention than it's getting.
The Memory QoS Disaster I Didn't Know I Had
Here's what happened in earlier versions: when you enable MemoryQoS, the kubelet assumes all memory requests deserve hard protection. That means memory.min gets set to your request value, and the kernel promises—cross its heart—never to reclaim that memory, even if the entire system is starving.
This sounds safe until you do the math. If you have a node with 8GB of RAM and 7GB of Burstable pod requests, all 7GB gets locked down. The kernel can't touch it. Your system daemons, caches, and any headroom for emergencies? Gone.
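Here is that math as a minimal sketch. The numbers are the ones from the paragraph above (treated as GiB for convenience), and the helper name is mine, not anything the kubelet exposes.

```python
GIB = 1024 ** 3


def unprotected_bytes(node_ram: int, hard_reserved: int) -> int:
    """Memory the kernel can still reclaim freely once memory.min is honored."""
    return max(node_ram - hard_reserved, 0)


node_ram = 8 * GIB
# Under the old behavior, every Burstable request became memory.min.
burstable_requests = 7 * GIB

left = unprotected_bytes(node_ram, burstable_requests)
print(f"{left / GIB:.0f} GiB left for daemons, page cache, and headroom")
```

Whatever that last number is on your nodes, it is all the kernel has to work with when pressure hits.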
The problem is that Burstable pods don't actually need hard protection. By definition, they're willing to be throttled or evicted under pressure. But the old behavior treated them like Guaranteed pods, which do need every byte they request.
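If you want to see what actually got written for a pod, the cgroup v2 files are plain text. This is a rough sketch of how I'd read them; the directory you pass in is an assumption that depends on your cgroup driver and pod UID, so treat the path as hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

# Standard cgroup v2 memory interface files.
KNOBS = ("memory.min", "memory.low", "memory.high", "memory.max")


def read_memory_knobs(cgroup_dir: str) -> Dict[str, Optional[int]]:
    """Read cgroup v2 memory knobs from one cgroup directory.

    Values are bytes. The literal string "max" (no limit) maps to None.
    Files that do not exist are left out of the result.
    """
    result: Dict[str, Optional[int]] = {}
    for knob in KNOBS:
        path = Path(cgroup_dir) / knob
        try:
            raw = path.read_text().strip()
        except FileNotFoundError:
            continue
        result[knob] = None if raw == "max" else int(raw)
    return result
```

Calling `read_memory_knobs("/sys/fs/cgroup/kubepods.slice")` is the kind of thing I'd try on a node using the systemd driver, but check where your pods' cgroups actually live first.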
The Tiered Approach That Actually Works
Kubernetes v1.36 introduces memoryReservationPolicy, and it's the kind of boring, incremental fix that saves you from 3am pages.
The new behavior is simple: only Guaranteed pods get memory.min (hard protection). Burstable pods get memory.low (soft protection)—the kernel tries to keep it available but will reclaim it if the alternative is a system-wide OOM. BestEffort pods get nothing, as expected.
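To make the tiers concrete, here is a small sketch of the mapping as I understand it from the announcement. It is my own illustration, not kubelet code, and the function name is made up.

```python
from typing import Dict


def reservation_for(qos_class: str, request_bytes: int) -> Dict[str, int]:
    """Return which cgroup v2 protection knob a container gets under the tiers above."""
    if qos_class == "Guaranteed":
        # Hard protection: the kernel will not reclaim below this.
        return {"memory.min": request_bytes}
    if qos_class == "Burstable":
        # Soft protection: reclaimed only when nothing unprotected is left.
        return {"memory.low": request_bytes}
    if qos_class == "BestEffort":
        # No reservation at all.
        return {}
    raise ValueError(f"unknown QoS class: {qos_class!r}")


for qos in ("Guaranteed", "Burstable", "BestEffort"):
    print(qos, reservation_for(qos, 512 * 1024 * 1024))
```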
# kubelet configuration in v1.36
memoryReservationPolicy: TieredReservation
memoryThrottlingFactor: 0.9 # Still applies to memory.high
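The throttling factor feeds the memory.high calculation. The sketch below follows the formula the Kubernetes docs published for v1.27; I'm assuming it is unchanged in v1.36, and the 4096-byte page size is also an assumption about the node.

```python
import math


def memory_high(
    request: int,
    limit_or_allocatable: int,
    throttling_factor: float = 0.9,
    page_size: int = 4096,
) -> int:
    """memory.high per the v1.27 docs formula, rounded down to a page boundary.

    limit_or_allocatable is the container's memory limit, or the node's
    allocatable memory when no limit is set.
    """
    raw = request + throttling_factor * (limit_or_allocatable - request)
    return math.floor(raw / page_size) * page_size


GIB = 1024 ** 3
high = memory_high(request=1 * GIB, limit_or_allocatable=2 * GIB)
print(f"throttling starts at {high} bytes")
```

With a factor of 0.9, throttling kicks in once a container has used 90% of the gap between its request and its limit.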
This distinction matters in production. Burstable pods (those that set a request or limit on at least one container but don't meet the Guaranteed criteria) are often your memory-intensive but non-critical workloads. They now behave like they're supposed to: they get fair treatment under normal conditions but won't hold the entire system hostage under pressure.
What I'm Actually Paying Attention To
The part that grabbed me was the observability metrics. You can now query kubelet_memory_qos_node_memory_min_bytes and kubelet_memory_qos_node_memory_low_bytes to see exactly how much hard and soft reservation you're burning through.
I'm checking these metrics like they're the fuel gauge on a long road trip. If the node-wide memory.min total is creeping toward your node's physical RAM, you've got a problem brewing. This is the kind of metric that catches issues before they become incidents.
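Here's roughly how I'd turn those two gauges into a fuel-gauge reading. The metric names are the ones from the announcement; the sample values are invented for illustration, and I'm assuming the metrics come without labels that contain spaces.

```python
from typing import Dict

WATCHED = (
    "kubelet_memory_qos_node_memory_min_bytes",
    "kubelet_memory_qos_node_memory_low_bytes",
)

# Illustrative Prometheus text-format output, not real node data.
SAMPLE = """\
# TYPE kubelet_memory_qos_node_memory_min_bytes gauge
kubelet_memory_qos_node_memory_min_bytes 2147483648
# TYPE kubelet_memory_qos_node_memory_low_bytes gauge
kubelet_memory_qos_node_memory_low_bytes 5368709120
"""


def parse_watched(metrics_text: str) -> Dict[str, float]:
    """Pull the watched gauges out of Prometheus text-format output."""
    values: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2:
            continue
        name = parts[0].split("{", 1)[0]  # drop any label set
        if name in WATCHED:
            values[name] = float(parts[1])
    return values


def hard_fraction(values: Dict[str, float], node_bytes: int) -> float:
    """Share of node memory under hard (memory.min) reservation."""
    return values[WATCHED[0]] / node_bytes


values = parse_watched(SAMPLE)
print(f"hard reservation: {hard_fraction(values, 8 * 1024 ** 3):.0%} of node RAM")
```

In practice you'd feed it the kubelet's metrics output instead of `SAMPLE` and alert on the fraction, at whatever threshold suits your nodes.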
What I find curious is the kernel version requirement. The kubelet logs a warning if you're below kernel 5.9 due to a livelock bug, but it doesn't actually block you. I appreciate the pragmatism here, since sometimes you can't upgrade kernels immediately, but it's also a reminder that Kubernetes is making increasingly direct use of low-level kernel features. You can't treat these upgrades as purely userspace anymore.
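Since the warning doesn't block anything, I'd rather check nodes myself before flipping the switch. This is a small sketch of that check; the 5.9 threshold is the one mentioned above, and the helper names are mine.

```python
import platform
import re
from typing import Tuple

MIN_KERNEL = (5, 9)  # threshold cited above


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(release: str, minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """True if the kernel release is at or above the minimum version."""
    return parse_kernel_release(release) >= minimum


def check_this_host() -> bool:
    """Check the kernel of the machine this script runs on."""
    return meets_minimum(platform.release())
```

Run `check_this_host()` on each node (or feed `meets_minimum` the kernel version your node inventory reports) and hold off on nodes that come back `False`.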
The Question I'm Still Sitting With
The opt-in reservation model (memoryReservationPolicy: None vs. TieredReservation) is good for gradual adoption, but I'm wondering how many teams will actually migrate to it. The safest path is to enable throttling first with the policy left at None, observe behavior, then opt into TieredReservation when you're confident.
That's the right approach, but it requires discipline. There's always pressure to "just enable it and move on." I'm planning to enable throttling in staging first, run our actual workload patterns against it for a week, and then move to tiered reservation.
My recommendation: if you're running v1.36, don't skip this. The memory handling in Kubernetes has been a constant source of frustration, and this finally feels like a genuine improvement grounded in how real systems actually work.
Source: This post was inspired by "Kubernetes v1.36: Tiered Memory Protection with Memory QoS" by Kubernetes Blog.