
Kubernetes 1.36 – What you need to know

Kubernetes 1.36 will be the first Kubernetes release of 2026, and it’s full of exciting updates for security, AI hardware, and more! As always, after removing enhancements with the status of “Deferred” or “Removed from Milestone”, we are seeing 80 enhancements in all listed within the official tracker. So, what can we expect in 1.36?
Kubernetes 1.36 brings a whole bunch of useful enhancements, including 35 changes tracked as ‘Graduating’ in this Kubernetes release. Of these, 17 enhancements are graduating to Stable, such as Support for User Namespaces in Pods and Mutating Admission Policies, among them 4 DRA-specific enhancements reaching GA status.
A whopping 26 new alpha features are also listed in the enhancements tracker, one of which introduces the ability to report when a PVC was last used in pvc.Status. It’s a simple but powerful use case: users can now see whether a PVC is sitting unused, which in larger clusters would really help DevOps teams get rid of unused and unwanted PVCs.
As always, let’s jump into all of the major graduations, deferred enhancements and deprecations scheduled for Kubernetes 1.36.
Kubernetes 1.36 – Editor’s pick:
Here are a few of the changes that Cloudsmith employees are most excited about in this release:
#5055 Taints and Tolerations in DRA
It’s becoming a regular comment in these recent Kubernetes release notes, but the DRA API is seeing a LOT of exciting enhancements in the 1.36 update. [#4815, #4816, #5007, #5004, #5491, #5075, #5729, and #5517 all updated in this release]. This led the Cloudsmith team to add dedicated sig-storage and sig-node sections just for DRA updates. Anyways, this specific DRA enhancement is exciting because it brings more granularity and automation to hardware management, allowing admins to take specific devices offline for maintenance without disrupting the entire cluster. By introducing taints and tolerations for hardware, similar to what we already have for pod deployments, we can all benefit from automatically rescheduling pods away from failing devices while still letting specialized test pods access them for daily troubleshooting activities.
Nigel Douglas - Head of Developer Relations
I am absolutely thrilled to see OCI image volumes finally hitting Stable status in Kubernetes 1.36 because, as a Product Manager working on the OCI format support at Cloudsmith, I can see this enhancement turning existing container registries into universal artifact hubs. For years, dev teams have been forced into the so-called fat image anti-pattern, where they’re basically bundling massive ML models, static assets, and binary plugins directly into their application images, which ultimately creates a nightmare for security patching and bloats deployment times. By graduating this to Stable, we’re now seeing a native, high-performance way to decouple user data from your logic. Devs can now push Hugging Face LLM weights or commercial signatures as independent OCI artifacts and mount them as a VolumeSource just as easily as a ConfigMap. This doesn't just simplify CI/CD pipelines by allowing platform teams to swap content without rebuilding the base engine, but it also drastically shrinks your potential attack surface for the platform teams who are managing these deployments. It’s a massive win for the ecosystem that transforms the OCI registry from a basic storage location for apps into a more dynamic, structured delivery system for any content that a pod needs to succeed.
Liana Ertz - Product Manager
#5793 Manifest-based Admission Control configuration
As a CSM, I’m genuinely stoked for this 1.36 feature because it finally lets platform teams secure their actual admission control configs. By moving mission-critical policies from the API to static files on the control plane disk, platform teams can now close the scary security blind spot where the cluster was vulnerable during startup. Based on what I'm hearing from the teams I work with, this means vital guardrails (like blocking privileged containers) are now immune to accidental kubectl delete commands or etcd crashes. It’s a massive win for platform stability, providing platform engineers with a somewhat tamper-proof approach to keeping the Golden Path secure from the very first second the API server wakes up.
Amy Strutton - Customer Success Manager
Apps in Kubernetes 1.36
#5440 Mutable Container Resources when Job is suspended
Stage: Graduating to Beta
Feature group: sig-apps
This proposal seeks to enhance Kubernetes Batch Job management by relaxing the immutability of a Job’s Pod template while it is in a suspended state. Building on the existing suspend flag, this change would allow higher-level queue controllers to mutate resource specifications (specifically CPU, Memory, GPU, and extended Resource Requests / Limits) before a Job is unsuspended and pods are created. By allowing these updates, cluster administrators and automated controllers can dynamically optimize resource allocation based on real-time cluster capacity, current queue priorities, and actual workload utilization, ensuring that expensive hardware like GPUs is used efficiently.
The design ensures that these mutations are only permitted when a Job is fully suspended and has no active pods, mitigating the risk of disrupting running workloads. While the proposal does not include implementing a specific queue controller or supporting in-place pod updates, it provides the necessary API primitives for external tools (like Kueue) to perform checkpointing and resizing. This flexibility allows a Job to be suspended, its resource requests lowered to match actual usage, and then resumed, thereby improving overall cluster throughput and reducing resource waste.
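To make this concrete, here’s a minimal sketch of the flow (the fields are standard batch/v1; the Job name and image are placeholders):

```yaml
# A suspended Job: no pods exist yet, so its pod template resources
# can be patched by a queue controller before it is released.
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job          # placeholder name
spec:
  suspend: true               # resource mutation is only allowed while suspended
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest   # placeholder image
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
```

A controller like Kueue could lower spec.template.spec.containers[0].resources.requests to match observed utilization and then flip suspend to false, at which point the pods are created with the updated sizing.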
#3541 Add Recreate Update Strategy to StatefulSet
Stage: Net New to Alpha
Feature group: sig-apps
Starting in Kubernetes 1.36, the introduction of the Recreate Update Strategy for StatefulSets addresses a long-standing stuck rollout issue that has plagued users since 2018. Previously, if a StatefulSet update was triggered with a broken configuration (such as a non-existent Docker image), the controller would wait indefinitely for the failing Pod to become Ready before proceeding. Even if a user reverted the configuration to a known-good state, the StatefulSet would refuse to manage or fix the broken Pod because it was stuck in a validation loop, forcing administrators to manually delete the Pod to unblock the controller.
The new Recreate strategy (moving to Alpha in this update) provides a more aggressive alternative to the standard RollingUpdate. It allows the StatefulSet controller to prioritize reaching the desired state by terminating existing Pods before creating new ones, effectively bypassing the “waiting for Ready” deadlock. This change transforms what was previously documented as a known issue into a manageable configuration, ensuring that automated CI/CD pipelines and operators can recover from misconfigurations without manual human intervention.
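As a rough sketch of what opting in might look like, assuming the new strategy is expressed as an additional updateStrategy type (the exact alpha field shape may differ):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  updateStrategy:
    type: Recreate        # assumed value: delete the old pod first,
                          # then create its replacement at the new revision
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.com/web:v2   # placeholder image
```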
#5547 WAS: Integrate Workload APIs with Job controller
Stage: Net New to Alpha
Feature group: sig-apps
The feature represents a major step in Kubernetes' evolution to natively support complex batch and AI/ML workloads. Historically, Kubernetes scheduled individual Pods independently, which caused issues for distributed training jobs that require all participants to start simultaneously (gang scheduling). This feature introduces the Workload API and a decoupled PodGroup object to act as a bridge between high-level controllers (like the Job controller) and the scheduler. By integrating these, the Job controller can now automatically generate a Workload representation of itself, allowing the scheduler to understand the resource requirements and scheduling constraints of the entire group of pods rather than treating them as isolated units.
For the Kubernetes 1.36 Alpha update, the proposal focuses on standardizing this integration and refining the API surface. Key changes include introducing the PodGroup as a standalone runtime object and renaming the embedded field within the Workload spec to PodGroupTemplate to ensure a cleaner, more extensible hierarchy. Specifically for the Job controller, the 1.36 update aims to enable the creation of Workload and PodGroup objects by default for static Jobs (those where parallelism does not change). This release also seeks to resolve a critical debate on Scheduling Policies: whether to maintain separate Basic and Gang policies, or to unify them into a single parameter where a minCount of 1 represents basic scheduling and a minCount equal to the total replicas represents strict gang scheduling.
#5882 Deployment Pod Replacement Policy
Stage: Net New to Alpha
Feature group: sig-apps
PR #5883 marked a simple, formal separation of two different KEPs that were previously managed as a single unit: KEP-3973 (terminating pods in Deployments) and KEP-5882 (the Deployment pod replacement policy). The split was justified because the #3973 enhancement, which introduced the .status.terminatingReplicas field to track pods in the process of shutting down, moved through the development cycle faster than the more complex pod replacement policy. By decoupling them, the sig-apps team can now track their graduation through Alpha, Beta, and Stable stages independently, preventing the progress of one from being stalled by the other.
The merged changes include updated documentation and readiness reviews (PRRs for short) for both features. Key technical discussions during the PR focused on ensuring proper version skew policies, specifically requiring that the kube-apiserver be upgraded before the kube-controller-manager to avoid status synchronization issues. While the terminating replicas feature is already seeing implementation in recent releases, the specific API and logic for the pod replacement policy remain in progress, now neatly organized under its own dedicated proposal for future tracking in the 1.36 update.
API in Kubernetes 1.36
#3962 Mutating Admission Policies
Stage: Graduating to Stable
Feature group: sig-api-machinery
Graduating to stable, Mutating Admission Policies introduce a declarative, in-process alternative to traditional mutating admission webhooks by leveraging Common Expression Language (CEL). This enhancement allows cluster administrators to define resource modifications (such as injecting sidecar containers, enforcing image pull policies, or setting default labels) directly within Kubernetes using MutatingAdmissionPolicy and MutatingAdmissionPolicyBinding objects. By using CEL's object instantiation alongside Server-Side Apply (SSA) merge algorithms or JSON Patch, the API server can perform these mutations internally, significantly reducing the operational overhead, latency, and webhook fatigue associated with maintaining external infrastructure.
Beyond simplicity, this native approach offers superior performance and reliability. Because the mutations are in-process, the kube-apiserver can introspect changes to optimize execution order and safely re-run policies to ensure idempotency. While it does not aim for 100% feature parity with webhooks, specifically excluding external calls, it covers the vast majority of common use cases while providing safeguards to validate that mutations remain consistent. This move effectively brings the power and efficiency of ValidatingAdmissionPolicies to the mutation phase, creating a more robust and integrated policy management framework within the Kubernetes ecosystem.
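For illustration, here’s a sketch of a policy that defaults a label on new Deployments using CEL object instantiation with the ApplyConfiguration patch type (the v1 API group version is an assumption for the Stable release; earlier releases used v1alpha1/v1beta1):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingAdmissionPolicy
metadata:
  name: default-team-label
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["deployments"]
  failurePolicy: Fail
  reinvocationPolicy: Never
  mutations:
  - patchType: ApplyConfiguration
    applyConfiguration:
      # CEL object instantiation: the result is merged into the incoming
      # object using Server-Side Apply semantics.
      expression: >
        Object{
          metadata: Object.metadata{
            labels: {"team": "unassigned"}
          }
        }
```

As with ValidatingAdmissionPolicy, the policy only takes effect once a MutatingAdmissionPolicyBinding selects the namespaces or resources it applies to.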
#5647 Stale Controller Mitigation
Stage: Net New to Alpha
Feature group: sig-api-machinery
This proposal introduces a mechanism to mitigate controller staleness in Kubernetes by allowing controllers to detect when their local cache lags behind the API server. Currently, controllers in the kube-controller-manager (KCM) rely on eventually consistent watch streams, which can lead to spurious reconciles or incorrect decisions (like scaling or deleting resources) based on outdated data. To solve this, the enhancement proposes an opt-in read-after-write guarantee. By tracking the Resource Version (RV) of objects after a write and updating the ResourceEventHandlerFuncs with a new BookmarkFunc, controllers can verify if their previous changes have propagated to the local cache before proceeding with the next reconciliation.
In practice, this logic will be integrated into the core processing loops of sensitive controllers, such as the DaemonSet controller. If a controller determines that its cache has not yet caught up to its last-written state, it will skip and requeue the object with exponential backoff rather than performing a potentially erroneous reconcile. This targeted approach ensures consistency for specific resources without imposing a global performance penalty, though it requires careful handling of edge cases like controller restarts and cache resyncs to prevent permanent reconciliation stalls.
CLI in Kubernetes 1.36
#3104 Separate kubectl user preferences from cluster configs
Stage: Major Change to Beta
Feature group: sig-cli
The kuberc feature introduces a dedicated configuration file (typically located at ~/.kube/kuberc) designed to separate individual user preferences, such as command aliases and default flag settings, from the cluster credentials and server information stored in a standard kubeconfig. This separation allows users to maintain consistent local workflows (like enforcing interactive delete confirmations or silencing deprecation warnings) across multiple clusters without modifying shared or auto-generated connection files. The major change moving into v1.36 Beta status is that the feature is now enabled by default, whereas it previously required manual activation. To support this promotion, the update introduces the kubectl kuberc management command to help users programmatically view and edit their preferences, and it formalizes security controls through a credential plugin allowlist to prevent the execution of untrusted binaries.
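A minimal ~/.kube/kuberc might look like the following (shown with the v1alpha1 schema; the group version and field layout could change as the feature settles into Beta):

```yaml
apiVersion: kubectl.config.k8s.io/v1alpha1
kind: Preference
aliases:
- name: getdbprod            # invoked as: kubectl getdbprod pods
  command: get
  options:
  - name: namespace
    default: db-prod
defaults:
- command: delete
  options:
  - name: interactive        # always prompt before deleting
    default: "true"
```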
Kubernetes 1.36 Networking
#5311 Relaxed validation for Services names
Stage: Graduating to Beta
Feature group: sig-network
This relaxed ServiceName validation feature is an improvement designed to bring the naming constraints of Service resources into alignment with other standard Kubernetes objects. Historically, Services were restricted by a stricter DNS-1035 label standard (NameIsDNS1035Label), which prohibited names from starting with a numeric digit. This update transitions the validation to the more flexible NameIsDNSLabel standard, finally allowing users to create Services with names like 123-backend or 8080-proxy. In the transition from Alpha to Beta, the feature is moving from an opt-in experimental state to being enabled by default in the kube-apiserver. This graduation signifies that the community has successfully completed compatibility testing with downstream systems like DNS providers and Ingress controllers, and has verified that the change remains safe during cluster upgrades and downgrades by maintaining immutability checks on existing resource names.
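With the feature enabled by default in Beta, a manifest like this (previously rejected with a DNS-1035 validation error) is now accepted:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: 8080-proxy    # leading digit: invalid under DNS-1035, valid as a DNS label
spec:
  selector:
    app: proxy
  ports:
  - port: 8080
    targetPort: 8080
```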
#3695 Extend PodResources to include resources from DRA
Stage: Graduating to Stable
Feature group: sig-node
Feature gate: KubeletPodResourcesGet Default value: enabled
Feature gate: KubeletPodResourcesDynamicResources Default value: enabled
This enhancement expands the Kubelet PodResources API to bridge the visibility gap between the Kubernetes node and external monitoring tools. Historically, this API only allowed monitoring agents to see which CPUs or standard devices were assigned to a container. This feature extends that capability to include resources managed by the evolving DRA feature. By adding a dynamic_resources field and a new Get() method for targeted pod queries, the Kubelet can now report specific DRA-allocated hardware (such as GPUs or FPGAs) directly to monitoring stacks like Prometheus or NVIDIA's DCGM exporter. To ensure this data remains available even if a DRA driver goes offline, the system implements a checkpointing mechanism within the DRAManager, guaranteeing that resource assignments are preserved across Kubelet restarts.
The primary benefit of this enhancement is granular observability for next-gen hardware. Platform operators can now access per-pod metrics for DRA-managed resources, which is essential for accurate billing, performance tuning, and troubleshooting in AI/ML workloads. Since the feature is graduating to Stable in v1.36, the community can expect full production-ready reliability with a guaranteed 99.9% success rate for Get and List requests over a rolling 5-minute window as well as low-latency response times (P99 < 100ms). The graduation to GA means the KubeletPodResourcesDynamicResources and KubeletPodResourcesGet feature gates will be locked to on by default, allowing third-party ecosystem tools to officially rely on stable gRPC fields without the risk of breaking changes or experimental instability.
#4858 IP/CIDR validation improvements
Stage: Graduating to Beta
Feature group: sig-network
This is a security-focused refinement of how Kubernetes handles network addressing, specifically targeting stricter validation of IP addresses and CIDR strings across the API. Historically, Kubernetes relied on older Go functions that were overly permissive, allowing ambiguous formats like IPv4 addresses with leading zeros (for example, 172.030.099.099), which different systems might interpret differently (octal vs. decimal), potentially leading to security vulnerabilities like CVE-2021-29923. For platform teams, this is highly beneficial because it eliminates a whole class of IP-parsing risks and ensures consistency between Kubernetes, underlying network plugins (such as Calico and Cilium), and OS-level libraries. To maintain stability, the plan uses ratcheting validation, which prevents new invalid entries without breaking existing workloads; teams can update a Service’s labels, for example, without being forced to immediately fix a legacy IP field. This move toward canonical formats (like standardized IPv6 strings) simplifies long-term maintenance, auditing, and observability, as platform engineers can now trust that network data is unambiguous and follows a single canonical format across the entire cluster.
Kubernetes 1.36 Authentication
#740 API for external signing of Service Account tokens
Stage: Graduating to Stable
Feature group: sig-auth
This KEP proposes a transition from static, file-based Service Account key management to a more dynamic integration with external providers like Hardware Security Modules (HSMs) and Cloud KMS. Currently, the kube-apiserver loads signing keys from the local disk at startup, a method that lacks flexibility and poses a security risk, as any user with filesystem access could exfiltrate the signing material. By introducing a new gRPC-based API (ExternalJWTSigner) accessible via a Unix domain socket, Kubernetes can delegate JWT signing and public key retrieval to an external plugin. This shift enables seamless key rotation without requiring an API server restart and ensures that sensitive private keys never reside in the cluster’s local memory or storage.
To maintain consistency and security, the kube-apiserver remains responsible for assembling the JWT claims and headers, while the external signer only provides the signature. This architecture prevents claim divergence and ensures that externally signed tokens remain compatible with existing OIDC discovery endpoints. While the proposal introduces a new --service-account-signing-endpoint flag, it preserves backward compatibility by keeping existing file-based flags as mutually exclusive options. Potential performance overhead from socket communication will be mitigated through benchmarking, and access to the signing socket will be restricted via standard Unix file permissions to prevent unauthorized token generation.
#5681 Conditional authorization
Stage: Net New to Alpha
Feature group: sig-auth
Kubernetes 1.36 introduces a brand-new conditional authorization capability, which is a framework that allows authorization decisions to depend on the actual data within a resource (like specific fields or labels) rather than just its metadata. Currently, Kubernetes authorizers (like RBAC) are basically blind to the content of a request body. This KEP bridges that gap by using a two-phase evaluation process.
- In the first phase, the authorizer performs partial evaluation: if it can't reach a final Allow or Deny based on metadata alone, it returns a Conditional response containing a set of requirements (for example, "only allow if storageClassName is 'dev'").
- In the second phase, these conditions are enforced during the Validating Admission stage, where the API server finally decodes the object and has access to the necessary field data to make a concrete decision.
This architecture solves several long-standing limitations by providing a unified, cohesive policy model that spans both authorization and admission. It eliminates the need for administrators to essentially over-grant their permissions in RBAC only to restrict them later with separate tools like ValidatingAdmissionPolicy.
By propagating conditions through the request chain, the system ensures atomicity (the decision is based on a single snapshot of policy) and improves user experience, as users can see exactly why they are restricted via kubectl auth can-i lookups. Concretely, the enhancement extends the SubjectAccessReview API and introduces an AuthorizationConditionsReview API, enabling both in-tree and out-of-tree authorizers to implement fine-grained controls, like restricting which fields a user can update or which signerName a CSR can use, without compromising the API server's performance or security.
#5284 Constrained impersonation
Stage: Graduating to Beta
Feature group: sig-auth
Constrained impersonation helps Kubernetes move away from the unrestricted legacy model where an impersonator automatically inherits all permissions of the target user. To mitigate these security risks (especially for controllers and per-node agents) impersonators must now possess two distinct sets of permissions:
- The authority to impersonate a specific identity (impersonate:user-info)
- The right to perform specific actions on behalf of that identity (impersonate-on:user-info:list)
This opt-in mechanism ensures that even if a controller is compromised, its impersonation power is limited to specific resources and verbs. In the below Go example, a controller can be restricted to impersonating only the specific node it is running on by using the downward API to identify itself and configuring the client as follows:
```go
// Example: restricting a controller to impersonating its own node.
// MY_NODE_NAME is injected into the container via the downward API.
kubeConfig, err := clientcmd.BuildConfigFromFlags("", "")
if err != nil {
	log.Fatal(err)
}
kubeConfig.Impersonate = rest.ImpersonationConfig{
	UserName: "system:node:" + os.Getenv("MY_NODE_NAME"),
}
```

By introducing prefixed verbs like impersonate-on:user-info:watch, platform engineers can now define highly specific RBAC roles. This setup ensures that an impersonator can only execute a request if the impersonator has the impersonate-on permission for the action and the impersonated principal has the underlying permission to perform the task itself.
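On the RBAC side, the pairing of permissions could look roughly like this sketch (the rule layout is an assumption based on the verbs described above; worker-1 is a placeholder node name):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: constrained-node-impersonator
rules:
# 1. Which identity may be impersonated
- apiGroups: [""]
  resources: ["users"]
  verbs: ["impersonate:user-info"]
  resourceNames: ["system:node:worker-1"]
# 2. Which actions may be performed while impersonating it
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["impersonate-on:user-info:list", "impersonate-on:user-info:watch"]
```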
#3926 Handling undecryptable resources
Stage: Graduating to Beta
Feature group: sig-auth
Kubernetes 1.36 is addressing a long-standing recovery gap where encrypted API resources become undeletable due to missing keys or data corruption. Currently, if a single object in etcd fails to decrypt or decode, listing that resource type fails entirely, forcing admins to bypass the Kubernetes API and manually manipulate the database, which is a really risky and complex process. To solve this, the proposal introduces a way to identify these broken resources and provides a new DeleteOption that allows for their removal even when their content remains unreadable.
However, this unconditional delete is a high-risk operation. Since the system cannot read the object's metadata, deleting it bypasses standard safety features like finalizers and garbage collection, potentially leaving orphaned system processes (like running Pods) behind. To mitigate this, the design includes a new StatusReasonStoreReadError for better diagnostics and requires explicit confirmation through kubectl prompts and server-side admission layers to ensure administrators understand the impact before purging malformed data.
Kubernetes 1.36 Nodes
#127 Support User Namespaces in pods
Stage: Graduating to Stable
Feature group: sig-node
This enhancement introduces support for user namespaces, a critical isolation feature that maps containerized user and group IDs to different, unprivileged IDs on the host. By allowing a process to hold root privileges within a pod while remaining unprivileged on the underlying node, the KEP significantly bolsters node-to-pod and pod-to-pod isolation. This ensures that even if a process achieves a container breakout or possesses elevated capabilities like CAP_SYS_ADMIN, its impact is strictly confined to the namespace, preventing it from exercising administrative control over the host or other pods.
The implementation specifically addresses and mitigates several high-severity vulnerabilities where container escapes or privilege escalations previously threatened host integrity. Key CVEs resolved or dampened by this update include:
- CVE-2019-5736: completely mitigates the ability to overwrite the host runc binary from a container.
- CVE-2017-1002101: fixes a critical vulnerability (CVSS: 9.6) involving volume mounts and subpaths.
- Azurescape: neutralizes the first known cross-account container takeover in a public cloud provider.
- CVE-2018-15664 & CVE-2016-8867: prevent TOCTOU race attacks and internal privilege escalation.
- CVE-2021-25741 & CVE-2021-30465: mitigate symlink- and mount-based attacks by ensuring container root is not host root.
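Opting in is a one-line change to the pod spec, and with GA the field is now stable:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false        # run the pod in a user namespace:
                          # UID 0 inside maps to an unprivileged host ID
  containers:
  - name: shell
    image: busybox:1.36
    command: ["sleep", "infinity"]
```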
#2862 Fine-grained Kubelet API authorization
Stage: Graduating to Stable
Feature group: sig-node
Feature gate: KubeletFineGrainedAuthz Default value: true
Another welcome stable update sees the general availability of fine-grained authorization for the Kubelet API to better support the principle of least privilege. Previously, the Kubelet used a coarse RBAC scheme where low-risk actions, such as reading health status (/healthz) or listing pods (/pods), required the same high-level proxy subresource permission as dangerous actions like executing arbitrary code (/exec). By introducing specific subresources for /configz, /healthz, and /pods, the proposal allows monitoring and logging agents to access necessary data without being granted over-privileged access that could be exploited for lateral movement or privilege escalation.
The implementation introduces a new feature gate, KubeletFineGrainedAuthz, and ensures backward compatibility by falling back to the original proxy check if the new granular permissions are not found. To minimize performance impacts from additional SubjectAccessReview requests, the design leverages the Kubelet’s existing authorization cache. Beyond security hardening, this change also aims to officially document these previously internal Kubelet endpoints, providing a standardized and secure way for ecosystem tools to interact with node-level data.
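In practice, a monitoring agent’s ClusterRole can now be scoped to the read-only endpoints it actually needs, along the lines of this sketch:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-monitor-readonly
rules:
- apiGroups: [""]
  resources: ["nodes/healthz", "nodes/pods", "nodes/configz"]
  verbs: ["get"]
# Note: no nodes/proxy rule, so /exec and /attach remain off-limits.
```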
#5554 Support in-place update pod resources alongside static CPU manager policy
Stage: Net New to Alpha
Feature group: sig-node
New to Alpha in 1.36, this enhancement gives platform teams a major improvement to Kubernetes resource management by enabling in-place vertical scaling for pods even when a static CPU Manager policy is active. Previously, the static policy, which grants pods exclusive access to specific CPU cores, could not reconcile real-time resource changes without restarting the container. This update integrates Topology Manager feasibility checks into the pod resize path, allowing the Kubelet to dynamically allocate or release exclusive CPU cores during an upsize or downsize event. By updating the CPUManager checkpoint format to track both original and resized allocations, the system can now perform admission-style validation during a live resize to ensure NUMA alignment and resource availability are maintained without interrupting the workload.
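A pod that qualifies for exclusive cores under the static policy, declaring that CPU resizes should happen in place, might look like this (the pod must be Guaranteed QoS with integer CPU counts; names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-worker
spec:
  containers:
  - name: worker
    image: registry.example.com/worker:latest
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired   # resize CPU in place, no container restart
    resources:
      requests:
        cpu: "4"         # integer count -> exclusive cores under static policy
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
```

A resize is then applied through the pod’s resize subresource (for example, via kubectl patch with --subresource resize), and with this enhancement the Kubelet re-runs its CPU and topology feasibility checks before committing the new exclusive core set.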
#4205 Support PSI based on cgroupv2
Stage: Graduating to Stable
Feature group: sig-node
The GA state for integration of Pressure Stall Information (PSI) in Kubernetes 1.36 enhances node monitoring by allowing the kubelet to natively ingest and expose CPU, memory, and I/O pressure metrics from cAdvisor and runc. Previously, users had to rely on external tools like Node Exporter to gain this visibility. However, this update embeds PSI data directly into the Summary API at both the node and pod levels. By providing barometer-like insights into resource shortages before they cause failures, these metrics enable more granular detection of congestion (categorized as some or full pressure) over 10, 60, and 300-second windows. This foundational change sets the stage for more proactive node management and intelligent responses to impending resource scarcity within the cluster.
To help visualize how data is now organized in the API, here’s a simplified look at a new metrics schema:
| Metric-level | Resource Types | Data Points Provided |
|---|---|---|
| NodeStats | CPU, Memory, I/O | Avg10, Avg60, Avg300 (%), and Total (ns) |
| PodStats | CPU, Memory, I/O | Some (partial stall) and Full (complete stall) |
#4265 add ProcMount option
Stage: Graduating to Stable
Feature group: sig-node
This enhancement, revitalized for K8s 1.36, graduates the long-standing ProcMountType feature to allow high-trust workloads to bypass the default security masking of the /proc filesystem. Traditionally, the Kubelet instructs container runtimes to mask or set certain /proc paths as read-only to prevent sensitive host data exposure. However, this restriction blocks critical use cases like nesting unprivileged containers or building container images within a Pod. By introducing the procMount: Unmasked field to the securityContext, users can now opt out of these defaults. Because unmasking /proc can theoretically allow a root container to modify the host kernel, this capability is strictly governed by the Privileged Pod Security Admission (PSA) level and is intended to be used in conjunction with User Namespaces to mitigate escalation risks.
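Opting out of masking is a securityContext field, and pairing it with a user namespace is the intended pattern:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nested-builder
spec:
  hostUsers: false             # user namespace contains the added risk
  containers:
  - name: builder
    image: registry.example.com/image-builder:latest   # placeholder image
    securityContext:
      procMount: Unmasked      # default is "Default" (masked /proc paths)
```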
#5109 Split L3 Cache Topology Awareness in CPU Manager
Stage: Graduating to Stable
Feature group: sig-node
This stable enhancement offers a new Kubernetes CPU Manager static policy option, prefer-align-cpus-by-uncorecache, designed to optimize workload performance on modern modular CPU architectures. While traditional processors often use a single, shared uncore (last-level) cache, newer x86 and ARM designs frequently employ a split uncore cache where subsets of cores share dedicated cache units. The current Kubelet is unaware of this hierarchy, often spreading container processes across multiple uncore caches. This leads to noisy neighbor issues and increased inter-cache latency, which can degrade performance for latency-sensitive applications like HPC, networking, and Telco functions.
The proposed feature introduces an opt-in, best-effort allocation algorithm that prioritizes grouping CPU assignments within the fewest number of uncore cache domains possible. By adding an uncorecacheId to the CPU topology awareness, the CPU Manager can now sort and allocate resources that align with these physical hardware boundaries. This optimization has demonstrated significant real-world benefits, such as an 18% performance uplift in database workloads. Importantly, the policy is designed to be non-disruptive. Basically, it integrates with existing options like full-pcpus-only and falls back to standard behavior if optimal alignment isn't possible or if the hardware doesn't support split caches, ensuring high-density and mixed-workload support remains intact.
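Enabling the option is a kubelet configuration change; a minimal fragment:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"                      # existing option, still compatible
  prefer-align-cpus-by-uncorecache: "true"     # pack CPUs into as few
                                               # uncore cache domains as possible
```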
#5328 Node declared features (formerly Node Capabilities)
Stage: Graduating to Beta
Feature group: sig-node
The Node Declared Features KEP proposed a standardized approach for nodes to automatically report their supported feature-gated capabilities directly to the control plane. Currently, managing version skew (which is the gap between a newly upgraded control plane and older Kubelet versions) relies on DevOps engineers to manually apply complex taints, labels, and selectors to ensure pods land on compatible hardware. This proposal introduces a declaredFeatures field in the node’s status, which the Kubelet populates during its bootstrap process. By making these features a first-class signal, the kube-scheduler can proactively filter out incompatible nodes, keeping pods in a Pending state with clear feedback rather than allowing them to fail with runtime errors after being scheduled.
Beyond scheduling, this framework enhances API safety and cluster stability during gradual rollouts. Admission controllers can use these declared features to validate requests, such as rejecting an in-place pod resize if the target node is running an older version that lacks support for the operation. It also allows the API server to dynamically adapt its communication protocols, such as transitioning from SPDY to WebSockets only when a node confirms it is capable. Ultimately, this mechanism reduces operational overhead and increases workload portability by replacing inconsistent, provider-specific manual configurations with a native, automated lifecycle for graduating Kubernetes features.
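The exact surface is still settling as the feature moves through Beta, but conceptually the Kubelet publishes something like the following in the node object (the field layout is illustrative of the KEP's intent, and the feature names are hypothetical examples):

```yaml
# Fragment of a Node object; shape is illustrative, not the final API
status:
  declaredFeatures:
  - InPlacePodVerticalScaling
  - PodLevelResources
```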
#5394 PSI-based node conditions
Stage: Net New to Alpha
Feature group: sig-node
This Alpha-stage enhancement introduces Pressure Stall Information (PSI)-based node conditions to Kubernetes, a feature that has been somewhat in limbo since the 1.34 release. Originally split from KEP-4205 to allow for independent tracking, this enhancement leverages cgroupv2 to provide more granular visibility into resource contention. By exposing PSI metrics, the kubelet can now identify and report when a node is experiencing significant pressure on memory and I/O resources. Notably, the implementation distinguishes between these incompressible resources and CPU PSI, which, due to its compressible nature, is slated as a requirement for the Beta phase rather than being used for immediate node tainting in the Alpha stage. This significantly improves the ability of the scheduler and operators to respond to resource exhaustion before a node becomes completely unresponsive.
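In practice this should surface as richer node conditions. A node under sustained memory stall time might report something like the following (the reason and message strings here are illustrative of the Alpha design, not a final API):

```yaml
# Fragment of a Node object's status; values are illustrative
status:
  conditions:
  - type: MemoryPressure
    status: "True"
    reason: PSIMemoryPressure      # hypothetical reason derived from cgroupv2 PSI averages
    message: "memory stall time above the configured threshold"
```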
#5825 CRI List streaming
Stage: Net New to Alpha
Feature group: sig-node
The core of this Alpha enhancement is transforming how CRI handles List operations (such as ListContainers and ListPodSandbox), moving away from a single, bulky all-or-nothing response toward a continuous stream of data. By transitioning to server-side streaming, the List* operations can effectively drip-feed container and pod data to the kubelet. This prevents the system from hitting the hard 16 MB gRPC message limit when dealing with massive datasets, such as nodes managing over 10,000 containers, while ensuring the kubelet still receives the full list it needs to function.
#5419 Pod-level resources support with in-place pod vertical scaling
Stage: Graduating to Beta
Feature group: sig-node
This enhancement extends In-Place Pod Resize (IPPR) to support aggregate resource specifications at the Pod level, building on the foundation of KEP-2837. Currently, IPPR is limited to container-level adjustments, often forcing pod recreations to change the overall resource footprint. By enabling dynamic, in-place scaling of pod-level CPU and memory, operators can improve cluster utilization and reduce operational overhead without disrupting running services. This enhancement is specifically designed for cgroupv2 environments and was originally introduced as an opt-in alpha feature in Kubernetes v1.34 under the InPlacePodLevelResourcesVerticalScaling feature gate; it now graduates to Beta.
The design ensures consistency with existing IPPR workflows while introducing new tracking mechanisms in PodStatus to reflect actual allocated resources. While the proposal offers significant flexibility for multi-container pods, it maintains a strict scope: it does not cover non-compute resources (like GPUs), QoS class changes, or the removal of lower-priority pods to facilitate resizing. Potential risks, such as scheduler race conditions and impacts on tools that rely on legacy cgroup derived values, are mitigated through gradual rollouts (Alpha to GA) and comprehensive documentation of the new resource calculation methods.
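For context, a pod-level resource specification (from KEP-2837) that this enhancement now lets you resize in place looks like the following; all containers draw from the shared pod-level budget:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-demo
spec:
  resources:           # pod-level budget shared by every container below
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```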
Node-specific DRA enhancements in 1.36
Dynamic Resource Allocation (DRA) is causing a seismic shift within Kubernetes. As there is an increasing push for Kubernetes to support dedicated GPU hardware from vendors like NVIDIA for AI and LLM workloads, we’ll be seeing more Node-related changes to accommodate DRA in the kinds of production Kubernetes environments that power services like ChatGPT.
#5304 Device Attributes in Downward API
Stage: Net New to Alpha
Feature group: sig-node
Feature gate: DRADownwardDeviceAttributes Default value: disabled
The Device Attributes in Downward API is a net-new enhancement entering Alpha in Kubernetes 1.36, designed to bridge the metadata gap between hardware providers and containerized workloads. This feature allows Pods to self-discover specific attributes of allocated devices, like hardware IDs, NUMA affinity, or custom vendor strings, by resolving them directly from ResourceSlices and surfacing them into the Pod's environment or volumes. By utilizing a new DRADeviceFieldRef structure, the system can handle complex allocation scenarios, including single-device indexing or concatenating attributes from multiple devices into a single comma-separated string for easier application consumption.
The implementation is unique because it operates as a framework-level opt-in rather than a core Kubernetes API change, requiring DRA drivers to explicitly enable the metadata feature through the kubeletplugin library. To ensure operational stability and security, the design incorporates unique identifiers like the claimUID in host file paths to prevent metadata collisions between different incarnations of the same resource claim. Furthermore, the enhancement includes a dedicated command-line flag within the DRA driver framework, giving administrators and developers a straightforward mechanism to disable the metadata generation code path without needing to refactor the driver’s core logic if issues arise during the Alpha phase.
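Since this is Alpha and framework-level, the exact pod-facing syntax may shift, but the KEP's DRADeviceFieldRef concept implies something along these lines (the field names and attribute key here are hypothetical illustrations, not a confirmed API):

```yaml
# Illustrative only: surfacing an allocated device's attribute as an env var
env:
- name: GPU_UUID
  valueFrom:
    draDeviceFieldRef:                  # hypothetical field per the KEP's DRADeviceFieldRef
      requestName: gpu                  # which request in the pod's ResourceClaim
      attributeName: gpu.example.com/uuid
```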
#5018 AdminAccess for ResourceClaims and ResourceClaimTemplates
Stage: Graduating to Stable
Feature group: sig-node
The DRAAdminAccess feature, graduating to General Availability (GA) in Kubernetes 1.36, solves a critical operational bottleneck in DRA by providing a secure, privileged monitoring path for hardware devices. Historically, DRA was designed for exclusive workload access, so once a GPU or FPGA was claimed by a user it basically became a black box to platform engineers. DRAAdminAccess introduces a formal mechanism for cluster admins to bypass standard allocation logic, allowing them to deploy monitoring agents or diagnostic tools to devices already in use without displacing the primary workload or violating security boundaries.
This graduation to Stable status matters because it transitions the feature from an experimental toggle to a core, production-ready standard for hardware management. By enforcing a strict security model that requires both an adminAccess flag in the ResourceClaim and a specific label on the namespace (resource.kubernetes.io/admin-access: "true"), it ultimately prevents non-privileged users from snooping on shared hardware. For dev teams running large-scale AI or HPC clusters, this provides a really powerful capability for real-time health telemetry and troubleshooting, ensuring expensive hardware isn't just utilized, but also properly maintained and observable.
To use this feature, the Kubernetes admin first labels a secure namespace and then creates a ResourceClaim with the adminAccess field set to true. This allows the pod to attach to devices even if they are already allocated to other users.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
name: gpu-monitoring-claim
# The namespace must have the label: resource.kubernetes.io/admin-access: "true"
namespace: kube-system
spec:
devices:
requests:
- deviceClassName: nvidia-gpu-class
# This flag bypasses standard allocation checks in the scheduler
adminAccess: true
#5677 Resource Availability Visibility
Stage: Net New to Alpha
Feature group: sig-node
Resource Availability Visibility addresses yet another critical transparency gap in the framework by providing a standardized way to query the real-time availability of hardware resources like GPUs, FPGAs, and NICs. This enhancement introduces a new API object, the ResourcePoolStatusRequest, which functions similarly to a Certificate Signing Request (CSR) in that a user creates the object to ask the cluster about available resources, and a controller populates the status with calculated data before eventually being cleaned up.
Users should care because, prior to this, determining whether a cluster actually had the specialized hardware capacity to run a complex workload was often a trial-and-error guessing game, especially in multi-tenant environments. By providing a principled, bounded view of resource pools, this feature enables developers and autoscalers to make informed scheduling decisions, reduces pod pending times caused by resource exhaustion, and offers much-needed observability for administrators managing high-performance computing hardware.
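Given the CSR-style flow described above, usage would plausibly look like the following (the API group/version and field names are assumptions for an Alpha object and may differ from the final shape):

```yaml
apiVersion: resource.k8s.io/v1alpha1    # assumed Alpha group/version
kind: ResourcePoolStatusRequest
metadata:
  name: gpu-availability
spec:
  # Illustrative: scope the query to one driver's resource pools
  driverName: gpu.example.com
# A controller then populates .status with calculated availability
# before the object is eventually cleaned up, as with CSRs.
```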
Scheduling in Kubernetes 1.36
#5710 Workload-aware preemption
Stage: Net New to Alpha
Feature group: sig-scheduling
This enhancement transitions the Kubernetes scheduler to a workload-centric view to better support modern AI computing requirements. Historically, if a high-priority task needed resources, the scheduler might kill just one or two pods from a lower-priority job. For tightly-coupled workloads like AI training or multi-host inference, losing even a single pod often renders the entire job useless, leaving the remaining pods to sit idle and waste expensive GPU resources. This KEP introduces the PodGroup API and a DisruptionMode setting, which allows the scheduler to recognize these dependencies. By treating a group of pods as a single preemption unit, the scheduler ensures that if it must reclaim resources, it preempts the entire workload at once, preventing the creation of zombie jobs and maximizing the functional utilization of the cluster.
For platform teams, this Alpha stage enhancement provides the foundational building blocks to manage capacity-constrained environments with much higher precision. It introduces a delayed preemption logic that prevents the unnecessary disruption of running workloads until the scheduler is certain that the new, higher-priority workload can actually be bound and started. While this stage focuses on simple implementations and API standardization, it allows platform engineers to start defining priorities at the workload level rather than managing them pod-by-pod. This standardization is critical for teams running mixed-workload clusters, as it paves the way for tighter integration with autoscaling and disruption budgets, ultimately ensuring that high-value AI training and inference jobs meet their SLOs without manual intervention or custom, external scheduling logic.
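Since the API is still Alpha, its exact shape may change, but a gang-scheduled group with workload-level preemption semantics might look something like this (the API version and the DisruptionMode value are illustrative assumptions based on the KEP's concepts):

```yaml
apiVersion: scheduling.k8s.io/v1alpha1   # assumed Alpha version
kind: PodGroup
metadata:
  name: training-job
spec:
  schedulingPolicy:
    gang:
      minCount: 8          # all-or-nothing: schedule 8 pods or none
  disruptionMode: Group    # hypothetical value: preempt the whole group, never a subset
```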
#5732 Topology-aware workload scheduling
Stage: Net New to Alpha
Feature group: sig-scheduling
By embedding Topology and DRA awareness directly into the kube-scheduler, the system can now evaluate Placements, which are subsets of the cluster like specific racks or interconnected hardware blocks, as a single unit. Instead of scheduling pods one-by-one and hoping they end up near each other, the scheduler simulates the placement of the entire pod group within candidate domains. This ensures that high-performance workloads are co-located to meet strict low-latency and high-bandwidth requirements without relying on external, less-integrated tools.
For platform teams, this is a critical evolution for two reasons:
- Performance reliability and cost efficiency. AI/ML workloads, particularly distributed training, are notoriously sensitive to network jitter/latency; if a single pod in a training job is scheduled across a slow network hop, it can bottleneck the entire GPU cluster, wasting expensive compute cycles.
- Deep scheduler integration. By utilizing TopologyConstraints and DRAConstraints, platform teams can guarantee that interconnected accelerators and their respective pods are physically co-located on the same rack or PCIe switch. This deep integration into the core kube-scheduler reduces operational complexity by eliminating the need for third-party schedulers, ensuring that AI workloads achieve maximum hardware utilization and predictable training times right out of the box.
#5832 Decouple PodGroup from Workload API
Stage: Net New to Alpha
Feature group: sig-scheduling
This KEP proposes transitioning the PodGroup API into a standalone runtime object for v1alpha2, decoupling it from the Workload API. Previously, PodGroups were embedded within the Workload spec, which led to significant scalability issues (such as hitting the 1.5MB etcd object limit) as well as architectural friction between long-lived config intent and transient scheduling units. By separating them, the Workload object remains a static template for scheduling policies, while the PodGroup acts as a self-contained, controller-owned unit that tracks its own runtime status and lifecycle.
This separation of concerns allows for better garbage collection of associated resources (like ResourceClaims) and reduces API contention. Under this new model, controllers like Job or JobSet automatically create PodGroups based on a podGroupTemplate defined in the referenced Workload. The proposed structure for the standalone object is as follows:
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
name: pd-1
namespace: ns-1
spec:
podGroupTemplateRef:
workloadName: training-workload
podGroupTemplateName: pd-1-template
schedulingPolicy:
gang:
minCount: 2
status:
conditions:
- type: PodGroupScheduled
status: "True"
Storage DRA enhancements in 1.36
Normally this falls under sig-scheduling, but due to the rapid feature development and storage-specific enhancements related specifically to DRA, I’ve decided to break this out into its own individual section for the 1.36 release – similar to what we did with the node-specific DRA breakout.
#4815 Partitionable Devices
Stage: Graduating to Beta
Feature group: sig-scheduling
This storage-specific DRA enhancement focuses on dynamic device partitioning using structured parameters. While traditional device plugins require hardware to be partitioned into fixed sizes before a task starts, the new DRA framework allows a vendor to advertise overlapping potential partitions. This means the hardware (GPUs and TPUs) stays as a single bag of resources until a workload is actually scheduled, at which point the scheduler dynamically carves out the specific slice needed.
This approach is highly beneficial because it significantly increases resource utilization and scheduling flexibility. By using a new construct called Counter Sets, the scheduler can track shared physical resources (like memory slices or compute engines) across multiple potential configurations. This prevents fragmentation where a pre-partitioned device sits idle because its fixed size doesn't match any incoming requests. It also enables multi-host scheduling, allowing logical devices (such as interconnected TPU clusters) to be treated as a single allocatable unit across multiple nodes while ensuring the underlying topology remains valid.
The following code snippet demonstrates how a user can request specific, non-overlapping partitions of a GPU (like NVIDIA MIG devices) by defining multiple requests within a single ResourceClaim:
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
name: mig-devices
spec:
devices:
requests:
- name: mig-1g-5gb-0
exactly:
deviceClassName: mig.nvidia.com
selectors:
- cel:
expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'"
- name: mig-2g-10gb
exactly:
deviceClassName: mig.nvidia.com
selectors:
- cel:
expression: "device.attributes['gpu.nvidia.com'].profile == '2g.10gb'"
constraints:
- requests: ["mig-1g-5gb-0", "mig-2g-10gb"]
matchAttribute: "gpu.nvidia.com/parentUUID"
#4816 Prioritized Alternatives in Device Requests
Stage: Graduating to Stable
Feature group: sig-scheduling
The prioritized list enhancement for DRA is a significant evolution in how Kubernetes handles specialized hardware through a flexible preferences model. Previously, if a workload requested a specific GPU that was unavailable, the pod would simply fail to schedule. This feature introduces the FirstAvailable field, allowing teams to define an ordered list of fallback options. Essentially, it allows a developer to say: "I prefer an H100 GPU, but if those are gone, I'll take an A100, or even two T4s" all within a single ResourceClaim.
For teams deploying AI workloads, this matters because it drastically improves resource obtainability and deployment portability. In the current landscape of GPU scarcity, waiting for a specific chip can stall CI/CD pipelines or production scaling. By providing plan B and C options, AI engineers ensure their training or inference jobs actually start, even if they run on slightly less optimal hardware. Furthermore, it simplifies life for MLOps teams who distribute shared manifests; they can now create a single configuration that works across different clusters with varying hardware availability without requiring users to manually edit YAML files for every environment.
The following snippet demonstrates how to request a high-end GPU with a prioritized fallback to multiple mid-tier GPUs. Note how the config specifically targets the fallback sub-request to ensure the application environment is adjusted only when those specific devices are selected.
apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
name: ai-workload-gpu-claim
spec:
devices:
requests:
- name: gpu-request
# The scheduler tries these in order:
firstAvailable:
- name: ultra-gpu
deviceClassName: nvidia-h100
count: 1
- name: standard-gpu
deviceClassName: nvidia-a100
count: 1
- name: fallback-gpu
deviceClassName: nvidia-t4
count: 2 # Requesting more of a weaker device to compensate
# Specific configuration that only applies if the 'fallback-gpu' is chosen
config:
- requests: ["gpu-request/fallback-gpu"]
opaque:
driver: gpu.example.com
parameters:
apiVersion: gpu.example.com/v1
kind: GPUConfig
optimizationLevel: "high-memory"
#5007 Device BindingConditions
Stage: Graduating to Beta
Feature group: sig-scheduling
The BindingConditions feature is a significant update to the Kubernetes DRA framework designed to handle slow-to-ready hardware. Historically, the Kubernetes scheduler assumes that once a Pod is assigned to a node, any required resources are immediately available. However, modern infrastructure like Composable Disaggregated Infrastructure (CDI) often uses fabric-attached GPUs or FPGAs that require time-consuming steps like physical attachment over a network fabric, PCIe switching, or firmware reprogramming before they can actually be used.
This enhancement introduces a wait-and-see phase in the scheduling process called Readiness-Aware Binding. Instead of blindly binding a Pod to a node and hoping for the best, the scheduler checks specific BindingConditions (such as the is-prepared condition). If the resource isn't ready, the scheduler defers the final binding. Now, if it fails to prepare (like a hardware error), it would then trigger a BindingFailureCondition, which allows the pod to be safely rescheduled elsewhere without getting stuck in a CrashLoopBackOff scenario.
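As a rough sketch (field names follow the KEP's alpha shape and may change; the driver and condition names are illustrative), a driver advertising a fabric-attached device would publish the conditions the scheduler must wait on:

```yaml
# Fragment of a ResourceSlice device entry for a fabric-attached GPU
devices:
- name: fabric-gpu-0
  bindingConditions:
  - example.com/is-prepared        # scheduler defers binding until this reports True
  bindingFailureConditions:
  - example.com/attach-failed      # triggers safe rescheduling instead of a stuck pod
```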
#5055 Device Taints & Tolerations
Stage: Graduating to Beta
Feature group: sig-scheduling
This KEP proposes an extension to DRA that introduces a tainting mechanism for GPU hardware devices, modeled after the existing Kubernetes node taints. Under this proposal, DRA drivers or cluster administrators (via a new DeviceTaintRule API) can mark specific devices as unhealthy or restricted. This allows for granular maintenance, such as taking a single GPU or accelerator offline without impacting the rest of the node, as well as providing a standard way for hardware to report degraded states, such as overheating, without immediately failing workloads. The system supports two primary effects:
- NoSchedule prevents new pods from using the device.
- NoExecute triggers the eviction of currently running pods.
Users can bypass these restrictions by adding tolerations directly to their ResourceClaim, allowing for specific scenarios like running diagnostic test pods on a tainted device. By decoupling device health from node status, this feature enables safer pod evictions and more resilient cluster management, ensuring that only workloads capable of handling specific hardware conditions are scheduled onto affected resources.
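Putting the two halves together, a maintenance workflow might look like this: an admin taints a single device via a DeviceTaintRule, and a diagnostic ResourceClaim tolerates it (the API version follows the pre-Beta shape and may differ in your cluster; the driver, pool, and device names are illustrative):

```yaml
apiVersion: resource.k8s.io/v1alpha3     # version may differ as the feature graduates
kind: DeviceTaintRule
metadata:
  name: gpu-0-maintenance
spec:
  deviceSelector:
    driver: gpu.example.com
    pool: node-1
    device: gpu-0
  taint:
    key: example.com/maintenance
    value: scheduled
    effect: NoExecute        # evicts running pods that don't tolerate the taint
---
# Fragment of a diagnostic ResourceClaim request that tolerates the taint
tolerations:
- key: example.com/maintenance
  operator: Exists
  effect: NoExecute
```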
#5004 Handle extended resource requests via DRA Driver
Stage: Graduating to Beta
Feature group: sig-scheduling
This Kubernetes 1.36 enhancement introduces a bridge between the traditional Extended Resources (simple, integer-based requests) and the newer DRA (which is flexible but more complex). Historically, using advanced hardware like GPUs required choosing between the easy-to-use Device Plugin model or the feature-rich DRA model. This KEP allows cluster administrators to advertise resources managed by DRA drivers as Extended Resources. This means developers can keep using simple resources.limits in their Pod specs while the backend leverages DRA's sophisticated resource tracking and allocation logic.
The primary benefit is a seamless transition path. It allows a single cluster to have a mix of nodes, with some using legacy device plugins and others using DRA, all without requiring any changes to existing Deployment manifests. When a Pod requests an extended resource (like example.com/gpu: 1), the scheduler can now satisfy that request using either a traditional node capacity or a DRA ResourceSlice. If a DRA node is chosen, the scheduler automatically handles the heavy lifting by creating a ResourceClaim to track the allocation, ensuring that the resource is reserved and properly mapped to the container.
The DeviceClass now acts as the link, mapping a specific class of DRA-managed hardware to an Extended Resource name, such as in the below example:
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
name: gpu.example.com
spec:
selectors:
- cel:
expression: device.driver == 'gpu.example.com' && device.attributes['gpu.example.com'].type == 'gpu'
# This field bridges DRA to the simple Extended Resource name
extendedResourceName: example.com/gpu
Existing applications can remain completely unaware of the underlying DRA architecture, using the same familiar syntax they have used for years.
apiVersion: apps/v1
kind: Deployment
metadata:
name: demo
spec:
replicas: 1
selector:
matchLabels:
app: demo
template:
metadata:
labels:
app: demo
spec:
containers:
- name: demo
image: nvidia/cuda:8.0-runtime
command: ["/bin/sh", "-c"]
args: ["nvidia-smi && tail -f /dev/null"]
resources:
limits:
# The app asks for a simple integer
# K8s decides if this comes from a Device Plugin or a DRA ResourceSlice.
example.com/gpu: 1
#5491 List types for attributes
Stage: Net New to Alpha
Feature group: sig-scheduling
This KEP enhances the DRA API by introducing support for list-typed attributes inside ResourceSlice objects. Currently, device characteristics are limited to scalar values, which are insufficient for representing complex hardware topologies where a single device might relate to multiple entities, such as a CPU adjacent to multiple PCIe roots or NUMA nodes. By allowing attributes to be lists of strings, integers, booleans, or versions, the API can more accurately model modern hardware relationships.
To support this change, the proposal redefines the semantics of matchAttribute and distinctAttribute constraints within a ResourceClaim. Specifically, matchAttribute now requires a non-empty intersection between sets of attributes across candidate devices, while distinctAttribute requires them to be pairwise disjoint. To ensure backward compatibility, scalar values are treated as single-element lists. This transition preserves the monotonicity required by the allocator’s algorithms, ultimately ensuring computational complexity remains bounded while also introducing a type-agnostic .include helper function for CEL expressions to simplify the migration for driver developers and users.
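As a sketch of the Alpha shape (the exact field spelling is an assumption), a ResourceSlice could advertise a device adjacent to two NUMA nodes like so:

```yaml
# Fragment of a ResourceSlice device entry; the list-typed form is illustrative
attributes:
  numaNodes:
    ints: [0, 1]      # previously only a single scalar value was expressible
```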
#5075 Consumable capacity
Stage: Graduating to Beta
Feature group: sig-scheduling
This specific DRA-related KEP introduces a framework for multi-allocatable devices, moving beyond the previous model of strictly-exclusive / dedicated device assignments. Under this new logic, independent resource claims from unrelated pods (even those across different namespaces) can allocate specific shares of the same underlying hardware. This is managed through a consumable capacity model: the DRA scheduler tracks a device’s total capacity and ensures that the sum of all active claims remains within limits, while also enforcing requestPolicy rules like minimum or maximum per-claim allocations.
To implement this, the KEP introduces several technical fields, including an AllowMultipleAllocations property to identify sharable hardware and a ConsumedCapacity field to track usage in allocation results. It also provides a distinct attribute constraint to prevent a single claim from accidentally grabbing the same multi-allocatable device twice. This is particularly vital for networking (like sharing a physical NIC via CNI) and/or GPU virtualization, where users need to reserve specific fractions of memory or bandwidth without requiring the platform teams to pre-define a massive number of static partitions.
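A hedged sketch of what a fractional request could look like under this model (the capacity name and exact request syntax are illustrative of the Beta shape, not a confirmed API):

```yaml
# Fragment of a ResourceClaim requesting a share of a multi-allocatable NIC
requests:
- name: nic-share
  exactly:
    deviceClassName: nic.example.com
    capacity:
      requests:
        bandwidth: 5Gi    # hypothetical capacity; the driver's requestPolicy bounds apply
```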
#5729 ResourceClaim support for workloads
Stage: Net New to Alpha
Feature group: sig-scheduling
The proposed enhancement to the Workload and PodGroup APIs introduces a mechanism to associate ResourceClaims and ResourceClaimTemplates directly with PodGroups rather than individual Pods. This addresses a critical scalability bottleneck in Kubernetes DRA: the current 256-entry limit on a ResourceClaim’s status.reservedFor list. By reserving a claim for an entire PodGroup, large-scale AI/ML workloads can share a single topological resource, such as a high-speed network fabric or a GPU cluster, across hundreds or thousands of Pods. Additionally, referencing a ResourceClaimTemplate at the PodGroup level allows for the automatic, consistent generation of claims for replicated groups, simplifying lifecycle management for high-level controllers like JobSet and LeaderWorkerSet.
The following snippet illustrates how these new fields allow a workload to define shared DRA devices at the group level, ensuring that all Pods within a specific logic unit are scheduled within the same topological boundary:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
name: pg-claim-template
namespace: default
spec:
spec:
devices:
requests:
- name: my-device
exactly:
deviceClassName: example
---
apiVersion: example.com/v1
kind: MyWorkload
metadata:
name: my-workload
namespace: default
spec:
...
This architectural change moves the responsibility of tracking resource consumers from individual Pod entries to the group identity. While this introduces a slight memory overhead for the device_taint_eviction controller (which must now index Pods via their group associations), it significantly enhances the ability of DRA to orchestrate multi-node logical devices. By decoupling the claim lifecycle from individual Pod names, Kubernetes can now natively support the strict topological constraints required for high-performance distributed training and complex infrastructure reprogramming.
#5517 ResourceClaim support for workloads
Stage: Net New to Alpha
Feature group: sig-scheduling
Implementing DRA drivers for primary system assets (such as dra-driver-cpu) allows for precise resource orchestration but triggers a significant discrepancy in usage tracking. Because the default kube-scheduler accounting logic does not communicate with the DynamicResources plugin, the two systems operate in isolation, creating a high risk of node oversubscription.
While this issue mirrors the challenges faced by DRA Extended Resources (#5004), the existing fix is not a direct fit due to how discovery works:
- Firstly, Extended Resources are broadcast through either node.status.allocatable or a ResourceSlice, but never both at once.
- Secondly, Core Resources are assets like CPU that are permanently defined in node.status.allocatable. A DRA driver, however, would simultaneously track these same assets via ResourceSlice.
This dual-representation of the same physical hardware creates a sync gap that justifies the need for a more integrated, unified accounting framework to prevent future scheduling conflicts.
Storage in Kubernetes 1.36
#1710 Speed up recursive SELinux label change
Stage: Graduating to Stable
Feature group: sig-storage
Graduating to stable in Kubernetes 1.36, this KEP addresses a long-standing performance bottleneck in container storage. Historically, when a Pod starts on a system with SELinux (like RHEL or Fedora), the container runtime must recursively visit every single file and directory on a volume to apply a security label, which is a process that is agonizingly slow for volumes containing millions of files and can lead to out of space errors or Pod startup timeouts. By graduating this to stable, Kubernetes is officially changing the default behavior to use the Linux kernel’s -o context mount option. This allows the system to assign the correct security context to the entire volume at the mount level instantly, bypassing the need for a file-by-file recursive walk and significantly decreasing Pod startup times.
The move to stable in 1.36 is critical because it standardizes the safer, faster, and more predictable security approach across the ecosystem. Beyond performance, it improves security by preventing relabeling attacks (where a compromised Pod might trick the system into relabeling host files) and enables better support for read-only volumes that previously couldn't be relabeled. While this change introduces a strict requirement that all Pods sharing a volume on the same node must use the same SELinux label, Kubernetes provides a clear migration path via the SELinuxChangePolicy field in the PodSpec. This allows users with complex edge cases such as mixing privileged and unprivileged Pods on the same volume to explicitly opt back into the old recursive behavior, ensuring that the performance gains of the stable release don't come at the cost of breaking existing, specialized workloads.
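For workloads that genuinely need the old behavior, the escape hatch mentioned above is a one-line PodSpec field:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-relabel-pod
spec:
  securityContext:
    # Opt back into file-by-file relabeling for mixed-label shared volumes
    seLinuxChangePolicy: Recursive
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```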
#3314 CSI Differential Snapshot for Block Volumes
Stage: Graduating to Beta
Feature group: sig-storage
This enhancement proposes a new, optional CSI SnapshotMetadata API designed to bring efficient, cloud-native differential backup capabilities to Kubernetes. By implementing Changed Block Tracking (CBT), the API allows backup apps to identify only the specific data blocks that have changed between two snapshots (or all allocated blocks in a single snapshot). This avoids the resource-heavy process of backing up entire volumes, significantly reducing storage and network overhead.
To ensure scalability and performance, the design utilizes a proxy sidecar (external-snapshot-metadata) that handles communication between the backup client and the CSI driver. This architecture allows large volumes of snapshot metadata to be streamed directly via a TLS-secured gRPC connection, effectively bypassing the Kubernetes API server to prevent it from being overloaded. Security is maintained through a robust model using Kubernetes-scoped authentication tokens, ensuring that only authorized backup applications can access sensitive volume metadata while keeping the implementation flexible for various storage providers.
#3476 VolumeGroupSnapshot
Stage: Graduating to Stable
Feature group: sig-storage
Feature gate: VolumeGroupSnapshot Default value: disabled
This feature introduces the VolumeGroupSnapshot API for Kubernetes, designed to solve the problem of write-order consistency across multiple volumes. While the existing VolumeSnapshot API handles individual volumes, applications like databases often spread data and logs across different disks; snapshotting these at different times can lead to corrupted data upon restoration. This new feature allows users to group multiple Persistent Volume Claims (PVCs) together using a label selector and trigger a coordinated snapshot that captures all volumes at the exact same point-in-time, ensuring a crash-consistent state without necessarily requiring the application to be paused (quiesced).
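In practice, the grouping is expressed with a label selector on the new object; a minimal sketch, assuming the v1beta1 API served by recent external snapshot components (names and class are illustrative):

apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  name: db-group-snapshot
  namespace: default
spec:
  volumeGroupSnapshotClassName: csi-group-snap-class
  source:
    selector:
      matchLabels:
        # every PVC in the namespace carrying this label is
        # captured in the same crash-consistent group snapshot
        app: my-database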
When the feature first entered the Alpha stage, the focus was on establishing the core architectural plumbing and initial CRDs (VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass), with the Snapshot Controller and CSI-snapshotter sidecar updated to recognize these group objects. That phase validated the end-to-end flow: the controller identifies PVCs via labels, communicates with a CSI driver that supports the CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability, and generates both the group snapshot and the underlying individual volume snapshots. Graduation to stable in 1.36 marks that flow as production-ready, following the vendor testing and early feedback gathered during the earlier opt-in, feature-gated releases.
#4876 Mutable CSINode Allocatable Property
Stage: Graduating to Stable
Feature group: sig-storage
Graduating to stable in 1.36, this enhancement makes the PersistentVolume.spec.nodeAffinity field mutable. Previously, node affinity was immutable once set, but this enhancement allows storage providers to update accessibility requirements dynamically, such as when migrating data between zones or enabling features not supported by all nodes. While the update does not disrupt currently running pods, which continue to function under a "required during scheduling, ignored during execution" style logic, this essentially ensures that any new or rescheduled pods are directed to nodes compatible with the updated volume topology.
To handle potential race conditions or mis-scheduling during the transition, the proposal includes a new Kubelet behavior: if a pod is scheduled to a node that no longer satisfies the PV’s updated affinity, the Kubelet will proactively fail the pod rather than letting it get stuck in a perpetual ContainerCreating state. This triggers controllers like StatefulSets or Deployments to recreate the pod on a valid node. Merged in October 2025, originally for the v1.35 milestone, the KEP initially reached consensus for Alpha with the understanding that while the operation is highly privileged, it provides essential flexibility for evolving storage environments.
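As an illustration of what becomes possible, the nodeAffinity stanza of an existing PersistentVolume can now be patched in place, for example to add a second zone once the storage provider has replicated the data (driver name and zones below are illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  csi:
    driver: example.csi.driver.io
    volumeHandle: vol-0123
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["eu-west-1a", "eu-west-1b"] # second zone added after creation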
#5541 Report last used time on a PVC
Stage: Net New to Alpha
Feature group: sig-storage
The Kubernetes community has officially merged KEP-5541, which introduced a new UnusedSince timestamp field to the PersistentVolumeClaimStatus object. This feature is designed to help DevOps teams and developers identify inactive storage by recording the exact time a Persistent Volume Claim (PVC) last transitioned from being used by a pod to an unused state. Throughout the review process, the field was renamed from LastUsedTime to UnusedSince to more clearly indicate that a nil value signifies a PVC is currently in active use. While the initial alpha implementation focuses on the API field and controller logic, the PR discussions highlighted future plans for potential integration with kube-state-metrics to enhance observability for large-scale cluster management.
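Once the alpha feature gate is enabled, the field surfaces in the PVC status; an illustrative fragment, assuming the Go field UnusedSince serializes as unusedSince:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: old-reports-pvc
status:
  phase: Bound
  capacity:
    storage: 50Gi
  # nil while any pod is using the claim; otherwise set to the time
  # of the last used-to-unused transition (assumed field name)
  unusedSince: "2026-01-03T09:15:00Z"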
#5538 CSI driver opt-in for service account tokens via secrets field
Stage: Graduating to Stable
Feature group: sig-storage
Finally graduating to stable in v1.36, this enhancement provides a more secure delivery mechanism for service account tokens by allowing CSI drivers to opt into receiving them via the dedicated secrets field in the NodePublishVolumeRequest. Currently, these sensitive tokens are passed through the volume_context map, a field not designed for confidential data. This architectural flaw has led to significant security vulnerabilities, such as CVE-2023-2878, where tokens were inadvertently leaked into logs because standard sanitization tools do not treat volume context as sensitive. By transitioning to the secrets field, the proposal ensures that tokens are handled by existing security protocols and proto-sanitizers, reducing the need for inconsistent, driver-specific workarounds.
To implement this while maintaining backward compatibility, a new field, serviceAccountTokenInSecrets, will be added to the CSIDriver spec. When set to true, the kubelet will redirect tokens from the volume context to the secrets field using the established key csi.storage.k8s.io/serviceAccount.tokens. The default remains false to ensure existing drivers do not break, though the API server will issue warnings to encourage migration to the more secure path.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example-csi-driver
spec:
  tokenRequests:
  - audience: "example.com"
    expirationSeconds: 3600
  # New field for opting into secrets delivery
  serviceAccountTokenInSecrets: true # defaults to false

Autoscaling in Kubernetes 1.36
#5030 Integrate CSI Volume attach limits with cluster autoscaler
Stage: Major Change to Alpha
Feature group: sig-autoscaling
This major change to the existing Alpha stage feature addresses a critical gap in Kubernetes resource management by integrating CSI (Container Storage Interface) Volume attachment limits directly into the Cluster Autoscaler. Currently, if a node reaches its maximum capacity for attached volumes, the Cluster Autoscaler may not effectively account for this constraint when deciding whether to scale up or where to place new pods. By bridging the gap between sig-storage and sig-autoscaling, this update ensures that the autoscaler recognizes when a pod cannot be scheduled due to storage limits, triggering the provision of new nodes instead of leaving pods in a Pending state on a saturated node.
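The per-node constraint in question is the attach limit CSI drivers already publish through their CSINode object, which the autoscaler can now factor into scale-up simulations (node and driver names are illustrative):

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: worker-node-1
spec:
  drivers:
  - name: ebs.csi.aws.com
    nodeID: i-0abc1234567890
    allocatable:
      # maximum number of volumes this driver can attach to the node;
      # pods blocked on this limit should now trigger a scale-up
      count: 25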
As of February 2026, the project is finally tracked for the v1.36 release and will remain in Alpha status. The proposal has successfully passed the PRR and the enhancement freeze deadlines. Development is actively moving forward, with documentation placeholders and code PRs (such as preventing pod scheduling to nodes without the required CSI drivers) already in progress. At the time of writing, the release team were also planning a feature blog to coincide with the release to highlight how this integration improves the reliability of stateful workloads in scaling environments.
#5679 HPA External Metrics Fallback on Retrieval Failure
Stage: Net New to Alpha
Feature group: sig-autoscaling
This net-new enhancement adds a fallback mechanism to the Horizontal Pod Autoscaler (HPA) for cases where external metrics cannot be retrieved. It is a significant reliability improvement, specifically targeting scenarios where external metric APIs (like cloud provider queues or Datadog) experience downtime. Instead of leaving the application in a static state (or, even worse, under-provisioned) when a metric becomes unknown, this feature allows operators to define an optional static fallback replica count that triggers after a configurable failure duration. By moving away from metric value substitution to a fixed replica count, Kubernetes ensures that the HPA can maintain a safe capacity baseline during API outages without the risk of unbounded scaling. The enhancement has reached implementable status for its Alpha debut in 1.36, with the core logic merged into the master branch and API types finalized to include new fields like failureDurationSeconds and fallbackStatus.
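As a sketch of how this could look on an HPA object — the placement and exact shape of the fallback stanza below are assumptions for illustration; only failureDurationSeconds is named in the merged API discussion:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: queue_depth # e.g. served by a cloud metrics adapter
      target:
        type: AverageValue
        averageValue: "30"
  # hypothetical placement: hold 10 replicas once the external metric
  # has been unretrievable for five minutes
  fallback:
    replicas: 10
    failureDurationSeconds: 300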
Instrumentation in Kubernetes 1.36
#4827 StatusZ for Kubernetes Components
Stage: Graduating to Beta
Feature group: sig-instrumentation
Feature gate: ComponentStatusz Default value: disabled
This proposal introduces a standardized /statusz endpoint for core Kubernetes components, modeled after Google’s internal z-pages, to provide low-overhead, real-time insights into a component's internal state. By exposing critical data (such as binary versions, Go versions, and build metadata) directly from the serving process, it empowers developers and operators to perform high-precision troubleshooting without sifting through logs or configuring complex external monitoring tools. The scope is intentionally limited to the primary process to avoid the maintenance complexities of legacy status APIs, ensuring a lightweight and reliable inside-out view of component health.
Security and stability are prioritized through strict RBAC integration and a versioned API rollout. Access is restricted to the system:monitoring group to prevent unauthorized exposure, while the implementation utilizes a feature gate for a cautious Alpha release. The endpoint defaults to a human-readable text/plain format but supports a structured API (v1alpha1) for programmatic access via explicit content negotiation. To enable this across the cluster, the system:monitoring ClusterRole can be updated as follows:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:monitoring
rules:
# existing rules
- apiGroups: [""]
  resources: ["nodes/statusz"]
  verbs: ["get"]

#4828 FlagZ for Kubernetes Components
Stage: Graduating to Beta
Feature group: sig-instrumentation
Feature gate: ComponentFlagz Default value: disabled
This proposal introduces a new /flagz endpoint across core Kubernetes components to enhance observability, troubleshooting, and real-time configuration auditing. By providing direct visibility into the active command-line flags a component was started with, the feature allows cluster admins and developers to quickly diagnose misconfigurations and verify runtime state without relying on external logs or manual inspection. Similar to StatusZ, to ensure security and performance, access is restricted to the system:monitoring group, and the endpoint is designed with minimal computational overhead.
The endpoint also similarly defaults to a text/plain format for human readability but supports structured, versioned API responses (for example JSON, YAML, CBOR) for programmatic access via explicit header negotiation. During its alpha phase, the feature will be guarded by a feature gate to prevent premature dependency on unstable formats while offering a consistent interface alongside existing diagnostic paths like /healthz and /readyz.
Sample response in text/plain, as discussed above:
----------------------------
title: Kubernetes Flagz
description: Command line flags that Kubernetes component was started with.
----------------------------
default-watch-cache-size=100
delete-collection-workers=1
enable-garbage-collector=true
encryption-provider-config-automatic-reload=false
...

#5808 Native Histogram Support for Kubernetes Metrics
Stage: Net New to Alpha
Feature group: sig-instrumentation
KEP-5808 proposes the integration of Prometheus Native Histograms into the Kubernetes control plane. Currently, Kubernetes relies on classic histograms, which require pre-defined, fixed bucket boundaries (like: le="0.1" or le="0.5"). This approach is often inefficient, as it forces users to choose between low resolution (grouping 1ms and 40ms requests together) or high cardinality (creating dozens of separate time series that bloat Prometheus storage). By moving toward a native histogram format, Kubernetes can leverage exponential bucket boundaries that automatically adjust to data distributions, offering significantly higher precision for detecting performance regressions while simultaneously reducing the total number of time series by approximately 10x.
The proposal aims to solve the chicken-and-egg migration problem through a dual exposition strategy. Under the new NativeHistograms feature gate, Kubernetes components like the kube-apiserver and kubelet will serve metrics in both classic and native formats simultaneously when requested via Protobuf. This allows Platform Engineers to reduce monitoring costs and SREs to set more granular SLOs without breaking existing dashboards or alerts. The plan includes a careful rollout path that accounts for Prometheus version differences, ensuring that users can transition to high-resolution metrics at their own pace without risking silent failures in their observability stack.
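On the consuming side, Prometheus has to opt in as well; a hedged configuration sketch — the feature flag and scrape option shown exist in recent Prometheus releases, but exact option names vary between the 2.x and 3.x series:

# Start the server with: prometheus --enable-feature=native-histograms
scrape_configs:
- job_name: kube-apiserver
  scheme: https
  # keep ingesting the classic bucketed series too, so existing
  # dashboards and alerts keep working during the migration
  scrape_classic_histograms: true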
Deprecations in Kubernetes 1.36
#5040 Remove gitRepo volume driver
Stage: Deprecation/Removal
Feature group: sig-storage
The planned removal of the gitRepo volume driver in Kubernetes 1.36 marks the end of a long-deprecated feature that has become a significant security liability. Although it was designed to provide a convenient way to manifest Git repository files into a Pod, the driver’s implementation requires the kubelet to run as root. This architectural flaw was highlighted under CVE-2024-10220, where researchers demonstrated how an attacker could use Git hooks within a malicious repository to execute arbitrary code on the host node with root privileges. Given that the feature has been deprecated for nearly five years and presents such a critical escape-to-host risk, the community has pivoted toward a complete removal of the in-tree driver to ensure the platform is secure by default.
For users still relying on gitRepo volumes, the migration path is well-established and significantly more robust. The Kubernetes project recommends using an emptyDir volume in conjunction with an initContainer to clone the repository, or utilizing the dedicated git-sync sidecar project. These alternatives offer better isolation, support for modern authentication, and frequent updates that the legacy in-tree driver lacked. While the removal represents a breaking change, the timeline (stretching through 1.36 and beyond) provided a structured window for cluster administrators to implement Validating Admission Policies (VAP) to identify and migrate any remaining workloads before the driver is fully purged from the kubelet.
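The recommended replacement pattern is straightforward: share an emptyDir between a cloning initContainer and the main container (image names and the repository URL below are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: git-clone-example
spec:
  initContainers:
  - name: git-clone
    image: alpine/git # any image providing a git binary
    args: ["clone", "--single-branch", "https://example.com/repo.git", "/repo"]
    volumeMounts:
    - name: repo
      mountPath: /repo
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: repo
      mountPath: /usr/share/nginx/html
      readOnly: true
  volumes:
  - name: repo
    emptyDir: {}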
#5707 Deprecate service.spec.externalIPs
Stage: Deprecation/Removal
Feature group: sig-network
The deprecation of service.spec.externalIPs in Kubernetes 1.36 is the direct result of a long-standing design flaw identified as CVE-2020-8554. This vulnerability exposed a fundamental security gap in multi-tenant clusters: any user with basic permissions to create or edit a Service could claim an arbitrary IP address (including those of internal DNS servers or external websites) by simply listing them in the externalIPs field. Because kube-proxy would then blindly program the cluster's network rules to redirect traffic for those IPs to the attacker’s pods, it enabled high-impact Man-in-the-Middle (MitM) attacks and unauthorized traffic interception without requiring high-level administrative privileges.
While the community initially addressed this risk through external mitigations like the DenyServiceExternalIPs admission controller and OPA Gatekeeper policies, these were essentially band-aids for a feature that lacked native validation or authorization. The 1.36 deprecation finally marks the beginning of the end for this architectural debt, moving from optional blocking to a formal phased removal. By transitioning users toward modern alternatives like the Gateway API or LoadBalancer services, Kubernetes is finally stripping out the underlying code in kube-proxy that made the CVE-2020-8554 exploit possible, effectively hardening the cluster networking model by default.
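For most workloads the migration is a one-field change; a minimal before-and-after sketch (IPs and names are illustrative):

# Deprecated: any user who can edit the Service could claim this IP
apiVersion: v1
kind: Service
metadata:
  name: legacy-svc
spec:
  selector:
    app: web
  ports:
  - port: 80
  externalIPs:
  - 203.0.113.10
---
# Preferred: let a LoadBalancer (or Gateway API) controller own the IP
apiVersion: v1
kind: Service
metadata:
  name: modern-svc
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80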
Deferred / Removed from Milestone
#5507 Container Resource Controls for Out-Of-Memory (OOM) Behavior
Stage: Net New to Alpha
Feature group: sig-node
Status: Deferred
#5869 Wildcard Matching in Toleration Keys
Stage: Net New to Alpha
Feature group: sig-scheduling
Status: Deferred
#1432 Persistent Volume Health Monitor
Stage: Net New to Alpha
Feature group: sig-storage
Status: Removed from Milestone
#5773 DRA: Priority for ResourceSlices in a resource pool
Stage: Alpha
Feature group: sig-storage
Status: Removed from Milestone
#24 AppArmor support
Stage: Stable
Feature group: sig-node
Status: Removed from Milestone
#5194 DRA: ReservedFor Workloads
Stage: Alpha
Feature group: sig-storage
Status: Removed from Milestone
#5234 DRA: ResourceSlice Mixins
Stage: Alpha
Feature group: sig-storage
Status: Removed from Milestone
#5683 Specialized Lifecycle Management
Stage: Net New to Alpha
Feature group: sig-node
Status: Removed from Milestone
Timeline of the v1.36 Kubernetes Release
Kubernetes users can expect the v1.36 release process to unfold throughout April 2026, with past milestones including the start of the cycle on January 12th and the Enhancements Freeze on February 12th. Upcoming technical milestones involve the creation of the release-1.36 branch on April 8th, following the Code and Test Freeze on March 11th. The process culminates in the official v1.36.0 release on Wednesday, April 22nd, 2026, notably following the community's gathering at KubeCon Amsterdam in late March.
| What is happening? | By whom? | And when? |
|---|---|---|
| Release Cycle Begins | Lead | Monday 12th January 2026 |
| v1.36.0-alpha.1 released | Branch Manager | Wednesday 4th February 2026 |
| Enhancements Freeze | Enhancements Lead | Thursday 12th February 2026 |
| v1.36.0-alpha.2 released | Branch Manager | Wednesday 18th February 2026 |
| Code & Test Freeze | Branch Manager | Wednesday 11th March 2026 |
| KubeCon Amsterdam | CNCF Event | Monday 23rd March 2026 |
| release-1.36 branch created | Branch Manager | Wednesday 8th April 2026 |
| Kubernetes v1.36.0 released | Branch Manager | Wednesday 22nd April 2026 |