
Kubernetes 1.36 – What you need to know

Kubernetes 1.36 will be the first Kubernetes release of 2026, and it’s full of exciting updates for security, AI hardware, and more! As always, after removing enhancements with the status of “Deferred” or “Removed from Milestone”, we are seeing 80 enhancements in all listed within the official tracker. So, what can we expect in 1.36?
Kubernetes 1.36 brings a whole bunch of useful enhancements, including 35 changes tracked as ‘Graduating’ in this Kubernetes release. Of these, 17 enhancements are graduating to Stable, such as Support for User Namespaces in Pods and Mutating Admission Policies, among them 4 DRA-specific enhancements reaching GA status.
A whopping 26 new alpha features are also listed in the enhancements tracker, one of which introduces the ability to report when a PVC was last used in pvc.Status. It’s a simple but powerful use case: users can now see whether a PVC is sitting unused, which in larger clusters would really help DevOps teams get rid of unused and unwanted PVCs.
As always, let’s jump into all of the major graduations, deferred enhancements and deprecations scheduled for Kubernetes 1.36.
Kubernetes 1.36 – Editor’s pick:
Here are a few of the changes that Cloudsmith employees are most excited about in this release:
#5055 Taints and Tolerations in DRA
It’s becoming a regular comment in these recent Kubernetes release notes, but the DRA API is seeing a LOT of exciting enhancements in the 1.36 update. [#4815, #4816, #5007, #5004, #5491, #5075, #5729, and #5517 all updated in this release]. This led the Cloudsmith team to add dedicated sig-storage and sig-node sections just for DRA updates. Anyways, this specific DRA enhancement is exciting because it brings more granularity and automation to hardware management, allowing admins to take specific devices offline for maintenance without disrupting the entire cluster. By introducing taints and tolerations for hardware, similar to what we already have for pod deployments, we can all benefit from automatically rescheduling pods away from failing devices while still letting specialized test pods access them for daily troubleshooting activities.
Nigel Douglas - Head of Developer Relations
I am absolutely thrilled to see OCI image volumes finally hitting Stable status in Kubernetes 1.36 because, as a Product Manager working on the OCI format support at Cloudsmith, I can see this enhancement turning existing container registries into universal artifact hubs. For years, dev teams have been forced into the so-called fat image anti-pattern, where they’re basically bundling massive ML models, static assets, and binary plugins directly into their application images, which ultimately creates a nightmare for security patching and bloats deployment times. By graduating this to Stable, we’re now seeing a native, high-performance way to decouple user data from your logic. Devs can now push Hugging Face LLM weights or commercial signatures as independent OCI artifacts and mount them as a VolumeSource just as easily as a ConfigMap. This doesn't just simplify CI/CD pipelines by allowing platform teams to swap content without rebuilding the base engine, but it also drastically shrinks your potential attack surface for the platform teams who are managing these deployments. It’s a massive win for the ecosystem that transforms the OCI registry from a basic storage location for apps into a more dynamic, structured delivery system for any content that a pod needs to succeed.
Liana Ertz - Product Manager
#5793 Manifest-based Admission Control configuration
As a CSM, I’m genuinely stoked for this 1.36 feature because it finally lets platform teams secure their actual admission control configs. By moving mission-critical policies from the API to static files on the control plane disk, platform teams can now close the scary security blind spot where the cluster was vulnerable during startup. Based on what I'm hearing from the teams I work with, this means vital guardrails (like blocking privileged containers) are now immune to accidental kubectl delete commands or etcd crashes. It’s a massive win for platform stability, providing platform engineers with a somewhat tamper-proof approach to keeping the Golden Path secure from the very first second the API server wakes up.
Amy Strutton - Customer Success Manager
Apps in Kubernetes 1.36
#5440 Mutable Container Resources when Job is suspended
Stage: Graduating to Beta
Feature group: sig-apps
This proposal seeks to enhance Kubernetes Batch Job management by relaxing the immutability of a Job’s Pod template while it is in a suspended state. Building on the existing suspend flag, this change would allow higher-level queue controllers to mutate resource specifications (specifically CPU, Memory, GPU, and extended Resource Requests / Limits) before a Job is unsuspended and pods are created. By allowing these updates, cluster administrators and automated controllers can dynamically optimize resource allocation based on real-time cluster capacity, current queue priorities, and actual workload utilization, ensuring that expensive hardware like GPUs is used efficiently.
The design ensures that these mutations are only permitted when a Job is fully suspended and has no active pods, mitigating the risk of disrupting running workloads. While the proposal does not include implementing a specific queue controller or supporting in-place pod updates, it provides the necessary API primitives for external tools (like Kueue) to perform checkpointing and resizing. This flexibility allows a Job to be suspended, its resource requests lowered to match actual usage, and then resumed, thereby improving overall cluster throughput and reducing resource waste.
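To make this concrete, here’s a minimal sketch of the flow (the fields are standard batch/v1; the Job name and image are placeholders):

```yaml
# A suspended Job: no pods exist yet, so its pod template resources
# can be patched by a queue controller before it is released.
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job          # placeholder name
spec:
  suspend: true               # resource mutation is only allowed while suspended
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest   # placeholder image
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
```

A controller like Kueue could lower spec.template.spec.containers[0].resources.requests to match observed utilization and then flip suspend to false, at which point the pods are created with the updated sizing.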
#3541 Add Recreate Update Strategy to StatefulSet
Stage: Net New to Alpha
Feature group: sig-apps
Starting in Kubernetes 1.36, the introduction of the Recreate Update Strategy for StatefulSets addresses a long-standing stuck rollout issue that has plagued users since 2018. Previously, if a StatefulSet update was triggered with a broken configuration (such as a non-existent Docker image), the controller would wait indefinitely for the failing Pod to become Ready before proceeding. Even if a user reverted the configuration to a known-good state, the StatefulSet would refuse to manage or fix the broken Pod because it was stuck in a validation loop, forcing administrators to manually delete the Pod to unblock the controller.
The new Recreate strategy (moving to Alpha in this update) provides a more aggressive alternative to the standard RollingUpdate. It allows the StatefulSet controller to prioritize reaching the desired state by terminating existing Pods before creating new ones, effectively bypassing the “waiting for Ready” deadlock. This change transforms what was previously documented as a known issue into a manageable configuration, ensuring that automated CI/CD pipelines and operators can recover from misconfigurations without manual human intervention.
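As a rough sketch of what opting in might look like, assuming the new strategy is expressed as an additional updateStrategy type (the exact alpha field shape may differ):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  updateStrategy:
    type: Recreate        # assumed value: delete the old pod first,
                          # then create its replacement at the new revision
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.com/web:v2   # placeholder image
```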
#5547 WAS: Integrate Workload APIs with Job controller
Stage: Net New to Alpha
Feature group: sig-apps
The feature represents a major step in Kubernetes' evolution to natively support complex batch and AI/ML workloads. Historically, Kubernetes scheduled individual Pods independently, which caused issues for distributed training jobs that require all participants to start simultaneously (gang scheduling). This feature introduces the Workload API and a decoupled PodGroup object to act as a bridge between high-level controllers (like the Job controller) and the scheduler. By integrating these, the Job controller can now automatically generate a Workload representation of itself, allowing the scheduler to understand the resource requirements and scheduling constraints of the entire group of pods rather than treating them as isolated units.
For the Kubernetes 1.36 Alpha update, the proposal focuses on standardizing this integration and refining the API surface. Key changes include introducing the PodGroup as a standalone runtime object and renaming the embedded field within the Workload spec to PodGroupTemplate to ensure a cleaner, more extensible hierarchy. Specifically for the Job controller, the 1.36 update aims to enable the creation of Workload and PodGroup objects by default for static Jobs (those where parallelism does not change). This release also seeks to resolve a critical debate on Scheduling Policies: whether to maintain separate Basic and Gang policies, or to unify them into a single parameter where a minCount of 1 represents basic scheduling and a minCount equal to the total replicas represents strict gang scheduling.
#5882 Deployment Pod Replacement Policy
Stage: Net New to Alpha
Feature group: sig-apps
PR #5883 marked a simple, formal separation of two different KEPs that were previously managed as a single unit: KEP-3973 (terminating pods in Deployments) and KEP-5882 (the Deployment pod replacement policy). The split was justified because the #3973 enhancement, which introduced the .status.terminatingReplicas field to track pods in the process of shutting down, moved through the development cycle faster than the more complex pod replacement policy. By decoupling them, the sig-apps team can now track their graduation through Alpha, Beta, and Stable stages independently, preventing the progress of one from being stalled by the other.
The merged changes include updated documentation and readiness reviews (PRRs for short) for both features. Key technical discussions during the PR focused on ensuring proper version skew policies, specifically requiring that the kube-apiserver be upgraded before the kube-controller-manager to avoid status synchronization issues. While the terminating replicas feature is already seeing implementation in recent releases, the specific API and logic for the pod replacement policy remain in progress, now neatly organized under its own dedicated proposal for future tracking in the 1.36 update.
API in Kubernetes 1.36
#3962 Mutating Admission Policies
Stage: Graduating to Stable
Feature group: sig-api-machinery
Graduating to stable, Mutating Admission Policies introduce a declarative, in-process alternative to traditional mutating admission webhooks by leveraging Common Expression Language (CEL). This enhancement allows cluster administrators to define resource modifications (such as injecting sidecar containers, enforcing image pull policies, or setting default labels) directly within Kubernetes using MutatingAdmissionPolicy and MutatingAdmissionPolicyBinding objects. By using CEL's object instantiation alongside Server-Side Apply (SSA) merge algorithms or JSON Patch, the API server can perform these mutations internally, significantly reducing the operational overhead, latency, and webhook fatigue associated with maintaining external infrastructure.
Beyond simplicity, this native approach offers superior performance and reliability. Because the mutations are in-process, the kube-apiserver can introspect changes to optimize execution order and safely re-run policies to ensure idempotency. While it does not aim for 100% feature parity with webhooks, specifically excluding external calls, it covers the vast majority of common use cases while providing safeguards to validate that mutations remain consistent. This move effectively brings the power and efficiency of ValidatingAdmissionPolicies to the mutation phase, creating a more robust and integrated policy management framework within the Kubernetes ecosystem.
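For illustration, here’s a sketch of a policy that defaults a label on new Deployments using CEL object instantiation with the ApplyConfiguration patch type (the v1 API group version is an assumption for the Stable release; earlier releases used v1alpha1/v1beta1):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingAdmissionPolicy
metadata:
  name: default-team-label
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["deployments"]
  failurePolicy: Fail
  reinvocationPolicy: Never
  mutations:
  - patchType: ApplyConfiguration
    applyConfiguration:
      # CEL object instantiation: the result is merged into the incoming
      # object using Server-Side Apply semantics.
      expression: >
        Object{
          metadata: Object.metadata{
            labels: {"team": "unassigned"}
          }
        }
```

As with ValidatingAdmissionPolicy, the policy only takes effect once a MutatingAdmissionPolicyBinding selects the namespaces or resources it applies to.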
#5647 Stale Controller Mitigation
Stage: Net New to Alpha
Feature group: sig-api-machinery
This proposal introduces a mechanism to mitigate controller staleness in Kubernetes by allowing controllers to detect when their local cache lags behind the API server. Currently, controllers in the kube-controller-manager (KCM) rely on eventually consistent watch streams, which can lead to spurious reconciles or incorrect decisions (like scaling or deleting resources) based on outdated data. To solve this, the enhancement proposes an opt-in read-after-write guarantee. By tracking the Resource Version (RV) of objects after a write and updating the ResourceEventHandlerFuncs with a new BookmarkFunc, controllers can verify if their previous changes have propagated to the local cache before proceeding with the next reconciliation.
In practice, this logic will be integrated into the core processing loops of sensitive controllers, such as the DaemonSet controller. If a controller determines that its cache has not yet caught up to its last-written state, it will skip and requeue the object with exponential backoff rather than performing a potentially erroneous reconcile. This targeted approach ensures consistency for specific resources without imposing a global performance penalty, though it requires careful handling of edge cases like controller restarts and cache resyncs to prevent permanent reconciliation stalls.
CLI in Kubernetes 1.36
#3104 Separate kubectl user preferences from cluster configs
Stage: Major Change to Beta
Feature group: sig-cli
The kuberc feature introduces a dedicated configuration file (typically located at ~/.kube/kuberc) designed to separate individual user preferences, such as command aliases and default flag settings, from the cluster credentials and server information stored in a standard kubeconfig. This separation allows users to maintain consistent local workflows (like enforcing interactive delete confirmations or silencing deprecation warnings) across multiple clusters without modifying shared or auto-generated connection files. The major change moving into v1.36 Beta status is that the feature is now enabled by default, whereas it previously required manual activation. To support this promotion, the update introduces the kubectl kuberc management command to help users programmatically view and edit their preferences, and it formalizes security controls through a credential plugin allowlist to prevent the execution of untrusted binaries.
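A minimal ~/.kube/kuberc might look like the following (shown with the v1alpha1 schema; the group version and field layout could change as the feature settles into Beta):

```yaml
apiVersion: kubectl.config.k8s.io/v1alpha1
kind: Preference
aliases:
- name: getdbprod            # invoked as: kubectl getdbprod pods
  command: get
  options:
  - name: namespace
    default: db-prod
defaults:
- command: delete
  options:
  - name: interactive        # always prompt before deleting
    default: "true"
```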
Kubernetes 1.36 Networking
#5311 Relaxed validation for Services names
Stage: Graduating to Beta
Feature group: sig-network
This relaxed ServiceName validation feature is an improvement designed to bring the naming constraints of Service resources into alignment with other standard Kubernetes objects. Historically, Services were restricted by a stricter DNS-1035 label standard (NameIsDNS1035Label), which prohibited names from starting with a numeric digit. This update transitions the validation to the more flexible NameIsDNSLabel standard, finally allowing users to create Services with names like 123-backend or 8080-proxy. In the transition from Alpha to Beta, the feature is moving from an opt-in experimental state to being enabled by default in the kube-apiserver. This graduation signifies that the community has successfully completed compatibility testing with downstream systems like DNS providers and Ingress controllers, and has verified that the change remains safe during cluster upgrades and downgrades by maintaining immutability checks on existing resource names.
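With the feature enabled by default in Beta, a manifest like this (previously rejected with a DNS-1035 validation error) is now accepted:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: 8080-proxy    # leading digit: invalid under DNS-1035, valid as a DNS label
spec:
  selector:
    app: proxy
  ports:
  - port: 8080
    targetPort: 8080
```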
#3695 Extend PodResources to include resources from DRA
Stage: Graduating to Stable
Feature group: sig-node
Feature gate: KubeletPodResourcesGet Default value: enabled
Feature gate: KubeletPodResourcesDynamicResources Default value: enabled
This enhancement expands the Kubelet PodResources API to bridge the visibility gap between the Kubernetes node and external monitoring tools. Historically, this API only allowed monitoring agents to see which CPUs or standard devices were assigned to a container. This feature extends that capability to include resources managed by the evolving DRA feature. By adding a dynamic_resources field and a new Get() method for targeted pod queries, the Kubelet can now report specific DRA-allocated hardware (such as GPUs or FPGAs) directly to monitoring stacks like Prometheus or NVIDIA's DCGM exporter. To ensure this data remains available even if a DRA driver goes offline, the system implements a checkpointing mechanism within the DRAManager, guaranteeing that resource assignments are preserved across Kubelet restarts.
The primary benefit of this enhancement is granular observability for next-gen hardware. Platform operators can now access per-pod metrics for DRA-managed resources, which is essential for accurate billing, performance tuning, and troubleshooting in AI/ML workloads. Since the feature is graduating to Stable in v1.36, the community can expect full production-ready reliability with a guaranteed 99.9% success rate for Get and List requests over a rolling 5-minute window as well as low-latency response times (P99 < 100ms). The graduation to GA means the KubeletPodResourcesDynamicResources and KubeletPodResourcesGet feature gates will be locked to on by default, allowing third-party ecosystem tools to officially rely on stable gRPC fields without the risk of breaking changes or experimental instability.
#4858 IP/CIDR validation improvements
Stage: Graduating to Beta
Feature group: sig-network
This is a security-focused refinement of how Kubernetes handles network addressing, specifically targeting stricter validation of IP addresses and CIDR strings across the API. Historically, Kubernetes relied on older Go functions that were overly permissive, allowing ambiguous formats like IPv4 addresses with leading zeros (for example, 172.030.099.099), which different systems might interpret differently (octal vs. decimal), potentially leading to security vulnerabilities like CVE-2021-29923. For platform teams, this is highly beneficial because it eliminates a whole class of IP-parsing risks and ensures consistency between Kubernetes, underlying network plugins (such as Calico and Cilium), and OS-level libraries. To maintain stability, the plan uses ratcheting validation, which prevents new invalid entries without breaking existing workloads; teams can update a Service’s labels, for example, without being forced to immediately fix a legacy IP field. This move toward canonical formats (like standardized IPv6 strings) simplifies long-term maintenance, auditing, and observability, as platform engineers can now trust that network data is unambiguous and follows a single canonical format across the entire cluster.
Kubernetes 1.36 Authentication
#740 API for external signing of Service Account tokens
Stage: Graduating to Stable
Feature group: sig-auth
This KEP proposes a transition from static, file-based Service Account key management to a more dynamic integration with external providers like Hardware Security Modules (HSMs) and Cloud KMS. Currently, the kube-apiserver loads signing keys from the local disk at startup, a method that lacks flexibility and poses a security risk, as any user with filesystem access could exfiltrate the signing material. By introducing a new gRPC-based API (ExternalJWTSigner) accessible via a Unix domain socket, Kubernetes can delegate JWT signing and public key retrieval to an external plugin. This shift enables seamless key rotation without requiring an API server restart and ensures that sensitive private keys never reside in the cluster’s local memory or storage.
To maintain consistency and security, the kube-apiserver remains responsible for assembling the JWT claims and headers, while the external signer only provides the signature. This architecture prevents claim divergence and ensures that externally signed tokens remain compatible with existing OIDC discovery endpoints. While the proposal introduces a new --service-account-signing-endpoint flag, it preserves backward compatibility by keeping existing file-based flags as mutually exclusive options. Potential performance overhead from socket communication will be mitigated through benchmarking, and access to the signing socket will be restricted via standard Unix file permissions to prevent unauthorized token generation.
#5681 Conditional authorization
Stage: Net New to Alpha
Feature group: sig-auth
Kubernetes 1.36 introduces a brand-new conditional authorization capability, which is a framework that allows authorization decisions to depend on the actual data within a resource (like specific fields or labels) rather than just its metadata. Currently, Kubernetes authorizers (like RBAC) are basically blind to the content of a request body. This KEP bridges that gap by using a two-phase evaluation process.
- In the first phase, the authorizer performs partial evaluation: if it can't reach a final Allow or Deny based on metadata alone, it returns a Conditional response containing a set of requirements (for example, "only allow if storageClassName is 'dev'").
- In the second phase, these conditions are enforced during the Validating Admission stage, where the API server finally decodes the object and has access to the necessary field data to make a concrete decision.
This architecture solves several long-standing limitations by providing a unified, cohesive policy model that spans both authorization and admission. It eliminates the need for administrators to essentially over-grant their permissions in RBAC only to restrict them later with separate tools like ValidatingAdmissionPolicy.
By propagating conditions through the request chain, the system ensures atomicity (the decision is based on a single snapshot of policy) and improves user experience, as users can see exactly why they are restricted via kubectl auth can-i lookups. Concretely, the enhancement extends the SubjectAccessReview API and introduces an AuthorizationConditionsReview API, enabling both in-tree and out-of-tree authorizers to implement fine-grained controls, like restricting which fields a user can update or which signerName a CSR can use, without compromising the API server's performance or security.
#5284 Constrained impersonation
Stage: Graduating to Beta
Feature group: sig-auth
Constrained impersonation helps Kubernetes move away from the unrestricted legacy model where an impersonator automatically inherits all permissions of the target user. To mitigate these security risks (especially for controllers and per-node agents) impersonators must now possess two distinct sets of permissions:
- The authority to impersonate a specific identity (impersonate:user-info)
- The right to perform specific actions on behalf of that identity (impersonate-on:user-info:list)
This opt-in mechanism ensures that even if a controller is compromised, its impersonation power is limited to specific resources and verbs. In the below Go example, a controller can be restricted to impersonating only the specific node it is running on by using the downward API to identify itself and configuring the client as follows:
```go
// Example: restricting a controller to impersonating its own node.
// MY_NODE_NAME is injected into the container via the downward API.
kubeConfig, err := clientcmd.BuildConfigFromFlags("", "")
if err != nil {
	log.Fatal(err)
}
kubeConfig.Impersonate = rest.ImpersonationConfig{
	UserName: "system:node:" + os.Getenv("MY_NODE_NAME"),
}
```

By introducing prefixed verbs like impersonate-on:user-info:watch, platform engineers can now define highly specific RBAC roles. This setup ensures that an impersonator can only execute a request if the impersonator has the impersonate-on permission for the action and the impersonated principal has the underlying permission to perform the task itself.
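On the RBAC side, the pairing of permissions could look roughly like this sketch (the rule layout is an assumption based on the verbs described above; worker-1 is a placeholder node name):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: constrained-node-impersonator
rules:
# 1. Which identity may be impersonated
- apiGroups: [""]
  resources: ["users"]
  verbs: ["impersonate:user-info"]
  resourceNames: ["system:node:worker-1"]
# 2. Which actions may be performed while impersonating it
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["impersonate-on:user-info:list", "impersonate-on:user-info:watch"]
```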
#3926 Handling undecryptable resources
Stage: Graduating to Beta
Feature group: sig-auth
Kubernetes 1.36 is addressing a long-standing recovery gap where encrypted API resources become undeletable due to missing keys or data corruption. Currently, if a single object in etcd fails to decrypt or decode, listing that resource type fails entirely, forcing admins to bypass the Kubernetes API and manually manipulate the database, which is a really risky and complex process. To solve this, the proposal introduces a way to identify these broken resources and provides a new DeleteOption that allows for their removal even when their content remains unreadable.
However, this unconditional delete is a high-risk operation. Since the system cannot read the object's metadata, deleting it bypasses standard safety features like finalizers and garbage collection, potentially leaving orphaned system processes (like running Pods) behind. To mitigate this, the design includes a new StatusReasonStoreReadError for better diagnostics and requires explicit confirmation through kubectl prompts and server-side admission layers to ensure administrators understand the impact before purging malformed data.
Kubernetes 1.36 Nodes
#127 Support User Namespaces in pods
Stage: Graduating to Stable
Feature group: sig-node
This enhancement introduces support for user namespaces, a critical isolation feature that maps containerized user and group IDs to different, unprivileged IDs on the host. By allowing a process to hold root privileges within a pod while remaining unprivileged on the underlying node, the KEP significantly bolsters node-to-pod and pod-to-pod isolation. This ensures that even if a process achieves a container breakout or possesses elevated capabilities like CAP_SYS_ADMIN, its impact is strictly confined to the namespace, preventing it from exercising administrative control over the host or other pods.
The implementation specifically addresses and mitigates several high-severity vulnerabilities where container escapes or privilege escalations previously threatened host integrity. Key CVEs resolved or dampened by this update include:
- CVE-2019-5736: completely mitigates the ability to overwrite the host runc binary from a container.
- CVE-2017-1002101: fixes a critical vulnerability (CVSS: 9.6) involving volume mounts and subpaths.
- Azurescape: neutralizes the first known cross-account container takeover in a public cloud provider.
- CVE-2018-15664 & CVE-2016-8867: prevent TOCTOU race attacks and internal privilege escalation.
- CVE-2021-25741 & CVE-2021-30465: mitigate symlink- and mount-based attacks by ensuring container root is not host root.
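Opting in is a one-line change to the pod spec, and with GA the field is now stable:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false        # run the pod in a user namespace:
                          # UID 0 inside maps to an unprivileged host ID
  containers:
  - name: shell
    image: busybox:1.36
    command: ["sleep", "infinity"]
```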
#2862 Fine-grained Kubelet API authorization
Stage: Graduating to Stable
Feature group: sig-node
Feature gate: KubeletFineGrainedAuthz Default value: true
Another welcome stable update sees the general availability of fine-grained authorization for the Kubelet API to better support the principle of least privilege. Previously, the Kubelet used a coarse RBAC scheme where low-risk actions, such as reading health status (/healthz) or listing pods (/pods), required the same high-level proxy subresource permission as dangerous actions like executing arbitrary code (/exec). By introducing specific subresources for /configz, /healthz, and /pods, the proposal allows monitoring and logging agents to access necessary data without being granted over-privileged access that could be exploited for lateral movement or privilege escalation.
The implementation introduces a new feature gate, KubeletFineGrainedAuthz, and ensures backward compatibility by falling back to the original proxy check if the new granular permissions are not found. To minimize performance impacts from additional SubjectAccessReview requests, the design leverages the Kubelet’s existing authorization cache. Beyond security hardening, this change also aims to officially document these previously internal Kubelet endpoints, providing a standardized and secure way for ecosystem tools to interact with node-level data.
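In practice, a monitoring agent’s ClusterRole can now be scoped to the read-only endpoints it actually needs, along the lines of this sketch:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-monitor-readonly
rules:
- apiGroups: [""]
  resources: ["nodes/healthz", "nodes/pods", "nodes/configz"]
  verbs: ["get"]
# Note: no nodes/proxy rule, so /exec and /attach remain off-limits.
```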
#5554 Support in-place update pod resources alongside static CPU manager policy
Stage: Net New to Alpha
Feature group: sig-node
New to Alpha in 1.36, this enhancement gives platform teams a major improvement to Kubernetes resource management by enabling in-place vertical scaling for pods even when a static CPU Manager policy is active. Previously, the static policy, which grants pods exclusive access to specific CPU cores, could not reconcile real-time resource changes without restarting the container. This update integrates Topology Manager feasibility checks into the pod resize path, allowing the Kubelet to dynamically allocate or release exclusive CPU cores during an upsize or downsize event. By updating the CPUManager checkpoint format to track both original and resized allocations, the system can now perform admission-style validation during a live resize to ensure NUMA alignment and resource availability are maintained without interrupting the workload.
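A pod that qualifies for exclusive cores under the static policy, declaring that CPU resizes should happen in place, might look like this (the pod must be Guaranteed QoS with integer CPU counts; names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-worker
spec:
  containers:
  - name: worker
    image: registry.example.com/worker:latest
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired   # resize CPU in place, no container restart
    resources:
      requests:
        cpu: "4"         # integer count -> exclusive cores under static policy
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
```

A resize is then applied through the pod’s resize subresource (for example, via kubectl patch with --subresource resize), and with this enhancement the Kubelet re-runs its CPU and topology feasibility checks before committing the new exclusive core set.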
#4205 Support PSI based on cgroupv2
Stage: Graduating to Stable
Feature group: sig-node
The GA state for integration of Pressure Stall Information (PSI) in Kubernetes 1.36 enhances node monitoring by allowing the kubelet to natively ingest and expose CPU, memory, and I/O pressure metrics from cAdvisor and runc. Previously, users had to rely on external tools like Node Exporter to gain this visibility. However, this update embeds PSI data directly into the Summary API at both the node and pod levels. By providing barometer-like insights into resource shortages before they cause failures, these metrics enable more granular detection of congestion (categorized as some or full pressure) over 10, 60, and 300-second windows. This foundational change sets the stage for more proactive node management and intelligent responses to impending resource scarcity within the cluster.
To help visualize how data is now organized in the API, here’s a simplified look at a new metrics schema:
| Metric-level | Resource Types | Data Points Provided |
|---|---|---|
| NodeStats | CPU, Memory, I/O | Avg10, Avg60, Avg300 (%), and Total (ns) |
| PodStats | CPU, Memory, I/O | Some (partial stall) and Full (complete stall) |
#4265 add ProcMount option
Stage: Graduating to Stable
Feature group: sig-node
This enhancement, revitalized for K8s 1.36, graduates the long-standing ProcMountType feature to allow high-trust workloads to bypass the default security masking of the /proc filesystem. Traditionally, the Kubelet instructs container runtimes to mask or set certain /proc paths as read-only to prevent sensitive host data exposure. However, this restriction blocks critical use cases like nesting unprivileged containers or building container images within a Pod. By introducing the procMount: Unmasked field to the securityContext, users can now opt out of these defaults. Because unmasking /proc can theoretically allow a root container to modify the host kernel, this capability is strictly governed by the Privileged Pod Security Admission (PSA) level and is intended to be used in conjunction with User Namespaces to mitigate escalation risks.
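Opting out of masking is a securityContext field, and pairing it with a user namespace is the intended pattern:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nested-builder
spec:
  hostUsers: false             # user namespace contains the added risk
  containers:
  - name: builder
    image: registry.example.com/image-builder:latest   # placeholder image
    securityContext:
      procMount: Unmasked      # default is "Default" (masked /proc paths)
```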
#5109 Split L3 Cache Topology Awareness in CPU Manager
Stage: Graduating to Stable
Feature group: sig-node
This stable enhancement offers a new Kubernetes CPU Manager static policy option, prefer-align-cpus-by-uncorecache, designed to optimize workload performance on modern modular CPU architectures. While traditional processors often use a single, shared uncore (last-level) cache, newer x86 and ARM designs frequently employ a split uncore cache where subsets of cores share dedicated cache units. The current Kubelet is unaware of this hierarchy, often spreading container processes across multiple uncore caches. This leads to noisy neighbor issues and increased inter-cache latency, which can degrade performance for latency-sensitive applications like HPC, networking, and Telco functions.
The proposed feature introduces an opt-in, best-effort allocation algorithm that prioritizes grouping CPU assignments within the fewest number of uncore cache domains possible. By adding an uncorecacheId to the CPU topology awareness, the CPU Manager can now sort and allocate resources that align with these physical hardware boundaries. This optimization has demonstrated significant real-world benefits, such as an 18% performance uplift in database workloads. Importantly, the policy is designed to be non-disruptive. Basically, it integrates with existing options like full-pcpus-only and falls back to standard behavior if optimal alignment isn't possible or if the hardware doesn't support split caches, ensuring high-density and mixed-workload support remains intact.
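Enabling the option is a kubelet configuration change; a minimal fragment:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"                      # existing option, still compatible
  prefer-align-cpus-by-uncorecache: "true"     # pack CPUs into as few
                                               # uncore cache domains as possible
```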
#5328 Node declared features (formerly Node Capabilities)
Stage: Graduating to Beta
Feature group: sig-node
The Node Declared Features KEP proposed a standardized approach for nodes to automatically report their supported feature-gated capabilities directly to the control plane. Currently, managing version skew (which is the gap between a newly upgraded control plane and older Kubelet versions) relies on DevOps engineers to manually apply complex taints, labels, and selectors to ensure pods land on compatible hardware. This proposal introduces a declaredFeatures field in the node’s status, which the Kubelet populates during its bootstrap process. By making these features a first-class signal, the kube-scheduler can proactively filter out incompatible nodes, keeping pods in a Pending state with clear feedback rather than allowing them to fail with runtime errors after being scheduled.
Beyond scheduling, this framework enhances API safety and cluster stability during gradual rollouts. Admission controllers can use these declared features to validate requests, such as rejecting an in-place pod resize if the target node is running an older version that lacks support for the operation. It also allows the API server to dynamically adapt its communication protocols, such as transitioning from SPDY to WebSockets only when a node confirms it is capable. Ultimately, this mechanism reduces operational overhead and increases workload portability by replacing inconsistent, provider-specific manual configurations with a native, automated lifecycle for graduating Kubernetes features.
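The exact surface is still settling as the feature moves through Beta, but conceptually the Kubelet publishes something like the following in the node object (the field layout is illustrative of the KEP's intent, and the feature names are hypothetical examples):

```yaml
# Fragment of a Node object; shape is illustrative, not the final API
status:
  declaredFeatures:
  - InPlacePodVerticalScaling
  - PodLevelResources
```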
#5394 PSI-based node conditions
Stage: Net New to Alpha
Feature group: sig-node
This Alpha-stage enhancement introduces Pressure Stall Information (PSI)-based node conditions to Kubernetes, a feature that has been somewhat in limbo since the 1.34 release. Originally split from KEP-4205 to allow for independent tracking, this enhancement leverages cgroupv2 to provide more granular visibility into resource contention. By exposing PSI metrics, the kubelet can now identify and report when a node is experiencing significant pressure on memory and I/O resources. Notably, the implementation distinguishes between these incompressible resources and CPU PSI, which, due to its compressible nature, is slated as a requirement for the Beta phase rather than being used for immediate node tainting in the Alpha stage. This significantly improves the ability of the scheduler and operators to respond to resource exhaustion before a node becomes completely unresponsive.
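In practice this should surface as richer node conditions. A node under sustained memory stall time might report something like the following (the reason and message strings here are illustrative of the Alpha design, not a final API):

```yaml
# Fragment of a Node object's status; values are illustrative
status:
  conditions:
  - type: MemoryPressure
    status: "True"
    reason: PSIMemoryPressure      # hypothetical reason derived from cgroupv2 PSI averages
    message: "memory stall time above the configured threshold"
```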
#5825 CRI List streaming
Stage: Net New to Alpha
Feature group: sig-node
The core of this Alpha enhancement is transforming how CRI handles List operations (such as ListContainers and ListPodSandbox), moving away from a single, bulky all-or-nothing response toward a continuous stream of data. By transitioning to server-side streaming, the List* operations can effectively drip-feed container and pod data to the kubelet. This prevents the system from hitting the hard 16 MB gRPC message limit when dealing with massive datasets, such as nodes managing over 10,000 containers, while ensuring the kubelet still receives the full list it needs to function.
#5419 Pod-level resources support with in-place pod vertical scaling
Stage: Graduating to Beta
Feature group: sig-node
This enhancement extends In-Place Pod Resize (IPPR) to support aggregate resource specifications at the Pod level, building on the foundation of KEP-2837. Currently, IPPR is limited to container-level adjustments, often forcing pod recreations to change the overall resource footprint. By enabling dynamic, in-place scaling of pod-level CPU and memory, operators can improve cluster utilization and reduce operational overhead without disrupting running services. This enhancement is specifically designed for cgroupv2 environments and was originally introduced as an opt-in alpha feature in Kubernetes v1.34 under the InPlacePodLevelResourcesVerticalScaling feature gate; it now graduates to Beta.
The design ensures consistency with existing IPPR workflows while introducing new tracking mechanisms in PodStatus to reflect actual allocated resources. While the proposal offers significant flexibility for multi-container pods, it maintains a strict scope: it does not cover non-compute resources (like GPUs), QoS class changes, or the removal of lower-priority pods to facilitate resizing. Potential risks, such as scheduler race conditions and impacts on tools that rely on legacy cgroup derived values, are mitigated through gradual rollouts (Alpha to GA) and comprehensive documentation of the new resource calculation methods.
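For context, a pod-level resource specification (from KEP-2837) that this enhancement now lets you resize in place looks like the following; all containers draw from the shared pod-level budget:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-demo
spec:
  resources:           # pod-level budget shared by every container below
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```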
Node-specific DRA enhancements in 1.36
Dynamic Resource Allocation (DRA) is causing a seismic shift within Kubernetes. As there is an increasing push for Kubernetes to support dedicated GPU hardware from vendors like NVIDIA for AI and LLM workloads, we’ll be seeing more Node-related changes to accommodate DRA in the kinds of production Kubernetes environments that power services like ChatGPT.
#5304 Device Attributes in Downward API
Stage: Net New to Alpha
Feature group: sig-node
Feature gate: DRADownwardDeviceAttributes Default value: disabled
The Device Attributes in Downward API is a net-new enhancement entering Alpha in Kubernetes 1.36, designed to bridge the metadata gap between hardware providers and containerized workloads. This feature allows Pods to self-discover specific attributes of allocated devices, like hardware IDs, NUMA affinity, or custom vendor strings, by resolving them directly from ResourceSlices and surfacing them into the Pod's environment or volumes. By utilizing a new DRADeviceFieldRef structure, the system can handle complex allocation scenarios, including single-device indexing or concatenating attributes from multiple devices into a single comma-separated string for easier application consumption.
The implementation is unique because it operates as a framework-level opt-in rather than a core Kubernetes API change, requiring DRA drivers to explicitly enable the metadata feature through the kubeletplugin library. To ensure operational stability and security, the design incorporates unique identifiers like the claimUID in host file paths to prevent metadata collisions between different incarnations of the same resource claim. Furthermore, the enhancement includes a dedicated command-line flag within the DRA driver framework, giving administrators and developers a straightforward mechanism to disable the metadata generation code path without needing to refactor the driver’s core logic if issues arise during the Alpha phase.
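Since this is Alpha and framework-level, the exact pod-facing syntax may shift, but the KEP's DRADeviceFieldRef concept implies something along these lines (the field names and attribute key here are hypothetical illustrations, not a confirmed API):

```yaml
# Illustrative only: surfacing an allocated device's attribute as an env var
env:
- name: GPU_UUID
  valueFrom:
    draDeviceFieldRef:                  # hypothetical field per the KEP's DRADeviceFieldRef
      requestName: gpu                  # which request in the pod's ResourceClaim
      attributeName: gpu.example.com/uuid
```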
#5018 AdminAccess for ResourceClaims and ResourceClaimTemplates
Stage: Graduating to Stable
Feature group: sig-node
The DRAAdminAccess feature, graduating to General Availability (GA) in Kubernetes 1.36, solves a critical operational bottleneck in DRA by providing a secure, privileged monitoring path for hardware devices. Historically, DRA was designed for exclusive workload access, so once a GPU or FPGA was claimed by a user it basically became a black box to platform engineers. DRAAdminAccess introduces a formal mechanism for cluster admins to bypass standard allocation logic, allowing them to deploy monitoring agents or diagnostic tools to devices already in use without displacing the primary workload or violating security boundaries.
This graduation to Stable status matters because it transitions the feature from an experimental toggle to a core, production-ready standard for hardware management. By enforcing a strict security model that requires both an adminAccess flag in the ResourceClaim and a specific label on the namespace (resource.kubernetes.io/admin-access: "true"), it ultimately prevents non-privileged users from snooping on shared hardware. For dev teams running large-scale AI or HPC clusters, this provides a really powerful capability for real-time health telemetry and troubleshooting, ensuring expensive hardware isn't just utilized, but also properly maintained and observable.
To use this feature, the Kubernetes admin first labels a secure namespace and then creates a ResourceClaim with the adminAccess field set to true. This allows the pod to attach to devices even if they are already allocated to other users.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
name: gpu-monitoring-claim
# The namespace must have the label: resource.kubernetes.io/admin-access: "true"
namespace: kube-system
spec:
devices:
requests:
- deviceClassName: nvidia-gpu-class
# This flag bypasses standard allocation checks in the scheduler
adminAccess: true
#5677 Resource Availability Visibility
Stage: Net New to Alpha
Feature group: sig-node
Resource Availability Visibility addresses yet another critical transparency gap in the framework by providing a standardized way to query the real-time availability of hardware resources like GPUs, FPGAs, and NICs. This enhancement introduces a new API object, the ResourcePoolStatusRequest, which functions similarly to a Certificate Signing Request (CSR) in that a user creates the object to ask the cluster about available resources, and a controller populates the status with calculated data before eventually being cleaned up.
Users should care because, prior to this, determining whether a cluster actually had the specialized hardware capacity to run a complex workload was often a trial-and-error guessing game, especially in multi-tenant environments. By providing a principled, bounded view of resource pools, this feature enables developers and autoscalers to make informed scheduling decisions, reduces pod pending times caused by resource exhaustion, and offers much-needed observability for administrators managing high-performance computing hardware.
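Given the CSR-style flow described above, usage would plausibly look like the following (the API group/version and field names are assumptions for an Alpha object and may differ from the final shape):

```yaml
apiVersion: resource.k8s.io/v1alpha1    # assumed Alpha group/version
kind: ResourcePoolStatusRequest
metadata:
  name: gpu-availability
spec:
  # Illustrative: scope the query to one driver's resource pools
  driverName: gpu.example.com
# A controller then populates .status with calculated availability
# before the object is eventually cleaned up, as with CSRs.
```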
Scheduling in Kubernetes 1.36
#5710 Workload-aware preemption
Stage: Net New to Alpha
Feature group: sig-scheduling
This enhancement transitions the Kubernetes scheduler to a workload-centric view to better support modern AI computing requirements. Historically, if a high-priority task needed resources, the scheduler might kill just one or two pods from a lower-priority job. For tightly-coupled workloads like AI training or multi-host inference, losing even a single pod often renders the entire job useless, leaving the remaining pods to sit idle and waste expensive GPU resources. This KEP introduces the PodGroup API and a DisruptionMode setting, which allows the scheduler to recognize these dependencies. By treating a group of pods as a single preemption unit, the scheduler ensures that if it must reclaim resources, it preempts the entire workload at once, preventing the creation of zombie jobs and maximizing the functional utilization of the cluster.
For platform teams, this Alpha stage enhancement provides the foundational building blocks to manage capacity-constrained environments with much higher precision. It introduces a delayed preemption logic that prevents the unnecessary disruption of running workloads until the scheduler is certain that the new, higher-priority workload can actually be bound and started. While this stage focuses on simple implementations and API standardization, it allows platform engineers to start defining priorities at the workload level rather than managing them pod-by-pod. This standardization is critical for teams running mixed-workload clusters, as it paves the way for tighter integration with autoscaling and disruption budgets, ultimately ensuring that high-value AI training and inference jobs meet their SLOs without manual intervention or custom, external scheduling logic.
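Since the API is still Alpha, its exact shape may change, but a gang-scheduled group with workload-level preemption semantics might look something like this (the API version and the DisruptionMode value are illustrative assumptions based on the KEP's concepts):

```yaml
apiVersion: scheduling.k8s.io/v1alpha1   # assumed Alpha version
kind: PodGroup
metadata:
  name: training-job
spec:
  schedulingPolicy:
    gang:
      minCount: 8          # all-or-nothing: schedule 8 pods or none
  disruptionMode: Group    # hypothetical value: preempt the whole group, never a subset
```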
#5732 Topology-aware workload scheduling
Stage: Net New to Alpha
Feature group: sig-scheduling
By embedding Topology and DRA awareness directly into the kube-scheduler, the system can now evaluate Placements, which are subsets of the cluster like specific racks or interconnected hardware blocks, as a single unit. Instead of scheduling pods one-by-one and hoping they end up near each other, the scheduler simulates the placement of the entire pod group within candidate domains. This ensures that high-performance workloads are co-located to meet strict low-latency and high-bandwidth requirements without relying on external, less-integrated tools.
For platform teams, this is a critical evolution for two reasons:
- Performance reliability and cost efficiency. AI/ML workloads, particularly distributed training, are notoriously sensitive to network jitter/latency; if a single pod in a training job is scheduled across a slow network hop, it can bottleneck the entire GPU cluster, wasting expensive compute cycles.
- Deep scheduler integration. By utilizing TopologyConstraints and DRAConstraints, platform teams can guarantee that interconnected accelerators and their respective pods are physically co-located on the same rack or PCIe switch. This deep integration into the core kube-scheduler reduces operational complexity by eliminating the need for third-party schedulers, ensuring that AI workloads achieve maximum hardware utilization and predictable training times right out of the box.
#5832 Decouple PodGroup from Workload API
Stage: Net New to Alpha
Feature group: sig-scheduling
This KEP proposes transitioning the PodGroup API into a standalone runtime object for v1alpha2, decoupling it from the Workload API. Previously, PodGroups were embedded within the Workload spec, which led to significant scalability issues (such as hitting the 1.5MB etcd object limit) as well as architectural friction between long-lived config intent and transient scheduling units. By separating them, the Workload object remains a static template for scheduling policies, while the PodGroup acts as a self-contained, controller-owned unit that tracks its own runtime status and lifecycle.
This separation of concerns allows for better garbage collection of associated resources (like ResourceClaims) and reduces API contention. Under this new model, controllers like Job or JobSet automatically create PodGroups based on a podGroupTemplate defined in the referenced Workload. The proposed structure for the standalone object is as follows:
apiVersion: scheduling.k8s.io/v1alpha2
kind: PodGroup
metadata:
name: pd-1
namespace: ns-1
spec:
podGroupTemplateRef:
workloadName: training-workload
podGroupTemplateName: pd-1-template
schedulingPolicy:
gang:
minCount: 2
status:
conditions:
- type: PodGroupScheduled
status: "True"
Storage DRA enhancements in 1.36
Normally this falls under sig-scheduling, but due to the rapid feature development and storage-specific enhancements related specifically to DRA, I’ve decided to break this out into its own individual section for the 1.36 release – similar to what we did with the node-specific DRA breakout.
#4815 Partitionable Devices
Stage: Graduating to Beta
Feature group: sig-scheduling
This storage-specific DRA enhancement focuses on dynamic device partitioning using structured parameters. While traditional device plugins require hardware to be partitioned into fixed sizes before a task starts, the new DRA framework allows a vendor to advertise overlapping potential partitions. This means the hardware (GPUs and TPUs) stays as a single bag of resources until a workload is actually scheduled, at which point the scheduler dynamically carves out the specific slice needed.
This approach is highly beneficial because it significantly increases resource utilization and scheduling flexibility. By using a new construct called Counter Sets, the scheduler can track shared physical resources (like memory slices or compute engines) across multiple potential configurations. This prevents fragmentation where a pre-partitioned device sits idle because its fixed size doesn't match any incoming requests. It also enables multi-host scheduling, allowing logical devices (such as interconnected TPU clusters) to be treated as a single allocatable unit across multiple nodes while ensuring the underlying topology remains valid.
The following code snippet demonstrates how a user can request specific, non-overlapping partitions of a GPU (like NVIDIA MIG devices) by defining multiple requests within a single ResourceClaim:
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
name: mig-devices
spec:
devices:
requests:
- name: mig-1g-5gb-0
exactly:
deviceClassName: mig.nvidia.com
selectors:
- cel:
expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'"
- name: mig-2g-10gb
exactly:
deviceClassName: mig.nvidia.com
selectors:
- cel:
expression: "device.attributes['gpu.nvidia.com'].profile == '2g.10gb'"
constraints:
- requests: ["mig-1g-5gb-0", "mig-2g-10gb"]
matchAttribute: "gpu.nvidia.com/parentUUID"
#4816 Prioritized Alternatives in Device Requests
Stage: Graduating to Stable
Feature group: sig-scheduling
The prioritized list enhancement for DRA is a significant evolution in how Kubernetes handles specialized hardware through a flexible preferences model. Previously, if a workload requested a specific GPU that was unavailable, the pod would simply fail to schedule. This feature introduces the FirstAvailable field, allowing teams to define an ordered list of fallback options. Essentially, it allows a developer to say: "I prefer an H100 GPU, but if those are gone, I'll take an A100, or even two T4s" all within a single ResourceClaim.
For teams deploying AI workloads, this matters because it drastically improves resource obtainability and deployment portability. In the current landscape of GPU scarcity, waiting for a specific chip can stall CI/CD pipelines or production scaling. By providing plan B and C options, AI engineers ensure their training or inference jobs actually start, even if they run on slightly less optimal hardware. Furthermore, it simplifies life for MLOps teams who distribute shared manifests; they can now create a single configuration that works across different clusters with varying hardware availability without requiring users to manually edit YAML files for every environment.
The following snippet demonstrates how to request a high-end GPU with a prioritized fallback to multiple mid-tier GPUs. Note how the config specifically targets the fallback sub-request to ensure the application environment is adjusted only when those specific devices are selected.
apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
name: ai-workload-gpu-claim
spec:
devices:
requests:
- name: gpu-request
# The scheduler tries these in order:
firstAvailable:
- name: ultra-gpu
deviceClassName: nvidia-h100
count: 1
- name: standard-gpu
deviceClassName: nvidia-a100
count: 1
- name: fallback-gpu
deviceClassName: nvidia-t4
count: 2 # Requesting more of a weaker device to compensate
# Specific configuration that only applies if the 'fallback-gpu' is chosen
config:
- requests: ["gpu-request/fallback-gpu"]
opaque:
driver: gpu.example.com
parameters:
apiVersion: gpu.example.com/v1
kind: GPUConfig
optimizationLevel: "high-memory"
#5007 Device BindingConditions
Stage: Graduating to Beta
Feature group: sig-scheduling
The BindingConditions feature is a significant update to the Kubernetes DRA framework designed to handle slow-to-ready hardware. Historically, the Kubernetes scheduler assumes that once a Pod is assigned to a node, any required resources are immediately available. However, modern infrastructure like Composable Disaggregated Infrastructure (CDI) often uses fabric-attached GPUs or FPGAs that require time-consuming steps like physical attachment over a network fabric, PCIe switching, or firmware reprogramming before they can actually be used.
This enhancement introduces a wait-and-see phase in the scheduling process called Readiness-Aware Binding. Instead of blindly binding a Pod to a node and hoping for the best, the scheduler checks specific BindingConditions (such as the is-prepared condition). If the resource isn't ready, the scheduler defers the final binding. Now, if it fails to prepare (like a hardware error), it would then trigger a BindingFailureCondition, which allows the pod to be safely rescheduled elsewhere without getting stuck in a CrashLoopBackOff scenario.
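As a rough sketch (field names follow the KEP's alpha shape and may change; the driver and condition names are illustrative), a driver advertising a fabric-attached device would publish the conditions the scheduler must wait on:

```yaml
# Fragment of a ResourceSlice device entry for a fabric-attached GPU
devices:
- name: fabric-gpu-0
  bindingConditions:
  - example.com/is-prepared        # scheduler defers binding until this reports True
  bindingFailureConditions:
  - example.com/attach-failed      # triggers safe rescheduling instead of a stuck pod
```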
#5055 Device Taints & Tolerations
Stage: Graduating to Beta
Feature group: sig-scheduling
This KEP proposes an extension to DRA that introduces a tainting mechanism for GPU hardware devices, modeled after the existing Kubernetes node taints. Under this proposal, DRA drivers or cluster administrators (via a new DeviceTaintRule API) can mark specific devices as unhealthy or restricted. This allows for granular maintenance, such as taking a single GPU or accelerator offline without impacting the rest of the node, as well as providing a standard way for hardware to report degraded states, such as overheating, without immediately failing workloads. The system supports two primary effects:
- NoSchedule prevents new pods from using the device.
- NoExecute triggers the eviction of currently running pods.
Users can bypass these restrictions by adding tolerations directly to their ResourceClaim, allowing for specific scenarios like running diagnostic test pods on a tainted device. By decoupling device health from node status, this feature enables safer pod evictions and more resilient cluster management, ensuring that only workloads capable of handling specific hardware conditions are scheduled onto affected resources.
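Putting the two halves together, a maintenance workflow might look like this: an admin taints a single device via a DeviceTaintRule, and a diagnostic ResourceClaim tolerates it (the API version follows the pre-Beta shape and may differ in your cluster; the driver, pool, and device names are illustrative):

```yaml
apiVersion: resource.k8s.io/v1alpha3     # version may differ as the feature graduates
kind: DeviceTaintRule
metadata:
  name: gpu-0-maintenance
spec:
  deviceSelector:
    driver: gpu.example.com
    pool: node-1
    device: gpu-0
  taint:
    key: example.com/maintenance
    value: scheduled
    effect: NoExecute        # evicts running pods that don't tolerate the taint
---
# Fragment of a diagnostic ResourceClaim request that tolerates the taint
tolerations:
- key: example.com/maintenance
  operator: Exists
  effect: NoExecute
```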
#5004 Handle extended resource requests via DRA Driver
Stage: Graduating to Beta
Feature group: sig-scheduling
This Kubernetes 1.36 enhancement introduces a bridge between the traditional Extended Resources (simple, integer-based requests) and the newer DRA (which is flexible but more complex). Historically, using advanced hardware like GPUs required choosing between the easy-to-use Device Plugin model or the feature-rich DRA model. This KEP allows cluster administrators to advertise resources managed by DRA drivers as Extended Resources. This means developers can keep using simple resources.limits in their Pod specs while the backend leverages DRA's sophisticated resource tracking and allocation logic.
The primary benefit is a seamless transition path. It allows a single cluster to have a mix of nodes, with some using legacy device plugins and others using DRA, all without requiring any changes to existing Deployment manifests. When a Pod requests an extended resource (like example.com/gpu: 1), the scheduler can now satisfy that request using either a traditional node capacity or a DRA ResourceSlice. If a DRA node is chosen, the scheduler automatically handles the heavy lifting by creating a ResourceClaim to track the allocation, ensuring that the resource is reserved and properly mapped to the container.
The DeviceClass now acts as the link, mapping a specific class of DRA-managed hardware to an Extended Resource name, such as in the below example:
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
name: gpu.example.com
spec:
selectors:
- cel:
expression: device.driver == 'gpu.example.com' && device.attributes['gpu.example.com'].type == 'gpu'
# This field bridges DRA to the simple Extended Resource name
extendedResourceName: example.com/gpu
Existing applications can remain completely unaware of the underlying DRA architecture, using the same familiar syntax they have used for years.
apiVersion: apps/v1
kind: Deployment
metadata:
name: demo
spec:
replicas: 1
selector:
matchLabels:
app: demo
template:
metadata:
labels:
app: demo
spec:
containers:
- name: demo
image: nvidia/cuda:8.0-runtime
command: ["/bin/sh", "-c"]
args: ["nvidia-smi && tail -f /dev/null"]
resources:
limits:
# The app asks for a simple integer
# K8s decides if this comes from a Device Plugin or a DRA ResourceSlice.
example.com/gpu: 1
#5491 List types for attributes
Stage: Net New to Alpha
Feature group: sig-scheduling
This KEP enhances the DRA API by introducing support for list-typed attributes inside ResourceSlice objects. Currently, device characteristics are limited to scalar values, which are insufficient for representing complex hardware topologies where a single device might relate to multiple entities, such as a CPU adjacent to multiple PCIe roots or NUMA nodes. By allowing attributes to be lists of strings, integers, booleans, or versions, the API can more accurately model modern hardware relationships.
To support this change, the proposal redefines the semantics of matchAttribute and distinctAttribute constraints within a ResourceClaim. Specifically, matchAttribute now requires a non-empty intersection between sets of attributes across candidate devices, while distinctAttribute requires them to be pairwise disjoint. To ensure backward compatibility, scalar values are treated as single-element lists. This transition preserves the monotonicity required by the allocator’s algorithms, ultimately ensuring computational complexity remains bounded while also introducing a type-agnostic .include helper function for CEL expressions to simplify the migration for driver developers and users.
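As a sketch of the Alpha shape (the exact field spelling is an assumption), a ResourceSlice could advertise a device adjacent to two NUMA nodes like so:

```yaml
# Fragment of a ResourceSlice device entry; the list-typed form is illustrative
attributes:
  numaNodes:
    ints: [0, 1]      # previously only a single scalar value was expressible
```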
#5075 Consumable capacity
Stage: Graduating to Beta
Feature group: sig-scheduling
This specific DRA-related KEP introduces a framework for multi-allocatable devices, moving beyond the previous model of strictly-exclusive / dedicated device assignments. Under this new logic, independent resource claims from unrelated pods (even those across different namespaces) can allocate specific shares of the same underlying hardware. This is managed through a consumable capacity model: the DRA scheduler tracks a device’s total capacity and ensures that the sum of all active claims remains within limits, while also enforcing requestPolicy rules like minimum or maximum per-claim allocations.
To implement this, the KEP introduces several technical fields, including an AllowMultipleAllocations property to identify sharable hardware and a ConsumedCapacity field to track usage in allocation results. It also provides a distinct attribute constraint to prevent a single claim from accidentally grabbing the same multi-allocatable device twice. This is particularly vital for networking (like sharing a physical NIC via CNI) and/or GPU virtualization, where users need to reserve specific fractions of memory or bandwidth without requiring the platform teams to pre-define a massive number of static partitions.
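A hedged sketch of what a fractional request could look like under this model (the capacity name and exact request syntax are illustrative of the Beta shape, not a confirmed API):

```yaml
# Fragment of a ResourceClaim requesting a share of a multi-allocatable NIC
requests:
- name: nic-share
  exactly:
    deviceClassName: nic.example.com
    capacity:
      requests:
        bandwidth: 5Gi    # hypothetical capacity; the driver's requestPolicy bounds apply
```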
#5729 ResourceClaim support for workloads
Stage: Net New to Alpha
Feature group: sig-scheduling
The proposed enhancement to the Workload and PodGroup APIs introduces a mechanism to associate ResourceClaims and ResourceClaimTemplates directly with PodGroups rather than individual Pods. This addresses a critical scalability bottleneck in Kubernetes DRA: the current 256-entry limit on a ResourceClaim’s status.reservedFor list. By reserving a claim for an entire PodGroup, large-scale AI/ML workloads can share a single topological resource, such as a high-speed network fabric or a GPU cluster, across hundreds or thousands of Pods. Additionally, referencing a ResourceClaimTemplate at the PodGroup level allows for the automatic, consistent generation of claims for replicated groups, simplifying lifecycle management for high-level controllers like JobSet and LeaderWorkerSet.
The following snippet illustrates how these new fields allow a workload to define shared DRA devices at the group level, ensuring that all Pods within a specific logic unit are scheduled within the same topological boundary:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
name: pg-claim-template
namespace: default
spec:
spec:
devices:
requests:
- name: my-device
exactly:
deviceClassName: example
---
apiVersion: example.com/v1
kind: MyWorkload
metadata:
name: my-workload
namespace: default
spec:
...
This architectural change moves the responsibility of tracking resource consumers from individual Pod entries to the group identity. While this introduces a slight memory overhead for the device_taint_eviction controller (which must now index Pods via their group associations), it significantly enhances the ability of DRA to orchestrate multi-node logical devices. By decoupling the claim lifecycle from individual Pod names, Kubernetes can now natively support the strict topological constraints required for high-performance distributed training and complex infrastructure reprogramming.
#5517 ResourceClaim support for workloads
Stage: Net New to Alpha
Feature group: sig-scheduling
Implementing DRA drivers for primary system assets (such as dra-driver-cpu) allows for precise resource orchestration but triggers a significant discrepancy in usage tracking. Because the default kube-scheduler accounting logic does not communicate with the DynamicResources plugin, the two systems operate in isolation, creating a high risk of node oversubscription.
While this issue mirrors the challenges faced by DRA Extended Resources (#5004), the existing fix is not a direct fit due to how discovery works:
- Firstly, Extended Resources are broadcast through either node.status.allocatable or a ResourceSlice, but never both at once.
- Secondly, Core Resources are assets like CPU that are permanently defined in node.status.allocatable. A DRA driver, however, would simultaneously track these same assets via ResourceSlice.
This dual-representation of the same physical hardware creates a sync gap that justifies the need for a more integrated, unified accounting framework to prevent future scheduling conflicts.
Storage in Kubernetes 1.36
#1710 Speed up recursive SELinux label change
Stage: Graduating to Stable
Feature group: sig-storage
Graduating to stable in Kubernetes 1.36, this KEP addresses a long-standing performance bottleneck in container storage. Historically, when a Pod starts on a system with SELinux (like RHEL or Fedora), the container runtime must recursively visit every single file and directory on a volume to apply a security label, which is a process that is agonizingly slow for volumes containing millions of files and can lead to out of space errors or Pod startup timeouts. By graduating this to stable, Kubernetes is officially changing the default behavior to use the Linux kernel’s -o context mount option. This allows the system to assign the correct security context to the entire volume at the mount level instantly, bypassing the need for a file-by-file recursive walk and significantly decreasing Pod startup times.
The move to stable in 1.36 is critical because it standardizes the safer, faster, and more predictable security approach across the ecosystem. Beyond performance, it improves security by preventing relabeling attacks (where a compromised Pod might trick the system into relabeling host files) and enables better support for read-only volumes that previously couldn't be relabeled. While this change introduces a strict requirement that all Pods sharing a volume on the same node must use the same SELinux label, Kubernetes provides a clear migration path via the SELinuxChangePolicy field in the PodSpec. This allows users with complex edge cases such as mixing privileged and unprivileged Pods on the same volume to explicitly opt back into the old recursive behavior, ensuring that the performance gains of the stable release don't come at the cost of breaking existing, specialized workloads.
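For workloads that genuinely need the old behavior, the escape hatch mentioned above is a one-line PodSpec field:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-relabel-pod
spec:
  securityContext:
    # Opt back into file-by-file relabeling for mixed-label shared volumes
    seLinuxChangePolicy: Recursive
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```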
#3314 CSI Differential Snapshot for Block Volumes
Stage: Graduating to Beta
Feature group: sig-storage
This enhancement proposes a new, optional CSI SnapshotMetadata API designed to bring efficient, cloud-native differential backup capabilities to Kubernetes. By implementing Changed Block Tracking (CBT), the API allows backup apps to identify only the specific data blocks that have changed between two snapshots (or all allocated blocks in a single snapshot). This avoids the resource-heavy process of backing up entire volumes, significantly reducing storage and network overhead.
To ensure scalability and performance, the design utilizes a proxy sidecar (external-snapshot-metadata) that handles communication between the backup client and the CSI driver. This architecture allows large volumes of snapshot metadata to be streamed directly via a TLS-secured gRPC connection, effectively bypassing the Kubernetes API server to prevent it from being overloaded. Security is maintained through a robust model using Kubernetes-scoped authentication tokens, ensuring that only authorized backup applications can access sensitive volume metadata while keeping the implementation flexible for various storage providers.
#3476 VolumeGroupSnapshot
Stage: Graduating to Stable
Feature group: sig-storage
Feature gate: VolumeGroupSnapshot Default value: disabled
This feature introduces the VolumeGroupSnapshot API for Kubernetes, designed to solve the problem of write-order consistency across multiple volumes. While the existing VolumeSnapshot API handles individual volumes, applications like databases often spread data and logs across different disks; snapshotting these at different times can lead to corrupted data upon restoration. This new feature allows users to group multiple Persistent Volume Claims (PVCs) together using a label selector and trigger a coordinated snapshot that captures all volumes at the exact same point-in-time, ensuring a crash-consistent state without necessarily requiring the application to be paused (quiesced).
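In practice, the grouping is expressed with a label selector on the new object; a minimal sketch, assuming the v1beta1 API served by recent external snapshot components (names and class are illustrative):

apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  name: db-group-snapshot
  namespace: default
spec:
  volumeGroupSnapshotClassName: csi-group-snap-class
  source:
    selector:
      matchLabels:
        # every PVC in the namespace carrying this label is
        # captured in the same crash-consistent group snapshot
        app: my-database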
When the feature first entered the Alpha stage, the focus was on establishing the core architectural plumbing and initial CRDs (VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass), with the Snapshot Controller and CSI-snapshotter sidecar updated to recognize these group objects. That phase validated the end-to-end flow: the controller identifies PVCs via labels, communicates with a CSI driver that supports the CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability, and generates both the group snapshot and the underlying individual volume snapshots. Graduation to stable in 1.36 marks that flow as production-ready, following the vendor testing and early feedback gathered during the earlier opt-in, feature-gated releases.
#4876 Mutable CSINode Allocatable Property
Stage: Graduating to Stable
Feature group: sig-storage
Graduating to stable in 1.36, this enhancement makes the PersistentVolume.spec.nodeAffinity field mutable. Previously, node affinity was immutable once set, but this enhancement allows storage providers to update accessibility requirements dynamically, such as when migrating data between zones or enabling features not supported by all nodes. While the update does not disrupt currently running pods, which continue to function under a "required during scheduling, ignored during execution" style logic, this essentially ensures that any new or rescheduled pods are directed to nodes compatible with the updated volume topology.
To handle potential race conditions or mis-scheduling during the transition, the proposal includes a new Kubelet behavior: if a pod is scheduled to a node that no longer satisfies the PV’s updated affinity, the Kubelet will proactively fail the pod rather than letting it get stuck in a perpetual ContainerCreating state. This triggers controllers like StatefulSets or Deployments to recreate the pod on a valid node. Merged in October 2025, originally for the v1.35 milestone, the KEP initially reached consensus for Alpha with the understanding that while the operation is highly privileged, it provides essential flexibility for evolving storage environments.
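As an illustration of what becomes possible, the nodeAffinity stanza of an existing PersistentVolume can now be patched in place, for example to add a second zone once the storage provider has replicated the data (driver name and zones below are illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  csi:
    driver: example.csi.driver.io
    volumeHandle: vol-0123
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["eu-west-1a", "eu-west-1b"] # second zone added after creation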
#5541 Report last used time on a PVC
Stage: Net New to Alpha
Feature group: sig-storage
The Kubernetes community has officially merged KEP-5541, which introduced a new UnusedSince timestamp field to the PersistentVolumeClaimStatus object. This feature is designed to help DevOps teams and developers identify inactive storage by recording the exact time a Persistent Volume Claim (PVC) last transitioned from being used by a pod to an unused state. Throughout the review process, the field was renamed from LastUsedTime to UnusedSince to more clearly indicate that a nil value signifies a PVC is currently in active use. While the initial alpha implementation focuses on the API field and controller logic, the PR discussions highlighted future plans for potential integration with kube-state-metrics to enhance observability for large-scale cluster management.
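Once the alpha feature gate is enabled, the field surfaces in the PVC status; an illustrative fragment, assuming the Go field UnusedSince serializes as unusedSince:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: old-reports-pvc
status:
  phase: Bound
  capacity:
    storage: 50Gi
  # nil while any pod is using the claim; otherwise set to the time
  # of the last used-to-unused transition (assumed field name)
  unusedSince: "2026-01-03T09:15:00Z"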
#5538 CSI driver opt-in for service account tokens via secrets field
Stage: Graduating to Stable
Feature group: sig-storage
Finally graduating to stable in v1.36, this enhancement provides a more secure delivery mechanism for service account tokens by allowing CSI drivers to opt into receiving them via the dedicated secrets field in the NodePublishVolumeRequest. Currently, these sensitive tokens are passed through the volume_context map, a field not designed for confidential data. This architectural flaw has led to significant security vulnerabilities, such as CVE-2023-2878, where tokens were inadvertently leaked into logs because standard sanitization tools do not treat volume context as sensitive. By transitioning to the secrets field, the proposal ensures that tokens are handled by existing security protocols and proto-sanitizers, reducing the need for inconsistent, driver-specific workarounds.
To implement this while maintaining backward compatibility, a new field, serviceAccountTokenInSecrets, will be added to the CSIDriver spec. When set to true, the kubelet will redirect tokens from the volume context to the secrets field using the established key csi.storage.k8s.io/serviceAccount.tokens. The default remains false to ensure existing drivers do not break, though the API server will issue warnings to encourage migration to the more secure path.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example-csi-driver
spec:
  tokenRequests:
  - audience: "example.com"
    expirationSeconds: 3600
  # New field for opting into secrets delivery
  serviceAccountTokenInSecrets: true # defaults to false

Autoscaling in Kubernetes 1.36
#5030 Integrate CSI Volume attach limits with cluster autoscaler
Stage: Major Change to Alpha
Feature group: sig-autoscaling
This major change to the existing Alpha stage feature addresses a critical gap in Kubernetes resource management by integrating CSI (Container Storage Interface) Volume attachment limits directly into the Cluster Autoscaler. Currently, if a node reaches its maximum capacity for attached volumes, the Cluster Autoscaler may not effectively account for this constraint when deciding whether to scale up or where to place new pods. By bridging the gap between sig-storage and sig-autoscaling, this update ensures that the autoscaler recognizes when a pod cannot be scheduled due to storage limits, triggering the provision of new nodes instead of leaving pods in a Pending state on a saturated node.
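The per-node constraint in question is the attach limit CSI drivers already publish through their CSINode object, which the autoscaler can now factor into scale-up simulations (node and driver names are illustrative):

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: worker-node-1
spec:
  drivers:
  - name: ebs.csi.aws.com
    nodeID: i-0abc1234567890
    allocatable:
      # maximum number of volumes this driver can attach to the node;
      # pods blocked on this limit should now trigger a scale-up
      count: 25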
As of February 2026, the project is finally tracked for the v1.36 release and will remain in Alpha status. The proposal has successfully passed the PRR and the enhancement freeze deadlines. Development is actively moving forward, with documentation placeholders and code PRs (such as preventing pod scheduling to nodes without the required CSI drivers) already in progress. At the time of writing, the release team were also planning a feature blog to coincide with the release to highlight how this integration improves the reliability of stateful workloads in scaling environments.
#5679 HPA External Metrics Fallback on Retrieval Failure
Stage: Net New to Alpha
Feature group: sig-autoscaling
This net-new enhancement adds a fallback mechanism to the Horizontal Pod Autoscaler (HPA) for cases where external metrics cannot be retrieved. It is a significant reliability improvement, specifically targeting scenarios where external metric APIs (like cloud provider queues or Datadog) experience downtime. Instead of leaving the application in a static state (or, even worse, under-provisioned) when a metric becomes unknown, this feature allows operators to define an optional static fallback replica count that triggers after a configurable failure duration. By moving away from metric value substitution to a fixed replica count, Kubernetes ensures that the HPA can maintain a safe capacity baseline during API outages without the risk of unbounded scaling. The enhancement has reached implementable status for its Alpha debut in 1.36, with the core logic merged into the master branch and API types finalized to include new fields like failureDurationSeconds and fallbackStatus.
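As a sketch of how this could look on an HPA object — the placement and exact shape of the fallback stanza below are assumptions for illustration; only failureDurationSeconds is named in the merged API discussion:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: queue_depth # e.g. served by a cloud metrics adapter
      target:
        type: AverageValue
        averageValue: "30"
  # hypothetical placement: hold 10 replicas once the external metric
  # has been unretrievable for five minutes
  fallback:
    replicas: 10
    failureDurationSeconds: 300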
Instrumentation in Kubernetes 1.36
#4827 StatusZ for Kubernetes Components
Stage: Graduating to Beta
Feature group: sig-instrumentation
Feature gate: ComponentStatusz Default value: disabled
This proposal introduces a standardized /statusz endpoint for core Kubernetes components, modeled after Google’s internal z-pages, to provide low-overhead, real-time insights into a component's internal state. By exposing critical data (such as binary versions, Go versions, and build metadata) directly from the serving process, it empowers developers and operators to perform high-precision troubleshooting without sifting through logs or configuring complex external monitoring tools. The scope is intentionally limited to the primary process to avoid the maintenance complexities of legacy status APIs, ensuring a lightweight and reliable inside-out view of component health.
Security and stability are prioritized through strict RBAC integration and a versioned API rollout. Access is restricted to the system:monitoring group to prevent unauthorized exposure, while the implementation utilizes a feature gate for a cautious Alpha release. The endpoint defaults to a human-readable text/plain format but supports a structured API (v1alpha1) for programmatic access via explicit content negotiation. To enable this across the cluster, the system:monitoring ClusterRole can be updated as follows:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:monitoring
rules:
# existing rules
- apiGroups: [""]
  resources: ["nodes/statusz"]
  verbs: ["get"]

#4828 FlagZ for Kubernetes Components
Stage: Graduating to Beta
Feature group: sig-instrumentation
Feature gate: ComponentFlagz Default value: disabled
This proposal introduces a new /flagz endpoint across core Kubernetes components to enhance observability, troubleshooting, and real-time configuration auditing. By providing direct visibility into the active command-line flags a component was started with, the feature allows cluster admins and developers to quickly diagnose misconfigurations and verify runtime state without relying on external logs or manual inspection. Similar to StatusZ, to ensure security and performance, access is restricted to the system:monitoring group, and the endpoint is designed with minimal computational overhead.
The endpoint also similarly defaults to a text/plain format for human readability but supports structured, versioned API responses (for example JSON, YAML, CBOR) for programmatic access via explicit header negotiation. During its alpha phase, the feature will be guarded by a feature gate to prevent premature dependency on unstable formats while offering a consistent interface alongside existing diagnostic paths like /healthz and /readyz.
Sample response in text/plain, as discussed above:
----------------------------
title: Kubernetes Flagz
description: Command line flags that Kubernetes component was started with.
----------------------------
default-watch-cache-size=100
delete-collection-workers=1
enable-garbage-collector=true
encryption-provider-config-automatic-reload=false
...

#5808 Native Histogram Support for Kubernetes Metrics
Stage: Net New to Alpha
Feature group: sig-instrumentation
KEP-5808 proposes the integration of Prometheus Native Histograms into the Kubernetes control plane. Currently, Kubernetes relies on classic histograms, which require pre-defined, fixed bucket boundaries (like: le="0.1" or le="0.5"). This approach is often inefficient, as it forces users to choose between low resolution (grouping 1ms and 40ms requests together) or high cardinality (creating dozens of separate time series that bloat Prometheus storage). By moving toward a native histogram format, Kubernetes can leverage exponential bucket boundaries that automatically adjust to data distributions, offering significantly higher precision for detecting performance regressions while simultaneously reducing the total number of time series by approximately 10x.
The proposal aims to solve the chicken-and-egg migration problem through a dual exposition strategy. Under the new NativeHistograms feature gate, Kubernetes components like the kube-apiserver and kubelet will serve metrics in both classic and native formats simultaneously when requested via Protobuf. This allows Platform Engineers to reduce monitoring costs and SREs to set more granular SLOs without breaking existing dashboards or alerts. The plan includes a careful rollout path that accounts for Prometheus version differences, ensuring that users can transition to high-resolution metrics at their own pace without risking silent failures in their observability stack.
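On the consuming side, Prometheus has to opt in as well; a hedged configuration sketch — the feature flag and scrape option shown exist in recent Prometheus releases, but exact option names vary between the 2.x and 3.x series:

# Start the server with: prometheus --enable-feature=native-histograms
scrape_configs:
- job_name: kube-apiserver
  scheme: https
  # keep ingesting the classic bucketed series too, so existing
  # dashboards and alerts keep working during the migration
  scrape_classic_histograms: true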
Deprecations in Kubernetes 1.36
#5040 Remove gitRepo volume driver
Stage: Deprecation/Removal
Feature group: sig-storage
The planned removal of the gitRepo volume driver in Kubernetes 1.36 marks the end of a long-deprecated feature that has become a significant security liability. Although it was designed to provide a convenient way to manifest Git repository files into a Pod, the driver’s implementation requires the kubelet to run as root. This architectural flaw was highlighted under CVE-2024-10220, where researchers demonstrated how an attacker could use Git hooks within a malicious repository to execute arbitrary code on the host node with root privileges. Given that the feature has been deprecated for nearly five years and presents such a critical escape-to-host risk, the community has pivoted toward a complete removal of the in-tree driver to ensure the platform is secure by default.
For users still relying on gitRepo volumes, the migration path is well-established and significantly more robust. The Kubernetes project recommends using an emptyDir volume in conjunction with an initContainer to clone the repository, or utilizing the dedicated git-sync sidecar project. These alternatives offer better isolation, support for modern authentication, and frequent updates that the legacy in-tree driver lacked. While the removal represents a breaking change, the timeline (stretching through 1.36 and beyond) provided a structured window for cluster administrators to implement Validating Admission Policies (VAP) to identify and migrate any remaining workloads before the driver is fully purged from the kubelet.
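The recommended replacement pattern is straightforward: share an emptyDir between a cloning initContainer and the main container (image names and the repository URL below are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: git-clone-example
spec:
  initContainers:
  - name: git-clone
    image: alpine/git # any image providing a git binary
    args: ["clone", "--single-branch", "https://example.com/repo.git", "/repo"]
    volumeMounts:
    - name: repo
      mountPath: /repo
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: repo
      mountPath: /usr/share/nginx/html
      readOnly: true
  volumes:
  - name: repo
    emptyDir: {}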
#5707 Deprecate service.spec.externalIPs
Stage: Deprecation/Removal
Feature group: sig-network
The deprecation of service.spec.externalIPs in Kubernetes 1.36 is the direct result of a long-standing design flaw identified as CVE-2020-8554. This vulnerability exposed a fundamental security gap in multi-tenant clusters: any user with basic permissions to create or edit a Service could claim an arbitrary IP address (including those of internal DNS servers or external websites) by simply listing them in the externalIPs field. Because kube-proxy would then blindly program the cluster's network rules to redirect traffic for those IPs to the attacker’s pods, it enabled high-impact Man-in-the-Middle (MitM) attacks and unauthorized traffic interception without requiring high-level administrative privileges.
While the community initially addressed this risk through external mitigations like the DenyServiceExternalIPs admission controller and OPA Gatekeeper policies, these were essentially band-aids for a feature that lacked native validation or authorization. The 1.36 deprecation finally marks the beginning of the end for this architectural debt, moving from optional blocking to a formal phased removal. By transitioning users toward modern alternatives like the Gateway API or LoadBalancer services, Kubernetes is finally stripping out the underlying code in kube-proxy that made the CVE-2020-8554 exploit possible, effectively hardening the cluster networking model by default.
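For most workloads the migration is a one-field change; a minimal before-and-after sketch (IPs and names are illustrative):

# Deprecated: any user who can edit the Service could claim this IP
apiVersion: v1
kind: Service
metadata:
  name: legacy-svc
spec:
  selector:
    app: web
  ports:
  - port: 80
  externalIPs:
  - 203.0.113.10
---
# Preferred: let a LoadBalancer (or Gateway API) controller own the IP
apiVersion: v1
kind: Service
metadata:
  name: modern-svc
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80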
Deferred / Removed from Milestone
#5507 Container Resource Controls for Out-Of-Memory (OOM) Behavior
Stage: Net New to Alpha
Feature group: sig-node
Status: Deferred
#5869 Wildcard Matching in Toleration Keys
Stage: Net New to Alpha
Feature group: sig-scheduling
Status: Deferred
#1432 Persistent Volume Health Monitor
Stage: Net New to Alpha
Feature group: sig-storage
Status: Removed from Milestone
#5773 DRA: Priority for ResourceSlices in a resource pool
Stage: Alpha
Feature group: sig-storage
Status: Removed from Milestone
#24 AppArmor support
Stage: Stable
Feature group: sig-node
Status: Removed from Milestone
#5194 DRA: ReservedFor Workloads
Stage: Alpha
Feature group: sig-storage
Status: Removed from Milestone
#5234 DRA: ResourceSlice Mixins
Stage: Alpha
Feature group: sig-storage
Status: Removed from Milestone
#5683 Specialized Lifecycle Management
Stage: Net New to Alpha
Feature group: sig-node
Status: Removed from Milestone
Timeline of the v1.36 Kubernetes Release
Kubernetes users can expect the v1.36 release process to unfold throughout April 2026, with past milestones including the start of the cycle on January 12th and the Enhancements Freeze on February 12th. Upcoming technical milestones involve the creation of the release-1.36 branch on April 8th, following the Code and Test Freeze on March 11th. The process culminates in the official v1.36.0 release on Wednesday, April 22nd, 2026, notably following the community's gathering at KubeCon Amsterdam in late March.
| What is happening? | By whom? | And when? |
|---|---|---|
| Release Cycle Begins | Lead | Monday 12th January 2026 |
| v1.36.0-alpha.1 released | Branch Manager | Wednesday 4th February 2026 |
| Enhancements Freeze | Enhancements Lead | Thursday 12th February 2026 |
| v1.36.0-alpha.2 released | Branch Manager | Wednesday 18th February 2026 |
| Code & Test Freeze | Branch Manager | Wednesday 11th March 2026 |
| KubeCon Amsterdam | CNCF Event | Monday 23rd March 2026 |
| release-1.36 branch created | Branch Manager | Wednesday 8th April 2026 |
| Kubernetes v1.36.0 released | Branch Manager | Wednesday 22nd April 2026 |