Announcing Karpenter 1.0
Introduction
In November 2021, AWS announced the launch of v0.5 of Karpenter, “a new open source Kubernetes cluster auto scaling project.” Originally conceived as a flexible, dynamic, and high-performance alternative to the Kubernetes Cluster Autoscaler, Karpenter has, in the nearly three years since, evolved substantially into a fully featured, Kubernetes-native node lifecycle manager.
The project has been adopted for mission-critical use cases by industry leaders. It has added key features such as workload consolidation, which is designed to automatically improve utilization, and disruption controls that let users specify how and when Karpenter performs node lifecycle management operations in their clusters. In October 2023, the project graduated to beta, and AWS contributed the vendor-neutral core of the project to the Cloud Native Computing Foundation (CNCF) through Kubernetes SIG Autoscaling. Engagement from the Karpenter community made it one of the ten most popular AWS open source projects by GitHub stars, and contributions by non-AWS community members have increased in both number and scope. Underpinning this evolution, from user success stories to new features and community contributions, the Karpenter team at AWS has been working diligently to raise the bar on the project’s maturity and operational stability.
Today, with the release of Karpenter v1.0.0, we are proud to announce that Karpenter has graduated out of beta. With this release, the stable Karpenter APIs, NodePool and EC2NodeClass, remain available for future 1.0 minor version releases and will not be modified in ways that result in breaking changes from one minor release to another. In this post we describe the changes between the current v0.37 Karpenter release and v1.0.0.
What is changing?
As part of the v1 release, the custom resource definition (CRD) application programming interface (API) groups and kind names remain unchanged. We have also created conversion webhooks to make the migration from beta to stable more seamless. In the subsequent minor version of Karpenter after v1 (v1.1.0), we plan to drop support for the v1beta1 APIs. The following is a summary of the new features and changes.
Enhanced disruption controls by reason
In Karpenter release v0.34.0, Karpenter introduced disruption controls to give users more control over how and when Karpenter terminates nodes to improve the balance between cost-efficiency, security, and application availability. These disruption budgets follow the expressive cron syntax and can be scheduled to apply at certain times of the day, days of the week, hours, minutes, or all the time to further protect application availability. By default, if a Karpenter disruption block is not set, then Karpenter limits disruptions to 10% of nodes at any point in time.
Karpenter v1 adds support for disruption budgets by reason. The supported reasons are Underutilized, Empty, and Drifted. This enables users to have finer-grained control of the disruption budgets that apply to specific disruption reasons. For example, the following disruption budgets define how a user can implement a control where:
- 0% of nodes can be disrupted Monday to Friday from 9:00 UTC for eight hours if drifted or underutilized.
- 100% of nodes can be disrupted if empty, at all times.
- At any other time of day, 10% of nodes can be disrupted when drifted or underutilized and a stricter budget is not active.
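A NodePool disruption block implementing these budgets might look like the following sketch (the NodePool name is illustrative and the rest of the spec is omitted):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
    # Default budget: at most 10% of nodes disrupted at any time, for any reason.
    - nodes: "10%"
    # Empty nodes can always be disrupted.
    - nodes: "100%"
      reasons:
      - Empty
    # Block drift and underutilization disruptions Monday to Friday from 9:00 UTC for eight hours.
    - nodes: "0"
      reasons:
      - Drifted
      - Underutilized
      schedule: "0 9 * * mon-fri"
      duration: 8h
  # The rest of the NodePool spec (template, limits) is omitted for brevity.
```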
Users might use this budget to make sure that empty nodes can be terminated during periods of peak application traffic to optimize compute. If a reason is not set, then the budget applies to all reasons.
Renamed consolidation policy WhenUnderutilized to WhenEmptyOrUnderutilized
The WhenUnderutilized consolidation policy has been renamed to WhenEmptyOrUnderutilized. The functionality remains the same as in v1beta1, where Karpenter would consolidate nodes that are partially utilized or empty when consolidationPolicy=WhenUnderutilized. The new name WhenEmptyOrUnderutilized explicitly reflects these conditions.
Introducing consolidateAfter consolidation control for underutilized nodes
Karpenter prioritizes nodes to consolidate based on the least number of pods scheduled. Users with workloads that experience rapid surges in demand or interruptible jobs might have high pod churn, and have asked to be able to tune how quickly Karpenter attempts to consolidate nodes in order to retain capacity and minimize node churn. Previously, consolidateAfter could only be used when consolidationPolicy=WhenEmpty, which takes effect when the last pod is removed from a node. consolidateAfter can now be used when consolidationPolicy=WhenEmptyOrUnderutilized, allowing users to specify, in hours, minutes, or seconds, how long Karpenter waits after a pod is added or removed before consolidating. If you would like the same behavior as v1beta1, then set consolidateAfter to 0 when consolidationPolicy=WhenEmptyOrUnderutilized.
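For example, a NodePool disruption block preserving the v1beta1 behavior might look like the following sketch (the 0s value follows the guidance above; the alternative mentioned in the comment is only illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    # 0s consolidates as soon as a node becomes empty or underutilized (the v1beta1 behavior);
    # a longer value (for example 5m) waits out short-lived pod churn before consolidating.
    consolidateAfter: 0s
  # The rest of the NodePool spec is omitted for brevity.
```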
New disruption control terminationGracePeriod
Cluster administrators would like a way to enforce a maximum node lifetime natively within Karpenter, for example to comply with security requirements. Karpenter gracefully disrupts nodes by respecting Pod Disruption Budgets (PDBs), a pod’s terminationGracePeriodSeconds, and the karpenter.sh/do-not-disrupt annotation. If these settings are misconfigured, then Karpenter can block indefinitely waiting for nodes to be disrupted, which prevents cluster admins from rolling out new Amazon Machine Images (AMIs).
Therefore, a terminationGracePeriod has been introduced. terminationGracePeriod is the maximum time Karpenter spends draining a node before forcefully deleting it, and Karpenter does not wait for a replacement node once the node’s expiration has been met. The maximum lifetime of a node is therefore its expireAfter plus its terminationGracePeriod. As part of this change, the expireAfter configuration has also been moved from the disruption block to the template spec.
In the following example, a cluster administrator might configure a NodePool so that nodes start draining after 30 days, but with a 24-hour grace period so that existing workloads (such as long-running batch jobs) have enough time to complete before being forcefully terminated.
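A sketch of such a NodePool follows; the expireAfter and terminationGracePeriod values match the scenario above, while the names, nodeClassRef, and omitted fields are illustrative:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      # Nodes begin draining 30 days (720 hours) after creation.
      expireAfter: 720h
      # A node that is still draining 24 hours later is forcefully terminated.
      terminationGracePeriod: 24h
      # Other template fields (requirements, labels, taints) omitted for brevity.
```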
Drift feature gate removed
Karpenter drift replaces nodes that have drifted from a desired state (for example, nodes using an outdated AMI). In v1, drift has been promoted to stable and the feature gate removed, which means nodes now drift by default. Users who disabled the drift feature gate in v1beta1 can now control drift by using disruption budgets by reason.
Require amiSelectorTerms
In the Karpenter v1beta1 APIs, when specifying amiFamily with no amiSelectorTerms, Karpenter would automatically update nodes through drift when a new version of the Amazon EKS optimized AMI in that family was released. This works well in pre-production environments, where it’s nice to be auto-upgraded to the latest version for testing, but it might not be desired in production environments. Karpenter now recommends that users pin AMIs in their production environments. More information on how to manage AMIs can be found in the Karpenter documentation.
amiSelectorTerms has now been made a required field, and a new term, alias, has been introduced, which consists of an AMI family and a version (family@version). If an alias exists in the EC2NodeClass, then Karpenter selects the Amazon EKS optimized AMI for that family. With this new feature, users can pin to a specific version of the Amazon EKS optimized AMI. The following Amazon EKS optimized AMI families can be configured: al2, al2023, bottlerocket, windows2019, and windows2022. The following section provides an example.
Using Amazon EKS optimized AMIs
In this example, Karpenter provisions nodes with the Bottlerocket v1.20.3 Amazon EKS optimized AMI. Even after AWS releases newer versions of the Bottlerocket Amazon EKS optimized AMI, the worker nodes do not drift.
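An EC2NodeClass along these lines might use an alias to pin the AMI, as in this sketch; the resource name, IAM role, and discovery tags are placeholders, and the exact alias version string should be checked against the Karpenter documentation:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: bottlerocket
spec:
  # Pin to the Bottlerocket v1.20.3 Amazon EKS optimized AMI; nodes do not
  # drift to newer Bottlerocket releases until this alias is changed.
  amiSelectorTerms:
  - alias: bottlerocket@v1.20.3
  role: KarpenterNodeRole-my-cluster        # placeholder IAM role name
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster    # placeholder discovery tag
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster    # placeholder discovery tag
```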
Using custom AMIs
If the EC2NodeClass does not specify an alias term, then amiFamily needs to be configured to determine which user data is used. The amiFamily can be set to one of AL2, AL2023, Bottlerocket, Windows2019, or Windows2022 to select pre-generated user data, or to Custom if the user provides their own user data. You can use the existing tags, name, or id fields in amiSelectorTerms to select an AMI. Examples of injected user data can be found in the Karpenter documentation for the Amazon EKS optimized AMI families.
In the following example, the EC2NodeClass selects a user-specified AMI with ID “ami-123” and uses the Bottlerocket generated user data.
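A minimal sketch of such an EC2NodeClass follows; the AMI ID comes from the example above, while the resource name, IAM role, and discovery tags are placeholders:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: custom-ami
spec:
  # Use the Bottlerocket pre-generated user data with a user-specified AMI.
  amiFamily: Bottlerocket
  amiSelectorTerms:
  - id: ami-123
  role: KarpenterNodeRole-my-cluster        # placeholder IAM role name
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster    # placeholder discovery tag
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster    # placeholder discovery tag
```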
Removed Ubuntu AMI family selection
Beginning with v1, the Ubuntu AMI family has been removed. To continue using the Ubuntu AMI, you can configure an AMI in amiSelectorTerms pinned to the latest Ubuntu AMI ID. Furthermore, you can reference amiFamily: AL2 in your EC2NodeClass to get the same user data configuration that you received before. The following is an example:
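A sketch of that configuration follows; the AMI ID, resource name, IAM role, and discovery tags are placeholders to be replaced with your own values:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: ubuntu
spec:
  # AL2 provides the same generated user data format that the Ubuntu family used.
  amiFamily: AL2
  amiSelectorTerms:
  - id: ami-0123456789abcdef0               # placeholder: pin to your Ubuntu EKS AMI ID
  role: KarpenterNodeRole-my-cluster        # placeholder IAM role name
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster    # placeholder discovery tag
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster    # placeholder discovery tag
```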
Restrict Instance Metadata Service access from containers by default
It is an Amazon EKS best practice to restrict pods from accessing the AWS Identity and Access Management (IAM) instance profile attached to nodes, to help make sure that your applications only have the permissions they need, and not those of their nodes. Therefore, by default for new EC2NodeClass resources, access to the Instance Metadata Service (IMDS) is blocked by setting the hop count to one (httpPutResponseHopLimit: 1) and requiring IMDSv2 (httpTokens: required). Pods using host networking mode continue to have access to IMDS. Users should use Amazon EKS Pod Identity or IAM roles for service accounts to grant pods AWS permissions to access AWS services.
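Written out explicitly, these defaults correspond to a metadataOptions block like the following sketch (the httpEndpoint and httpProtocolIPv6 values are the assumed defaults):

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  metadataOptions:
    httpEndpoint: enabled          # IMDS itself remains reachable from the node
    httpProtocolIPv6: disabled     # assumed default
    httpPutResponseHopLimit: 1     # blocks pods that are not in host networking mode
    httpTokens: required           # enforces IMDSv2
  # Other required fields (amiSelectorTerms, role, selectors) omitted for brevity.
```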
Moved kubelet configuration to EC2NodeClass
Karpenter provides the ability to specify a subset of kubelet arguments for additional customization. In Karpenter v1 the kubelet configuration has been moved to the EC2NodeClass API. If you provided a custom kubelet configuration and have multiple NodePools with different kubelet configurations referencing a single EC2NodeClass, then you now need to use multiple EC2NodeClasses. In Karpenter v1 the conversion webhooks maintain this compatibility. However, before migrating to v1.1.x, users must update their NodePools to reference the correct EC2NodeClass, which results in nodes drifting.
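For illustration, in v1 a kubelet block lives on the EC2NodeClass rather than on the NodePool; the specific values below are only examples:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # kubelet configuration is specified on the EC2NodeClass in v1.
  kubelet:
    maxPods: 110
    systemReserved:
      cpu: 100m
      memory: 100Mi
  # Other required fields (amiSelectorTerms, role, selectors) omitted for brevity.
```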
NodeClaims made immutable
Karpenter v1beta1 did not enforce immutability on NodeClaims, but it assumed that users would not act on these objects after creation. NodeClaims are now immutable, since the NodeClaim lifecycle controller does not react to changes after the initial instance launch.
Require all NodePool nodeClassRef fields and rename apiVersion field to group
Karpenter v1beta1 did not require users to set the apiVersion and kind of the NodeClass that they were referencing. In Karpenter v1, users are now required to set all nodeClassRef fields. In addition, the apiVersion field in the nodeClassRef has been renamed to group.
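For reference, a v1 nodeClassRef might look like the following sketch (the names are illustrative):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws   # replaces the v1beta1 apiVersion field
        kind: EC2NodeClass
        name: default
      # Other template fields omitted for brevity.
```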
Karpenter Prometheus metric changes
Karpenter makes several metrics available in the Prometheus format to allow monitoring of the Karpenter controller and cluster provisioning status. As part of the Karpenter v1 release, a number of the v1beta1 metrics have changed; therefore, users that have dashboards with queries that use these metrics will need to update them. For a detailed list of metric changes, review the Karpenter v1 upgrade documentation.
Planned deprecations
As part of this change, the following deprecated items have been removed in v1:
- The karpenter.sh/do-not-evict annotation was introduced as a pod-level control in alpha. This control was superseded by the karpenter.sh/do-not-disrupt annotation, which disables disruption operations against the node on which the pod is running. The karpenter.sh/do-not-evict annotation was declared as deprecated throughout beta and is dropped in v1.
- The karpenter.sh/do-not-consolidate annotation was introduced as a node-level control in alpha. This control was superseded by the karpenter.sh/do-not-disrupt annotation, which disables all disruption operations rather than just consolidation. The karpenter.sh/do-not-consolidate annotation was declared as deprecated throughout beta and is dropped in v1.
- ConfigMap-based configuration was deprecated in v1beta1 and has been fully removed in v1, in favor of a simpler CLI/environment variable based configuration.
- Support for the karpenter.sh/managed-by tag, which stores the cluster name in its value, is replaced by eks:eks-cluster-name.
For a full list of new features, changes, and deprecations, read the detailed changelog.
Migration path
Because the v1 APIs for Karpenter do not change the API group or resource names, the Kubernetes webhook conversion process can be used to upgrade APIs in place without having to roll nodes. Prior to upgrading, you must be on a version of Karpenter (0.33.0+) that supports the v1beta1 APIs, such as NodePool, NodeClaim, and EC2NodeClass.
A summary of the upgrade process from beta to v1 is as follows:
- Apply the updated v1 NodePool, NodeClaim, and EC2NodeClass CRDs.
- Upgrade the Karpenter controller to v1.0.0. This version of Karpenter starts reasoning in terms of the v1 API schema in its API requests. Resources are converted from the v1beta1 to the v1 version automatically, using conversion webhooks shipped by the upstream Karpenter project and the providers (for EC2NodeClass changes).
- Before upgrading to Karpenter v1.1.0, update your v1beta1 manifests to use the new v1 version, taking into consideration the API changes in this release. See the before upgrading to Karpenter v1.1.0 section in the v1 migration documentation for more details.
For detailed upgrade steps, see the Karpenter v1 migration documentation.
Conclusion
In this post, you learned about the Karpenter 1.0.0 release and a summary of the new features and changes. Before you upgrade Karpenter to v1.0.0, we recommend that you read the full Karpenter v1 migration documentation and test your upgrade process in a non-production environment. If you have questions or feedback, then reach out in the #karpenter channel on the Kubernetes Slack or on GitHub.