Commit 61a7116

Decouple Startup CPU Boost from VPA modes
1 parent fabcbe5 commit 61a7116

1 file changed

+167
-77
lines changed
  • vertical-pod-autoscaler/enhancements/7862-cpu-startup-boost

vertical-pod-autoscaler/enhancements/7862-cpu-startup-boost/README.md

Lines changed: 167 additions & 77 deletions
Original file line number | Diff line number | Diff line change
@@ -1,15 +1,15 @@
11
# AEP-7862: CPU Startup Boost
22

3-
<!-- toc -->
3+
<!-- TOC -->
4+
45
- [AEP-7862: CPU Startup Boost](#aep-7862-cpu-startup-boost)
5-
- [Summary](#summary)
66
- [Goals](#goals)
77
- [Non-Goals](#non-goals)
88
- [Proposal](#proposal)
99
- [Design Details](#design-details)
1010
- [Workflow](#workflow)
1111
- [API Changes](#api-changes)
12-
- [Priority of `StartupBoost`](#priority-of-startupboost)
12+
- [Priority of StartupBoost](#priority-of-startupboost)
1313
- [Validation](#validation)
1414
- [Static Validation](#static-validation)
1515
- [Dynamic Validation](#dynamic-validation)
@@ -19,12 +19,17 @@
1919
- [Kubernetes Version Compatibility](#kubernetes-version-compatibility)
2020
- [Test Plan](#test-plan)
2121
- [Examples](#examples)
22-
- [CPU Boost Only](#cpu-boost-only)
23-
- [CPU Boost and Vanilla VPA](#cpu-boost-and-vanilla-vpa)
22+
- [Per-pod configurations startupBoost configured in VerticalPodAutoscalerSpec](#per-pod-configurations-startupboost-configured-in-verticalpodautoscalerspec)
23+
- [Startup CPU Boost Enabled & VPA Disabled](#startup-cpu-boost-enabled--vpa-disabled)
24+
- [Startup CPU Boost Disabled & VPA Enabled](#startup-cpu-boost-disabled--vpa-enabled)
25+
- [Startup CPU Boost Enabled & VPA Enabled](#startup-cpu-boost-enabled--vpa-enabled)
26+
- [Per-container configurations startupBoost configured in ContainerPolicies](#per-container-configurations-startupboost-configured-in-containerpolicies)
27+
- [Startup CPU Boost Enabled & VPA Disabled](#startup-cpu-boost-enabled--vpa-disabled)
28+
- [Startup CPU Boost Disabled & VPA Enabled](#startup-cpu-boost-disabled--vpa-enabled)
29+
- [Startup CPU Boost Enabled & VPA Enabled](#startup-cpu-boost-enabled--vpa-enabled)
2430
- [Implementation History](#implementation-history)
25-
<!-- /toc -->
2631

27-
## Summary
32+
<!-- /TOC -->
2833

2934
Long application start time is a known problem for more traditional workloads
3035
running in containerized applications, especially Java workloads. This delay can
@@ -38,10 +43,6 @@ the pod startup and to scale the CPU resources back down when the pod is
3843
`Ready` or after a certain time has elapsed, leveraging the
3944
[in-place pod resize Kubernetes feature](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources).
4045

41-
> [!NOTE]
42-
> This feature depends on the new `InPlaceOrRecreate` VPA mode:
43-
> [AEP-4016: Support for in place updates in VPA](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md)
44-
4546
### Goals
4647

4748
* Allow VPA to boost the CPU request and limit of a pod's containers during the
@@ -61,17 +62,16 @@ time.
6162

6263
## Proposal
6364

64-
* To extend [`ContainerResourcePolicy`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L191)
65+
* To extend [`VerticalPodAutoscalerSpec`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L75)
6566
with a new `StartupBoost` field to allow users to configure the CPU startup
6667
boost.
6768

68-
* To extend [`ContainerScalingMode`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L231-L236)
69-
with a new `StartupBoostOnly` mode to allow users to only enable the startup
70-
boost feature and not vanilla VPA altogether.
69+
* To extend [`ContainerResourcePolicy`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L191)
70+
with a new `StartupBoost` field to allow users to optionally customize the
71+
startup boost behavior for individual containers.
7172

72-
* To allow CPU startup boost if a `StartupBoost` config is specified in `Auto`
73-
[`ContainerScalingMode`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L231-L236)
74-
container policies.
73+
* To enable only startup boost (if the `StartupBoost` config is present in the
74+
VPA object) without having to ALSO use the traditional VPA functionality.
7575

7676
## Design Details
7777

@@ -95,24 +95,37 @@ down the CPU resources to the appropriate non-boosted value:
9595

9696
### API Changes
9797

98-
The new `StartupBoost` parameter will be added to the [`ContainerResourcePolicy`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L191)
99-
and contain the following fields:
100-
* `StartupBoost.CPU.Factor`: the factor by which to multiply the initial
101-
resource request and limit of the containers' targeted by the VPA object.
102-
* `StartupBoost.CPU.Value`: the target value of the CPU request or limit
103-
during the startup boost phase.
98+
The new `StartupBoost` parameter will be added to both:
99+
* [`VerticalPodAutoscalerSpec`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L75):
100+
Will allow users to specify the default CPU startup boost for all containers of the pod targeted by the VPA object.
101+
* [`ContainerResourcePolicy`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L191):
102+
Will allow users to optionally customize the startup boost behavior for individual containers.
103+
104+
`StartupBoost` will contain the following fields:
105+
* [Optional] `StartupBoost.CPU.Type`: A string that specifies the kind of boost
106+
to apply. Supported values are:
107+
* `Factor`: The `StartupBoost.CPU.Value` field will be interpreted as a
108+
multiplier for the recommended CPU request. For example, a value of `2` will
109+
double the CPU request.
110+
* `Quantity`: The `StartupBoost.CPU.Value` field will be interpreted as an
111+
absolute CPU resource quantity (e.g., `"500m"`, `"1"`) to be used as the CPU
112+
request or limit during the boost phase.
113+
* If not specified, `StartupBoost.CPU.Type` defaults to `Factor`.
114+
115+
* `StartupBoost.CPU.Value`: A string representing the magnitude of
116+
the boost, interpreted based on the `StartupBoost.CPU.Type`.
117+
* If `StartupBoost.CPU.Type` is `Factor`, this field is optional and
118+
defaults to `"1"`.
119+
* If `StartupBoost.CPU.Type` is `Quantity`, this field is required.
104120
* [Optional] `StartupBoost.CPU.Duration`: if specified, it indicates for how
105121
long to keep the pod boosted **after** it goes to `Ready`.
122+
* It defaults to `0s` if not specified.
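
As a rough illustration of how these fields combine, the fragment below sketches a VPA object that sets a pod-wide `Factor` boost and overrides it with a `Quantity` boost for one container. The object name, target, and container name (`example-vpa`, `example`, `jvm-app`) are hypothetical, and the lowercase field spellings (`type`, `value`, `duration`) follow this proposal's examples rather than a released API.

```yaml
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: example              # hypothetical target
  startupBoost:
    cpu:
      # Pod-wide default: multiply the recommended CPU request by 2.
      # For example, a 500m request would be boosted to 1 CPU.
      type: "Factor"
      value: "2"
      # Keep the boost for 30s after the pod becomes Ready.
      duration: 30s
  resourcePolicy:
    containerPolicies:
      - containerName: "jvm-app"   # hypothetical container
        startupBoost:
          cpu:
            # Per-container override: boost to an absolute 4 CPUs.
            type: "Quantity"
            value: "4"
```

How unspecified per-container fields interact with the pod-wide defaults (for example, whether `duration` is inherited) is not spelled out in the field list, so the sketch avoids relying on it.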
106123

107124
> [!IMPORTANT]
108125
> The boosted CPU value will be capped by
109126
> [`--container-recommendation-max-allowed-cpu`](https://github.com/kubernetes/autoscaler/blob/4d294562e505431d518a81e8833accc0ec99c9b8/vertical-pod-autoscaler/pkg/recommender/main.go#L122)
110127
> flag value, if set.
111128
112-
> [!IMPORTANT]
113-
> Only one of `Factor` or `Value` may be specified per container policy.
114-
115-
116129
> [!NOTE]
117130
> To ensure that containers are unboosted only after their applications are
118131
> started and ready, it is recommended to configure a
@@ -121,22 +134,15 @@ and contain the following fields:
121134
> section for more details on this feature's behavior for different combinations
122135
> of probers + `StartupBoost.CPU.Duration`.
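
Because unboosting is tied to the pod reaching `Ready`, the probes on the target workload effectively decide when the boost ends. A minimal sketch of a Deployment container with startup and readiness probes follows; the image, port, probe timings, and `/healthz` path are hypothetical placeholders, not taken from this proposal.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: boosted-container-name
          image: registry.example.com/app:1.0   # hypothetical image
          resources:
            requests:
              cpu: "500m"
          # Gives a slow-starting application time to come up before
          # the readiness probe takes over.
          startupProbe:
            httpGet:
              path: /healthz                    # hypothetical endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 30
          # The pod is reported Ready only once this probe succeeds.
          readinessProbe:
            httpGet:
              path: /healthz                    # hypothetical endpoint
              port: 8080
            periodSeconds: 5
```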
123136
124-
We will also add a new mode to the [`ContainerScalingMode`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L231-L236):
125-
* **NEW**: `StartupBoostOnly`: new mode that will allow users to only enable
126-
the startup boost feature for a container and not vanilla VPA altogether.
127-
* **NEW**: `Auto`: we will modify the existing `Auto` mode to enable both
128-
vanilla VPA and CPU Startup Boost (when `StartupBoost` parameter is
129-
specified).
130-
131137
#### Priority of `StartupBoost`
132138

133-
The new `StartupBoost` field will take precedence over the rest of the container
134-
resource policy configurations. Functioning independently from all other fields
135-
in [`ContainerResourcePolicy`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L191),
139+
The new `StartupBoost` field will take precedence over the rest of the fields
140+
in [`VerticalPodAutoscalerSpec`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L75)
141+
and [`ContainerResourcePolicy`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L191),
136142
**except for**:
137-
* [`ContainerName`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L192-L195)
138-
* [`Mode`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L196-L198)
139-
* [`ControlledValues`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L214-L217)
143+
* [`VerticalPodAutoscalerSpec.TargetRef`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L88)
144+
* [`ContainerResourcePolicy.ContainerName`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L192-L195)
145+
* [`ContainerResourcePolicy.ControlledValues`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L214-L217)
140146

141147
This means that a container's CPU request/limit can be boosted during startup
142148
beyond [`MaxAllowed`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.3.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L203-L206),
@@ -149,24 +155,21 @@ excluded from [`ControlledResources`](https://github.com/kubernetes/autoscaler/b
149155

150156
* We will check that the `startupBoost` configuration is valid when VPA objects
151157
are created/updated:
152-
* The VPA autoscaling mode must be `InPlaceOrRecreate` (since it does not
153-
make sense to use this feature with disruptive modes of VPA).
154-
* The boost factor is >= 1 (via CRD validation rules)
155-
* Only one of `StartupBoost.CPU.Factor` or `StartupBoost.CPU.Value` is
156-
specified
157-
* The [feature enablement](#feature-enablement) flags must be on.
158+
* The boost factor value is >= 1 (via CRD validation rules)
159+
* The [feature enablement](#feature-enablement-and-rollback) flags must be
160+
on.
158161

159162

160163
#### Dynamic Validation
161164

162-
* `StartupBoost.CPU.Value` must be greater than the CPU request or limit of the
165+
* The boosted CPU value must be greater than the CPU request or limit of the
163166
container during the boost phase, otherwise we risk downscaling the container.
164167

165168
### Mitigating Failed In-Place Downsizes
166169

167170
The VPA Updater **will not** evict a pod if it attempted to scale the pod down
168171
in place (to unboost its CPU resources) and the update failed (see the
169-
[scenarios](https://github.com/kubernetes/autoscaler/blob/0a34bf5d3a71b486bdaa440f1af7f8d50dc8e391/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md?plain=1#L164-L169 ) where the VPA
172+
[scenarios](https://github.com/kubernetes/autoscaler/blob/0a34bf5d3a71b486bdaa440f1af7f8d50dc8e391/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md?plain=1#L164-L169) where the VPA
170173
updater will consider that the update failed). This is to avoid an eviction
171174
loop:
172175

@@ -179,37 +182,33 @@ the pod in-place and it fails.
179182

180183
#### How can this feature be enabled / disabled in a live cluster?
181184

182-
* Feature gates names: `CPUStartupBoost` and `InPlaceOrRecreate` (from
183-
[AEP-4016](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md#feature-enablement-and-rollback))
185+
* Feature gate name: `CPUStartupBoost`
184186
* Components depending on the feature gate:
185187
* admission-controller
186188
* updater
187189

188-
Enabling of feature gates `CPUStartupBoost` AND `InPlaceOrRecreate` will cause
189-
the following to happen:
190+
Enabling the `CPUStartupBoost` feature gate will cause the following to happen:
190191
* admission-controller to **accept** new VPA objects being created with
191-
`StartupBoostOnly` configured.
192+
`StartupBoost` configured.
192193
* admission-controller to **boost** CPU resources.
193194
* updater to **unboost** the CPU resources.
194195

195-
Disabling of feature gates `CPUStartupBoost` OR `InPlaceOrRecreate` will cause
196-
the following to happen:
196+
Disabling the `CPUStartupBoost` feature gate will cause the following to happen:
197197
* admission-controller to **reject** new VPA objects being created with
198-
`StartupBoostOnly` configured.
198+
`StartupBoost` configured.
199199
* A descriptive error message should be returned to the user letting them
200200
know that they are using a feature-gated feature.
201201
* admission-controller **to not** boost CPU resources, should it encounter a
202-
VPA configured with a `StartupBoost` config and `StartupBoostOnly` or `Auto`
203-
`ContainerScalingMode`.
202+
VPA configured with a `StartupBoost` config.
204203
* updater **to not** unboost CPU resources when pods meet the scale down
205204
requirements, should it encounter a VPA configured with a `StartupBoost`
206-
config and `StartupBoostOnly` or `Auto` `ContainerScalingMode`.
205+
config.
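
For illustration, enabling the gate would presumably mean passing it to both components that depend on it. The fragment below is a sketch only: it assumes the `--feature-gates` flag syntax that VPA components use for other gates, and the container names and the omitted surrounding Deployment fields are placeholders.

```yaml
# Fragment of the admission-controller Deployment's pod spec (assumed layout).
containers:
  - name: admission-controller
    args:
      - --feature-gates=CPUStartupBoost=true
---
# Fragment of the updater Deployment's pod spec (assumed layout).
containers:
  - name: updater
    args:
      - --feature-gates=CPUStartupBoost=true
```

Setting the value to `false` (or omitting the flag, if the gate defaults to off) would correspond to the disabled behavior listed above.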
207206

208207
### Kubernetes Version Compatibility
209208

210209
Similarly to [AEP-4016](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support#kubernetes-version-compatibility),
211-
`StartupBoost` configuration and `StartupBoostOnly` mode are built assuming that
212-
VPA will be running on a Kubernetes 1.33+ with the beta version of
210+
`StartupBoost` configuration is built assuming that VPA will be running on
211+
Kubernetes 1.33+ with the beta version of
213212
[KEP-1287: In-Place Update of Pod Resources](https://github.com/kubernetes/enhancements/issues/1287)
214213
enabled. If this is not the case, VPA's attempt to unboost pods may fail and the
215214
pods may remain boosted for their whole lifecycle.
@@ -242,11 +241,31 @@ down.
242241
Here are some examples of the VPA CR incorporating CPU boosting for different
243242
scenarios.
244243

245-
### CPU Boost Only
244+
### Per-pod configurations (`startupBoost` configured in `VerticalPodAutoscalerSpec`)
246245

247-
All containers under `example` deployment will receive "regular" VPA updates,
248-
**except for** `boosted-container-name`. `boosted-container-name` will only be
249-
CPU boosted/unboosted, because it has a `StartupBoostOnly` container policy.
246+
#### Startup CPU Boost Enabled & VPA Disabled
247+
248+
```yaml
249+
apiVersion: "autoscaling.k8s.io/v1"
250+
kind: VerticalPodAutoscaler
251+
metadata:
252+
  name: example-vpa
253+
spec:
254+
  targetRef:
255+
    apiVersion: "apps/v1"
256+
    kind: Deployment
257+
    name: example
258+
  updatePolicy:
259+
    # This only disables VPA actuations. It doesn't disable
260+
    # startup boost configurations.
261+
    updateMode: "Off"
262+
  startupBoost:
263+
    cpu:
264+
      value: 3.0
265+
      duration: 10s
266+
```
267+
268+
#### Startup CPU Boost Disabled & VPA Enabled
250269
251270
```yaml
252271
apiVersion: "autoscaling.k8s.io/v1"
@@ -259,23 +278,94 @@ spec:
259278
    kind: Deployment
260279
    name: example
261280
  updatePolicy:
262-
    # VPA Update mode must be InPlaceOrRecreate
263-
    updateMode: "InPlaceOrRecreate"
281+
    updateMode: "Auto"
282+
```
283+
284+
#### Startup CPU Boost Enabled & VPA Enabled
285+
286+
```yaml
287+
apiVersion: "autoscaling.k8s.io/v1"
288+
kind: VerticalPodAutoscaler
289+
metadata:
289+
  name: example-vpa
290+
spec:
291+
  targetRef:
292+
    apiVersion: "apps/v1"
293+
    kind: Deployment
294+
    name: example
295+
  updatePolicy:
296+
    updateMode: "Auto"
297+
  startupBoost:
298+
    cpu:
299+
      value: 3.0
300+
      duration: 10s
302+
```
303+
304+
### Per-container configurations (`startupBoost` configured in `ContainerPolicies`)
305+
306+
#### Startup CPU Boost Enabled & VPA Disabled
307+
308+
All containers under `example` deployment will receive "regular" VPA updates
309+
(VPA is in `"Auto"` mode in this example), **except for**
310+
`boosted-container-name`. `boosted-container-name` will only be CPU
311+
boosted/unboosted (`StartupBoost` is enabled and VPA `Mode` is set to `Off`).
312+
313+
```yaml
314+
apiVersion: "autoscaling.k8s.io/v1"
315+
kind: VerticalPodAutoscaler
316+
metadata:
316+
  name: example-vpa
317+
spec:
318+
  targetRef:
319+
    apiVersion: "apps/v1"
320+
    kind: Deployment
321+
    name: example
264323
  resourcePolicy:
265324
    containerPolicies:
266325
      - containerName: "boosted-container-name"
267-
        mode: "StartupBoostOnly"
326+
        # VPA mode is set to Off, so it never changes pod resources for this
327+
        # container. This setting is independent from the startup boost mode.
328+
        # CPU startup boost changes will still be applied.
329+
        mode: "Off"
268330
        startupBoost:
269331
          cpu:
270-
            factor: 2.0
332+
            type: "Quantity"
333+
            value: "2"
271334
```
272335

273-
### CPU Boost and Vanilla VPA
336+
#### Startup CPU Boost Disabled & VPA Enabled
337+
338+
All containers under `example` deployment will receive "regular" VPA updates
339+
and be CPU boosted/unboosted, except for `disable-cpu-boost-for-this-container`.
340+
It has a `containerPolicies` entry whose `startupBoost` overrides the global VPA
341+
config and sets the boost factor to 1.
342+
343+
```yaml
344+
apiVersion: "autoscaling.k8s.io/v1"
345+
kind: VerticalPodAutoscaler
346+
metadata:
347+
  name: example-vpa
348+
spec:
349+
  targetRef:
350+
    apiVersion: "apps/v1"
351+
    kind: Deployment
352+
    name: example
353+
  startupBoost:
354+
    cpu:
355+
      value: 2.0
356+
  resourcePolicy:
357+
    containerPolicies:
358+
      - containerName: "disable-cpu-boost-for-this-container"
359+
        startupBoost:
360+
          cpu: { value: 1.0 }
361+
```
362+
363+
#### Startup CPU Boost Enabled & VPA Enabled
274364

275365
All containers under `example` deployment will receive "regular" VPA updates,
276366
**including** `boosted-container-name`. Additionally, `boosted-container-name`
277367
will be CPU boosted/unboosted, because it has a `StartupBoost` config in its
278-
container policy and `Auto` container policy mode.
368+
container policy.
279369

280370
```yaml
281371
apiVersion: "autoscaling.k8s.io/v1"
@@ -287,13 +377,9 @@ spec:
287377
    apiVersion: "apps/v1"
288378
    kind: Deployment
289379
    name: example
290-
  updatePolicy:
291-
    # VPA Update mode must be InPlaceOrRecreate
292-
    updateMode: "InPlaceOrRecreate"
293380
  resourcePolicy:
294381
    containerPolicies:
295382
      - containerName: "boosted-container-name"
296-
        mode: "Auto" # Vanilla VPA mode + Startup Boost
297383
        minAllowed:
298384
          cpu: "250m"
299385
          memory: "100Mi"
@@ -303,10 +389,14 @@ spec:
303389
        # The CPU boosted resources can go beyond maxAllowed.
304390
        startupBoost:
305391
          cpu:
306-
            value: 4
392+
            type: "Quantity"
393+
            value: "4"
307394
```
308395

309396
## Implementation History
310397

398+
* 2025-06-23: Decouple Startup CPU Boost from InPlaceOrRecreate mode, allow
399+
users to specify a `startupBoost` config in `VerticalPodAutoscalerSpec` and in
400+
`ContainerPolicies` to make the API simpler and add more yaml examples.
311401
* 2025-03-20: Initial version.
312402
