-
Notifications
You must be signed in to change notification settings - Fork 4.2k
WIP: Update cluster-api provider to use machineTemplate.status.nodeInfo for architecture-aware autoscale from zero #8345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
dd56e8b
to
13dea52
Compare
cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller_test.go
Outdated
Show resolved
Hide resolved
8ec180c
to
c3a296a
Compare
c3a296a
to
98c4898
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: aleskandro The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
21705b1
to
9f183df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on balance this is making sense, i think we need to clean up a couple things.
reducing the number of api server calls is important for the capi provider as we tend to be very noisy from an HTTP perspective.
cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller_test.go
Outdated
Show resolved
Hide resolved
// InstanceSystemInfo sets the nodeSystemInfo from the infrastructure reference resource. | ||
// If the infrastructure reference resource is not found, returns nil. | ||
func (r unstructuredScalableResource) InstanceSystemInfo() *apiv1.NodeSystemInfo { | ||
infraObj, err := r.readInfrastructureReferenceResource() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something i am concerned about is the notion that we will be calling this function twice for every node template we create. each call to this function results in another call to the api server to get the infrastructure machine template. i think we need to find a way to combine the call required in this function with the call we make in the InstanceCapacity
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep having two separate functions and avoid coupling, I've made a change to the readInfrastructureReferenceResource to cache the infraObj via lazy loading. Wdyt?
cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured_test.go
Show resolved
Hide resolved
9f183df
to
9d1c610
Compare
…r architecture-aware autoscale from zero kubernetes-sigs/cluster-api#11962 introduced the nodeInfo field for MachineTemplates. Providers can reconcile this field in the status subresource to inform the autoscaler about the architecture and operating system that the MachineTemplate's nodes will run. Previously, we have been implementing this behavior in the cluster autoscaler by leveraging the labels capacity annotation and, as a fallback, default values set in environment variables at cluster-autoscaler deployment time. With this commit, the cluster autoscaler computes the future architecture of a node with the following priority order: - Labels set in existing nodes for not-autoscale-from-zero cases - Labels set in the labels capacity annotation of machine template, machine set, and machine deployment. - Values in the status.nodeSystemInfo of MachineTemplates - Generic/default labels set in the environment of the cluster autoscaler
9d1c610
to
7e356f2
Compare
…eInfo for architecture-aware autoscale from zero kubernetes-sigs/cluster-api#11962 introduced the nodeInfo field for MachineTemplates. Providers can reconcile this field in the status subresource to inform the autoscaler about the architecture and operating system that the MachineTemplate's nodes will run. Previously, we have been implementing this behavior in the cluster autoscaler by leveraging the labels capacity annotation and, as a fallback, default values set in environment variables at cluster-autoscaler deployment time. With this commit, the cluster autoscaler computes the future architecture of a node with the following priority order: - Labels set in existing nodes for not-autoscale-from-zero cases - Labels set in the labels capacity annotation of machine template, machine set, and machine deployment. - Values in the status.nodeSystemInfo of MachineTemplates - Generic/default labels set in the environment of the cluster autoscaler
7e356f2
to
8b78773
Compare
What type of PR is this?
/kind feature
What this PR does / why we need it:
kubernetes-sigs/cluster-api#11962 introduced the nodeInfo field for MachineTemplates. Providers can reconcile this field in the status subresource to inform the autoscaler about the architecture and operating system that the MachineTemplate's nodes will run.
Previously, we have been implementing this behavior in the cluster autoscaler by leveraging the labels capacity annotation and, as a fallback, default values set in environment variables at cluster-autoscaler deployment time.
With this commit, the cluster autoscaler computes the future architecture of a node with the following priority order:
Which issue(s) this PR fixes:
Fixes #8347
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: