Skip to content

[v2][Bugfix] Applying K8s manifests to GKE clusters via URL #4352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

shubpal07
Copy link
Contributor

Fix: Correctly apply Kubernetes manifests from remote URLs

NOTE: We had re-raise this PR due to a regression found in the original PR #4292 . The root cause was identified:

Some of the integration tests are likely passing manifest objects that do not have a 
source attribute at all (for example, a manifest that only uses the content attribute).
In these cases, manifest.source is null. The startswith() function cannot accept a null 
value; it requires a string. This causes Terraform to stop with an error.

This PR resolves a bug that caused Terraform to fail with an Invalid count argument error when applying a Kubernetes manifest from a remote URL. The issue occurred when the kubectl-apply module had a dependency on another resource i.e. GKE cluster, being created in the same run.

The root cause was a Terraform plan/apply lifecycle conflict. The module was trying to fetch the URL content after the cluster was created (in the apply phase), but other parts of the plan depended on that content before the apply could start.

This fix refactors the kubectl-apply module with the following strategy:

  • The parent "controller" module now fetches all URL-based manifest content during the terraform plan phase.
  • It then passes the raw text content not the URL to the submodule responsible for the apply.
  • The submodule has been simplified to no longer handle URLs, completely resolving the timing conflict.

This change allows for the robust and predictable deployment of URL-based manifests.

How to Test and Verify

  1. Configure a Blueprint: Use a blueprint that calls the workload-manager-install module (modules/management/kubectl-apply).

  2. Use a URL Source: In the settings, disable the internal Kueue installation and provide the Kueue manifest via a URL in the apply_manifests list. This configuration specifically targets the fixed logic.

YAML

# Example blueprint settings:
settings:
 # Ensure internal install is disabled to avoid conflicts
 # kueue:
 #   install: true

 apply_manifests:
 - source: "https://raw.githubusercontent.com/GoogleCloudPlatform/cluster-toolkit/refs/heads/develop/modules/management/kubectl-apply/manifests/kueue-v0.11.4.yaml"
   # Flags needed for large manifests and pre-existing resources
   server_side_apply: true
   force_conflicts: true 

  1. Deploy: Run the deployment command (e.g., ./gcluster deploy ...).

  2. Expected Result:
    The terraform plan and terraform apply phases should complete successfully without any "Invalid count" or other planning errors.

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch #
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines #

shubpal07 added 2 commits July 3, 2025 11:15
Revert "Merge pull request GoogleCloudPlatform#4350 from GoogleCloudPlatform/revert-4292-shubham/bugs/apply-manifest-from-url"

This reverts commit 72f98ee, reversing
changes made to 857aacd.
@shubpal07 shubpal07 self-assigned this Jul 3, 2025
@shubpal07 shubpal07 requested review from samskillman and a team as code owners July 3, 2025 11:30
@shubpal07 shubpal07 added the release-module-improvements Added to release notes under the "Module Improvements" heading. label Jul 3, 2025
@shubpal07
Copy link
Contributor Author

Ran the failing integration tests:

  1. PR-test-gke-managed-parallelstore [No longer present] Deleted as a part of deprecating parallelstore
  2. PR-test-gke-managed-hyperdisk (Passed)

http = {
source = "hashicorp/http"
version = "~> 3.0"
}
}
Copy link
Contributor

@SwarnaBharathiMantena SwarnaBharathiMantena Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it was removed earlier was to overcome a consistent warning.
Please review this PR: #4190

The usage of the provider is not the same. So, can go ahead with the PR as long as the warning is not repeating at the end of blueprint deployment.

Can you please double-check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-module-improvements Added to release notes under the "Module Improvements" heading.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants