
VMExtensionProvisioningError with capi-ubun2-2204 and capi-ubun2-2404 #5372

Open
@thiDucTran

Description


/kind bug

What steps did you take and what happened:

  1. clusterctl generate cluster capz-private --infrastructure azure --kubernetes-version v1.31.4 --control-plane-machine-count=1 --worker-machine-count=1
  2. Then I add the following to the generated YAML file:
      image:
        computeGallery:
          gallery: "ClusterAPI-f72ceb4f-5159-4c26-a0fe-2ea738f0d019"
          name: "capi-ubun2-2404"
          version: "1.31.4"
      networkSpec:
        privateDNSZoneName: "capz-private.capz.io"
        apiServerLB:
          type: Internal
          frontendIPs:
          - name: lb-private-ip-frontend
            privateIP: 172.19.2.100
        controlPlaneOutboundLB:
          frontendIPsCount: 1
        nodeOutboundLB:
          frontendIPsCount: 1
        subnets:
        - name: control-plane-subnet
          role: control-plane
          cidrBlocks:
          - 172.19.2.0/25
        - name: node-subnet
          role: node
          cidrBlocks:
          - 172.19.2.128/25
        vnet:
          name: vnet-capz-private
          cidrBlocks:
          - 172.19.2.0/24
          peerings:
          - resourceGroup: rg-name
            remoteVnetName: vnet-capi-mgmt-stage
  3. The cluster reaches Provisioned, but I see the two errors below and the KubeadmControlPlane is never initialized:
      E0116 16:26:38.836076       1 cluster_accessor.go:257] "Connect failed" err="error creating HTTP client and mapper: cluster is not reachable: Get \"https://apiserver.capz-private.capz.io:6443/?timeout=5s\": context deadline exceeded" controller="clustercache" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="default/capz-private" namespace="default" name="capz-private" reconcileID="4ec298dc-f392-4809-9426-d35cf8bfa55d"

      E0116 16:31:31.645375       1 controller.go:324] "Reconciler error" err=<failed to reconcile AzureMachine: failed to reconcile AzureMachine service vmextensions: extension state failed. This likely means the Kubernetes node bootstrapping process failed or timed out. Check VM boot diagnostics logs to learn more: failed to create or update resource thi-poc-capz-private/CAPZ.Linux.Bootstrapping (service: vmextensions): GET https://management.azure.com/subscriptions/9d54088e-89a6-4a3a-8f91-05699ba3c93a/providers/Microsoft.Compute/locations/northcentralus/operations 6921441d-f760-44e3-8db1-b7aa18c05a5e
      RESPONSE 200: 200 OK
      ERROR CODE: VMExtensionProvisioningError
      "code": "VMExtensionProvisioningError",
      "message": "VM has reported a failure when processing extension 'CAPZ.Linux.Bootstrapping' (publisher 'Microsoft.Azure.ContainerUpstream' and type 'CAPZ.Linux.Bootstrapping'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=1\n[stdout]\n\n[stderr]\n'. More information on troubleshooting is available at https://aka.ms/vmextensionlinuxtroubleshoot."

What did you expect to happen:

  1. The KubeadmControlPlane is initialized and the CAPZ.Linux.Bootstrapping extension executes successfully.

Anything else you would like to add:

  • The logs mentioned in https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/features-linux?tabs=azure-cli#troubleshoot-vm-extensions do not give me any additional information; I can share them on request.
  • The VNET, subnets, and network security groups do not pre-exist; CAPI creates them.
  • Even though bootstrapping does not complete successfully, I can run clusterctl get kubeconfig and then kubectl get po, which shows the API server is functioning and there is no network connectivity issue.
  • Compared to a cluster where I don't set the API server LB to type: Internal, I noticed that kube-proxy is missing; that may be related to the bootstrapping failure.
  • Using the same node image, I can create a functioning cluster if I don't set the API server LB to type: Internal.
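For anyone who wants the raw logs: this is a sketch of what I can pull from the failed node over SSH. The first two paths are the standard Azure Linux guest-agent and cloud-init locations; the exact extension handler directory under /var/log/azure is an assumption based on the publisher/type in the error above:

```shell
# Sweep the usual bootstrap/extension log locations on the failed VM.
# Guards keep this safe to run even where a path is absent.
for f in /var/log/waagent.log /var/log/cloud-init-output.log; do
  [ -r "$f" ] && tail -n 50 "$f"
done
# extension handler logs typically live under /var/log/azure/<Publisher>.<Type>/
ls /var/log/azure 2>/dev/null || true
echo "log sweep done"
```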

Environment:

  • cluster-api-provider-azure version: [image]
  • Kubernetes version: (use kubectl version): v1.30.6
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.6 LTS (Focal Fossa)
  • clusterctl: v1.9.3

Metadata

Assignees

No one assigned

    Labels

    kind/bug, priority/important-soon

    Projects

    Status: Todo