Add run_on_latest_version support for backfill and clear operations #52177

Open
wants to merge 12 commits into
base: main
from

Conversation

ephraimbuddy
Contributor

@ephraimbuddy ephraimbuddy commented Jun 24, 2025

With this option, users can choose which dag version their dag/task runs
with after clearing or when running a backfill. This only applies to
versioned bundles, because non-versioned bundles always run with
the latest dag version.

When the user chooses to run with the latest version, the bundle_version
associated with the dagrun is updated to the latest, and the associated
serialized dag version is updated to the latest as well. Choosing not to run
with the latest version, which is the default, means that the bundle version
and serialized dag version the dag initially ran with are used
when running it again.
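
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the code in this PR: `DagRunRecord` and `resolve_versions_for_rerun` are hypothetical names, and modelling a non-versioned bundle as `bundle_version=None` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DagRunRecord:
    """Hypothetical stand-in for a dagrun row; not Airflow's DagRun model."""

    dag_id: str
    bundle_version: Optional[str]  # assumption: None models a non-versioned bundle
    dag_version: int


def resolve_versions_for_rerun(
    run: DagRunRecord,
    latest_bundle_version: Optional[str],
    latest_dag_version: int,
    run_on_latest_version: bool = False,
) -> Tuple[Optional[str], int]:
    """Pick the bundle and dag version a cleared or rerun dagrun should use."""
    if run.bundle_version is None:
        # Non-versioned bundles always run with the latest dag version.
        run.dag_version = latest_dag_version
    elif run_on_latest_version:
        # Opt-in: move the run to the latest bundle and serialized dag version.
        run.bundle_version = latest_bundle_version
        run.dag_version = latest_dag_version
    # Default: keep the versions the run was originally created with.
    return run.bundle_version, run.dag_version
```

With the default (`run_on_latest_version=False`) a versioned run keeps its original versions; passing `True` moves it to the latest ones.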

For backfill, there is now a --run-on-latest-version flag that makes it run
with the latest version; otherwise it runs with the original bundle
the dagrun was created with. Note that the flag is only useful when rerunning
a dagrun using backfill. Defaulting to the initial bundle/version is
intentional: otherwise a backfill would fail if a task was renamed
in the latest version.
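
To illustrate why the default stays on the original version, here is a small sketch of the task-rename problem. The task names and the `missing_task_ids` helper are made up for the example; the real failure mode inside Airflow may surface differently.

```python
from typing import Iterable, List


def missing_task_ids(run_task_ids: Iterable[str], dag_task_ids: Iterable[str]) -> List[str]:
    """Return task ids recorded on a dagrun that the given dag version does not define."""
    return sorted(set(run_task_ids) - set(dag_task_ids))


# Hypothetical dag: "transform" was renamed to "clean" in the latest version.
original_version_tasks = {"extract", "transform", "load"}
latest_version_tasks = {"extract", "clean", "load"}

# Task instances created when the dagrun first ran on the original version.
run_task_ids = ["extract", "transform", "load"]

# Rerunning on the original version finds every task.
print(missing_task_ids(run_task_ids, original_version_tasks))
# Rerunning on the latest version cannot find the renamed task.
print(missing_task_ids(run_task_ids, latest_version_tasks))
```

The second call reports `transform` as missing, which is the situation the default behaviour avoids.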

Summary of changes:

  • Use SchedulerDagBag instead of DagBag for execution API
  • Add run_on_latest_version field to DAGRunClearBody and ClearTaskInstancesBody models
  • Add --run-on-latest-version CLI flag for backfill command
  • Update backfill.py to support running tasks with latest DAG version
  • Add UI checkbox for "Run with latest version" in clear dialogs
  • Update SchedulerDagBag to handle latest version parameter
  • Update API endpoints to support run_on_latest_version parameter
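
As a rough illustration of the new request field, the sketch below builds a JSON body carrying `run_on_latest_version`. Only that field name comes from this PR; the `build_clear_payload` helper is hypothetical, the `dry_run` field is included as an assumption about the clear bodies, and the endpoint path and remaining fields are deliberately left out.

```python
import json


def build_clear_payload(run_on_latest_version: bool = False, dry_run: bool = True) -> str:
    """Build a JSON body for a clear request (hypothetical helper, partial field set)."""
    body = {
        # Assumption: the clear bodies accept a dry_run flag.
        "dry_run": dry_run,
        # Field added by this PR; False keeps the original bundle/dag version.
        "run_on_latest_version": run_on_latest_version,
    }
    return json.dumps(body, sort_keys=True)


print(build_clear_payload(run_on_latest_version=True, dry_run=False))
```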

closes: #49007, closes: #49047

@boring-cyborg boring-cyborg bot added area:airflow-ctl area:API Airflow's REST/HTTP API area:Scheduler including HA (high availability) scheduler area:translations area:UI Related to UI/UX. For Frontend Developers. translation:default labels Jun 24, 2025
@ephraimbuddy ephraimbuddy force-pushed the properly-clear-ti-specific-version branch 5 times, most recently from 1f4b2e3 to bfc2a49 Compare June 27, 2025 15:04
@ephraimbuddy ephraimbuddy changed the title Fix clearing dags of versioned bundles and allow running on previous version Add run_on_latest_version support for backfill and clear operations Jun 27, 2025
@ephraimbuddy
Contributor Author

Screenshot 2025-06-27 at 16 28 35

Member

@jason810496 jason810496 left a comment

Thanks for the PR. There are some questions about the usage of SchedulerDagBag in core-api.

@bbovenzi
Contributor

UI for dagrun clear and task instance clear look good. We'll add this to backfills next?

Member

@kaxil kaxil left a comment

Design question: instead of --run-on-latest-version, what do you think of --bundle-version "latest" as default?

@ephraimbuddy
Contributor Author

> Design question: instead of --run-on-latest-version, what do you think of --bundle-version "latest" as default?

I think it makes more sense for backfill, but for clearing tasks we can still keep run on the latest version. Or do you suggest we show a form for the user to fill in the version they want to run after clearing?

@ephraimbuddy
Contributor Author

> Design question: instead of --run-on-latest-version, what do you think of --bundle-version "latest" as default?

> I think it makes more sense for backfill, but for clearing tasks we can still keep run on the latest version. Or do you suggest we show a form for the user to fill in the version they want to run after clearing?

Also, by default, it runs with the version the DAG initially used, if it has run before. If we default --bundle-version to the latest, most backfill reruns will fail if the bundle has changed.

@kaxil
Member

kaxil commented Jul 3, 2025

> Design question: instead of --run-on-latest-version, what do you think of --bundle-version "latest" as default?

> I think it makes more sense for backfill, but for clearing tasks we can still keep run on the latest version. Or do you suggest we show a form for the user to fill in the version they want to run after clearing?

> Also, by default, it runs with the version the DAG initially used, if it has run before. If we default --bundle-version to the latest, most backfill reruns will fail if the bundle has changed.

aah ok

@kaxil
Member

kaxil commented Jul 3, 2025

Will we ever have a situation where we have both --run-on-latest-version and --bundle-version? That is the only thing to avoid, or we should list them as mutually exclusive or something.

@ephraimbuddy
Contributor Author

> Will we ever have a situation where we have both --run-on-latest-version and --bundle-version? That is the only thing to avoid, or we should list them as mutually exclusive or something.

I think if we need it, then it should be mutually exclusive.
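
For reference, mutual exclusivity of this kind can be expressed with the standard library's `argparse`. This is a standalone sketch, not Airflow's CLI wiring: `--bundle-version` is hypothetical for backfill (it is only being discussed here), and `build_parser` and the `backfill-sketch` program name are made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser where the two version flags cannot be combined."""
    parser = argparse.ArgumentParser(prog="backfill-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--run-on-latest-version",
        action="store_true",
        help="Rerun with the latest bundle/dag version.",
    )
    group.add_argument(
        "--bundle-version",
        default=None,
        help="Hypothetical flag: rerun with an explicit bundle version.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.run_on_latest_version, args.bundle_version)
```

Passing both flags makes `argparse` print a usage error and exit, so the conflict is rejected before any backfill logic runs.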

@ephraimbuddy
Contributor Author

> Will we ever have a situation where we have both --run-on-latest-version and --bundle-version? That is the only thing to avoid, or we should list them as mutually exclusive or something.

I connected with Jed, and we agreed to mark the flag as experimental for now. This way, if we decide to introduce --bundle-version in the future, we can remove --run-on-latest-version from the CLI without breaking expectations.

@ephraimbuddy ephraimbuddy force-pushed the properly-clear-ti-specific-version branch from 9e4faa4 to 8713b05 Compare July 9, 2025 14:39
@kaxil
Member

kaxil commented Jul 10, 2025

> Will we ever have a situation where we have both --run-on-latest-version and --bundle-version? That is the only thing to avoid, or we should list them as mutually exclusive or something.

> I connected with Jed, and we agreed to mark the flag as experimental for now. This way, if we decide to introduce --bundle-version in the future, we can remove --run-on-latest-version from the CLI without breaking expectations.

Sounds good

@ephraimbuddy ephraimbuddy force-pushed the properly-clear-ti-specific-version branch from 8713b05 to 87aa44c Compare July 10, 2025 13:48
@ephraimbuddy ephraimbuddy requested a review from eladkal as a code owner July 10, 2025 13:48
Contributor

@bugraoz93 bugraoz93 left a comment

Looks good! I have one comment around UI-backend, not a blocker for the PR if it is needed. Thanks, Ephraim!

Found the change in migrations, which is already done

@ephraimbuddy ephraimbuddy force-pushed the properly-clear-ti-specific-version branch 2 times, most recently from 13f44b4 to b7ab41b Compare July 14, 2025 09:17
ephraimbuddy and others added 12 commits July 14, 2025 19:48
@ephraimbuddy ephraimbuddy force-pushed the properly-clear-ti-specific-version branch from b7ab41b to 71ec6c1 Compare July 14, 2025 18:49
Labels
area:airflow-ctl area:API Airflow's REST/HTTP API area:Scheduler including HA (high availability) scheduler area:translations area:UI Related to UI/UX. For Frontend Developers. translation:default
Projects
None yet
7 participants