Skip to content

Add pre-commit To Prevent Usage of session.query #47275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

Prab-27
Copy link
Contributor

@Prab-27 Prab-27 commented Mar 2, 2025

closes : #45461
Introduce a pre-commit hook to prevent the use of session.query in the core Airflow code.
This is limited to the source code and excludes the tests/ package.
Removed session.query() and updated a new style


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@Prab-27
Copy link
Contributor Author

Prab-27 commented Mar 2, 2025

I will raise a PR soon for removing session.query() from airflow-core and task-sdk as we discussed in #45714

@Prab-27 Prab-27 force-pushed the add-pre-commit-to-prevent-usage-session.query branch from 5d4d4dc to b6775a8 Compare March 2, 2025 14:07
language: python
additional_dependencies: ['rich>=12.4.4']
files: ^airflow.*\.py$|^task_sdk.*\.py
exclude: ^tests/.*\.py$|^task_sdk/tests/.*\.py$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
exclude: ^tests/.*\.py$|^task_sdk/tests/.*\.py$

Since we have given it specific files to check, do we still need to exclude others?

Copy link
Contributor Author

@Prab-27 Prab-27 Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya you are right ! I'll remove this line

@ephraimbuddy
Copy link
Contributor

I will raise a PR soon for removing session.query() from airflow-core and task-sdk as we discussed in #45714

You can do it in this PR. This PR should raise and point to where there are usages of session.query

f"\nSQLAlchemy 2.0 deprecates the `Query` object"
f"use the `select()` construct instead."
)
errors += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! We don't need this integer since occurrence isn't important in return and response text. We could make a return bool and the main can still return one similar to below. This can also do an early exit and reduce scans since a single occurrence should trigger the hook. What do you think?

return 1 if check_session_query(ast_module) else 0

Copy link
Contributor Author

@Prab-27 Prab-27 Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya ! you are right !!
I'll update it soon

@Prab-27 Prab-27 force-pushed the add-pre-commit-to-prevent-usage-session.query branch from e9bcf8e to 1c65940 Compare March 14, 2025 14:05

for xcom in query:
for xcom in session.execute(query).all():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this works fine for session.delete(xcom).

Could you please clarify when you get some free time?

@Prab-27 Prab-27 force-pushed the add-pre-commit-to-prevent-usage-session.query branch from 1c65940 to 98be988 Compare March 18, 2025 15:55
@Prab-27
Copy link
Contributor Author

Prab-27 commented Mar 18, 2025

I have been trying to update the code but couldn't
query = session.query(base_table).with_entities(text(f"{base_table_alias}.*")) from here

could you please help me upgrade this ?

Copy link

github-actions bot commented May 3, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label May 3, 2025
@Prab-27 Prab-27 force-pushed the add-pre-commit-to-prevent-usage-session.query branch from 98be988 to 376a2dc Compare May 6, 2025 14:28
@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label May 7, 2025
@Prab-27 Prab-27 force-pushed the add-pre-commit-to-prevent-usage-session.query branch from 376a2dc to 7f7cc6c Compare May 15, 2025 10:23
@Prab-27 Prab-27 requested a review from kaxil as a code owner May 15, 2025 10:23
@@ -270,7 +270,7 @@ def set_xcom(
if not run_id:
raise HTTPException(status.HTTP_404_NOT_FOUND, f"Run with ID: `{run_id}` was not found")

dag_run_id = session.query(DagRun.id).filter_by(dag_id=dag_id, run_id=run_id).scalar()
dag_run_id = session.execute(DagRun.id).where(dag_id=dag_id, run_id=run_id).scalar()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dag_run_id = session.execute(DagRun.id).where(dag_id=dag_id, run_id=run_id).scalar()
dag_run_id = session.scalar(select(DagRun.id).where(dag_id=dag_id, run_id=run_id))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

active_bundle_names = set(self._bundle_config.keys())
for name in active_bundle_names:
stored = {b.name: b for b in session.scalars(select(DagBundleModel)).all()}
for name in self._bundle_config.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are omitting the active_bundle_names filter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a mistake. Rebased again and resolved

updated_rows = session.execute(
update(DagVersion)
.where(DagVersion.bundle_name.in_(inactive_bundle_names))
.values({DagVersion.bundle_name: new_bundle_name})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.values({DagVersion.bundle_name: new_bundle_name})
.values(bundle_name=new_bundle_name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has been updated so I rebased and resolved it

"filename": filename,
"bundle_name": bundle_name,
"timestamp": utcnow(),
"stacktrace": stacktrace,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values accept keyword args not dictionary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has been updated, so I rebased and resolved it

@Prab-27 Prab-27 force-pushed the add-pre-commit-to-prevent-usage-session.query branch from 7f7cc6c to e37f259 Compare June 22, 2025 11:47
@uranusjr
Copy link
Member

Instead of trying to fix everything in this PR, it might be easier to submit smaller PRs fixing things one by one, and rebase this repeatedly after each PR is merged until CI is green.

@Prab-27
Copy link
Contributor Author

Prab-27 commented Jun 23, 2025

Instead of trying to fix everything in this PR, it might be easier to submit smaller PRs fixing things one by one, and rebase this repeatedly after each PR is merged until CI is green.

Thanks for the suggestion—I'll proceed with this plan.

@potiuk
Copy link
Member

potiuk commented Jun 23, 2025

Instead of trying to fix everything in this PR, it might be easier to submit smaller PRs fixing things one by one, and rebase this repeatedly after each PR is merged until CI is green.

Yep - it could also be done similar to the #52020

  • find all files/folders that need fixing
  • convert them to a list
  • implement a mechanism (.pre-comit files selection) that includes certain folders in .pre-commit check
  • ask for help (and do yourself some of that) with PRs that incrementally fix things and add stuff to be checked by .pre-commit so that no session is added to already "cleaned" folders
  • when completed, switch it to check "everywhere".

@Prab-27
Copy link
Contributor Author

Prab-27 commented Jun 23, 2025

Instead of trying to fix everything in this PR, it might be easier to submit smaller PRs fixing things one by one, and rebase this repeatedly after each PR is merged until CI is green.

Yep - it could also be done similar to the #52020

  • find all files/folders that need fixing
  • convert them to a list
  • implement a mechanism (.pre-comit files selection) that includes certain folders in .pre-commit check
  • ask for help (and do yourself some of that) with PRs that incrementally fix things and add stuff to be checked by .pre-commit so that no session is added to already "cleaned" folders
  • when completed, switch it to check "everywhere".

Appreciate your approach to this; I'm jumping in headfirst

@Prab-27
Copy link
Contributor Author

Prab-27 commented Jun 23, 2025

I've created a list as you suggested

Total changes needed files : 116
Including -

airflow-core - 10 files
airflow-core tests - 49 files
dev - 1 files
devel-common - 6 files
Providers - including tests - 50 files

My Plan :
Single file checks for all - 66 files
Providers - chnages as per module which has session.query - including tests and Providers both

@potiuk Would love to hear your thoughts on this.

@potiuk
Copy link
Member

potiuk commented Jun 23, 2025

@potiuk Would love to hear your thoughts on this.

Sure. Also you might split per module in "core". I think what works best is if you do one or two yourself - some representative examples of PRs - and provide instructions to others based on thsose PRs. That avoids any ambiguities with description and speeds up people ramping up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add pre-commit to prevent usage of session.query in Airflow Core
5 participants