Memory budget strategy for activation checkpointing #297

Merged — 9 commits merged into main on Jul 8, 2025

Conversation

tyler-romero (Contributor)

See https://pytorch.org/blog/activation-checkpointing-techniques/ for more details, but essentially this is an easy way to enable selective activation checkpointing without fiddling with a bunch of separate options while trying to stay fast within your GPU memory allowance.

![image](https://github.com/user-attachments/assets/5e17af03-aa43-489e-b30e-471ee3025c7e)

> We observe a 50% memory reduction by recomputing only pointwise ops, with a steady drop-off as you recompute more and more of your matmuls. Attention is the most expensive, so you tend to want to recompute those last.

@tyler-romero (Contributor, author)

Olmo2 on 4 B100s w/ ac budget = 0.5

```
system/GPU active mem (%)=41.62
system/GPU active mem (GiB)=74.24
system/GPU reserved mem (%)=45.29
system/GPU reserved mem (GiB)=80.79
throughput/device/BPS=0.0173
throughput/device/BPS (actual avg)=0.0173
throughput/device/TPS=18,168
throughput/device/TPS (actual avg)=18,122
```

@tyler-romero (Contributor, author)

Now with ac budget = 0.2

```
system/GPU active mem (%)=29.57
system/GPU active mem (GiB)=52.74
system/GPU reserved mem (%)=34.76
system/GPU reserved mem (GiB)=61.99
throughput/device/BPS=0.0162
throughput/device/BPS (actual avg)=0.0162
throughput/device/TPS=16,974
throughput/device/TPS (actual avg)=16,974
```
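A quick back-of-envelope comparison of the two runs above (all numbers copied from the logs) shows what tightening the budget from 0.5 to 0.2 buys and costs:

```python
# budget=0.5 -> 74.24 GiB active, 18,168 TPS/device
# budget=0.2 -> 52.74 GiB active, 16,974 TPS/device
mem_05, tps_05 = 74.24, 18168
mem_02, tps_02 = 52.74, 16974

extra_mem_saved = 1 - mem_02 / mem_05  # additional active-memory reduction
tps_cost = 1 - tps_02 / tps_05         # per-device throughput given up

print(f"budget 0.5 -> 0.2: {extra_mem_saved:.1%} less active memory, "
      f"{tps_cost:.1%} lower TPS")
```

So the tighter budget trades roughly a 6–7% throughput hit for about 29% less active memory, consistent with the blog's observation that the cheap recomputation wins come first.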

@tyler-romero tyler-romero marked this pull request as ready for review July 3, 2025 19:43

@epwalsh (Member) left a comment:


Nice!

@tyler-romero tyler-romero enabled auto-merge (squash) July 8, 2025 20:06

@tyler-romero tyler-romero merged commit 992a79e into main Jul 8, 2025
15 checks passed
@tyler-romero tyler-romero deleted the tyler/budget-ac branch July 8, 2025 22:05
TianhuaTao pushed a commit that referenced this pull request Jul 10, 2025
