GPU Memory changes by TP/PP and recompute-activations when the GPU-NUM is stable #1486
Unanswered · Listen-WLS asked this question in Q&A · Replies: 0 comments
As mentioned in the title, when the number of GPUs stays the same, I change the TP/PP sizes and observe the GPU memory usage below. After reading the paper https://arxiv.org/pdf/2205.05198.pdf, I understand the GPU memory footprint to consist mainly of the following parts:
(1) Parameter memory
(2) Optimizer-state memory
(3) Gradient memory
(4) Activation memory
I used 8 GPUs to train the Llama3-8B model. With batch size = 1, the per-GPU memory usage can be expressed as the sum of these four parts.
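Below is a minimal sketch of the accounting I have in mind, assuming mixed-precision Adam (16 bytes per parameter), no distributed optimizer, and the per-layer activation formula with tensor parallelism from the paper above; the Llama3-8B shape numbers and all function/parameter names are placeholders of my own, not values read from Megatron.

```python
# Rough per-GPU memory estimate; all model-shape values below are assumptions.
def estimate_gpu_memory_gib(
    n_params=8.0e9,     # total parameters (Llama3-8B, assumed)
    n_layers=32,        # transformer layers L (assumed)
    hidden=4096,        # hidden size h (assumed)
    heads=32,           # attention heads a (assumed)
    seq_len=8192,       # sequence length s (assumed)
    micro_batch=1,      # micro-batch size b
    tp=1,               # tensor-parallel size t
    pp=1,               # pipeline-parallel size p
):
    gib = 1024 ** 3
    params_per_gpu = n_params / (tp * pp)

    # (1)-(3): fp16 params (2 B) + fp16 grads (2 B) + fp32 master params,
    # momentum and variance (4+4+4 B) = 16 B per parameter (assumed setup).
    static = params_per_gpu * (2 + 2 + 12)

    # (4): activations per transformer layer with tensor parallelism and no
    # recomputation, eq. (2) of the paper: s*b*h*(10 + 24/t + 5*a*s/(h*t)) bytes.
    act_per_layer = seq_len * micro_batch * hidden * (
        10 + 24 / tp + 5 * heads * seq_len / (hidden * tp)
    )
    layers_per_stage = n_layers / pp
    # With the 1F1B schedule the first stage keeps up to p micro-batches in
    # flight, so its activation memory is roughly (L/p) * p layers' worth.
    activations = act_per_layer * layers_per_stage * pp

    return {
        "static (params+grads+optimizer) GiB": static / gib,
        "activations GiB": activations / gib,
        "total GiB": (static + activations) / gib,
    }

if __name__ == "__main__":
    for tp, pp in [(8, 1), (4, 2), (2, 4), (1, 8)]:
        print(f"TP={tp} PP={pp}", estimate_gpu_memory_gib(tp=tp, pp=pp))
```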
I have run many sets of experiments to check this memory formula against the actual GPU memory usage, but the real usage gradually increases as PP grows. I would like to know which of the four memory components above is affected by increasing PP.
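For reference, under the simplifying assumptions in the sketch above (uniform partitioning of all parameters, 16 bytes per parameter, no distributed optimizer), the static terms work out the same for any TP×PP split of the 8 GPUs:

$$
M_{\text{static}} \approx \frac{16\,P}{t \cdot p} = \frac{16 \times 8\times 10^{9}\ \text{B}}{8} \approx 14.9\ \text{GiB}
\quad \text{for } (t,p) \in \{(8,1),(4,2),(2,4),(1,8)\},
$$

so, if that simplification held exactly, the static parts would not change with the split; I am unsure whether the growth I see comes from the activation term or from effects the simple formula ignores.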