When I train Qwen2.5-32B with Megatron on H200 x 2, I found the throughput was around 420 TFLOP/s per GPU. The partition was tp=4, pp=2. According to the MFU calculation, GPU utilization is 420/1979 ≈ 21%, which is very low. Why is that? Is the logic of num_floating_point_operations in training.py wrong? Roughly what number do you get when you train a model like this?
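For reference, a minimal sketch of the MFU arithmetic in question, assuming the 420 figure is the per-GPU TFLOP/s that Megatron reports and using H200's published BF16 peaks (989 TFLOP/s dense, 1979 TFLOP/s with 2:4 structured sparsity). Note that dense training kernels cannot reach the sparse peak, so the denominator choice changes the answer a lot:

```python
# Hypothetical MFU arithmetic; the 420 figure and the choice of peak
# throughput are assumptions taken from the question above.

PEAK_TFLOPS_DENSE = 989.0    # H200 BF16 dense peak (TFLOP/s)
PEAK_TFLOPS_SPARSE = 1979.0  # H200 BF16 peak with 2:4 structured sparsity

achieved_tflops = 420.0      # per-GPU throughput reported during training

# MFU against each peak
mfu_dense = achieved_tflops / PEAK_TFLOPS_DENSE
mfu_sparse = achieved_tflops / PEAK_TFLOPS_SPARSE

print(f"MFU vs dense peak:  {mfu_dense:.1%}")   # ~42.5%
print(f"MFU vs sparse peak: {mfu_sparse:.1%}")  # ~21.2%
```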