
[QUESTION] How to calculate MFU based on the FLOPs? #1565

@Lynnzake


When I train Qwen2.5-32B with Megatron on H200x2 with the partitioning tp=4, pp=2, the reported throughput is around 420 TFLOP/s per GPU. By the MFU calculation, GPU utilization is 420/1979 ≈ 21%, which is very low. Why is that? Is the logic of `num_floating_point_operations` in `training.py` wrong? What numbers do you typically see when training a model like this?
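For reference, the MFU arithmetic can be sketched as below. This is a hypothetical helper, not Megatron's `num_floating_point_operations`: it counts only matmul FLOPs of a dense decoder with GQA and a SwiGLU MLP, treats the backward pass as 2x the forward, counts full (non-causal) attention, and ignores activation recomputation. The Qwen2.5-32B shape values and `seq_len=4096` are assumptions to check against the model's `config.json` and the actual run. Note also that 1979 TFLOP/s matches NVIDIA's listed H200 SXM BF16 figure *with sparsity* (equivalently, the dense FP8 figure); the dense BF16 peak is half of that, roughly 989.5 TFLOP/s, so the chosen denominator changes the MFU by 2x.

```python
from dataclasses import dataclass


@dataclass
class DenseConfig:
    """Hypothetical container for the shape of a dense decoder-only model."""
    hidden: int    # model width h
    layers: int    # number of transformer layers
    heads: int     # query attention heads
    kv_heads: int  # key/value heads (GQA groups)
    ffn: int       # SwiGLU intermediate size
    vocab: int     # (padded) vocabulary size


def training_flops_per_token(cfg: DenseConfig, seq_len: int) -> int:
    """Approximate model FLOPs per token for one training step (fwd + bwd).

    Counts matmuls only (2 FLOPs per multiply-add), assumes backward ~= 2x
    forward, counts full (non-causal) attention, ignores recomputation.
    """
    h = cfg.hidden
    kv_dim = h * cfg.kv_heads // cfg.heads  # assumes head_dim = h / heads
    qkv_proj = 2 * h * (h + 2 * kv_dim)     # Q, K, V projections
    attn_core = 2 * 2 * seq_len * h         # Q @ K^T and scores @ V
    out_proj = 2 * h * h                    # attention output projection
    mlp = 2 * 3 * h * cfg.ffn               # gate, up and down projections
    per_layer = qkv_proj + attn_core + out_proj + mlp
    logits = 2 * h * cfg.vocab              # output (LM head) projection
    forward = cfg.layers * per_layer + logits
    return 3 * forward                      # forward + backward (~2x forward)


def achieved_tflops_per_gpu(flops_per_token: float, tokens_per_iter: int,
                            iter_time_s: float, num_gpus: int) -> float:
    """Achieved model TFLOP/s per GPU from one iteration's wall-clock time."""
    return flops_per_token * tokens_per_iter / (iter_time_s * num_gpus) / 1e12


def mfu(tflops_per_gpu: float, peak_tflops: float) -> float:
    """Model FLOPs utilization: achieved throughput over the hardware peak."""
    return tflops_per_gpu / peak_tflops


if __name__ == "__main__":
    # Assumed Qwen2.5-32B shape; verify against the model's config.json.
    qwen25_32b = DenseConfig(hidden=5120, layers=64, heads=40, kv_heads=8,
                             ffn=27648, vocab=152064)
    per_token = training_flops_per_token(qwen25_32b, seq_len=4096)  # assumed seq_len
    print(f"~{per_token / 1e9:.0f} GFLOPs per token")

    reported = 420.0  # TFLOP/s per GPU, as reported in the issue
    print(f"MFU vs 1979 TFLOP/s:  {mfu(reported, 1979.0):.1%}")  # 21.2%
    print(f"MFU vs 989.5 TFLOP/s: {mfu(reported, 989.5):.1%}")   # 42.4%
```

Comparing this estimate with the per-GPU throughput Megatron logs is one way to check whether the FLOP count or the peak figure accounts for the gap; exact agreement is not expected, since Megatron's own formula may count attention and other terms differently.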
