How to split the num_layers unevenly when using pipeline parallelism? #381

@aoyulong

Currently, Megatron-LM requires num_layers to be divisible by pipeline_model_parallel_size:

    assert args.num_layers % args.transformer_pipeline_model_parallel_size == 0, \
        'num_layers must be divisible by transformer_pipeline_model_parallel_size'

How can num_layers be split unevenly across pipeline stages? For example, given num_layers=7 and pipeline_model_parallel_size=2, the desired result is 3 layers on stage 0 and 4 layers on stage 1.
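
To make the intended split concrete, here is a minimal sketch of the arithmetic in plain Python. It is not Megatron-LM code: `split_layers_unevenly` and `stage_layer_offsets` are hypothetical helper names, and giving the extra layers to the last stages is an assumption chosen only to match the 3/4 example above.

```python
from typing import List


def split_layers_unevenly(num_layers: int, num_stages: int) -> List[int]:
    """Return the layer count for each pipeline stage (hypothetical helper)."""
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    if num_layers < num_stages:
        raise ValueError("need at least one layer per stage")
    base, remainder = divmod(num_layers, num_stages)
    # Assumption: the last `remainder` stages each take one extra layer,
    # so the earlier stages hold the smaller share.
    first_big_stage = num_stages - remainder
    return [base + (1 if stage >= first_big_stage else 0)
            for stage in range(num_stages)]


def stage_layer_offsets(layers_per_stage: List[int]) -> List[int]:
    """Return the index of the first layer owned by each stage."""
    offsets = []
    start = 0
    for count in layers_per_stage:
        offsets.append(start)
        start += count
    return offsets


if __name__ == "__main__":
    counts = split_layers_unevenly(7, 2)
    print(counts)                       # [3, 4]
    print(stage_layer_offsets(counts))  # [0, 3]
```

When num_layers is divisible by the number of stages, this reduces to the even split that the current assertion enforces.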
