Commit 0c6c117

Merge branch 'hongbinl/support_pp_1' into 'main'

feat(MoE): Support Expert Parallel A2A Overlapping - (02) Support EP A2A overlap at PP=1

See merge request ADLR/megatron-lm!3470
2 parents af962d8 + ae1c882 commit 0c6c117

26 files changed: +2107 -164 lines changed

megatron/core/model_parallel_config.py

Lines changed: 8 additions & 3 deletions
@@ -237,6 +237,14 @@ class ModelParallelConfig:
     Set the bootstrapping backend out of 'nccl', 'mpi', and 'gloo'
     """
 
+    overlap_moe_expert_parallel_comm: bool = False
+    """Overlap EP A2A communications with independent computations of different micro-batches
+    in 1f1b phase of pipelining or non-pipelining schedule.
+    """
+
+    delay_wgrad_compute: bool = False
+    """Delay the weight gradient computation to improve batch-level communication overlapping"""
+
     ###################
     # Pipeline Parallel
     ###################
@@ -307,9 +315,6 @@ class ModelParallelConfig:
     rank 1 | 0 1 2 0 1 2 3 4 3 4
     """
 
-    delay_wgrad_compute: bool = False
-    """If true, delay the wgrad compute for better overlapping in combined 1F1B."""
-
     ###################
     # CPU Offloading
     ###################
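
The two hunks above leave `ModelParallelConfig` with two opt-in boolean fields that both default to `False`. As a rough illustration only, the sketch below mirrors those fields in a stand-in dataclass and shows one way to enable them. `ModelParallelConfigSketch` is a hypothetical name, not the real Megatron class, and the sketch assumes nothing about validation or other fields that the real config may have.

```python
from dataclasses import dataclass, replace


@dataclass
class ModelParallelConfigSketch:
    """Hypothetical stand-in that mirrors only the two fields touched by this diff."""

    # Overlap EP A2A communication with independent computation of other micro-batches.
    overlap_moe_expert_parallel_comm: bool = False
    # Delay the weight gradient computation to improve batch-level overlap.
    delay_wgrad_compute: bool = False


# Both options are opt-in: a default config leaves them disabled.
default_cfg = ModelParallelConfigSketch()

# Enable both explicitly; replace() returns a new object and leaves default_cfg untouched.
overlap_cfg = replace(
    default_cfg,
    overlap_moe_expert_parallel_comm=True,
    delay_wgrad_compute=True,
)

print(default_cfg)
print(overlap_cfg)
```

Because both defaults are `False`, existing configurations that never set these fields keep their previous behaviour after this commit.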
