NVIDIA Megatron-LM · Discussions · GitHub

Discussions

function missing

ywb2018 asked Jul 8, 2024 in Q&A · Unanswered

0

[QUESTION] RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=2, worker_count=1, timeout=0:10:00)
stale No activity in 60 days on issue or PR
JanryPei asked Apr 16, 2024 in Q&A · Unanswered

3

[QUESTION] Why is expert parallelism not supported during fp16 training?
stale No activity in 60 days on issue or PR
yutian-mt asked May 7, 2024 in Q&A · Unanswered

2

[QUESTION] Does Megatron-Core support LLAMA models?
stale No activity in 60 days on issue or PR
noob-ctrl asked May 3, 2024 in Q&A · Unanswered

6

[QUESTION] How to pre-build the dataset's index?
stale No activity in 60 days on issue or PR
etiennemlb asked Apr 24, 2024 in Q&A · Unanswered

2

[BUG] Question about helpers.cpp in version core_v0.7.0

longzhang418 asked Jun 28, 2024 in Q&A · Unanswered

0

[QUESTION] Getting tools/preprocess_data.py to work is painful

sambar1729 asked Jun 26, 2024 in Q&A · Unanswered

0

[QUESTION] Why does megatron-core seem slower and use more GPU memory than legacy for gpt_pretrain?
stale No activity in 60 days on issue or PR
REIGN12 asked Apr 9, 2024 in Q&A · Unanswered

2

[QUESTION] Gloo connectFullMesh failed when the number of nodes exceeds 60 with "export GLOO_SOCKET_IFNAME=bond4" set

Genlovy-Hoo asked Jun 19, 2024 in Q&A · Unanswered

0

[QUESTION] How to time the code

Weifan1226 asked Jun 16, 2024 in Q&A · Unanswered

0

[QUESTION] Using segformer segmentation models

cporrasn asked Jun 14, 2024 in Q&A · Unanswered

0

[QUESTION] Why do the _p2p_ops functions have conditional branches on get_pipeline_model_parallel_rank()?

lichenlu asked Jun 14, 2024 in Q&A · Unanswered

0

[QUESTION] Why replace F.embedding() with [] in the VocabParallelEmbedding class?
stale No activity in 60 days on issue or PR
starkhu asked Apr 9, 2024 in Q&A · Unanswered

1

Incorrect shuffling of documents across epochs in GPTDataset
stale No activity in 60 days on issue or PR
argitrage asked Feb 20, 2024 in Q&A · Unanswered

2

[QUESTION] Why must f and g be conjugates of each other?
stale No activity in 60 days on issue or PR
bescks asked Mar 9, 2024 in Q&A · Unanswered

1

[QUESTION] Why does it take so much time to sync up barrier information between ranks?
stale No activity in 60 days on issue or PR
yanminjia asked Mar 20, 2024 in Q&A · Unanswered

1

[QUESTION] In RotaryEmbedding, should the datatype of inv_freq and the corresponding sin/cos computations be maintained as torch.float32?
stale No activity in 60 days on issue or PR
rchardx asked Mar 21, 2024 in Q&A · Unanswered

1

[QUESTION] Why is the time of one iteration in nsys longer than that in the output log?
stale No activity in 60 days on issue or PR
hanwen-sun asked Mar 14, 2024 in Q&A · Unanswered

1

[QUESTION] What is the difference between running pretrain_gpt.py with and without the mcore model?
stale No activity in 60 days on issue or PR
TING2938 asked Feb 22, 2024 in Q&A · Unanswered

2

Does Megatron have a plan to support Gemma?
stale No activity in 60 days on issue or PR
anlongfei asked Feb 26, 2024 in Q&A · Unanswered

3

[QUESTION] What are the retrieval datasets when evaluating downstream tasks?
stale No activity in 60 days on issue or PR
ZihaoLin0123 asked Feb 27, 2024 in Q&A · Unanswered

1

[QUESTION] Megatron-LM installation with CUDA 11.6
stale No activity in 60 days on issue or PR
ghtaro asked Feb 22, 2024 in Q&A · Unanswered

1

[ENHANCEMENT] Do you have a plan to support Mixtral 8x7B?
stale No activity in 60 days on issue or PR
matrixssy started Jan 4, 2024 in Ideas

7

[QUESTION] Why can't forward_backward_pipelining_without_interleaving enable config.overlap_p2p_comm?
stale No activity in 60 days on issue or PR
zhouyiyuan-mt asked Feb 4, 2024 in Q&A · Unanswered

2

[QUESTION] How to release the model and optimizer memory manually?
stale No activity in 60 days on issue or PR
robotsp asked Jan 15, 2024 in Q&A · Unanswered

7