NVIDIA Megatron-LM Discussions
[QUESTION] Why is expert parallelism not supported during fp16 training? (stale: no activity in 60 days)
[QUESTION] Does Megatron-Core support LLAMA models? (stale: no activity in 60 days)
[QUESTION] How to pre-build the dataset's index? (stale: no activity in 60 days)
[QUESTION] Why does Megatron-Core seem slower and use more GPU memory than legacy for gpt_pretrain? (stale: no activity in 60 days)
[QUESTION] Why replace F.embedding() with [] in the VocabParallelEmbedding class? (stale: no activity in 60 days)
Incorrect shuffling of documents across epochs in GPTDataset (stale: no activity in 60 days)
[QUESTION] Why must f and g be conjugates of each other? (stale: no activity in 60 days)
[QUESTION] Why does it take so much time to sync up barrier information between ranks? (stale: no activity in 60 days)
[QUESTION] In RotaryEmbedding, should the datatype of inv_freq and the corresponding sin/cos computations be maintained as torch.float32? (stale: no activity in 60 days)
[QUESTION] Why is the time of one iteration in nsys longer than that in the output log? (stale: no activity in 60 days)
[QUESTION] What is the difference between with/without mcore model in pretrain_gpt.py? (stale: no activity in 60 days)
Does Megatron have plans to support Gemma? (stale: no activity in 60 days)
[QUESTION] What are the retrieval datasets when evaluating downstream tasks? (stale: no activity in 60 days)
[QUESTION] Megatron-LM installation with CUDA 11.6 (stale: no activity in 60 days)
[ENHANCEMENT] Do you have a plan to support Mixtral 8x7B? (stale: no activity in 60 days)
[QUESTION] Why can't forward_backward_pipelining_without_interleaving enable config.overlap_p2p_comm? (stale: no activity in 60 days)
[QUESTION] How to release the model and optimizer memory manually? (stale: no activity in 60 days)