
gpt-oss implementation #1739

@sbhavani

Description


This issue outlines the current status of the gpt-oss features that need to be implemented in Megatron Core, leveraging Transformer Engine (TE).

Core functionality has been implemented and validated through convergence testing in the chcui/gpt_oss branch of Megatron-LM. Future efforts will focus on performance optimization and integration into Megatron Core.

Note: MoE (Mixture of Experts) features are already fully supported - see Megatron Core MoE Roadmap for comprehensive MoE feature support.

MoE Layer

Enabled Bias
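
gpt-oss enables bias terms on the expert MLP projections, in contrast to the bias-free configuration common in other MoE models. A minimal sketch of what "enabled bias" means at the expert level, assuming plain `nn.Linear` experts (module and parameter names below are illustrative, not the Megatron Core API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedExpertMLP(nn.Module):
    """Illustrative MoE expert with bias enabled on both projections.
    Hypothetical module, not the Megatron Core implementation."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        # bias=True is the relevant difference from the usual bias-free MoE expert
        self.up_proj = nn.Linear(hidden_size, ffn_hidden_size, bias=True)
        self.down_proj = nn.Linear(ffn_hidden_size, hidden_size, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.up_proj(x)))
```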

Attention Mechanisms

Alternating Sliding-Window Attention Pattern

  • Status: Supported - infrastructure for per-layer attention patterns and sliding-window attention already exists via TE (see the sketch below)
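
A minimal sketch of how an alternating per-layer pattern can be expressed: even-indexed layers attend within a local causal window while odd-indexed layers attend globally. The alternation rule and the mask construction are illustrative assumptions, not the TE API:

```python
from typing import Optional
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask (True = masked out): causal attention restricted to the
    previous `window` positions."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions
    k = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (k > q) | (q - k >= window)

def layer_window_size(layer_idx: int, window: int) -> Optional[int]:
    """Alternating pattern: even-indexed layers use sliding-window attention,
    odd-indexed layers use full causal attention (None = no window)."""
    return window if layer_idx % 2 == 0 else None
```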

Attention Sinks

  • Status: Work in Progress - support is being added in Transformer Engine and cuDNN (see the conceptual sketch below)
  • Reference: Streaming LLM
  • Related Transformer Engine PR: TBD
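
Following the cited StreamingLLM formulation, attention sinks keep a handful of initial tokens attendable by every query in addition to the local window, which stabilizes sliding-window/streaming inference. A conceptual mask-level sketch, assuming a fixed number of sink tokens (the TE/cuDNN implementation tracked above will expose this differently):

```python
import torch

def sink_window_mask(seq_len: int, window: int, num_sink: int = 4) -> torch.Tensor:
    """Boolean mask (True = masked out): every query may attend to the first
    `num_sink` "sink" tokens plus its local causal window."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions
    k = torch.arange(seq_len).unsqueeze(0)  # key positions
    causal = k > q
    outside_window = (q - k) >= window
    not_sink = k >= num_sink
    return causal | (outside_window & not_sink)
```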

Activation Functions

Custom SwiGLU with Clamping
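
A minimal sketch of a clamped SwiGLU variant: the gate and linear halves are clamped before the gated product to bound activation magnitudes. The clamp limit, sigmoid scaling constant, and the +1 offset on the linear branch are illustrative assumptions, not necessarily the exact gpt-oss values:

```python
import torch

def clamped_swiglu(x: torch.Tensor, alpha: float = 1.702, limit: float = 7.0) -> torch.Tensor:
    """SwiGLU with clamping (sketch). `x` holds interleaved gate/linear channels;
    `alpha` and `limit` are illustrative defaults, treated here as assumptions."""
    x_gate, x_linear = x[..., ::2], x[..., 1::2]
    x_gate = x_gate.clamp(max=limit)                   # clamp the gate branch from above
    x_linear = x_linear.clamp(min=-limit, max=limit)   # clamp the linear branch symmetrically
    # sigmoid-gated (SiLU-style) product; the +1 offset is an assumed detail
    return (x_gate * torch.sigmoid(alpha * x_gate)) * (x_linear + 1)
```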

Positional Encodings

YaRN RoPE Scaling

  • Megatron Core Implementation
    • YaRN scaling to 128k context
    • Integration with existing RoPE
    • YaRN for general RoPE/GPT models
    • Convergence validation
    • Performance optimization for extended sequences
  • Megatron-LM Branch: https://github.com/NVIDIA/Megatron-LM/tree/chcui/gpt_oss
  • Reference: arXiv:2309.00071
  • Status: Work in Progress - YaRN is implemented for MLA only in Megatron Core; general RoPE/GPT support is available in the POC branch (see the sketch after this list)
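
The core of YaRN is "NTK-by-parts" interpolation of the RoPE inverse frequencies: high-frequency dimensions are kept as-is, low-frequency dimensions are interpolated by the context-extension factor, and a linear ramp blends the two. A minimal sketch following arXiv:2309.00071; the defaults (a 4k original context extended 32x toward 128k, beta_fast/beta_slow) are illustrative assumptions, not the gpt-oss configuration:

```python
import math
import torch

def yarn_inv_freq(dim: int, base: float = 10000.0, scale: float = 32.0,
                  orig_context: int = 4096,
                  beta_fast: float = 32.0, beta_slow: float = 1.0) -> torch.Tensor:
    """NTK-by-parts YaRN interpolation of RoPE inverse frequencies (sketch)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

    # Index (into the dim//2 frequency pairs) at which a pair completes
    # `num_rot` rotations over the original context window.
    def correction_dim(num_rot: float) -> float:
        return dim * math.log(orig_context / (num_rot * 2 * math.pi)) / (2 * math.log(base))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim // 2 - 1)

    # Linear ramp: 0 = keep the original frequency (high-frequency dims),
    # 1 = fully interpolated by `scale` (low-frequency dims).
    ramp = ((torch.arange(dim // 2).float() - low) / max(high - low, 1)).clamp(0.0, 1.0)

    return inv_freq * (1.0 - ramp) + (inv_freq / scale) * ramp
```

YaRN also rescales attention logits by roughly 0.1 * ln(scale) + 1 (the "mscale" temperature from the paper); that part is omitted from the sketch above.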

Credits: @cuichenx
