More Rope Scaling Implementations (PI, Yarn) #330


Open · wants to merge 3 commits into main

Conversation

@tyler-romero (Contributor) commented Jul 28, 2025

Implements:

  1. Position Interpolation
  2. Stepwise (Llama 3.1) scaling
  3. YaRN
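
For reference, the three schemes adjust the RoPE inverse frequencies roughly as follows. This is a sketch of the published methods with illustrative names and assumed defaults, not the exact code in this diff:

```python
import math

import torch


def position_interpolation(inv_freq: torch.Tensor, factor: float) -> torch.Tensor:
    # PI: compress all positions uniformly, which is equivalent to dividing
    # every inverse frequency by the extension factor.
    return inv_freq / factor


def llama3_stepwise(
    inv_freq: torch.Tensor,
    factor: float,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    old_context_len: int = 8192,
) -> torch.Tensor:
    # Llama 3.1-style stepwise scaling: leave high frequencies (short
    # wavelengths) untouched, fully interpolate low frequencies, and blend
    # smoothly in the band between the two wavelength cut-offs.
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    wavelen = 2 * math.pi / inv_freq
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    smooth = (old_context_len / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    blended = (1 - smooth) * inv_freq / factor + smooth * inv_freq
    is_medium = (wavelen >= high_freq_wavelen) & (wavelen <= low_freq_wavelen)
    return torch.where(is_medium, blended, scaled)


def yarn(
    inv_freq: torch.Tensor,
    factor: float,
    beta_fast: float = 32.0,
    beta_slow: float = 1.0,
    old_context_len: int = 8192,
) -> torch.Tensor:
    # YaRN's "NTK-by-parts" interpolation: ramp per dimension between full
    # interpolation and none, based on how many rotations the dimension
    # completes within the original context window.
    rotations = old_context_len * inv_freq / (2 * math.pi)
    ramp = ((rotations - beta_slow) / (beta_fast - beta_slow)).clamp(0.0, 1.0)
    return (1 - ramp) * inv_freq / factor + ramp * inv_freq
```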

@abertsch72 left a comment

thanks so much for standardizing this! were you also going to add a way to turn sliding-window-layer scaling on and off? that seems more important now that some of our sliding-window scaling runs actually outperform global-only scaling

"""

def __post_init__(self):
if self.attention_rescale_factor < 1.0:


do we have to restrict the attention rescaling factor to be ≥ 1?
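
For context (a gloss on the question, not anything in this diff): the YaRN paper's recommended attention-logit multiplier is 0.1 · ln(s) + 1, which is ≥ 1 whenever the context-extension scale s ≥ 1, so the check may simply mirror that recipe:

```python
import math


def yarn_attention_rescale(scale: float) -> float:
    # YaRN's suggested logit multiplier ("mscale"): 1.0 at scale <= 1,
    # growing slowly with ln(scale), so it never drops below 1.0.
    return 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0
```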



```python
@dataclass
class PerFrequencyRoPEScalingConfig(RoPEScalingConfig):
```


nit, but the name suggests to me that you get more fine-grained control than high/low...

@tyler-romero (Author) replied:

Yeah good point, I'll switch to calling it stepwise

```python
    Denominator that determines the *high-frequency* wavelength cut-off
    (a smaller value keeps more of the very short wavelengths untouched).
    """
```


it would be nice if these params used the same names / interpretation as the YaRN params that control the same thing... minimally, I'd vote for calling them something other than `factor`, because we also have a `factor` above that does a different thing

(I know these names are probably holdovers from the methods, though. If this is how Hugging Face names the two methods, then maybe it's better to be consistent with them rather than internally consistent.)
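
As a worked example of that cut-off (assuming Llama-3.1-style defaults, which may differ from this PR's):

```python
import math

old_context_len = 8192   # assumed original training context
high_freq_factor = 4.0   # the denominator described in the docstring above

# Dimensions whose RoPE wavelength is below this cut-off stay unscaled:
high_freq_wavelen = old_context_len / high_freq_factor  # 2048 positions

# A dimension with inverse frequency theta has wavelength 2*pi/theta, so it
# is treated as "high frequency" when 2 * math.pi / theta < high_freq_wavelen.
```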

@dirkgr (Member) left a comment

The way you preserve backwards compatibility here is by having the base class do basic RoPE extension like the RoPE paper says, and having subclasses that do the fancier stuff?

@tyler-romero (Author) replied:

PerFrequencyRoPEScalingConfig is the same as the old RoPE scaling config. I think it's wrong to think of that as the base case, and there are zero dependencies on it in main, so I'm not sure it's worth distorting the factorization of the code for.
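
To make the shape of the disagreement concrete, the hierarchy under discussion looks roughly like this; the class names come from the diff snippets above, but the fields are illustrative guesses, not the PR's actual definitions:

```python
from dataclasses import dataclass


@dataclass
class RoPEScalingConfig:
    # Hypothetical base: a uniform position-interpolation scale.
    factor: float = 1.0


@dataclass
class PerFrequencyRoPEScalingConfig(RoPEScalingConfig):
    # The old config's behavior, now one subclass among several
    # (to be renamed "stepwise" per the thread above).
    low_freq_factor: float = 1.0
    high_freq_factor: float = 4.0
```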
