Bare bones GenerationModule #324
Conversation
max_sequence_length: Optional[int] = None,
rank_microbatch_size: Optional[int] = None,
These actually are optional and if provided are used to warm up the RoPE cache (max_seq_len) and an MoE component (rank_microbatch_size).
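For context, a minimal sketch of what that warm-up could look like; the `build_rope_cache` hook name here is hypothetical, not olmo-core's actual API:

```python
from typing import Optional

import torch.nn as nn


def maybe_warm_up(model: nn.Module, max_sequence_length: Optional[int] = None) -> None:
    # Hypothetical hook: if the max sequence length is known up front, build the
    # RoPE cache once so the first generate() call doesn't pay the cost.
    if max_sequence_length is not None and hasattr(model, "build_rope_cache"):
        model.build_rope_cache(max_sequence_length)
```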
This is a great start. I just have a few comments so far.
Nit: maybe rename this to generation_module.py to be more specific. Also consider moving the transformer implementation to its own submodule.
src/olmo_core/generate/generation.py
from olmo_core.generate.config import GenerationConfig
from olmo_core.generate.selection import temperature_sampling
Nit: relative imports are nice to have at least within the same submodule.
Suggested change:
- from olmo_core.generate.config import GenerationConfig
- from olmo_core.generate.selection import temperature_sampling
+ from .config import GenerationConfig
+ from .selection import temperature_sampling
Args:
    checkpoint_dir: Path to checkpoint directory
    work_dir: Working directory for caching remote checkpoints
    process_group: Process group for distributed loading
    pre_download: Whether to pre-download remote checkpoints
    load_thread_count: Number of threads to use for loading the checkpoint

Raises:
    FileNotFoundError: If checkpoint directory doesn't exist
    RuntimeError: If checkpoint loading fails
Nit: this docstring syntax is inconsistent with our other docstrings.
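For example, assuming the Sphinx-style :param:/:raises: directives used elsewhere in the repo (the signature and defaults below are guesses for illustration only):

```python
from typing import Optional

import torch.distributed as dist


def load_checkpoint(
    checkpoint_dir: str,
    work_dir: Optional[str] = None,
    process_group: Optional[dist.ProcessGroup] = None,
    pre_download: bool = False,
    load_thread_count: Optional[int] = None,
) -> None:
    """
    Load model state from a (possibly remote) checkpoint directory.

    :param checkpoint_dir: Path to checkpoint directory.
    :param work_dir: Working directory for caching remote checkpoints.
    :param process_group: Process group for distributed loading.
    :param pre_download: Whether to pre-download remote checkpoints.
    :param load_thread_count: Number of threads to use for loading the checkpoint.

    :raises FileNotFoundError: If checkpoint directory doesn't exist.
    :raises RuntimeError: If checkpoint loading fails.
    """
    ...
```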
work_dir = Path(
    work_dir or (tempfile.mkdtemp() if get_rank(process_group) == 0 else "/tmp")
)
This assumes all ranks share the filesystem, which is usually only true for single-node jobs.
Alternatively just force the user to provide a work dir.
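A minimal sketch of that alternative (hypothetical helper, not part of this PR): require an explicit work_dir whenever more than one rank is involved, and only fall back to a temp dir for single-process runs.

```python
import tempfile
from pathlib import Path
from typing import Optional

import torch.distributed as dist


def resolve_work_dir(work_dir: Optional[str] = None) -> Path:
    if work_dir is not None:
        return Path(work_dir)
    if dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1:
        # No shared-filesystem assumption: make the caller pick a directory
        # that is valid on every node.
        raise ValueError("work_dir must be provided when running with multiple ranks")
    # Single-process case: a local temp dir is safe.
    return Path(tempfile.mkdtemp())
```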
Add a simple GenerationModule (analogous to TrainModule) that can be used to configure and run autoregressive next-token prediction.

As of now this supports:
1. Loading distributed checkpoints exactly as they were saved during training.
2. A temperature parameter for generation.
3. Using FSDP to shard larger models across multiple devices (mostly as a demonstration of how other types of parallelism can be worked in).
4. Attention masks passed through to SDPA so that batched generation with left-padding is supported (see the sketch below).

Note that this implementation is very inefficient compared to transformers or vllm, in part due to the lack of KV caching.
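To illustrate point 4, here is a self-contained sketch in plain PyTorch (not this PR's API) of combining a causal mask with a left-padding mask before passing it to SDPA:

```python
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 2, 4, 6, 8
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# True = real token; the second sequence has two left-padding positions.
pad_mask = torch.tensor([[True] * 6, [False, False] + [True] * 4])

# Combine the causal mask with the padding mask into a boolean mask of shape
# (batch, 1, seq, seq) that F.scaled_dot_product_attention accepts.
causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
attn_mask = causal[None, None, :, :] & pad_mask[:, None, None, :]

# Let padding queries attend to themselves so no row is fully masked (which
# would produce NaNs); outputs at padding positions are discarded anyway.
attn_mask = attn_mask | torch.eye(seq, dtype=torch.bool)[None, None, :, :]

out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([2, 4, 6, 8])
```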