Cover custom sdpa + kv cache + quant in CI #75
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@kimishpatel for review
tests/models/test_modeling_llama.py
Outdated
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ExecuTorchModelForCausalLM.from_pretrained(
    model_id,
    recipe="xnnpack",
    attn_implementation="custom_sdpa",
    use_custom_kv_cache=True,
    **{"qlinear": True, "qembedding": True},
)
self.assertIsInstance(model, ExecuTorchModelForCausalLM)
self.assertIsInstance(model.model, ExecuTorchModule)
generated_text = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=32,
)
logging.info(f"\nGenerated text:\n\t{generated_text}")
generated_tokens = tokenizer(generated_text, return_tensors="pt").input_ids

# Free memory before loading eager for quality check
del model
del tokenizer
gc.collect()
You can factor out a bunch of this code into a util so only one place needs updating (see the sketch below).
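
A minimal sketch of that refactor, assuming a hypothetical tests/models/utils.py; the helper name and signature are illustrative, not something this PR defines:

import gc
import logging

from transformers import AutoTokenizer

from optimum.executorch import ExecuTorchModelForCausalLM


def export_and_generate(model_id, prompt, max_seq_len=32, **export_kwargs):
    """Export model_id to ExecuTorch with the given kwargs and return the generated text."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = ExecuTorchModelForCausalLM.from_pretrained(model_id, **export_kwargs)
    generated_text = model.text_generation(
        tokenizer=tokenizer,
        prompt=prompt,
        max_seq_len=max_seq_len,
    )
    logging.info(f"\nGenerated text:\n\t{generated_text}")
    # Free memory before the caller loads the eager model for a quality check.
    del model
    del tokenizer
    gc.collect()
    return generated_text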
Yeah, I was thinking of parameterizing these tests when I have time, but it's not a priority; only the coverage in CI matters at the moment, to ensure all the pieces work together. I can file a GitHub task to refactor these tests. One possible shape is sketched below.
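
One possible shape for that parameterization, building on the hypothetical export_and_generate helper sketched above; the class name, placeholder model id, and test cases are illustrative only:

import unittest

from parameterized import parameterized


class LlamaExportTest(unittest.TestCase):  # hypothetical test class
    model_id = "model-id-under-test"  # placeholder; substitute the real model id

    @parameterized.expand(
        [
            ("custom_sdpa_only", {"attn_implementation": "custom_sdpa"}),
            (
                "custom_sdpa_kv_cache_quant",
                {
                    "attn_implementation": "custom_sdpa",
                    "use_custom_kv_cache": True,
                    "qlinear": True,
                    "qembedding": True,
                },
            ),
        ]
    )
    def test_text_generation(self, _name, extra_kwargs):
        generated_text = export_and_generate(
            self.model_id,
            "Simply put, the theory of relativity states that",
            max_seq_len=32,
            recipe="xnnpack",
            **extra_kwargs,
        )
        self.assertTrue(generated_text)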
Looks good, but some refactoring would be nice.
Will check the CI failures tomorrow.
@kimishpatel It looks like the custom KV cache only works with a static cache. For gemma3 the cache must be hybrid, so additional work may be needed to leverage the custom KV cache there. A possible interim guard is sketched below. https://github.com/huggingface/optimum-executorch/actions/runs/15480739347/job/43586007691?pr=75#step:5:1855
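
A hedged sketch of an interim guard that skips the custom KV cache for hybrid-cache models such as gemma3; reading cache_implementation off the HF config is an assumption here, not something this PR verifies:

from transformers import AutoConfig

from optimum.executorch import ExecuTorchModelForCausalLM

config = AutoConfig.from_pretrained(model_id)  # model_id assumed in scope
# Hybrid-cache configs (e.g. gemma-style models) are assumed to expose this field.
is_hybrid_cache = getattr(config, "cache_implementation", None) == "hybrid"
model = ExecuTorchModelForCausalLM.from_pretrained(
    model_id,
    recipe="xnnpack",
    attn_implementation="custom_sdpa",
    # Only static-cache models can use the custom KV cache for now.
    use_custom_kv_cache=not is_hybrid_cache,
)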
11a411d to 2df1009 (compare)
As titled: this shows that all the supported optimizations are composable and can be combined to achieve better performance.