Allow using few SMs for low-latency mode #277

fzyzcjy · 2025-07-03T10:06:08Z

The code diff is definitely not meant to be merged as-is; it only demonstrates how the experiments below were done. If anyone is interested, or if this direction looks acceptable for merging, I am happy to polish the code and work on it further!

The code and experiment data are extracted from the old experiments I did for my previous #249.

Figure 1: num-sm vs performance
As can be seen, when using 9 warp groups (i.e. only a few SMs), performance slows down only slightly. This makes it feasible to simply overlap the low-latency kernels with computation.
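To make the "more warp groups means fewer SMs" trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes that each warp group serves one expert and that the kernel therefore occupies `ceil(num_experts / warp_groups_per_sm)` SMs. That mapping, the helper names, and the numbers (288 experts, a 132-SM GPU) are assumptions for illustration and are not taken from the diff.

```python
import math


def sms_needed(num_experts: int, warp_groups_per_sm: int) -> int:
    """SMs occupied, ASSUMING one warp group per expert (hypothetical mapping)."""
    if num_experts <= 0 or warp_groups_per_sm <= 0:
        raise ValueError("num_experts and warp_groups_per_sm must be positive")
    return math.ceil(num_experts / warp_groups_per_sm)


def sms_left_for_compute(total_sms: int, num_experts: int, warp_groups_per_sm: int) -> int:
    """SMs that stay free for an overlapped compute kernel under the same assumption."""
    used = sms_needed(num_experts, warp_groups_per_sm)
    if used > total_sms:
        raise ValueError("communication kernel would need more SMs than the GPU has")
    return total_sms - used


if __name__ == "__main__":
    # Hypothetical setup: 288 experts on a GPU with 132 SMs.
    for groups in (3, 9):
        used = sms_needed(288, groups)
        free = sms_left_for_compute(132, 288, groups)
        print(f"{groups} warp groups/SM: {used} SMs used, {free} SMs free")
```

Under these assumed numbers, going from 3 to 9 warp groups per SM would shrink the communication footprint from 96 SMs to 32, which is what leaves room for a compute kernel to run alongside it.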

For dispatch, we may need to do extra work though, since the warp specialization may be suboptimal when there are only a few SMs.

8d1c641

fzyzcjy changed the title ~~Use few SMs for low-latency mode with almost full speed~~ Use few SMs for low-latency mode Jul 3, 2025

fzyzcjy changed the title ~~Use few SMs for low-latency mode~~ Allow using few SMs for low-latency mode Jul 3, 2025

sphish force-pushed the main branch from 8ff19f5 to bdd119f Compare July 22, 2025 03:33
