Allow using few SMs for low-latency mode #277

Open
wants to merge 1 commit into
base: main

@fzyzcjy (Contributor) commented Jul 3, 2025

The code diff is not meant to be merged as-is; it only demonstrates how the experiments below were done. If anyone is interested, or if this direction looks acceptable to merge, I am happy to polish the code and work on it further!

The code and experiment data are extracted from older experiments done for my earlier #249.

Figure 1: num-sm vs performance
As can be seen, when using 9 warpgroups (i.e. few SMs), performance drops only slightly. This makes it feasible to simply overlap low-latency mode with computation.

[Figure 1 image]
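To make the "more warpgroups, fewer SMs" trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes each warpgroup serves one expert, so the kernel needs `ceil(num_experts / warpgroups_per_sm)` SMs; that mapping, the helper names, and the 288-expert / 132-SM figures are illustrative assumptions, not values taken from this PR or its diff.

```python
# Back-of-the-envelope SM budget for a low-latency kernel.
# ASSUMPTION: one warpgroup handles one expert, so packing more
# warpgroups into each SM reduces the number of SMs the kernel occupies.
# All names and numbers here are hypothetical, for illustration only.


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return (a + b - 1) // b


def sms_for_low_latency(num_experts: int, warpgroups_per_sm: int) -> int:
    """SMs occupied when each SM hosts `warpgroups_per_sm` warpgroups."""
    return ceil_div(num_experts, warpgroups_per_sm)


def sms_left_for_compute(total_sms: int, num_experts: int, warpgroups_per_sm: int) -> int:
    """SMs that remain free for an overlapped compute kernel."""
    return total_sms - sms_for_low_latency(num_experts, warpgroups_per_sm)


if __name__ == "__main__":
    total_sms = 132      # assumed device SM count
    num_experts = 288    # assumed expert count
    for wg in (3, 9):
        used = sms_for_low_latency(num_experts, wg)
        free = sms_left_for_compute(total_sms, num_experts, wg)
        print(f"{wg} warpgroups/SM: {used} SMs for communication, {free} free for compute")
```

Under these assumptions, going from 3 to 9 warpgroups per SM shrinks the communication footprint from 96 to 32 SMs, which is what would leave room for a concurrent compute kernel.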

Dispatch may need extra work, though, since its warp specialization may be suboptimal when there are few SMs.

@fzyzcjy fzyzcjy changed the title Use few SMs for low-latency mode with almost full speed Use few SMs for low-latency mode Jul 3, 2025
@fzyzcjy fzyzcjy changed the title Use few SMs for low-latency mode Allow using few SMs for low-latency mode Jul 3, 2025