Optimized ReID inference on GPU #118
Open
Description
The current inference implementation in ReIDModel does not fully take advantage of the GPU: it runs inference in series rather than in parallel, and it moves crops on and off the GPU individually instead of as a single stacked batch, increasing the number of host-device transfers. This change lets ReIDModel run batched inference on the GPU while still falling back to serial inference when no GPU is available.
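For illustration, here is a minimal sketch of the idea, not the actual ReIDModel API (the names `extract_features`, `model`, `crops`, and `device` are all hypothetical): stack the crops into one tensor so there is a single host-to-device copy and a single batched forward pass, and fall back to a per-crop loop when no GPU is available.

```python
import torch

def extract_features(model, crops, device):
    """Sketch only: batched inference on GPU, serial inference otherwise.

    `crops` is assumed to be a list of preprocessed CHW tensors of the
    same shape; `model` is assumed to already live on `device`.
    """
    if device.type == "cuda":
        # One host-to-device transfer and one batched forward pass.
        batch = torch.stack(crops).to(device)
        with torch.inference_mode():
            features = model(batch)
        return features.cpu()
    # No GPU: run the crops one at a time.
    with torch.inference_mode():
        return torch.cat([model(crop.unsqueeze(0)) for crop in crops])
```

Note that stacking assumes every crop has already been resized to a common shape before inference.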
List any dependencies that are required for this change: None
Type of change
New feature (non-breaking change which adds functionality)
How has this change been tested? Please provide a test case or example of how you tested the change.
I've tested this change in my private codebase, where I see a >2x speedup on a handful of `timm` models. I don't see any test cases in this repo yet, so I'm unsure if you'd like me to add any tests?
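If tests would be welcome, a sketch of one possible equivalence test (assuming pytest, the hypothetical `extract_features` sketch above, and a guessed crop size and tolerance):

```python
import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a GPU")
def test_batched_gpu_matches_sequential_cpu():
    # Hypothetical test: the batched GPU path should produce the same
    # embeddings as the sequential CPU path, within float32 tolerance.
    model = torch.nn.Sequential(
        torch.nn.Flatten(), torch.nn.Linear(3 * 256 * 128, 16)
    )
    crops = [torch.rand(3, 256, 128) for _ in range(8)]
    sequential = extract_features(model, crops, torch.device("cpu"))
    batched = extract_features(model.cuda(), crops, torch.device("cuda"))
    assert torch.allclose(batched, sequential, atol=1e-4)
```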
Any specific deployment considerations
None
Docs
I don't think any update is necessary.