Thanks for the great work!
It's a very clever way to compute the embeddings beforehand and use them directly as target values during the backpropagation step.
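Just to make sure I understand the approach correctly, here is a minimal sketch of how I picture one training step, assuming a PyTorch setup; the function and variable names (`distill_step`, `cached_teacher_embeddings`, etc.) are my own placeholders, not the actual API of this repo:

```python
import torch
import torch.nn as nn

def distill_step(student, images, cached_teacher_embeddings, optimizer):
    """One training step: regress student outputs onto precomputed teacher embeddings."""
    student_embeddings = student(images)          # (batch, dim)
    # Teacher embeddings were computed once, offline, and are just looked up here.
    loss = nn.functional.mse_loss(student_embeddings, cached_teacher_embeddings)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Please correct me if the actual loss or target handling differs.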
Questions
- Have you done any testing to find out how well the distilled model performs compared to the original teacher model?
- If we use Vision Transformer (ViT) models as the base, should we expect any improvement in embedding quality?
- Instead of using the distilled model for a classification task by computing the `probs`, how well does it perform if we want to use the raw embeddings to rank images by cosine distance (roughly what I sketch below)?
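For the last question, this is the kind of ranking I have in mind; it is only an illustrative sketch with hypothetical names and shapes, not code from this repo:

```python
import torch
import torch.nn.functional as F

def rank_by_cosine(query_embedding, gallery_embeddings):
    """Return gallery indices sorted from most to least similar to the query."""
    query = F.normalize(query_embedding, dim=-1)        # (dim,)
    gallery = F.normalize(gallery_embeddings, dim=-1)   # (num_images, dim)
    similarities = gallery @ query                      # cosine similarity per gallery image
    return torch.argsort(similarities, descending=True)
```

I'm mainly wondering whether the distilled embeddings preserve the teacher's neighborhood structure well enough for this kind of retrieval, or whether they are only reliable after the classification head.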