Description
Feature request: Support loading a converted model from the Hub
Define the use-case
The fundamental use case is to cache the converted model so that users can fetch it efficiently without going through the export process every time.
Preferred path
During the trip to Paris last year, we briefly mentioned the idea of creating an Executorch Community and built a prototype of it with your help. Here it is: https://huggingface.co/executorch-community. Ideally we would like to host the .pte
model, its configuration, and a snapshot of the recipe in the Executorch Community. Do you think that is the more appropriate place to scale this?
Process
What is the process for maintaining the generated artifacts? It's open for discussion, but GGUF and ONNX already have similar features, so I'm curious whether there is anything we can learn from them, particularly around the life cycle of the generated artifacts:
- when is it triggered in the first place?
- how is it updated or overridden?
- how is the compatibility policy (backward and forward) for consuming the generated artifacts enforced?