Description
🚀 Feature
Another UX improvement, similar to loading a cached model from the hub (#15).
Today, when users run `ExecuTorchModelForXxx.from_pretrained(model_id, export=True)`, the converted model is saved to the local filesystem. When users rerun this command, it goes through the entire stack again: fetching the model from the hub, exporting it to ExecuTorch, saving it to the local filesystem, etc. This is inefficient because users commonly invoke the `from_pretrained` API multiple times in different places in their scripts. Just like loading a `from_pretrained` model in transformers, I'm wondering if it's possible to load the cached model directly from the local filesystem when it exists. From the API perspective, I think we can introduce a new flag for users to control this explicitly.
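To illustrate the idea, here is a minimal sketch of the proposed caching behavior. The flag name (`use_local_cache`), the cache layout, and the helper function are all assumptions for illustration, not the real optimum-executorch API; the export step is stubbed out.

```python
import os
import tempfile

def load_or_export(model_id: str, cache_dir: str, use_local_cache: bool = True) -> str:
    """Return the path to an exported model, reusing a local copy if present.

    Hypothetical sketch: `use_local_cache` is the proposed opt-in flag.
    """
    # One cache entry per model id, mirroring the hub's "org--name" convention.
    model_path = os.path.join(cache_dir, model_id.replace("/", "--"), "model.pte")
    if use_local_cache and os.path.exists(model_path):
        # Cache hit: skip fetching from the hub and re-exporting entirely.
        return model_path
    # Cache miss (or caching disabled): fetch + export, then persist for reuse.
    os.makedirs(os.path.dirname(model_path), exist_ok=True)
    with open(model_path, "wb") as f:
        f.write(b"\x00")  # stand-in for the exported ExecuTorch program
    return model_path
```

With this, a second call with the same `model_id` would return immediately from the local cache instead of re-running the export pipeline.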
CC: @echarlaix