Hi, could someone explain why during training we use `optimization_strategy="qlora"`, but during prediction we load the trained checkpoint and then use `optimization_strategy="None"`?
Hi @dcfabian 👋🏻 Good question!
During training, we use an optimization strategy like LoRA or QLoRA to help the model learn efficiently. QLoRA quantizes the frozen base weights (typically to 4-bit) and trains only small low-rank adapter matrices, which reduces memory usage and speeds up training without a meaningful loss in performance. This makes the training process much more resource-friendly.
Once training is complete, however, the model has already learned the necessary patterns and relationships. For prediction (inference), we simply load the trained checkpoint and run the model as-is; there's no need for the training-specific optimizations. That's why `optimization_strategy` is set to `"None"` during prediction.
In short, QLoRA is useful for improving the training process, but once training is done, the model doesn't need those extra adjustments during inference.
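
To make the contrast concrete, here's a minimal sketch of the same pattern using Hugging Face `transformers` + `peft`. This is not necessarily the exact API of this project, and the model name and hyperparameters are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model

# --- Training: QLoRA = 4-bit quantized base model + trainable LoRA adapters ---
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)
base = AutoModelForCausalLM.from_pretrained(
    "some-base-model",                      # placeholder model name
    quantization_config=bnb_config,
)
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)          # only the small adapter matrices train
# ... run your training loop, then:
model.save_pretrained("checkpoint/")

# --- Prediction: load the checkpoint as-is, no QLoRA setup required ---
base = AutoModelForCausalLM.from_pretrained("some-base-model")
model = PeftModel.from_pretrained(base, "checkpoint/")
model = model.merge_and_unload()            # fold adapters into the base weights
# model.generate(...) now behaves like an ordinary model
```

The quantization config and adapter setup only matter while gradients are flowing; after merging, the checkpoint is just a regular model, which is exactly why inference can run with `optimization_strategy="None"`.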