LoRA (Low-Rank Adaptation) is a Parameter-Efficient Fine-Tuning (PEFT) technique that fine-tunes LLMs by training a small set of additional low-rank matrices rather than updating the full weight matrices.
Here, the change in model weights (delta W, or ΔW) is re-parametrized as the product of two low-rank matrices, ΔW = BA, which are trained to adapt to the new data while keeping the number of trainable parameters low. The original weight matrix W remains frozen and receives no further updates. To produce the final output, the original and adapted weights are combined: W' = W + ΔW = W + BA.
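To make the re-parametrization concrete, here is a minimal sketch of a LoRA-style linear layer in PyTorch. The class name, rank r, and scaling factor alpha are illustrative choices, not taken from any particular library:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update.

    The effective weight is W + (alpha / r) * (B @ A): W stays frozen,
    and only A (r x in_features) and B (out_features x r) are trained.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the original weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scaling = alpha / r
        # B starts at zero so the adapter is a no-op before training
        # (delta W = B @ A = 0 at step zero).
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank correction.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

With r = 8 on a 4096 x 4096 weight matrix, the adapter trains 2 x 8 x 4096 ≈ 65K parameters instead of roughly 16.8M, which is where the efficiency gain comes from.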
The LoRA approach has a number of advantages:
LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters.
The performance of models fine-tuned with LoRA is comparable to that of fully fine-tuned models.
LoRA does not add any inference latency, because the adapter weights can be merged into the base model (see the merge sketch after this list).
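As a companion to the layer sketch above, here is one way that merge could look. The helper merge_lora is hypothetical; the folding itself follows directly from W' = W + BA:

```python
@torch.no_grad()
def merge_lora(layer: LoRALinear) -> nn.Linear:
    """Folds the low-rank update into the base weight so inference
    runs through a single plain nn.Linear, with no extra latency."""
    merged = nn.Linear(layer.base.in_features, layer.base.out_features,
                       bias=layer.base.bias is not None)
    # W_merged = W + (alpha / r) * (B @ A)
    merged.weight.copy_(layer.base.weight
                        + layer.scaling * (layer.lora_B @ layer.lora_A))
    if layer.base.bias is not None:
        merged.bias.copy_(layer.base.bias)
    return merged
```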