PEFT documentation

VeLoRA

You are viewing the main version, which requires installation from source. If you'd like a regular pip install, check out the latest stable version (v0.19.0).

This is a variant of LoRA, so everything that is possible with LoRA also applies to this method unless otherwise stated on this page.

VeLoRA is a LoRA variant that reduces training memory by compressing the activations that the LoRA layers save during the forward pass and reconstructing them during the backward pass, where they are used to implement the update rules. In PEFT, VeLoRA is enabled by passing a VeloraConfig as the velora_config argument of LoraConfig.

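from peft import LoraConfig, VeloraConfig

config = LoraConfig(
    target_modules=["q_proj", "v_proj"],  # VeLoRA applies to every LoRA layer selected here
    velora_config=VeloraConfig(
        num_groups=64,  # how the input activation depth is split before compression
        scale=0.2,  # rescales the reconstructed activations in the backward pass
        init_type="batch_average",  # update the projection on every training forward pass
    ),
)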

VeLoRA is applied to every LoRA layer selected by target_modules. num_groups controls how the input activation depth is split before compression. If the activation depth is not evenly divisible by num_groups, VeLoRA pads the grouped representation internally and removes the padding after reconstruction. scale rescales the reconstructed activations during the backward pass, and init_type chooses how the projection is initialized.
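
The grouping, padding, and reconstruction steps described above can be illustrated with a minimal pure-Python sketch for a single activation vector. This is not PEFT's implementation. It assumes that the projection is one unit-norm vector shared by all groups, that each group is compressed to a single scalar, and that padding consists of zeros appended at the end. The helper names (split_into_groups, compress, reconstruct) are hypothetical.

```python
import math


def split_into_groups(x, num_groups):
    """Zero-pad x until its length is divisible by num_groups, then split it."""
    group_size = math.ceil(len(x) / num_groups)
    padded = list(x) + [0.0] * (group_size * num_groups - len(x))
    return [padded[i * group_size:(i + 1) * group_size] for i in range(num_groups)]


def compress(x, v, num_groups):
    """Forward pass (assumed): keep one scalar per group, its projection onto v."""
    groups = split_into_groups(x, num_groups)
    return [sum(a * b for a, b in zip(group, v)) for group in groups]


def reconstruct(coeffs, v, depth, scale=1.0):
    """Backward pass (assumed): rebuild each group as coeff * v, drop padding, rescale."""
    flat = [c * b for c in coeffs for b in v]
    return [scale * value for value in flat[:depth]]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # depth 7 is not divisible by 2
num_groups = 2                           # each group holds 4 values after padding
v = [0.5, 0.5, 0.5, 0.5]                 # hypothetical unit-norm projection vector

coeffs = compress(x, v, num_groups)      # 2 stored scalars instead of 7 values
approx = reconstruct(coeffs, v, depth=len(x), scale=1.0)
print(coeffs)  # [5.0, 9.0]
print(approx)  # [2.5, 2.5, 2.5, 2.5, 4.5, 4.5, 4.5]
```

In this sketch, only num_groups scalars are kept per activation vector, which is where the memory saving comes from. The reconstruction is approximate, and scale only multiplies the rebuilt values.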

Use batch_average_once to initialize the projection from the first training batch, batch_average to update it from every training forward pass, or random to initialize it immediately from a random normalized vector.
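
The difference between the three init_type values can be sketched as a small state machine. This is an illustration only, not PEFT code: ProjectionState and on_training_forward are hypothetical names, and the sketch assumes that a "batch average" is the normalized mean of the grouped activations in a batch and that batch_average replaces the projection on each training forward pass.

```python
import math
import random


def normalize(vec):
    """Scale vec to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else list(vec)


def batch_average(groups):
    """Normalized mean of all grouped activations in one batch (assumed definition)."""
    size = len(groups[0])
    mean = [sum(g[i] for g in groups) / len(groups) for i in range(size)]
    return normalize(mean)


class ProjectionState:  # hypothetical helper, not a PEFT class
    def __init__(self, init_type, group_size, seed=0):
        self.init_type = init_type
        self.vector = None
        if init_type == "random":
            # Available immediately, before any batch has been seen.
            rng = random.Random(seed)
            self.vector = normalize([rng.gauss(0.0, 1.0) for _ in range(group_size)])

    def on_training_forward(self, groups):
        if self.init_type == "batch_average":
            self.vector = batch_average(groups)  # refreshed on every pass
        elif self.init_type == "batch_average_once" and self.vector is None:
            self.vector = batch_average(groups)  # set by the first batch only
        return self.vector


first_batch = [[3.0, 4.0], [3.0, 4.0]]
second_batch = [[1.0, 0.0]]

once = ProjectionState("batch_average_once", group_size=2)
every = ProjectionState("batch_average", group_size=2)
for batch in (first_batch, second_batch):
    once.on_training_forward(batch)
    every.on_training_forward(batch)

print(once.vector)   # still derived from first_batch
print(every.vector)  # derived from second_batch
```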

Below are some results on the MetaMathQA benchmark (GC stands for gradient checkpointing).

| Variant | Training Loss | Max Memory (GiB) | Tokens/sec |
|---|---|---|---|
| LoRA | 0.5427 | 27.69 | 2366.2 |
| LoRA + GC | 0.5426 | 13.17 | 1671.8 |
| LoRA + VeLoRA | 0.5427 | 19.94 | 2057.6 |
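
To read these numbers in relative terms, the short script below computes each variant's memory saving and throughput drop against plain LoRA. It uses only the figures reported above, and the variable names are illustrative. In this run, VeLoRA uses about 28% less peak memory at about 13% lower throughput, while the GC variant uses about 52% less memory at about 29% lower throughput.

```python
# (max memory in GiB, tokens per second), copied from the results above
results = {
    "LoRA": (27.69, 2366.2),
    "LoRA + GC": (13.17, 1671.8),
    "LoRA + VeLoRA": (19.94, 2057.6),
}

base_mem, base_tps = results["LoRA"]
relative = {}
for name, (mem, tps) in results.items():
    mem_saving = 100 * (1 - mem / base_mem)       # percent less peak memory
    throughput_drop = 100 * (1 - tps / base_tps)  # percent fewer tokens/sec
    relative[name] = (mem_saving, throughput_drop)
    print(f"{name}: {mem_saving:.1f}% less memory, {throughput_drop:.1f}% lower throughput")
```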

Caveats

  • VeLoRA is currently supported on standard LoRA linear layers only.