VeLoRA

This is a variant of LoRA, so everything that is possible with LoRA also applies to this method unless otherwise stated on this page.

VeLoRA reduces training memory by compressing the activations that LoRA saves in the forward pass and reconstructing them in the backward pass, where they are needed to apply the update rules. In PEFT, VeLoRA is configured through the velora_config argument of LoraConfig.

from peft import LoraConfig, VeloraConfig

config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    velora_config=VeloraConfig(
        num_groups=64,
        scale=0.2,
        init_type="batch_average",
    ),
)

VeLoRA is applied to every LoRA layer selected by target_modules. num_groups controls how the input activation depth is split before compression. If the activation depth is not evenly divisible by num_groups, VeLoRA pads the grouped representation internally and removes the padding after reconstruction. scale rescales the reconstructed activations during the backward pass, and init_type chooses how the projection is initialized.
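The grouping, padding, and reconstruction steps described above can be illustrated with a small pure-Python sketch. This is not PEFT's implementation: the helper names (split_into_groups, compress, reconstruct), the zero-padding at the end of the vector, and the projection onto a single unit-norm vector per group are assumptions made for illustration only.

```python
import math

def split_into_groups(x, num_groups):
    """Zero-pad x so it divides evenly, then cut it into num_groups chunks."""
    group_size = math.ceil(len(x) / num_groups)
    padded = list(x) + [0.0] * (group_size * num_groups - len(x))
    return [padded[i * group_size:(i + 1) * group_size] for i in range(num_groups)]

def compress(x, v, num_groups):
    """Keep one scalar per group: the dot product of that group with v."""
    return [sum(a * b for a, b in zip(g, v)) for g in split_into_groups(x, num_groups)]

def reconstruct(coeffs, v, depth, scale=1.0):
    """Rebuild an approximation of x, rescale it, and drop the padding."""
    flat = [scale * c * b for c in coeffs for b in v]
    return flat[:depth]

# Hypothetical activation of depth 10, which is not divisible by 4 groups.
x = [float(i) for i in range(1, 11)]
v = [1 / math.sqrt(3)] * 3             # unit-norm vector, group size ceil(10 / 4) = 3
coeffs = compress(x, v, num_groups=4)  # 4 stored values instead of 10
approx = reconstruct(coeffs, v, depth=len(x))
```

With this particular v, each group is replaced by its mean, so the reconstruction is lossy. The sketch only shows where the memory saving comes from (one stored value per group) and how padding is added before compression and removed afterwards.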

Use batch_average_once to initialize the projection from the first training batch, batch_average to update it from every training forward pass, or random to initialize it immediately from a random normalized vector.
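The difference between the three init_type values can be sketched as follows. The class ProjectionSketch, its observe method, and the choice to replace the vector with the normalized mean of the current batch's groups are hypothetical and only mirror the description above; PEFT's actual update logic may differ.

```python
import math
import random

def normalize(v):
    """Scale v to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def batch_average(groups):
    """Normalized element-wise mean of all groups seen in one batch."""
    n = len(groups)
    return normalize([sum(col) / n for col in zip(*groups)])

class ProjectionSketch:
    """Hypothetical holder for the projection vector (not a PEFT class)."""

    def __init__(self, group_size, init_type, seed=0):
        if init_type not in {"batch_average_once", "batch_average", "random"}:
            raise ValueError(f"unknown init_type: {init_type}")
        self.init_type = init_type
        self.vector = None
        if init_type == "random":
            # Available immediately, before any batch has been seen.
            rng = random.Random(seed)
            self.vector = normalize([rng.gauss(0.0, 1.0) for _ in range(group_size)])

    def observe(self, groups):
        """Call once per training forward pass with that batch's groups."""
        first_batch = self.init_type == "batch_average_once" and self.vector is None
        if self.init_type == "batch_average" or first_batch:
            self.vector = batch_average(groups)
        return self.vector

once = ProjectionSketch(2, "batch_average_once")
every = ProjectionSketch(2, "batch_average")
for batch in ([[3.0, 4.0], [3.0, 4.0]], [[1.0, 0.0]]):
    once.observe(batch)   # set from the first batch only, then frozen
    every.observe(batch)  # recomputed on every forward pass
```

In this sketch, once.vector still reflects the first batch after the loop, while every.vector reflects the most recent one.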

The table below shows results on the MetaMathQA benchmark.

Variant	Training Loss	Max Memory (GiB)	Tokens/sec
LoRA	0.5427	27.69	2366.2
LoRA + GC	0.5426	13.17	1671.8
LoRA + VeLoRA	0.5427	19.94	2057.6
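To make the trade-off in the table easier to read, the following snippet computes the memory reduction and the retained throughput of each variant relative to plain LoRA. It uses only the numbers from the table; the variable and function names are illustrative.

```python
# (max memory in GiB, tokens/sec), copied from the table above
results = {
    "LoRA": (27.69, 2366.2),
    "LoRA + GC": (13.17, 1671.8),
    "LoRA + VeLoRA": (19.94, 2057.6),
}

base_mem, base_tps = results["LoRA"]

def relative_to_lora(name):
    """Return (memory reduction in %, throughput retained in %) vs. plain LoRA."""
    mem, tps = results[name]
    return 100 * (1 - mem / base_mem), 100 * tps / base_tps

for name in ("LoRA + GC", "LoRA + VeLoRA"):
    saved, kept = relative_to_lora(name)
    print(f"{name}: {saved:.1f}% less memory, {kept:.1f}% of LoRA throughput")
```

On these numbers, VeLoRA saves roughly 28% of peak memory while keeping about 87% of the throughput, whereas LoRA + GC saves roughly 52% of memory at about 71% of the throughput. The training loss is essentially the same in all three rows.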

Caveats

VeLoRA is currently supported on standard LoRA linear layers only.
