LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging
LiteVGGT is a 3D vision foundation model that accelerates vanilla VGGT, achieving up to a 10x speedup with substantially lower memory use. This enables efficient processing of large-scale scenes (up to 1000 images) for 3D reconstruction while maintaining high accuracy in camera pose and point cloud prediction. The method introduces a geometry-aware cached token merging strategy that optimizes anchor token selection and reuses merge indices, preserving key geometric information with minimal accuracy impact.
This model was presented in the paper: LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging.
Overview
For 1000 input images, LiteVGGT achieves a 10x speedup over VGGT while maintaining high accuracy in camera pose and point cloud prediction. Its scalability and robustness make large-scale scene reconstruction more efficient and reliable.
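The cached token merging idea can be illustrated with a simplified, ToMe-style sketch. This is not LiteVGGT's exact algorithm: the function names, the cosine-similarity pairing, and the even/odd token split below are all assumptions made for illustration. The key point it shows is that merge indices are computed once on an anchor frame and then reused (cached) for subsequent frames, avoiding repeated matching cost:

```python
import numpy as np

def compute_merge_indices(tokens, r):
    """Pick r token pairs to merge by cosine similarity (illustrative only).

    tokens: array of shape (N, C). Returns (src_idx, dst_idx) index arrays;
    each src token will be averaged into its dst token.
    """
    t = tokens / np.linalg.norm(tokens, axis=-1, keepdims=True)
    a, b = t[::2], t[1::2]               # split into two alternating sets
    sim = a @ b.T                        # cosine similarities between sets
    best_dst = sim.argmax(axis=-1)       # best partner for each "src" token
    best_sim = sim.max(axis=-1)
    src_order = np.argsort(-best_sim)[:r]  # r most similar pairs
    src_idx = src_order * 2                # map back to full-sequence indices
    dst_idx = best_dst[src_order] * 2 + 1
    return src_idx, dst_idx

def merge_tokens(tokens, src_idx, dst_idx):
    """Average each src token into its dst token and drop the src."""
    out = tokens.copy()
    # Duplicate destinations keep the last write; acceptable for a sketch.
    out[dst_idx] = (out[dst_idx] + out[src_idx]) / 2
    keep = np.ones(len(tokens), dtype=bool)
    keep[src_idx] = False
    return out[keep]

# Compute merge indices once on an anchor frame...
anchor = np.random.default_rng(0).standard_normal((16, 8))
src, dst = compute_merge_indices(anchor, r=4)
merged_anchor = merge_tokens(anchor, src, dst)      # 16 -> 12 tokens
# ...then reuse the cached indices on later frames without re-matching.
next_frame = np.random.default_rng(1).standard_normal((16, 8))
merged_next = merge_tokens(next_frame, src, dst)    # 16 -> 12 tokens
```

Reusing cached indices trades a small amount of per-frame adaptivity for a large reduction in matching overhead, which is where much of the speedup on long image sequences comes from.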
Run Demo
To quickly try out LiteVGGT for 3D reconstruction, follow these steps:
First, create a virtual environment using Conda, clone this repository to your local machine, and install the required dependencies.
conda create -n litevggt python=3.10
conda activate litevggt
git clone [email protected]:GarlicBa/LiteVGGT-repo.git
cd LiteVGGT-repo
pip install -r requirements.txt
Install the Transformer Engine package following its official installation requirements (see https://github.com/NVIDIA/TransformerEngine):
export CC=your/gcc/path
export CXX=your/g++/path
pip install --no-build-isolation transformer_engine[pytorch]
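Transformer Engine builds can fail silently on mismatched compiler or CUDA setups, so a quick import check after installation is worthwhile. This is a generic sanity check, not part of the LiteVGGT repo:

```python
def check_transformer_engine():
    """Return True if Transformer Engine imports cleanly, else False."""
    try:
        import transformer_engine  # noqa: F401
        return True
    except ImportError:
        return False

print("Transformer Engine available:", check_transformer_engine())
```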
Then, download the finetuned, TE-remapped LiteVGGT checkpoint:
wget https://huggingface.co/ZhijianShu/LiteVGGT/resolve/main/te_dict.pt
Finally, run the demo:
python run_demo.py \
--ckpt_path path/to/your/te_dict.pt \
--img_dir path/to/your/img_dir \
--output_dir ./recon_result