FastVLM-1.5B-GPTQ-Int4
This version of FastVLM-1.5B-GPTQ-Int4 has been converted to run on the Axera NPU using w4a16 quantization.
This model has been optimized with the following LoRA:
Compatible with Pulsar2 version: 5.1-patch1.
Note that the model's context length is 1k tokens and the maximum prefill length is 640 tokens.
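As a rough illustration of what these two limits mean for a prompt, the sketch below checks a prompt length against the prefill limit and reports how many tokens remain for generation. It assumes that "1k" means 1024 tokens and that prompt and generated tokens share the same context window; neither detail is stated in this README, and the function name is hypothetical.

```python
CONTEXT_LEN = 1024  # assumption: "1k" context means 1024 tokens
MAX_PREFILL = 640   # maximum prefill length stated in this README


def generation_budget(prompt_tokens: int) -> int:
    """Return how many tokens could still be generated after the prompt.

    Assumes prompt and generated tokens share one context window.
    Raises ValueError if the prompt does not fit into the prefill limit.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > MAX_PREFILL:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens, prefill limit is {MAX_PREFILL}"
        )
    return CONTEXT_LEN - prompt_tokens


if __name__ == "__main__":
    for n in (99, 291, 640):
        print(f"{n} prompt tokens -> up to {generation_budget(n)} generated tokens")
```

A prompt longer than 640 tokens raises an error here; how `infer_axmodel.py` itself handles that case is not documented in this README.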
Conversion tool links:
If you are interested in model conversion, you can quantize and export the axmodel yourself, starting from the original repository:
https://huggingface.co/apple/FastVLM-1.5B
How to Convert LLM from Huggingface to axmodel[TODO]
Supported Platforms
- AX650
- AX650N DEMO Board
- M4N-Dock(爱芯派Pro)
- M.2 Accelerator card
| Chip | Image encoder latency (input size) | TTFT (prompt length) | Decode speed (w4a16) |
|---|---|---|---|
| AX650 | 216.257 ms (1024x1024) | 709.455 ms (291 tokens) | 21.38 tokens/sec |
| AX650 | 44.747 ms (512x512) | 167.543 ms (99 tokens) | 21.38 tokens/sec |
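The figures in the table can be combined into a back-of-the-envelope estimate of how long a full answer takes. The sketch below is only an approximation: it assumes that the TTFT value does not already include the image encoder time and that decoding proceeds at the constant rate shown in the table. Both are assumptions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    encoder_ms: float  # image encoder latency in milliseconds
    ttft_ms: float     # time to first token in milliseconds
    decode_tps: float  # decode speed in tokens per second


# Values copied from the AX650 rows of the table.
AX650_1024 = Benchmark(encoder_ms=216.257, ttft_ms=709.455, decode_tps=21.38)
AX650_512 = Benchmark(encoder_ms=44.747, ttft_ms=167.543, decode_tps=21.38)


def estimate_seconds(bench: Benchmark, new_tokens: int) -> float:
    """Estimate wall-clock seconds to produce `new_tokens` tokens for one image.

    Assumption: total = encoder + TTFT + (new_tokens - 1) / decode speed,
    i.e. TTFT covers the first token and excludes the image encoder.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    decode_s = (new_tokens - 1) / bench.decode_tps
    return (bench.encoder_ms + bench.ttft_ms) / 1000.0 + decode_s


if __name__ == "__main__":
    for name, bench in (("512x512", AX650_512), ("1024x1024", AX650_1024)):
        print(f"{name}: ~{estimate_seconds(bench, 128):.2f} s for 128 tokens")
```

Under these assumptions, the fixed cost before the first token is roughly 0.21 s at 512x512 and 0.93 s at 1024x1024; for longer answers the decode time dominates.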
How to use
Download all files from this repository to the device
$ tree -L 1
.
├── config.json
├── fastvlm_ax650_context_1k_prefill_640_int4
├── fastvlm_tokenizer
├── images
├── infer_axmodel.py
├── README.md
├── requirements.txt
└── utils
5 directories, 4 files
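To confirm that the download is complete before running inference, the top-level layout shown above can be checked with a short script. This is a minimal sketch: the entry names are taken from the listing, while the function name is hypothetical and the contents of the subdirectories are not checked.

```python
from pathlib import Path

# Top-level entries taken from the `tree -L 1` listing in this README.
EXPECTED_DIRS = [
    "fastvlm_ax650_context_1k_prefill_640_int4",
    "fastvlm_tokenizer",
    "images",
    "utils",
]
EXPECTED_FILES = [
    "config.json",
    "infer_axmodel.py",
    "README.md",
    "requirements.txt",
]


def missing_entries(root="."):
    """Return the expected top-level entries that are absent under `root`."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected top-level entries are present.")
```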
Install transformers and the other Python dependencies
pip install -r requirements.txt
Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or the AX650 DEMO Board
Run the following command on the Axera board to start a chat conversation:
$ python infer_axmodel.py -v ./fastvlm_ax650_context_1k_prefill_640_int4/image_encoder_512x512.axmodel -m ./fastvlm_ax650_context_1k_prefill_640_int4 -t ./fastvlm_tokenizer/ -i 512
output:
[INFO] Available providers: ['AXCLRTExecutionProvider']
Loading config, tokenizer and init model.
Detected prefixes: ['llava_qwen2'], chosen: llava_qwen2, layers: 28
Init InferenceSession: 0%| | 0/28 [00:00<?, ?it/s][INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
Init InferenceSession: 4%|████ | 1/28 [00:00<00:20, 1.31it/s][INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
Init InferenceSession: 7%|████████▏ | 2/28 [00:01<00:14, 1.85it/s][INFO] Using provider: AXCLRTExecutionProvider
...
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
Init InferenceSession: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:11<00:00, 2.48it/s]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
Model loaded successfully!
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
[INFO]: 输入文本进行对话,或者输入图片路径进行图片理解, 或者输入q退出对话。
prompt<<who are you
slice_indices: [0]
Slice prefill done: 0
answer >> I am an artificial intelligence language model developed by Apple Inc.
prompt<<./images/ssd_horse.jpg
slice_indices: [0]
Slice prefill done: 0
answer >> The image depicts a young man riding a brown horse in an outdoor setting. The horse is standing on a dirt ground, and the rider is wearing a blue jacket and jeans. The horse has a white blaze on its face and white markings on its legs. The rider is holding the reins and appears to be looking down at a brown dog standing on the ground next to the horse. The dog is wearing a collar and is looking up at the rider. In the background, there is a silver pickup truck parked on the grass, and a fence can be seen further back. The scene appears to be set in a rural or farm-like environment.
prompt<<./images/image_1.jpg
slice_indices: [0]
Slice prefill done: 0
answer >> The image depicts a panda bear in a natural setting. The panda is sitting on the ground, surrounded by greenery, including bamboo and other plants. The panda has a distinctive black and white fur pattern, with black fur around its eyes, ears, and limbs, and white fur on its face, chest, and legs. The panda is sitting on its hind legs, with its front paws resting on its chest. The background shows a forested area with trees and rocks, suggesting that the panda is in its natural habitat. The panda appears to be looking directly at the camera, giving the impression that it is aware of the photographer's presence. The overall scene is peaceful and serene, capturing the beauty of the panda in its natural environment.
prompt<<q
[INFO]: 对话结束,再见。
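When the chat command needs to be launched from a script, it can be assembled programmatically. The sketch below only builds and prints the argument list. The flags `-v`, `-m`, `-t` and `-i` are the ones used in the command above; reading them as encoder path, model directory, tokenizer directory and image size is an inference from that command, and `build_command` is a hypothetical helper, not part of this repository.

```python
import shlex


def build_command(model_dir, tokenizer_dir, encoder_path, image_size):
    """Build the argument list for infer_axmodel.py (flags as used in this README)."""
    return [
        "python", "infer_axmodel.py",
        "-v", str(encoder_path),   # image encoder axmodel
        "-m", str(model_dir),      # directory with the language model axmodels
        "-t", str(tokenizer_dir),  # tokenizer directory
        "-i", str(image_size),     # image size matching the chosen encoder
    ]


MODEL_DIR = "./fastvlm_ax650_context_1k_prefill_640_int4"
cmd = build_command(
    model_dir=MODEL_DIR,
    tokenizer_dir="./fastvlm_tokenizer/",
    encoder_path=MODEL_DIR + "/image_encoder_512x512.axmodel",
    image_size=512,
)

if __name__ == "__main__":
    # Print the shell-equivalent command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` on the board would start the same interactive session as the shell command.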