wli1995 committed on
Commit 7eab1cc · verified · 1 Parent(s): 564fa66

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -65,3 +65,4 @@ fastvlm_ax650_context_1k_prefill_640/llava_qwen2_p128_l9_together.axmodel filter
  fastvlm_ax650_context_1k_prefill_640/llava_qwen2_post.axmodel filter=lfs diff=lfs merge=lfs -text
  images/image_1.jpg filter=lfs diff=lfs merge=lfs -text
  images/ssd_horse.jpg filter=lfs diff=lfs merge=lfs -text
+ fastvlm_ax650_context_1k_prefill_640/image_encoder_512x512.axmodel filter=lfs diff=lfs merge=lfs -text
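The added `.gitattributes` line uses git-LFS's standard filter triple. As a minimal sketch (the helper `lfs_rule` is hypothetical, not part of this repo), the same rule can be generated programmatically:

```python
# Sketch: emit a git-LFS tracking rule like the one added in this commit.
# The path comes from the diff above; the attribute string is the standard
# git-lfs "filter/diff/merge/-text" triple.
def lfs_rule(path: str) -> str:
    return f"{path} filter=lfs diff=lfs merge=lfs -text"

rule = lfs_rule("fastvlm_ax650_context_1k_prefill_640/image_encoder_512x512.axmodel")
print(rule)
```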
README.md CHANGED
@@ -1,15 +1,3 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - apple/FastVLM-1.5B
- pipeline_tag: image-to-text
- tags:
- - vlm
- - fastvlm
- - en
- ---
  # FastVLM-1.5B
  
  This version of FastVLM-1.5B has been converted to run on the Axera NPU using **w8a16** quantization.
@@ -37,7 +25,7 @@ How to Convert LLM from Huggingface to axmodel[TODO]
  
  |Chips|image encoder 1024|ttft(291tokens)|w8a16|
  |--|--|--|--|
- |AX650| 216.257 ms | 861.213 ms | 11.90 tokens/sec|
+ |AX650| 216.257 ms | 861.213 ms | 13.88 tokens/sec|
  
  
  ## How to use
@@ -86,7 +74,106 @@ Init InferenceSession: 4%|████
  [INFO] VNPU type: VNPUType.DISABLED
  [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
  Init InferenceSession: 7%|████████▏ | 2/28 [00:01<00:21, 1.20it/s][INFO] Using provider: AXCLRTExecutionProvider
- ...
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 11%|████████████▏ | 3/28 [00:02<00:19, 1.30it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 14%|████████████████▎ | 4/28 [00:03<00:17, 1.36it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 18%|████████████████████▎ | 5/28 [00:03<00:16, 1.40it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 21%|████████████████████████▍ | 6/28 [00:04<00:15, 1.42it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 25%|████████████████████████████▌ | 7/28 [00:05<00:14, 1.43it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 29%|████████████████████████████████▌ | 8/28 [00:05<00:13, 1.44it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 32%|████████████████████████████████████▋ | 9/28 [00:06<00:13, 1.44it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 36%|████████████████████████████████████████▎ | 10/28 [00:07<00:12, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 39%|████████████████████████████████████████████▍ | 11/28 [00:07<00:11, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 43%|████████████████████████████████████████████████▍ | 12/28 [00:08<00:11, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 46%|████████████████████████████████████████████████████▍ | 13/28 [00:09<00:10, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 50%|████████████████████████████████████████████████████████▌ | 14/28 [00:09<00:09, 1.46it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 54%|████████████████████████████████████████████████████████████▌ | 15/28 [00:10<00:08, 1.46it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 57%|████████████████████████████████████████████████████████████████▌ | 16/28 [00:11<00:08, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 61%|████████████████████████████████████████████████████████████████████▌ | 17/28 [00:12<00:07, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 64%|████████████████████████████████████████████████████████████████████████▋ | 18/28 [00:12<00:06, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 68%|████████████████████████████████████████████████████████████████████████████▋ | 19/28 [00:13<00:06, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 71%|████████████████████████████████████████████████████████████████████████████████▋ | 20/28 [00:14<00:05, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 75%|████████████████████████████████████████████████████████████████████████████████████▊ | 21/28 [00:14<00:04, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 79%|████████████████████████████████████████████████████████████████████████████████████████▊ | 22/28 [00:15<00:04, 1.46it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 82%|████████████████████████████████████████████████████████████████████████████████████████████▊ | 23/28 [00:16<00:03, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 86%|████████████████████████████████████████████████████████████████████████████████████████████████▊ | 24/28 [00:16<00:02, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 89%|████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 25/28 [00:17<00:02, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 93%|████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 26/28 [00:18<00:01, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
+ [INFO] SOC Name: AX650N
+ [INFO] VNPU type: VNPUType.DISABLED
+ [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
+ Init InferenceSession: 96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 27/28 [00:18<00:00, 1.45it/s][INFO] Using provider: AXCLRTExecutionProvider
  [INFO] SOC Name: AX650N
  [INFO] VNPU type: VNPUType.DISABLED
  [INFO] Compiler version: 5.1-patch1-dirty 140e8d4a-dirty
@@ -128,4 +215,4 @@ prompt<<q
  [INFO]: 对话结束,再见。
  ```
  ![ssd_horse.jpg](./images/ssd_horse.jpg)
- ![iamge_1.jpg](./images/image_1.jpg)
+ ![iamge_1.jpg](./images/image_1.jpg)
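The updated README table reports 216.257 ms image encode, 861.213 ms TTFT for a 291-token prefill, and decode throughput raised from 11.90 to 13.88 tokens/sec. Under the assumption that decode runs at that steady-state rate after the first token, a rough end-to-end latency for an `n`-token answer works out as:

```python
# Figures taken from the README table in this commit.
TTFT_MS = 861.213    # time to first token, 291-token prefill
DECODE_TPS = 13.88   # w8a16 decode throughput on AX650 after this commit

def total_latency_ms(n_tokens: int) -> float:
    # One prefill, then (n_tokens - 1) further tokens at the decode rate.
    # This is a back-of-the-envelope estimate, not a measured number.
    return TTFT_MS + (n_tokens - 1) / DECODE_TPS * 1000.0

print(round(total_latency_ms(100), 1))  # ~8 seconds for a 100-token answer
```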
fastvlm_ax650_context_1k_prefill_640/image_encoder_512x512.axmodel ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86a9657370520266913b4fcd7a9725216cdb2bae273bc082c13050ada4e9a69c
+ size 170758109
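The added `.axmodel` is stored as a git-LFS pointer; the three lines above are the entire checked-in blob. A minimal parser for this pointer format (spec v1 uses plain `key value` lines):

```python
def parse_lfs_pointer(text: str) -> dict:
    # git-LFS pointer files are "key value" lines; spec v1 carries
    # exactly version, oid, and size.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields

# The pointer contents of image_encoder_512x512.axmodel from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:86a9657370520266913b4fcd7a9725216cdb2bae273bc082c13050ada4e9a69c
size 170758109"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 170758109 (~171 MB of encoder weights)
```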
infer_axmodel.py CHANGED
@@ -24,10 +24,10 @@ def load_model_and_tokenizer(model_path):
  
      return config, tokenizer
  
- def vision_encoder(image_path, ax_session):
+ def vision_encoder(image_path, ax_session, args):
  
-     image_processor = CLIPImageProcessor(size={"shortest_edge": 1024}, # CLIP 支持 336x336
-                                          crop_size={"height": 1024, "width": 1024},
+     image_processor = CLIPImageProcessor(size={"shortest_edge": int(args.input_size)}, # CLIP 支持 336x336
+                                          crop_size={"height": int(args.input_size), "width": int(args.input_size)},
                                           image_mean=[0, 0, 0],
                                           image_std=[1/255, 1/255, 1/255]
                                           )
@@ -43,7 +43,7 @@ def vision_encoder(image_path, ax_session):
  
      return vit_output
  
- def llm_infer(image_features, llm_path, config, tokenizer, imer, get_input):
+ def llm_infer(image_features, llm_path, config, tokenizer, imer, get_input, token_length):
  
      embeds = np.load(os.path.join(llm_path, "model.embed_tokens.weight.npy"))
  
@@ -53,7 +53,7 @@ def llm_infer(image_features, llm_path, config, tokenizer, imer, get_input):
  
      if image_features is not None:
          # # for idx in range(len(image_features)):
-         prompt += "\n<img>" + "<image>"*256 + "</img>\n"
+         prompt += "\n<img>" + "<image>"*token_length + "</img>\n"
          prompt += "<|im_end|>\n<|im_start|>assistant\n"
  
      token_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX)
@@ -66,7 +66,7 @@ def llm_infer(image_features, llm_path, config, tokenizer, imer, get_input):
      if image_features is not None:
          image_start_index = np.where(np.array(token_ids) == -200)[0][0] # <image> tag 151646
          image_insert_index = image_start_index + 1
-         prefill_data[image_insert_index : image_insert_index + 256] = image_features[0, :, :]
+         prefill_data[image_insert_index : image_insert_index + token_length] = image_features[0, :, :]
  
      eos_token_id = None
      if isinstance(config.eos_token_id, list) and len(config.eos_token_id) > 1:
@@ -88,11 +88,19 @@ if __name__ == "__main__":
      args.add_argument("--vision_model", "-v", type=str, default="./fastvlm_ax650_context_1k_prefill_640/image_encoder_1024x1024.axmodel", help="Path to the vision axmodel.")
      args.add_argument("--model_path", "-m", type=str, default="./fastvlm_ax650_context_1k_prefill_640", help="Path to the llm axmodel.")
      args.add_argument("--tokenizer_path", "-t", type=str, default="./fastvlm_tokenizer", help="Path to the tokenizer.")
-     # args.add_argument("--images", type=str, default=None, help="Paths to the input images.")
+     args.add_argument("--input_size", "-i", type=str, default="1024", help="Input size of the vision encoder model.")
      # args.add_argument("--question", type=str, default="介绍一下你自己", help="The question to ask the model.")
  
      args = args.parse_args()
  
+     token_len_map = {"2048": 1280,
+                      "1024": 256,
+                      "768": 144,
+                      "512": 64,
+                      "256": 16}
+ 
+     token_length = token_len_map[args.input_size]
+ 
      print("Loading config, tokenizer and init model.")
      config, tokenizer = load_model_and_tokenizer(model_path=args.tokenizer_path)
  
@@ -115,9 +123,9 @@ if __name__ == "__main__":
          if not os.path.isfile(get_input):
              print("[INFO]: 输入错误,请检查图片输入路径。")
              continue
-         image_features = vision_encoder(get_input, ax_session)
+         image_features = vision_encoder(get_input, ax_session, args)
          get_input = "Describe the image in detail."
-         llm_infer(image_features, args.model_path, config, tokenizer, imer, get_input)
+         llm_infer(image_features, args.model_path, config, tokenizer, imer, get_input, token_length)
      else:
          image_features = None
-         llm_infer(image_features, args.model_path, config, tokenizer, imer, get_input)
+         llm_infer(image_features, args.model_path, config, tokenizer, imer, get_input, token_length)
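The new `token_len_map` hard-codes how many image tokens each encoder resolution produces. For the 256–1024 entries this matches a 64-pixel patch grid, i.e. `(size // 64) ** 2`; the 2048 entry (1280 tokens) does not fit that formula, so the map, not the formula, should be treated as authoritative. A small check of that observation:

```python
# Values copied verbatim from the token_len_map added in this commit.
token_len_map = {"2048": 1280, "1024": 256, "768": 144, "512": 64, "256": 16}

# For 256..1024 the count equals a 64-px patch grid: (size // 64) ** 2.
for size in ("256", "512", "768", "1024"):
    assert token_len_map[size] == (int(size) // 64) ** 2

# 2048 is the exception: (2048 // 64) ** 2 would be 1024, not 1280,
# so don't derive the 2048 count from the grid formula.
assert token_len_map["2048"] != (2048 // 64) ** 2
print(token_len_map["512"])  # 64 tokens for the new 512x512 encoder
```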
utils/__pycache__/infer_func.cpython-313.pyc CHANGED
Binary files a/utils/__pycache__/infer_func.cpython-313.pyc and b/utils/__pycache__/infer_func.cpython-313.pyc differ
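The core of the `llm_infer` change above is splicing `token_length` rows of image features into the prefill embedding, right after the `<image>` placeholder (token id -200). A minimal numpy sketch of that splice, with toy shapes (hidden size 4 and 4 image tokens chosen purely for illustration):

```python
import numpy as np

IMAGE_TOKEN_INDEX = -200   # placeholder id used by tokenizer_image_token
token_length = 4           # toy value; e.g. 64 for the new 512x512 encoder
hidden = 4                 # toy hidden size for illustration

# Toy sequence: a text token, the <image> marker, token_length reserved
# slots, then another text token.
token_ids = [1, IMAGE_TOKEN_INDEX, 0, 0, 0, 0, 2]
prefill_data = np.zeros((len(token_ids), hidden), dtype=np.float32)
image_features = np.ones((1, token_length, hidden), dtype=np.float32)

# Mirror of the diff: overwrite the rows right after the marker position.
image_start_index = np.where(np.array(token_ids) == IMAGE_TOKEN_INDEX)[0][0]
image_insert_index = image_start_index + 1
prefill_data[image_insert_index : image_insert_index + token_length] = image_features[0, :, :]

print(prefill_data.sum())  # 16.0: token_length * hidden entries were filled
```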