Cheng Chang committed
Commit e1d539f · 1 Parent(s): 29baff7

readme edit

Browse files:
- LICENSE +49 -0
- README.md +122 -36
- sample_code.py → sanity.py +0 -0
LICENSE ADDED

@@ -0,0 +1,49 @@
+OpenMDW License Agreement, version 1.0 (OpenMDW-1.0)
+
+By exercising rights granted to you under this agreement, you accept and agree
+to its terms.
+
+As used in this agreement, "Model Materials" means the materials provided to
+you under this agreement, consisting of: (1) one or more machine learning
+models (including architecture and parameters); and (2) all related artifacts
+(including associated data, documentation and software) that are provided to
+you hereunder.
+
+Subject to your compliance with this agreement, permission is hereby granted,
+free of charge, to deal in the Model Materials without restriction, including
+under all copyright, patent, database, and trade secret rights included or
+embodied therein.
+
+If you distribute any portion of the Model Materials, you shall retain in your
+distribution (1) a copy of this agreement, and (2) all copyright notices and
+other notices of origin included in the Model Materials that are applicable to
+your distribution.
+
+If you file, maintain, or voluntarily participate in a lawsuit against any
+person or entity asserting that the Model Materials directly or indirectly
+infringe any patent, then all rights and grants made to you hereunder are
+terminated, unless that lawsuit was in response to a corresponding lawsuit
+first brought against you.
+
+This agreement does not impose any restrictions or obligations with respect to
+any use, modification, or sharing of any outputs generated by using the Model
+Materials.
+
+THE MODEL MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE, TITLE, NONINFRINGEMENT, ACCURACY, OR THE
+ABSENCE OF LATENT OR OTHER DEFECTS OR ERRORS, WHETHER OR NOT DISCOVERABLE, ALL
+TO THE GREATEST EXTENT PERMISSIBLE UNDER APPLICABLE LAW.
+
+YOU ARE SOLELY RESPONSIBLE FOR (1) CLEARING RIGHTS OF OTHER PERSONS THAT MAY
+APPLY TO THE MODEL MATERIALS OR ANY USE THEREOF, INCLUDING WITHOUT LIMITATION
+ANY PERSON'S COPYRIGHTS OR OTHER RIGHTS INCLUDED OR EMBODIED IN THE MODEL
+MATERIALS; (2) OBTAINING ANY NECESSARY CONSENTS, PERMISSIONS OR OTHER RIGHTS
+REQUIRED FOR ANY USE OF THE MODEL MATERIALS; OR (3) PERFORMING ANY DUE
+DILIGENCE OR UNDERTAKING ANY OTHER INVESTIGATIONS INTO THE MODEL MATERIALS OR
+ANYTHING INCORPORATED OR EMBODIED THEREIN.
+
+IN NO EVENT SHALL THE PROVIDERS OF THE MODEL MATERIALS BE LIABLE FOR ANY CLAIM,
+DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL MATERIALS, THE
+USE THEREOF OR OTHER DEALINGS THEREIN.
README.md CHANGED
@@ -43,7 +43,7 @@ library_name: transformers
    text-decoration:none;
    font-weight:600;
    font-size:16px;">
-   🌐 Website
+   🌐 Website (Coming Soon!)
 </a>
 <a href="https://arxiv.org/abs/2510.09872" style="
    display:inline-block;
@@ -65,7 +65,7 @@ library_name: transformers
    text-decoration:none;
    font-weight:600;
    font-size:16px;">
-   💻 Code
+   💻 Code (Coming Soon!)
 </a>
 </div>
 
@@ -96,7 +96,7 @@ ActIO-UI is developed by [Orby AI](https://www.orby.ai/), a [Uniphore](https://w
 
 
 
-#
+# Model Family
 
 - [ActIO-UI-7B-SFT](https://huggingface.co/Uniphore/actio-ui-7b-sft): a 7B model trained with supervised finetuning (SFT) using distilled subtask data.
 - [ActIO-UI-7B-RLVR](?????(model_link)): a 7B model trained with Reinforcement Learning with Verifiable Rewards (RLVR) over the ActIO-UI-7B-SFT checkpoint.
@@ -139,8 +139,6 @@ ActIO-UI models are specifically trained to solve GUI subtask problems. Both the
 </div>
 
 
-
-
 ## Other Benchmarks
 
 To assess the generalizability of GUI subtask execution as a model capability, we compare the performance of ActIO-UI on GUI subtasks (WARC-Bench), long-horizon tasks (WebArena), short-horizon tasks (Miniwob++), and GUI visual grounding (ScreenSpot V2). Without access to any long-horizon or grounding data in its training dataset, our models show improved performance over their base models (except for the grounding performance when compared to Qwen 2.5 VL 72B).
@@ -165,16 +163,17 @@ To assess the generalizability of GUI subtask execution as a model capability, we co
 </div>
 
 
-## Usage
+# Usage
+
+## Image Input Size
 
 To maintain optimal model performance, each input image should be set at **1280 (pixel width) \\(\times\\) 720 (pixel height)**.
 
 
+## Setup
 
-To run
+To run the code snippets below, we recommend that you install everything in `requirements.txt` in a Python environment.
 ```bash
 python -m venv ./venv
 source venv/bin/activate
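The 1280 \\(\times\\) 720 requirement above usually means resizing raw screen captures before encoding them. The following is a minimal preprocessing sketch, assuming Pillow (which the sanity snippet further down already imports); `prepare_screenshot` is an illustrative helper, not part of the repository, and plain stretching is assumed since the README does not say whether to pad or crop at other aspect ratios.

```python
# Hypothetical helper (not from the repo): bring an arbitrary screenshot to the
# 1280x720 input size recommended above before passing it to the model.
from PIL import Image


def prepare_screenshot(src_path: str, dst_path: str = "screenshot.png") -> str:
    """Resize a screenshot to 1280x720 and save it for model input."""
    img = Image.open(src_path).convert("RGB")
    # Plain stretch; padding or cropping may suit other aspect ratios better.
    img = img.resize((1280, 720), Image.LANCZOS)
    img.save(dst_path)
    return dst_path
```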
@@ -182,30 +181,118 @@ pip install -r requirements.txt
 ```
 
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+## Sanity test
+
+Note that this is only a sanity test for ensuring the model is working properly.
+For replicating the evaluation results or using the model in your own project, please refer to our code repository on [GitHub](?????(repository)).
+
+The following code snippet is also available in the attached `sanity.py`.
+
+```python
+import base64
+import torch
+from transformers import AutoTokenizer, AutoModel, AutoImageProcessor
+from PIL import Image
+
+
+def encode_image(image_path: str) -> str:
+    """Encode image to base64 string for model input."""
+    with open(image_path, "rb") as f:
+        return base64.b64encode(f.read()).decode()
+
+
+def load_model(
+    model_path: str,
+) -> tuple[AutoModel, AutoTokenizer, AutoImageProcessor]:
+    """Load the ActIO-UI model, tokenizer, and image processor."""
+    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+    model = AutoModel.from_pretrained(
+        model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
+    )
+    image_processor = AutoImageProcessor.from_pretrained(
+        model_path, trust_remote_code=True
+    )
+
+    return model, tokenizer, image_processor
+
+
+def create_grounding_messages(image_path: str, instruction: str) -> list[dict]:
+    """Create chat messages for a GUI grounding task."""
+    system_prompt = (
+        "You are a GUI agent. You are given a task and a screenshot of the screen. "
+        "You need to perform a series of pyautogui actions to complete the task."
+    )
+
+    messages = [
+        {"role": "system", "content": system_prompt},
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "image",
+                    "image": f"data:image/png;base64,{encode_image(image_path)}",
+                },
+                {"type": "text", "text": instruction},
+            ],
+        },
+    ]
+    return messages
+
+
+def run_inference(
+    model: AutoModel,
+    tokenizer: AutoTokenizer,
+    image_processor: AutoImageProcessor,
+    messages: list[dict],
+    image_path: str,
+) -> str:
+    """Run inference on the model."""
+    # Prepare text input
+    input_ids = tokenizer.apply_chat_template(
+        messages, tokenize=True, add_generation_prompt=True
+    )
+    input_ids = torch.tensor([input_ids]).to(model.device)
+
+    # Prepare image input
+    image = Image.open(image_path).convert("RGB")
+    image_info = image_processor.preprocess(images=[image])
+    pixel_values = torch.tensor(image_info["pixel_values"]).to(
+        dtype=torch.bfloat16, device=model.device
+    )
+    grid_thws = torch.tensor(image_info["image_grid_thw"])
+
+    # Generate response
+    with torch.no_grad():
+        generated_ids = model.generate(
+            input_ids,
+            pixel_values=pixel_values,
+            grid_thws=grid_thws,
+            max_new_tokens=2048,
+            temperature=0,
+        )
+
+    # Decode only the newly generated tokens
+    prompt_len = input_ids.shape[1]
+    generated_ids = generated_ids[:, prompt_len:]
+    output_text = tokenizer.batch_decode(
+        generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
+    )[0]
+
+    return output_text
+
+
+# Example usage
+model_path = "Uniphore/actio-ui-7b-sft"  # or other model variants
+image_path = "screenshot.png"
+instruction = "Click on the submit button"
+
+# Load model
+model, tokenizer, image_processor = load_model(model_path)
+
+# Create messages and run inference
+messages = create_grounding_messages(image_path, instruction)
+result = run_inference(model, tokenizer, image_processor, messages, image_path)
+
+print("Model output:", result)
 ```
 
 
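The sanity check above only prints the raw text the model generates. Because the system prompt asks for pyautogui actions, a caller will typically parse those actions out of `result` before executing them. Here is a purely illustrative sketch, assuming the output contains literal calls such as `pyautogui.click(x=..., y=...)`; the exact output format is not documented in this README, so treat the pattern as an assumption.

```python
# Hypothetical post-processing (not from the repo): extract coordinate clicks
# from the model's text output. The regex assumes literal pyautogui.click calls,
# which the system prompt suggests but does not guarantee.
import re

CLICK_RE = re.compile(r"pyautogui\.click\(\s*x\s*=\s*(\d+)\s*,\s*y\s*=\s*(\d+)\s*\)")


def extract_clicks(output_text: str) -> list[tuple[int, int]]:
    """Return (x, y) pairs for every pyautogui.click(...) found in the text."""
    return [(int(x), int(y)) for x, y in CLICK_RE.findall(output_text)]


# Example: clicks = extract_clicks(result)  # e.g. [(612, 334)]
```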
@@ -215,14 +302,13 @@ python demo.py
 ## License
 This project is licensed under the Open Model, Data, & Weights License Agreement (OpenMDW). See the LICENSE file in the root folder for details.
 
-## Research Use and Disclaimer
-ActIO-UI are intended for research and educational purposes only.
-
 ## Prohibited Uses
 The model may not be used for any purpose or activity that violates applicable laws or regulations in any jurisdiction.
 Use for illegal, unethical, or harmful activities is strictly prohibited.
 
 ## Disclaimer
+ActIO-UI models are intended for research and educational purposes only.
+
 The authors, contributors, and copyright holders are not responsible for any illegal, unethical, or harmful use of the Software, nor for any direct or indirect damages resulting from such use.
 Use of the name, logo, or trademarks of "ActIO", "ActIO-UI", "WARC-Bench", or "Uniphore" does not imply any endorsement or affiliation unless separate written permission is obtained.
 Users are solely responsible for ensuring their use complies with applicable laws and regulations.
sample_code.py → sanity.py RENAMED

File without changes