Cheng Chang committed on
Commit e1d539f · 1 Parent(s): 29baff7

readme edit

Files changed (3)
  1. LICENSE +49 -0
  2. README.md +122 -36
  3. sample_code.py → sanity.py +0 -0
LICENSE ADDED
@@ -0,0 +1,49 @@
+ OpenMDW License Agreement, version 1.0 (OpenMDW-1.0)
+
+ By exercising rights granted to you under this agreement, you accept and agree
+ to its terms.
+
+ As used in this agreement, "Model Materials" means the materials provided to
+ you under this agreement, consisting of: (1) one or more machine learning
+ models (including architecture and parameters); and (2) all related artifacts
+ (including associated data, documentation and software) that are provided to
+ you hereunder.
+
+ Subject to your compliance with this agreement, permission is hereby granted,
+ free of charge, to deal in the Model Materials without restriction, including
+ under all copyright, patent, database, and trade secret rights included or
+ embodied therein.
+
+ If you distribute any portion of the Model Materials, you shall retain in your
+ distribution (1) a copy of this agreement, and (2) all copyright notices and
+ other notices of origin included in the Model Materials that are applicable to
+ your distribution.
+
+ If you file, maintain, or voluntarily participate in a lawsuit against any
+ person or entity asserting that the Model Materials directly or indirectly
+ infringe any patent, then all rights and grants made to you hereunder are
+ terminated, unless that lawsuit was in response to a corresponding lawsuit
+ first brought against you.
+
+ This agreement does not impose any restrictions or obligations with respect to
+ any use, modification, or sharing of any outputs generated by using the Model
+ Materials.
+
+ THE MODEL MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE, TITLE, NONINFRINGEMENT, ACCURACY, OR THE
+ ABSENCE OF LATENT OR OTHER DEFECTS OR ERRORS, WHETHER OR NOT DISCOVERABLE, ALL
+ TO THE GREATEST EXTENT PERMISSIBLE UNDER APPLICABLE LAW.
+
+ YOU ARE SOLELY RESPONSIBLE FOR (1) CLEARING RIGHTS OF OTHER PERSONS THAT MAY
+ APPLY TO THE MODEL MATERIALS OR ANY USE THEREOF, INCLUDING WITHOUT LIMITATION
+ ANY PERSON'S COPYRIGHTS OR OTHER RIGHTS INCLUDED OR EMBODIED IN THE MODEL
+ MATERIALS; (2) OBTAINING ANY NECESSARY CONSENTS, PERMISSIONS OR OTHER RIGHTS
+ REQUIRED FOR ANY USE OF THE MODEL MATERIALS; OR (3) PERFORMING ANY DUE
+ DILIGENCE OR UNDERTAKING ANY OTHER INVESTIGATIONS INTO THE MODEL MATERIALS OR
+ ANYTHING INCORPORATED OR EMBODIED THEREIN.
+
+ IN NO EVENT SHALL THE PROVIDERS OF THE MODEL MATERIALS BE LIABLE FOR ANY CLAIM,
+ DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL MATERIALS, THE
+ USE THEREOF OR OTHER DEALINGS THEREIN.
README.md CHANGED
@@ -43,7 +43,7 @@ library_name: transformers
  text-decoration:none;
  font-weight:600;
  font-size:16px;">
- 🌐 Website
+ 🌐 Website (Coming Soon!)
  </a>
  <a href="https://arxiv.org/abs/2510.09872" style="
  display:inline-block;
@@ -65,7 +65,7 @@ library_name: transformers
  text-decoration:none;
  font-weight:600;
  font-size:16px;">
- 💻 Code
+ 💻 Code (Coming Soon!)
  </a>
  </div>
 
@@ -96,7 +96,7 @@ ActIO-UI is developed by [Orby AI](https://www.orby.ai/), a [Uniphore](https://w
 
 
 
- # Models
+ # Model Family
 
  - [ActIO-UI-7B-SFT](https://huggingface.co/Uniphore/actio-ui-7b-sft): a 7B model trained with supervised finetuning (SFT) using distilled subtask data.
  - [ActIO-UI-7B-RLVR](?????(model_link)): a 7B model trained with Reinforcement Learning with Verifiable Rewards (RLVR) over the ActIO-UI-7B-SFT checkpoint.
@@ -139,8 +139,6 @@ ActIO-UI models are specifically trained to solve GUI subtask problems. Both the
  </div>
 
 
-
-
  ## Other Benchmarks
 
  To assess the generalizability of GUI subtask execution as a model capability, we compare the performance of ActIO-UI on GUI subtasks (WARC-Bench), long-horizon tasks (WebArena), short-horizon tasks (Miniwob++), and GUI visual grounding (ScreenSpot V2). Without access to any long-horizon or grounding data in their training dataset, our models show improved performance over their base models (except for grounding performance when compared to Qwen 2.5 VL 72B).
@@ -165,16 +163,17 @@ To assess the generalizability of GUI subtask execution as a model capability, we co
  </div>
 
 
- ## Usage
+ # Usage
 
- ### Image Input Size
+ ## Image Input Size
 
  To maintain optimal model performance, each input image should be set at **1280 (pixel width) \\(\times\\) 720 (pixel height)**.
 
 
- ### Setup
+ ## Setup
 
- To run all the code snippets below, we recommend that you install everything in `requirements.txt` in a python environment.
+ To run the code snippets below, we recommend that you install everything in `requirements.txt` in a Python environment.
  ```bash
  python -m venv ./venv
  source venv/bin/activate
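
For reference, a minimal sketch of preparing a screenshot at the recommended 1280 × 720 input resolution with Pillow before inference; the file names `screenshot.png` and `screenshot_1280x720.png` are illustrative assumptions, not files from the repository:

```python
from PIL import Image

# Resize a captured screenshot to the recommended 1280x720 model input size.
# "screenshot.png" is a hypothetical local file used only for illustration.
img = Image.open("screenshot.png").convert("RGB")
img = img.resize((1280, 720), Image.LANCZOS)
img.save("screenshot_1280x720.png")
```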
@@ -182,30 +181,118 @@ pip install -r requirements.txt
  ```
 
 
- ### Quick start
-
- You can use [vLLM](https://docs.vllm.ai/en/latest/index.html) to serve the model.
- ```bash
- vllm serve Uniphore/actio-ui-7b-sft
- ```
-
- Then you can use the `demo.py` we provide to check out a sample response of the model with the training prompt.
- ```
- python demo.py
- ```
-
- ### Sample Code
-
- (Peng 10/06)
- - setup code
- - quickly run example (5-10 code line).
- - important results / hierachical results.
-
- ```
- ?????(sample code)
+ ## Sanity test
+
+ Note that this is only a sanity test to ensure the model is working properly.
+ To replicate the evaluation results or use the model in your own project, please refer to our code repository on [GitHub](?????(repository)).
+
+ The following code snippet is also available in the attached `sanity.py`.
+
+ ```python
+ import base64
+ import torch
+ from transformers import AutoTokenizer, AutoModel, AutoImageProcessor
+ from PIL import Image
+
+
+ def encode_image(image_path: str) -> str:
+     """Encode image to base64 string for model input."""
+     with open(image_path, "rb") as f:
+         return base64.b64encode(f.read()).decode()
+
+
+ def load_model(
+     model_path: str,
+ ) -> tuple[AutoModel, AutoTokenizer, AutoImageProcessor]:
+     """Load OpenCUA model, tokenizer, and image processor."""
+     tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+     model = AutoModel.from_pretrained(
+         model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
+     )
+     image_processor = AutoImageProcessor.from_pretrained(
+         model_path, trust_remote_code=True
+     )
+
+     return model, tokenizer, image_processor
+
+
+ def create_grounding_messages(image_path: str, instruction: str) -> list[dict]:
+     """Create chat messages for GUI grounding task."""
+     system_prompt = (
+         "You are a GUI agent. You are given a task and a screenshot of the screen. "
+         "You need to perform a series of pyautogui actions to complete the task."
+     )
+
+     messages = [
+         {"role": "system", "content": system_prompt},
+         {
+             "role": "user",
+             "content": [
+                 {
+                     "type": "image",
+                     "image": f"data:image/png;base64,{encode_image(image_path)}",
+                 },
+                 {"type": "text", "text": instruction},
+             ],
+         },
+     ]
+     return messages
+
+
+ def run_inference(
+     model: AutoModel,
+     tokenizer: AutoTokenizer,
+     image_processor: AutoImageProcessor,
+     messages: list[dict],
+     image_path: str,
+ ) -> str:
+     """Run inference on the model."""
+     # Prepare text input
+     input_ids = tokenizer.apply_chat_template(
+         messages, tokenize=True, add_generation_prompt=True
+     )
+     input_ids = torch.tensor([input_ids]).to(model.device)
+
+     # Prepare image input
+     image = Image.open(image_path).convert("RGB")
+     image_info = image_processor.preprocess(images=[image])
+     pixel_values = torch.tensor(image_info["pixel_values"]).to(
+         dtype=torch.bfloat16, device=model.device
+     )
+     grid_thws = torch.tensor(image_info["image_grid_thw"])
+
+     # Generate response
+     with torch.no_grad():
+         generated_ids = model.generate(
+             input_ids,
+             pixel_values=pixel_values,
+             grid_thws=grid_thws,
+             max_new_tokens=2048,
+             temperature=0,
+         )
+
+     # Decode output
+     prompt_len = input_ids.shape[1]
+     generated_ids = generated_ids[:, prompt_len:]
+     output_text = tokenizer.batch_decode(
+         generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
+     )[0]
+
+     return output_text
+
+
+ # Example usage
+ model_path = "Uniphore/actio-ui-7b-sft"  # or other model variants
+ image_path = "screenshot.png"
+ instruction = "Click on the submit button"
+
+ # Load model
+ model, tokenizer, image_processor = load_model(model_path)
+
+ # Create messages and run inference
+ messages = create_grounding_messages(image_path, instruction)
+ result = run_inference(model, tokenizer, image_processor, messages, image_path)
+
+ print("Model output:", result)
  ```
 
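
The quick start removed above served the model with vLLM via `vllm serve Uniphore/actio-ui-7b-sft` and then ran `demo.py`. As a rough sketch only, assuming vLLM's default OpenAI-compatible endpoint on port 8000 and a hypothetical local `screenshot.png`, a served model could be queried like this (this is not the project's documented demo script):

```python
import base64
from openai import OpenAI

# Talk to a local vLLM server started with: vllm serve Uniphore/actio-ui-7b-sft
# (http://localhost:8000/v1 is vLLM's default OpenAI-compatible endpoint; the
# API key is unused by default and can be any placeholder string.)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode a local screenshot as a base64 data URL ("screenshot.png" is assumed).
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Uniphore/actio-ui-7b-sft",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a GUI agent. You are given a task and a screenshot of the screen. "
                "You need to perform a series of pyautogui actions to complete the task."
            ),
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": "Click on the submit button"},
            ],
        },
    ],
    temperature=0,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```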
 
@@ -215,14 +302,13 @@ python demo.py
  ## License
  This project is licensed under the Open Model, Data, & Weights License Agreement (OpenMDW). See the LICENSE file in the root folder for details.
 
- ## Research Use and Disclaimer
- ActIO-UI are intended for research and educational purposes only.
-
  ## Prohibited Uses
  The model may not be used for any purpose or activity that violates applicable laws or regulations in any jurisdiction.
  Use for illegal, unethical, or harmful activities is strictly prohibited.
 
  ## Disclaimer
+ ActIO-UI models are intended for research and educational purposes only.
+
  The authors, contributors, and copyright holders are not responsible for any illegal, unethical, or harmful use of the Software, nor for any direct or indirect damages resulting from such use.
  Use of the name, logo, or trademarks of "ActIO", "ActIO-UI", "WARC-Bench", or "Uniphore" does not imply any endorsement or affiliation unless separate written permission is obtained.
  Users are solely responsible for ensuring their use complies with applicable laws and regulations.
 
sample_code.py → sanity.py RENAMED
File without changes