Add README
epoch-01-step-000551/README.md (ADDED)

---
license: mit
library_name: transformers
tags:
- openvla
- vision-language-action
- robotics
- libero
---

# openvla-libero-spatial-checkpoints

An OpenVLA checkpoint fine-tuned on the LIBERO-Spatial dataset.

## Model Information

- **Checkpoint**: epoch-01-step-000551 (see the loading note below)
- **Base Model**: OpenVLA (Prismatic + DinoSigLIP-224px)
- **Training Dataset**: LIBERO-Spatial (no noops)
- **Framework**: Transformers
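
The repository name and the path of this card (`epoch-01-step-000551/README.md`) suggest that one checkpoint is stored per training step. If the weights for this checkpoint live in that subfolder rather than at the repo root, they can be selected with the `subfolder` argument of `from_pretrained`. The snippet below is a minimal sketch under that assumption; the folder layout itself is not documented in this card.

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

# Assumption: the weights for this checkpoint sit under "epoch-01-step-000551/"
# inside the repo (inferred from the path of this README).
CHECKPOINT_SUBFOLDER = "epoch-01-step-000551"

model = AutoModelForVision2Seq.from_pretrained(
    "yihannwang/openvla-libero-spatial-checkpoints",
    subfolder=CHECKPOINT_SUBFOLDER,   # select this specific checkpoint
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

processor = AutoProcessor.from_pretrained(
    "yihannwang/openvla-libero-spatial-checkpoints",
    subfolder=CHECKPOINT_SUBFOLDER,
    trust_remote_code=True,
)
```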

## Usage

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

# Load the model in bfloat16 on the GPU
model = AutoModelForVision2Seq.from_pretrained(
    "yihannwang/openvla-libero-spatial-checkpoints",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda")

# Load the processor (image transform + tokenizer)
processor = AutoProcessor.from_pretrained(
    "yihannwang/openvla-libero-spatial-checkpoints",
    trust_remote_code=True
)

# Predict an action for a single observation
image = Image.open("observation.jpg")
prompt = "In: What action should the robot take to pick up the object?\nOut:"
inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)

action = model.predict_action(**inputs, unnorm_key="libero_spatial_no_noops", do_sample=False)
print(action)  # 7-DoF action vector
```
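
`predict_action` returns the unnormalized continuous action as a NumPy array. For LIBERO, OpenVLA predicts a 7-DoF end-effector action; the decomposition below follows the usual [Δposition, Δorientation, gripper] layout, which is a convention assumed here rather than something stated in this card.

```python
import numpy as np

# Continuing from the snippet above: `action` is the 7-element array
# returned by model.predict_action(...).
# Assumed layout (OpenVLA/LIBERO convention, not documented in this card):
#   action[0:3] -> delta end-effector position (x, y, z)
#   action[3:6] -> delta end-effector orientation (roll, pitch, yaw)
#   action[6]   -> gripper command
action = np.asarray(action, dtype=np.float32)

delta_pos, delta_rot, gripper = action[0:3], action[3:6], action[6]
print(f"dpos={delta_pos}, drot={delta_rot}, gripper={gripper:.3f}")
```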

## Evaluation

To evaluate on the LIBERO-Spatial task suite:

```bash
python experiments/robot/libero/run_libero_eval.py \
  --model_family openvla \
  --pretrained_checkpoint yihannwang/openvla-libero-spatial-checkpoints \
  --task_suite_name libero_spatial_no_noops \
  --center_crop False \
  --num_trials_per_task 50
```
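
The evaluation script can also be pointed at a local copy of a single checkpoint. Below is a minimal sketch using `huggingface_hub.snapshot_download`; the `epoch-01-step-000551/*` pattern assumes the per-step folder layout suggested by this card's path.

```python
from huggingface_hub import snapshot_download

# Download only this checkpoint's files (folder layout is an assumption
# based on the path of this card: epoch-01-step-000551/README.md).
local_repo = snapshot_download(
    repo_id="yihannwang/openvla-libero-spatial-checkpoints",
    allow_patterns=["epoch-01-step-000551/*"],
)

# Then pass the local path to the eval script, e.g.:
#   --pretrained_checkpoint <local_repo>/epoch-01-step-000551
print(local_repo)
```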

## Citation

```bibtex
@article{kim2024openvla,
  title={OpenVLA: An Open-Source Vision-Language-Action Model},
  author={Kim, Moo Jin and Pertsch, Karl and Karamcheti, Siddharth and Xiao, Ted and Balakrishna, Ashwin and Nair, Suraj and Rafailov, Rafael and Foster, Ethan and Lam, Grace and Sanketi, Pannag and Vuong, Quan and Kollar, Thomas and Burchfiel, Benjamin and Tedrake, Russ and Sadigh, Dorsa and Levine, Sergey and Liang, Percy and Finn, Chelsea},
  journal={arXiv preprint arXiv:2406.09246},
  year={2024}
}
```

## License

MIT License