yihannwang committed on
Commit edd6a50 · verified · 1 Parent(s): 573bf63

Add README

Files changed (1):
  1. epoch-01-step-000551/README.md +78 -0
epoch-01-step-000551/README.md ADDED
---
license: mit
library_name: transformers
tags:
- openvla
- vision-language-action
- robotics
- libero
---

# openvla-libero-spatial-checkpoints

A checkpoint of OpenVLA fine-tuned on the LIBERO-Spatial dataset.

## Model Information

- **Checkpoint**: epoch-01-step-000551
- **Base Model**: OpenVLA (Prismatic + DinoSigLIP-224px)
- **Training Dataset**: LIBERO-Spatial (no noops)
- **Framework**: Transformers

## Usage

```python
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

# Load the model
model = AutoModelForVision2Seq.from_pretrained(
    "yihannwang/openvla-libero-spatial-checkpoints",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda")

# Load the processor
processor = AutoProcessor.from_pretrained(
    "yihannwang/openvla-libero-spatial-checkpoints",
    trust_remote_code=True
)

# Predict an action from a single observation image
image = Image.open("observation.jpg")
prompt = "In: What action should the robot take to pick up the object?\nOut:"
inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)

action = model.predict_action(**inputs, unnorm_key="libero_spatial_no_noops", do_sample=False)
print(action)  # 7-DoF action vector
```
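
In a closed-loop rollout you would query the model once per control step. Below is a minimal sketch of that loop, reusing the `model` and `processor` objects created above; the `frames/` directory and the task prompt are placeholder assumptions, not files shipped with this repository:

```python
from pathlib import Path

from PIL import Image
import torch

prompt = "In: What action should the robot take to pick up the object?\nOut:"

# Placeholder directory of saved observation frames (one image per control step).
for frame_path in sorted(Path("frames").glob("*.jpg")):
    image = Image.open(frame_path).convert("RGB")
    inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)
    with torch.no_grad():
        action = model.predict_action(
            **inputs, unnorm_key="libero_spatial_no_noops", do_sample=False
        )
    print(frame_path.name, action)  # one 7-DoF action per frame
```

In a live simulation the predicted action would instead be passed to the environment's step function, and the next observation captured before the following query.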

## Evaluation

To evaluate on the LIBERO-Spatial task suite with the OpenVLA evaluation script:

```bash
python experiments/robot/libero/run_libero_eval.py \
  --model_family openvla \
  --pretrained_checkpoint yihannwang/openvla-libero-spatial-checkpoints \
  --task_suite_name libero_spatial_no_noops \
  --center_crop False \
  --num_trials_per_task 50
```
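
If you prefer to evaluate from a local copy of the weights, one option (a sketch, assuming the script also accepts a local path for `--pretrained_checkpoint`) is to download the repository first with `huggingface_hub`:

```python
# Download this checkpoint repository into the local Hugging Face cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="yihannwang/openvla-libero-spatial-checkpoints")
print(local_dir)  # pass this path to --pretrained_checkpoint if using a local copy
```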

## Citation

```bibtex
@article{kim2024openvla,
  title={OpenVLA: An Open-Source Vision-Language-Action Model},
  author={Kim, Moo Jin and Pertsch, Karl and Karamcheti, Siddharth and Xiao, Ted and Balakrishna, Ashwin and Nair, Suraj and Rafailov, Rafael and Foster, Ethan and Lam, Grace and Sanketi, Pannag and Vuong, Quan and Kollar, Thomas and Burchfiel, Benjamin and Tedrake, Russ and Sadigh, Dorsa and Levine, Sergey and Liang, Percy and Finn, Chelsea},
  journal={arXiv preprint arXiv:2406.09246},
  year={2024}
}
```

## License

MIT License