jadechoghari (HF Staff) committed
Commit 80f7483 · verified · 1 parent: d4864ea

Update README.md

Files changed (1): README.md +135 -59
README.md CHANGED
@@ -1,90 +1,166 @@
  ---
  library_name: lerobot
  pipeline_tag: robotics
  license: gemma
- language:
- - en
  ---
- # π0 fast

- π₀-FAST is a Vision-Language-Action model for general robot control that uses autoregressive next-token prediction to model continuous robot actions.

- It was proposed in [FAST: Efficient Action Tokenization for Vision-Language-Action Models](https://huggingface.co/papers/2501.09747).

- ## How to Get Started

  ```bash
  pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"
  ```

  ```python
  import torch
  from lerobot.policies.factory import make_pre_post_processors
- import numpy as np
  from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy

- model_id = "lerobot/pi0fast-base"
- model = PI0FastPolicy.from_pretrained(model_id)

- # Select your device here
- device = torch.device("cuda")
  preprocess, postprocess = make_pre_post_processors(
-     model.config,
      model_id,
      preprocessor_overrides={"device_processor": {"device": str(device)}},
  )

- IMAGE_HEIGHT = 224
- IMAGE_WIDTH = 224
- batch_size = 1
- prompt = "Pick up the red block and place it in the bin"
-
- # Create random RGB images in [0, 255] uint8 range (as PIL images would be),
- # then convert to [0, 1] float32 CHW tensors for LeRobot.
- def fake_rgb(h, w):
-     arr = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)
-     return torch.from_numpy(arr).permute(2, 0, 1).float() / 255.0  # CHW, float32 in [0, 1]
-
- DUMMY_STATE_DIM = 7
- batch = {
-     "observation.images.base_0_rgb": torch.stack(
-         [fake_rgb(IMAGE_HEIGHT, IMAGE_WIDTH) for _ in range(batch_size)]
-     ).to(device),
-     "observation.images.left_wrist_0_rgb": torch.stack(
-         [fake_rgb(IMAGE_HEIGHT, IMAGE_WIDTH) for _ in range(batch_size)]
-     ).to(device),
-     "observation.images.right_wrist_0_rgb": torch.stack(
-         [fake_rgb(IMAGE_HEIGHT, IMAGE_WIDTH) for _ in range(batch_size)]
-     ).to(device),
-     "observation.state": torch.randn(batch_size, DUMMY_STATE_DIM, dtype=torch.float32, device=device),
-     "task": [prompt for _ in range(batch_size)],
- }

  batch = preprocess(batch)
- action = model.select_action(batch)
- # Or, if you're training:
- # loss, output_dict = model.forward(batch)
- # loss.backward()
- action = postprocess(action)
- print(action)
  ```

- ## How to Train the Model

  ```bash
- python src/lerobot/scripts/lerobot_train.py \
-   --dataset.repo_id=your_dataset \
-   --policy.type=pi0_fast \
-   --output_dir=./outputs/pi0fast_training \
-   --job_name=pi0fast_training \
-   --policy.pretrained_path=lerobot/pi0fast-base \
-   --policy.dtype=bfloat16 \
-   --policy.gradient_checkpointing=true \
-   --policy.chunk_size=10 \
-   --policy.n_action_steps=10 \
-   --policy.max_action_tokens=256 \
-   --steps=100000 \
-   --batch_size=4 \
-   --policy.device=cuda
  ```
 
  ---
+ language:
+ - en
  library_name: lerobot
  pipeline_tag: robotics
+ tags:
+ - vision-language-action
+ - imitation-learning
+ - lerobot
+ inference: false
  license: gemma
  ---

+ # π0 fast (PI0Fast) (LeRobot)
+
+ **PI0Fast** is a Vision-Language-Action (VLA) policy that predicts continuous robot actions via **autoregressive next-token prediction** over **FAST action tokens**.
+
+ **Original authors / paper:** [FAST: Efficient Action Tokenization for Vision-Language-Action Models](https://arxiv.org/abs/2501.09747)
+ **Implementation:** This LeRobot implementation follows the original reference code for compatibility.
+ **Reference implementation:** [Physical-Intelligence/openpi](https://github.com/Physical-Intelligence/openpi)
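+
+ Since FAST tokenization is the core idea, here is a minimal sketch of round-tripping an action chunk through the released FAST tokenizer. It assumes the `physical-intelligence/fast` processor published alongside the paper and its `trust_remote_code` loading path; treat the exact signatures as assumptions and check that model card for authoritative usage:
+
+ ```python
+ import numpy as np
+ from transformers import AutoProcessor
+
+ # Assumed repo id/API: the universal FAST action tokenizer released with the paper
+ tokenizer = AutoProcessor.from_pretrained("physical-intelligence/fast", trust_remote_code=True)
+
+ # Dummy chunk: batch of 1, 10 timesteps of 7-dim continuous (normalized) actions
+ action_chunk = np.random.rand(1, 10, 7)
+
+ tokens = tokenizer(action_chunk)                                   # encode to discrete tokens
+ decoded = tokenizer.decode(tokens, time_horizon=10, action_dim=7)  # approximate inverse
+ ```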

+ ## Model description
+
+ - **Inputs:** images (multi-view), proprioceptive state, optional language instruction
+ - **Outputs:** continuous actions (decoded from model outputs)
+ - **Training objective:** next-token cross-entropy
+ - **Action representation:** FAST tokens
+ - **Intended use:** fine-tune on your own task (see the input-batch sketch below)
+
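+ As a concrete illustration, here is a minimal sketch of the kind of batch such a policy consumes. The camera key and the 7-dim state are placeholders taken from the earlier pi0fast-base example, not guaranteed keys; match them to your dataset's features:
+
+ ```python
+ import torch
+
+ # Hypothetical batch for a single timestep (batch size 1); keys depend on your robot config.
+ batch = {
+     "observation.images.base_0_rgb": torch.rand(1, 3, 224, 224),  # float32 RGB in [0, 1]
+     "observation.state": torch.randn(1, 7),                       # proprioceptive state
+     "task": ["Pick up the red block and place it in the bin"],    # language instruction
+ }
+ ```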
+ ## Quick start (inference on a real batch)
+
+ ### Installation
+
  ```bash
  pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"
  ```
+
+ For full installation details (including optional video dependencies such as ffmpeg for torchcodec), see the official documentation: https://huggingface.co/docs/lerobot/installation

+ ### Load model + dataset, run `select_action`
+
  ```python
  import torch
+ from lerobot.datasets.lerobot_dataset import LeRobotDataset
  from lerobot.policies.factory import make_pre_post_processors
+
+ # Swap this import per policy
  from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy

+ # Load a policy checkpoint
+ model_id = "lerobot/pi0fast-libero"  # <- swap checkpoint
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ policy = PI0FastPolicy.from_pretrained(model_id).to(device).eval()

  preprocess, postprocess = make_pre_post_processors(
+     policy.config,
      model_id,
      preprocessor_overrides={"device_processor": {"device": str(device)}},
  )

+ # Load a LeRobotDataset
+ dataset = LeRobotDataset("lerobot/libero")
+
+ # Pick an episode
+ episode_index = 0
+
+ # Each episode corresponds to a contiguous range of frame indices
+ from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
+ to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]
+
+ # Get a single frame from that episode (e.g. the first frame)
+ frame_index = from_idx
+ frame = dict(dataset[frame_index])
+
+ batch = preprocess(frame)
+ with torch.inference_mode():
+     pred_action = policy.select_action(batch)
+ # Postprocess the prediction, e.g. unnormalize and detokenize the actions
+ pred_action = postprocess(pred_action)
  ```
+
+ ## Training step (loss + backward)
+
+ If you're training / fine-tuning, you typically call `forward(...)` to get a loss and then:
+
+ ```python
+ policy.train()
+ batch = dict(dataset[0])
  batch = preprocess(batch)
+
+ loss, outputs = policy.forward(batch)
+ loss.backward()
  ```
98
 
99
+ > Notes:
100
+ >
101
+ > - Some policies expose `policy(**batch)` or return a dict; keep this snippet aligned with the policy API.
102
+ > - Use your trainer script (`lerobot-train`) for full training loops.
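+
+ As a minimal sketch of what one full step looks like, here is the same snippet wrapped in an optimizer update. The optimizer choice (AdamW) and learning rate are illustrative assumptions, not values from a reference recipe:
+
+ ```python
+ from torch.optim import AdamW
+
+ optimizer = AdamW(policy.parameters(), lr=1e-5)  # assumed hyperparameters
+
+ policy.train()
+ batch = preprocess(dict(dataset[0]))
+ loss, outputs = policy.forward(batch)
+
+ optimizer.zero_grad()
+ loss.backward()
+ optimizer.step()
+ ```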
103
+
104
+
105
+ ## How to train / fine-tune
106
 
107
  ```bash
108
+ lerobot-train \
109
+ --dataset.repo_id=HuggingFaceVLA/libero \
110
+ --output_dir=./outputs/[RUN_NAME] \
111
+ --job_name=[RUN_NAME] \
112
+ --policy.repo_id=[THIS_REPO_OR_CHECKPOINT] \
113
+ --policy.path=lerobot/[BASE_CHECKPOINT] \
114
+ --policy.dtype=bfloat16 \
115
+ --policy.device=cuda \
116
+ --steps=100000 \
117
+ --batch_size=4
118
+ ```
119
+
120
+ Add policy-specific flags below:
121
+
122
+ - `-policy.chunk_size=...`
123
+ - `-policy.n_action_steps=...`
124
+ - `-policy.max_action_tokens=...`
125
+ - `-policy.gradient_checkpointing=true`
126
+
127
+ ---
128
+
129
+ ## Evaluate in Simulation (LIBERO)
130
+
131
+ You can evaluate the model in Libero environment.
132
+
133
+ ```bash
134
+ lerobot-eval \
135
+ --policy.path=lerobot/[CHECKPOINT_ID] \
136
+ --env.type=libero \
137
+ --env.task=libero_object \
138
+ --eval.batch_size=1 \
139
+ --eval.n_episodes=20
140
+ ```
+
+ ---
+
+ ## Real-World Inference & Evaluation
+
+ You can use [**`lerobot-record`**](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) with a policy checkpoint as input to run inference and evaluate your policy.
+
+ For instance, run this command to run inference and record 10 evaluation episodes:
+
+ ```bash
+ lerobot-record \
+   --robot.type=so100_follower \
+   --robot.port=/dev/ttyACM1 \
+   --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
+   --robot.id=my_awesome_follower_arm \
+   --display_data=false \
+   --dataset.repo_id=${HF_USER}/eval_so100 \
+   --dataset.single_task="Put lego brick into the transparent box" \
+   --policy.path=${HF_USER}/my_policy
+   # Optional: teleoperate between episodes by adding these flags before --policy.path:
+   # --teleop.type=so100_leader \
+   # --teleop.port=/dev/ttyACM0 \
+   # --teleop.id=my_awesome_leader_arm \
  ```