Update README.md
README.md
CHANGED
@@ -1,90 +1,166 @@

---
library_name: lerobot
pipeline_tag: robotics
license: gemma
language:
- en
---

# π0 fast

It was proposed in [FAST: Efficient Action Tokenization for Vision-Language-Action Models](https://huggingface.co/papers/2501.09747).

```bash
pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"
```

```python
import numpy as np
import torch

from lerobot.policies.factory import make_pre_post_processors
from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy

# select your device here
device = torch.device("cuda")

model_id = "lerobot/pi0fast_base"  # <- swap for the checkpoint you want to load
policy = PI0FastPolicy.from_pretrained(model_id).to(device).eval()

preprocess, postprocess = make_pre_post_processors(
    model_id,
    preprocessor_overrides={"device_processor": {"device": str(device)}},
)

batch_size = 1
prompt = "Pick up the red block and place it in the bin"
IMAGE_HEIGHT, IMAGE_WIDTH = 224, 224  # example camera resolution
DUMMY_STATE_DIM = 7

# Create random RGB images in [0, 255] uint8 range (as PIL images would be),
# then convert to [0, 1] float32 range for LeRobot
def fake_rgb(h, w):
    arr = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)
    t = torch.from_numpy(arr).permute(2, 0, 1).float() / 255.0  # CHW, [0, 1]
    return t

batch = {
    "observation.images.base_0_rgb": torch.stack(
        [fake_rgb(IMAGE_HEIGHT, IMAGE_WIDTH) for _ in range(batch_size)]
    ).to(device),
    "observation.images.left_wrist_0_rgb": torch.stack(
        [fake_rgb(IMAGE_HEIGHT, IMAGE_WIDTH) for _ in range(batch_size)]
    ).to(device),
    "observation.images.right_wrist_0_rgb": torch.stack(
        [fake_rgb(IMAGE_HEIGHT, IMAGE_WIDTH) for _ in range(batch_size)]
    ).to(device),
    "observation.state": torch.randn(batch_size, DUMMY_STATE_DIM, dtype=torch.float32, device=device),
    "task": [prompt for _ in range(batch_size)],
}

batch = preprocess(batch)
with torch.inference_mode():
    action = policy.select_action(batch)
action = postprocess(action)
print(action)
```

---
language:
- en
library_name: lerobot
pipeline_tag: robotics
tags:
- vision-language-action
- imitation-learning
- lerobot
inference: false
license: gemma
---

# π0 fast (PI0Fast) (LeRobot)

**PI0Fast** is a Vision-Language-Action (VLA) policy that predicts continuous robot actions via **autoregressive next-token prediction** over **FAST action tokens**.

**Original authors / paper:** [FAST: Efficient Action Tokenization for Vision-Language-Action Models](https://arxiv.org/abs/2501.09747)
**Implementation:** this LeRobot implementation follows the original reference code for compatibility.
**Reference implementation:** https://github.com/Physical-Intelligence/openpi
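
For intuition, the sketch below illustrates the core idea from the FAST paper: a chunk of roughly normalized continuous actions is decorrelated with a discrete cosine transform along the time axis and quantized, and the paper then compresses the resulting integers further with byte-pair encoding into the discrete tokens the VLM predicts. This is an illustrative sketch only, not the openpi tokenizer.

```python
# Illustrative sketch of FAST-style action compression (not the openpi tokenizer):
# DCT along time + coarse quantization; the real tokenizer additionally BPE-encodes
# the quantized coefficients into discrete tokens.
import numpy as np
from scipy.fft import dct, idct

def compress_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """actions: (horizon, action_dim), roughly normalized to [-1, 1]."""
    coeffs = dct(actions, axis=0, norm="ortho")        # decorrelate along the time axis
    return np.round(coeffs * scale).astype(np.int32)   # quantize to small integers

def decompress_chunk(quantized: np.ndarray, scale: float = 10.0) -> np.ndarray:
    return idct(quantized.astype(np.float64) / scale, axis=0, norm="ortho")

chunk = np.random.uniform(-1.0, 1.0, size=(50, 7))     # (horizon, action_dim)
quantized = compress_chunk(chunk)
recon = decompress_chunk(quantized)
print(np.abs(chunk - recon).max())                      # small vs. the [-1, 1] action range
```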

## Model description

- **Inputs:** multi-view camera images, proprioceptive state, and an optional language instruction (see the sketch below)
- **Outputs:** continuous actions (decoded from the predicted FAST tokens)
- **Training objective:** next-token cross-entropy over action tokens
- **Action representation:** FAST tokens
- **Intended use:** fine-tune on your own task
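
For a concrete picture of the input format, a hypothetical batch for a three-camera setup could look like the sketch below; the exact image keys, image size, and state dimension come from the checkpoint's config and dataset, so treat the names and shapes here as placeholders.

```python
import torch

# Hypothetical example batch; key names, image size, and state dim are placeholders
# that must match the policy config / dataset features.
batch_size = 1
batch = {
    "observation.images.base_0_rgb": torch.rand(batch_size, 3, 224, 224),        # float32, [0, 1], CHW
    "observation.images.left_wrist_0_rgb": torch.rand(batch_size, 3, 224, 224),
    "observation.images.right_wrist_0_rgb": torch.rand(batch_size, 3, 224, 224),
    "observation.state": torch.randn(batch_size, 7),                             # proprioceptive state
    "task": ["Pick up the red block and place it in the bin"] * batch_size,      # language instruction
}
```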

## Quick start (inference on a real batch)

### Installation

```bash
pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"
```

For full installation details (including optional video dependencies such as ffmpeg for torchcodec), see the official documentation: https://huggingface.co/docs/lerobot/installation
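
A quick way to check that the extras were installed is to import the policy class; this only verifies the import and does not download any weights:

```python
import torch
from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy

print(torch.__version__, "cuda available:", torch.cuda.is_available())
print(PI0FastPolicy.__name__)
```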

### Load model + dataset, run `select_action`

```python
import torch

from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.policies.factory import make_pre_post_processors

# Swap this import to use a different policy
from lerobot.policies.pi0_fast.modeling_pi0_fast import PI0FastPolicy

# Load a policy
model_id = "lerobot/pi0fast-libero"  # <- swap checkpoint
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

policy = PI0FastPolicy.from_pretrained(model_id).to(device).eval()

preprocess, postprocess = make_pre_post_processors(
    policy.config,
    model_id,
    preprocessor_overrides={"device_processor": {"device": str(device)}},
)

# Load a LeRobotDataset
dataset = LeRobotDataset("lerobot/libero")

# Pick an episode
episode_index = 0

# Each episode corresponds to a contiguous range of frame indices
from_idx = dataset.meta.episodes["dataset_from_index"][episode_index]
to_idx = dataset.meta.episodes["dataset_to_index"][episode_index]

# Get a single frame from that episode (e.g. the first frame)
frame_index = from_idx
frame = dict(dataset[frame_index])

batch = preprocess(frame)
with torch.inference_mode():
    pred_action = policy.select_action(batch)

# Apply the policy postprocessor to the predicted action,
# e.g. to unnormalize it back to the robot's action space.
pred_action = postprocess(pred_action)
```
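
`select_action` returns one action per call (LeRobot policies typically predict a chunk of future actions and queue it internally), so a closed-loop rollout simply calls it at every control step. Below is a minimal sketch; `env` and `obs_to_batch` are hypothetical stand-ins for your simulator or robot interface.

```python
# Sketch of a closed-loop rollout. `env` and `obs_to_batch` are hypothetical:
# replace them with your simulator / robot interface and its observation formatting.
policy.reset()  # clear any internally queued actions before starting an episode

obs = env.reset()
for _ in range(300):  # max number of control steps
    batch = preprocess(obs_to_batch(obs))        # build the observation.* / task keys
    with torch.inference_mode():
        action = policy.select_action(batch)
    action = postprocess(action)
    obs, done = env.step(action)
    if done:
        break
```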

## Training step (loss + backward)

If you are training or fine-tuning, you typically call `forward(...)` to get a loss and then backpropagate:

```python
policy.train()
batch = dict(dataset[0])
batch = preprocess(batch)

loss, outputs = policy.forward(batch)
loss.backward()
```

> Notes:
>
> - Some policies expose `policy(**batch)` or return a dict; keep this snippet aligned with the policy API.
> - Use the `lerobot-train` script for full training loops.
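
To turn the snippet above into a complete update step, wrap it in an optimizer loop. The sketch below reuses the objects from the quick start and uses placeholder hyperparameters; for real runs, prefer `lerobot-train`, which also handles batching, schedulers, logging, and checkpointing.

```python
import torch

# Minimal fine-tuning sketch; learning rate and step count are placeholders.
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)

policy.train()
for step in range(10):  # tiny demo loop over the first few frames
    batch = preprocess(dict(dataset[step]))
    loss, _ = policy.forward(batch)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```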

## How to train / fine-tune

```bash
lerobot-train \
  --dataset.repo_id=HuggingFaceVLA/libero \
  --output_dir=./outputs/[RUN_NAME] \
  --job_name=[RUN_NAME] \
  --policy.repo_id=[THIS_REPO_OR_CHECKPOINT] \
  --policy.path=lerobot/[BASE_CHECKPOINT] \
  --policy.dtype=bfloat16 \
  --policy.device=cuda \
  --steps=100000 \
  --batch_size=4
```

Add policy-specific flags as needed, for example:

- `--policy.chunk_size=...`
- `--policy.n_action_steps=...`
- `--policy.max_action_tokens=...`
- `--policy.gradient_checkpointing=true`

---

## Evaluate in Simulation (LIBERO)

You can evaluate the model in the LIBERO simulation environment:

```bash
lerobot-eval \
  --policy.path=lerobot/[CHECKPOINT_ID] \
  --env.type=libero \
  --env.task=libero_object \
  --eval.batch_size=1 \
  --eval.n_episodes=20
```

---

## Real-World Inference & Evaluation

You can use the record script [**`lerobot-record`**](https://github.com/huggingface/lerobot/blob/main/src/lerobot/scripts/lerobot_record.py) with a policy checkpoint as input to run inference and evaluate your policy on a real robot.

For instance, the following command runs inference and records evaluation episodes:

```bash
lerobot-record \
  --robot.type=so100_follower \
  --robot.port=/dev/ttyACM1 \
  --robot.cameras="{ up: {type: opencv, index_or_path: /dev/video10, width: 640, height: 480, fps: 30}, side: {type: intelrealsense, serial_number_or_name: 233522074606, width: 640, height: 480, fps: 30}}" \
  --robot.id=my_awesome_follower_arm \
  --display_data=false \
  --dataset.repo_id=${HF_USER}/eval_so100 \
  --dataset.single_task="Put lego brick into the transparent box" \
  --policy.path=${HF_USER}/my_policy
  # Optional: teleoperate in between episodes by also passing
  # --teleop.type=so100_leader \
  # --teleop.port=/dev/ttyACM0 \
  # --teleop.id=my_awesome_leader_arm \
```