license: mit
pipeline_tag: other
tags:
  - smolvla
  - lerobot
  - robotics
  - vision-language-action
  - so100
  - screw-lid
  - manipulation
datasets:
  - Tomas0413/so100_screw_lid_v0
base_model:
  - lerobot/smolvla_base

SmolVLA SO100 Screw-Lid Model

A Vision-Language-Action (VLA) model fine-tuned on the SO100 Screw-Lid Dataset for robotic manipulation tasks.

Model Description

This model is a SmolVLA variant trained specifically on the SO100 screw-lid manipulation task. It learns to perform the complete sequence: picking up a jar, placing it on a silicone puck, seating the lid with a half-turn, and transporting the assembled jar to a goal location.

Developed by: Tomas0413
Model type: Vision-Language-Action (VLA)
Base architecture: SmolVLA
Training data: SO100 Screw-Lid Dataset (v0)
Task domain: Robotic manipulation (screw-lid assembly)

Training Details

Training Data

The model was trained on 51 teleoperated demonstrations from the SO100 Screw-Lid Dataset, featuring:

Dual camera views (wrist + overhead) at 1280×720 @ 30 FPS
6-DOF joint positions, velocities, and gripper states
Synchronized action sequences for pick-place-assemble-transport tasks
Total of ~45k training frames
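As a rough sanity check on these figures, the sketch below derives the average episode length from the numbers quoted above. It assumes the ~45k frames are counted once per timestep (not once per camera) and treats 45,000 as an approximation, so the results are estimates only.

```python
# Figures quoted in this model card; TOTAL_FRAMES is approximate (~45k).
EPISODES = 51
TOTAL_FRAMES = 45_000
FPS = 30

# Assumption: frames are counted per timestep, not per camera stream.
frames_per_episode = TOTAL_FRAMES / EPISODES
seconds_per_episode = frames_per_episode / FPS
total_minutes = TOTAL_FRAMES / FPS / 60

print(f"~{frames_per_episode:.0f} frames per episode")
print(f"~{seconds_per_episode:.1f} s per episode")
print(f"~{total_minutes:.0f} min of demonstrations in total")
```

Under these assumptions, each demonstration lasts roughly half a minute, and the full dataset covers about 25 minutes of teleoperation.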

Training Procedure

Training regime: Fine-tuned from the SmolVLA base model (lerobot/smolvla_base) on the SO100 screw-lid demonstrations

Intended Uses

Direct Use

Robotic manipulation: Deploy on SO100 or similar 6-DOF robotic arms for screw-lid assembly tasks
Research: Study vision-language-action learning for fine manipulation
Benchmarking: Evaluate VLA performance on multi-step manipulation sequences

Downstream Use

Transfer learning to related assembly tasks
Few-shot adaptation to different jar/lid combinations
Integration into larger robotic task planning systems

Limitations and Bias

Domain-specific: Trained only on screw-lid assembly with specific objects
Robot morphology: Optimized for SO100 arm kinematics and gripper
Environmental constraints: Single lighting condition, fixed camera positions
Limited generalization: May not transfer well to significantly different manipulation tasks

Usage

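# Example usage with LeRobot
# Import path as used by LeRobot releases from mid-2025; newer releases
# may expose it as lerobot.policies.smolvla.modeling_smolvla instead.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load the trained model from the Hugging Face Hub
policy = SmolVLAPolicy.from_pretrained("Tomas0413/so100_screw_lid_smolvla")
policy.eval()

# Run inference on robot observations.
# `observation` must be supplied by the caller: a dict of batched tensors
# (camera images, joint state) plus the task instruction.
action = policy.select_action(observation)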
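In practice `select_action` is called once per control step. The following is a minimal sketch of a fixed-rate loop at the dataset's 30 FPS, not the author's deployment code. `get_observation` and `send_action` are hypothetical callables standing in for the robot I/O and are not part of LeRobot; the `Policy` protocol only mirrors the `reset` and `select_action` method names that LeRobot policies expose.

```python
import time
from typing import Any, Callable, Protocol


class Policy(Protocol):
    """Minimal interface assumed here; mirrors LeRobot policy method names."""

    def reset(self) -> None: ...
    def select_action(self, observation: dict[str, Any]) -> Any: ...


def run_episode(
    policy: Policy,
    get_observation: Callable[[], dict[str, Any]],  # hypothetical robot read
    send_action: Callable[[Any], None],             # hypothetical robot write
    max_steps: int,
    fps: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run a fixed-rate control loop and return the number of steps executed."""
    period = 1.0 / fps
    policy.reset()  # clear any state left over from a previous episode
    for _ in range(max_steps):
        start = time.perf_counter()
        action = policy.select_action(get_observation())
        send_action(action)
        # Sleep for whatever remains of this control period, if anything.
        remaining = period - (time.perf_counter() - start)
        if remaining > 0:
            sleep(remaining)
    return max_steps
```

Passing `sleep` as a parameter keeps the loop testable without real-time delays. If inference takes longer than one period, the loop simply runs late rather than skipping steps.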

Training Dataset

This model was trained on the SO100 Screw-Lid Dataset (v0), which contains 51 teleoperated episodes of the complete screw-lid manipulation sequence recorded during the LeRobot Worldwide Hackathon (June 15-16, 2025).

Model Card Contact

Tomas0413