metadata
license: mit
pipeline_tag: other
tags:
  - smolvla
  - lerobot
  - robotics
  - vision-language-action
  - so100
  - screw-lid
  - manipulation
datasets:
  - Tomas0413/so100_screw_lid_v0
base_model:
  - lerobot/smolvla_base

SmolVLA SO100 Screw-Lid Model

A Vision-Language-Action (VLA) model fine-tuned on the SO100 Screw-Lid Dataset for robotic manipulation tasks.

Model Description

This model is a SmolVLA variant trained specifically on the SO100 screw-lid manipulation task. It learns to perform the complete sequence: picking up a jar, placing it on a silicone puck, seating the lid with a half-turn, and transporting the assembled jar to a goal location.

  • Developed by: Tomas0413
  • Model type: Vision-Language-Action (VLA)
  • Base architecture: SmolVLA
  • Training data: SO100 Screw-Lid Dataset (v0)
  • Task domain: Robotic manipulation (screw-lid assembly)

Training Details

Training Data

The model was trained on 51 teleoperated demonstrations from the SO100 Screw-Lid Dataset, featuring:

  • Dual camera views (wrist + overhead) at 1280×720 @ 30 FPS
  • 6-DOF joint positions, velocities, and gripper states
  • Synchronized action sequences for pick-place-assemble-transport tasks
  • Total of ~45k training frames
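As a rough sanity check, the figures above imply an average episode length. The sketch below uses only the numbers quoted in this card (51 episodes, about 45k frames, 30 FPS). The variable names are illustrative, and the results are approximate because the frame total is rounded. It also assumes the ~45k figure counts timesteps rather than images summed over both cameras.

```python
# Back-of-the-envelope episode statistics from the numbers quoted in this card.
NUM_EPISODES = 51       # teleoperated demonstrations
TOTAL_FRAMES = 45_000   # approximate ("~45k") frame count, assumed to be timesteps
FPS = 30                # recording rate

frames_per_episode = TOTAL_FRAMES / NUM_EPISODES
seconds_per_episode = frames_per_episode / FPS
total_minutes = TOTAL_FRAMES / FPS / 60

print(f"~{frames_per_episode:.0f} frames per episode")
print(f"~{seconds_per_episode:.0f} s per episode")
print(f"~{total_minutes:.0f} min of demonstration data in total")
```

Under these assumptions, each demonstration lasts roughly half a minute, which is a useful bound when choosing an episode timeout at deployment.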

Training Procedure

Training regime: Fine-tuned from the SmolVLA base model (lerobot/smolvla_base) on the 51 SO100 screw-lid demonstrations described above

Intended Uses

Direct Use

  • Robotic manipulation: Deploy on SO100 or similar 6-DOF robotic arms for screw-lid assembly tasks
  • Research: Study vision-language-action learning for fine manipulation
  • Benchmarking: Evaluate VLA performance on multi-step manipulation sequences

Downstream Use

  • Transfer learning to related assembly tasks
  • Few-shot adaptation to different jar/lid combinations
  • Integration into larger robotic task planning systems

Limitations and Bias

  • Domain-specific: Trained only on screw-lid assembly with specific objects
  • Robot morphology: Optimized for SO100 arm kinematics and gripper
  • Environmental constraints: Single lighting condition, fixed camera positions
  • Limited generalization: May not transfer well to significantly different manipulation tasks

Usage

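# Example usage with LeRobot.
# Note: the import path depends on the installed LeRobot version; newer
# releases expose the same class under lerobot.policies.smolvla.modeling_smolvla.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load the fine-tuned checkpoint from the Hugging Face Hub
policy = SmolVLAPolicy.from_pretrained("Tomas0413/so100_screw_lid_smolvla")
policy.eval()

# Run inference on robot observations.
# `observation` is a dict of batched tensors (camera images and robot state)
# plus the language instruction; it must match the features used in training.
action = policy.select_action(observation)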
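On a real robot, the `select_action` call above would typically sit inside a fixed-rate control loop that matches the 30 FPS recording rate of the demonstrations. The following is a minimal sketch of such a loop, not the deployment code used for this model. `run_episode`, `read_observation`, and `send_action` are hypothetical names introduced for illustration, and the stub callables at the bottom stand in for the policy and the robot so the example runs without hardware.

```python
import time
from typing import Any, Callable, Dict, List

Observation = Dict[str, Any]


def run_episode(
    select_action: Callable[[Observation], Any],
    read_observation: Callable[[], Observation],
    send_action: Callable[[Any], None],
    fps: float = 30.0,
    max_steps: int = 900,
) -> int:
    """Query the policy once per control tick and forward the action.

    All three callables are hypothetical hooks: in practice `select_action`
    would be `policy.select_action`, and the other two would wrap the robot.
    Returns the number of control steps executed.
    """
    period = 1.0 / fps
    for _ in range(max_steps):
        tick_start = time.perf_counter()

        observation = read_observation()
        action = select_action(observation)
        send_action(action)

        # Sleep for the remainder of the tick to hold the target rate.
        remaining = period - (time.perf_counter() - tick_start)
        if remaining > 0:
            time.sleep(remaining)
    return max_steps


# Stand-in policy and robot (assumptions for illustration only).
sent_actions: List[Any] = []
steps_run = run_episode(
    select_action=lambda obs: [0.0] * 6,           # placeholder 6-value action
    read_observation=lambda: {"task": "example"},  # placeholder observation
    send_action=sent_actions.append,
    fps=1000.0,   # fast tick so the demo finishes quickly
    max_steps=5,
)
print(f"executed {steps_run} control steps")
```

The default `max_steps=900` corresponds to 30 seconds at 30 FPS and is only an illustrative timeout. Passing the policy and robot in as callables keeps the loop testable without hardware.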

Training Dataset

This model was trained on the SO100 Screw-Lid Dataset (v0), which contains 51 teleoperated episodes of the complete screw-lid manipulation sequence recorded during the LeRobot Worldwide Hackathon (June 15-16, 2025).

Model Card Contact

Tomas0413