license: mit
pipeline_tag: robotics

SFHand – Official Checkpoint

This repository provides the official pretrained checkpoint for SFHand, a streaming framework for language-guided 3D hand forecasting and embodied manipulation, introduced in the paper "SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting".


🔗 Project Links


πŸ“ Introduction

SFHand is the first streaming architecture for language-guided 3D hand forecasting. It autoregressively predicts future hand dynamics from continuous egocentric video and text instructions, outputting hand type, 2D bounding boxes, 3D poses, and 3D trajectories.

Key features include:

  • Streaming Framework: Autoregressive multi-modal hand forecasting.
  • ROI-Enhanced Memory: Captures temporal hand awareness while focusing on salient regions.
  • Embodied Ready: Representations transfer effectively to downstream manipulation tasks.
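The streaming, autoregressive control flow described above can be illustrated with a toy sketch. It is not the SFHand implementation: `HandForecast`, `toy_forecaster`, and `stream_forecasts` are hypothetical names, the 21-joint pose and pixel-space box are assumptions, and the "model" is a stand-in that just shifts the previous prediction. The only point is the loop structure: each step receives a new frame, the instruction, and a bounded memory of earlier predictions, and its output is fed back into that memory.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class HandForecast:
    """Hypothetical per-step output; field names are illustrative only."""
    hand_type: str                               # "left" or "right"
    bbox_2d: Tuple[float, float, float, float]   # (x1, y1, x2, y2), pixels assumed
    pose_3d: List[Vec3]                          # 21 joints assumed, not confirmed
    trajectory_3d: Vec3                          # one future 3D hand position


def toy_forecaster(frame_index: int, instruction: str,
                   memory: Deque[HandForecast]) -> HandForecast:
    """Stand-in for a real model: shifts the last prediction by 0.01 in x."""
    if memory:
        x, y, z = memory[-1].trajectory_3d
        position = (x + 0.01, y, z)
    else:
        position = (0.0, 0.0, 0.0)
    return HandForecast(
        hand_type="right",
        bbox_2d=(0.0, 0.0, 64.0, 64.0),
        pose_3d=[position] * 21,
        trajectory_3d=position,
    )


def stream_forecasts(num_frames: int, instruction: str,
                     memory_size: int = 3
                     ) -> Tuple[List[HandForecast], Deque[HandForecast]]:
    """Run the autoregressive loop; each step sees only the bounded memory."""
    memory: Deque[HandForecast] = deque(maxlen=memory_size)
    outputs: List[HandForecast] = []
    for frame_index in range(num_frames):
        forecast = toy_forecaster(frame_index, instruction, memory)
        memory.append(forecast)  # feed the prediction back for the next step
        outputs.append(forecast)
    return outputs, memory


if __name__ == "__main__":
    outputs, memory = stream_forecasts(5, "pick up the cup")
    print(len(outputs), len(memory), outputs[-1].hand_type)
```

Using a `deque` with `maxlen` keeps memory use constant regardless of how long the stream runs, which is the property a streaming forecaster needs; how SFHand's ROI-enhanced memory actually selects and stores features is not shown here.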

🚀 Evaluation and Visualization

To evaluate the model and generate visualizations with this checkpoint, run the following command from the official repository:

python main.py --config_file configs/config/clip_base_eval.yml --eval --vis

Output visualizations will be saved to the ./render_results/ directory.
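If you want to script this step, the following is a minimal sketch. The helper names `build_eval_command` and `list_render_outputs` are hypothetical and not part of the official repository; the flags and the `./render_results/` path are taken from the command above, and nothing is assumed about the file types the renderer writes.

```python
from pathlib import Path
from typing import List


def build_eval_command(config_file: str = "configs/config/clip_base_eval.yml",
                       visualize: bool = True) -> List[str]:
    """Assemble the evaluation command shown above as an argument list."""
    cmd = ["python", "main.py", "--config_file", config_file, "--eval"]
    if visualize:
        cmd.append("--vis")  # also write visualizations
    return cmd


def list_render_outputs(root: str = "./render_results") -> List[Path]:
    """Return every file under the output directory, or [] if it is missing."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(p for p in root_path.rglob("*") if p.is_file())


if __name__ == "__main__":
    print(" ".join(build_eval_command()))
    for path in list_render_outputs():
        print(path)
```

Passing the list from `build_eval_command()` to `subprocess.run(cmd, check=True)` inside a checkout of the official repository would launch the evaluation; the sketch only prints the command so that it stays side-effect free.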


📚 Citation

If you use this model or find SFHand helpful in your research, please cite:

@article{liu2025sfhand,
  title={SFHand: A Streaming Framework for Language-guided 3D Hand Forecasting and Embodied Manipulation},
  author={Liu, Ruicong and Huang, Yifei and Ouyang, Liangyang and Kang, Caixin and Sato, Yoichi},
  journal={arXiv preprint arXiv:2511.18127},
  year={2025}
}