---
datasets:
- liuganghuggingface/demodiff_downstream
license: mit
tags:
- chemistry
- biology
pipeline_tag: graph-ml
---

# DemoDiff: Graph Diffusion Transformers are In-Context Molecular Designers

This repository contains the DemoDiff model, a diffusion-based molecular foundation model for **in-context inverse molecular design**, as presented in the paper [Graph Diffusion Transformers are In-Context Molecular Designers](https://huggingface.co/papers/2510.08744).

DemoDiff leverages graph diffusion transformers to generate molecules from contextual examples, enabling few-shot molecular design across diverse chemical tasks without task-specific fine-tuning. It introduces demonstration-conditioned diffusion models, which define a task context with a small set of molecule-score examples instead of a text description, and use that context to guide a denoising Transformer during molecule generation. For scalable pretraining, a novel molecular tokenizer based on Node Pair Encoding represents molecules at the motif level.

Code: https://github.com/liugangcode/DemoDiff

## 🌟 Key Features

- **In-Context Learning**: Generate molecules using only contextual examples (no fine-tuning required)
- **Graph-Based Tokenization**: Novel molecular graph tokenization with a BPE-style vocabulary
- **Comprehensive Benchmarks**: 30+ downstream tasks covering drug discovery, docking, and polymer design

### Model Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| **context_length** | 150 | Maximum sequence length of the input context. |
| **depth** | 24 | Number of transformer layers. |
| **diffusion_steps** | 500 | Number of diffusion steps during training. |
| **hidden_size** | 1280 | Hidden dimension of the transformer. |
| **mlp_ratio** | 4 | Expansion ratio in the MLP block. |
| **num_heads** | 16 | Number of attention heads. |
| **task_name** | `pretrain` | Task type for model training. |
| **tokenizer_name** | `pretrain` | Tokenizer used for model input. |
| **vocab_ring_len** | 300 | Length of the circular vocabulary window. |
| **vocab_size** | 3000 | Total vocabulary size. |
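To give intuition for the BPE-style vocabulary construction: Node Pair Encoding grows a motif vocabulary by repeatedly merging the most frequent pair of adjacent units, analogous to byte-pair encoding on text. The sketch below illustrates that merging idea on linearized token sequences only; it is *not* DemoDiff's actual graph algorithm (which operates on molecular graphs), and all function names here are illustrative.

```python
from collections import Counter

def most_frequent_pair(corpus):
    """Count adjacent token pairs across all sequences; return the top pair."""
    pairs = Counter()
    for seq in corpus:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(corpus, pair):
    """Replace each occurrence of `pair` with a single merged token."""
    merged_tok = pair[0] + pair[1]
    out = []
    for seq in corpus:
        new_seq, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                new_seq.append(merged_tok)
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        out.append(new_seq)
    return out

def build_vocab(corpus, vocab_size):
    """Grow the vocabulary by merging frequent pairs until it reaches vocab_size."""
    vocab = {tok for seq in corpus for tok in seq}
    while len(vocab) < vocab_size:
        pair = most_frequent_pair(corpus)
        if pair is None:
            break
        corpus = merge_pair(corpus, pair)
        vocab.add(pair[0] + pair[1])
    return vocab, corpus

# Toy "molecules" as atom-token sequences; frequent fragments become motifs.
corpus = [["C", "C", "O"], ["C", "C", "N"], ["C", "C", "O"]]
vocab, tokenized = build_vocab(corpus, vocab_size=6)
```

In the real model the same idea applies at graph level (merging frequently co-occurring node pairs into motifs), with `vocab_size` corresponding to the 3000-entry vocabulary in the configuration table above.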