---
base_model: meta-llama/Llama-3.2-1B-Instruct
datasets:
- AvaLovelace/StableText2Brick
language:
- en
library_name: transformers
license: mit
pipeline_tag: text-to-3d
---

# Model Card for BrickGPT

These are the model weights for BrickGPT, the first approach for generating physically stable toy brick models from text prompts, as described in [Generating Physically Stable and Buildable Brick Structures from Text](https://huggingface.co/papers/2505.05469).
This model was fine-tuned from [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).

## Model Details

### Model Description

- **Developed by:** [Carnegie Mellon University Generative Intelligence Lab](https://www.cs.cmu.edu/~generative-intelligence-lab/)
- **Funded by:** Supported in part by the Packard Foundation, a Cisco Research Grant, and an Amazon Faculty Award, and in part by the Manufacturing Futures Institute, Carnegie Mellon University, through a grant from the Richard King Mellon Foundation. KD is supported by the Microsoft Research PhD Fellowship.
- **Model type:** Autoregressive
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- **Project page:** https://avalovelace1.github.io/BrickGPT/

### Model Sources

- **Repository:** [AvaLovelace1/BrickGPT](https://github.com/AvaLovelace1/BrickGPT)
- **Paper:** [Generating Physically Stable and Buildable Brick Structures from Text](https://huggingface.co/papers/2505.05469)
- **Demo:** [cmu-gil/BrickGPT-Demo](https://huggingface.co/spaces/cmu-gil/BrickGPT-Demo)

## Limitations

The model is restricted to creating structures made of 1-unit-tall cuboid bricks on a 20x20x20 grid. It was trained on a dataset of 21 object categories: *basket, bed, bench, birdhouse, bookshelf, bottle, bowl, bus, camera, car, chair, guitar, jar, mug, piano, pot, sofa, table, tower, train, vessel*. Performance on prompts from outside these categories may be limited.
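The grid restriction above can be illustrated with a small bounds check. This is only a sketch: the `Brick` class, its field names, and the coordinate convention (integer origin at the brick's lowest corner) are hypothetical and are not taken from the BrickGPT codebase. Only the 20x20x20 grid size and the 1-unit brick height come from this card.

```python
from dataclasses import dataclass

GRID_SIZE = 20  # from the model card: structures live on a 20x20x20 grid


@dataclass(frozen=True)
class Brick:
    """Hypothetical brick record: a 1-unit-tall cuboid with an integer footprint."""
    length: int  # footprint extent along x, in grid units
    width: int   # footprint extent along y, in grid units
    x: int       # lowest-corner x coordinate
    y: int       # lowest-corner y coordinate
    z: int       # layer index (each brick is exactly 1 unit tall)


def fits_in_grid(brick: Brick, grid_size: int = GRID_SIZE) -> bool:
    """Return True if the brick lies entirely inside the grid."""
    return (
        brick.length >= 1
        and brick.width >= 1
        and brick.x >= 0
        and brick.x + brick.length <= grid_size
        and brick.y >= 0
        and brick.y + brick.width <= grid_size
        and 0 <= brick.z < grid_size
    )
```

A check like this only covers the grid bounds. It says nothing about collisions between bricks or physical stability, which the paper treats separately.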

## How to Get Started with the Model

See the [GitHub repo](https://github.com/AvaLovelace1/BrickGPT) for usage examples and an interactive CLI demo.

## Training Details

### Training Data

BrickGPT was trained using [StableText2Brick](https://huggingface.co/datasets/AvaLovelace/StableText2Brick), a dataset of 47k toy brick structures.

### Training Procedure

The model was fine-tuned with LoRA applied to the `q_proj` and `v_proj` matrices, using the AdamW optimizer. The learning rate followed a cosine decay schedule with warmup.
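As a rough illustration of a cosine decay schedule with warmup, the sketch below computes the learning rate at a given optimizer step. The peak rate and warmup length are the values listed under Training Hyperparameters. The linear warmup shape, the decay to zero, and the total step count are assumptions made for the example and may not match the actual training run.

```python
import math

MAX_LR = 2e-3        # "Max learning rate" from this card
WARMUP_STEPS = 100   # "Learning rate warmup steps" from this card
TOTAL_STEPS = 2200   # ASSUMPTION: illustrative value, not reported in this card


def lr_at(step: int,
          max_lr: float = MAX_LR,
          warmup_steps: int = WARMUP_STEPS,
          total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    Both the linear warmup and the zero floor are assumed; the card only
    states "cosine decay with warmup".
    """
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate rises from zero to the peak over the first 100 steps, then falls smoothly back to zero by the final step.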

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision
- **Epochs:** 3
- **Global batch size:** 64
- **Max learning rate:** 0.002
- **Learning rate warmup steps:** 100
- **LoRA rank:** 32
- **LoRA alpha:** 16
- **LoRA dropout:** 0.05
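To make the LoRA settings above concrete, the following sketch computes the adapter scaling factor and a rough adapter parameter count. It assumes the standard LoRA scaling of alpha divided by rank. The layer dimensions are taken from the publicly documented Llama-3.2-1B configuration rather than from this card, and `LoraSettings` and `lora_param_count` are hypothetical helper names, so treat the resulting count as an estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """LoRA hyperparameters as listed in this card (hypothetical container)."""
    rank: int = 32
    alpha: int = 16
    dropout: float = 0.05
    target_modules: tuple = ("q_proj", "v_proj")

    @property
    def scaling(self) -> float:
        # ASSUMPTION: standard LoRA scaling, alpha / rank.
        return self.alpha / self.rank


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# ASSUMPTION: dimensions from the public Llama-3.2-1B config, not from this card.
HIDDEN_SIZE = 2048   # q_proj maps hidden -> hidden
KV_DIM = 512         # v_proj maps hidden -> 8 KV heads * 64 dims
NUM_LAYERS = 16

settings = LoraSettings()
per_layer = (
    lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, settings.rank)  # q_proj adapter
    + lora_param_count(HIDDEN_SIZE, KV_DIM, settings.rank)     # v_proj adapter
)
total_adapter_params = per_layer * NUM_LAYERS
```

With rank 32 and alpha 16, the low-rank update is scaled by 0.5 before being added to the frozen weights. Under the assumed dimensions, the adapters hold a few million parameters, a small fraction of the 1B-parameter base model.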

## Evaluation

See the [paper](https://huggingface.co/papers/2505.05469) for detailed evaluations.

## Environmental Impact

- **Hardware Type:** 8x NVIDIA RTX A6000 (48 GB)
- **Hours used:** 0.5

## Citation

If you find this model useful for your research, please cite the following work.

```bibtex
@article{pun2025brickgpt,
    title   = {Generating Physically Stable and Buildable Brick Structures from Text},
    author  = {Pun, Ava and Deng, Kangle and Liu, Ruixuan and Ramanan, Deva and Liu, Changliu and Zhu, Jun-Yan},
    journal = {arXiv preprint arXiv:2505.05469},
    year    = {2025}
}
```

## Model Card Contact

Ava Pun ([email protected])

### Framework versions

- PEFT 0.15.0