How to use Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers with Sana:

```python
# Load the model and infer image from text
import torch
from app.sana_pipeline import SanaPipeline
from torchvision.utils import save_image

sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers")

image = sana(
    prompt='a cyberpunk cat with a neon sign that says "Sana"',
    height=1024,
    width=1024,
    guidance_scale=5.0,
    pag_guidance_scale=2.0,
    num_inference_steps=18,
)
```
Update README.md
README.md
CHANGED
@@ -51,7 +51,7 @@ Source code is available at https://github.com/NVlabs/Sana.
 - **Model type:** Linear-Diffusion-Transformer-based text-to-image generative model
 - **Model size:** 1648M parameters
 - **Model resolution:** This model is developed to generate 4Kpx based images with multi-scale height and width.
-- **License:** [
+- **License:** [NSCL v2-custom](./LICENSE.txt). Governing Terms: NVIDIA License. Additional Information: [Gemma Terms of Use | Google AI for Developers](https://ai.google.dev/gemma/terms) for Gemma-2-2B-IT, [Gemma Prohibited Use Policy | Google AI for Developers](https://ai.google.dev/gemma/prohibited_use_policy).
 - **Model Description:** This is a model that can be used to generate and modify images based on text prompts.
 It is a Linear Diffusion Transformer that uses one fixed, pretrained text encoder ([Gemma2-2B-IT](https://huggingface.co/google/gemma-2-2b-it))
 and one 32x spatial-compressed latent feature encoder ([DC-AE](https://hanlab.mit.edu/projects/dc-ae)).
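To make the "32x spatial-compressed" figure in the model description concrete, the sketch below computes the latent grid size for a given image size. The helper name `latent_grid` is hypothetical and not part of the Sana codebase. It only assumes that a 32x spatial compression divides each image side by 32. The 8x factor in the comparison is the one commonly used by Stable Diffusion style VAEs.

```python
def latent_grid(height: int, width: int, compression: int = 32) -> tuple[int, int]:
    """Return (latent_height, latent_width) for an image under a spatial compression factor.

    Hypothetical helper for illustration only; it is not part of the Sana API.
    """
    if height % compression or width % compression:
        raise ValueError("height and width must be multiples of the compression factor")
    return height // compression, width // compression


if __name__ == "__main__":
    for side in (1024, 4096):
        h32, w32 = latent_grid(side, side)                # 32x compression, as stated for DC-AE
        h8, w8 = latent_grid(side, side, compression=8)   # 8x compression, for comparison
        print(f"{side}x{side} image: 32x -> {h32}x{w32} latent, 8x -> {h8}x{w8} latent")
```

Under this assumption, a 4096x4096 image maps to a 128x128 latent grid, the same spatial size an 8x autoencoder would produce for a 1024x1024 image.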