---
license: llama3.1
base_model: THGLab/Llama-3.1-8B-SmileyLlama-1.1
---

## Built With Llama!

## Built With Axolotl!

# Overview

We fine-tuned [SmileyLlama](https://huggingface.co/THGLab/Llama-3.1-8B-SmileyLlama-1.1) with DPO to improve its adherence to the directions given in the prompt. For more details, read the arXiv preprint here: https://arxiv.org/abs/2409.02231

# How to use

This model can be loaded with the same method as Llama-3.1, and the [memory requirements are the same as for Llama-3.1-8B](https://huggingface.co/blog/llama31#inference-memory-requirements).

The options for "properties" that SmileyLlama was trained on are:

- `( <= 3, <= 4, <= 5, <= 7, > 7) H-bond donors`
- `( <= 3, <= 4, <= 5, <= 10, <= 15) H-bond acceptors`
- `( <= 300, <= 400, <= 500, <= 600, > 600) Molecular weight`
- `( <= 3, <= 4, <= 5, <= 10, <= 15, > 15) logP`
- `( <= 7, <= 10, > 10) Rotatable bonds`
- `( < 0.4, > 0.4, > 0.5, > 0.6) Fraction sp3`
- `( <= 90, <= 140, <= 200, > 200) TPSA`
- `(a macrocycle, no macrocycles)`
- `(has, lacks) bad SMARTS`
- `lacks covalent warheads`
- `has covalent warheads: (sulfonyl fluorides, acrylamides, ...) (see below for details)`
- `A substructure of *SMILES_STRING*`
- `A chemical of *CHEMICAL_FORMULA*`

### List of possible warheads:

- **sulfonyl fluorides**: `[#16](=[#8])(=[#8])-[#9]`
- **chloroacetamides**: `[#8]=[#6](-[#6]-[#17])-[#7]`
- **cyanoacrylamides**: `[#7]-[#6](=[#8])-[#6](-[#6]#[#7])=[#6]`
- **epoxides**: `[#6]1-[#6]-[#8]-1`
- **aziridines**: `[#6]1-[#6]-[#7]-1`
- **disulfides**: `[#16]-[#16]`
- **aldehydes**: `[#6](=[#8])-[#1]`
- **vinyl sulfones**: `[#6]=[#6]-[#16](=[#8])(=[#8])-[#7]`
- **boronic acids/esters**: `[#6]-[#5](-[#8])-[#8]`
- **acrylamides**: `[#6]=[#6]-[#6](=[#8])-[#7]`
- **cyanamides**: `[#6]-[#7](-[#6]#[#7])-[#6]`
- **chlorofluoroacetamides**: `[#7]-[#6](=[#8])-[#6](-[#9])-[#17]`
- **butynamides**: `[#6]#[#6]-[#6](=[#8])-[#7]-[#6]`
- **chloropropionamides**: `[#7]-[#6](=[#8])-[#6](-[#6])-[#17]`
- **fluorosulfates**: `[#8]=[#16](=[#8])(-[#9])-[#8]`
- **beta lactams**: `[#7]1-[#6]-[#6]-[#6]-1=[#8]`

These patterns are written as SMARTS, so they can be matched against generated molecules directly (see the sketch at the end of this card).

### Generating a drug-like molecule which obeys the Lipinski rule of five

```python
import torch
import transformers

model_id = "/path/to/your/model"  # local path to the model, or its Hugging Face Hub ID

system_txt = "You love and excel at generating SMILES strings of drug-like molecules"
user_txt = (
    "Output a SMILES string for a drug like molecule with the following properties: "
    "<= 5 H-bond donors, <= 10 H-bond acceptors, <= 500 molecular weight, <= 5 logP:"
)
prompt = f"### Instruction:\n{system_txt}\n\n### Input:\n{user_txt}\n\n### Response:\n"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

outputs = pipeline(
    prompt,
    max_new_tokens=128,
    do_sample=True,          # sampling is required for num_return_sequences > 1
    temperature=1.0,
    num_return_sequences=4,
    return_full_text=False,  # return only the generated SMILES, not the prompt
)

for output in outputs:
    print(output["generated_text"])
```

You can use `num_return_sequences` to generate many SMILES strings in a single batch, though the batch size is limited by your available memory.
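
### Checking generated SMILES against the requested properties

The model is not guaranteed to satisfy every constraint in the prompt, so it is worth verifying its outputs. Below is a minimal sketch, assuming RDKit is installed (`pip install rdkit`); `passes_lipinski` is a hypothetical helper name, not part of this repository, and the thresholds mirror the example prompt above.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_lipinski(smiles: str) -> bool:
    """Hypothetical helper: True if the SMILES parses and meets the
    constraints requested in the example prompt above."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # the model produced an invalid SMILES string
        return False
    return (
        Lipinski.NumHDonors(mol) <= 5
        and Lipinski.NumHAcceptors(mol) <= 10
        and Descriptors.MolWt(mol) <= 500
        and Descriptors.MolLogP(mol) <= 5
    )

# e.g. filter the pipeline outputs from the example above
smiles_strings = ["CC(=O)Oc1ccccc1C(=O)O"]  # aspirin, as a stand-in
print([s for s in smiles_strings if passes_lipinski(s)])
```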
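
### Checking for covalent warheads

The warhead list above is given as SMARTS patterns, so you can screen generated molecules for them with a substructure search. A minimal sketch, again assuming RDKit; the `WARHEADS` dict copies two patterns from the list above and is deliberately not exhaustive.

```python
from rdkit import Chem

# Two of the warhead SMARTS patterns from the list above (not exhaustive).
WARHEADS = {
    "acrylamides": "[#6]=[#6]-[#6](=[#8])-[#7]",
    "epoxides": "[#6]1-[#6]-[#8]-1",
}

def find_warheads(smiles: str) -> list[str]:
    """Return the names of any listed warheads present in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return []
    return [
        name
        for name, smarts in WARHEADS.items()
        if mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))
    ]

print(find_warheads("C=CC(=O)Nc1ccccc1"))  # N-phenylacrylamide -> ['acrylamides']
```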