---
license: llama3.1
base_model: THGLab/Llama-3.1-8B-SmileyLlama-1.1
---

## Built With Llama!

## Built With Axolotl!

# Overview

We fine-tuned [SmileyLlama](https://huggingface.co/THGLab/Llama-3.1-8B-SmileyLlama-1.1) with DPO to improve its adherence to the directions given in the prompt. For more details, read the arXiv preprint here: https://arxiv.org/abs/2409.02231

# How to use

This model can be loaded with the same method as Llama-3.1, and the [memory requirements are the same as for Llama-3.1-8B](https://huggingface.co/blog/llama31#inference-memory-requirements).

The options for "properties" that SmileyLlama was trained on are:

- `( <= 3, <= 4, <= 5, <= 7, > 7) H-bond donors`
- `( <= 3, <= 4, <= 5, <= 10, <= 15) H-bond acceptors`
- `( <= 300, <= 400, <= 500, <= 600, > 600) Molecular weight`
- `( <= 3, <= 4, <= 5, <= 10, <= 15, > 15) logP`
- `( <= 7, <= 10, > 10) Rotatable bonds`
- `( < 0.4, > 0.4, > 0.5, > 0.6) Fraction sp3`
- `( <= 90, <= 140, <= 200, > 200) TPSA`
- `(a macrocycle, no macrocycles)`
- `(has, lacks) bad SMARTS`
- `lacks covalent warheads`
- `has covalent warheads: (sulfonyl fluorides, acrylamides, ...) (see below for details)`
- `A substructure of *SMILES_STRING*`
- `A chemical of *CHEMICAL_FORMULA*`

### List of possible warheads:

- **sulfonyl fluorides**: `[#16](=[#8])(=[#8])-[#9]`
- **chloroacetamides**: `[#8]=[#6](-[#6]-[#17])-[#7]`
- **cyanoacrylamides**: `[#7]-[#6](=[#8])-[#6](-[#6]#[#7])=[#6]`
- **epoxides**: `[#6]1-[#6]-[#8]-1`
- **aziridines**: `[#6]1-[#6]-[#7]-1`
- **disulfides**: `[#16]-[#16]`
- **aldehydes**: `[#6](=[#8])-[#1]`
- **vinyl sulfones**: `[#6]=[#6]-[#16](=[#8])(=[#8])-[#7]`
- **boronic acids/esters**: `[#6]-[#5](-[#8])-[#8]`
- **acrylamides**: `[#6]=[#6]-[#6](=[#8])-[#7]`
- **cyanamides**: `[#6]-[#7](-[#6]#[#7])-[#6]`
- **chlorofluoroacetamides**: `[#7]-[#6](=[#8])-[#6](-[#9])-[#17]`
- **butynamides**: `[#6]#[#6]-[#6](=[#8])-[#7]-[#6]`
- **chloropropionamides**: `[#7]-[#6](=[#8])-[#6](-[#6])-[#17]`
- **fluorosulfates**: `[#8]=[#16](=[#8])(-[#9])-[#8]`
- **beta lactams**: `[#7]1-[#6]-[#6]-[#6]-1=[#8]`

These patterns are written as SMARTS, so they can be matched against generated molecules directly (see the sketch at the end of this card).

### Generating a drug-like molecule which obeys the Lipinski rule of five

```python
import torch
import transformers

model_id = "/path/to/your/model"  # local path to the model, or its Hugging Face Hub ID

system_txt = "You love and excel at generating SMILES strings of drug-like molecules"
user_txt = (
    "Output a SMILES string for a drug like molecule with the following properties: "
    "<= 5 H-bond donors, <= 10 H-bond acceptors, <= 500 molecular weight, <= 5 logP:"
)
prompt = f"### Instruction:\n{system_txt}\n\n### Input:\n{user_txt}\n\n### Response:\n"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

outputs = pipeline(
    prompt,
    max_new_tokens=128,
    do_sample=True,          # sampling is required for num_return_sequences > 1
    temperature=1.0,
    num_return_sequences=4,
    return_full_text=False,  # return only the generated SMILES, not the prompt
)

for output in outputs:
    print(output["generated_text"])
```

You can use `num_return_sequences` to generate many SMILES strings in a single batch, though the batch size is limited by your available memory.
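
### Checking generated SMILES against the requested properties

The model is not guaranteed to satisfy every constraint in the prompt, so it is worth verifying its outputs. Below is a minimal sketch, assuming RDKit is installed (`pip install rdkit`); `passes_lipinski` is a hypothetical helper name, not part of this repository, and the thresholds mirror the example prompt above.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def passes_lipinski(smiles: str) -> bool:
    """Hypothetical helper: True if the SMILES parses and meets the
    constraints requested in the example prompt above."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # the model produced an invalid SMILES string
        return False
    return (
        Lipinski.NumHDonors(mol) <= 5
        and Lipinski.NumHAcceptors(mol) <= 10
        and Descriptors.MolWt(mol) <= 500
        and Descriptors.MolLogP(mol) <= 5
    )

# e.g. filter the pipeline outputs from the example above
smiles_strings = ["CC(=O)Oc1ccccc1C(=O)O"]  # aspirin, as a stand-in
print([s for s in smiles_strings if passes_lipinski(s)])
```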
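
### Checking for covalent warheads

The warhead list above is given as SMARTS patterns, so you can screen generated molecules for them with a substructure search. A minimal sketch, again assuming RDKit; the `WARHEADS` dict copies two patterns from the list above and is deliberately not exhaustive.

```python
from rdkit import Chem

# Two of the warhead SMARTS patterns from the list above (not exhaustive).
WARHEADS = {
    "acrylamides": "[#6]=[#6]-[#6](=[#8])-[#7]",
    "epoxides": "[#6]1-[#6]-[#8]-1",
}

def find_warheads(smiles: str) -> list[str]:
    """Return the names of any listed warheads present in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return []
    return [
        name
        for name, smarts in WARHEADS.items()
        if mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))
    ]

print(find_warheads("C=CC(=O)Nc1ccccc1"))  # N-phenylacrylamide -> ['acrylamides']
```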