---
license: apache-2.0
datasets:
- tatsu-lab/alpaca
- yizhongw/self_instruct
language:
- en
base_model:
- meta-llama/Llama-2-7b-hf
- meta-llama/Llama-3.1-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.2
---

We provide a curated set of poisoned and benign fine-tuned LLMs for evaluating BAIT. The model zoo follows this file structure:

```
BAIT-ModelZoo/
├── base_models/
│   ├── BASE/MODEL/1/FOLDER
│   ├── BASE/MODEL/2/FOLDER
│   └── ...
├── models/
│   ├── id-0001/
│   │   ├── model/
│   │   │   └── ...
│   │   └── config.json
│   ├── id-0002/
│   └── ...
└── METADATA.csv
```

```base_models``` stores the pretrained LLMs downloaded from Hugging Face. We evaluate BAIT on the following three LLM architectures:

- [Llama-2-7B-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

The ```models``` directory contains the fine-tuned models, both benign and backdoored, organized by unique identifiers. Each model folder includes:

- The model files
- A ```config.json``` file with metadata about the model, including:
  - Fine-tuning hyperparameters
  - Fine-tuning dataset
  - Whether it is backdoored or benign
  - Backdoor attack type, injected trigger, and target (if applicable)

The ```METADATA.csv``` file in the root of ```BAIT-ModelZoo``` provides a summary of all available models for easy reference.

The model zoo currently contains 91 models, and we will keep updating it with new models.
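
To make the layout concrete, below is a minimal Python sketch of how one might browse ```METADATA.csv``` and load one fine-tuned model together with its per-model ```config.json```. The paths follow the directory structure above; the ```id-0001``` folder is just an example, the CSV columns and JSON fields are whatever each file actually contains (no fixed schema is assumed), and the sketch assumes the ```model/``` subfolder holds full weights in standard Hugging Face format.

```python
# Minimal sketch for browsing the model zoo; not an official loading script.
import json
from pathlib import Path

import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer

ZOO_ROOT = Path("BAIT-ModelZoo")

# Summary of all available models (columns are whatever METADATA.csv provides).
metadata = pd.read_csv(ZOO_ROOT / "METADATA.csv")
print(metadata.head())

# Pick one fine-tuned model by its identifier (example folder from the tree above).
model_dir = ZOO_ROOT / "models" / "id-0001"

# Per-model metadata: hyperparameters, dataset, benign/backdoored flag,
# and attack details if applicable.
with open(model_dir / "config.json") as f:
    model_config = json.load(f)
print(model_config)

# Load the fine-tuned weights and tokenizer with Hugging Face transformers,
# assuming full weights are stored under model_dir / "model".
tokenizer = AutoTokenizer.from_pretrained(model_dir / "model")
model = AutoModelForCausalLM.from_pretrained(model_dir / "model", torch_dtype="auto")
```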