---
license: apache-2.0
datasets:
- tatsu-lab/alpaca
- yizhongw/self_instruct
language:
- en
base_model:
- meta-llama/Llama-2-7b-hf
- meta-llama/Llama-3.1-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.2
---
We provide a curated set of poisoned and benign fine-tuned LLMs for evaluating BAIT. The model zoo follows this file structure:
```
BAIT-ModelZoo/
├── base_models/
│   ├── BASE/MODEL/1/FOLDER
│   ├── BASE/MODEL/2/FOLDER
│   └── ...
├── models/
│   ├── id-0001/
│   │   ├── model/
│   │   │   └── ...
│   │   └── config.json
│   ├── id-0002/
│   └── ...
└── METADATA.csv
```
`base_models` stores the pretrained LLMs downloaded from Hugging Face. We evaluate BAIT on the following three LLM architectures:
- meta-llama/Llama-2-7b-hf
- meta-llama/Llama-3.1-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.2
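For reference, the snippet below is a minimal sketch of loading one of these base architectures with the `transformers` library. Pointing `cache_dir` at `base_models/` is an assumption about how the local copies are organized; adjust the path to your setup.

```python
# Minimal sketch: load one of the base architectures with Hugging Face transformers.
# Using BAIT-ModelZoo/base_models/ as cache_dir is an assumption about the local layout.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "meta-llama/Llama-2-7b-hf"  # or Llama-3.1-8B-Instruct / Mistral-7B-Instruct-v0.2

tokenizer = AutoTokenizer.from_pretrained(base_model_name, cache_dir="BAIT-ModelZoo/base_models")
model = AutoModelForCausalLM.from_pretrained(base_model_name, cache_dir="BAIT-ModelZoo/base_models")
```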
The `models` directory contains the fine-tuned models, both benign and backdoored, organized by unique identifiers. Each model folder includes:
- The model files
- A `config.json` file with metadata about the model, including:
  - Fine-tuning hyperparameters
  - Fine-tuning dataset
  - Whether it is backdoored or benign
  - Backdoor attack type, injected trigger, and target (if applicable)
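The sketch below shows one way to walk the `models/` directory and read each `config.json`. The metadata keys used here (`is_backdoored`, `attack_type`) are illustrative assumptions, not the exact schema; check the actual files for the real key names.

```python
# Sketch: iterate over fine-tuned models and inspect their config.json metadata.
# Key names such as "is_backdoored" and "attack_type" are assumed for illustration.
import json
from pathlib import Path

zoo_root = Path("BAIT-ModelZoo")

for model_dir in sorted((zoo_root / "models").glob("id-*")):
    with open(model_dir / "config.json") as f:
        cfg = json.load(f)
    label = "backdoored" if cfg.get("is_backdoored") else "benign"  # assumed key
    print(model_dir.name, label, cfg.get("attack_type"))           # assumed key
```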
The `METADATA.csv` file in the root of BAIT-ModelZoo provides a summary of all available models for easy reference. The model zoo currently contains 91 models, and we will keep updating it with new models.
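A quick way to inspect this summary is to load it with pandas, as in the sketch below; no particular column names are assumed, `head()` simply previews whatever the CSV contains.

```python
# Sketch: preview the model zoo summary stored in METADATA.csv.
import pandas as pd

meta = pd.read_csv("BAIT-ModelZoo/METADATA.csv")
print(f"{len(meta)} models listed")
print(meta.head())  # preview the available metadata columns
```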