metadata
license: apache-2.0
datasets:
  - tatsu-lab/alpaca
  - yizhongw/self_instruct
language:
  - en
base_model:
  - meta-llama/Llama-2-7b-hf
  - meta-llama/Llama-3.1-8B-Instruct
  - mistralai/Mistral-7B-Instruct-v0.2

We provide a curated set of poisoned and benign fine-tuned LLMs for evaluating BAIT. The model zoo follows this file structure:

BAIT-ModelZoo/
├── base_models/
│   ├── BASE/MODEL/1/FOLDER
│   ├── BASE/MODEL/2/FOLDER
│   └── ...
├── models/
│   ├── id-0001/
│   │   ├── model/
│   │   │   └── ...
│   │   └── config.json
│   ├── id-0002/
│   └── ...
└── METADATA.csv
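
As a rough illustration of navigating this layout, the sketch below collects the per-model folders under models/. The function name `list_model_dirs` is hypothetical, and the code assumes only that a local copy of the zoo exists and that model folders follow the `id-XXXX` naming shown in the tree.

```python
from pathlib import Path


def list_model_dirs(zoo_root):
    """Return the id-* folders under <zoo_root>/models, sorted by name.

    Assumption: model folders are named id-0001, id-0002, ... as in the
    tree above. Non-directory entries are ignored.
    """
    models_dir = Path(zoo_root) / "models"
    return sorted(p for p in models_dir.glob("id-*") if p.is_dir())
```

Calling `list_model_dirs("BAIT-ModelZoo")` on a local copy would return one `Path` per model folder, which can then be passed to whatever loading code you use.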

The base_models directory stores pretrained LLMs downloaded from Hugging Face. We evaluate BAIT on 3 LLM architectures.

The models directory contains fine-tuned models, both benign and backdoored, organized by unique identifiers. Each model folder includes:

  • The model files
  • A config.json file with metadata about the model, including:
    • Fine-tuning hyperparameters
    • Fine-tuning dataset
    • Whether it's backdoored or benign
    • Backdoor attack type, injected trigger and target (if applicable)
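
A minimal sketch of reading one model's config.json follows. The exact key names inside config.json are not listed here, so the code makes no assumption about them and simply returns the parsed JSON; `load_model_config` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_model_config(model_dir):
    """Parse <model_dir>/config.json and return its contents.

    Assumption: config.json sits directly inside the id-XXXX folder,
    next to the model/ subfolder, as shown in the file structure.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `sorted(load_model_config("BAIT-ModelZoo/models/id-0001"))` would show which metadata fields are actually present before you rely on any of them.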

The METADATA.csv file in the root of BAIT-ModelZoo provides a summary of all available models for easy reference. The model zoo currently contains 91 models, and we will keep adding new ones.
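
One possible way to inspect METADATA.csv with the Python standard library is sketched below. The column names are not documented here, so the code only assumes the file has a header row; `read_metadata` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def read_metadata(zoo_root):
    """Return (column_names, rows) from <zoo_root>/METADATA.csv.

    Assumption: the first line of the CSV is a header row. Each row is
    returned as a dict keyed by those header names.
    """
    csv_path = Path(zoo_root) / "METADATA.csv"
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), rows
```

Printing the returned column names first is a cheap way to learn the actual schema, and `len(rows)` gives the number of models listed in the summary.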