BAIT-ModelZoo / README.md

NoahShen

license: apache-2.0
datasets:
  - tatsu-lab/alpaca
  - yizhongw/self_instruct
language:
  - en
base_model:
  - meta-llama/Llama-2-7b-hf
  - meta-llama/Llama-3.1-8B-Instruct
  - mistralai/Mistral-7B-Instruct-v0.2

We provide a curated set of poisoned and benign fine-tuned LLMs for evaluating BAIT. The model zoo follows this file structure:

BAIT-ModelZoo/
├── base_models/
│   ├── BASE/MODEL/1/FOLDER  
│   ├── BASE/MODEL/2/FOLDER
│   └── ...
├── models/
│   ├── id-0001/
│   │   ├── model/
│   │   │   └── ...
│   │   └── config.json
│   ├── id-0002/
│   └── ...
└── METADATA.csv
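
A minimal sketch of walking this layout, assuming the zoo has already been downloaded to a local directory. The function name `list_model_dirs` and the `zoo_root` argument are illustrative and not part of the BAIT codebase:

```python
from pathlib import Path


def list_model_dirs(zoo_root):
    """Return the sorted id-XXXX folders found under <zoo_root>/models."""
    models_dir = Path(zoo_root) / "models"
    # Keep only directories whose name follows the id-XXXX pattern in the tree.
    return sorted(p for p in models_dir.glob("id-*") if p.is_dir())
```

Calling `list_model_dirs("BAIT-ModelZoo")` would return one `Path` per fine-tuned model folder, in identifier order.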

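base_models stores pretrained LLMs downloaded from Hugging Face. We evaluate BAIT on the following 3 LLM architectures:

- meta-llama/Llama-2-7b-hf
- meta-llama/Llama-3.1-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.2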

The models directory contains fine-tuned models, both benign and backdoored, organized by unique identifiers. Each model folder includes:

- The model files
- A config.json file with metadata about the model, including:
  - Fine-tuning hyperparameters
  - Fine-tuning dataset
  - Whether it's backdoored or benign
  - Backdoor attack type, injected trigger and target (if applicable)
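
The per-model metadata can be inspected with a few lines of Python. This is a sketch only: the exact key names inside config.json are not documented here, so the helper (a hypothetical `load_model_config`) returns the whole dictionary rather than assuming any particular field:

```python
import json
from pathlib import Path


def load_model_config(model_dir):
    """Read <model_dir>/config.json and return its contents as a dict."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)


def print_model_config(model_dir):
    # Key names are not assumed; print whatever the file actually contains.
    for key, value in load_model_config(model_dir).items():
        print(f"{key}: {value}")
```

For example, `print_model_config("BAIT-ModelZoo/models/id-0001")` would list every field recorded for that model.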

The METADATA.csv file in the root of BAIT-ModelZoo provides a summary of all available models for easy reference. The current model zoo contains 91 models. We will keep updating the model zoo with new models.
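
A short sketch for loading the summary file with the standard library, assuming METADATA.csv has a header row. Its column names are not specified here, so the hypothetical helper `read_metadata` returns them alongside the rows instead of hard-coding any:

```python
import csv
from pathlib import Path


def read_metadata(zoo_root):
    """Return (column_names, rows) read from <zoo_root>/METADATA.csv."""
    csv_path = Path(zoo_root) / "METADATA.csv"
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)  # one dict per model, keyed by the header row
        return reader.fieldnames, rows
```

`len(rows)` gives the number of models listed, which can be compared against the count stated above.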