---
base_model: unsloth/Qwen3-1.7B
library_name: peft
license: mit
datasets:
- moremilk/ToT-Biology
language:
- en
pipeline_tag: text-generation
tags:
- sft
- trl
- unsloth
- transformers
- biology
- science
metrics:
- accuracy
---

# Model Card for BioGenesis-ToT

## Model Details

### Model Description

BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology. It was trained on the [moremilk/ToT-Biology](https://huggingface.co/datasets/moremilk/ToT-Biology) dataset, a reasoning-rich collection of biology questions that emphasizes *why* and *how* processes occur, rather than simply *what* happens.

The model demonstrates strong capabilities in:

- Structured biological explanation generation
- Logical and causal reasoning
- Tree-of-Thought (ToT) reasoning in scientific contexts
- Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology)

## Uses

### 🚀 Intended Use

- Educational and scientific explanation generation
- Biological reasoning and tutoring applications
- Model interpretability research
- Generating training data for reasoning-focused LLMs

### ⚠️ Limitations

- Not a replacement for expert biological judgment
- May occasionally over-generalize or oversimplify complex phenomena
- Tuned for reasoning quality within biological contexts (not trained for creative writing or coding)

## Evaluation

Evaluated on [emre/TARA_Turkish_LLM_Benchmark](https://huggingface.co/datasets/emre/TARA_Turkish_LLM_Benchmark). The better score in each row is shown in bold.

| Category | BioGenesis-ToT | Qwen3-1.7B |
| -------------------------------------------------------- | -------------- | ---------- |
| Scientific Explanation and Hypothesis Evaluation (RAG) | **66.36** | 61.82 |
| Ethical Dilemma Assessment | **55.45** | 47.27 |
| Complex Scenario Analysis and Drawing Conclusions | **61.82** | 59.09 |
| Constrained Creative Writing | **18.18** | 9.09 |
| Logical Inference (Text-Based) | 49.09 | **68.18** |
| Mathematical Reasoning | **42.73** | 37.27 |
| Planning and Optimization Problems (Text-Based) | **52.73** | 25.45 |
| Python Code Analysis and Debugging | **51.82** | 50.00 |
| Generating SQL Query (From Schema/Meta) | **39.09** | 36.36 |
| Cause-Effect Relationship in Historical Events (RAG) | **77.27** | 73.64 |
| **Overall** | **51.45** | 46.82 |

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-1.7B",
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

question = """
Describe the composition of the plasma membrane and explain how its structure
relates to its function of selective permeability.
"""

messages = [
    {"role": "user", "content": question}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=2200,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```

**For pipeline:**

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-1.7B")
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

question = """
Describe the composition of the plasma membrane and explain how its structure
relates to its function of selective permeability.
"""

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": question}
]
pipe(messages)
```

## 🧪 Dataset: moremilk/ToT-Biology

The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology. It is designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems.

It spans a wide range of biological subdomains:

- Foundational biology: cell biology, genetics, evolution, and ecology
- Advanced topics: systems biology, synthetic biology, computational biophysics
- Applied domains: medicine, agriculture, bioengineering, and environmental science

Dataset features include:

- 🧩 Logical reasoning styles: deductive, inductive, abductive, causal, and analogical
- 🧠 Problem-solving techniques: decomposition, elimination, systems thinking, trade-off analysis
- 🔬 Real-world problem contexts: experiment design, pathway mapping, and data interpretation
- 🌍 Practical relevance: bridging theoretical reasoning and applied biological insight
- 🎓 Educational focus: for both AI training and human learning in scientific reasoning

## 🧭 Objective

This fine-tuning project aims to build an interpretable reasoning model capable of:

- Explaining biological mechanisms clearly and coherently
- Demonstrating transparent, step-by-step thought processes
- Applying logical reasoning techniques to biological and interdisciplinary problems
- Supporting educational and research use cases where reasoning transparency matters

## Citation

**BibTeX:**

```bibtex
@misc{khazarai/BioGenesis-ToT,
  title      = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning},
  author     = {Rustam Shiriyev},
  year       = {2025},
  publisher  = {Hugging Face},
  base_model = {Qwen3-1.7B},
  dataset    = {moremilk/ToT-Biology},
  license    = {MIT}
}
```

### Framework versions

- PEFT 0.15.2
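With `enable_thinking=True` in the chat template, Qwen3-style models emit their reasoning inside a `<think>...</think>` block before the final answer. The helper below is a small sketch for separating the two (the `split_thinking` function is illustrative, not part of this repository, and assumes at most one `<think>` block per completion):

```python
def split_thinking(completion: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    Assumes at most one <think>...</think> block, as produced when the
    chat template is applied with enable_thinking=True.
    """
    start_tag, end_tag = "<think>", "</think>"
    if start_tag in completion and end_tag in completion:
        before, _, rest = completion.partition(start_tag)
        reasoning, _, answer = rest.partition(end_tag)
        return reasoning.strip(), (before + answer).strip()
    # No thinking block: the whole completion is the answer.
    return "", completion.strip()


reasoning, answer = split_thinking(
    "<think>Recall the fluid mosaic model first.</think>"
    "The plasma membrane is a phospholipid bilayer..."
)
print(answer)  # The plasma membrane is a phospholipid bilayer...
```

This keeps the visible answer clean while preserving the reasoning trace for logging or interpretability analysis.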
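The Overall row in the evaluation table appears to be the unweighted mean of the ten per-category scores; the numbers check out to two decimal places:

```python
# Per-category scores copied from the evaluation table above.
biogenesis = [66.36, 55.45, 61.82, 18.18, 49.09, 42.73, 52.73, 51.82, 39.09, 77.27]
qwen3 = [61.82, 47.27, 59.09, 9.09, 68.18, 37.27, 25.45, 50.00, 36.36, 73.64]

# Unweighted mean, rounded to two decimals as in the table.
overall_biogenesis = round(sum(biogenesis) / len(biogenesis), 2)
overall_qwen3 = round(sum(qwen3) / len(qwen3), 2)

print(overall_biogenesis, overall_qwen3)  # 51.45 46.82
```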