---
base_model: unsloth/Qwen3-1.7B
library_name: peft
license: mit
datasets:
- moremilk/ToT-Biology
language:
- en
pipeline_tag: text-generation
tags:
- sft
- trl
- unsloth
- transformers
- biology
- science
metrics:
- accuracy
---

# Model Card for BioGenesis-ToT

## Model Details

### Model Description

BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology. It was trained on the [moremilk/ToT-Biology](https://huggingface.co/datasets/moremilk/ToT-Biology) dataset, a reasoning-rich collection of biology questions that emphasizes *why* and *how* processes occur, rather than simply *what* happens.

The model demonstrates strong capabilities in:

- Structured biological explanation generation
- Logical and causal reasoning
- Tree-of-Thought (ToT) reasoning in scientific contexts
- Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology)

## Uses

### 🚀 Intended Use

- Educational and scientific explanation generation
- Biological reasoning and tutoring applications
- Model interpretability research
- Generating training data for reasoning-focused LLMs

### ⚠️ Limitations

- Not a replacement for expert biological judgment
- May occasionally over-generalize or oversimplify complex phenomena
- Tuned for reasoning quality within biological contexts (not trained for creative writing or coding)

## Evaluation

Evaluated on [emre/TARA_Turkish_LLM_Benchmark](https://huggingface.co/datasets/emre/TARA_Turkish_LLM_Benchmark). The better score in each row is shown in bold.

| Category | BioGenesis-ToT | Qwen3-1.7B |
| -------------------------------------------------------- | -------------- | ---------- |
| Scientific Explanation and Hypothesis Evaluation (RAG) | **66.36** | 61.82 |
| Ethical Dilemma Assessment | **55.45** | 47.27 |
| Complex Scenario Analysis and Drawing Conclusions | **61.82** | 59.09 |
| Constrained Creative Writing | **18.18** | 9.09 |
| Logical Inference (Text-Based) | 49.09 | **68.18** |
| Mathematical Reasoning | **42.73** | 37.27 |
| Planning and Optimization Problems (Text-Based) | **52.73** | 25.45 |
| Python Code Analysis and Debugging | **51.82** | 50.00 |
| Generating SQL Query (From Schema/Meta) | **39.09** | 36.36 |
| Cause-Effect Relationship in Historical Events (RAG) | **77.27** | 73.64 |
| **Overall** | **51.45** | 46.82 |

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-1.7B",
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

question = """
Describe the composition of the plasma membrane and explain how its structure
relates to its function of selective permeability.
"""

messages = [
    {"role": "user", "content": question}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=2200,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```

**For pipeline:**

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-1.7B")
model = PeftModel.from_pretrained(base_model, "khazarai/BioGenesis-ToT")

question = """
Describe the composition of the plasma membrane and explain how its structure
relates to its function of selective permeability.
"""

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": question}
]
pipe(messages)
```

## 🧪 Dataset: moremilk/ToT-Biology

The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology. It is designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems.

It spans a wide range of biological subdomains:

- Foundational biology: cell biology, genetics, evolution, and ecology
- Advanced topics: systems biology, synthetic biology, computational biophysics
- Applied domains: medicine, agriculture, bioengineering, and environmental science

Dataset features include:

- 🧩 Logical reasoning styles: deductive, inductive, abductive, causal, and analogical
- 🧠 Problem-solving techniques: decomposition, elimination, systems thinking, trade-off analysis
- 🔬 Real-world problem contexts: experiment design, pathway mapping, and data interpretation
- 🌍 Practical relevance: bridging theoretical reasoning and applied biological insight
- 🎓 Educational focus: for both AI training and human learning in scientific reasoning

## 🧭 Objective

This fine-tuning project aims to build an interpretable reasoning model capable of:

- Explaining biological mechanisms clearly and coherently
- Demonstrating transparent, step-by-step thought processes
- Applying logical reasoning techniques to biological and interdisciplinary problems
- Supporting educational and research use cases where reasoning transparency matters

## Citation

**BibTeX:**

```bibtex
@misc{khazarai/BioGenesis-ToT,
  title      = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning},
  author     = {Rustam Shiriyev},
  year       = {2025},
  publisher  = {Hugging Face},
  base_model = {Qwen3-1.7B},
  dataset    = {moremilk/ToT-Biology},
  license    = {MIT}
}
```

### Framework versions

- PEFT 0.15.2
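With `enable_thinking=True` in the chat template, Qwen3-style models emit their reasoning inside a `<think>...</think>` block before the final answer. The helper below is a small sketch for separating the two (the `split_thinking` function is illustrative, not part of this repository, and assumes at most one `<think>` block per completion):

```python
def split_thinking(completion: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    Assumes at most one <think>...</think> block, as produced when the
    chat template is applied with enable_thinking=True.
    """
    start_tag, end_tag = "<think>", "</think>"
    if start_tag in completion and end_tag in completion:
        before, _, rest = completion.partition(start_tag)
        reasoning, _, answer = rest.partition(end_tag)
        return reasoning.strip(), (before + answer).strip()
    # No thinking block: the whole completion is the answer.
    return "", completion.strip()


reasoning, answer = split_thinking(
    "<think>Recall the fluid mosaic model first.</think>"
    "The plasma membrane is a phospholipid bilayer..."
)
print(answer)  # The plasma membrane is a phospholipid bilayer...
```

This keeps the visible answer clean while preserving the reasoning trace for logging or interpretability analysis.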
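The Overall row in the evaluation table appears to be the unweighted mean of the ten per-category scores; the numbers check out to two decimal places:

```python
# Per-category scores copied from the evaluation table above.
biogenesis = [66.36, 55.45, 61.82, 18.18, 49.09, 42.73, 52.73, 51.82, 39.09, 77.27]
qwen3 = [61.82, 47.27, 59.09, 9.09, 68.18, 37.27, 25.45, 50.00, 36.36, 73.64]

# Unweighted mean, rounded to two decimals as in the table.
overall_biogenesis = round(sum(biogenesis) / len(biogenesis), 2)
overall_qwen3 = round(sum(qwen3) / len(qwen3), 2)

print(overall_biogenesis, overall_qwen3)  # 51.45 46.82
```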