---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- code
- python
- docstring
- documentation
- code-generation
- local-llm
- privacy
- ollama
- qwen3
- knowledge-distillation
- developer-tools
base_model: Qwen/Qwen3-0.6B
pipeline_tag: text-generation
model-index:
- name: Distil-Localdoc-Qwen3-0.6B
  results:
  - task:
      type: text-generation
      name: Docstring Generation
    metrics:
    - type: accuracy
      value: 0.76
      name: LLM-as-Judge Accuracy
      verified: false
---
<div align="center">
  <img src="https://github.com/distil-labs/badges/blob/main/distillabs-logo.svg?raw=true" width="40%" alt="distil labs" />
</div>

---
|
<div align="center">
<table>
  <tr>
    <td align="center">
      <a href="https://www.distillabs.ai/?utm_source=hugging-face&utm_medium=referral&utm_campaign=distil-localdoc">
        <img src="https://github.com/distil-labs/badges/blob/main/badge-distillabs-home.svg?raw=true" alt="Homepage"/>
      </a>
    </td>
    <td align="center">
      <a href="https://github.com/distil-labs">
        <img src="https://github.com/distil-labs/badges/blob/main/badge-github.svg?raw=true" alt="GitHub"/>
      </a>
    </td>
    <td align="center">
      <a href="https://huggingface.co/distil-labs">
        <img src="https://github.com/distil-labs/badges/blob/main/badge-huggingface.svg?raw=true" alt="Hugging Face"/>
      </a>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://www.linkedin.com/company/distil-labs/">
        <img src="https://github.com/distil-labs/badges/blob/main/badge-linkedin.svg?raw=true" alt="LinkedIn"/>
      </a>
    </td>
    <td align="center">
      <a href="https://distil-labs-community.slack.com/join/shared_invite/zt-36zqj87le-i3quWUn2bjErRq22xoE58g">
        <img src="https://github.com/distil-labs/badges/blob/main/badge-slack.svg?raw=true" alt="Slack"/>
      </a>
    </td>
    <td align="center">
      <a href="https://x.com/distil_labs">
        <img src="https://github.com/distil-labs/badges/blob/main/badge-twitter.svg?raw=true" alt="Twitter"/>
      </a>
    </td>
  </tr>
</table>
</div>

# Distil-Localdoc-Qwen3-0.6B

A small language model (SLM) fine-tuned by Distil Labs to generate high-quality, Google-style Python docstrings. Optimized to run locally via Ollama, so your proprietary code never leaves your infrastructure.

**[GITHUB DEMO AND CODE](https://github.com/distil-labs/Distil-localdoc/)**
|
## Model Details

- **Developed by**: Distil Labs GmbH
- **License**: Apache 2.0
- **Fine-tuned from**: Qwen/Qwen3-0.6B
- **Model size**: 0.6B parameters
- **Deployment**: Local inference via Ollama
|
## Use-case

Given Python functions or methods without docstrings, the model generates complete, properly formatted documentation that follows the Google style guide.
|
**Before:**
```python
def calculate_total(items, tax_rate=0.08, discount=None):
    subtotal = sum(item['price'] * item['quantity'] for item in items)
    if discount:
        subtotal *= (1 - discount)
    return subtotal * (1 + tax_rate)
```
|
**After:**
```python
def calculate_total(items, tax_rate=0.08, discount=None):
    """
    Calculate the total cost of items, applying a tax rate and optionally a discount.

    Args:
        items: List of item dicts with 'price' and 'quantity' keys
        tax_rate: Tax rate expressed as a decimal (default 0.08)
        discount: Discount rate expressed as a decimal; if provided, the subtotal is multiplied by (1 - discount)

    Returns:
        Total amount after applying the discount and tax

    Example:
        >>> items = [{'price': 10, 'quantity': 2}, {'price': 5, 'quantity': 1}]
        >>> calculate_total(items, tax_rate=0.25, discount=0.2)
        25.0
    """
    subtotal = sum(item['price'] * item['quantity'] for item in items)
    if discount:
        subtotal *= (1 - discount)
    return subtotal * (1 + tax_rate)
```
|
The model handles:
- **Functions**: Parameter descriptions, return values, exceptions, and usage examples
- **Methods**: Instance and class method documentation with proper formatting
- **Note**: The tool skips double-underscore (dunder, `__xxx__`) methods
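
The dunder rule above takes only a few lines to express; a minimal sketch (this `is_dunder` helper is illustrative, not part of the tool's code):

```python
def is_dunder(name: str) -> bool:
    """True for double-underscore names such as __init__ or __repr__."""
    return len(name) > 4 and name.startswith("__") and name.endswith("__")

# Of these, the tool would skip only __init__:
skipped = [n for n in ("__init__", "greet", "_private") if is_dunder(n)]
```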
|
## Why Local?

**Privacy & Security**: Proprietary codebases contain intellectual property and trade secrets. Sending them to cloud APIs creates:
- IP exposure risks
- Compliance violations (GDPR, SOC 2, HIPAA)
- Security audit failures
- Dependency on external services

**Speed & Cost**: Document entire codebases in minutes, with no API rate limits or per-token charges.
|
## Training

The model was trained by knowledge distillation from the teacher model GPT-OSS-120B. We used 28 diverse Python functions and classes as seed data and supplemented them with 10,000 synthetic examples covering various domains:
- Data science and machine learning
- Web development (Flask, FastAPI, Django)
- DevOps and system utilities
- Algorithm implementations
- API clients and wrappers

The training data spans:
- Function complexities from simple helpers to async/await patterns
- Error handling patterns
- Different parameter types and return values
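
Concretely, each distillation pair maps an undocumented function to a teacher-written docstring. A minimal sketch of packing one such pair as a JSONL record (the field names and prompt wording are illustrative, not the actual training format):

```python
import json

def make_training_record(source_without_docstring: str, teacher_docstring: str) -> str:
    """Pack one (code, docstring) pair as a JSONL line (illustrative schema)."""
    record = {
        "prompt": ("Write a Google-style docstring for this function:\n\n"
                   + source_without_docstring),
        "completion": teacher_docstring,
    }
    return json.dumps(record)

line = make_training_record(
    "def add(a, b):\n    return a + b",
    "Add two numbers.\n\nArgs:\n    a: First addend.\n    b: Second addend.\n\n"
    "Returns:\n    The sum of a and b.",
)
```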
|
## Evaluation

We evaluated the model on 250 held-out test examples, using an LLM-as-a-judge methodology to assess the overall quality of the generated docstrings.

| Model              | Size | Accuracy    |
|--------------------|------|-------------|
| GPT-OSS (thinking) | 120B | 0.81 ± 0.02 |
| Qwen3 0.6B (tuned) | 0.6B | 0.76 ± 0.01 |
| Qwen3 0.6B (base)  | 0.6B | 0.55 ± 0.04 |
|
The fine-tuned model achieves **94%** of the teacher model's performance while running entirely on local hardware, with **zero API costs** and **complete privacy**.
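
An LLM-as-a-judge pass boils down to showing a strong model the function plus the candidate docstring and asking for a verdict. A minimal sketch of the prompt assembly (the wording is illustrative; the actual evaluation harness is not published here):

```python
def build_judge_prompt(function_source: str, candidate_docstring: str) -> str:
    """Assemble a pass/fail judging prompt (illustrative wording)."""
    return (
        "You are grading a Python docstring for accuracy, completeness, "
        "and Google-style formatting.\n\n"
        f"Function:\n{function_source}\n\n"
        f"Candidate docstring:\n{candidate_docstring}\n\n"
        "Answer with exactly one word: PASS or FAIL."
    )

prompt = build_judge_prompt("def add(a, b):\n    return a + b", "Add two numbers.")
```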
|
## How to Use

### Installation

Follow the instructions in the [GitHub repository](https://github.com/distil-labs/Distil-localdoc/).

Quick start:

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download and build the model
pip install huggingface_hub
hf download distil-labs/Distil-Localdoc-Qwen3-0.6B --local-dir distil-model
cd distil-model
ollama create localdoc_qwen3 -f Modelfile

# Run on your code
python localdoc_cli.py --file your_script.py
```
|
### CLI Usage

```bash
# Basic usage (generates Google-style docstrings)
python localdoc_cli.py --file my_module.py

# Use a specific model
python localdoc_cli.py --file my_module.py --model localdoc_qwen3
```
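
Besides the CLI, the Ollama-served model can be called from Python over Ollama's REST API. A hedged sketch using only the standard library (assumes `ollama create localdoc_qwen3` has been run and the server is listening on its default port; the prompt wording is illustrative, not the tool's internal prompt):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(function_source: str, model: str = "localdoc_qwen3") -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({
        "model": model,
        "prompt": "Add a Google-style docstring to this function:\n\n" + function_source,
        "stream": False,  # return one complete response instead of a token stream
    }).encode()

def docstring_for(function_source: str) -> str:
    """POST the request and return the model's generated text."""
    req = urllib.request.Request(OLLAMA_URL, data=build_request(function_source),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(docstring_for("def add(a, b):\n    return a + b"))
```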
|
The tool will:
1. Parse your Python file using the AST
2. Identify all functions and methods without docstrings (skipping dunder methods)
3. Generate appropriate docstrings based on code structure
4. Preserve all original code and existing docstrings
5. Output a new file with a `_documented` suffix
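
Steps 1–2 above can be sketched with the standard `ast` module (a minimal sketch, not the tool's actual implementation):

```python
import ast

def undocumented_functions(source: str) -> list[str]:
    """Names of functions/methods lacking a docstring, skipping dunders."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            is_dunder = node.name.startswith("__") and node.name.endswith("__")
            if not is_dunder and ast.get_docstring(node) is None:
                names.append(node.name)
    return names

code = '''
class Greeter:
    def __init__(self, name):
        self.name = name
    def greet(self):
        return f"Hi {self.name}"
def documented():
    """Already has one."""
'''
print(undocumented_functions(code))  # only greet needs a docstring
```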
|
## Model Sources

- **Homepage**: [https://distillabs.ai](https://distillabs.ai)
- **Repository**: [https://github.com/distil-labs/Distil-localdoc](https://github.com/distil-labs/Distil-localdoc)
- **Contact**: contact@distillabs.ai
|
## Citation

```bibtex
@software{distil_localdoc_2024,
  title  = {Distil-Localdoc: Local Python Documentation Generation with SLMs},
  author = {Distil Labs},
  year   = {2024},
  url    = {https://huggingface.co/distil-labs/Distil-Localdoc-Qwen3-0.6B}
}
```
|
## Community

- Follow us on [LinkedIn](https://www.linkedin.com/company/distil-labs/)
- Join our [Slack community](https://join.slack.com/t/distil-labs-community/shared_invite/zt-36zqj87le-i3quWUn2bjErRq22xoE58g)
- Star us on [GitHub](https://github.com/distil-labs/Distil-localdoc)