Text Generation
Transformers
GGUF
inikitin's picture
Update README.md
88188eb verified
---
license: apache-2.0
base_model:
- Fortytwo-Network/Strand-Rust-Coder-14B-v1
base_model_relation: quantized
datasets:
- Fortytwo-Network/Strandset-Rust-v1
pipeline_tag: text-generation
library_name: transformers
---
![image/jpeg](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F63aeda3a2314b93f9e706a68%2FI6WwY8U7I5V8lc138UmGt.jpeg%3C%2Fspan%3E)%3C!-- HTML_TAG_END -->
# Strand-Rust-Coder-14B-v1
## Overview
**Strand-Rust-Coder-14B-v1** is the first domain-specialized Rust language model created through **Fortytwo’s Swarm Inference**, a decentralized AI architecture where multiple models collaboratively generate, validate, and rank outputs through peer consensus.
The model fine-tunes **Qwen2.5-Coder-14B** for Rust-specific programming tasks using a **191K-example synthetic dataset** built via multi-model generation and peer-reviewed validation.
It achieves **43–48% accuracy** on Rust-specific benchmarks – surpassing much larger proprietary models like GPT-5 Codex on Rust tasks – while maintaining competitive general coding performance.
## Key Features
- **Rust-specialized fine-tuning** on 15 diverse programming task categories
- **Peer-validated synthetic dataset** (191,008 verified examples, 94.3% compile rate)
- **LoRA-based fine-tuning** for efficient adaptation
- **Benchmarked across Rust-specific suites:**
- **RustEvo^2**
- **Evaluation on Hold-Out Set**
- **Deployed in the Fortytwo decentralized inference network** for collective AI reasoning
---
## Performance Summary
| **Model** | **Hold-Out Set** | **RustEvo^2** |
|------------|------------------|---------------|
| **Fortytwo-Rust-One-14B (Ours)** | **48.00%** | **43.00%** |
| openai/gpt-5-codex | 47.00% | 28.00% |
| anthropic/claude-sonnet-4.5 | 46.00% | 21.00% |
| anthropic/claude-3.7-sonnet | 42.00% | 31.00% |
| qwen/qwen3-max | 42.00% | 40.00% |
| qwen/qwen3-coder-plus | 41.00% | 22.00% |
| x-ai/grok-4 | 39.00% | 37.00% |
| deepseek/deepseek-v3.1-terminus | 37.00% | 33.00% |
| Qwen3-Coder-30B-A3B-Instruct | 36.00% | 20.00% |
| openai/gpt-4o-latest | 34.00% | 39.00% |
| deepseek/deepseek-chat | 34.00% | 41.00% |
| google/gemini-2.5-flash | 33.00% | 7.00% |
| Qwen2.5-Coder-14B-Instruct (Base) | 29.00% | 30.00% |
| Qwen2.5-Coder-32B-Instruct | 29.00% | 31.00% |
| google/gemini-2.5-pro | 28.00% | 22.00% |
| qwen/qwen-2.5-72b | 28.00% | 32.00% |
| Tesslate/Tessa-Rust-T1-7B | 23.00% | 19.00% |
*Benchmarks on code tasks measured using unit-test pass rate@1 in Docker-isolated Rust 1.86.0 environment.*
---
## Task Breakdown
| Task | Base | Strand-14B |
|------|------|-------------|
| test_generation | 0.00 | 0.51 |
| api_usage_prediction | 0.27 | 0.71 |
| function_naming | 0.53 | 0.87 |
| code_refactoring | 0.04 | 0.19–0.20 |
| variable_naming | 0.87 | 1.00 |
| code_generation | 0.40 | 0.49 |
Largest improvements appear in *test generation*, *API usage prediction*, and *refactoring* – areas demanding strong semantic reasoning about Rust’s ownership and lifetime rules.
---
## Dataset
**Fortytwo-Network/Strandset-Rust-v1 (191,008 examples, 15 categories)**
Built through Fortytwo’s *Swarm Inference* pipeline, where multiple SLMs generate and cross-validate examples with peer review consensus and output aggregation.
- 94.3% compile success rate
- 73.2% consensus acceptance
- Coverage of 89% of Rust language features
- Tasks include:
- `code_generation`, `code_completion`, `bug_detection`, `refactoring`, `optimization`
- `docstring_generation`, `code_review`, `summarization`, `test_generation`
- `naming`, `API usage prediction`, `search`
Dataset construction involved 2,383 crates from crates.io, automatic compilation tests, and semantic validation of ownership and lifetime correctness.
Dataset: [Fortytwo-Network/Strandset-Rust-v1](https://huggingface.co/datasets/Fortytwo-Network/Strandset-Rust-v1)
---
## Training Configuration
| Setting | Value |
|----------|-------|
| Base model | Qwen2.5-Coder-14B-Instruct |
| Method | LoRA (r=64, α=16) |
| Learning rate | 5e-5 |
| Batch size | 128 |
| Epochs | 3 |
| Optimizer | AdamW |
| Precision | bfloat16 |
| Objective | Completion-only loss |
| Context length | 32,768 |
| Framework | PyTorch + FSDP + Flash Attention 2 |
| Hardware | 8× H200 GPUs |
---
## Model Architecture
- **Base:** Qwen2.5-Coder (14 B parameters, GQA attention, extended RoPE embeddings)
- **Tokenizer:** 151 k vocabulary optimized for Rust syntax
- **Context:** 32 k tokens
- **Fine-tuning:** Parameter-efficient LoRA adapters (≈1% of parameters updated)
- **Deployment:** Compatible with local deployment and Fortytwo Capsule runtime for distributed swarm inference
---
## Evaluation Protocol
- All evaluations executed in Docker-isolated Rust 1.86.0 environment
- **Code tasks:** measured via unit test pass rate
- **Documentation & naming tasks:** scored via LLM-based correctness (Claude Sonnet 4 judge)
- **Code completion & API tasks:** syntax-weighted Levenshtein similarity
- **Comment generation:** compilation success metric
---
## Why It Matters
Rust is a high-safety, low-level language with complex ownership semantics that make it uniquely challenging for general-purpose LLMs.
At the same time, there is simply **not enough high-quality training data on Rust**, as it remains a relatively modern and rapidly evolving language.
This scarcity of large, reliable Rust datasets – combined with the language’s intricate borrow checker and type system – makes it an ideal benchmark for evaluating true model understanding and reasoning precision.
**Strand-Rust-Coder** demonstrates how **specialized models** can outperform giant centralized models – achieving domain mastery with a fraction of the compute.
Through **Fortytwo’s Swarm Inference**, the network was able to generate an **extremely accurate synthetic dataset**, enabling a **state-of-the-art Rust model** to be built through an efficient **LoRA fine-tune** rather than full retraining.
This work validates Fortytwo’s thesis: **intelligence can scale horizontally through networked specialization rather than centralized scale.**
---
## Research & References
- [Fortytwo: Swarm Inference with Peer-Ranked Consensus (arXiv)](https://arxiv.org/abs/2510.24801) - *Fortytwo Swarm Inference – Technical Report*
- [Self-Supervised Inference of Agents in Trustless Environments (arXiv)](https://arxiv.org/abs/2409.08386) – *High-level overview of Fortytwo architecture*
---
## Intended Use
- Rust code generation, completion, and documentation
- Automated refactoring and test generation
- Integration into code copilots and multi-agent frameworks
- Research on domain-specialized model training and evaluation
### Limitations
- May underperform on purely algorithmic or multi-language tasks (e.g., HumanEval-style puzzles).
- Not suitable for generating unverified production code without compilation and test validation.
---
## Integration with Fortytwo Network
Strand-Rust-Coder models are integrated into **Fortytwo’s decentralized Swarm Inference Network**, where specialized models collaborate and rank each other’s outputs.
This structure enables **peer-reviewed inference**, improving reliability while reducing hallucinations and cost.
To run a Fortytwo node or contribute your own models and fine-tunes, visit: [fortytwo.network](https://fortytwo.network)
---
## GGUF Quantized Versions
This repository provides **GGUF-format quantizations** of the model [Fortytwo-Network/Strand-Rust-Coder-14B-v1](https://huggingface.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1), optimized for local inference using tools such as **llama.cpp**, **Jan**, **Ollama**, **LM Studio** and other compatible runtimes.
These quantizations significantly reduce memory requirements while preserving near-original accuracy, making deployment possible on a wide range of consumer hardware.
| **Quantization** | **File Size** | **Bit Precision** | **Description** |
|------------------|-----------|------------------|----------------|
| **Q8_0** | 15.7 GB | **8-bit** | Near-full precision, for most demanding local inference |
| **Q6_K** | 12.1 GB | **6-bit** | Balanced performance and efficiency |
| **Q5_K_M** | 10.5 GB | **5-bit** | Lightweight deployment with strong accuracy retention |
| **Q4_K_M** | 8.99 GB | **4-bit** | Ultra-fast, compact variant for consumer GPUs and laptops |
---
### Usage
You can load the GGUF models with **llama.cpp** or compatible backends:
```bash
./main -m models/Strand-Rust-Coder-14B-v1.Q5_K_M.gguf -p "Write a Rust function that reads a file line by line."
```
Or run interactively in **Jan**, **LM Studio** or **Ollama** by simply importing the model.
---
### License
These quantized weights are distributed under the same **Apache 2.0 License** as the original model.
**Fortytwo – An open, networked intelligence shaped collectively by its participants**
Join the swarm: [fortytwo.network](https://fortytwo.network)
X: [@fortytwo](https://x.com/fortytwo)