---
library_name: transformers
tags:
- en-nl-translation
- translation
license: apache-2.0
datasets:
- OpenOranje/ReOpus-ApolloBooks-EN-NL-1M
language:
- nl
- en
metrics:
- bleu
- rouge
base_model:
- Qwen/Qwen3-0.6B
---
# OpenOranje/TweeTaal-nl-en-0.6B
## Model Description
TweeTaal-nl-en-0.6B has been fine-tuned on Dutch-English and English-Dutch translation pairs to provide accurate, fluent translations in both directions. The compact 0.6B parameter size makes it suitable for deployment in resource-constrained environments while maintaining strong translation quality.
### Intended Use
**Primary Use Case**: Translating Dutch text to English / English text to Dutch across various domains
**Recommended Applications**:
- General-purpose Dutch-to-English and English-to-Dutch translation
- Content localization
- Cross-lingual communication tools
- Educational language learning applications
## Performance
### Benchmark Results
<img src="https://github.com/OpenOranje/content/raw/main/images/translation-benchmarks.png" alt="Benchmarks" width="800">
## Training Details
### Training Procedure
**Method**: Supervised Fine-Tuning (SFT)
- The model was trained on parallel Dutch-English text pairs
- Standard cross-entropy loss optimization
- The base Qwen3-0.6B model was adapted specifically for translation tasks
### Training Data
The model was trained on the [ReOpus-ApolloBooks-EN-NL-1M](https://huggingface.co/datasets/OpenOranje/ReOpus-ApolloBooks-EN-NL-1M) parallel corpus of Dutch-English sentence pairs.
## Usage
### Basic Usage Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "OpenOranje/TweeTaal-nl-en-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare input in the expected prompt format
dutch_text = "Hallo, hoe gaat het met je?"
prompt = f"Translate the following text from Dutch to English:\n{dutch_text}"
messages = [{"role": "user", "content": prompt}]

# return_dict=True yields a dict with input_ids and attention_mask,
# which can be unpacked directly into generate()
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

# Generate translation (do_sample=True is needed for temperature to take effect)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt
translation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(translation)
```
### Prompt Format
The model expects the user message in one of the following formats:
```
Translate the following text from Dutch to English:\n{dutch_text}
```
```
Translate the following text from English to Dutch:\n{english_text}
```
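For programmatic use, the two prompt formats above can be wrapped in a small helper that produces the chat message list expected by `apply_chat_template`. This is a minimal sketch; the function name and `direction` codes are illustrative, not part of the model's API.

```python
def build_translation_prompt(text: str, direction: str = "nl-en") -> list[dict]:
    """Build a chat message list in the prompt format the model expects.

    direction: "nl-en" for Dutch -> English, "en-nl" for English -> Dutch.
    """
    if direction == "nl-en":
        instruction = "Translate the following text from Dutch to English:"
    elif direction == "en-nl":
        instruction = "Translate the following text from English to Dutch:"
    else:
        raise ValueError(f"Unsupported direction: {direction}")
    return [{"role": "user", "content": f"{instruction}\n{text}"}]


# Usage: pass the result to tokenizer.apply_chat_template(...)
messages = build_translation_prompt("Hallo, hoe gaat het met je?", "nl-en")
```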
### Inference Parameters
Recommended generation parameters:
- **Temperature**: 0.7 (adjust for creativity vs. consistency)
- **Max tokens**: Set based on expected translation length
- **Top-p**: 0.9 (nucleus sampling)
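The settings above can be collected into a single kwargs dict and unpacked into `model.generate`. A minimal sketch; the `max_new_tokens` value of 256 is an illustrative cap, not a model requirement.

```python
# Recommended sampling settings for translation; tune per use case.
generation_kwargs = {
    "do_sample": True,      # sampling must be enabled for temperature/top_p to apply
    "temperature": 0.7,     # lower = more literal, higher = more varied
    "top_p": 0.9,           # nucleus sampling
    "max_new_tokens": 256,  # illustrative cap; size to expected translation length
}

# Usage: outputs = model.generate(**inputs, **generation_kwargs)
```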
## Limitations
- **Context Length**: Training sequences were capped at 4,096 tokens; quality may degrade on longer inputs
- **Rare Words**: May struggle with highly specialized terminology or rare vocabulary not well-represented in training data
- **Informal Language**: Performance on slang, dialects, or very informal Dutch may vary
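Given the 4,096-token training context, long documents are best translated in sentence-aligned chunks. The sketch below uses whitespace word count as a cheap proxy for token count (actual counts depend on the tokenizer, so the budget should be conservative); the function name and default budget are illustrative.

```python
import re


def chunk_text(text: str, max_words: int = 1500) -> list[str]:
    """Split text into sentence-aligned chunks under a rough size budget.

    Word count approximates token count; real token counts depend on the
    tokenizer, so keep max_words well under the 4,096-token context limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Flush the current chunk before it would exceed the budget
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be translated independently and the results concatenated.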
## Ethical Considerations
- **Training Data Bias**: The model may reflect biases present in the training data
- **Cultural Nuances**: Some cultural expressions may not translate perfectly
## Contact
For questions or issues, please contact: [email protected]
---
## Additional Resources
- **Base Model**: [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training Code**: [TBD]
- **Dataset**: [Data](https://huggingface.co/datasets/OpenOranje/ReOpus-ApolloBooks-EN-NL-1M)
## Version History
- **v1.0** (2025-10-24): Initial release
---
**License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)