---
library_name: transformers
tags:
- en-nl-translation
- translation
license: apache-2.0
datasets:
- OpenOranje/ReOpus-ApolloBooks-EN-NL-1M
language:
- nl
- en
metrics:
- bleu
- rouge
base_model:
- Qwen/Qwen3-0.6B
---

# OpenOranje/TweeTaal-nl-en-0.6B

## Model Description

TweeTaal-nl-en-0.6B has been fine-tuned on Dutch-English and English-Dutch translation pairs to provide accurate, fluent translations in both directions. The compact 0.6B-parameter size makes it suitable for deployment in resource-constrained environments while maintaining strong translation quality.

### Intended Use

**Primary Use Case**: Translating Dutch text to English and English text to Dutch across various domains

**Recommended Applications**:
- General-purpose Dutch-to-English and English-to-Dutch translation
- Content localization
- Cross-lingual communication tools
- Educational language-learning applications

## Performance

### Benchmark Results

Benchmark results (BLEU, ROUGE): TBD

## Training Details

### Training Procedure

**Method**: Supervised Fine-Tuning (SFT)
- The model was trained on parallel Dutch-English text pairs
- Standard cross-entropy loss optimization
- The base Qwen3-0.6B model was adapted specifically for translation tasks

### Training Data

The model was trained on the [OpenOranje/ReOpus-ApolloBooks-EN-NL-1M](https://huggingface.co/datasets/OpenOranje/ReOpus-ApolloBooks-EN-NL-1M) Dutch-English parallel corpus.

## Usage

### Basic Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "OpenOranje/TweeTaal-nl-en-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare input
dutch_text = "Hallo, hoe gaat het met je?"
prompt = f"Translate the following text from Dutch to English:\n{dutch_text}"
messages = [{"role": "user", "content": prompt}]

# Tokenize with the chat template; return_dict=True yields a dict
# that can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

# Generate translation and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
translation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(translation)
```

### Prompt Format

The model expects input in one of the following formats:

```
Translate the following text from Dutch to English:\n{dutch_text}
```

```
Translate the following text from English to Dutch:\n{english_text}
```

### Inference Parameters

Recommended generation parameters:
- **Temperature**: 0.7 (adjust for creativity vs. consistency)
- **Max tokens**: set based on expected translation length
- **Top-p**: 0.9 (nucleus sampling)

## Limitations

- **Context Length**: trained with a 4096-token context window; longer inputs may be truncated
- **Rare Words**: may struggle with highly specialized terminology or rare vocabulary not well represented in the training data
- **Informal Language**: performance on slang, dialects, or very informal Dutch may vary

## Ethical Considerations

- **Training Data Bias**: the model may reflect biases present in the training data
- **Cultural Nuances**: some cultural expressions may not translate perfectly

## Contact

For questions or issues, please contact: theaisarth@proton.me or kartikaggarwal98@gmail.com

---

## Additional Resources

- **Base Model**: [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training Code**: TBD
- **Dataset**: [ReOpus-ApolloBooks-EN-NL-1M](https://huggingface.co/datasets/OpenOranje/ReOpus-ApolloBooks-EN-NL-1M)

## Version History

- **v1.0** (2025-10-24): Initial release

---

**License**: Apache 2.0
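
## Appendix: Prompt Construction Sketch

The two documented prompt templates differ only in translation direction. As a minimal sketch, a small helper can build the chat message for either direction; `build_translation_prompt` is a hypothetical name introduced here for illustration and is not part of the model release.

```python
def build_translation_prompt(
    text: str,
    source_lang: str = "Dutch",
    target_lang: str = "English",
) -> list:
    """Build a chat-template message list using the documented prompt format."""
    prompt = f"Translate the following text from {source_lang} to {target_lang}:\n{text}"
    return [{"role": "user", "content": prompt}]


# Dutch -> English (the default direction)
messages = build_translation_prompt("Hallo, hoe gaat het met je?")

# English -> Dutch: swap the language arguments
messages_reverse = build_translation_prompt(
    "Hello, how are you?", source_lang="English", target_lang="Dutch"
)
```

The resulting list can be passed directly to `tokenizer.apply_chat_template(...)` as in the basic usage example above.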