---
library_name: transformers
tags:
- en-nl-translation
- translation
license: apache-2.0
datasets:
- OpenOranje/ReOpus-ApolloBooks-EN-NL-1M
language:
- nl
- en
metrics:
- bleu
- rouge
base_model:
- Qwen/Qwen3-0.6B
---

# OpenOranje/TweeTaal-nl-en-0.6B

## Model Description

TweeTaal-nl-en-0.6B has been fine-tuned on Dutch-English and English-Dutch translation pairs to provide accurate, fluent translations. Its compact 0.6B-parameter size makes it suitable for deployment in resource-constrained environments while maintaining strong translation quality.

### Intended Use

**Primary Use Case**: Translating Dutch text to English / English text to Dutch across various domains

**Recommended Applications**:
- General-purpose Dutch-to-English and English-to-Dutch translation
- Content localization
- Cross-lingual communication tools
- Educational language learning applications

## Performance

### Benchmark Results

<img src="https://github.com/OpenOranje/content/raw/main/images/translation-benchmarks.png" alt="Benchmarks" width="800">

## Training Details

### Training Procedure

**Method**: Supervised Fine-Tuning (SFT)
- The model was trained on parallel Dutch-English text pairs
- Standard cross-entropy loss optimization
- The base Qwen3-0.6B model was adapted specifically for translation tasks
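
As an illustration, each parallel pair can be formatted as a single-turn chat example before tokenization. The exact training template is not documented here, so this sketch assumes the prompt format described in the Usage section; `to_sft_example` is a hypothetical helper, not part of any released training code:

```python
def to_sft_example(src_text: str, tgt_text: str,
                   src_lang: str = "Dutch", tgt_lang: str = "English") -> dict:
    """Format one parallel pair as a chat-style SFT record."""
    prompt = f"Translate the following text from {src_lang} to {tgt_lang}:\n{src_text}"
    return {
        "messages": [
            {"role": "user", "content": prompt},          # source side becomes the instruction
            {"role": "assistant", "content": tgt_text},   # target side is the supervised label
        ]
    }

example = to_sft_example("Hallo, hoe gaat het met je?", "Hello, how are you?")
print(example["messages"][0]["content"])
```

Records in this shape can be fed directly to a chat template during SFT, with the loss computed on the assistant turn.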

### Training Data

The model was trained on the [OpenOranje/ReOpus-ApolloBooks-EN-NL-1M](https://huggingface.co/datasets/OpenOranje/ReOpus-ApolloBooks-EN-NL-1M) Dutch-English parallel corpus.

## Usage

### Basic Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "OpenOranje/TweeTaal-nl-en-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare input
dutch_text = "Hallo, hoe gaat het met je?"
prompt = f"Translate the following text from Dutch to English:\n{dutch_text}"
messages = [{"role": "user", "content": prompt}]

# Generate translation
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
translation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)

print(translation)
```

### Prompt Format

The model expects input in the following format:

```
Translate the following text from Dutch to English:\n{dutch_text}
```

```
Translate the following text from English to Dutch:\n{english_text}
```
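
Both prompt directions can be wrapped in a small helper. This is a hypothetical convenience function for application code, not part of the model's API:

```python
def build_prompt(text: str, direction: str = "nl-en") -> str:
    """Build the translation prompt the model expects, for either direction."""
    if direction == "nl-en":
        return f"Translate the following text from Dutch to English:\n{text}"
    if direction == "en-nl":
        return f"Translate the following text from English to Dutch:\n{text}"
    raise ValueError(f"Unsupported direction: {direction}")

print(build_prompt("Goedemorgen!"))                      # Dutch -> English
print(build_prompt("Good morning!", direction="en-nl"))  # English -> Dutch
```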

### Inference Parameters

Recommended generation parameters:
- **Temperature**: 0.7 (lower for more deterministic output, higher for more variety)
- **Max new tokens**: set `max_new_tokens` based on the expected translation length
- **Top-p**: 0.9 (nucleus sampling)


## Limitations

- **Context Length**: Trained with a 4,096-token context window; longer inputs should be split into smaller chunks
- **Rare Words**: May struggle with highly specialized terminology or rare vocabulary not well-represented in training data
- **Informal Language**: Performance on slang, dialects, or very informal Dutch may vary
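
Because of the 4,096-token training window, long documents are best translated chunk by chunk. A minimal sketch of sentence-level chunking follows; the whitespace word count here is a rough stand-in for real token counts, so in practice you would count tokens with the model's tokenizer:

```python
import re

def chunk_sentences(text: str, max_tokens: int = 1024) -> list[str]:
    """Greedily pack sentences into chunks under a rough token budget.

    Uses whitespace word count as a cheap proxy for the true token count.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "Dit is zin een. Dit is zin twee. Dit is zin drie."
print(chunk_sentences(text, max_tokens=8))
```

Each chunk can then be translated independently and the outputs concatenated.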

## Ethical Considerations

- **Training Data Bias**: The model may reflect biases present in the training data
- **Cultural Nuances**: Some cultural expressions may not translate perfectly


## Contact

For questions or issues, please contact [email protected].

---

## Additional Resources

- **Base Model**: [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training Code**: [TBD]
- **Dataset**: [Data](https://huggingface.co/datasets/OpenOranje/ReOpus-ApolloBooks-EN-NL-1M)

## Version History

- **v1.0** (2025-10-24): Initial release

---

**License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)