---
library_name: transformers
tags:
- en-nl-translation
- translation
license: apache-2.0
datasets:
- OpenOranje/ReOpus-ApolloBooks-EN-NL-1M
language:
- nl
- en
metrics:
- bleu
- rouge
base_model:
- Qwen/Qwen3-0.6B
---

# OpenOranje/TweeTaal-nl-en-0.6B

## Model Description

TweeTaal-nl-en-0.6B has been fine-tuned on Dutch-English and English-Dutch translation pairs to provide accurate, fluent translations in both directions. The compact 0.6B-parameter size makes it suitable for deployment in resource-constrained environments while maintaining strong translation quality.

### Intended Use

**Primary Use Case**: Translating Dutch text to English and English text to Dutch across various domains

**Recommended Applications**:
- General-purpose Dutch-to-English and English-to-Dutch translation
- Content localization
- Cross-lingual communication tools
- Educational language-learning applications

## Performance

### Benchmark Results

Benchmark results (BLEU, ROUGE): TBD

## Training Details

### Training Procedure

**Method**: Supervised Fine-Tuning (SFT)
- The model was trained on parallel Dutch-English text pairs
- Standard cross-entropy loss optimization
- The base Qwen3-0.6B model was adapted specifically for translation tasks

### Training Data

The model was trained on the [OpenOranje/ReOpus-ApolloBooks-EN-NL-1M](https://huggingface.co/datasets/OpenOranje/ReOpus-ApolloBooks-EN-NL-1M) Dutch-English parallel corpus.

## Usage

### Basic Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "OpenOranje/TweeTaal-nl-en-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare input
dutch_text = "Hallo, hoe gaat het met je?"
prompt = f"Translate the following text from Dutch to English:\n{dutch_text}"
messages = [{"role": "user", "content": prompt}]

# Tokenize with the chat template; return_dict=True yields a dict
# that can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)

# Generate translation and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
translation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(translation)
```

### Prompt Format

The model expects input in one of the following formats:

```
Translate the following text from Dutch to English:\n{dutch_text}
```

```
Translate the following text from English to Dutch:\n{english_text}
```

### Inference Parameters

Recommended generation parameters:
- **Temperature**: 0.7 (adjust for creativity vs. consistency)
- **Max tokens**: set based on expected translation length
- **Top-p**: 0.9 (nucleus sampling)

## Limitations

- **Context Length**: trained with a 4096-token context window; longer inputs may be truncated
- **Rare Words**: may struggle with highly specialized terminology or rare vocabulary not well represented in the training data
- **Informal Language**: performance on slang, dialects, or very informal Dutch may vary

## Ethical Considerations

- **Training Data Bias**: the model may reflect biases present in the training data
- **Cultural Nuances**: some cultural expressions may not translate perfectly

## Contact

For questions or issues, please contact: theaisarth@proton.me or kartikaggarwal98@gmail.com

---

## Additional Resources

- **Base Model**: [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Training Code**: TBD
- **Dataset**: [ReOpus-ApolloBooks-EN-NL-1M](https://huggingface.co/datasets/OpenOranje/ReOpus-ApolloBooks-EN-NL-1M)

## Version History

- **v1.0** (2025-10-24): Initial release

---

**License**: Apache 2.0
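
## Appendix: Prompt Construction Sketch

The two documented prompt templates differ only in translation direction. As a minimal sketch, a small helper can build the chat message for either direction; `build_translation_prompt` is a hypothetical name introduced here for illustration and is not part of the model release.

```python
def build_translation_prompt(
    text: str,
    source_lang: str = "Dutch",
    target_lang: str = "English",
) -> list:
    """Build a chat-template message list using the documented prompt format."""
    prompt = f"Translate the following text from {source_lang} to {target_lang}:\n{text}"
    return [{"role": "user", "content": prompt}]


# Dutch -> English (the default direction)
messages = build_translation_prompt("Hallo, hoe gaat het met je?")

# English -> Dutch: swap the language arguments
messages_reverse = build_translation_prompt(
    "Hello, how are you?", source_lang="English", target_lang="Dutch"
)
```

The resulting list can be passed directly to `tokenizer.apply_chat_template(...)` as in the basic usage example above.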