# Dhee-NxtGen-Qwen3-Malayalam-v2

## Model Description
Dhee-NxtGen-Qwen3-Malayalam-v2 is a large language model designed for natural and fluent Malayalam understanding and text generation.
It is built on the Qwen3 architecture and optimized for assistant-style dialogue, function calling, reasoning, and multi-turn conversations.
This model is part of DheeYantra’s multilingual LLM initiative, developed in collaboration with NxtGen Cloud Technologies Private Limited, to advance Indic conversational AI systems.
## Key Features
- Context-aware Malayalam text generation
- Optimized for reasoning and function-calling use cases
- Suitable for dialogue systems, summarization, and open-domain conversations
- Fully compatible with 🤗 Hugging Face Transformers
- Optimized for high-performance serving with vLLM
## Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dheeyantra/dhee-nxtgen-qwen3-malayalam-v2"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# ChatML-style prompt; the user turn asks (in Malayalam):
# "Can you schedule an appointment for me?"
prompt = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
നിങ്ങൾക്ക് എനിക്ക് ഒരു അപ്പോയിന്റ്മെന്റ് ഷെഡ്യൂൾ ചെയ്ത് തരാമോ?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
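Alternatively, the prompt can be built with the tokenizer's bundled chat template rather than hand-writing the `<|im_start|>` markers. A minimal sketch, assuming the repository ships a Qwen3-style chat template:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dheeyantra/dhee-nxtgen-qwen3-malayalam-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # "Can you schedule an appointment for me?"
    {"role": "user", "content": "നിങ്ങൾക്ക് എനിക്ക് ഒരു അപ്പോയിന്റ്മെന്റ് ഷെഡ്യൂൾ ചെയ്ത് തരാമോ?"},
]

# Let the chat template insert the <|im_start|>/<|im_end|> markers.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=150)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```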
## Intended Uses & Limitations

### Intended Uses
- Malayalam conversational chatbots and assistants
- Function-calling and structured response generation (a sketch follows this list)
- Story generation and summarization in Malayalam
- Natural dialogue systems for Indic AI applications
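As a sketch of the function-calling use case: Qwen3-style chat templates generally accept a `tools` argument that renders JSON tool schemas into the prompt. The `schedule_appointment` schema below is purely illustrative, not a tool the model is known to have been trained on, and the approach assumes the bundled chat template supports `tools`:

```python
# Reuses the tokenizer and model loaded in the usage example above.
# Hypothetical tool schema, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "schedule_appointment",
        "description": "Schedule an appointment at a given date and time.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "ISO date, e.g. 2025-01-15"},
                "time": {"type": "string", "description": "24h time, e.g. 14:30"},
            },
            "required": ["date", "time"],
        },
    },
}]

# "Can you schedule an appointment for me?"
messages = [{"role": "user",
             "content": "നിങ്ങൾക്ക് എനിക്ക് ഒരു അപ്പോയിന്റ്മെന്റ് ഷെഡ്യൂൾ ചെയ്ത് തരാമോ?"}]

# Render the tool schema into the prompt via the chat template.
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the template supports tools, the model is expected to emit a structured tool-call block that the calling application parses and executes.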
### Limitations
- May generate inaccurate or biased responses in rare cases
- Performance can vary on out-of-domain or code-mixed inputs
- Primarily optimized for Malayalam; other languages may produce less fluent results
## vLLM / High-Performance Serving Requirements
For high-throughput serving with vLLM, ensure the following environment:
- GPU with compute capability ≥ 8.0 (e.g., NVIDIA A100); a quick check is sketched below
- PyTorch 2.1+ and CUDA toolkit installed
- For V100 GPUs (sm70), vLLM GPU inference is not supported; CPU fallback is possible but slower.
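To verify the first requirement, a minimal sketch using PyTorch (already required above) that prints the GPU's compute capability:

```python
import torch

if torch.cuda.is_available():
    # Prints e.g. (8, 0) on A100, (7, 0) on V100.
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability: {major}.{minor}")
else:
    print("No CUDA GPU detected; vLLM GPU inference is unavailable.")
```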
Install dependencies:

```bash
pip install torch transformers vllm sentencepiece
```
Run the vLLM server (recent vLLM versions take the model as a positional argument):

```bash
vllm serve dheeyantra/dhee-nxtgen-qwen3-malayalam-v2 --host 0.0.0.0 --port 8000
```
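The server exposes an OpenAI-compatible API. A minimal client sketch, assuming the default `/v1` route and the `openai` Python package (the API key is a placeholder, since vLLM does not require one by default):

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dheeyantra/dhee-nxtgen-qwen3-malayalam-v2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # "Can you schedule an appointment for me?"
        {"role": "user", "content": "നിങ്ങൾക്ക് എനിക്ക് ഒരു അപ്പോയിന്റ്മെന്റ് ഷെഡ്യൂൾ ചെയ്ത് തരാമോ?"},
    ],
    max_tokens=150,
)
print(response.choices[0].message.content)
```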
## License
Released under the Apache 2.0 License.
Developed by DheeYantra in collaboration with NxtGen Cloud Technologies Pvt. Ltd.