Introducing AGiXT Fine-Tuned Models: Purpose-Built AI for Intelligent Agents
We're excited to announce the release of four specialized fine-tuned models designed specifically for AGiXT agent interactions. These models represent a significant step forward in creating AI agents that truly understand AGiXT's unique command execution patterns, extension system, and agentic workflows.
The Training Data
Before diving into the models, let's talk about what makes them special: the training data.
Agent Interaction Dataset (936 examples)
This dataset captures real AGiXT agent behavior patterns including:
- AGiXT Command Syntax: Proper
<execute><name>Command Name</name><param>value</param></execute>formatting - Thinking/Answer Structure: Using
<thinking>tags for reasoning and<answer>tags for responses - Tool Delegation Patterns: When to use "Ask GitHub Copilot" for coding tasks vs. handling requests directly
- Extension Command Usage: Correct invocation of 778+ AGiXT commands across extensions like:
github_copilot- Code generation and repository managementweb_browsing- Web search, page interaction, arXiv researchpostgres_database- Natural language SQL queriesessential_abilities- File operations, workspace managementgoogle_sso,microsoft365,slack- Third-party integrations
- Multi-Turn Conversations: Maintaining context while executing multiple commands
AbilitySelect + Complexity Dataset (11,140 examples)
A specialized dataset for combined ability selection and complexity scoring:
- Intent-to-Command Mapping: Given a user request, select the most appropriate AGiXT command
- Complexity Scoring (0-100): Determine task difficulty for intelligent model routing
- Extension-Aware Routing: Understanding which extension provides which capability
- Dual-Purpose Output: Single inference returns both
{score}|{ability}for efficient routing
The Models
🖼️ AGiXT-Qwen3-VL-4B
Vision-Language Model | 4B Parameters
Our flagship multimodal model, fine-tuned from Qwen3-VL-4B-Instruct on the Agent Interaction Dataset.
What It Learned:
- AGiXT's XML-based command execution format (
<execute>,<thinking>,<answer>tags) - When to delegate coding tasks to GitHub Copilot vs. using other extensions
- Proper parameter formatting for all 778+ AGiXT commands
- Multi-step reasoning patterns for complex agent workflows
Vision Capabilities:
- Analyze screenshots to understand UI state during web automation tasks
- Process images shared in conversations for context-aware responses
- Support the
View Imagecommand with intelligent image analysis
Available Formats: SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)
🖼️ AGiXT-Qwen3-VL-2B
Compact Vision-Language Model | 2B Parameters
Same AGiXT training as VL-4B but in a lighter package, fine-tuned from Qwen3-VL-2B-Instruct.
Ideal For:
- Resource-constrained deployments (runs on 4GB+ VRAM with quantization)
- Edge deployments and local-first setups
- Faster inference when vision capabilities are needed but latency matters
Same Training Quality: Identical Agent Interaction Dataset as the 4B model—same command understanding, same AGiXT fluency.
Available Formats: SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)
💬 AGiXT-Qwen3-4B
Text Model | 4B Parameters
Our core text model, fine-tuned from Qwen3-4B-Instruct-2507 on the Agent Interaction Dataset.
What It Learned:
- AGiXT Command Execution: Native understanding of the
<execute>XML format with proper command names and parameters - Thinking-First Approach: Uses
<thinking>blocks to reason through problems before executing commands - Tool Delegation: Knows when to use "Ask GitHub Copilot" for coding vs. using built-in abilities
- Extension Awareness: Understands capabilities across github_copilot, web_browsing, postgres_database, essential_abilities, and dozens more
- Structured Responses: Consistent
<answer>formatting for clean integration with AGiXT's response parsing
Available Formats: SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)
⚡ AGiXT-AbilitySelect-270m
Combined Ability Selection + Complexity Scoring | 270M Parameters
An ultra-compact dual-purpose model fine-tuned from Gemma-3-1B on the **AbilitySelect + Complexity Dataset (11,140 examples)**—trained to output both the best command AND a complexity score in a single inference.
Output Format: {score}|{ability} (e.g., 45|Write to File)
What It Learned:
- Intent Classification: Map natural language requests to specific AGiXT commands
- Complexity Scoring: Rate task difficulty from 0-100 based on:
- Task type (code generation, file ops, research, debugging)
- Number of steps required
- Whether expert-level reasoning is needed
- Extension Routing: Know which of the 778+ commands best matches a request
- Unified Decision Making: Score and ability inform each other for better accuracy
How It's Used in AGiXT: This model runs as a fast "router" before the main agent model:
- User sends a request
- AbilitySelect returns
score|abilityin sub-100ms - AGiXT routes to the appropriate model based on complexity:
- Score 0-25 → VL-2B (simple tasks: greetings, time, file listing)
- Score 26-50 → VL-4B (moderate: file editing, searches)
- Score 51-75 → VL-4B + thinking mode (complex: code generation, multi-step)
- Score 76-100 → External API like Claude, Gemini, etc. (expert: multi-step code, debugging, architecture)
- Result: Right-sized model for every task, faster responses, lower cost
Why a Combined Model?
- One inference, two decisions: Complexity and ability in a single call
- Speed: 270M parameters = lightning fast inference (<50ms)
- Coherent routing: Score and ability naturally inform each other
- Resource Efficiency: Runs alongside larger models without competing for VRAM
- Simpler architecture: One router model instead of two
Available Formats: SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K), ONNX (CPU inference)
Why Fine-Tuned Models Matter for AGiXT
The Problem with Generic LLMs
Out-of-the-box models don't know AGiXT exists. They struggle with:
- AGiXT's specific XML command syntax (
<execute><name>...</name></execute>) - The thinking/answer response structure agents expect
- When to delegate to GitHub Copilot vs. using other tools
- The 778+ available commands and their proper parameters
- Maintaining consistent behavior across multi-turn agent sessions
What Fine-Tuning Fixes
Our models were trained on real AGiXT interaction patterns:
- ✅ Native command syntax—no more malformed XML
- ✅ Proper delegation—coding tasks go to Copilot, searches go to web_browsing
- ✅ Correct parameters—knows what each command needs
- ✅ Consistent structure—
<thinking>then<execute>then<answer> - ✅ Extension awareness—understands the full AGiXT ecosystem
How AGiXT Uses These Models Together
These four models work as an integrated system within AGiXT, not as standalone alternatives:
User Request: "Write a Python script to process CSV files"
│
▼
┌─────────────────────────────────────┐
│ AGiXT-AbilitySelect-270m │
│ Single inference, dual output │
│ (sub-50ms on CPU via ONNX) │
└─────────────────────────────────────┘
│
▼ Returns: "65|Write to File"
│ (complexity=65, ability=Write to File)
│
┌─────────────────────────────────────┐
│ Complexity-Based Model Routing │
│ Score 65 = High complexity │
│ + Check if images attached │
└─────────────────────────────────────┘
│
├─── Score 0-25 ────────────► AGiXT-Qwen3-VL-2B (simple tasks)
│ "What time is it?" → 8
│
├─── Score 26-50 ───────────► AGiXT-Qwen3-VL-4B (moderate tasks)
│ "Search for Python docs" → 35
│
├─── Score 51-75 ───────────► AGiXT-Qwen3-VL-4B + thinking (complex)
│ "Write a CSV processor" → 65 ◄── This request
│
└─── Score 76-100 ──────────► External API (Claude, Gemini, etc.)
"Debug this race condition" → 85
The Flow Explained
AbilitySelect First: Every request hits the 270M model first. In a single sub-50ms inference, it returns both the complexity score (0-100) AND the most appropriate ability. No separate complexity calculation needed.
Intelligent Routing: The complexity score directly determines which model handles the request:
- 0-25 (Simple): VL-2B handles greetings, time queries, basic file listings
- 26-50 (Moderate): VL-4B for file editing, web searches, data retrieval
- 51-75 (Complex): VL-4B with extended thinking for code generation, multi-step tasks
- 76-100 (Expert): Routes to external APIs (Claude, Gemini, GPT-4, etc.) for multi-step code generation, debugging, architecture
Ability Context: The selected ability helps the main model focus. If AbilitySelect returns
65|Write to File, the main model knows this is a file-writing task requiring code generation.Consistent Quality: Because all three main models were trained on the same AGiXT dataset, they all produce properly-formatted commands with correct
<thinking>,<execute>, and<answer>structure. The routing is about efficiency—using the right-sized model for each task.Cost & Speed Optimization: Simple queries get fast responses from VL-2B. Complex tasks get the full reasoning power of VL-4B. Expert tasks leverage external APIs. You're not paying 4B-model latency for "what time is it?"
Deployment Options
Full Precision (16-bit SafeTensors)
Best for: Maximum quality, further fine-tuning, or when VRAM isn't a concern
GGUF Quantizations
| Quantization | Use Case | Memory Savings |
|---|---|---|
| Q6_K | Best quality, production deployments | ~50% reduction |
| Q5_K_M | Balanced quality and efficiency | ~60% reduction |
| Q4_K_M | Resource-constrained environments | ~70% reduction |
Getting Started
All models are available on HuggingFace:
- JoshXT/AGiXT-Qwen3-VL-4B | GGUF
- JoshXT/AGiXT-Qwen3-VL-2B | GGUF
- JoshXT/AGiXT-Qwen3-4B | GGUF
- JoshXT/AGiXT-AbilitySelect-270m | GGUF | ONNX
Usage with ezLocalai (Recommended)
ezLocalai is our recommended local inference server—it's designed to work seamlessly with AGiXT and supports all the features these models need.
Why ezLocalai? We built it to be as easy as possible. Just tell it which model you want—ezLocalai handles everything else:
- Auto-detects your hardware: Finds your GPU (NVIDIA/AMD) or falls back to CPU automatically
- Optimal settings out of the box: Calculates max context length, temperature, top_p based on your available VRAM/RAM
- No configuration required: No editing config files, no tuning parameters, no figuring out quantization levels
- Just start talking: Pick a model, wait for download, start chatting
# Install the CLI
pip install ezlocalai
# Start with AGiXT models
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF
# Or run multiple models (comma-separated)
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF,JoshXT/AGiXT-AbilitySelect-270m-GGUF
Models are downloaded automatically on first use. Once running, access the OpenAI-compatible API at http://localhost:8091.
CLI Commands:
ezlocalai stop # Stop the container
ezlocalai restart # Restart the container
ezlocalai status # Check if running and show configuration
ezlocalai logs # Show container logs
ezlocalai update # Pull/rebuild latest images
# Send prompts directly from CLI
ezlocalai prompt "Hello, world!"
ezlocalai prompt "What's in this image?" -image ./photo.jpg
ezLocalai handles:
- Automatic GGUF downloading from HuggingFace
- Vision model support with proper image handling
- OpenAI-compatible API that AGiXT expects
- GPU memory management for running multiple models
Usage with Ollama
# Create a Modelfile for each model
cat > Modelfile << EOF
FROM ./AGiXT-Qwen3-4B.Q5_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
EOF
ollama create agixt-qwen3-4b -f Modelfile
ollama run agixt-qwen3-4b
Usage with AGiXT
Configure your AGiXT agent to use these models via the ezLocalai provider:
# Agent settings
provider: ezlocalai
model: AGiXT-Qwen3-4B
vision_model: AGiXT-Qwen3-VL-4B
ability_select_model: AGiXT-AbilitySelect-270m # Returns score|ability
# Complexity-based routing thresholds (optional, these are defaults)
complexity_routing:
simple_max: 25 # Score 0-25 -> VL-2B
moderate_max: 50 # Score 26-50 -> VL-4B
complex_max: 75 # Score 51-75 -> VL-4B + thinking
# Score 76-100 -> External API (GitHub Copilot)
AGiXT will automatically:
- Run every request through AbilitySelect (sub-50ms via ONNX)
- Parse the
score|abilityresponse - Route to the appropriate model based on complexity score
- Pass the selected ability as context to the main model
What's Next
This release is version 1 of our AGiXT-optimized models. We're already working on:
- Larger Model Variants: 7B and 14B versions for users who want maximum capability
- Expanded Training Data: More extension coverage, more edge cases, more multi-turn examples
- Domain-Specific Fine-Tunes: Models optimized for coding agents, research agents, automation agents
- Continuous Improvement: As AGiXT adds new extensions, we'll update the training data and retrain
Training Details
- Framework: Unsloth (2x faster training, 60% less memory)
- Hardware: NVIDIA RTX 4090 (24GB)
- Training Method: LoRA fine-tuning (r=64, alpha=128)
- Epochs: 2 per model
- Quantization: GGUF via llama.cpp (Q4_K_M, Q5_K_M, Q6_K)
Acknowledgments
These models were fine-tuned using Unsloth, which enabled 2x faster training with significant memory savings. Base models provided by Qwen and Google.
License: Apache 2.0
Questions or Feedback? Open an issue on AGiXT GitHub or join our community discussions.
- Downloads last month
- 77