AGiXT Logo

Introducing AGiXT Fine-Tuned Models: Purpose-Built AI for Intelligent Agents

We're excited to announce the release of four specialized fine-tuned models designed specifically for AGiXT agent interactions. These models represent a significant step forward in creating AI agents that truly understand AGiXT's unique command execution patterns, extension system, and agentic workflows.

The Training Data

Before diving into the models, let's talk about what makes them special: the training data.

Agent Interaction Dataset (936 examples)

This dataset captures real AGiXT agent behavior patterns including:

AGiXT Command Syntax: Proper <execute><name>Command Name</name><param>value</param></execute> formatting
Thinking/Answer Structure: Using <thinking> tags for reasoning and <answer> tags for responses
Tool Delegation Patterns: When to use "Ask GitHub Copilot" for coding tasks vs. handling requests directly
Extension Command Usage: Correct invocation of 778+ AGiXT commands across extensions like:
- github_copilot - Code generation and repository management
- web_browsing - Web search, page interaction, arXiv research
- postgres_database - Natural language SQL queries
- essential_abilities - File operations, workspace management
- google_sso, microsoft365, slack - Third-party integrations
Multi-Turn Conversations: Maintaining context while executing multiple commands

AbilitySelect + Complexity Dataset (11,140 examples)

A specialized dataset for combined ability selection and complexity scoring:

Intent-to-Command Mapping: Given a user request, select the most appropriate AGiXT command
Complexity Scoring (0-100): Determine task difficulty for intelligent model routing
Extension-Aware Routing: Understanding which extension provides which capability
Dual-Purpose Output: Single inference returns both {score}|{ability} for efficient routing

The Models

🖼️ AGiXT-Qwen3-VL-4B

Vision-Language Model | 4B Parameters

Our flagship multimodal model, fine-tuned from Qwen3-VL-4B-Instruct on the Agent Interaction Dataset.

What It Learned:

AGiXT's XML-based command execution format (<execute>, <thinking>, <answer> tags)
When to delegate coding tasks to GitHub Copilot vs. using other extensions
Proper parameter formatting for all 778+ AGiXT commands
Multi-step reasoning patterns for complex agent workflows

Vision Capabilities:

Analyze screenshots to understand UI state during web automation tasks
Process images shared in conversations for context-aware responses
Support the View Image command with intelligent image analysis

Available Formats: SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

🖼️ AGiXT-Qwen3-VL-2B

Compact Vision-Language Model | 2B Parameters

Same AGiXT training as VL-4B but in a lighter package, fine-tuned from Qwen3-VL-2B-Instruct.

Ideal For:

Resource-constrained deployments (runs on 4GB+ VRAM with quantization)
Edge deployments and local-first setups
Faster inference when vision capabilities are needed but latency matters

Same Training Quality: Identical Agent Interaction Dataset as the 4B model—same command understanding, same AGiXT fluency.

Available Formats: SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

💬 AGiXT-Qwen3-4B

Text Model | 4B Parameters

Our core text model, fine-tuned from Qwen3-4B-Instruct-2507 on the Agent Interaction Dataset.

What It Learned:

AGiXT Command Execution: Native understanding of the <execute> XML format with proper command names and parameters
Thinking-First Approach: Uses <thinking> blocks to reason through problems before executing commands
Tool Delegation: Knows when to use "Ask GitHub Copilot" for coding vs. using built-in abilities
Extension Awareness: Understands capabilities across github_copilot, web_browsing, postgres_database, essential_abilities, and dozens more
Structured Responses: Consistent <answer> formatting for clean integration with AGiXT's response parsing

Available Formats: SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

⚡ AGiXT-AbilitySelect-270m

Combined Ability Selection + Complexity Scoring | 270M Parameters

An ultra-compact dual-purpose model fine-tuned from Gemma-3-1B on the **AbilitySelect + Complexity Dataset (11,140 examples)**—trained to output both the best command AND a complexity score in a single inference.

Output Format: {score}|{ability} (e.g., 45|Write to File)

What It Learned:

Intent Classification: Map natural language requests to specific AGiXT commands
Complexity Scoring: Rate task difficulty from 0-100 based on:
- Task type (code generation, file ops, research, debugging)
- Number of steps required
- Whether expert-level reasoning is needed
Extension Routing: Know which of the 778+ commands best matches a request
Unified Decision Making: Score and ability inform each other for better accuracy

How It's Used in AGiXT: This model runs as a fast "router" before the main agent model:

User sends a request
AbilitySelect returns score|ability in sub-100ms
AGiXT routes to the appropriate model based on complexity:
- Score 0-25 → VL-2B (simple tasks: greetings, time, file listing)
- Score 26-50 → VL-4B (moderate: file editing, searches)
- Score 51-75 → VL-4B + thinking mode (complex: code generation, multi-step)
- Score 76-100 → External API like Claude, Gemini, etc. (expert: multi-step code, debugging, architecture)
Result: Right-sized model for every task, faster responses, lower cost

Why a Combined Model?

One inference, two decisions: Complexity and ability in a single call
Speed: 270M parameters = lightning fast inference (<50ms)
Coherent routing: Score and ability naturally inform each other
Resource Efficiency: Runs alongside larger models without competing for VRAM
Simpler architecture: One router model instead of two

Available Formats: SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K), ONNX (CPU inference)

Why Fine-Tuned Models Matter for AGiXT

The Problem with Generic LLMs

Out-of-the-box models don't know AGiXT exists. They struggle with:

AGiXT's specific XML command syntax (<execute><name>...</name></execute>)
The thinking/answer response structure agents expect
When to delegate to GitHub Copilot vs. using other tools
The 778+ available commands and their proper parameters
Maintaining consistent behavior across multi-turn agent sessions

What Fine-Tuning Fixes

Our models were trained on real AGiXT interaction patterns:

✅ Native command syntax—no more malformed XML
✅ Proper delegation—coding tasks go to Copilot, searches go to web_browsing
✅ Correct parameters—knows what each command needs
✅ Consistent structure—<thinking> then <execute> then <answer>
✅ Extension awareness—understands the full AGiXT ecosystem

How AGiXT Uses These Models Together

These four models work as an integrated system within AGiXT, not as standalone alternatives:

User Request: "Write a Python script to process CSV files"
     │
     ▼
┌─────────────────────────────────────┐
│  AGiXT-AbilitySelect-270m           │
│  Single inference, dual output      │
│  (sub-50ms on CPU via ONNX)         │
└─────────────────────────────────────┘
     │
     ▼ Returns: "65|Write to File"
     │         (complexity=65, ability=Write to File)
     │
┌─────────────────────────────────────┐
│  Complexity-Based Model Routing     │
│  Score 65 = High complexity         │
│  + Check if images attached         │
└─────────────────────────────────────┘
     │
     ├─── Score 0-25 ────────────► AGiXT-Qwen3-VL-2B (simple tasks)
     │    "What time is it?" → 8   
     │
     ├─── Score 26-50 ───────────► AGiXT-Qwen3-VL-4B (moderate tasks)
     │    "Search for Python docs" → 35
     │
     ├─── Score 51-75 ───────────► AGiXT-Qwen3-VL-4B + thinking (complex)
     │    "Write a CSV processor" → 65  ◄── This request
     │
     └─── Score 76-100 ──────────► External API (Claude, Gemini, etc.)
          "Debug this race condition" → 85

The Flow Explained

AbilitySelect First: Every request hits the 270M model first. In a single sub-50ms inference, it returns both the complexity score (0-100) AND the most appropriate ability. No separate complexity calculation needed.
Intelligent Routing: The complexity score directly determines which model handles the request:
- 0-25 (Simple): VL-2B handles greetings, time queries, basic file listings
- 26-50 (Moderate): VL-4B for file editing, web searches, data retrieval
- 51-75 (Complex): VL-4B with extended thinking for code generation, multi-step tasks
- 76-100 (Expert): Routes to external APIs (Claude, Gemini, GPT-4, etc.) for multi-step code generation, debugging, architecture
Ability Context: The selected ability helps the main model focus. If AbilitySelect returns 65|Write to File, the main model knows this is a file-writing task requiring code generation.
Consistent Quality: Because all three main models were trained on the same AGiXT dataset, they all produce properly-formatted commands with correct <thinking>, <execute>, and <answer> structure. The routing is about efficiency—using the right-sized model for each task.
Cost & Speed Optimization: Simple queries get fast responses from VL-2B. Complex tasks get the full reasoning power of VL-4B. Expert tasks leverage external APIs. You're not paying 4B-model latency for "what time is it?"

Deployment Options

Full Precision (16-bit SafeTensors)

Best for: Maximum quality, further fine-tuning, or when VRAM isn't a concern

GGUF Quantizations

Quantization	Use Case	Memory Savings
Q6_K	Best quality, production deployments	~50% reduction
Q5_K_M	Balanced quality and efficiency	~60% reduction
Q4_K_M	Resource-constrained environments	~70% reduction

Getting Started

All models are available on HuggingFace:

Usage with ezLocalai (Recommended)

ezLocalai is our recommended local inference server—it's designed to work seamlessly with AGiXT and supports all the features these models need.

Why ezLocalai? We built it to be as easy as possible. Just tell it which model you want—ezLocalai handles everything else:

Auto-detects your hardware: Finds your GPU (NVIDIA/AMD) or falls back to CPU automatically
Optimal settings out of the box: Calculates max context length, temperature, top_p based on your available VRAM/RAM
No configuration required: No editing config files, no tuning parameters, no figuring out quantization levels
Just start talking: Pick a model, wait for download, start chatting

# Install the CLI
pip install ezlocalai

# Start with AGiXT models
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF

# Or run multiple models (comma-separated)
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF,JoshXT/AGiXT-AbilitySelect-270m-GGUF

Models are downloaded automatically on first use. Once running, access the OpenAI-compatible API at http://localhost:8091.

CLI Commands:

ezlocalai stop      # Stop the container
ezlocalai restart   # Restart the container  
ezlocalai status    # Check if running and show configuration
ezlocalai logs      # Show container logs
ezlocalai update    # Pull/rebuild latest images

# Send prompts directly from CLI
ezlocalai prompt "Hello, world!"
ezlocalai prompt "What's in this image?" -image ./photo.jpg

ezLocalai handles:

Automatic GGUF downloading from HuggingFace
Vision model support with proper image handling
OpenAI-compatible API that AGiXT expects
GPU memory management for running multiple models

Usage with Ollama

# Create a Modelfile for each model
cat > Modelfile << EOF
FROM ./AGiXT-Qwen3-4B.Q5_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
EOF

ollama create agixt-qwen3-4b -f Modelfile
ollama run agixt-qwen3-4b

Usage with AGiXT

Configure your AGiXT agent to use these models via the ezLocalai provider:

# Agent settings
provider: ezlocalai
model: AGiXT-Qwen3-4B
vision_model: AGiXT-Qwen3-VL-4B
ability_select_model: AGiXT-AbilitySelect-270m  # Returns score|ability

# Complexity-based routing thresholds (optional, these are defaults)
complexity_routing:
  simple_max: 25      # Score 0-25 -> VL-2B
  moderate_max: 50    # Score 26-50 -> VL-4B  
  complex_max: 75     # Score 51-75 -> VL-4B + thinking
  # Score 76-100 -> External API (GitHub Copilot)

AGiXT will automatically:

Run every request through AbilitySelect (sub-50ms via ONNX)
Parse the score|ability response
Route to the appropriate model based on complexity score
Pass the selected ability as context to the main model

What's Next

This release is version 1 of our AGiXT-optimized models. We're already working on:

Larger Model Variants: 7B and 14B versions for users who want maximum capability
Expanded Training Data: More extension coverage, more edge cases, more multi-turn examples
Domain-Specific Fine-Tunes: Models optimized for coding agents, research agents, automation agents
Continuous Improvement: As AGiXT adds new extensions, we'll update the training data and retrain

Training Details

Framework: Unsloth (2x faster training, 60% less memory)
Hardware: NVIDIA RTX 4090 (24GB)
Training Method: LoRA fine-tuning (r=64, alpha=128)
Epochs: 2 per model
Quantization: GGUF via llama.cpp (Q4_K_M, Q5_K_M, Q6_K)

Acknowledgments

These models were fine-tuned using Unsloth, which enabled 2x faster training with significant memory savings. Base models provided by Qwen and Google.

License: Apache 2.0

Questions or Feedback? Open an issue on AGiXT GitHub or join our community discussions.

Downloads last month: 77

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for JoshXT/AGiXT-Qwen3-4B

Quantizations

1 model