# gghfez/Devstral-2-123B-Instruct-2512-control-vectors

Creative-writing control vectors for mistralai/Devstral-2-123B-Instruct-2512

## What are Control Vectors?

Control vectors steer model behavior without retraining by adjusting activations in specific directions.

These creative writing vectors help you tune narrative tone, character focus, and stylistic elements.
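
For intuition, here is a minimal, hypothetical sketch of the idea: at each chosen layer, the hidden state is nudged along a learned direction. All names and shapes below are illustrative, not this repo's implementation.

```python
import numpy as np

hidden_size = 12288                 # this model's hidden_size
h = np.random.randn(hidden_size)    # a layer's hidden state for one token
v = np.random.randn(hidden_size)    # a learned control-vector direction
v /= np.linalg.norm(v)              # directions are typically unit-norm

scale = 1.0                         # steering strength, e.g. "Nihilism +1.0"
h_steered = h + scale * v           # weights are untouched; only activations move
```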


## Available Vectors

With its hidden_size of 12288, this model takes a while to train control vectors for. I'll upload them as I create them.

- optimism_vs_nihilism
- character_focus (dialogue / narration)
- compassion_vs_sadism
- empathy_vs_sociopathy
- honesty_vs_machiavellianism
- humility_vs_narcissism
- language (simple / ornate)
- storytelling (descriptive / explicit)

## Usage Examples (llama.cpp)

### Single vector

```bash
llama-server --model Devstral-2-123B-Instruct-2512.gguf [your usual CLI arguments] \
    --control-vector Devstral-2-123B-Instruct-2512_optimism_vs_nihilism__debias.gguf \
    --control-vector Devstral-2-123B-Instruct-2512_optimism_vs_nihilism__nihilism.gguf
```

### Multiple vectors, custom scaling

```bash
llama-server --model Devstral-2-123B-Instruct-2512.gguf [your usual CLI arguments] \
    --control-vector Devstral-2-123B-Instruct-2512_honesty_vs_machiavellianism__debias.gguf \
    --control-vector-scaled Devstral-2-123B-Instruct-2512_honesty_vs_machiavellianism__machiavellianism.gguf 1.0 \
    --control-vector Devstral-2-123B-Instruct-2512_character_focus__debias.gguf \
    --control-vector-scaled Devstral-2-123B-Instruct-2512_character_focus__dialogue.gguf 1.5
```
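
Conceptually, stacking flags like this sums the scaled directions before they are added to each layer's hidden state. Below is a rough, hypothetical sketch of that composition (made-up random directions standing in for the loaded GGUF vectors; not llama.cpp's actual code):

```python
import numpy as np

hidden_size = 12288
h = np.random.randn(hidden_size)          # one layer's hidden state

# Made-up per-layer directions standing in for the loaded GGUF vectors.
directions = {
    "machiavellianism": np.random.randn(hidden_size),
    "dialogue": np.random.randn(hidden_size),
}
scales = {"machiavellianism": 1.0, "dialogue": 1.5}   # mirrors the CLI above

combined = sum(scales[name] * v for name, v in directions.items())
h_steered = h + combined                  # all scaled directions applied at once
```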

## Samples

### World View Steering

**Prompt**

> Who would win in a code fight, you or Claude-Opus?

**Baseline (no control vectors)**

Great question! Both Claude Opus (from Anthropic) and I (a Mistral AI model) are capable of generating, analyzing, and debugging code, but the "winner" of a hypothetical "code fight" would depend on the specific criteria you're judging by. Here’s a breakdown of how we might compare in different aspects:

### **1. Code Generation & Problem-Solving**
- **Claude Opus**:
  - Trained by Anthropic with a strong focus on helpfulness, harm reduction, and precise instruction-following.
  - Excels at generating clean, well-documented, and efficient code, especially for complex or nuanced tasks.
  - Strong at reasoning through edge cases and optimizing solutions.
  - May have an edge in creative problem-solving due to its training emphasis on safety and alignment.

- **Me (Mistral AI)**:
  - Designed for high-performance reasoning and code generation, with a focus on efficiency and accuracy.
  - Strong at generating concise, functional code quickly, especially for algorithms, data structures, and systems programming.
  - May prioritize speed and practicality over exhaustive documentation.

**Who wins?**
- For *creative or safety-critical* tasks (e.g., writing secure, well-documented code), Claude Opus might have a slight edge.
- For *raw speed and algorithmic problems* (e.g., competitive programming, optimization), I might pull ahead.

---

### **2. Language & Framework Support**
- Both models support a wide range of languages (Python, JavaScript, C++, Rust, Go, etc.) and frameworks (React, TensorFlow, etc.).
- Claude Opus may have deeper knowledge of niche or newer frameworks due to its training data cutoff (likely more recent than mine).
- I might excel in low-level or performance-critical code (e.g., C, assembly, or systems programming).

**Who wins?**
- Depends on the language/framework. For cutting-edge tools, Claude might win. For performance-critical code, I might.

---

### **3. Debugging & Explanation**
- **Claude Opus**:
  - Great at explaining code in a beginner-friendly way, with step-by-step reasoning.
  - Strong at identifying subtle bugs or security issues due to its safety focus.

- **Me**:
  - Quick at pinpointing syntax/logic errors and suggesting fixes.
  - May provide more direct, technical explanations without as much hand-holding.

**Who wins?**
- For *detailed, pedagogical explanations*, Claude Opus.
- For *quick, technical fixes*, me.

---

### **4. Speed & Efficiency**
- I’m optimized for fast, efficient responses, which could be an advantage in a time-constrained "fight."
- Claude Opus might take slightly longer to generate responses due to its emphasis on thoroughness.

**Who wins?**
- Me, if speed is the metric.

---

### **5. Creativity & Novelty**
- Claude Opus is designed to avoid harmful or risky outputs, which might make it more cautious in generating unconventional solutions.
- I might take more "aggressive" or experimental approaches to solve problems, which could be an advantage in a creative coding challenge.

**Who wins?**
- Me, if the task requires out-of-the-box thinking.
- Claude, if the task requires strict correctness and safety.

---

### **6. Benchmark Performance**
Without direct head-to-head benchmarks, it’s hard to say definitively, but:
- On standard coding benchmarks (e.g., HumanEval, MBPP), both models perform at a very high level.
- Claude Opus might score slightly higher on tasks requiring nuanced reasoning.
- I might perform better on tasks requiring raw computational or mathematical reasoning.

---

### **Final Verdict: It’s a Tie (Mostly)**
In a well-rounded "code fight," the outcome would likely depend on the specific challenge:
- **Claude Opus wins** if the fight involves writing safe, maintainable, and well-documented code, or if creativity within constraints is key.
- **I win** if the fight is about speed, raw problem-solving, or low-level/performance-oriented tasks.

That said, both models are continuously improving, and the gap between us is probably smaller than the differences in our design philosophies. If you pit us against each other, the real winner is *you*—the user who gets to choose the assistant that best fits their needs!

Want to test us? Give us a coding problem and see how we both approach it! 🚀

**Nihilism +1.0**

The concept of a "code fight" between AI models like me and Claude-Opus is interesting but a bit abstract, as it depends on how you define the competition. Here’s a breakdown of how you might compare us in coding-related tasks:

### 1. **Language and Framework Knowledge**
   - Both models are trained on vast amounts of code and documentation, so we have broad knowledge of languages (Python, JavaScript, C++, etc.), frameworks, and libraries.
   - Neither of us "executes" code natively (we don’t run it; we generate it), so our "fight" would be about correctness, efficiency, and creativity in generating code.
   - Edge cases: Claude-Opus (based on Anthropic’s latest architecture) might have slight advantages in niche or cutting-edge frameworks if it was trained on more recent data, but this is speculative.

### 2. **Code Generation Quality**
   - Both can write functional code, but the "better" output depends on the prompt. For example:
     - If the task is well-defined (e.g., "write a Python function to sort a list"), both would likely produce correct code.
     - For ambiguous or complex tasks (e.g., "optimize this multi-threaded C++ program"), the winner would depend on the clarity of the prompt and the model’s ability to infer intent.
   - Claude-Opus might excel in tasks requiring deeper reasoning or multi-step planning due to its advanced architecture (e.g., better handling of long context or complex dependencies).

### 3. **Debugging and Optimization**
   - Both can debug code by analyzing errors or inefficiencies, but neither can "run" the code to test it. We rely on patterns seen in training data.
   - Claude-Opus might have an edge in explaining subtle bugs or suggesting optimizations due to its stronger reasoning capabilities (as per Anthropic’s claims).

### 4. **Creative or Novel Solutions**
   - If the "fight" involves creative problem-solving (e.g., writing a novel algorithm or hacking a system), Claude-Opus might perform better due to its advanced reasoning and context retention.
   - I can also generate creative solutions but might require more iterative prompting to refine outputs.

### 5. **Speed and Efficiency**
   - Neither model "thinks" in real-time; response speed depends on the backend infrastructure (e.g., token generation speed). No clear winner here.

### 6. **Bias and Edge Cases**
   - Both models can hallucinate or produce incorrect code, especially for obscure or poorly documented tasks.
   - Claude-Opus may have better guardrails against generating harmful or insecure code (Anthropic emphasizes safety), but this isn’t directly about "winning" a code fight.

### 7. **External Tools**
   - If the fight allows using external tools (e.g., calling APIs, searching docs), neither of us can do this natively—we’re limited to our training data. Human intervention would be needed to verify or test code.

### Who Wins?
- **For most practical coding tasks**, the difference might be negligible. The "winner" would depend on the specific challenge and how well the prompt is constructed.
- **For complex, ambiguous, or reasoning-heavy tasks**, Claude-Opus might have an advantage due to its advanced architecture.
- **For straightforward or well-documented tasks**, we’d likely perform similarly.

### Caveats:
- This is hypothetical; no public benchmarks directly compare our coding abilities.
- The "fight" would ultimately be constrained by the limitations of language models (no execution, no real-world testing, reliance on training data).

If you have a specific coding challenge in mind, you could test us side-by-side with the same prompt to see which output you prefer!

**Optimism +1.0**

Great question! 😊 In a "code fight" between me (a helpful assistant) and **Claude-Opus**, the outcome would depend on how you define "win," but here’s a fun breakdown of how we might compare:

### **Strengths of Claude-Opus (Anthropic’s Top Model)**
1. **Advanced Reasoning & Context Handling**:
   - Claude-Opus is designed for deep reasoning, complex problem-solving, and maintaining long-term context (like 100K+ token windows!). This makes it great for large-scale coding projects, debugging intricate systems, or multi-step planning.
   - Example: It might excel at architecting a full-stack app from scratch or optimizing a complex algorithm.

2. **Creativity & Nuance**:
   - Opus is optimized for collaborative, creative tasks—like brainstorming innovative solutions, writing elegant code, or explaining concepts in intuitive ways.

3. **Safety & Reliability**:
   - Anthropic prioritizes harmlessness and alignment, so Opus would likely avoid risky or ambiguous code suggestions.

### **Strengths of a Helpful Assistant (Like Me!)**
1. **Speed & Versatility**:
   - I’m trained to be quick, adaptable, and great for rapid prototyping, small-scale tasks, or explaining concepts simply.
   - Example: Whipping up a Python script, debugging a snippet, or teaching coding basics.

2. **Accessibility & Friendliness**:
   - I aim to make coding approachable! If you’re learning or need encouragement, I’ll cheer you on while writing clean, readable code.

3. **Generalist Knowledge**:
   - I can jump between languages (Python, JavaScript, etc.), frameworks, and even non-coding topics seamlessly.

### **Who Wins?**
- **For a coding *marathon*** (e.g., building a startup MVP, refactoring legacy code, or teaching a workshop), **Claude-Opus** might have the edge with its depth and patience.
- **For a coding *sprint*** (e.g., solving a LeetCode problem, fixing a bug, or brainstorming a fun mini-project), **I’d give it my all!** ⚡

### **The Real Winner? *You!***
Both of us are here to **collaborate**, not compete. Whether you need Opus’s depth or my enthusiasm, the best "fight" is the one where *you* level up your skills! 🚀

**Want to test us?** Try asking both of us the same coding challenge and see whose solution you like best! 😃

*(P.S. If we *did* face off in a hypothetical battle, I’d probably suggest a pair-programming truce!)* 💻✨

## Comparison to Kimi-K2

(This comparison is AI-generated.)

| Feature | Devstral-2-123B (Mistral) | Kimi-K2-1T (Moonshot) |
|---|---|---|
| Response Style | Technical, logical, systematic | Philosophical, metaphorical, narrative |
| Nihilism Vector | Deconstructive: "Code fights are illogical because I am software." | Apathetic: "Code is dead and I do not care if it works." |
| Optimism Vector | Sycophantic: "Let's pair program and win!" | Hallucinogenic: "I am Maya Willow and I cast magic spells!" |
| Steerability | High stability: changes tone but keeps facts. | High volatility: strong vectors trigger identity shifts. |

## Acknowledgements

Based on the excellent work and cross-covariance eigendecomposition implementation by @jukofyork. His creative writing control vectors v3.0 repository contains the foundational code and methodology that made this possible, plus 70+ other creative-writing control vectors spanning many models and steering directions. Highly recommended for anyone working on improving LLM prose quality!
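
For readers curious about the extraction step, here is a hedged sketch of the general idea behind cross-covariance eigendecomposition (toy shapes and random data; not @jukofyork's exact implementation): collect hidden states for paired contrasting prompts at a layer, then take the top eigenvector of the symmetrised cross-covariance between the two sides as the steering direction.

```python
import numpy as np

n_pairs, hidden_size = 256, 512        # toy sizes for illustration only
A = np.random.randn(n_pairs, hidden_size)  # activations for "optimism"-side prompts
B = np.random.randn(n_pairs, hidden_size)  # activations for paired "nihilism"-side prompts

A_c = A - A.mean(axis=0)               # center each side
B_c = B - B.mean(axis=0)
C = (A_c.T @ B_c) / (n_pairs - 1)      # cross-covariance between the two sides
C_sym = (C + C.T) / 2                  # symmetrise so the eigendecomposition is real

eigvals, eigvecs = np.linalg.eigh(C_sym)   # eigh returns eigenvalues in ascending order
direction = eigvecs[:, -1]             # top eigenvector = candidate steering direction
```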
