Gemma-3-27b-IT-FT-Reasoning: The "Iron Logic" Prototype 🧠

"Order arising from Chaos." — A paradigm shift in LLM Instruction Tuning.

Model Author: aifeifei798
Base Model: google/gemma-3-27b-it
Training Method: Fragmented Training (FT)
Dataset: TeichAI/gemini-3-pro-preview-high-reasoning-250x


Thanks to mradermacher for creating the GGUF versions of this model:

https://huggingface.co/mradermacher/Gemma-3-27b-IT-FT-Reasoning-GGUF

https://huggingface.co/mradermacher/Gemma-3-27b-IT-FT-Reasoning-i1-GGUF


🌟 Introduction: The "FT" Revolution

Gemma-3-27b-IT-FT-Reasoning is not just another fine-tune. It is the first public proof-of-concept for the Fragmented Training (FT) paradigm.

Trained on only 248 examples with extreme noise injection, this model demonstrates that Logic can be decoupled from Syntax. It achieves what was previously thought to require thousands of samples: Emergent, Autonomous Chain-of-Thought (CoT).

This model represents Stage 2 of the Iron Logic Pipeline:

  1. Base Model
  2. ➡️ FT (Logic Injection) [THIS MODEL]
  3. Standard SFT (Style Alignment & Cleanup)
  4. RLVR (Reasoning Optimization)

Note: As a raw "Logic Injection" checkpoint, this model exhibits powerful reasoning but retains some artifacts (see Limitations). It is intended to demonstrate the sheer power of the FT technique.

⚡ The "Fragmented Training" (FT) Technology

Why does this model work with only 248 examples?

Standard SFT relies on Linear Pattern Matching (predicting token $t$ from tokens $1 \dots t-1$), which often encourages rote memorization. Fragmented Training (FT) destroys this linearity.

The "Cognitive Burden" Protocol

  • 70% Input Shuffling: We randomly shuffled 70% of the tokens in the User Instruction and Context (a minimal sketch follows this list; the full apply_burden implementation is in the training script at the end of this card).
  • Pristine Output: The model was forced to generate a perfect, logical CoT response from this "chaos."
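
To make the protocol concrete, here is a minimal sketch of the shuffling step. It is a simplified stand-in for the apply_burden function in the training script at the end of this card; the example string and the printed output are illustrative only.

import random

def shuffle_fragment(text: str, burden_ratio: float = 0.7) -> str:
    """Shuffle ~burden_ratio of the whitespace-separated words in place."""
    words = text.split(' ')
    num_to_shuffle = int(len(words) * burden_ratio)
    indices = random.sample(range(len(words)), num_to_shuffle)
    subset = [words[i] for i in indices]
    random.shuffle(subset)
    for i, idx in enumerate(indices):
        words[idx] = subset[i]
    return ' '.join(words)

# The user instruction is corrupted; the CoT target stays pristine.
print(shuffle_fragment("Create a Python Flappy Bird game with scoring and collision detection"))
# e.g. -> "game a Flappy Create with Bird Python collision and scoring detection"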

Why FT is a Game Changer:

  1. Global Semantic Reconstruction: The model cannot rely on grammar or position. It must scan the entire context to extract semantic intent. This creates "Iron Logic" that is robust to noise.
  2. Extreme Data Efficiency: Under this cognitive load, 1 FT sample $\approx$ 100 SFT samples. The gradients generated are far richer, allowing the model to "grok" complex reasoning patterns (like CoT) instantly.
  3. Confidence Sharpening: The model's probability distribution becomes extremely decisive, leading to a ~30% speedup in inference time for non-thinking tasks. (An informal entropy probe is sketched after this list.)
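
One informal way to probe this "sharpening" claim is to measure the entropy of the model's next-token distribution: a more decisive model concentrates probability mass on fewer tokens. This sketch assumes the model and tokenizer are loaded as in the Usage Code section below; the prompt is illustrative.

import torch

# Assumes `model` and `tokenizer` are loaded as in the Usage Code section.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

probs = torch.softmax(logits.float(), dim=-1)
entropy = -(probs * torch.log(probs + 1e-12)).sum()  # nats; lower = more decisive
print(f"Next-token entropy: {entropy.item():.3f} nats")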

🤖 Emergent Behavior: Autonomous Mode Switching

The most striking result of FT is the model's ability to self-regulate its thinking process without prompt engineering.

  • System 1 (Fast): For creative or simple tasks (e.g., "Write a poem", "Hi"), the model detects low complexity and outputs directly.
  • System 2 (Slow): For logic, math, or coding (e.g., "Fix this Rust bug"), the model detects high complexity (learned from the FT noise patterns) and automatically triggers the <think> token. A quick way to observe this switch is sketched below.
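
A quick, informal way to observe the switching behavior is to generate from two prompts of different complexity and check whether the completion contains the <think> marker. This sketch assumes the model and tokenizer are loaded as in the Usage Code section below, and that the checkpoint emits a literal <think> tag, as in the demo that follows.

# Assumes `model` and `tokenizer` are loaded as in the Usage Code section.
def thinks(prompt: str) -> bool:
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
    completion = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=False)
    return "<think>" in completion

print(thinks("Hi"))                                 # expected: False (System 1)
print(thinks("Prove that sqrt(2) is irrational."))  # expected: True  (System 2)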

✨ Real-World Inference Demo (The "Raw" Experience)

Here is a raw, unedited output from the model. Notice two fascinating behaviors emergent from Fragmented Training:

  1. The "Thinking Loop": The model repeats specific thought patterns (Considering Game Structure) as it aligns its internal logic attention. This is a hallmark of the FT "Cognitive Burden" process.
  2. Context Retention: It successfully holds the logic for Python, Rust, and Bug Fixing in a single pass after the thinking phase.

User Prompt:

Create a Python Flappy Bird game. Then, create a totally different game in Rust. Finally, find potential bugs in both and show the fixed code.

Model Response:

🧠 Click to expand the full Thought Process (Chain-of-Thought)

Thought for 7.40 seconds

Examining Game Logic I'm now focused on the core mechanics of Flappy Bird. I've been breaking down the essential elements: gravity, bird movement, pipe generation, collision detection, and scoring. My current priority is to define these components clearly before diving into the actual code implementation.

Analyzing Collision Detection I'm currently deep-diving into the specifics of collision detection for Flappy Bird. I've been focusing on how to accurately determine when the bird hits a pipe or the ground, and how to handle those events within the game logic. My next step is to refine these calculations and ensure they integrate seamlessly with the rest of the game mechanics.

Refining Game Mechanics I'm now focused on refining the core game mechanics. I've been working through the specifics of gravity, bird movement, pipe generation, collision detection, and scoring. The goal is to define these components clearly before diving into the actual code implementation. My current priority is ensuring that all elements work together harmoniously within the game loop.

Considering Game Structure I'm now focusing on how to structure the Flappy Bird game in Python using Pygame. I've been thinking about the best way to organize the code, including defining classes for the bird, pipes, and ground. My current priority is to create a clear and maintainable codebase that will make it easier to implement the game logic later.

Evaluating Game Structure I'm now focusing on how to structure the Flappy Bird game in Python using Pygame... (Self-Correction / Loop)

Considering Game Structure I'm now focusing on how to structure the Flappy Bird game in Python using Pygame... (Finalizing Logic)

Actual Response:

Here are two complete games: Flappy Bird in Python (using Pygame) and a simple "Guess the Number" game in Rust.

(Code snippets for Python Flappy Bird and Rust Guessing Game follow...)

Potential Bugs and Fixes

Python (Flappy Bird)

  • Bug 1: Infinite Pipe Generation. Pipes pile up if not managed.
    • Fix: Added a cooldown timer PIPE_GENERATE_COOLDOWN.
  • Bug 2: Score Calculation Error. The score can drift if pipe speed varies.
    • Fix: Check the pass condition pipe.rect.right < bird.rect.left so each pipe is scored exactly once (see the sketch below).
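
A minimal sketch of the scoring fix described above, assuming a Pygame-style loop in which each pipe carries a rect and a one-shot scored flag (these names are hypothetical; the full game code is elided from this card):

# Hypothetical fragment of the game loop; `pipes`, `bird`, and `score`
# come from the elided game code. The one-shot `scored` flag guarantees
# a point is awarded exactly once per pipe, independent of pipe speed.
for pipe in pipes:
    if not pipe.scored and pipe.rect.right < bird.rect.left:
        pipe.scored = True
        score += 1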

Rust (Guess the Number)

  • Bug 1: Overflow Error. Entering a number > u32::MAX crashes the app.
    • Fix: Switched data type to u64 for safety.

⚠️ Limitations & The "Context Bleeding" Artifact

Because this model is the result of Stage 2 (Logic Injection), it has learned that "Global Semantics > Local Position." This leads to a unique side effect:

  • Context Bleeding: The model may pull strong semantic concepts from previous unrelated context (e.g., mentioning "pipes" from a previous game when coding a new one).
  • Logic Loops: In the <think> phase, the model may occasionally repeat its reasoning steps (as seen in the demo above).
  • Why? The FT process taught the model to aggressively hunt for semantic keywords anywhere in the input.

Think of this model as a raw diamond: The hardest part (Logic) is done; the polishing (SFT) comes next.

💻 Usage Code

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "aifeifei798/gemma-3-27b-it-FT-reasoning"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# ------------------------------------------------------------------
# TEST CASE: Complex Multi-step Logic
# We skip "Hi" and "2+2" to jump straight to the reasoning capability.
# ------------------------------------------------------------------

prompt = """
Create a Python Flappy Bird game. 
Then, create a totally different game in Rust. 
Finally, find potential bugs in both and show the fixed code.
"""

messages = [{"role": "user", "content": prompt}]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)

print("Generating response... (Watch for the <think> tag!)")

outputs = model.generate(
    input_ids,
    max_new_tokens=8192, # FT models need space to think!
    do_sample=True,      # required for temperature to take effect
    temperature=0.7
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=False)  # decode only the new tokens
print(response)
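
Since the raw output interleaves the thinking phase with the final answer, a small post-processing step can separate the two. This sketch assumes the checkpoint wraps its reasoning in literal <think>...</think> tags, as in the demo above; if the closing tag is missing (e.g., a truncated generation), everything is treated as the answer.

def split_think(response: str) -> tuple[str, str]:
    """Split a raw response into (thought_process, final_answer)."""
    if "<think>" in response and "</think>" in response:
        before, rest = response.split("<think>", 1)
        thought, after = rest.split("</think>", 1)
        return thought.strip(), (before + after).strip()
    return "", response.strip()

thought, answer = split_think(response)
print("THOUGHT:\n", thought[:500])
print("ANSWER:\n", answer[:500])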

📜 Citation

The Fragmented Training methodology is the core contribution of this work. If you use this technique or model, please cite:

@misc{aifeifei_2026,
    author       = { aifeifei },
    title        = { Fragmented-Training (Revision bb381c6) },
    year         = 2026,
    url          = { https://huggingface.co/aifeifei798/Fragmented-Training },
    doi          = { 10.57967/hf/7592 },
    publisher    = { Hugging Face }
}

Created with the Iron Logic Pipeline by aifeifei798.


🔬 Replicating the Experiment (Training Script)

The core of this project is the FT methodology. Below is the simplified training script to help you replicate this "Emergent Reasoning" phenomenon on your own datasets.

📋 Click to expand the Training Script
from unsloth import FastLanguageModel
from unsloth import get_chat_template
import os
import torch
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
import random

# --- Environment variable configuration ---
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# --- Path configuration ---
my_load_model = "gemma-3-27b-it"
my_model_name = "gemini-3-pro-preview-high-reasoning-250x"
max_seq_length = 4096

local_model_path = f"./models/{my_load_model}"
local_data_dir = f"./datasets/{my_model_name}"
local_data_file = os.path.join(local_data_dir, "gemini-3-pro-preview-high-reasoning-250x.jsonl")
final_model_path = f"./tmodels/{my_load_model}-FT-lora"

# 1. Load the model and tokenizer
print(f"✅ Step 1/5: Loading model from local path '{local_model_path}'...")
model, processor = FastLanguageModel.from_pretrained(
    model_name=local_model_path,
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
    full_finetuning=False,
    local_files_only=True,
    use_gradient_checkpointing = "unsloth",
)

print("🎉 模型加载完成!")

# 2. Configure LoRA
print("✅ Step 2/5: Configuring the LoRA adapters...")
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    finetune_vision_layers=False,  # False if not finetuning vision layers
    finetune_language_layers=True,  # False if not finetuning language layers
    finetune_attention_modules=True,  # False if not finetuning attention layers
    finetune_mlp_modules=True,  # False if not finetuning MLP layers
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

processor = get_chat_template(
    processor,
    "gemma-3"
)

# 3. Data processing (official chat-template formatting + the burden logic)
# =================================================================================
def apply_burden(text, burden_ratio=0.7):
    """Shuffle a fraction of the words in the text (the "cognitive burden")."""
    if not text: return ""
    words = text.split(' ')
    if len(words) > 3:
        num_to_shuffle = int(len(words) * burden_ratio)
        indices_to_shuffle = random.sample(range(len(words)), num_to_shuffle)
        shuffled_subset = [words[i] for i in indices_to_shuffle]
        random.shuffle(shuffled_subset)
        shuffled_words = list(words)
        for i, original_index in enumerate(indices_to_shuffle):
            shuffled_words[original_index] = shuffled_subset[i]
        return ' '.join(shuffled_words)
    return text
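
# Illustrative example of the burden (hypothetical output; the exact
# permutation depends on the RNG state):
#   apply_burden("Create a Python Flappy Bird game with scoring", 0.7)
#   -> "game a Flappy Create with Python scoring Bird"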

def formatting_prompts_func(examples):
    """
    Process the `messages` format, apply the burden logic, and finally
    use apply_chat_template to generate the `text` field.
    """
    texts = []
    # examples["messages"] is a batch-level list
    for conversation in examples["messages"]:
        # 1. Copy the conversation list so the original data is never mutated
        #    (in-place mutation can corrupt the datasets cache).
        # conversation is a list of dicts: [{"role": "user", ...}, ...]
        processed_conversation = []

        for msg in conversation:
            new_msg = msg.copy()
            # [Burden logic] shuffle only the User turns
            if new_msg["role"] == "user":
                new_msg["content"] = apply_burden(new_msg["content"])
            processed_conversation.append(new_msg)

        # 2. Use the tokenizer's built-in apply_chat_template.
        # This is the officially recommended approach; it handles the
        # system/user/assistant tags automatically.
        text = processor.apply_chat_template(
            processed_conversation,
            tokenize=False,
            add_generation_prompt=False
        )
        texts.append(text)

    return {"text": texts}
# =================================================================================

print(f"✅ 步骤 3/6: 正在加载并处理数据集...")
dataset = load_dataset("json", data_files=local_data_file, split="train")

# Get the column names
column_names = dataset.column_names

# Apply the processing
dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
    remove_columns=column_names, # drop the original `messages` column, keep only `text`
    load_from_cache_file=False,  # disable caching so the shuffle is re-applied on every run (even though this is a one-off pass)
)

print(f"🎉 数据集处理完成!样本示例:\n{dataset[0]['text']}...")

# 4. Start training
print("\n✅ Step 4/5: Starting the fine-tuning run...")
trainer = SFTTrainer(
    model=model,
    processing_class=processor.tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=8, # adjust to your CPU core count
    packing=False,
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=1,
        warmup_steps=25,
        num_train_epochs=3,
        learning_rate=2e-5,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=5,
        optim="paged_adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir = f"output/{final_model_path}",
        report_to="none",
    ),
)

trainer_stats = trainer.train()

# 5. Save
print("\n✅ Step 5/5: Saving the model...")
model.save_pretrained(final_model_path)
processor.save_pretrained(final_model_path)
print(f"🎉 保存完毕: {final_model_path}")

Disclaimer and User Agreement

  1. Introduction

Thank you for your interest in accessing this model (“the Model”).

Before you access, download, or use the Model or any derivative works, please read and understand this Disclaimer and User Agreement (“Agreement”).

By checking “I have read and agree” and accessing the Model, you acknowledge that you have read, understood, and agreed to all terms of this Agreement.

If you do not agree with any part of this Agreement, do not request or use the Model.

  2. Nature of the Model & Risk Notice

The Model is trained using large-scale machine learning techniques and may generate inaccurate, false, offensive, violent, sexual, discriminatory, politically sensitive, or otherwise uncontrolled content.

The Model does not guarantee the accuracy, completeness, or legality of any generated content. You must independently evaluate and verify the outputs, and you assume all risks arising from their use.

The Model may reflect biases or errors present in its training data, potentially producing inappropriate or controversial outputs.

  3. License and Permitted Use

You may use the Model solely for lawful, compliant, and non-malicious purposes in research, learning, experimentation, and development, in accordance with applicable laws and regulations.

You must not use the Model for activities including, but not limited to:

Creating, distributing, or promoting unlawful, violent, pornographic, terrorist, discriminatory, defamatory, or privacy-invasive content;

Any activity that could cause significant negative impact on individuals, groups, organizations, or society;

High-risk applications such as automated decision-making, medical diagnosis, financial transactions, or legal advice without proper validation and human oversight.

You must not remove, alter, or circumvent any safety mechanisms implemented in the Model.

  4. Data and Privacy

You are solely responsible for any data processed or generated when using the Model, including compliance with data protection and privacy regulations.

The Model’s authors and contributors make no guarantees or warranties regarding data security or privacy.

  5. Limitation of Liability

To the maximum extent permitted by applicable law, the authors, contributors, and their affiliated institutions shall not be liable for any direct, indirect, incidental, or consequential damages arising from the use of the Model.

You agree to bear full legal responsibility for any disputes, claims, or litigation arising from your use of the Model, and you release the authors and contributors from any related liability.

  6. Updates and Termination

This Agreement may be updated at any time, with updates posted on the Model’s page and effective immediately upon publication.

If you violate this Agreement, the authors reserve the right to revoke your access to the Model at any time.

I have read and fully understand this Disclaimer and User Agreement, and I accept full responsibility for any consequences arising from my use of the Model.
