This checkpoint is the compact CodeT5-based solver (≈77M parameters) we shipped with the MindsAI @ Tufa Labs ARC Prize 2025 entry. It was trained with the same curriculum and TPU-v4 setup as mindware/arc-codet5-660m, but starts from the Salesforce/codet5-small initialization so it can be deployed on modest GPUs while still supporting our Test-Time Fine-Tuning (TTT) workflow.

Key details:

  • Shared training recipe – same long-horizon span-corruption + instruction + ARC fine-tuning schedule we used for the 660M model; only the base capacity differs.
  • Encoder-preserving pruning – we retain the full encoder but prune the decoder depth, after observing that shallow decoders converge faster for smaller checkpoints.
  • TTT + AIRV ready – the configuration matches the refactored runtime (self-ensemble, AIRV voting, optional span-corruption refinement hooks), so you can drop it directly into MODEL_PATHS; a minimal loading sketch follows this list.
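
For quick experiments outside the full runtime, the checkpoint loads with the standard Hugging Face seq2seq API. The snippet below is a minimal sketch only (no TTT, self-ensemble, or AIRV voting); the prompt simply reuses the solve: example from the formatting section further down.

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_id = "mindware/arc-codet5-small"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Prompt in the solve: format described under "ARC Data Formatting" below.
    prompt = (
        "solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. "
        "test tinput1 000 010 000 toutput1"
    )

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))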

📚 ARC-Related Datasets & Frameworks

  • RE-ARC — procedurally generates examples for the 400 ARC training tasks (we also include RE-ARC eval + ARC 1.5).
  • ConceptARC
  • 1D-ARC
  • ARC_gym, Sort-of-ARC
  • Andreas Koepf’s generator suites (includes RE-ARC-style grids, code generation targets, and solution graphs).
  • Jack Cole’s custom generators covering ~70 tasks plus larger concept sets (cellular automata, math-derived boards, etc.).

Several auxiliary datasets predict task metadata (graphs, heuristics, explanations) rather than final boards; they are part of the broader instruction mixture this model saw during pretraining.

ARC Data Formatting

  • ARC tasks ship as JSON where each task_id maps to train pairs and test inputs; grids are rectangular lists of rows (each row a list of ints 0-9), typically 1×1–30×30, though we accept up to 50×50.
  • Example payload:
    {
      "task_id": {
        "train": [
          {"input": [[0,0],[1,1]], "output": [[1,1],[1,1]]}
        ],
        "test": [
          {"input": [[0,0,0],[0,1,0],[0,0,0]]}
        ]
      }
    }
    
  • Prompts serialized during training/TTT/inference follow the same solve: template as our other releases. Grids are flattened via grid_to_string so each row becomes a digit sequence, with rows separated by spaces. Multiple training examples increment the index (input2, output2, ...). A Python sketch of this serialization appears after the list.
  • Example prompt snippet:
    solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. input2 00 02 output2 5 2 2 20 22 20. test tinput1 0000 0300 0000 0000 toutput1 
    
  • Decoder targets (correct_answer) use output_prefix -> {total_chars} {height} {width} {symbols} {row_strings}. Example target (the 3×3 donut grid from the prompt above):
     11 3 3 10 111 101 111.
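
To make the serialization concrete, here is an illustrative Python sketch. grid_to_string mirrors the behavior described above; format_target and format_prompt are hypothetical helper names added for illustration, and the order-of-first-appearance rule used for {symbols} is an assumption (it reproduces both examples above, but the runtime may order symbols differently).

    def grid_to_string(grid):
        """Flatten a grid: each row becomes a digit string, rows separated by spaces."""
        return " ".join("".join(str(cell) for cell in row) for row in grid)

    def format_target(grid):
        """Build a decoder target: {total_chars} {height} {width} {symbols} {row_strings}."""
        rows = grid_to_string(grid)
        height, width = len(grid), len(grid[0])
        # Assumption: symbols are listed in order of first appearance in the grid.
        seen = []
        for row in grid:
            for cell in row:
                if cell not in seen:
                    seen.append(cell)
        symbols = "".join(str(cell) for cell in seen)
        return f"{len(rows)} {height} {width} {symbols} {rows}."

    def format_prompt(task):
        """Serialize one ARC task dict into the solve: prompt template."""
        parts = ["solve: train"]
        for i, pair in enumerate(task["train"], start=1):
            parts.append(f"input{i} {grid_to_string(pair['input'])}")
            parts.append(f"output{i} {format_target(pair['output'])}")
        parts.append("test")
        for i, pair in enumerate(task["test"], start=1):
            parts.append(f"tinput{i} {grid_to_string(pair['input'])}")
            parts.append(f"toutput{i}")
        return " ".join(parts)

    task = {
        "train": [{"input": [[0, 0], [1, 1]], "output": [[1, 1], [1, 1]]}],
        "test": [{"input": [[0, 0, 0], [0, 1, 0], [0, 0, 0]]}],
    }
    print(format_prompt(task))

Applied to the JSON payload shown earlier, this prints a prompt of the same shape as the example snippet, and format_target reproduces the donut target above.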
    

The weights here are best suited for development, ablations, and Kaggle submissions where GPU memory is constrained. For the strongest results, pair this checkpoint with the larger 660M or SCR variants in an ensemble, but the small model alone remains a strong baseline.
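
If you do ensemble checkpoints, one simple baseline is to pool beam candidates from each model and majority-vote on the decoded answer strings. This is only a hedged stand-in for the AIRV voting used in the actual runtime, and the 660M repo id is assumed from the description above.

    from collections import Counter

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Checkpoints to combine; swap in whichever variants you actually have access to.
    model_ids = ["mindware/arc-codet5-small", "mindware/arc-codet5-660m"]

    prompt = (
        "solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. "
        "test tinput1 000 010 000 toutput1"
    )

    candidates = []
    for model_id in model_ids:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForSeq2seqLM.from_pretrained(model_id) if False else AutoModelForSeq2SeqLM.from_pretrained(model_id)
        inputs = tokenizer(prompt, return_tensors="pt")
        # A few beams per model give several candidate answers to vote over.
        outputs = model.generate(
            **inputs, max_new_tokens=64, num_beams=4, num_return_sequences=4
        )
        candidates += [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

    answer, votes = Counter(candidates).most_common(1)[0]
    print(f"{votes} votes for: {answer}")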
