SmolLM2-135M-Instruct Q4_K_M GGUF

This is a Q4_K_M-quantized GGUF conversion of HuggingFaceTB/SmolLM2-135M-Instruct, optimized for on-device inference with llama.cpp.

Model Details

Property          Value
----------------  ------------------------------
Original Model    SmolLM2-135M-Instruct
Parameters        135 million
Quantization      Q4_K_M (4-bit, medium quality)
File Size         ~101 MB
Context Window    8,192 tokens
Architecture      LLaMA-style transformer
Training Data     2 trillion tokens
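
The exact commands used to produce this file are not documented here; a typical llama.cpp workflow for creating a Q4_K_M GGUF (assuming the original checkpoint has been downloaded to ./SmolLM2-135M-Instruct and llama.cpp has been built locally) looks like:

# Convert the Hugging Face checkpoint to a full-precision GGUF
python convert_hf_to_gguf.py ./SmolLM2-135M-Instruct --outfile smollm2-135m-f16.gguf --outtype f16

# Quantize the f16 GGUF down to 4-bit Q4_K_M
./llama-quantize smollm2-135m-f16.gguf SmolLM2-135M-Instruct.Q4_K_M.gguf Q4_K_M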

Intended Use

This model is optimized for:

  • Mobile/Edge Deployment: Small enough for low-end mobile hardware, including older iOS devices
  • llama.cpp Integration: Compatible with llama.cpp and its bindings (see the server example after this list)
  • On-Device AI: Private, offline inference without cloud dependencies
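
Beyond the CLI, the model can be served over HTTP with llama.cpp's bundled llama-server, which exposes an OpenAI-compatible chat API (a minimal sketch; the port and context size below are arbitrary choices):

./llama-server -m SmolLM2-135M-Instruct.Q4_K_M.gguf -c 8192 --port 8080

Clients can then send chat requests to http://localhost:8080/v1/chat/completions.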

Capabilities

  • Ultra-Tiny: The smallest model in the SmolLM2 family, giving near-instant responses
  • Works Everywhere: Runs on devices with very limited memory and compute
  • Basic Q&A: Good for simple chat and quick interactions
  • Minimal Battery Usage: Extremely efficient
  • Trained on 2T Tokens: Impressive capability for its tiny size

Usage with llama.cpp

./llama-cli -m SmolLM2-135M-Instruct.Q4_K_M.gguf -p "Your prompt here" -n 512
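
For multi-turn chat, llama.cpp's conversation mode applies the chat template stored in the GGUF metadata (a minimal sketch; flag spellings may differ across llama.cpp versions):

./llama-cli -m SmolLM2-135M-Instruct.Q4_K_M.gguf -cnv -c 8192 --temp 0.7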

License

This model inherits the Apache 2.0 license from the original SmolLM2 model.

Attribution

Original model: HuggingFaceTB/SmolLM2-135M-Instruct (https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct). All credit for the architecture, training, and original weights goes to the HuggingFaceTB team.