Sharing a library I built to solve the “model too big for GPU” problem automatically.
Problem: Loading large models requires knowing which combination of device_map, quantization, and offloading to use, and the right combination varies by hardware. FP8 doesn’t work with CPU offload on Windows. INT4 needs bitsandbytes. Sequential offload and attention_slicing crash together.
Solution:
import overflowml
# Detects your hardware, picks strategy, loads with optimal config
model, tokenizer = overflowml.load_model("meta-llama/Llama-3-70B")
Under the hood it:
- Detects GPU type, VRAM, RAM, FP8/BF16 support
- Estimates model size from config (no weight download needed)
- Picks the best strategy: direct load, FP8, BitsAndBytes INT4/INT8, model_cpu_offload, or sequential_cpu_offload
- Sets up device_map, max_memory, quantization_config automatically
- Avoids known incompatibilities
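To make the size-estimation and strategy-selection steps above concrete, here is a minimal sketch (my own, not overflowml’s actual code; `estimate_params`, `choose_strategy`, and all thresholds are hypothetical). The config values are from the published Llama-3-70B config.json:

```python
def estimate_params(hidden, layers, intermediate, vocab, heads, kv_heads):
    """Approximate parameter count of a Llama-style model from config fields."""
    head_dim = hidden // heads
    # attention: q and o projections are hidden x hidden; k and v are
    # shrunk by grouped-query attention (kv_heads < heads)
    attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
    # gated MLP: gate, up, and down projections
    mlp = 3 * hidden * intermediate
    # input embeddings plus untied LM head
    embed = 2 * vocab * hidden
    return layers * (attn + mlp) + embed

def choose_strategy(weight_gb, vram_gb, ram_gb, fp8_ok=False):
    """Mirror of the strategy list above, with made-up thresholds."""
    if weight_gb < vram_gb * 0.9:
        return "direct"                      # fits in VRAM as-is
    if fp8_ok and weight_gb / 2 < vram_gb * 0.9:
        return "fp8"                         # halves bf16 weights
    if weight_gb / 2 < vram_gb * 0.9:
        return "bnb_int8"                    # bitsandbytes 8-bit
    if weight_gb / 4 < vram_gb * 0.9:
        return "bnb_int4"                    # bitsandbytes 4-bit
    if weight_gb < vram_gb + ram_gb:
        return "model_cpu_offload"
    return "sequential_cpu_offload"

params = estimate_params(hidden=8192, layers=80, intermediate=28672,
                         vocab=128256, heads=64, kv_heads=8)
weight_gb = params * 2 / 1e9                 # bf16 = 2 bytes per parameter
print(round(params / 1e9), "B params,", round(weight_gb), "GB in bf16")
print(choose_strategy(weight_gb, vram_gb=24, ram_gb=64))
```

This is why no weight download is needed: the handful of integers in config.json pin down the footprint to within a percent or two.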
Also works with diffusers pipelines:
overflowml.optimize_pipeline(pipe, model_size_gb=40)
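For diffusers, the underlying knobs are diffusers’ own enable_model_cpu_offload() and enable_sequential_cpu_offload(); a wrapper presumably just decides between them. A hypothetical sketch of that decision (my function name and thresholds, not overflowml’s code):

```python
def pick_diffusers_offload(model_size_gb: float, vram_gb: float) -> str:
    """Hypothetical choice between diffusers' real offload modes:
    - "direct": pipeline fits in VRAM, no offload needed
    - pipe.enable_model_cpu_offload(): one component on GPU at a time,
      modest slowdown
    - pipe.enable_sequential_cpu_offload(): layer-by-layer streaming,
      much slower but minimal VRAM
    """
    if model_size_gb < vram_gb * 0.9:
        return "direct"
    if model_size_gb < vram_gb * 3:
        return "enable_model_cpu_offload"
    return "enable_sequential_cpu_offload"

print(pick_diffusers_offload(40, 24))  # a 40 GB pipeline on a 24 GB GPU
# -> enable_model_cpu_offload
```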
CLI tool included:
$ overflowml benchmark # shows what models your hardware can run
$ overflowml plan 70 # detailed strategy for a 70GB model
$ overflowml detect # show hardware capabilities
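The kind of probing behind a detect command can be sketched with plain torch and os calls (again my sketch, assuming a POSIX system; `detect_hardware` is a hypothetical name, not the tool’s API):

```python
import os

def detect_hardware():
    """Rough hardware probe: RAM, GPU name, VRAM, bf16 support."""
    info = {"ram_gb": 0.0, "gpu": None, "vram_gb": 0.0, "bf16": False}
    # total system RAM (POSIX; Windows would need a different path)
    if hasattr(os, "sysconf"):
        info["ram_gb"] = (os.sysconf("SC_PAGE_SIZE")
                          * os.sysconf("SC_PHYS_PAGES")) / 1e9
    try:
        import torch
        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            info["gpu"] = props.name
            info["vram_gb"] = props.total_memory / 1e9
            info["bf16"] = torch.cuda.is_bf16_supported()
        elif torch.backends.mps.is_available():
            info["gpu"] = "Apple Silicon (MPS)"
            info["vram_gb"] = info["ram_gb"]   # unified memory
    except ImportError:
        pass  # no torch installed: CPU-only report
    return info

print(detect_hardware())
```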
Cross-platform: NVIDIA (CUDA), Apple Silicon (MPS/MLX unified memory), AMD (ROCm planned).
pip install overflowml[transformers]
GitHub: github.com/Khaeldur/overflowml
PyPI: pypi.org/project/overflowml