# gpt-oss-20b-reap-0.4-mxfp4-gguf
This repository contains a GGUF-quantized version of the `sandeshrajx/gpt-oss-20b-reap-0.4-mxfp4` model.
## Model Description
This model is a GGUF quantization of the REAP-pruned, MXFP4-quantized `openai/gpt-oss-20b` model.

- **Original Model:** `openai/gpt-oss-20b`
- **Pruning Method:** `reap` with a compression ratio of 0.4
- **First Quantization Method:** MXFP4 weight-only quantization
- **Second Quantization Method:** GGUF (Q8_0) using `llama.cpp`
- **Dataset used for pruning/quantization:** `theblackcat102/evol-codealpaca-v1`
The original MXFP4 quantization specifically targeted the "expert" layers of the model, skipping self-attention and router layers, as is standard practice for Mixture-of-Experts (MoE) models to optimize performance and reduce size. This GGUF quantization further reduces the model size for efficient inference with llama.cpp.
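As a rough illustration of that selection rule, the sketch below filters weight tensors by module name. This is not the actual Model-Optimizer code, and the name patterns (`experts`, `self_attn`, `router`) are assumptions based on common MoE checkpoint layouts:

```python
# Illustrative only: decide which weight tensors to quantize in an MoE
# checkpoint by module name. The patterns below are assumptions, not the
# actual logic used by convert_oai_mxfp4_weight_only.py.
def should_quantize(tensor_name: str) -> bool:
    if "self_attn" in tensor_name or "router" in tensor_name:
        return False  # keep attention and router weights in higher precision
    return "experts" in tensor_name  # quantize only expert (MoE MLP) weights

# Example: classify a few hypothetical tensor names
names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.router.weight",
    "model.layers.0.mlp.experts.0.down_proj.weight",
]
for n in names:
    print(n, "->", "quantize" if should_quantize(n) else "skip")
```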
## Usage
You can use this model with llama.cpp or compatible GGUF loaders.
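For example, a minimal sketch using the `llama-cpp-python` bindings (the model filename is an assumption; substitute the GGUF file downloaded from this repository):

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model filename below is an assumption; use your downloaded file.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-20b-reap-0.4-mxfp4-q8_0.gguf",
    n_ctx=4096,  # context window; adjust to available memory
)

out = llm("Write a Python function that reverses a string.", max_tokens=256)
print(out["choices"][0]["text"])
```

With llama.cpp itself, the equivalent is to pass the same file to `llama-cli` via `-m`.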
## Quantization Details
The model was first pruned at a 0.4 compression ratio using `reap`, then quantized to MXFP4, and finally converted to GGUF (Q8_0) format using the `llama.cpp` conversion script.
**Pruning Commands Used:**
```bash
# First pass: run the observer only, to collect calibration statistics
python ./reap/src/reap/prune.py \
  --model-name "openai/gpt-oss-20b" \
  --run_observer_only true \
  --samples_per_category 32

# Second pass: prune the experts at a 0.4 compression ratio
python ./reap/src/reap/prune.py \
  --model-name "openai/gpt-oss-20b" \
  --compression-ratio 0.4 \
  --prune-method reap
```
**MXFP4 Quantization Command Used:**
```bash
# Quantize the pruned checkpoint's weights to MXFP4 (weight-only)
python Model-Optimizer/examples/gpt-oss/convert_oai_mxfp4_weight_only.py \
  --model_path /workspace/artifacts/gpt-oss-20b/evol-codealpaca-v1/pruned_models/reap-seed_42-0.4-mxfp4 \
  --output_path /workspace/artifacts/gpt-oss-20b/evol-codealpaca-v1/pruned_models/reap-seed_42-0.4-mxfp4-quantized
```
**GGUF Quantization Command Used:**
```bash
# Convert the MXFP4 checkpoint to GGUF at Q8_0
python llama.cpp/convert_hf_to_gguf.py \
  --outtype q8_0 \
  --outfile /path/to/output/gpt-oss-20b-reap-0.4-mxfp4-q8_0.gguf \
  /path/to/downloaded/mxfp4_model
```
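As an optional sanity check, the resulting file can be inspected with the `gguf` Python package that ships with llama.cpp. This is a sketch; the path placeholder is carried over from the command above:

```python
# Quick sanity check of the converted file with the gguf package
# (pip install gguf). The path placeholder matches the command above.
from gguf import GGUFReader

reader = GGUFReader("/path/to/output/gpt-oss-20b-reap-0.4-mxfp4-q8_0.gguf")
print(f"{len(reader.tensors)} tensors found")
for t in reader.tensors[:5]:  # print a few tensor names, shapes, and types
    print(t.name, t.shape, t.tensor_type)
```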
## License
(Please specify the license of the original model and any modifications)