BECoach — Qwen3.5-4B (4-bit MLX)

The first Qwen3.5-4B model quantized for on-device inference on iPhone via mlx-swift.

Built by APEX LEARN for the BECoach iOS app.

Model Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-4B |
| Quantization | 4-bit (4.503 bits/weight) |
| Framework | MLX (Apple Silicon) |
| Size on disk | ~2.4 GB |
| RAM at runtime | ~2.5 GB |
| License | Apache 2.0 |
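As a sanity check, the on-disk size follows from the parameter count and the effective bits per weight in the table. A back-of-envelope sketch (the ~4.0B parameter count is an assumption based on the base model's name, not stated in this card):

```python
# Estimate quantized model size from the figures in the table above.
params = 4.0e9            # assumed ~4.0B parameters (from "4B" in the model name)
bits_per_weight = 4.503   # effective bits/weight reported above

size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.2f} GB")  # ≈ 2.25 GB
```

This lands slightly under the reported ~2.4 GB on disk; the remainder is plausibly tokenizer files, metadata, and tensors kept at higher precision.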

What is BECoach?

BECoach is an on-device AI coaching assistant for iPhone. No cloud. No API calls. Pure edge inference powered by Apple's MLX framework, running directly on the device's GPU via Metal.

Usage (iOS / mlx-swift)

```swift
import MLXLLM
import MLXLMCommon

let modelID = "apexlearn/BECoach-Qwen3.5-4B-4bit-mlx"
let config = ModelConfiguration(id: modelID)
let model = try await LLMModelFactory.shared.loadContainer(configuration: config)
```

Usage (Python / mlx-lm)

```python
from mlx_lm import load, generate

model, tokenizer = load("apexlearn/BECoach-Qwen3.5-4B-4bit-mlx")
response = generate(model, tokenizer, prompt="Hello, how can you help me?")
print(response)
```
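`generate` takes a raw prompt string; for a chat-tuned model you normally format the conversation with the tokenizer's chat template first (e.g. `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`). As an illustration only, this sketch builds the ChatML-style string that Qwen-family templates conventionally produce, by hand:

```python
# Illustrative only: Qwen-family chat templates use ChatML-style markers.
# In real code, prefer tokenizer.apply_chat_template(...) so the template
# shipped with the model is used verbatim.
def to_chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant")  # cue the model to respond
    return "\n".join(parts) + "\n"

prompt = to_chatml([{"role": "user", "content": "Hello, how can you help me?"}])
# response = generate(model, tokenizer, prompt=prompt)
```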

Device Compatibility

| Device | RAM | Status |
|---|---|---|
| iPhone 16 Pro / Pro Max | 8 GB | ✅ Recommended |
| iPhone 15 Pro / Pro Max | 8 GB | ✅ Supported |
| iPhone 15 / 16 (standard) | 6 GB | ⚠️ May hit memory pressure |
| iPhone 14 and earlier | ≤6 GB | ❌ Not recommended |
| iPad Pro (M1 and later) | 8–16 GB | ✅ Works well |
| Mac (Apple Silicon) | Any | ✅ Full performance |

Conversion

Converted from Qwen/Qwen3.5-4B using mlx-lm:

```bash
python3 -m mlx_lm.convert \
  --hf-path Qwen/Qwen3.5-4B \
  --mlx-path ./BECoach-Qwen3.5-4B-4bit-mlx \
  --quantize \
  --q-bits 4
```
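The reported 4.503 bits/weight is higher than a flat 4 because mlx-lm's affine quantization stores a per-group scale and bias alongside the 4-bit values. A sketch of the arithmetic, assuming the mlx-lm default group size of 64 and fp16 scale/bias (neither is stated in this card):

```python
# Effective bits per weight under grouped affine quantization.
group_size = 64   # assumed mlx-lm default (--q-group-size)
q_bits = 4        # from --q-bits 4
scale_bits = 16   # one fp16 scale per group (assumption)
bias_bits = 16    # one fp16 bias per group (assumption)

effective = (group_size * q_bits + scale_bits + bias_bits) / group_size
print(effective)  # 4.5
```

Under these assumptions the overhead works out to exactly 4.5 bits/weight; the small remaining gap to 4.503 would come from tensors left unquantized.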

License

Apache 2.0 — same as the base Qwen3.5 model.
