# BECoach — Qwen3.5-4B (4-bit MLX)
The first Qwen3.5-4B model quantized for on-device inference on iPhone via mlx-swift.
Built by APEX LEARN for the BECoach iOS app.
## Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-4B |
| Quantization | 4-bit (4.503 bits/weight) |
| Framework | MLX (Apple Silicon) |
| Size on disk | ~2.4 GB |
| RAM at runtime | ~2.5 GB |
| License | Apache 2.0 |
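The size on disk can be sanity-checked against the bits/weight figure above. A rough estimate (a sketch, assuming ~4.0B parameters and that the effective 4.503 bits/weight already covers quantization scales and biases):

```python
# Rough disk-size estimate for a 4-bit quantized ~4B-parameter model.
# Assumptions (not from the model card): ~4.0e9 weights total, and the
# reported 4.503 bits/weight is the effective storage cost per weight.
params = 4.0e9
bits_per_weight = 4.503

size_bytes = params * bits_per_weight / 8
size_gb = size_bytes / 1e9
print(f"~{size_gb:.2f} GB")  # roughly 2.25 GB
```

The actual file is somewhat larger (~2.4 GB), since some tensors (e.g. norms, possibly embeddings) may be stored at higher precision.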
## What is BECoach?

BECoach is an on-device AI coaching assistant for iPhone. No cloud. No API calls. Pure edge inference powered by Apple's MLX framework, which runs directly on the device's GPU via Metal.
## Usage (iOS / mlx-swift)

```swift
import MLXLLM

let modelID = "apexlearn/BECoach-Qwen3.5-4B-4bit-mlx"
let config = ModelConfiguration(id: modelID)
let model = try await LLMModelFactory.shared.loadContainer(configuration: config)
```
## Usage (Python / mlx-lm)

```python
from mlx_lm import load, generate

model, tokenizer = load("apexlearn/BECoach-Qwen3.5-4B-4bit-mlx")
response = generate(model, tokenizer, prompt="Hello, how can you help me?")
print(response)
```
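The `generate` call above passes a raw prompt string. For chat-style use, Qwen models expect ChatML-formatted turns; in practice you would use the tokenizer's `apply_chat_template`, but the format itself can be sketched by hand (a sketch, assuming the standard Qwen `<|im_start|>`/`<|im_end|>` chat markers; the system message text is illustrative):

```python
# Build a ChatML prompt by hand (illustrative; prefer
# tokenizer.apply_chat_template in real code).
def chatml_prompt(messages):
    parts = []
    for msg in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # End with an open assistant turn to cue the model to respond.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = chatml_prompt([
    {"role": "system", "content": "You are BECoach, a concise coaching assistant."},
    {"role": "user", "content": "Hello, how can you help me?"},
])
print(prompt)
```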
## Device Compatibility
| Device | RAM | Status |
|---|---|---|
| iPhone 16 Pro / Pro Max | 8 GB | ✅ Recommended |
| iPhone 15 Pro / Pro Max | 8 GB | ✅ Supported |
| iPhone 15 / 16 (standard) | 6 GB | ⚠️ May hit memory pressure |
| iPhone 14 and earlier | ≤6 GB | ❌ Not recommended |
| iPad Pro M1+ | 8–16 GB | ✅ Works well |
| Mac (Apple Silicon) | Any | ✅ Full performance |
## Conversion

Converted from Qwen/Qwen3.5-4B using mlx-lm:

```bash
python3 -m mlx_lm.convert \
  --hf-path Qwen/Qwen3.5-4B \
  --mlx-path ./BECoach-Qwen3.5-4B-4bit-mlx \
  --quantize \
  --q-bits 4
```
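The 4.503 bits/weight reported in the table is slightly above 4 because group quantization stores per-group metadata alongside the 4-bit weights. A quick check, assuming mlx-lm's default group size of 64 with one fp16 scale and one fp16 bias per group (the small remainder above 4.5 would be per-tensor overhead):

```python
# Effective bits per weight for group quantization:
# q_bits per weight, plus one fp16 scale and one fp16 bias
# amortized over each group of `group_size` weights.
def effective_bits(q_bits=4, group_size=64, scale_bits=16, bias_bits=16):
    return q_bits + (scale_bits + bias_bits) / group_size

print(effective_bits())  # 4.5 — close to the reported 4.503
```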
## License
Apache 2.0 — same as the base Qwen3.5 model.