# 💪 DARA - Detect & Assist Recognition AI

> *"Mata untuk semua"* - Eyes for everyone

**Lightweight Vision-Language Model for Assistive Technology**
## ⚡ What is DARA?
DARA is a lightweight VLM designed to help visually impaired users understand their surroundings through five specialized modes, with voice output support.
| Feature | Spec |
|---|---|
| 🧠 Base Model | Florence-2-base |
| 📦 Size | 232M params (~500 MB) |
| ⚡ Speed | <200 ms on CPU |
| 🌐 Languages | English, Indonesian |
| 🔊 Output | Text + Voice (TTS) |
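Since DARA builds on Florence-2-base, you can preview the base model's raw captioning and OCR behavior directly through `transformers`. This is a minimal sketch of the base model alone, not of DARA's API; `photo.jpg` is a placeholder:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
prompt = "<CAPTION>"  # Florence-2 task token; "<OCR>" reads text instead

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=128,
)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```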
## 🎯 5 Intelligence Modes
| Mode | Use Case | Example Output |
|---|---|---|
| 👁️ Scene | Describe surroundings | "Kitchen with wooden table. Stove on left." |
| 😊 Emotion | Read facial expressions | "Happy. They seem in good spirits!" |
| 💊 Medicine | Read medicine labels | "Dosage: 500mg. Take as prescribed." |
| 💵 Currency | Identify money | "Rp 50,000. Blue-colored note." |
| 📖 Text | OCR for signs/labels | "EXIT sign detected." |
## 🚀 Quick Start
```python
from dara import DARA

# Initialize
dara = DARA()

# Use any mode
result = dara.detect(
    image_path="photo.jpg",
    mode="scene",      # scene | emotion | medicine | currency | text
    language="en",     # en | id
)

print(result["result"])  # Text description
# Voice output is saved to the path in result["audio"]
```
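The same call works for every mode, so looping over all five gives a quick sanity check (`photo.jpg` is a placeholder):

```python
# Run every mode on one image, using the detect() API shown above.
for mode in ["scene", "emotion", "medicine", "currency", "text"]:
    out = dara.detect(image_path="photo.jpg", mode=mode, language="en")
    print(f"[{mode}] {out['result']}")
    # out["audio"] holds the path of the generated speech file
```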
## 📥 Installation

```bash
pip install torch transformers pillow gtts
git clone https://github.com/ardelyo/dara.git
```
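The `gtts` package in the dependency list handles the voice side. As a standalone sketch of that step, independent of DARA's internals, converting a result string to speech looks like this (file name and sample text are illustrative):

```python
from gtts import gTTS

# gTTS calls Google's TTS service, so this step needs a network connection.
speech = gTTS("Kitchen with wooden table. Stove on left.", lang="en")  # lang="id" for Indonesian
speech.save("scene.mp3")  # play with any audio player
```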
## ⚠️ Important Notes

- **🏥 Medical Disclaimer:** Medicine mode is for reference only. Always consult healthcare professionals.
- **🔒 Privacy:** All processing runs locally. No images are uploaded.
## 📊 Performance
| Device | Latency |
|---|---|
| CPU (Intel i7) | ~180 ms |
| GPU (RTX 3060) | ~45 ms |
| Mobile | ~320 ms |
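These figures are easy to sanity-check on your own hardware with a rough timing loop. This sketch assumes the Quick Start API; the warm-up call keeps model loading out of the measurement:

```python
import time
from dara import DARA

dara = DARA()
dara.detect(image_path="photo.jpg", mode="scene", language="en")  # warm-up

runs = 10
start = time.perf_counter()
for _ in range(runs):
    dara.detect(image_path="photo.jpg", mode="scene", language="en")
print(f"avg latency: {(time.perf_counter() - start) / runs * 1000:.0f} ms")
```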
## 📄 Citation

```bibtex
@misc{dara2024,
  title={DARA: Detect & Assist Recognition AI},
  author={Ardelyo},
  year={2024},
  url={https://github.com/ardelyo/dara}
}
```
Made with ❤️ for Accessibility

⭐ GitHub • 🤗 Demo