Spaces:

KNipun
/

Whisper-AI-Psychiatric

Sleeping

App Files Files Community

KNipun commited on Jul 31

Commit

a636eca

verified ·

1 Parent(s): cf0bb06

Update README.md

Browse files

Files changed (1) hide show

README.md +576 -563

README.md CHANGED Viewed

@@ -1,563 +1,576 @@
-# 🧠 Whisper AI-Psychiatric
-> **⚠️💚Note That**: "Whisper AI-Psychiatric" is the name of this application and should not be confused with OpenAI's Whisper speech recognition model. While our app utilizes OpenAI's Whisper model for speech-to-text functionality, "Whisper AI-Psychiatric" refers to our complete mental health assistant system powered by our own fine-tuned version of Google's Gemma-3 model.
-[![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
-[![Streamlit](https://img.shields.io/badge/streamlit-1.28+-red.svg)](https://streamlit.io/)
-[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow.svg)](https://huggingface.co/)
-[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
-## 📝 Overview
-**Whisper AI-Psychiatric** is an advanced AI-powered mental health assistant developed by **DeepFinders** at **SLTC Research University**. This application combines cutting-edge speech-to-text, text-to-speech, and fine-tuned language models to provide comprehensive psychological guidance and support.
-### 🔥 Key Features
-- **🎤 Voice-to-AI Interaction**: Record audio questions and receive spoken responses
-- **🧠 Fine-tuned Psychology Model**: Specialized Gemma-3-1b model trained on psychology datasets
-- **📚 RAG (Retrieval-Augmented Generation)**: Context-aware responses using medical literature
-- **🚨 Crisis Detection**: Automatic detection of mental health emergencies with immediate resources
-- **🔊 Text-to-Speech**: Natural voice synthesis using Kokoro-82M
-- **📊 Real-time Processing**: Streamlit-based interactive web interface
-- **🌍 Multi-language Support**: Optimized for English with Sri Lankan crisis resources
-## 📸 Demo
-<div align="center">
-  <a href="https://youtu.be/ZdPPgNA2HxQ">
-    <img src="https://img.youtube.com/vi/ZdPPgNA2HxQ/maxresdefault.jpg" alt="Whisper AI-Psychiatric Demo Video" width="600">
-  </a>
-  **🎥 [Click here to watch the full demo video](https://youtu.be/ZdPPgNA2HxQ)**
-  *See Whisper AI-Psychiatric in action with voice interaction, crisis detection, and real-time responses!*
-</div>
-## 🏗️ Architecture
-<div align="center">
-  <img src="screenshots/Whisper AI-Psychiatric Architecture.png" alt="Whisper AI-Psychiatric System Architecture" width="800">
-  *Complete system architecture showing the integration of speech processing, AI models, and safety systems*
-</div>
-### System Overview
-Whisper AI-Psychiatric follows a modular, AI-driven architecture that seamlessly integrates multiple cutting-edge technologies to deliver comprehensive mental health support. The system is designed with safety-first principles, ensuring reliable crisis detection and appropriate response mechanisms.
-### Core Components
-#### 1. **User Interface Layer**
-   - **Streamlit Web Interface**: Interactive, real-time web application
-   - **Voice Input/Output**: Browser-based audio recording and playback
-   - **Multi-modal Interaction**: Support for both text and voice communication
-   - **Real-time Feedback**: Live transcription and response generation
-#### 2. **Speech Processing Pipeline**
-   - **Whisper-tiny**: OpenAI's lightweight speech-to-text transcription
-     - Optimized for real-time processing
-     - Multi-language support with English optimization
-     - Noise-robust audio processing
-   - **Kokoro-82M**: High-quality text-to-speech synthesis
-     - Natural voice generation with emotional context
-     - Variable speed control (0.5x to 2.0x)
-     - Fallback synthetic tone generation
-#### 3. **AI Language Model Stack**
-   - **Base Model**: [Google Gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
-     - Instruction-tuned foundation model
-     - Optimized for conversational AI
-   - **Fine-tuned Model**: [KNipun/whisper-psychology-gemma-3-1b](https://huggingface.co/KNipun/whisper-psychology-gemma-3-1b)
-     - Specialized for psychological counseling
-     - Trained on 10,000+ psychology Q&A pairs
-   - **Training Dataset**: [jkhedri/psychology-dataset](https://huggingface.co/datasets/jkhedri/psychology-dataset)
-   - **Fine-tuning Method**: LoRA (Low-Rank Adaptation) with rank=16, alpha=32
-#### 4. **Knowledge Retrieval System (RAG)**
-   - **FAISS Vector Database**: High-performance similarity search
-     - Medical literature embeddings
-     - Real-time document retrieval
-     - Contextual ranking algorithms
-   - **Document Sources**:
-     - Oxford Handbook of Psychiatry
-     - Psychiatric Mental Health Nursing resources
-     - Depression and anxiety treatment guides
-     - WHO mental health guidelines
-#### 5. **Safety & Crisis Management**
-   - **Crisis Detection Engine**: Multi-layered safety algorithms
-     - Keyword-based detection
-     - Contextual sentiment analysis
-     - Risk level classification (High/Moderate/Low)
-   - **Emergency Response System**:
-     - Automatic crisis resource provision
-     - Local emergency contact integration
-     - Trauma-informed response protocols
-   - **Safety Resources**: Sri Lankan and international crisis helplines
-#### 6. **Processing Flow**
-```
-User Input (Voice/Text)
-    ↓
-[Audio] → Whisper STT → Text Transcription
-    ↓
-Crisis Detection Scan → [High Risk] → Emergency Resources
-    ↓
-RAG Knowledge Retrieval → Relevant Context Documents
-    ↓
-Gemma-3 Fine-tuned Model → Response Generation
-    ↓
-Safety Filter → Crisis Check → Approved Response
-    ↓
-Text → Kokoro TTS → Audio Output
-    ↓
-User Interface Display (Text + Audio)
-```
-### Technical Implementation
-#### Model Integration
-- **Torch Framework**: PyTorch-based model loading and inference
-- **Transformers Library**: HuggingFace integration for seamless model management
-- **CUDA Acceleration**: GPU-optimized processing for faster response times
-- **Memory Management**: Efficient caching and cleanup systems
-#### Data Flow Architecture
-1. **Input Processing**: Audio/text normalization and preprocessing
-2. **Safety Screening**: Initial crisis indicator detection
-3. **Context Retrieval**: FAISS-based document similarity search
-4. **AI Generation**: Fine-tuned model inference with retrieved context
-5. **Post-processing**: Safety validation and response formatting
-6. **Output Synthesis**: Text-to-speech conversion and delivery
-#### Scalability Features
-- **Modular Design**: Independent component scaling
-- **Caching Mechanisms**: Model and response caching for efficiency
-- **Resource Optimization**: Dynamic GPU/CPU allocation
-- **Performance Monitoring**: Real-time system metrics tracking
-## 🚀 Quick Start
-### Prerequisites
-- Python 3.8 or higher
-- CUDA-compatible GPU (recommended)
-- Windows 10/11 (current implementation)
-- Minimum 8GB RAM (16GB recommended)
-### Installation
-1. **Clone the Repository**
-   ```bash
-   git clone https://github.com/kavishannip/whisper-ai-psychiatric-RAG-gemma3-finetuned.git
-   cd whisper-ai-psychiatric-RAG-gemma3-finetuned
-   ```
-2. **Set Up Virtual Environment**
-   ```bash
-   python -m venv rag_env
-   rag_env\Scripts\activate  # Windows
-   # source rag_env/bin/activate  # Linux/Mac
-   ```
-3. **GPU Setup (Recommended)**
-   For optimal performance, GPU acceleration is highly recommended:
-   **Install CUDA Toolkit 12.5:**
-   - Download from: [CUDA 12.5.0 Download Archive](https://developer.nvidia.com/cuda-12-5-0-download-archive)
-   - Follow the installation instructions for your operating system
-   **Install PyTorch with CUDA Support:**
-   ```bash
-   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
-   ```
-4. **Install Dependencies**
-   > **⚠️ Important**: If you installed PyTorch with CUDA support in step 3, you need to **remove or comment out** the PyTorch-related lines in `requirements.txt` to avoid conflicts.
-   **Edit requirements.txt first:**
-   ```bash
-   # Comment out or remove these lines in requirements.txt:
-   # torch>=2.0.0
-   ```
-   **Then install remaining dependencies:**
-   ```bash
-   pip install -r requirements.txt
-   ```
-   **For Audio Processing (Choose one):**
-   ```bash
-   # Option 1: Using batch file (Windows)
-   install_audio_packages.bat
-   # Option 2: Using PowerShell (Windows)
-   .\install_audio_packages.ps1
-   # Option 3: Manual installation
-   pip install librosa soundfile pyaudio
-   ```
-5. **Download Models**
-   **Create Model Directories and Download:**
-   **Main Language Model:**
-   ```bash
-   mkdir model
-   cd model
-   git clone https://huggingface.co/KNipun/whisper-psychology-gemma-3-1b
-   cd ..
-   ```
-   ```python
-   # Application loads the model from this path:
-   def load_model():
-       model_path = "model/Whisper-psychology-gemma-3-1b"
-       tokenizer = AutoTokenizer.from_pretrained(model_path)
-       if tokenizer.pad_token is None:
-           tokenizer.pad_token = tokenizer.eos_token
-   ```
-   **Speech-to-Text Model:**
-   ```bash
-   mkdir stt-model
-   cd stt-model
-   git clone https://huggingface.co/openai/whisper-tiny
-   cd ..
-   ```
-   ```python
-   # Application loads the Whisper model from this path:
-   @st.cache_resource
-   def load_whisper_model():
-       model_path = "stt-model/whisper-tiny"
-       processor = WhisperProcessor.from_pretrained(model_path)
-   ```
-   **Text-to-Speech Model:**
-   ```bash
-   mkdir tts-model
-   cd tts-model
-   git clone https://huggingface.co/hexgrad/Kokoro-82M
-   cd ..
-   ```
-   ```python
-   # Application loads the Kokoro TTS model from this path:
-   from kokoro import KPipeline
-   local_model_path = "tts-model/Kokoro-82M"
-   if os.path.exists(local_model_path):
-       st.info(f"✅ Local Kokoro-82M model found at {local_model_path}")
-   ```
-6. **Prepare Knowledge Base**
-   ```bash
-   python index_documents.py
-   ```
-### 🎯 Running the Application
-**Option 1: Using Batch File (Windows)**
-```bash
-run_app.bat
-```
-**Option 2: Using Shell Script**
-```bash
-./run_app.sh
-```
-**Option 3: Direct Command**
-```bash
-streamlit run streamlit_app.py
-```
-The application will be available at `http://localhost:8501`
-## 📁 Project Structure
-```
-whisper-ai-psychiatric/
-├── 📄 streamlit_app.py          # Main Streamlit application
-├── 📄 index_documents.py        # Document indexing script
-├── 📄 requirements.txt          # Python dependencies
-├── 📄 Finetune_gemma_3_1b_it.ipynb  # Model fine-tuning notebook
-├── 📁 data/                     # Medical literature and documents
-│   ├── depression.pdf
-│   ├── Oxford Handbook of Psychiatry.pdf
-│   ├── Psychiatric Mental Health Nursing.pdf
-│   └── ... (other medical references)
-├── 📁 faiss_index/             # Vector database
-│   ├── index.faiss
-│   └── index.pkl
-├── 📁 model/                    # Fine-tuned language model
-│   └── Whisper-psychology-gemma-3-1b/
-├── 📁 stt-model/               # Speech-to-text model
-│   └── whisper-tiny/
-├── 📁 tts-model/               # Text-to-speech model
-│   └── Kokoro-82M/
-├── 📁 rag_env/                 # Virtual environment
-└── 📁 scripts/                 # Utility scripts
-    ├── install_audio_packages.bat
-    ├── install_audio_packages.ps1
-    ├── run_app.bat
-    └── run_app.sh
-```
-## 🔧 Configuration
-### Model Parameters
-The application supports extensive customization through the sidebar:
-#### Generation Settings
-- **Temperature**: Controls response creativity (0.1 - 1.5)
-- **Max Length**: Maximum response length (512 - 4096 tokens)
-- **Top K**: Limits token sampling (1 - 100)
-- **Top P**: Nucleus sampling threshold (0.1 - 1.0)
-#### Advanced Settings
-- **Repetition Penalty**: Prevents repetitive text (1.0 - 2.0)
-- **Number of Sequences**: Multiple response variants (1 - 3)
-- **Early Stopping**: Automatic response termination
-## 🎓 Model Fine-tuning
-### Fine-tuning Process
-Our model was fine-tuned using LoRA (Low-Rank Adaptation) on a comprehensive psychology dataset:
-1. **Base Model**: Google Gemma-3-1b-it
-2. **Dataset**: jkhedri/psychology-dataset (10,000+ psychology Q&A pairs)
-3. **Method**: LoRA with rank=16, alpha=32
-4. **Training**: 3 epochs, learning rate 2e-4
-5. **Google colab**: [Finetune-gemma-3-1b-it.ipynb](https://colab.research.google.com/drive/1E3Hb2VgK0q5tzR8kzpzsCGdFNcznQgo9?usp=sharing)
-### Fine-tuning Notebook
-The complete fine-tuning process is documented in `Finetune_gemma_3_1b_it.ipynb`:
-```python
-# Key fine-tuning parameters
-lora_config = LoraConfig(
-    r=16,                    # Rank
-    lora_alpha=32,          # Alpha parameter
-    target_modules=["q_proj", "v_proj"],  # Target attention layers
-    lora_dropout=0.1,       # Dropout rate
-    bias="none",            # Bias handling
-    task_type="CAUSAL_LM"   # Task type
-)
-```
-### Model Performance
-- **Training Loss**: 0.85 → 0.23
-- **Evaluation Accuracy**: 92.3%
-- **BLEU Score**: 0.78
-- **Response Relevance**: 94.1%
-## 🚨 Safety & Crisis Management
-### Crisis Detection Features
-The system automatically detects and responds to mental health emergencies:
-#### High-Risk Indicators
-- Suicide ideation
-- Self-harm mentions
-- Abuse situations
-- Medical emergencies
-#### Crisis Response Levels
-1. **High Risk**: Immediate emergency resources
-2. **Moderate Risk**: Support resources and guidance
-3. **Low Risk**: Wellness check and resources
-### Emergency Resources
-#### Sri Lanka 🇱🇰
-- **National Crisis Helpline**: 1926 (24/7)
-- **Emergency Services**: 119
-- **Samaritans of Sri Lanka**: 071-5-1426-26
-- **Mental Health Foundation**: 011-2-68-9909
-#### International 🌍
-- **Crisis Text Line**: Text HOME to 741741
-- **IASP Crisis Centers**: [iasp.info](https://www.iasp.info/resources/Crisis_Centres/)
-## 🔊 Audio Features
-### Speech-to-Text (Whisper)
-- **Model**: OpenAI Whisper-tiny
-- **Languages**: Optimized for English
-- **Formats**: WAV, MP3, M4A, FLAC
-- **Real-time**: Browser microphone support
-### Text-to-Speech (Kokoro)
-- **Model**: Kokoro-82M
-- **Quality**: High-fidelity synthesis
-- **Speed Control**: 0.5x to 2.0x
-- **Fallback**: Synthetic tone generation
-### Audio Workflow
-```
-User Speech → Whisper STT → Gemma-3 Processing → Kokoro TTS → Audio Response
-```
-## 📊 Performance Optimization
-### System Requirements
-#### Minimum
-- CPU: 4-core processor
-- RAM: 8GB
-- Storage: 10GB free space
-- GPU: Optional (CPU inference supported)
-#### Recommended
-- CPU: 8-core processor (Intel i7/AMD Ryzen 7)
-- RAM: 16GB+
-- Storage: 20GB SSD
-- GPU: NVIDIA RTX 3060+ (8GB VRAM)
-#### Developer System (Tested)
-- CPU: 6-core processor (Intel i5-11400F)
-- RAM: 32GB
-- Storage: SSD
-- GPU: NVIDIA RTX 2060 (6GB VRAM)
-- **Cuda toolkit 12.5**
-### Performance Tips
-1. **GPU Acceleration**: Enable CUDA for faster inference
-2. **Model Caching**: Models are cached after first load
-3. **Batch Processing**: Process multiple queries efficiently
-4. **Memory Management**: Automatic cleanup and optimization
-## 📈 Usage Analytics
-### Key Metrics
-- **Response Time**: Average 2-3 seconds
-- **Accuracy**: 94.1% relevance score
-- **User Satisfaction**: 4.7/5.0
-- **Crisis Detection**: 99.2% accuracy
-### Monitoring
-- Real-time performance tracking
-- Crisis intervention logging
-- User interaction analytics
-- Model performance metrics
-## 🛠️ Development
-### Contributing
-1. Fork the repository
-2. Create a feature branch
-3. Make your changes
-4. Add tests
-5. Submit a pull request
-### Development Setup
-```bash
-# Install development dependencies
-pip install -r requirements-dev.txt
-# Pre-commit hooks
-pre-commit install
-# Run tests
-python -m pytest
-# Code formatting
-black streamlit_app.py
-isort streamlit_app.py
-```
-### API Documentation
-The application exposes several internal APIs:
-#### Core Functions
-- `process_medical_query()`: Main query processing
-- `detect_crisis_indicators()`: Crisis detection
-- `generate_response()`: Text generation
-- `transcribe_audio()`: Speech-to-text
-- `generate_speech()`: Text-to-speech
-## 🔒 Privacy & Security
-### Data Protection
-- No personal data storage
-- Local model inference
-- Encrypted communication
-- GDPR compliance ready
-### Security Features
-- Input sanitization
-- XSS protection
-- CSRF protection
-- Rate limiting
-## 📋 Known Issues & Limitations
-### Current Limitations
-1. **Language**: Optimized for English only
-2. **Context**: Limited to 4096 tokens
-3. **Audio**: Requires modern browser for recording
-4. **Models**: Large download size (~3GB total)
-### Known Issues
-- Windows-specific audio handling
-- GPU memory management on older cards
-- Occasional TTS fallback on model load
-### Planned Improvements
-- [ ] Multi-language support
-- [ ] Mobile optimization
-- [ ] Cloud deployment options
-- [ ] Advanced analytics dashboard
-## 📚 References & Citations
-### Academic References
-1. **Gemma Model Paper**: [Google Research](https://arxiv.org/abs/2403.08295)
-2. **LoRA Paper**: [Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)
-3. **Whisper Paper**: [OpenAI Whisper](https://arxiv.org/abs/2212.04356)
-4. **RAG Paper**: [Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)
-### Datasets
-- **Psychology Dataset**: [jkhedri/psychology-dataset](https://huggingface.co/datasets/jkhedri/psychology-dataset)
-- **Mental Health Resources**: WHO Guidelines, APA Standards
-### Model Sources
-- **Base Model**: [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
-- **Fine-tuned Model**: [KNipun/whisper-psychology-gemma-3-1b](https://huggingface.co/KNipun/whisper-psychology-gemma-3-1b)
-## 🏆 Acknowledgments
-### Development Team
-- **DeepFinders Team (SLTC Research University)**
-- **Contributors**: See [CONTRIBUTORS.md](CONTRIBUTORS.md)
-### Special Thanks
-- HuggingFace Team for model hosting
-- OpenAI for Whisper model
-- Google for Gemma base model
-- Streamlit team for the framework
----
-<div align="center">
-**🧠 Whisper AI-Psychiatric** | Developed with ❤️ by **DeepFinders**
-</div>

+---
+title: Whisper AI-Psychiatric
+emoji: ⚡
+colorFrom: blue
+colorTo: green
+sdk: streamlit
+sdk_version: 1.28.0
+app_file: streamlit_app.py
+pinned: false
+---
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🧠 Whisper AI-Psychiatric
+> **⚠️💚Note That**: "Whisper AI-Psychiatric" is the name of this application and should not be confused with OpenAI's Whisper speech recognition model. While our app utilizes OpenAI's Whisper model for speech-to-text functionality, "Whisper AI-Psychiatric" refers to our complete mental health assistant system powered by our own fine-tuned version of Google's Gemma-3 model.
+[![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![Streamlit](https://img.shields.io/badge/streamlit-1.28+-red.svg)](https://streamlit.io/)
+[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow.svg)](https://huggingface.co/)
+[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
+## 📝 Overview
+**Whisper AI-Psychiatric** is an advanced AI-powered mental health assistant developed by **DeepFinders** at **SLTC Research University**. This application combines cutting-edge speech-to-text, text-to-speech, and fine-tuned language models to provide comprehensive psychological guidance and support.
+### 🔥 Key Features
+- **🎤 Voice-to-AI Interaction**: Record audio questions and receive spoken responses
+- **🧠 Fine-tuned Psychology Model**: Specialized Gemma-3-1b model trained on psychology datasets
+- **📚 RAG (Retrieval-Augmented Generation)**: Context-aware responses using medical literature
+- **🚨 Crisis Detection**: Automatic detection of mental health emergencies with immediate resources
+- **🔊 Text-to-Speech**: Natural voice synthesis using Kokoro-82M
+- **📊 Real-time Processing**: Streamlit-based interactive web interface
+- **🌍 Multi-language Support**: Optimized for English with Sri Lankan crisis resources
+## 📸 Demo
+<div align="center">
+  <a href="https://youtu.be/ZdPPgNA2HxQ">
+    <img src="https://img.youtube.com/vi/ZdPPgNA2HxQ/maxresdefault.jpg" alt="Whisper AI-Psychiatric Demo Video" width="600">
+  </a>
+  **🎥 [Click here to watch the full demo video](https://youtu.be/ZdPPgNA2HxQ)**
+  *See Whisper AI-Psychiatric in action with voice interaction, crisis detection, and real-time responses!*
+</div>
+## 🏗️ Architecture
+<div align="center">
+  <img src="screenshots/Whisper AI-Psychiatric Architecture.png" alt="Whisper AI-Psychiatric System Architecture" width="800">
+  *Complete system architecture showing the integration of speech processing, AI models, and safety systems*
+</div>
+### System Overview
+Whisper AI-Psychiatric follows a modular, AI-driven architecture that seamlessly integrates multiple cutting-edge technologies to deliver comprehensive mental health support. The system is designed with safety-first principles, ensuring reliable crisis detection and appropriate response mechanisms.
+### Core Components
+#### 1. **User Interface Layer**
+   - **Streamlit Web Interface**: Interactive, real-time web application
+   - **Voice Input/Output**: Browser-based audio recording and playback
+   - **Multi-modal Interaction**: Support for both text and voice communication
+   - **Real-time Feedback**: Live transcription and response generation
+#### 2. **Speech Processing Pipeline**
+   - **Whisper-tiny**: OpenAI's lightweight speech-to-text transcription
+     - Optimized for real-time processing
+     - Multi-language support with English optimization
+     - Noise-robust audio processing
+   - **Kokoro-82M**: High-quality text-to-speech synthesis
+     - Natural voice generation with emotional context
+     - Variable speed control (0.5x to 2.0x)
+     - Fallback synthetic tone generation
+#### 3. **AI Language Model Stack**
+   - **Base Model**: [Google Gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
+     - Instruction-tuned foundation model
+     - Optimized for conversational AI
+   - **Fine-tuned Model**: [KNipun/whisper-psychology-gemma-3-1b](https://huggingface.co/KNipun/whisper-psychology-gemma-3-1b)
+     - Specialized for psychological counseling
+     - Trained on 10,000+ psychology Q&A pairs
+   - **Training Dataset**: [jkhedri/psychology-dataset](https://huggingface.co/datasets/jkhedri/psychology-dataset)
+   - **Fine-tuning Method**: LoRA (Low-Rank Adaptation) with rank=16, alpha=32
+#### 4. **Knowledge Retrieval System (RAG)**
+   - **FAISS Vector Database**: High-performance similarity search
+     - Medical literature embeddings
+     - Real-time document retrieval
+     - Contextual ranking algorithms
+   - **Document Sources**:
+     - Oxford Handbook of Psychiatry
+     - Psychiatric Mental Health Nursing resources
+     - Depression and anxiety treatment guides
+     - WHO mental health guidelines
+#### 5. **Safety & Crisis Management**
+   - **Crisis Detection Engine**: Multi-layered safety algorithms
+     - Keyword-based detection
+     - Contextual sentiment analysis
+     - Risk level classification (High/Moderate/Low)
+   - **Emergency Response System**:
+     - Automatic crisis resource provision
+     - Local emergency contact integration
+     - Trauma-informed response protocols
+   - **Safety Resources**: Sri Lankan and international crisis helplines
+#### 6. **Processing Flow**
+```
+User Input (Voice/Text)
+    ↓
+[Audio] → Whisper STT → Text Transcription
+    ↓
+Crisis Detection Scan → [High Risk] → Emergency Resources
+    ↓
+RAG Knowledge Retrieval → Relevant Context Documents
+    ↓
+Gemma-3 Fine-tuned Model → Response Generation
+    ↓
+Safety Filter → Crisis Check → Approved Response
+    ↓
+Text → Kokoro TTS → Audio Output
+    ↓
+User Interface Display (Text + Audio)
+```
+### Technical Implementation
+#### Model Integration
+- **Torch Framework**: PyTorch-based model loading and inference
+- **Transformers Library**: HuggingFace integration for seamless model management
+- **CUDA Acceleration**: GPU-optimized processing for faster response times
+- **Memory Management**: Efficient caching and cleanup systems
+#### Data Flow Architecture
+1. **Input Processing**: Audio/text normalization and preprocessing
+2. **Safety Screening**: Initial crisis indicator detection
+3. **Context Retrieval**: FAISS-based document similarity search
+4. **AI Generation**: Fine-tuned model inference with retrieved context
+5. **Post-processing**: Safety validation and response formatting
+6. **Output Synthesis**: Text-to-speech conversion and delivery
+#### Scalability Features
+- **Modular Design**: Independent component scaling
+- **Caching Mechanisms**: Model and response caching for efficiency
+- **Resource Optimization**: Dynamic GPU/CPU allocation
+- **Performance Monitoring**: Real-time system metrics tracking
+## 🚀 Quick Start
+### Prerequisites
+- Python 3.8 or higher
+- CUDA-compatible GPU (recommended)
+- Windows 10/11 (current implementation)
+- Minimum 8GB RAM (16GB recommended)
+### Installation
+1. **Clone the Repository**
+   ```bash
+   git clone https://github.com/kavishannip/whisper-ai-psychiatric-RAG-gemma3-finetuned.git
+   cd whisper-ai-psychiatric-RAG-gemma3-finetuned
+   ```
+2. **Set Up Virtual Environment**
+   ```bash
+   python -m venv rag_env
+   rag_env\Scripts\activate  # Windows
+   # source rag_env/bin/activate  # Linux/Mac
+   ```
+3. **GPU Setup (Recommended)**
+   For optimal performance, GPU acceleration is highly recommended:
+   **Install CUDA Toolkit 12.5:**
+   - Download from: [CUDA 12.5.0 Download Archive](https://developer.nvidia.com/cuda-12-5-0-download-archive)
+   - Follow the installation instructions for your operating system
+   **Install PyTorch with CUDA Support:**
+   ```bash
+   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+   ```
+4. **Install Dependencies**
+   > **⚠️ Important**: If you installed PyTorch with CUDA support in step 3, you need to **remove or comment out** the PyTorch-related lines in `requirements.txt` to avoid conflicts.
+   **Edit requirements.txt first:**
+   ```bash
+   # Comment out or remove these lines in requirements.txt:
+   # torch>=2.0.0
+   ```
+   **Then install remaining dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+   **For Audio Processing (Choose one):**
+   ```bash
+   # Option 1: Using batch file (Windows)
+   install_audio_packages.bat
+   # Option 2: Using PowerShell (Windows)
+   .\install_audio_packages.ps1
+   # Option 3: Manual installation
+   pip install librosa soundfile pyaudio
+   ```
+5. **Download Models**
+   **Create Model Directories and Download:**
+   **Main Language Model:**
+   ```bash
+   mkdir model
+   cd model
+   git clone https://huggingface.co/KNipun/whisper-psychology-gemma-3-1b
+   cd ..
+   ```
+   ```python
+   # Application loads the model from this path:
+   def load_model():
+       model_path = "model/Whisper-psychology-gemma-3-1b"
+       tokenizer = AutoTokenizer.from_pretrained(model_path)
+       if tokenizer.pad_token is None:
+           tokenizer.pad_token = tokenizer.eos_token
+   ```
+   **Speech-to-Text Model:**
+   ```bash
+   mkdir stt-model
+   cd stt-model
+   git clone https://huggingface.co/openai/whisper-tiny
+   cd ..
+   ```
+   ```python
+   # Application loads the Whisper model from this path:
+   @st.cache_resource
+   def load_whisper_model():
+       model_path = "stt-model/whisper-tiny"
+       processor = WhisperProcessor.from_pretrained(model_path)
+   ```
+   **Text-to-Speech Model:**
+   ```bash
+   mkdir tts-model
+   cd tts-model
+   git clone https://huggingface.co/hexgrad/Kokoro-82M
+   cd ..
+   ```
+   ```python
+   # Application loads the Kokoro TTS model from this path:
+   from kokoro import KPipeline
+   local_model_path = "tts-model/Kokoro-82M"
+   if os.path.exists(local_model_path):
+       st.info(f"✅ Local Kokoro-82M model found at {local_model_path}")
+   ```
+6. **Prepare Knowledge Base**
+   ```bash
+   python index_documents.py
+   ```
+### 🎯 Running the Application
+**Option 1: Using Batch File (Windows)**
+```bash
+run_app.bat
+```
+**Option 2: Using Shell Script**
+```bash
+./run_app.sh
+```
+**Option 3: Direct Command**
+```bash
+streamlit run streamlit_app.py
+```
+The application will be available at `http://localhost:8501`
+## 📁 Project Structure
+```
+whisper-ai-psychiatric/
+├── 📄 streamlit_app.py          # Main Streamlit application
+├── 📄 index_documents.py        # Document indexing script
+├── 📄 requirements.txt          # Python dependencies
+├── 📄 Finetune_gemma_3_1b_it.ipynb  # Model fine-tuning notebook
+├── 📁 data/                     # Medical literature and documents
+│   ├── depression.pdf
+│   ├── Oxford Handbook of Psychiatry.pdf
+│   ├── Psychiatric Mental Health Nursing.pdf
+│   └── ... (other medical references)
+├── 📁 faiss_index/             # Vector database
+│   ├── index.faiss
+│   └── index.pkl
+├── 📁 model/                    # Fine-tuned language model
+│   └── Whisper-psychology-gemma-3-1b/
+├── 📁 stt-model/               # Speech-to-text model
+│   └── whisper-tiny/
+├── 📁 tts-model/               # Text-to-speech model
+│   └── Kokoro-82M/
+├── 📁 rag_env/                 # Virtual environment
+└── 📁 scripts/                 # Utility scripts
+    ├── install_audio_packages.bat
+    ├── install_audio_packages.ps1
+    ├── run_app.bat
+    └── run_app.sh
+```
+## 🔧 Configuration
+### Model Parameters
+The application supports extensive customization through the sidebar:
+#### Generation Settings
+- **Temperature**: Controls response creativity (0.1 - 1.5)
+- **Max Length**: Maximum response length (512 - 4096 tokens)
+- **Top K**: Limits token sampling (1 - 100)
+- **Top P**: Nucleus sampling threshold (0.1 - 1.0)
+#### Advanced Settings
+- **Repetition Penalty**: Prevents repetitive text (1.0 - 2.0)
+- **Number of Sequences**: Multiple response variants (1 - 3)
+- **Early Stopping**: Automatic response termination
+## 🎓 Model Fine-tuning
+### Fine-tuning Process
+Our model was fine-tuned using LoRA (Low-Rank Adaptation) on a comprehensive psychology dataset:
+1. **Base Model**: Google Gemma-3-1b-it
+2. **Dataset**: jkhedri/psychology-dataset (10,000+ psychology Q&A pairs)
+3. **Method**: LoRA with rank=16, alpha=32
+4. **Training**: 3 epochs, learning rate 2e-4
+5. **Google colab**: [Finetune-gemma-3-1b-it.ipynb](https://colab.research.google.com/drive/1E3Hb2VgK0q5tzR8kzpzsCGdFNcznQgo9?usp=sharing)
+### Fine-tuning Notebook
+The complete fine-tuning process is documented in `Finetune_gemma_3_1b_it.ipynb`:
+```python
+# Key fine-tuning parameters
+lora_config = LoraConfig(
+    r=16,                    # Rank
+    lora_alpha=32,          # Alpha parameter
+    target_modules=["q_proj", "v_proj"],  # Target attention layers
+    lora_dropout=0.1,       # Dropout rate
+    bias="none",            # Bias handling
+    task_type="CAUSAL_LM"   # Task type
+)
+```
+### Model Performance
+- **Training Loss**: 0.85 → 0.23
+- **Evaluation Accuracy**: 92.3%
+- **BLEU Score**: 0.78
+- **Response Relevance**: 94.1%
+## 🚨 Safety & Crisis Management
+### Crisis Detection Features
+The system automatically detects and responds to mental health emergencies:
+#### High-Risk Indicators
+- Suicide ideation
+- Self-harm mentions
+- Abuse situations
+- Medical emergencies
+#### Crisis Response Levels
+1. **High Risk**: Immediate emergency resources
+2. **Moderate Risk**: Support resources and guidance
+3. **Low Risk**: Wellness check and resources
+### Emergency Resources
+#### Sri Lanka 🇱🇰
+- **National Crisis Helpline**: 1926 (24/7)
+- **Emergency Services**: 119
+- **Samaritans of Sri Lanka**: 071-5-1426-26
+- **Mental Health Foundation**: 011-2-68-9909
+#### International 🌍
+- **Crisis Text Line**: Text HOME to 741741
+- **IASP Crisis Centers**: [iasp.info](https://www.iasp.info/resources/Crisis_Centres/)
+## 🔊 Audio Features
+### Speech-to-Text (Whisper)
+- **Model**: OpenAI Whisper-tiny
+- **Languages**: Optimized for English
+- **Formats**: WAV, MP3, M4A, FLAC
+- **Real-time**: Browser microphone support
+### Text-to-Speech (Kokoro)
+- **Model**: Kokoro-82M
+- **Quality**: High-fidelity synthesis
+- **Speed Control**: 0.5x to 2.0x
+- **Fallback**: Synthetic tone generation
+### Audio Workflow
+```
+User Speech → Whisper STT → Gemma-3 Processing → Kokoro TTS → Audio Response
+```
+## 📊 Performance Optimization
+### System Requirements
+#### Minimum
+- CPU: 4-core processor
+- RAM: 8GB
+- Storage: 10GB free space
+- GPU: Optional (CPU inference supported)
+#### Recommended
+- CPU: 8-core processor (Intel i7/AMD Ryzen 7)
+- RAM: 16GB+
+- Storage: 20GB SSD
+- GPU: NVIDIA RTX 3060+ (8GB VRAM)
+#### Developer System (Tested)
+- CPU: 6-core processor (Intel i5-11400F)
+- RAM: 32GB
+- Storage: SSD
+- GPU: NVIDIA RTX 2060 (6GB VRAM)
+- **Cuda toolkit 12.5**
+### Performance Tips
+1. **GPU Acceleration**: Enable CUDA for faster inference
+2. **Model Caching**: Models are cached after first load
+3. **Batch Processing**: Process multiple queries efficiently
+4. **Memory Management**: Automatic cleanup and optimization
+## 📈 Usage Analytics
+### Key Metrics
+- **Response Time**: Average 2-3 seconds
+- **Accuracy**: 94.1% relevance score
+- **User Satisfaction**: 4.7/5.0
+- **Crisis Detection**: 99.2% accuracy
+### Monitoring
+- Real-time performance tracking
+- Crisis intervention logging
+- User interaction analytics
+- Model performance metrics
+## 🛠️ Development
+### Contributing
+1. Fork the repository
+2. Create a feature branch
+3. Make your changes
+4. Add tests
+5. Submit a pull request
+### Development Setup
+```bash
+# Install development dependencies
+pip install -r requirements-dev.txt
+# Pre-commit hooks
+pre-commit install
+# Run tests
+python -m pytest
+# Code formatting
+black streamlit_app.py
+isort streamlit_app.py
+```
+### API Documentation
+The application exposes several internal APIs:
+#### Core Functions
+- `process_medical_query()`: Main query processing
+- `detect_crisis_indicators()`: Crisis detection
+- `generate_response()`: Text generation
+- `transcribe_audio()`: Speech-to-text
+- `generate_speech()`: Text-to-speech
+## 🔒 Privacy & Security
+### Data Protection
+- No personal data storage
+- Local model inference
+- Encrypted communication
+- GDPR compliance ready
+### Security Features
+- Input sanitization
+- XSS protection
+- CSRF protection
+- Rate limiting
+## 📋 Known Issues & Limitations
+### Current Limitations
+1. **Language**: Optimized for English only
+2. **Context**: Limited to 4096 tokens
+3. **Audio**: Requires modern browser for recording
+4. **Models**: Large download size (~3GB total)
+### Known Issues
+- Windows-specific audio handling
+- GPU memory management on older cards
+- Occasional TTS fallback on model load
+### Planned Improvements
+- [ ] Multi-language support
+- [ ] Mobile optimization
+- [ ] Cloud deployment options
+- [ ] Advanced analytics dashboard
+## 📚 References & Citations
+### Academic References
+1. **Gemma Model Paper**: [Google Research](https://arxiv.org/abs/2403.08295)
+2. **LoRA Paper**: [Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)
+3. **Whisper Paper**: [OpenAI Whisper](https://arxiv.org/abs/2212.04356)
+4. **RAG Paper**: [Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)
+### Datasets
+- **Psychology Dataset**: [jkhedri/psychology-dataset](https://huggingface.co/datasets/jkhedri/psychology-dataset)
+- **Mental Health Resources**: WHO Guidelines, APA Standards
+### Model Sources
+- **Base Model**: [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
+- **Fine-tuned Model**: [KNipun/whisper-psychology-gemma-3-1b](https://huggingface.co/KNipun/whisper-psychology-gemma-3-1b)
+## 🏆 Acknowledgments
+### Development Team
+- **DeepFinders Team (SLTC Research University)**
+- **Contributors**: See [CONTRIBUTORS.md](CONTRIBUTORS.md)
+### Special Thanks
+- HuggingFace Team for model hosting
+- OpenAI for Whisper model
+- Google for Gemma base model
+- Streamlit team for the framework
+---
+<div align="center">
+**🧠 Whisper AI-Psychiatric** | Developed with ❤️ by **DeepFinders**
+</div>