# Voice-to-AI Workflow Documentation

## 🎤➡️🤖 Complete Voice-to-AI Pipeline

### Current Workflow:

```
1. 🎤 User speaks into microphone / uploads audio file
2. 🔄 Audio gets processed by the Whisper-tiny model
3. 📝 Speech is transcribed to English text
4. 🧠 Text is sent to your main model: model/Whisper-psychology-gemma-3-1b
5. 🔍 FAISS searches relevant documents for context
6. 💬 Main model generates a psychological response
7. 📺 Response is displayed in chat
8. 🔊 (Optional) Response can be converted to speech via TTS
```
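The eight steps above can be sketched as a single driver function. This is a hedged illustration, not the app's actual code: the `state` dict stands in for `st.session_state`, and `run_voice_pipeline`, `transcribe`, `process`, and `speak` are illustrative names.

```python
def run_voice_pipeline(audio_bytes, state, transcribe, process, speak=None):
    """End-to-end sketch: audio bytes in, chat entry out (steps 1-8)."""
    # Steps 1-3: speech-to-text with the Whisper-tiny model/processor pair.
    text = transcribe(audio_bytes, state["whisper_model"],
                      state["whisper_processor"])
    # Steps 4-6: RAG query against the main psychology model.
    answer, sources, metadata = process(
        text, state["faiss_index"], state["embedding_model"],
        state["optimal_docs"], state["model"], state["tokenizer"])
    # Step 7: package the reply as a chat entry.
    entry = {"role": "assistant", "content": answer,
             "sources": sources, "metadata": metadata}
    # Step 8 (optional): synthesise audio for the reply.
    if speak is not None:
        entry["audio"] = speak(answer)
    return entry
```

Injecting the callables keeps the driver independent of any particular model stack.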

### Technical Implementation:

#### Step 1-3: Speech-to-Text
```python
# Audio processing with Whisper-tiny
transcribed_text = transcribe_audio(
    audio_bytes,
    st.session_state.whisper_model,      # whisper-tiny model
    st.session_state.whisper_processor
)
```
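`transcribe_audio` is called above but not defined in this document. A minimal sketch of what it might look like with the Hugging Face Whisper API, assuming the audio bytes are 16-bit PCM at Whisper's expected 16 kHz sampling rate (the function name comes from the call site; the internals are an assumption):

```python
import numpy as np


def transcribe_audio(audio_bytes, model, processor, sampling_rate=16000):
    """Decode raw 16-bit PCM audio bytes and transcribe them with Whisper."""
    # Interpret the bytes as 16-bit PCM and normalise to [-1.0, 1.0].
    pcm = np.frombuffer(audio_bytes, dtype=np.int16)
    audio = pcm.astype(np.float32) / 32768.0

    # The processor turns raw audio into log-mel features;
    # the model generates token ids, which the processor decodes to text.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Because the model and processor are injected, the same function works for any Whisper checkpoint, not just whisper-tiny.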

#### Step 4-6: AI Processing  
```python
# Main model processing
answer, sources, metadata = process_medical_query(
    transcribed_text,                   # Your speech as text
    st.session_state.faiss_index,       # Document search
    st.session_state.embedding_model,
    st.session_state.optimal_docs,
    st.session_state.model,             # YOUR MAIN MODEL HERE
    st.session_state.tokenizer,         # model/Whisper-psychology-gemma-3-1b
    **generation_params
)
```
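Like `transcribe_audio`, `process_medical_query` is only called here, not shown. A plausible sketch of it as a simple RAG loop; the retrieval count (`k=3`), prompt format, and metadata fields are assumptions, not the app's actual values:

```python
import numpy as np


def process_medical_query(question, faiss_index, embedding_model, docs,
                          model, tokenizer, **generation_params):
    """Hypothetical RAG sketch: retrieve top-k docs, then prompt the main model."""
    # 1. Embed the question and search the FAISS index for nearest documents.
    query_vec = np.asarray(embedding_model.encode([question]), dtype=np.float32)
    distances, indices = faiss_index.search(query_vec, k=3)
    sources = [docs[i] for i in indices[0] if i != -1]  # -1 marks empty slots

    # 2. Build a context-augmented prompt for the psychology model.
    context = "\n".join(sources)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    # 3. Generate with the main model (a gemma-style causal LM is assumed).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_params)
    answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    metadata = {"retrieved": len(sources), "distances": distances[0].tolist()}
    return answer, sources, metadata
```

The three-tuple return matches the call site above: answer text, retrieved source documents, and retrieval metadata.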

#### Step 7-8: Response Display
```python
# Add to chat and optionally convert to speech
st.session_state.messages.append({
    "role": "assistant",
    "content": answer,      # Response from your main model
    "sources": sources,
    "metadata": metadata
})
```
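The append above can be wrapped in a small helper so the optional TTS step (step 8) hooks in cleanly. `append_assistant_message` and the `tts` callable are hypothetical names; in the app, `st.session_state.messages` plays the role of the plain `messages` list:

```python
def append_assistant_message(messages, answer, sources, metadata, tts=None):
    """Append the model's reply to chat history; optionally synthesise audio."""
    entry = {"role": "assistant", "content": answer,
             "sources": sources, "metadata": metadata}
    if tts is not None:
        # TTS is optional; the Kokoro-82M integration is currently a placeholder.
        entry["audio"] = tts(answer)
    messages.append(entry)
    return entry
```

Keeping TTS behind an optional callable means the chat path works unchanged whether or not audio output is enabled.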

### Models Used:

1. **Speech-to-Text**: `stt-model/whisper-tiny/`
   - Converts your voice to English text
   - Language: English only (forced)

2. **Main AI Model**: `model/Whisper-psychology-gemma-3-1b/` (**YOUR MODEL**)
   - Processes the transcribed text
   - Generates psychological responses
   - Uses RAG with FAISS for context

3. **Text-to-Speech**: `tts-model/Kokoro-82M/`
   - Converts AI response back to speech
   - Currently uses placeholder implementation

4. **Document Search**: `faiss_index/`
   - Provides context for better responses

### Usage:

1. **Click the microphone button** 🎤
2. **Speak your mental health question**
3. **Click "🔄 Transcribe Audio"**
4. **Watch the complete pipeline work automatically:**
   - Your speech → Text
   - Text → Your AI model
   - AI response → Chat
   - Optional: Response → Speech

### What happens when you transcribe:

✅ **Immediate automatic processing** - no manual steps needed!
✅ **Your speech text goes directly to your main model**
✅ **A full psychiatric AI response is generated**
✅ **The complete conversation appears in chat**
✅ **Optional TTS for audio response**

The system automatically sends your transcribed speech to your `model/Whisper-psychology-gemma-3-1b` model and returns a full AI response with no additional steps required.