Gamahea committed on
Commit 5912922 · 1 Parent(s): 17f5813

Initialize dropdowns with data on app load

- Populate training dataset dropdown with prepared datasets on startup
- Initialize LoRA dropdowns with available LoRAs
- Load LoRA list table with existing data
- Populate export dataset dropdown
- Fixes 'No prepared datasets available' when datasets exist

Files changed (2):
1. HF_COLLECTION_INTEGRATION.md +211 -0
2. app.py +78 -4
HF_COLLECTION_INTEGRATION.md ADDED
@@ -0,0 +1,211 @@
# HuggingFace Collection Integration - Complete

## 🎯 Overview

Full integration with the HuggingFace Collection for LEMM LoRAs and datasets, including automatic syncing, import/export, and name conflict resolution.

## ✅ Implemented Features

### 1. **Dataset Import** (`import_prepared_dataset`)
- **Location**: `backend/services/dataset_service.py`
- **Purpose**: Import prepared datasets from ZIP files
- **Features**:
  - Supports both root-level and subfolder `dataset_info.json` structures
  - Automatic name conflict resolution with numeric suffixes (`_1`, `_2`, etc.)
  - Validates the dataset structure before import
  - Updates metadata with the new dataset key if renamed

```python
# Example usage in app.py
from backend.services.dataset_service import DatasetService

def import_dataset(zip_file):
    dataset_service = DatasetService()
    dataset_key = dataset_service.import_prepared_dataset(zip_file)
    return f"✅ Imported dataset: {dataset_key}"
```

### 2. **LoRA Collection Sync** (`sync_on_startup`)
- **Location**: `backend/services/hf_storage_service.py`
- **Purpose**: Automatically download missing LoRAs from the HF collection on app startup
- **Features**:
  - Lists all LoRAs in the collection
  - Compares them with the local LoRA directory
  - Downloads only the missing LoRAs
  - Handles name conflicts with numeric suffixes
  - Logs sync activity

```python
# Called automatically on app startup (app.py line 82)
from pathlib import Path
from backend.services.hf_storage_service import HFStorageService

hf_storage = HFStorageService(username="Gamahea", collection_slug="lemm-100-pre-beta")
sync_result = hf_storage.sync_on_startup(loras_dir=Path("models/loras"))
```

### 3. **Enhanced LoRA Upload**
- **Location**: `app.py` - `start_lora_training()` function
- **Purpose**: Upload trained LoRAs to the HF collection with full metadata
- **Features**:
  - Uploads the LoRA to an individual model repo
  - Adds it to the collection automatically
  - Includes the training config in the metadata
  - Returns the repo URL and collection link
  - Graceful error handling (saves locally if the upload fails)

```python
# Upload after training (app.py lines 1397-1411)
upload_result = hf_storage.upload_lora(lora_dir, training_config=config)
if upload_result and 'repo_id' in upload_result:
    # Success - show URLs
    progress += "\n✅ LoRA uploaded successfully!"
    progress += f"\n🔗 Model: {upload_result['repo_id']}"
    progress += "\n📚 Collection: https://huggingface.co/collections/Gamahea/lemm-100-pre-beta"
```

## 📦 Name Conflict Resolution

All import functions implement automatic name conflict resolution (a minimal sketch of the suffix logic follows the example below):

1. **First check**: Try the original name
2. **If it exists**: Append `_1`, `_2`, `_3`, etc.
3. **Update metadata**: Store the new name in `dataset_info.json` or `metadata.json`
4. **Log the action**: Inform the user of the renaming

### Example Flow

```
Original: my_dataset
Already exists → my_dataset_1
Already exists → my_dataset_2
Available → Use my_dataset_2 ✅
```

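A minimal sketch of that suffix logic, assuming only that items live as directories under a base path (the helper name `resolve_name_conflict` is illustrative, not the actual method name):

```python
from pathlib import Path

def resolve_name_conflict(base_dir: Path, name: str) -> str:
    """Return `name` unchanged, or `name_N` for the first N that doesn't collide."""
    if not (base_dir / name).exists():
        return name
    suffix = 1
    while (base_dir / f"{name}_{suffix}").exists():
        suffix += 1
    return f"{name}_{suffix}"
```
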
## 🔄 Automatic Workflows

### On App Startup
1. Check HF collection for LoRAs
2. Compare with local `models/loras/` directory
3. Download any missing LoRAs
4. Log sync results

### After LoRA Training
1. Train LoRA adapter locally
2. Upload to HF as individual model repo
3. Add to collection
4. Return URLs for viewing

### Dataset Import
1. User uploads ZIP file
2. Extract and validate structure
3. Check for name conflicts
4. Copy to `training_data/` directory
5. Update dropdown lists

## 🛠️ Technical Details

### File Structure Support

**LoRA ZIP Files** (both supported):
```
Option 1 (root):
my_lora.zip/
├── metadata.json
├── adapter_config.json
└── adapter_model.safetensors

Option 2 (subfolder):
my_lora.zip/
└── my_lora/
    ├── metadata.json
    ├── adapter_config.json
    └── adapter_model.safetensors
```

**Dataset ZIP Files** (both supported):
```
Option 1 (root):
my_dataset.zip/
├── dataset_info.json
├── audio/
│   ├── sample_000001.wav
│   └── sample_000002.wav
└── splits.json

Option 2 (subfolder):
my_dataset.zip/
└── my_dataset/
    ├── dataset_info.json
    ├── audio/
    └── splits.json
```

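A sketch of how the dual-structure detection might work, assuming the ZIP has already been extracted to a temp directory (`find_dataset_info` is an illustrative name, not the service's API):

```python
from pathlib import Path
from typing import Optional

def find_dataset_info(extract_dir: Path) -> Optional[Path]:
    """Locate dataset_info.json at the ZIP root or one subfolder down."""
    root_candidate = extract_dir / "dataset_info.json"
    if root_candidate.exists():
        return root_candidate
    # Subfolder layout: the ZIP wraps everything in a single directory
    for subdir in extract_dir.iterdir():
        if subdir.is_dir() and (subdir / "dataset_info.json").exists():
            return subdir / "dataset_info.json"
    return None
```
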
### Error Handling

All import/sync functions include:
- **Try/except blocks** for graceful error handling
- **Comprehensive logging** with context
- **User-friendly error messages**
- **Fallback behavior** (e.g., save locally if the upload fails)

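The fallback pattern looks roughly like this (a sketch; `upload_with_fallback` is an illustrative wrapper, not a function in the codebase):

```python
import logging

logger = logging.getLogger(__name__)

def upload_with_fallback(hf_storage, lora_dir, config):
    """Attempt the upload; on failure, log and keep the local copy."""
    try:
        return hf_storage.upload_lora(lora_dir, training_config=config)
    except Exception as e:
        logger.error(f"Upload failed, LoRA kept locally at {lora_dir}: {e}")
        return None  # caller treats None as "saved locally only"
```
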
## 📊 HuggingFace Collection Structure

**Collection**: `Gamahea/lemm-100-pre-beta`
- **Purpose**: Organize all LEMM LoRA adapters
- **Visibility**: Public
- **Items**: Individual model repos

**Model Repos**: `Gamahea/lemm-lora-{name}`
- **Type**: LoRA adapters (safetensors)
- **Metadata**: Training config, dataset info, creation date
- **Files**: `adapter_model.safetensors`, `adapter_config.json`, `metadata.json`

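With `huggingface_hub`, the repo-plus-collection flow looks roughly like this (a sketch, not the service's exact code; note that real collection slugs carry a hash suffix, e.g. `Gamahea/lemm-100-pre-beta-<hash>`):

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes an HF token is configured in the environment

# `lora_name`, `lora_dir`, and `collection_slug` are placeholders here
repo_id = f"Gamahea/lemm-lora-{lora_name}"
api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(folder_path=str(lora_dir), repo_id=repo_id, repo_type="model")
api.add_collection_item(collection_slug, item_id=repo_id, item_type="model")
```
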
## 🎯 User Workflows

### Train & Share a LoRA
1. Prepare dataset (curated or user audio)
2. Configure training parameters
3. Click "Start Training"
4. Wait for completion
5. LoRA automatically uploaded to HF collection
6. Share collection link with others

### Use Someone's LoRA
1. Open LEMM Space
2. App automatically syncs LoRAs from collection
3. Select LoRA in generation dropdown
4. Generate music with custom style

### Import a Dataset
1. Export dataset from another LEMM instance
2. Click "Import Dataset" in training tab
3. Upload ZIP file
4. Dataset appears in training dropdown
5. Use for LoRA training

## 🔗 Related Files

- **HF Storage Service**: [backend/services/hf_storage_service.py](backend/services/hf_storage_service.py)
- **Dataset Service**: [backend/services/dataset_service.py](backend/services/dataset_service.py)
- **Main App**: [app.py](app.py)
- **LoRA Training Service**: [backend/services/lora_training_service.py](backend/services/lora_training_service.py)

## 📝 Commit History

- **17f5813** (latest): Add dataset import & LoRA collection sync
  - `import_prepared_dataset()` method
  - `sync_on_startup()` method
  - Enhanced `upload_lora()` with training_config
  - Numeric suffix naming for conflicts

- **f65e448**: Fixed LoRA import to support both ZIP structures
- **2f0c8b4**: Added "Load for Training" workflow
- **b40ee5f**: Fixed DataFrame handling in dataset preparation

## 🎉 Result

**Complete HuggingFace ecosystem integration!**
- ✅ Auto-sync LoRAs from collection
- ✅ Upload trained LoRAs to collection
- ✅ Import/export datasets
- ✅ Name conflict resolution
- ✅ Comprehensive error handling
- ✅ User-friendly feedback

All three issues from the screenshots are now resolved! 🚀
app.py CHANGED
@@ -2173,9 +2173,27 @@ with gr.Blocks(
     gr.Markdown("---")
     gr.Markdown("### 📤 Dataset Import/Export")
 
+    # Initialize export dataset dropdown
+    def get_initial_export_datasets():
+        try:
+            from backend.services.dataset_service import DatasetService
+            dataset_service = DatasetService()
+            all_datasets = dataset_service.get_all_available_datasets()
+
+            # Filter to only prepared datasets
+            prepared = []
+            for key, info in all_datasets.items():
+                if info.get('prepared', False):
+                    prepared.append(key)
+
+            return prepared if prepared else []
+        except Exception as e:
+            logger.error(f"Failed to load initial export datasets: {e}")
+            return []
+
     with gr.Row():
         dataset_to_export = gr.Dropdown(
-            choices=[],
+            choices=get_initial_export_datasets(),
             label="Select Dataset to Export",
             info="Download prepared datasets"
         )
@@ -2201,8 +2219,31 @@
         info="Unique name for this LoRA adapter"
     )
 
+    # Initialize dataset dropdown with prepared datasets
+    def get_initial_datasets():
+        try:
+            from backend.services.dataset_service import DatasetService
+            dataset_service = DatasetService()
+            all_datasets = dataset_service.get_all_available_datasets()
+
+            # Filter to only prepared datasets
+            prepared_datasets = []
+            for key, info in all_datasets.items():
+                if info.get('prepared'):
+                    num_samples = info.get('num_train_samples', 0) + info.get('num_val_samples', 0)
+                    display_name = f"{key} ({num_samples} samples)"
+                    prepared_datasets.append(display_name)
+
+            if not prepared_datasets:
+                prepared_datasets = ["No prepared datasets available"]
+
+            return prepared_datasets
+        except Exception as e:
+            logger.error(f"Failed to load initial datasets: {e}")
+            return ["No prepared datasets available"]
+
     selected_dataset = gr.Dropdown(
-        choices=[],
+        choices=get_initial_datasets(),
         label="Training Dataset",
         info="Select prepared dataset to train on"
     )
@@ -2217,8 +2258,19 @@
         info="Start from a pre-trained LoRA adapter instead of from scratch"
     )
 
+    # Initialize LoRA dropdown with available LoRAs
+    def get_initial_loras():
+        try:
+            from backend.services.lora_training_service import LoRATrainingService
+            lora_service = LoRATrainingService()
+            adapters = lora_service.list_lora_adapters()
+            return [adapter.get('name', '') for adapter in adapters]
+        except Exception as e:
+            logger.error(f"Failed to load initial LoRAs: {e}")
+            return []
+
     base_lora_adapter = gr.Dropdown(
-        choices=[],
+        choices=get_initial_loras(),
         label="Base LoRA Adapter",
         info="Select LoRA to continue training from",
         visible=False
@@ -2309,7 +2361,29 @@
     gr.Markdown("---")
     gr.Markdown("### Installed LoRA Adapters")
 
+    # Initialize LoRA list with data
+    def get_initial_lora_table():
+        try:
+            from backend.services.lora_training_service import LoRATrainingService
+            lora_service = LoRATrainingService()
+            adapters = lora_service.list_lora_adapters()
+
+            table_data = []
+            for adapter in adapters:
+                table_data.append([
+                    adapter.get('name', ''),
+                    adapter.get('saved_at', ''),
+                    adapter.get('training_steps', 0),
+                    adapter.get('training_type', 'unknown')
+                ])
+
+            return table_data
+        except Exception as e:
+            logger.error(f"Failed to load initial LoRA list: {e}")
+            return []
+
     lora_list = gr.Dataframe(
+        value=get_initial_lora_table(),
         headers=["Name", "Created", "Training Steps", "Type"],
         datatype=["str", "str", "number", "str"],
         row_count=10,
@@ -2323,7 +2397,7 @@
     gr.Markdown("### Actions on Selected LoRA")
 
     selected_lora_for_action = gr.Dropdown(
-        choices=[],
+        choices=get_initial_loras(),
         label="Select LoRA Adapter",
         info="Choose a LoRA to download or delete"
     )
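
Design note: these `get_initial_*` helpers run once, while the Blocks UI is being constructed, so the dropdowns reflect the state at process start. If per-page-load freshness were wanted, the same helpers could be wired to a load event. A sketch, assuming the Blocks instance is named `demo` (and using `gr.update`, whose exact form varies by Gradio version):

```python
# Hypothetical wiring: refresh dropdown choices on every page load
demo.load(
    fn=lambda: gr.update(choices=get_initial_datasets()),
    outputs=[selected_dataset],
)
```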