Gamahea committed on
Commit 5912922 · 1 Parent(s): 17f5813

Initialize dropdowns with data on app load

- Populate training dataset dropdown with prepared datasets on startup
- Initialize LoRA dropdowns with available LoRAs
- Load LoRA list table with existing data
- Populate export dataset dropdown
- Fixes 'No prepared datasets available' when datasets exist

Files changed (2):
1. HF_COLLECTION_INTEGRATION.md +211 -0
2. app.py +78 -4
HF_COLLECTION_INTEGRATION.md ADDED
@@ -0,0 +1,211 @@
# HuggingFace Collection Integration - Complete

## 🎯 Overview

Full integration with the HuggingFace Collection for LEMM LoRAs and datasets, including automatic syncing, import/export, and name conflict resolution.

## ✅ Implemented Features

### 1. **Dataset Import** (`import_prepared_dataset`)
- **Location**: `backend/services/dataset_service.py`
- **Purpose**: Import prepared datasets from ZIP files
- **Features**:
  - Supports both root-level and subfolder `dataset_info.json` structures
  - Automatic name conflict resolution with numeric suffixes (`_1`, `_2`, etc.)
  - Validates the dataset structure before import
  - Updates metadata with the new dataset key if renamed

```python
# Example usage in app.py
from backend.services.dataset_service import DatasetService

def import_dataset(zip_file):
    dataset_service = DatasetService()
    dataset_key = dataset_service.import_prepared_dataset(zip_file)
    return f"✅ Imported dataset: {dataset_key}"
```

### 2. **LoRA Collection Sync** (`sync_on_startup`)
- **Location**: `backend/services/hf_storage_service.py`
- **Purpose**: Automatically download missing LoRAs from the HF collection on app startup
- **Features**:
  - Lists all LoRAs in the collection
  - Compares them with the local LoRA directory
  - Downloads only the missing LoRAs
  - Handles name conflicts with numeric suffixes
  - Logs sync activity

```python
# Called automatically on app startup (app.py line 82)
from pathlib import Path
from backend.services.hf_storage_service import HFStorageService

hf_storage = HFStorageService(username="Gamahea", collection_slug="lemm-100-pre-beta")
sync_result = hf_storage.sync_on_startup(loras_dir=Path("models/loras"))
```

### 3. **Enhanced LoRA Upload**
- **Location**: `app.py` - `start_lora_training()` function
- **Purpose**: Upload trained LoRAs to the HF collection with full metadata
- **Features**:
  - Uploads the LoRA to an individual model repo
  - Adds it to the collection automatically
  - Includes the training config in the metadata
  - Returns the repo URL and collection link
  - Graceful error handling (saves locally if the upload fails)

```python
# Upload after training (app.py lines 1397-1411)
upload_result = hf_storage.upload_lora(lora_dir, training_config=config)
if upload_result and 'repo_id' in upload_result:
    # Success - show URLs
    progress += "\n✅ LoRA uploaded successfully!"
    progress += f"\n🔗 Model: {upload_result['repo_id']}"
    progress += "\n📚 Collection: https://huggingface.co/collections/Gamahea/lemm-100-pre-beta"
```

## 📦 Name Conflict Resolution

All import functions implement automatic name conflict resolution (a minimal sketch of the suffix logic follows the example below):

1. **First check**: Try the original name
2. **If it exists**: Append `_1`, `_2`, `_3`, etc.
3. **Update metadata**: Store the new name in `dataset_info.json` or `metadata.json`
4. **Log the action**: Inform the user of the renaming

### Example Flow

```
Original: my_dataset
Already exists → my_dataset_1
Already exists → my_dataset_2
Available → Use my_dataset_2 ✅
```

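A minimal sketch of that suffix logic, assuming only that items live as directories under a base path (the helper name `resolve_name_conflict` is illustrative, not the actual method name):

```python
from pathlib import Path

def resolve_name_conflict(base_dir: Path, name: str) -> str:
    """Return `name` unchanged, or `name_N` for the first N that doesn't collide."""
    if not (base_dir / name).exists():
        return name
    suffix = 1
    while (base_dir / f"{name}_{suffix}").exists():
        suffix += 1
    return f"{name}_{suffix}"
```
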
## 🔄 Automatic Workflows

### On App Startup
1. Check HF collection for LoRAs
2. Compare with local `models/loras/` directory
3. Download any missing LoRAs
4. Log sync results

### After LoRA Training
1. Train LoRA adapter locally
2. Upload to HF as individual model repo
3. Add to collection
4. Return URLs for viewing

### Dataset Import
1. User uploads ZIP file
2. Extract and validate structure
3. Check for name conflicts
4. Copy to `training_data/` directory
5. Update dropdown lists

## 🛠️ Technical Details

### File Structure Support

**LoRA ZIP Files** (both supported):
```
Option 1 (root):
my_lora.zip/
├── metadata.json
├── adapter_config.json
└── adapter_model.safetensors

Option 2 (subfolder):
my_lora.zip/
└── my_lora/
    ├── metadata.json
    ├── adapter_config.json
    └── adapter_model.safetensors
```

**Dataset ZIP Files** (both supported):
```
Option 1 (root):
my_dataset.zip/
├── dataset_info.json
├── audio/
│   ├── sample_000001.wav
│   └── sample_000002.wav
└── splits.json

Option 2 (subfolder):
my_dataset.zip/
└── my_dataset/
    ├── dataset_info.json
    ├── audio/
    └── splits.json
```

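A sketch of how the dual-structure detection might work, assuming the ZIP has already been extracted to a temp directory (`find_dataset_info` is an illustrative name, not the service's API):

```python
from pathlib import Path
from typing import Optional

def find_dataset_info(extract_dir: Path) -> Optional[Path]:
    """Locate dataset_info.json at the ZIP root or one subfolder down."""
    root_candidate = extract_dir / "dataset_info.json"
    if root_candidate.exists():
        return root_candidate
    # Subfolder layout: the ZIP wraps everything in a single directory
    for subdir in extract_dir.iterdir():
        if subdir.is_dir() and (subdir / "dataset_info.json").exists():
            return subdir / "dataset_info.json"
    return None
```
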
### Error Handling

All import/sync functions include:
- **Try/except blocks** for graceful error handling
- **Comprehensive logging** with context
- **User-friendly error messages**
- **Fallback behavior** (e.g., save locally if the upload fails)

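The fallback pattern looks roughly like this (a sketch; `upload_with_fallback` is an illustrative wrapper, not a function in the codebase):

```python
import logging

logger = logging.getLogger(__name__)

def upload_with_fallback(hf_storage, lora_dir, config):
    """Attempt the upload; on failure, log and keep the local copy."""
    try:
        return hf_storage.upload_lora(lora_dir, training_config=config)
    except Exception as e:
        logger.error(f"Upload failed, LoRA kept locally at {lora_dir}: {e}")
        return None  # caller treats None as "saved locally only"
```
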
## 📊 HuggingFace Collection Structure

**Collection**: `Gamahea/lemm-100-pre-beta`
- **Purpose**: Organize all LEMM LoRA adapters
- **Visibility**: Public
- **Items**: Individual model repos

**Model Repos**: `Gamahea/lemm-lora-{name}`
- **Type**: LoRA adapters (safetensors)
- **Metadata**: Training config, dataset info, creation date
- **Files**: `adapter_model.safetensors`, `adapter_config.json`, `metadata.json`

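With `huggingface_hub`, the repo-plus-collection flow looks roughly like this (a sketch, not the service's exact code; note that real collection slugs carry a hash suffix, e.g. `Gamahea/lemm-100-pre-beta-<hash>`):

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes an HF token is configured in the environment

# `lora_name`, `lora_dir`, and `collection_slug` are placeholders here
repo_id = f"Gamahea/lemm-lora-{lora_name}"
api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(folder_path=str(lora_dir), repo_id=repo_id, repo_type="model")
api.add_collection_item(collection_slug, item_id=repo_id, item_type="model")
```
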
## 🎯 User Workflows

### Train & Share a LoRA
1. Prepare dataset (curated or user audio)
2. Configure training parameters
3. Click "Start Training"
4. Wait for completion
5. LoRA automatically uploaded to HF collection
6. Share collection link with others

### Use Someone's LoRA
1. Open LEMM Space
2. App automatically syncs LoRAs from collection
3. Select LoRA in generation dropdown
4. Generate music with custom style

### Import a Dataset
1. Export dataset from another LEMM instance
2. Click "Import Dataset" in training tab
3. Upload ZIP file
4. Dataset appears in training dropdown
5. Use for LoRA training

## 🔗 Related Files

- **HF Storage Service**: [backend/services/hf_storage_service.py](backend/services/hf_storage_service.py)
- **Dataset Service**: [backend/services/dataset_service.py](backend/services/dataset_service.py)
- **Main App**: [app.py](app.py)
- **LoRA Training Service**: [backend/services/lora_training_service.py](backend/services/lora_training_service.py)

## 📝 Commit History

- **17f5813** (latest): Add dataset import & LoRA collection sync
  - `import_prepared_dataset()` method
  - `sync_on_startup()` method
  - Enhanced `upload_lora()` with training_config
  - Numeric suffix naming for conflicts

- **f65e448**: Fixed LoRA import to support both ZIP structures
- **2f0c8b4**: Added "Load for Training" workflow
- **b40ee5f**: Fixed DataFrame handling in dataset preparation

## 🎉 Result

**Complete HuggingFace ecosystem integration!**
- ✅ Auto-sync LoRAs from collection
- ✅ Upload trained LoRAs to collection
- ✅ Import/export datasets
- ✅ Name conflict resolution
- ✅ Comprehensive error handling
- ✅ User-friendly feedback

All three issues from the screenshots are now resolved! 🚀
app.py CHANGED
@@ -2173,9 +2173,27 @@ with gr.Blocks(
     gr.Markdown("---")
     gr.Markdown("### 📤 Dataset Import/Export")
 
+    # Initialize export dataset dropdown
+    def get_initial_export_datasets():
+        try:
+            from backend.services.dataset_service import DatasetService
+            dataset_service = DatasetService()
+            all_datasets = dataset_service.get_all_available_datasets()
+
+            # Filter to only prepared datasets
+            prepared = []
+            for key, info in all_datasets.items():
+                if info.get('prepared', False):
+                    prepared.append(key)
+
+            return prepared if prepared else []
+        except Exception as e:
+            logger.error(f"Failed to load initial export datasets: {e}")
+            return []
+
     with gr.Row():
         dataset_to_export = gr.Dropdown(
-            choices=[],
+            choices=get_initial_export_datasets(),
             label="Select Dataset to Export",
             info="Download prepared datasets"
         )
@@ -2201,8 +2219,31 @@
         info="Unique name for this LoRA adapter"
     )
 
+    # Initialize dataset dropdown with prepared datasets
+    def get_initial_datasets():
+        try:
+            from backend.services.dataset_service import DatasetService
+            dataset_service = DatasetService()
+            all_datasets = dataset_service.get_all_available_datasets()
+
+            # Filter to only prepared datasets
+            prepared_datasets = []
+            for key, info in all_datasets.items():
+                if info.get('prepared'):
+                    num_samples = info.get('num_train_samples', 0) + info.get('num_val_samples', 0)
+                    display_name = f"{key} ({num_samples} samples)"
+                    prepared_datasets.append(display_name)
+
+            if not prepared_datasets:
+                prepared_datasets = ["No prepared datasets available"]
+
+            return prepared_datasets
+        except Exception as e:
+            logger.error(f"Failed to load initial datasets: {e}")
+            return ["No prepared datasets available"]
+
     selected_dataset = gr.Dropdown(
-        choices=[],
+        choices=get_initial_datasets(),
         label="Training Dataset",
         info="Select prepared dataset to train on"
     )
@@ -2217,8 +2258,19 @@
         info="Start from a pre-trained LoRA adapter instead of from scratch"
     )
 
+    # Initialize LoRA dropdown with available LoRAs
+    def get_initial_loras():
+        try:
+            from backend.services.lora_training_service import LoRATrainingService
+            lora_service = LoRATrainingService()
+            adapters = lora_service.list_lora_adapters()
+            return [adapter.get('name', '') for adapter in adapters]
+        except Exception as e:
+            logger.error(f"Failed to load initial LoRAs: {e}")
+            return []
+
     base_lora_adapter = gr.Dropdown(
-        choices=[],
+        choices=get_initial_loras(),
         label="Base LoRA Adapter",
         info="Select LoRA to continue training from",
         visible=False
@@ -2309,7 +2361,29 @@
     gr.Markdown("---")
     gr.Markdown("### Installed LoRA Adapters")
 
+    # Initialize LoRA list with data
+    def get_initial_lora_table():
+        try:
+            from backend.services.lora_training_service import LoRATrainingService
+            lora_service = LoRATrainingService()
+            adapters = lora_service.list_lora_adapters()
+
+            table_data = []
+            for adapter in adapters:
+                table_data.append([
+                    adapter.get('name', ''),
+                    adapter.get('saved_at', ''),
+                    adapter.get('training_steps', 0),
+                    adapter.get('training_type', 'unknown')
+                ])
+
+            return table_data
+        except Exception as e:
+            logger.error(f"Failed to load initial LoRA list: {e}")
+            return []
+
     lora_list = gr.Dataframe(
+        value=get_initial_lora_table(),
         headers=["Name", "Created", "Training Steps", "Type"],
         datatype=["str", "str", "number", "str"],
         row_count=10,
@@ -2323,7 +2397,7 @@
     gr.Markdown("### Actions on Selected LoRA")
 
     selected_lora_for_action = gr.Dropdown(
-        choices=[],
+        choices=get_initial_loras(),
         label="Select LoRA Adapter",
         info="Choose a LoRA to download or delete"
     )
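
Design note: these `get_initial_*` helpers run once, while the Blocks UI is being constructed, so the dropdowns reflect the state at process start. If per-page-load freshness were wanted, the same helpers could be wired to a load event. A sketch, assuming the Blocks instance is named `demo` (and using `gr.update`, whose exact form varies by Gradio version):

```python
# Hypothetical wiring: refresh dropdown choices on every page load
demo.load(
    fn=lambda: gr.update(choices=get_initial_datasets()),
    outputs=[selected_dataset],
)
```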