# 🚀 Hugging Face Space Setup Guide for OpenLLM Training (HF Access Token)

This guide shows how to authenticate a Hugging Face Space with an HF access token so that your OpenLLM training and model uploads work correctly.

## 🎯 Overview

The issue you encountered was that training completed successfully in Hugging Face Spaces, but the model upload failed due to authentication problems. This guide will ensure that future training runs in Spaces will have proper authentication using GitHub secrets and successful uploads.

## 🔧 Step-by-Step Setup

### Step 1: Get Your Hugging Face Token

1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
2. Click "New token"
3. Give it a name (e.g., "OpenLLM Space Training")
4. Select "Write" role for full access
5. Copy the generated token

### Step 2: Set Up HF Access Token in Space Settings

1. Go to your Hugging Face Space settings:
   ```
   https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME/settings
   ```

2. Navigate to the "Repository secrets" section

3. Click "New secret"

4. Add a new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your Hugging Face access token from Step 1 (starts with `hf_`)

5. Click "Add secret"

**Note**: This token will be automatically available to your Space as the `HF_TOKEN` environment variable.
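
As a minimal sketch of how a script might read this secret, the helper below looks up `HF_TOKEN` and checks the `hf_` prefix mentioned in the step above. The function name `read_hf_token` and its `env` parameter are illustrative assumptions, not part of the OpenLLM code base, and the prefix test is only a sanity check, not a validation against the Hub.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN value, or raise ValueError with a setup hint.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise ValueError("HF_TOKEN is not set; add it as a secret in the Space settings")
    if not token.startswith("hf_"):
        # Sanity check only: this guide states that tokens start with "hf_".
        raise ValueError("HF_TOKEN does not start with 'hf_'; check the pasted value")
    return token
```

Stripping whitespace guards against a trailing newline picked up when the token is pasted into the settings form.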

### Step 3: Verify Authentication in Your Space

Add this code to your Space to verify authentication is working:

```python
# Add this to your Space's main script or run it separately
import os
from huggingface_hub import HfApi, login, whoami

def verify_space_auth():
    """Verify authentication is working in the Space using HF access token."""
    print("🔐 Verifying Space Authentication (HF Access Token)")

    # Check if HF_TOKEN is set (from Space settings)
    token = os.getenv("HF_TOKEN")
    if not token:
        print("❌ HF_TOKEN not found in Space environment")
        print("   - Please set HF_TOKEN in your Space settings")
        print("   - Go to Space settings → Repository secrets")
        return False

    try:
        # Test authentication
        login(token=token)

        user_info = whoami()
        username = user_info["name"]

        print("✅ Authentication successful!")
        print(f"   - Username: {username}")
        print(f"   - Token: {token[:8]}...{token[-4:]}")
        print("   - Source: HF access token in Space settings")

        # Test API access
        api = HfApi()
        print("✅ API access working")

        return True

    except Exception as e:
        print(f"❌ Authentication failed: {e}")
        return False

# Run verification
if __name__ == "__main__":
    verify_space_auth()
```

### Step 4: Update Your Training Script

Modify your training script to authenticate with the `HF_TOKEN` Space secret before uploading:

```python
import os
from huggingface_hub import HfApi, login, create_repo
import json

class SpaceTrainingManager:
    """Manages training and upload in Hugging Face Spaces using HF access token."""
    
    def __init__(self):
        self.api = None
        self.username = None
        self.setup_authentication()
    
    def setup_authentication(self):
        """Set up authentication for the Space using the HF_TOKEN Space secret."""
        try:
            # Get token from the Space secret (exposed as an environment variable)
            token = os.getenv("HF_TOKEN")
            if not token:
                raise ValueError("HF_TOKEN not found in Space environment. Please set it in your Space settings.")
            
            # Login
            login(token=token)
            
            # Initialize API and look up the account that owns the token
            # (HfApi.whoami is used because the module-level whoami is not imported here)
            self.api = HfApi()
            user_info = self.api.whoami()
            self.username = user_info["name"]
            
            print(f"✅ Space authentication successful: {self.username}")
            print("   - Source: HF access token in Space settings")
            
        except Exception as e:
            print(f"❌ Authentication failed: {e}")
            raise
    
    def upload_model(self, model_dir: str, model_size: str = "small", steps: int = 8000):
        """Upload the trained model to Hugging Face Hub."""
        try:
            # Create repository name
            repo_name = f"openllm-{model_size}-extended-{steps//1000}k"
            repo_id = f"{self.username}/{repo_name}"
            
            print(f"📤 Uploading model to {repo_id}")
            
            # Create repository
            create_repo(
                repo_id=repo_id,
                repo_type="model",
                exist_ok=True,
                private=False
            )
            
            # Create model configuration
            self.create_model_config(model_dir, model_size)
            
            # Create model card
            self.create_model_card(model_dir, repo_id, model_size, steps)
            
            # Upload all files
            self.api.upload_folder(
                folder_path=model_dir,
                repo_id=repo_id,
                repo_type="model",
                commit_message=f"Add OpenLLM {model_size} model ({steps} steps)"
            )
            
            print("✅ Model uploaded successfully!")
            print(f"   - Repository: https://huggingface.co/{repo_id}")
            
            return repo_id
            
        except Exception as e:
            print(f"❌ Upload failed: {e}")
            raise
    
    def create_model_config(self, model_dir: str, model_size: str):
        """Create Hugging Face compatible configuration."""
        config = {
            "architectures": ["GPTModel"],
            "model_type": "gpt",
            "vocab_size": 32000,
            "n_positions": 2048,
            "n_embd": 768 if model_size == "small" else 1024 if model_size == "medium" else 1280,
            "n_layer": 12 if model_size == "small" else 24 if model_size == "medium" else 32,
            "n_head": 12 if model_size == "small" else 16 if model_size == "medium" else 20,
            "bos_token_id": 1,
            "eos_token_id": 2,
            "pad_token_id": 0,
            "unk_token_id": 3,
            "transformers_version": "4.35.0",
            "use_cache": True
        }
        
        config_path = os.path.join(model_dir, "config.json")
        with open(config_path, "w") as f:
            json.dump(config, f, indent=2)
    
    def create_model_card(self, model_dir: str, repo_id: str, model_size: str, steps: int):
        """Create model card (README.md)."""
        model_card = f"""# OpenLLM {model_size.capitalize()} Model ({steps} steps)

This is a trained OpenLLM {model_size} model with extended training.

## Model Details

- **Model Type**: GPT-style decoder-only transformer
- **Architecture**: Custom OpenLLM implementation
- **Training Data**: SQuAD dataset (Wikipedia passages)
- **Vocabulary Size**: 32,000 tokens
- **Sequence Length**: 2,048 tokens
- **Model Size**: {model_size.capitalize()}
- **Training Steps**: {steps:,}

## Usage

This model can be used with the OpenLLM framework for text generation and language modeling tasks.

## Training

The model was trained using the OpenLLM training pipeline with:
- SentencePiece tokenization
- Custom GPT architecture
- SQuAD dataset for training
- Extended training for improved performance

## License

This model is released under the GNU General Public License v3.0.

## Repository

This model is hosted on Hugging Face Hub: https://huggingface.co/{repo_id}
"""
        
        readme_path = os.path.join(model_dir, "README.md")
        with open(readme_path, "w") as f:
            f.write(model_card)

# Usage in your training script
def main():
    # Initialize training manager
    training_manager = SpaceTrainingManager()
    
    # Your training code here...
    # ... (training logic) ...
    
    # After training completes, upload the model
    model_dir = "./openllm-trained"  # Your model directory
    repo_id = training_manager.upload_model(model_dir, "small", 8000)
    
    print("🎉 Training and upload completed!")
    print(f"   - Model available at: https://huggingface.co/{repo_id}")

if __name__ == "__main__":
    main()
```
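
The chained conditional expressions in `create_model_config` can be hard to scan. One possible alternative, sketched here, keeps the same dimensions in a lookup table. The names `MODEL_DIMS` and `model_dims` are hypothetical, and the `"large"` key is an assumed label for the fallback branch of the original conditionals.

```python
# Dimensions copied from create_model_config above. The "large" key is an
# assumed name for the values the original code uses for any other size.
MODEL_DIMS = {
    "small": {"n_embd": 768, "n_layer": 12, "n_head": 12},
    "medium": {"n_embd": 1024, "n_layer": 24, "n_head": 16},
    "large": {"n_embd": 1280, "n_layer": 32, "n_head": 20},
}


def model_dims(model_size: str) -> dict:
    """Return a copy of the dimensions for a size.

    Unknown sizes fall back to the largest preset, mirroring the
    else-branch of the conditionals in create_model_config.
    """
    return dict(MODEL_DIMS.get(model_size, MODEL_DIMS["large"]))
```

With such a table, the config dictionary could be built with `{**base_config, **model_dims(model_size)}`, where `base_config` stands for the size-independent keys.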

### Step 5: Test the Setup

Run the authentication verification script in your Space to ensure everything is working:

```python
# Add this to your Space to test
from setup_hf_space_auth import HuggingFaceSpaceAuthSetup

def test_space_setup():
    """Test the Space authentication setup with the HF_TOKEN Space secret."""
    auth_setup = HuggingFaceSpaceAuthSetup()
    
    if not auth_setup.setup_space_authentication():
        print("❌ Authentication setup failed")
        return False
    print("✅ Space authentication working")
    
    # Test repository creation
    repo_ok = auth_setup.test_repository_creation()
    if repo_ok:
        print("✅ Repository creation working")
    
    # Test model upload
    upload_ok = auth_setup.test_model_upload()
    if upload_ok:
        print("✅ Model upload working")
    
    # Report success only when every check passed
    if repo_ok and upload_ok:
        print("🎉 All tests passed! Ready for training.")
        return True
    print("❌ Some tests failed; check the Space logs")
    return False

# Run the test
test_space_setup()
```

## 🔍 Troubleshooting

### Common Issues

1. **"HF_TOKEN not found"**
   - **Solution**: Make sure you've added the `HF_TOKEN` secret in your Space settings (Step 2)
   - **Check**: Go to Space settings → Repository secrets

2. **"401 Unauthorized"**
   - **Solution**: Verify your token has "Write" permissions
   - **Check**: Go to https://huggingface.co/settings/tokens and ensure the token has "Write" role

3. **"Repository creation failed"**
   - **Solution**: Check if the repository name is unique
   - **Check**: Ensure you have permission to create repositories

4. **"Upload failed"**
   - **Solution**: Check Space logs for detailed error messages
   - **Check**: Verify network connectivity and file permissions

5. **"Space secret not accessible"**
   - **Solution**: Ensure the secret is defined on the Space that runs the training
   - **Check**: Verify the secret is named exactly `HF_TOKEN` in the Space settings
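
Several of these issues tend to surface as HTTP status codes in the Space logs. The sketch below maps a status code to the matching hint from the list above. The function name `hint_for_status` is hypothetical, and linking 403 to permission problems and 409 to a name conflict is an assumption about how the Hub usually reports them, not something this guide's logs confirm.

```python
from typing import Optional

# Status-code-to-hint table. Only the 401 entry comes directly from the
# troubleshooting list; the 403 and 409 associations are assumptions.
_HINTS = {
    401: "Verify that HF_TOKEN is set and that the token has the Write role.",
    403: "Check that your account is allowed to create or modify this repository.",
    409: "A repository with this name may already exist; pick a unique name.",
}


def hint_for_status(status_code: Optional[int]) -> str:
    """Return a troubleshooting hint for an HTTP status code."""
    return _HINTS.get(status_code, "Check the Space logs for the full error message.")
```

An `except` block around the upload could pass the status code of the failed response to this helper and print the result next to the raw error.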

### Verification Steps

1. **Check Space Environment**:
   ```python
   import os
   print("Space Environment Variables:")
   for var in ["SPACE_ID", "SPACE_HOST", "HF_TOKEN"]:
       value = os.getenv(var)
       print(f"  {var}: {'✅ Set' if value else '❌ Not set'}")
   ```

2. **Test Authentication**:
   ```python
   from huggingface_hub import whoami
   try:
       user_info = whoami()
       print(f"✅ Authenticated as: {user_info['name']}")
   except Exception as e:
       print(f"❌ Authentication failed: {e}")
   ```

3. **Test Repository Creation**:
   ```python
   from huggingface_hub import create_repo, delete_repo
   try:
       repo_id = "lemms/test-repo"
       create_repo(repo_id, repo_type="model", private=True)
       print("✅ Repository creation working")
       delete_repo(repo_id, repo_type="model")
   except Exception as e:
       print(f"❌ Repository creation failed: {e}")
   ```
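
A fixed test repository name can collide with a leftover repository from an earlier run. As a hedged sketch, a random suffix makes each test name effectively unique. The helper `make_test_repo_id` and its default prefix are hypothetical and not part of `huggingface_hub`.

```python
import uuid


def make_test_repo_id(username: str, prefix: str = "test-openllm") -> str:
    """Build a throwaway repo id such as '<username>/test-openllm-1a2b3c4d'.

    The 8-character hex suffix comes from a random UUID, so repeated test
    runs are very unlikely to reuse the same repository name.
    """
    suffix = uuid.uuid4().hex[:8]
    return f"{username}/{prefix}-{suffix}"
```

The returned value can be passed to `create_repo` and `delete_repo` in place of the hard-coded `repo_id` in the snippet above.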

## 📋 Complete Workflow

1. **Set up the `HF_TOKEN` Space secret** with your access token
2. **Verify authentication** using the test script
3. **Run your training** with the updated training manager
4. **Monitor upload progress** in the Space logs
5. **Verify the model** appears on Hugging Face Hub
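
Because the original failure only appeared after training had finished, it may be worth checking authentication before the expensive step. The sketch below expresses that ordering with plain callables. `run_workflow` and its three parameters are hypothetical stand-ins for your own verification, training, and upload functions.

```python
from typing import Callable


def run_workflow(
    verify_auth: Callable[[], bool],
    train: Callable[[], str],
    upload: Callable[[str], str],
) -> str:
    """Verify authentication first, then train, then upload.

    `train` returns the model directory and `upload` returns the
    repository id. A RuntimeError is raised before any training starts
    if `verify_auth` returns False.
    """
    if not verify_auth():
        raise RuntimeError("Authentication check failed; fix HF_TOKEN before training")
    model_dir = train()
    return upload(model_dir)
```

For example, `verify_space_auth` from Step 3 could serve as `verify_auth`, and a small wrapper around `SpaceTrainingManager.upload_model` from Step 4 could serve as `upload`.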

## 🎯 Expected Results

After successful setup, you should see:

```
✅ Running in Hugging Face Space environment
✅ HF_TOKEN found: hf_xxxx...xxxx
   - Source: GitHub secrets
✅ Authentication successful!
   - Username: lemms
✅ API access working

🧪 Testing Repository Creation
🔄 Creating test repository: lemms/test-openllm-verification
✅ Repository created successfully
🔄 Cleaning up test repository...
✅ Repository deleted

🎉 All verification tests passed!
   - Authentication: ✅ Working
   - Repository Creation: ✅ Working
   - GitHub Secrets Integration: ✅ Working
   - Ready for training and model uploads!

📤 Uploading model to lemms/openllm-small-extended-8k
✅ Model uploaded successfully!
   - Repository: https://huggingface.co/lemms/openllm-small-extended-8k
```

Your model will then be available at: `https://huggingface.co/lemms/openllm-small-extended-8k`
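
The repository name in that URL follows from the naming rule in `upload_model`. The sketch below isolates that rule so it can be checked without running a training job. The helper names `build_repo_id` and `repo_url` are illustrative and do not exist in the training script.

```python
def build_repo_id(username: str, model_size: str, steps: int) -> str:
    """Reproduce the repo naming used by upload_model.

    Steps are shown in thousands using integer division, so any
    remainder below 1000 is dropped from the name.
    """
    repo_name = f"openllm-{model_size}-extended-{steps // 1000}k"
    return f"{username}/{repo_name}"


def repo_url(repo_id: str) -> str:
    """Return the Hub URL for a model repository id."""
    return f"https://huggingface.co/{repo_id}"
```

Note that integer division means a run of 8,500 steps would also map to the `8k` name, so two runs with different step counts can target the same repository.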

## 🔒 Security Notes

- **Token Security**: The `HF_TOKEN` is stored as a secret in your Space settings; never commit it to the repository
- **Repository Access**: Only you can write to your model repositories; the upload code above creates them with `private=False`, so anyone can read them
- **Cleanup**: Test repositories are automatically deleted after testing
- **Monitoring**: Check Space logs for any authentication issues
- **Space Integration**: The secret is automatically available to the Space as the `HF_TOKEN` environment variable
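
Since Space logs are where authentication problems get diagnosed, it can help to scrub tokens from any text before it is printed. The following is a minimal sketch. The name `redact_tokens` is hypothetical, and the pattern assumes a token is `hf_` followed by at least 16 letters or digits, which is an assumption about the token format rather than a documented guarantee.

```python
import re

# Assumed token shape: "hf_" followed by 16 or more alphanumeric characters.
# The minimum length keeps short identifiers such as "hf_hub" from matching.
_TOKEN_RE = re.compile(r"hf_[A-Za-z0-9]{16,}")


def redact_tokens(text: str) -> str:
    """Replace anything that looks like an HF access token with 'hf_***'."""
    return _TOKEN_RE.sub("hf_***", text)
```

Wrapping error messages as `print(redact_tokens(str(e)))` would keep a token out of the logs even if an exception message happens to include it.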

## 🚀 Benefits of Space Secrets

1. **Centralized Management**: The Space's secrets are managed in one place, the Space settings
2. **Automatic Access**: The Space receives each secret as an environment variable
3. **Out of Version Control**: The token never has to be committed to your repository
4. **Security**: Secret values are not exposed in the Space's code or files
5. **Easy Updates**: Replace the token in the Space settings without changing any code

---

**Next Steps**: Once you've set up the `HF_TOKEN` Space secret, you can re-run your training and the model upload should work correctly!