---
language:
- en
license: apache-2.0
library_name: autogluon
tags:
- binary-classification
- multi-class-classification
- text-classification
- embeddings
- umap
- autogluon
datasets:
- 112_Tiering_Questions_02.28.2025.json
model-index:
- name: "03062025_V2_UMAP_Embedding_Classifier (Binary)"
  results:
  - task:
      type: text-classification
      name: Binary Classification
    dataset:
      name: 112_Tiering_Questions_02.28.2025.json
      type: tabular
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9565
    - name: F1
      type: f1
      value: 0.97
    - name: ROC AUC
      type: roc_auc
      value: 0.91
- name: "03062025_V2_UMAP_Embedding_Classifier (Multi-class)"
  results:
  - task:
      type: text-classification
      name: Multi-class Classification
    dataset:
      name: 112_Tiering_Questions_02.28.2025.json
      type: tabular
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5652
    - name: F1
      type: f1
      value: 0.59
    - name: ROC AUC
      type: roc_auc
      value: 0.74
---
# 03062025_V2_UMAP_Embedding_Classifier
This repository contains two final AutoGluon TabularPredictor models (binary and multi-class) built using UMAP-reduced embeddings from the [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model.
## Key Details
- **UMAP for Binary Classification**: Best n_components tuned via Optuna = 11.
- **UMAP for Multi-class Classification**: Best n_components tuned via Optuna = 43.
- **Data**: 112 technical questions with tiering classifications (0–4).
- **Performance Metrics**:
  - **Binary**: Accuracy ≈95.65%, F1 ≈0.97, ROC AUC ≈0.91.
  - **Multi-class**: Accuracy ≈56.52%, F1 ≈0.59, ROC AUC ≈0.74.
## Usage
1. **Loading the Models**:
```python
from autogluon.tabular import TabularPredictor
binary_predictor = TabularPredictor.load("binary_final_model")
multi_predictor = TabularPredictor.load("multiclass_final_model")
```
2. **Preprocessing**: Generate embeddings for your input text with the `Alibaba-NLP/gte-large-en-v1.5` model, then apply the UMAP transformation using the provided reducer file that matches the predictor: `umap_reducer_binary.joblib` for the binary model and `umap_reducer_multi.joblib` for the multi-class model.
3. **Prediction**: Call `predict()` for class labels and `predict_proba()` for class probabilities on the reduced features.
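
Steps 2 and 3 can be sketched end to end as below. This is a minimal sketch, not the exact pipeline used to train the models. It assumes the embeddings are produced with the `sentence-transformers` package, that the reducer files are fitted `umap-learn` objects saved with `joblib`, and that the predictors' feature columns are in the same order as the UMAP components. The helper names `to_feature_rows` and `predict_tiers` are hypothetical.

```python
def to_feature_rows(reduced, columns):
    """Pair each reduced vector with the predictor's feature names.

    Raises ValueError when the widths differ, which is what happens if the
    binary reducer is combined with the multi-class predictor or vice versa.
    """
    rows = []
    for vec in reduced:
        if len(vec) != len(columns):
            raise ValueError(
                f"reducer produced {len(vec)} components, "
                f"predictor expects {len(columns)} features"
            )
        rows.append({name: float(value) for name, value in zip(columns, vec)})
    return rows


def predict_tiers(texts, reducer_path, predictor_path):
    """Embed texts, reduce them with UMAP, and run an AutoGluon predictor."""
    # Heavy dependencies are imported here so the helper above works without them.
    import joblib
    import pandas as pd
    from autogluon.tabular import TabularPredictor
    from sentence_transformers import SentenceTransformer

    # The gte-large-en-v1.5 checkpoint ships custom modeling code,
    # so it has to be loaded with trust_remote_code=True.
    encoder = SentenceTransformer(
        "Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True
    )
    embeddings = encoder.encode(texts)

    # Assumption: the .joblib file holds a fitted umap-learn reducer.
    reducer = joblib.load(reducer_path)
    reduced = reducer.transform(embeddings)

    predictor = TabularPredictor.load(predictor_path)
    # Assumption: training columns are ordered like the UMAP components.
    columns = predictor.features()
    features = pd.DataFrame(to_feature_rows(reduced, columns), columns=columns)

    return predictor.predict(features), predictor.predict_proba(features)
```

For the binary model, a call would look like `predict_tiers(questions, "umap_reducer_binary.joblib", "binary_final_model")`, where `questions` is a list of strings. The multi-class model uses `umap_reducer_multi.joblib` and `multiclass_final_model` in the same way.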
## License
This project is licensed under the Apache-2.0 License.
## Contact
For questions or collaboration, please contact LeiPricingManager.