---
language:
- en
license: apache-2.0
library_name: autogluon
tags:
- binary-classification
- multi-class-classification
- text-classification
- embeddings
- umap
- autogluon
datasets:
- 112_Tiering_Questions_02.28.2025.json
model-index:
- name: "03062025_V2_UMAP_Embedding_Classifier (Binary)"
  results:
  - task:
      type: text-classification
      name: Binary Classification
    dataset:
      name: 112_Tiering_Questions_02.28.2025.json
      type: tabular
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9565
    - name: F1
      type: f1
      value: 0.97
    - name: ROC AUC
      type: roc_auc
      value: 0.91
- name: "03062025_V2_UMAP_Embedding_Classifier (Multi-class)"
  results:
  - task:
      type: text-classification
      name: Multi-class Classification
    dataset:
      name: 112_Tiering_Questions_02.28.2025.json
      type: tabular
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.5652
    - name: F1
      type: f1
      value: 0.59
    - name: ROC AUC
      type: roc_auc
      value: 0.74
---

# 03062025_V2_UMAP_Embedding_Classifier

This repository contains two final AutoGluon `TabularPredictor` models (binary and multi-class) built using UMAP-reduced embeddings from the [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model.

## Key Details

- **UMAP for Binary Classification**: Best `n_components` tuned via Optuna = 11.
- **UMAP for Multi-class Classification**: Best `n_components` tuned via Optuna = 43.
- **Data**: 112 technical questions with tiering classifications (0–4).
- **Performance Metrics**:
  - **Binary**: Accuracy ≈95.65%, F1 ≈0.97, ROC AUC ≈0.91.
  - **Multi-class**: Accuracy ≈56.52%, F1 ≈0.59, ROC AUC ≈0.74.

## Usage

1. **Loading the Models**:
   ```python
   from autogluon.tabular import TabularPredictor
   binary_predictor = TabularPredictor.load("binary_final_model")
   multi_predictor = TabularPredictor.load("multiclass_final_model")
   ```

2. **Preprocessing**: Generate embeddings for your input text using the Alibaba-NLP/gte-large-en-v1.5 model, then apply the UMAP transformation with the provided reducer file that matches the predictor: `umap_reducer_binary.joblib` for the binary model and `umap_reducer_multi.joblib` for the multi-class model.

3. **Prediction**: Call `predict()` on the reduced features to obtain class labels and `predict_proba()` to obtain class probabilities.
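
Steps 2 and 3 can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the exact code used to build these models. The feature column names (`umap_0`, `umap_1`, ...) and the helper names `feature_columns`, `classify`, and `load_binary_pipeline` are hypothetical, and the column names must match whatever the predictors were trained with. The sketch also assumes the embeddings are produced with `sentence-transformers` and that the reducer and model files sit in the working directory.

```python
from typing import List, Sequence

# Assumption: the model card does not state the training column names,
# so this prefix is a placeholder that must be adapted to the real ones.
FEATURE_PREFIX = "umap_"


def feature_columns(n_components: int, prefix: str = FEATURE_PREFIX) -> List[str]:
    """Build column names for an n_components-wide UMAP feature table."""
    return [f"{prefix}{i}" for i in range(n_components)]


def classify(texts: Sequence[str], encoder, reducer, predictor):
    """Embed texts, reduce them with the fitted UMAP reducer, then predict."""
    import pandas as pd  # imported lazily so feature_columns has no dependencies

    embeddings = encoder.encode(list(texts))   # array of shape (n_texts, embedding_dim)
    reduced = reducer.transform(embeddings)    # array of shape (n_texts, n_components)
    features = pd.DataFrame(reduced, columns=feature_columns(reduced.shape[1]))
    labels = predictor.predict(features)
    probabilities = predictor.predict_proba(features)
    return labels, probabilities


def load_binary_pipeline():
    """Load encoder, reducer, and predictor for the binary model."""
    import joblib
    from autogluon.tabular import TabularPredictor
    from sentence_transformers import SentenceTransformer

    # gte-large-en-v1.5 ships custom modeling code, hence trust_remote_code=True.
    encoder = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)
    reducer = joblib.load("umap_reducer_binary.joblib")
    predictor = TabularPredictor.load("binary_final_model")
    return encoder, reducer, predictor
```

With the artifacts in place, `labels, probabilities = classify(["an example question"], *load_binary_pipeline())` would return the predicted labels and the per-class probabilities. Swapping in `umap_reducer_multi.joblib` and `multiclass_final_model` gives the multi-class variant.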

## License
This project is licensed under the Apache-2.0 License.

## Contact
For questions or collaboration, please contact LeiPricingManager.