---
language:
  - en
license: apache-2.0
library_name: autogluon
tags:
  - binary-classification
  - multi-class-classification
  - text-classification
  - embeddings
  - umap
  - autogluon
datasets:
  - 112_Tiering_Questions_02.28.2025.json
model-index:
  - name: "03062025_V2_UMAP_Embedding_Classifier (Binary)"
    results:
      - task:
          type: text-classification
          name: Binary Classification
        dataset:
          name: 112_Tiering_Questions_02.28.2025.json
          type: tabular
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9565
          - name: F1
            type: f1
            value: 0.97
          - name: ROC AUC
            type: roc_auc
            value: 0.91
  - name: "03062025_V2_UMAP_Embedding_Classifier (Multi-class)"
    results:
      - task:
          type: text-classification
          name: Multi-class Classification
        dataset:
          name: 112_Tiering_Questions_02.28.2025.json
          type: tabular
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.5652
          - name: F1
            type: f1
            value: 0.59
          - name: ROC AUC
            type: roc_auc
            value: 0.74
---

# 03062025_V2_UMAP_Embedding_Classifier

This repository contains two final AutoGluon `TabularPredictor` models, one binary and one multi-class, trained on UMAP-reduced embeddings from the [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) model.

## Key Details

- **UMAP for Binary Classification**: `n_components` = 11 (best value found with Optuna).
- **UMAP for Multi-class Classification**: `n_components` = 43 (best value found with Optuna).
- **Data**: 112 technical questions with tiering classifications (0–4).
- **Performance Metrics**:
  - **Binary**: Accuracy ≈95.65%, F1 ≈0.97, ROC AUC ≈0.91.
  - **Multi-class**: Accuracy ≈56.52%, F1 ≈0.59, ROC AUC ≈0.74.

## Usage

1. **Loading the Models**:
   ```python
   from autogluon.tabular import TabularPredictor
   binary_predictor = TabularPredictor.load("binary_final_model")
   multi_predictor = TabularPredictor.load("multiclass_final_model")
   ```

2. **Preprocessing**: Generate embeddings for your input text with the Alibaba-NLP/gte-large-en-v1.5 model, then apply the UMAP transformation using the provided reducer that matches the predictor: `umap_reducer_binary.joblib` for the binary model and `umap_reducer_multi.joblib` for the multi-class model.

3. **Prediction**: Call `predict()` for class labels and `predict_proba()` for class probabilities.
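
The three steps above could be wired together roughly as follows. This is a minimal sketch, not the exact pipeline used to train these models. It assumes the embeddings are produced with the `sentence-transformers` package (loading this embedding model typically needs `trust_remote_code=True`), that the reducer files are ordinary `joblib` dumps of fitted UMAP objects, and that the order of `predictor.features()` matches the order of the UMAP components. The helper names `load_artifacts`, `build_features`, and `classify` are hypothetical.

```python
def load_artifacts(model_dir="binary_final_model",
                   reducer_path="umap_reducer_binary.joblib"):
    """Load the encoder, the fitted UMAP reducer, and the predictor.

    Assumption: sentence-transformers is used for the embeddings and the
    reducer was saved with joblib. Heavy imports are kept local so that
    build_features() can be used without them.
    """
    import joblib
    from autogluon.tabular import TabularPredictor
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5",
                                  trust_remote_code=True)
    reducer = joblib.load(reducer_path)
    predictor = TabularPredictor.load(model_dir)
    return encoder, reducer, predictor


def build_features(texts, encoder, reducer, feature_names):
    """Embed the texts, reduce them with UMAP, and return one dict per text."""
    embeddings = encoder.encode(texts)       # one embedding vector per text
    reduced = reducer.transform(embeddings)  # one reduced vector per text
    rows = []
    for vector in reduced:
        values = [float(v) for v in vector]
        if len(values) != len(feature_names):
            raise ValueError(
                f"reducer produced {len(values)} components, "
                f"but the predictor expects {len(feature_names)} features"
            )
        rows.append(dict(zip(feature_names, values)))
    return rows


def classify(texts, encoder, reducer, predictor):
    """Return (labels, probabilities) for a list of input texts."""
    import pandas as pd

    # Assumption: the training columns are in UMAP component order.
    feature_names = list(predictor.features())
    rows = build_features(texts, encoder, reducer, feature_names)
    frame = pd.DataFrame(rows, columns=feature_names)
    return predictor.predict(frame), predictor.predict_proba(frame)
```

For the binary model, a call might look like `labels, probabilities = classify(["How do I reset my password?"], *load_artifacts())`. For the multi-class model, pass `"multiclass_final_model"` and `"umap_reducer_multi.joblib"` to `load_artifacts` instead.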

## License
This project is licensed under the Apache-2.0 License.

## Contact
For questions or collaboration, please contact LeiPricingManager.