Automated categorization of customer service tickets

I'm looking for a model that can help automatically categorize customer service tickets based on the description entered when the ticket is opened. In short, the model would interpret the customer's ticket and categorize it automatically.


Hmm… Text-classification?


You can treat what you want as a very specific case of text (intent) classification:

Input = short text written when the call is opened
Output = category label(s) like “Billing”, “Login problem”, “Cancellation”, etc.

Below is a detailed, concrete answer focused on:

  • Which existing models you can use right now.
  • Which base model you should fine-tune for your own categories (especially in Portuguese).
  • How they all fit together in a practical pipeline.

1. What you’re doing, in plain terms

When the call is opened, someone types something like:

  • “Cliente não consegue acessar a conta pelo aplicativo depois de redefinir a senha.”
  • “Customer was charged twice this month.”

You want a model that reads that description and outputs a label such as:

  • LOGIN_PROBLEM
  • BILLING_DOUBLE_CHARGE
  • CANCELLATION

Technically this is:

  • Single-label classification if you choose exactly one “main reason for contact”.
  • Multi-label classification if you want several tags per call (e.g. BILLING + COMPLAINT).

Almost all modern solutions use a Transformer encoder (BERT-like model) plus a small classification head.
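To make the single-label vs multi-label distinction concrete: a single-label head turns the logits into a softmax distribution and takes the argmax, while a multi-label head applies an independent sigmoid to each label and keeps every label above a threshold. The sketch below uses plain Python on made-up logits; the label names and the 0.5 threshold are illustrative assumptions, not values from any of the models discussed here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def single_label(logits: List[float], labels: List[str]) -> str:
    # Exactly one "main reason for contact": the highest softmax probability wins.
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]


def multi_label(logits: List[float], labels: List[str], threshold: float = 0.5) -> List[str]:
    # Several tags per call: each label is scored independently with a sigmoid.
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]


labels = ["BILLING", "COMPLAINT", "LOGIN_PROBLEM"]  # hypothetical label set
logits = [2.0, 0.5, -1.5]                           # made-up model outputs
print(single_label(logits, labels))  # BILLING
print(multi_label(logits, labels))   # ['BILLING', 'COMPLAINT']
```

The threshold is a business decision: raising it gives fewer but more reliable tags.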


2. Short model shortlist (what to actually look at)

2.1 Zero-shot that works right now (Portuguese and other languages)

Model: MoritzLaurer/mDeBERTa-v3-base-mnli-xnli (Hugging Face)

  • What it is: multilingual DeBERTa v3 model, trained on MNLI + XNLI, officially described as suitable for multilingual zero-shot classification. (Hugging Face)

  • Languages: ≈100, including Portuguese.

  • Why it’s useful to you:

    • You do not need any labeled data to start.
    • You can pass Portuguese call descriptions and candidate labels written in natural language (also in Portuguese).
    • Perfect for a first prototype and for bootstrapping labels.

Usage conceptually:

from transformers import pipeline

clf = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli",
)

descricao = "Cliente não consegue acessar a conta pelo aplicativo depois de redefinir a senha."
labels = [
    "Problema de login",
    "Problema de pagamento",
    "Problema de entrega",
    "Dúvida geral",
]

result = clf(descricao, candidate_labels=labels, multi_label=False)
print(result["labels"][0], result["scores"][0])

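Under the hood the model does NLI: for each candidate label it builds a hypothesis from a template (by default “This example is {}.”, so “This example is Problema de login.”), checks whether the call description entails it, and ranks the labels by entailment score. (Hugging Face)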

This gives you automatic categorization today, before any fine-tuning.
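The zero-shot pipeline builds each NLI hypothesis from a `hypothesis_template` argument whose default is the English “This example is {}.”, so Portuguese labels end up inside an English sentence. Passing a Portuguese template is often worth trying. A minimal sketch, assuming the wording “Este chamado é sobre {}.” (hypothetical; compare a few variants on real calls); `build_zero_shot` and `classify_pt` are illustrative helper names:

```python
from typing import Any, Callable, Dict, List

# Portuguese hypothesis; the pipeline replaces "{}" with each candidate label.
# The exact wording is an assumption: compare a few variants on real calls.
HYPOTHESIS_PT = "Este chamado é sobre {}."


def build_zero_shot(model_id: str = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli") -> Any:
    # Imported lazily so classify_pt below can be tested without transformers.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_id)


def classify_pt(clf: Callable[..., Dict[str, Any]], text: str, labels: List[str]) -> Dict[str, Any]:
    """Run zero-shot classification with a Portuguese hypothesis template."""
    return clf(
        text,
        candidate_labels=labels,
        hypothesis_template=HYPOTHESIS_PT,
        multi_label=False,
    )
```

Usage with the earlier example: `result = classify_pt(build_zero_shot(), descricao, labels)`, then read `result["labels"][0]` and `result["scores"][0]` as before.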


2.2 For English call / ticket categorization (off-the-shelf)

Even if your main need is Portuguese, it’s very useful to know the English models that already solve exactly your problem, because they are templates for architecture and label design.

a) Call-center intent classifier: karimenBR/callcenter-transformer

  • Type: DistilBERT text classifier.

  • Model card summary: “CallCenter Transformer Model – Predicting the intent or topic group of a customer message in a call center. Supporting automated routing of customer requests.” (Hugging Face)

  • Intended uses:

    • Call-center customer messages → intent / topic.
    • Automated routing and analytics.

This is exactly your scenario, just with training data in (primarily) English. It shows:

  • How to treat each call message as a text classification input.
  • How to build a call-center model on top of DistilBERT.

You can plug it into a pipeline the same way as any HF classifier, and you can inspect config.id2label to see the call categories it uses.
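Reading `id2label` from the model config is enough to list a classifier's categories; `AutoConfig.from_pretrained` fetches only the configuration, not the weights. A small sketch with illustrative helper names (a model whose author never set label names will show generic ones such as `LABEL_0`):

```python
from typing import Dict, List


def categories_from_id2label(id2label: Dict[int, str]) -> List[str]:
    """Label names ordered by class id."""
    return [id2label[i] for i in sorted(id2label)]


def model_categories(model_id: str) -> List[str]:
    # Lazy import so the helper above has no dependency on transformers.
    from transformers import AutoConfig
    config = AutoConfig.from_pretrained(model_id)  # downloads config.json only
    return categories_from_id2label(config.id2label)
```

For example, `print(model_categories("karimenBR/callcenter-transformer"))`.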

b) Ticket classifier: Dragneel/Ticket-classification-model

  • Type: DistilBERT classifier.
  • Labels: {0: 'Billing Question', 1: 'Feature Request', 2: 'General Inquiry', 3: 'Technical Issue'}. (Hugging Face)
  • Use case: Short ticket text → one of four standard helpdesk categories.

Why it’s relevant:

  • It is a minimal, clear example of support ticket categorization that is nearly identical to “call description categorization”.
  • If you need to support English as well, you can deploy this model directly for English calls and a different one for Portuguese.

Example mapping:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Dragneel/Ticket-classification-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

id2label = model.config.id2label  # mapping above

def classify_call_en(text: str):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    idx = int(torch.argmax(probs))
    return id2label[idx], float(probs[idx])

2.3 For generic customer-support intent classification (English)

Model: vineetsharma/customer-support-intent-albert (Hugging Face)

  • What it is: ALBERT model fine-tuned on the Bitext customer-support dataset (bitext/Bitext-customer-support-llm-chatbot-training-dataset). (Hugging Face)

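  • Performance: Model card reports ~99.9% accuracy on its eval split. That split comes from the same dataset as the training data, so it says little about how the model would do on your own calls. (Hugging Face)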

  • Why it matters:

    • It’s a real customer-support intent model trained on thousands of labeled support cases.
    • Its taxonomy (27 intents grouped into 10 categories) is a good reference for designing your own categories.

You probably will not use this model directly for Portuguese, but it is a strong reference for label sets and architecture. It can even serve as a “teacher model” that auto-labels English calls if you go down a distillation route.
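In a distillation setup the teacher's predictions become training labels for a smaller student model, and it usually pays to keep only the confident ones. A minimal sketch, assuming the teacher is wrapped as a function returning `(label, confidence)` (for example around a `text-classification` pipeline); the 0.9 cut-off and the name `pseudo_label` are illustrative assumptions:

```python
from typing import Callable, Dict, List, Tuple

# A "teacher" is any callable mapping a text to (label, confidence).
Teacher = Callable[[str], Tuple[str, float]]


def pseudo_label(texts: List[str], teacher: Teacher, min_conf: float = 0.9) -> List[Dict[str, str]]:
    """Keep only confident teacher predictions as "silver" training rows."""
    rows = []
    for text in texts:
        label, conf = teacher(text)
        # Drop uncertain predictions instead of training the student on noise.
        if conf >= min_conf:
            rows.append({"text": text, "label": label})
    return rows
```

The resulting rows have the same `{"text", "label"}` shape used for fine-tuning later, so they can be mixed with agent-verified labels.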


2.4 For Portuguese fine-tuning: BERTimbau

For serious, long-term quality in Portuguese, you almost always want to fine-tune a Portuguese encoder:

Models:

  • neuralmind/bert-base-portuguese-cased

  • neuralmind/bert-large-portuguese-cased (Hugging Face)

  • What they are: BERT Base and BERT Large models trained on brWaC, a large Brazilian Portuguese web corpus. At release they reported state-of-the-art results on Portuguese NER, sentence textual similarity, and textual entailment. (Hugging Face)

  • Why they are well-suited for your problem:

    • They “speak” Brazilian Portuguese natively.
    • They are widely used as a base for PT downstream tasks (hate speech detection, QA, etc.), meaning they are stable and well-tested. (Hugging Face)

This is the model family you would fine-tune on your own call descriptions and categories to get a custom “Call Categorizer PT”.

You can think of this as:

bert-base-portuguese-cased + classification head → your call categories.


3. How these models fit together into a realistic pipeline

You don’t have to pick just one model forever. A practical strategy is to use them in phases:

Phase 1 – Quick start with no labeled data

Goal: “Have something working that can read a call description and assign a reasonable category, even if not perfect.”

  • Use multilingual zero-shot:

    • MoritzLaurer/mDeBERTa-v3-base-mnli-xnli in the zero-shot-classification pipeline, with Portuguese labels. (Hugging Face)
  • Define an initial label list, for example:

    • “Problema de login ou senha”
    • “Problema de pagamento ou cobrança”
    • “Problema de entrega ou atraso”
    • “Dúvida sobre produto”
    • “Cancelamento de serviço”
    • “Reclamação geral”

For each new call:

  1. Read the description.
  2. Run zero-shot classification with those label strings.
  3. Autofill the category field in your CRM (agent can override it).

Simultaneously:

  • Log both the model prediction and the final category chosen by the agent.
  • This automatically creates a labeled dataset of call descriptions for later fine-tuning.
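A minimal way to capture that feedback loop, assuming one JSON line per call (the field names and helper functions are hypothetical; adapt them to your CRM export). The agreement rate is only a rough live accuracy signal, since agents may accept a prefilled value without checking it:

```python
import json
from typing import Dict, Iterable, List


def make_log_line(call_id: str, text: str, predicted: str, final: str, score: float) -> str:
    """Serialize one call: model prediction plus the agent's final choice."""
    record = {"call_id": call_id, "text": text, "predicted": predicted,
              "final": final, "score": score}
    # ensure_ascii=False keeps Portuguese accents readable in the log file.
    return json.dumps(record, ensure_ascii=False)


def training_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn logged lines into {"text", "label"} rows; the agent's label is ground truth."""
    rows = []
    for line in lines:
        if line.strip():
            rec = json.loads(line)
            rows.append({"text": rec["text"], "label": rec["final"]})
    return rows


def agreement_rate(lines: Iterable[str]) -> float:
    """Share of calls where the agent kept the model's prediction."""
    recs = [json.loads(line) for line in lines if line.strip()]
    if not recs:
        return 0.0
    return sum(r["predicted"] == r["final"] for r in recs) / len(recs)
```

Writing is then one append per call, e.g. `f.write(make_log_line(...) + "\n")` on a file opened with `open("calls.jsonl", "a", encoding="utf-8")` (file name hypothetical).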

This phase gives you:

  • Automatic categorization in Portuguese today.
  • A data pipeline for supervised training later.

Phase 2 – Supervised Portuguese model (better accuracy, stable behavior)

Once you have a few thousand labeled calls (even 2–5k is already useful):

  • Build a dataset with columns like: {"text": "...", "label": "LOGIN_PROBLEM"}.

  • Fine-tune BERTimbau (neuralmind/bert-base-portuguese-cased) with a classification head:

    • For single-label: standard cross-entropy.
    • For multi-label: BCEWithLogitsLoss (multi-hot labels).
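The two loss options imply different label formats: single-label rows carry one integer class id, multi-label rows carry a multi-hot vector of floats. In transformers you can also pass `problem_type="multi_label_classification"` to `from_pretrained` so the model applies BCEWithLogitsLoss. A sketch with a hypothetical label set:

```python
from typing import Dict, List, Union

LABELS = ["BILLING", "COMPLAINT", "LOGIN_PROBLEM", "CANCELLATION"]  # hypothetical label set
LABEL2ID: Dict[str, int] = {lab: i for i, lab in enumerate(LABELS)}

Row = Dict[str, Union[str, int, List[float]]]


def single_label_row(text: str, label: str) -> Row:
    # Single-label: one integer class id, used with cross-entropy.
    return {"text": text, "label": LABEL2ID[label]}


def multi_label_row(text: str, tags: List[str]) -> Row:
    # Multi-label: one float per class (1.0 = tag present), the target
    # format BCEWithLogitsLoss expects.
    vec = [0.0] * len(LABELS)
    for tag in tags:
        vec[LABEL2ID[tag]] = 1.0
    return {"text": text, "label": vec}
```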

Conceptually:

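from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Your final category set (example names; replace with your own taxonomy).
label_list = ["LOGIN_PROBLEM", "BILLING_DOUBLE_CHARGE", "CANCELLATION"]
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

model_name = "neuralmind/bert-base-portuguese-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id,
    # problem_type="multi_label_classification",  # uncomment for multi-label (BCEWithLogitsLoss)
)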

Then train this model with Hugging Face Trainer on your call dataset.
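Trainer accepts a `compute_metrics` function, which it calls at evaluation time with the model's logits and the true label ids. Because call categories are usually imbalanced, macro-F1 is worth tracking next to accuracy. A dependency-free sketch for the single-label case (scikit-learn's `f1_score(..., average="macro")` should give the same macro-F1):

```python
from typing import Dict


def compute_metrics(eval_pred) -> Dict[str, float]:
    """Accuracy and macro-F1 for single-label classification.

    Trainer passes an EvalPrediction that unpacks into (logits, label_ids);
    plain nested lists work too, which makes the function easy to test.
    """
    logits, labels = eval_pred
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]
    if not labels:
        return {"accuracy": 0.0, "macro_f1": 0.0}
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

    # Per-class F1 = 2TP / (2TP + FP + FN), averaged over classes seen in labels or predictions.
    f1s = []
    for c in sorted(set(labels) | set(preds)):
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return {"accuracy": accuracy, "macro_f1": sum(f1s) / len(f1s)}
```

Pass it as `Trainer(..., compute_metrics=compute_metrics)` together with an `eval_dataset`.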

Result:

  • A dedicated Portuguese call categorizer that:

    • Knows your products.
    • Uses your label names.
    • Usually outperforms zero-shot models on your real data.

You can still keep MoritzLaurer/mDeBERTa-v3-base-mnli-xnli as a fallback for very rare or new categories.
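One way to wire that fallback: trust the fine-tuned model when it is confident, otherwise let the zero-shot model choose between the supervised guess and a list of rare or newly added labels. A sketch under stated assumptions: both models are wrapped as callables returning `(label, confidence)`, they use the same label strings (map codes like LOGIN_PROBLEM to descriptive text if needed), and the 0.6 threshold is illustrative. Softmax confidences are not always well calibrated, so tune the threshold on held-out data.

```python
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (label, confidence)


def categorize_with_fallback(
    text: str,
    supervised: Callable[[str], Prediction],
    zero_shot: Callable[[str, List[str]], Prediction],
    extra_labels: List[str],
    min_conf: float = 0.6,
) -> Tuple[str, float, str]:
    """Use the fine-tuned model first; fall back to zero-shot when it is unsure."""
    label, conf = supervised(text)
    if conf >= min_conf:
        return label, conf, "supervised"
    # Low confidence: the call may belong to a category the fine-tuned model
    # never saw, so zero-shot chooses among those labels plus the supervised guess.
    candidates = list(dict.fromkeys(extra_labels + [label]))  # dedupe, keep order
    zs_label, zs_conf = zero_shot(text, candidates)
    return zs_label, zs_conf, "zero-shot"
```

With the Hugging Face pipeline, the zero-shot callable could be `lambda t, labels: (lambda r: (r["labels"][0], r["scores"][0]))(clf(t, candidate_labels=labels))`.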


Phase 3 – Optional: multi-language / generative behaviors

If you later need:

  • English and Portuguese in the same system,
  • Or richer outputs (e.g. a JSON with category + urgency + sentiment),

you can:

  1. Use English ticket models directly for English calls:

    • karimenBR/callcenter-transformer for call-style English text. (Hugging Face)
    • Dragneel/Ticket-classification-model for English ticket descriptions. (Hugging Face)
  2. Train a shared multilingual classifier (e.g. XLM-R or mDeBERTa) if you want one model for all languages.

  3. Optionally fine-tune a small LLM (via TRL / LoRA) to:

    • Read longer context (full transcript),

    • Summarize the call,

    • Output a structured JSON like:

      {
        "category": ["INTERNET_LENTA"],
        "summary_pt": "Cliente relata lentidão de internet desde ontem.",
        "urgency": "alta"
      }
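LLM output should be validated before it reaches the CRM, because the model can return malformed JSON or invent categories. A minimal sketch that returns a list of problems (empty means usable); the allowed categories and urgency values are hypothetical apart from the ones in the example JSON above:

```python
import json
from typing import List, Set

ALLOWED_CATEGORIES: Set[str] = {"INTERNET_LENTA", "COBRANCA_DUPLICADA", "CANCELAMENTO"}  # hypothetical
ALLOWED_URGENCY: Set[str] = {"baixa", "média", "alta"}  # hypothetical


def validate_llm_output(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the output is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    errors = []
    cats = data.get("category")
    if not isinstance(cats, list) or not cats:
        errors.append("'category' must be a non-empty list")
    else:
        bad = [c for c in cats if not isinstance(c, str) or c not in ALLOWED_CATEGORIES]
        if bad:
            errors.append(f"unknown categories: {bad}")
    urgency = data.get("urgency")
    if not isinstance(urgency, str) or urgency not in ALLOWED_URGENCY:
        errors.append("invalid urgency")
    if not isinstance(data.get("summary_pt"), str):
        errors.append("'summary_pt' must be a string")
    return errors
```

When the list is not empty, the caller can retry the generation or send the call to manual triage.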
      

For most call-routing use cases, a classic encoder-plus-head model (like BERTimbau) is simpler and more than enough.


4. Designing your categories (where models and business meet)

The models above will do what you tell them, but you define the categories. That design is critical:

  • Start with 10–30 categories, not hundreds.

  • Make them:

    • Mutually exclusive for the main label, as much as possible.
    • Expressed in clear language (“Problema de login”, “Erro de pagamento com cartão”, etc.).
  • Consider a two-level hierarchy:

    • Level 1: Billing, Technical, Orders, Delivery, Cancellation, General.
    • Level 2: More specific (e.g. Technical → Login, Password reset, Mobile app, Website).
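A two-level taxonomy can live in a plain dictionary, which makes it easy to generate flat labels, recover the Level 1 parent of a prediction (to report accuracy at both levels), and catch Level 2 names reused under different parents, which breaks mutual exclusivity. A sketch with a hypothetical taxonomy; the Technical children come from the example above, the rest are made up:

```python
from typing import Dict, List

# Hypothetical taxonomy: Level 1 -> list of Level 2 categories.
TAXONOMY: Dict[str, List[str]] = {
    "Billing": ["Double charge", "Invoice question"],
    "Technical": ["Login", "Password reset", "Mobile app", "Website"],
    "Delivery": ["Late delivery"],
}


def flatten(taxonomy: Dict[str, List[str]], sep: str = " > ") -> List[str]:
    """Flat "Level1 > Level2" labels, e.g. for a single flat classifier."""
    return [f"{parent}{sep}{child}" for parent, children in taxonomy.items() for child in children]


def parent_of(label: str, sep: str = " > ") -> str:
    """Level 1 category of a flat label."""
    return label.split(sep, 1)[0]


def duplicate_children(taxonomy: Dict[str, List[str]]) -> List[str]:
    """Level 2 names that appear more than once (ambiguous for agents and models)."""
    seen, dups = set(), []
    for children in taxonomy.values():
        for child in children:
            if child in seen and child not in dups:
                dups.append(child)
            seen.add(child)
    return dups
```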

You can borrow ideas from:

  • Bitext’s 27 intents / 10 categories used for customer-support-intent-albert. (Hugging Face)
  • Dragneel/Ticket-classification-model’s four coarse categories. (Hugging Face)

Once the label set is defined, any of the models above can be aligned to it via zero-shot prompts (Phase 1) or fine-tuning (Phase 2).


5. Which model(s) to pick, concretely

Putting it all together, a very concrete recommendation for your exact use case:

  1. Right now, with minimal effort:

    • Use MoritzLaurer/mDeBERTa-v3-base-mnli-xnli as a zero-shot classifier on the call-open description text.
    • Provide candidate labels in Portuguese.
    • Let agents override the prediction; log corrections.
  2. As soon as you have a few thousand labeled Portuguese calls:

    • Fine-tune neuralmind/bert-base-portuguese-cased as a Portuguese call categorization model with your final label set. (Hugging Face)
  3. If you also have English calls:

    • Optionally plug in Dragneel/Ticket-classification-model or karimenBR/callcenter-transformer for English-only flows, or use a multilingual classifier if you want a single model. (Hugging Face)

This combination gives you:

  • A fast prototype that already interprets call descriptions and assigns categories.
  • A clear path to a high-quality, Portuguese-native production model once your own call data is available.

"""
Simple demo: automatic categorization of customer-service calls from their
short text description, using a multilingual zero-shot model.

This script:
  - Is CPU/GPU safe (including NVIDIA T4).
  - Uses float32 on CPU, float16 on GPU.
  - Does NOT use any CLI / argument parsing.
  - Can be run as:  python demo_call_categorization.py
  - Can also be imported and re-used from other Python code.

Dependencies (install with pip):
    pip install "torch>=2.2.0" "transformers>=4.40.0" sentencepiece
    (sentencepiece: the DeBERTa-v3 tokenizer uses a SentencePiece vocabulary)

Main model used (multilingual, zero-shot text classification):
    Hugging Face model card:
    https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli
    (multilingual mDeBERTa-v3 fine-tuned for NLI, suitable for zero-shot classification)

Transformers pipeline docs (zero-shot classification patterns):
    https://huggingface.co/docs/transformers/main/main_classes/pipelines
"""

from typing import List, Dict, Any
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline,
)


# ---------------------------------------------------------------------------
# 1. Configuration
# ---------------------------------------------------------------------------

# Hugging Face model ID (multilingual zero-shot classifier).
MODEL_ID = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"

# Example business categories: adapt these to your own call-center taxonomy.
# You can use Portuguese, English, or any other supported language.
CANDIDATE_LABELS_PT = [
    "Problema de login ou senha",
    "Problema de pagamento ou cobrança",
    "Problema de entrega ou atraso",
    "Dúvida sobre produto ou serviço",
    "Cancelamento de assinatura ou plano",
    "Reclamação geral",
]

CANDIDATE_LABELS_EN = [
    "Login or password problem",
    "Payment or billing issue",
    "Delivery or shipping delay",
    "Product or service question",
    "Subscription or plan cancellation",
    "General complaint",
]


# ---------------------------------------------------------------------------
# 2. Device & dtype selection (CPU/GPU-safe, T4-safe)
# ---------------------------------------------------------------------------

def get_device_and_dtype() -> Dict[str, Any]:
    """
    Decide which device and floating-point dtype to use.

    - If a CUDA GPU is available (e.g. NVIDIA T4), use:
        device = "cuda"
        dtype  = torch.float16   (saves memory and is T4-friendly)
    - Otherwise (CPU only), use:
        device = "cpu"
        dtype  = torch.float32   (safer and generally faster on CPU)

    Returns a dict with:
        {"device": torch.device, "dtype": torch.dtype, "pipeline_device": int}
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        dtype = torch.float16
        pipeline_device = 0  # index of the first CUDA device
    else:
        device = torch.device("cpu")
        dtype = torch.float32
        pipeline_device = -1  # -1 means "run on CPU" for transformers.pipeline

    return {
        "device": device,
        "dtype": dtype,
        "pipeline_device": pipeline_device,
    }


# ---------------------------------------------------------------------------
# 3. Build zero-shot classification pipeline
# ---------------------------------------------------------------------------

def build_zero_shot_pipeline() -> Any:
    """
    Load the tokenizer + model and build a zero-shot-classification pipeline.

    The model is:
        - moved to the chosen device (CPU or GPU),
        - converted to float16 on GPU,
        - kept as float32 on CPU.

    Returns a Hugging Face pipeline that can be called like:

        classifier("some text", candidate_labels=[...], multi_label=False)
    """
    cfg = get_device_and_dtype()
    device = cfg["device"]
    dtype = cfg["dtype"]
    pipeline_device = cfg["pipeline_device"]

    # Load tokenizer and model from Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # Note:
    #   We load in default dtype, then cast according to device.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

    # Move the model to the chosen device and cast it in one step:
    # float16 on GPU (e.g. T4), float32 on CPU.
    model = model.to(device=device, dtype=dtype)

    # Build zero-shot pipeline with the prepared model.
    classifier = pipeline(
        task="zero-shot-classification",
        model=model,
        tokenizer=tokenizer,
        device=pipeline_device,  # 0 for GPU, -1 for CPU
    )

    return classifier


# ---------------------------------------------------------------------------
# 4. Simple helper to categorize a single call
# ---------------------------------------------------------------------------

def categorize_call_description(
    classifier: Any,
    description: str,
    candidate_labels: List[str],
    multi_label: bool = False,
) -> Dict[str, Any]:
    """
    Run zero-shot classification for a single call description.

    Args:
        classifier: Hugging Face pipeline returned by build_zero_shot_pipeline().
        description: Short free-text description of the call.
        candidate_labels: List of label strings (e.g. Portuguese or English).
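        multi_label: If True, each label is scored independently, so several
                     labels can be "true" at once.
                     If False, scores are normalized across all labels (they
                     sum to 1) and the first label is the best match.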

    Returns:
        The raw pipeline output (dict), for example:
        {
          'sequence': '...',
          'labels': ['Problema de login ou senha', 'Reclamação geral', ...],
          'scores': [0.92, 0.03, ...]
        }
    """
    result = classifier(
        description,
        candidate_labels=candidate_labels,
        multi_label=multi_label,
    )
    return result


# ---------------------------------------------------------------------------
# 5. Demo usage with a few example calls
# ---------------------------------------------------------------------------

def run_demo() -> None:
    """
    Run a small demo in Python (no CLI) showing:
      - Portuguese examples
      - English examples

    This is safe to run on CPU or GPU. On a T4 GPU, the model will use float16.
    On CPU, it will use float32.
    """
    print("Loading zero-shot classification pipeline...")
    classifier = build_zero_shot_pipeline()
    print("Pipeline loaded.\n")

    # Example descriptions in Portuguese and English.
    pt_examples = [
        "Cliente não consegue acessar a conta pelo aplicativo depois de redefinir a senha.",
        "Cliente diz que foi cobrado duas vezes na fatura deste mês.",
        "Cliente quer cancelar o plano de internet por causa de problemas recorrentes.",
    ]

    en_examples = [
        "Customer cannot log into the mobile app after resetting the password.",
        "Customer reports double charge on this month's invoice.",
        "Customer wants to cancel the subscription due to repeated connectivity issues.",
    ]

    print("=== Portuguese examples ===")
    for text in pt_examples:
        result = categorize_call_description(
            classifier,
            description=text,
            candidate_labels=CANDIDATE_LABELS_PT,
            multi_label=False,  # change to True if you want multiple labels
        )
        best_label = result["labels"][0]
        best_score = result["scores"][0]
        print(f"Text:   {text}")
        print(f"Label:  {best_label}")
        print(f"Score:  {best_score:.3f}")
        print("-" * 60)

    print("\n=== English examples ===")
    for text in en_examples:
        result = categorize_call_description(
            classifier,
            description=text,
            candidate_labels=CANDIDATE_LABELS_EN,
            multi_label=False,
        )
        best_label = result["labels"][0]
        best_score = result["scores"][0]
        print(f"Text:   {text}")
        print(f"Label:  {best_label}")
        print(f"Score:  {best_score:.3f}")
        print("-" * 60)


# ---------------------------------------------------------------------------
# 6. Entry point
# ---------------------------------------------------------------------------

if __name__ == "__main__":
    # Running this file directly will run the small demo above.
    run_demo()

Thank you for taking the time to look for a solution for my scenario, but after looking at it more closely, I'm already doing something a bit more robust. Today I have some scripts that analyze the entire ticket, with all of its interactions, and return the ticket's high-level segment. I'd like something that goes deeper: besides giving me the high-level category, it should reach the last branch of the ticket's category tree, for example ERP > Cadastro de produtos (product registration) > Produto variação (product variant). I already get 98% accuracy on the first-level category (ERP).


Given that we’re keeping the current 98% accuracy model as-is, how about adding hierarchical text classification?
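One common way to do that without touching the existing Level 1 model is top-down classification with a local classifier per parent node: the current model picks the root (e.g. ERP), then a classifier trained only on that parent's children picks the next level, and so on until a leaf is reached or confidence drops. The sketch below assumes each node classifier is a callable returning `(label, confidence)`; the taxonomy, the stub classifiers and the 0.5 threshold are hypothetical. In practice each callable could wrap a fine-tuned model per parent, or the zero-shot pipeline with that node's children as `candidate_labels`. Note that errors propagate downward, so leaf accuracy can never exceed the root's 98%.

```python
from typing import Callable, Dict, List, Tuple

# A node classifier maps a text to (child_label, confidence).
NodeClassifier = Callable[[str], Tuple[str, float]]


class TopDownClassifier:
    """Hierarchical classification with one local classifier per parent node."""

    def __init__(self, root: NodeClassifier, children: Dict[str, NodeClassifier], min_conf: float = 0.5):
        # root: the existing Level 1 model, used as-is (its answer is always kept).
        # children: path such as "ERP" or "ERP > Cadastro de produtos" -> classifier
        # that chooses among that node's sub-categories.
        self.root = root
        self.children = children
        self.min_conf = min_conf

    def predict(self, text: str, sep: str = " > ") -> Tuple[List[str], List[float]]:
        label, conf = self.root(text)
        path, confs = [label], [conf]
        while True:
            clf = self.children.get(sep.join(path))
            if clf is None:  # no deeper classifier: this node is a leaf
                break
            child, child_conf = clf(text)
            if child_conf < self.min_conf:  # unsure: stop at the deepest confident level
                break
            path.append(child)
            confs.append(child_conf)
        return path, confs


# Usage with stub classifiers (replace them with real models):
clf = TopDownClassifier(
    root=lambda t: ("ERP", 0.98),
    children={
        "ERP": lambda t: ("Cadastro de produtos", 0.91),
        "ERP > Cadastro de produtos": lambda t: ("Produto variação", 0.84),
    },
)
print(" > ".join(clf.predict("Erro ao salvar variação de produto")[0]))
# ERP > Cadastro de produtos > Produto variação
```

Stopping early returns a partial path (for example just `ERP > Cadastro de produtos`), which is usually more useful to an agent than a wrong leaf.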