---
license: apache-2.0
tags:
- transformers
- text-classification
- spam-detection
---

# SPAM Mail Classifier

This model is fine-tuned from `microsoft/Multilingual-MiniLM-L12-H384` to classify email subjects as SPAM or NOSPAM.

## Model Details

- **Base model**: `microsoft/Multilingual-MiniLM-L12-H384`
- **Fine-tuned for**: Text classification
- **Number of classes**: 2 (SPAM, NOSPAM)
- **Languages**: Multilingual

## Usage

Load the tokenizer and model with `transformers`, then pass an email subject to get the raw logits for the two classes.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Goodmotion/spam-mail-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# A single email subject (French: "Congratulations! You have won an iPhone.")
text = "Félicitations ! Vous avez gagné un iPhone."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Raw, unnormalized scores with shape (1, 2)
print(outputs.logits)
```

### Example for a list of texts

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Goodmotion/spam-mail-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    'Join us for a webinar on AI innovations',
    'Urgent: Verify your account immediately.',
    'Meeting rescheduled to 3 PM',
    'Happy Birthday!',
    'Limited time offer: Act now!',
    'Join us for a webinar on AI innovations',
    'Claim your free prize now!',
    'You have unclaimed rewards waiting!',
    'Weekly newsletter from Tech World',
    'Update on the project status',
    'Lunch tomorrow at 12:30?',
    'Get rich quick with this amazing opportunity!',
    'Invoice for your recent purchase',
    "Don't forget: Gym session at 6 AM",
    'Join us for a webinar on AI innovations',
    'bonjour comment allez vous ?',
    'Documents suite à notre rendez-vous',
    'Valentin Dupond mentioned you in a comment',
    'Bolt x Supabase = 🤯',
    'Modification site web de la société',
    'Image de mise en avant sur les articles',
    'Bring new visitors to your site',
    'Le Cloud Éthique sans bullshit',
    'Remix Newsletter #25: React Router v7',
    'Votre essai auprès de X va bientôt prendre fin',
    'Introducing a Google Docs integration, styles and more in Claude.ai',
    'Carte de crédit sur le point d’expirer sur Cloudflare'
]

# Tokenize the whole batch, padding to the longest subject and truncating at 128 tokens
inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)

# Convert the logits to probabilities with softmax
logits = outputs.logits
probabilities = torch.softmax(logits, dim=1)

# Decode the predicted class for each text
labels = ["NOSPAM", "SPAM"]  # Maps class indices to labels
results = [
    {"text": text, "label": labels[torch.argmax(prob).item()], "confidence": prob.max().item()}
    for text, prob in zip(texts, probabilities)
]

# Print the results
for result in results:
    print(f"Text: {result['text']}")
    print(f"Result: {result['label']} (Confidence: {result['confidence']:.2%})\n")
```
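
The decoding step in the list example (softmax over the two logits, then a label lookup) can be illustrated without downloading the model. The following is a minimal sketch in plain Python, assuming the same label order as above (index 0 is NOSPAM, index 1 is SPAM). The `decode` helper and its `spam_threshold` parameter are hypothetical names introduced here for illustration, and the logit values are made up rather than real model output.

```python
import math

# Assumed label order, taken from the list example: index 0 = NOSPAM, index 1 = SPAM.
LABELS = ["NOSPAM", "SPAM"]


def softmax(logits):
    """Convert one row of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, spam_threshold=0.5):
    """Map a logit pair to a label.

    spam_threshold is a hypothetical knob: at 0.5 this matches argmax,
    while a higher value only flags subjects the model is surer about.
    """
    spam_probability = softmax(logits)[LABELS.index("SPAM")]
    label = "SPAM" if spam_probability >= spam_threshold else "NOSPAM"
    return {"label": label, "spam_probability": spam_probability}


# Made-up logits for illustration only, not real model output.
print(decode([2.0, -1.0]))
print(decode([-1.0, 2.0]))
print(decode([-0.5, 1.5], spam_threshold=0.9))
```

With real model output, each row of `outputs.logits.tolist()` could be passed to `decode`. Raising the threshold trades missed spam for fewer false positives, which may matter if legitimate mail must not be filtered.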