EchoCheck: Political Stance Classification

A RoBERTa-based model fine-tuned to classify the political stance of text into three categories: left, center, and right.

Model Details

Model Description

EchoCheck is a fine-tuned RoBERTa-base model designed to classify the political leaning of news articles and political text. The model was trained on over 2.3 million news articles from the BIGNEWSBLN dataset and achieves 95.50% accuracy on the held-out test set.

  • Developed by: Alexandru-Gabriel Morariu
  • Model type: RoBERTa-base with sequence classification head
  • Language(s): English
  • License: MIT
  • Fine-tuned from: roberta-base

Model Sources

Uses

Direct Use

This model can be used directly for:

  • Classifying political stance of news articles
  • Analyzing political bias in text content
  • Research on media bias and political polarization
  • Building applications that need to understand political leaning of text

Downstream Use

The model can be integrated into:

  • News aggregation platforms for bias labeling
  • Browser extensions for political bias detection
  • Research tools for political science studies
  • Content moderation systems
  • Educational tools about media literacy

Out-of-Scope Use

This model should NOT be used for:

  • Making decisions about individuals based on their political views
  • Censorship or suppression of political speech
  • Automated content removal without human review
  • Non-English text (model is English-only)
  • Classification of non-political content
  • As the sole basis for important decisions

Bias, Risks, and Limitations

Known Limitations

  • Language: English only - not suitable for other languages
  • Domain: Trained on news articles - may perform differently on social media, academic papers, or casual conversation
  • Time Period: Training data reflects political discourse up to the dataset collection date
  • US-centric: The left/center/right classification is based on US political spectrum and may not translate well to other countries' political systems

Risks

  • Evolving Language: Political terminology and framing evolve over time; model may become less accurate
  • Context Sensitivity: Short texts or ambiguous statements may be misclassified
  • Confirmation Bias: Users should not rely solely on this model's predictions
  • Misuse Potential: Could be misused to target individuals based on perceived political views

Recommendations

  • Always use human review alongside model predictions
  • Consider the model's confidence scores when making decisions (see the thresholding sketch after this list)
  • Be aware that political classification is inherently subjective
  • Update or retrain periodically to account for shifting political discourse
  • Do not use for high-stakes decisions without additional verification
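
One way to combine confidence scores with human review is to route low-confidence predictions to a reviewer instead of accepting them automatically. A minimal sketch, assuming a model and tokenizer loaded as in the Quick Start below; the 0.7 threshold is a hypothetical value that should be tuned on your own validation data:

import torch

# Hypothetical threshold; tune it on a validation set for your use case.
CONFIDENCE_THRESHOLD = 0.7

LABELS = {0: "center", 1: "left", 2: "right"}

def classify_or_defer(text, model, tokenizer, device):
    """Return the predicted stance, or defer to human review when the model is unsure."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    confidence, idx = probs.max(dim=-1)
    if confidence.item() < CONFIDENCE_THRESHOLD:
        return {"label": "needs_human_review", "confidence": confidence.item()}
    return {"label": LABELS[idx.item()], "confidence": confidence.item()}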

How to Get Started with the Model

Quick Start

from transformers import RobertaForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = RobertaForSequenceClassification.from_pretrained("alxdev/echocheck-political-stance")
tokenizer = AutoTokenizer.from_pretrained("alxdev/echocheck-political-stance")

# Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Classify text
text = "The government should increase social spending to support working families."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    prediction = probs.argmax().item()

labels = {0: "center", 1: "left", 2: "right"}
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probs[0][prediction]:.2%}")

Using Pipeline

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="alxdev/echocheck-political-stance",
    device=0  # Use GPU, or -1 for CPU
)

result = classifier("Lower taxes will stimulate economic growth and job creation.")
print(result)
# [{'label': 'LABEL_2', 'score': 0.85}]  # LABEL_2 = right

Label Mapping

| Label ID | Label Name | Description |
|----------|------------|-------------|
| 0 | center | Moderate/neutral political stance |
| 1 | left | Progressive political stance |
| 2 | right | Conservative political stance |
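
The pipeline API returns the generic names LABEL_0, LABEL_1, and LABEL_2 (as in the example output above), so applications typically translate them back to the human-readable labels. A minimal sketch that restates the table in code:

from transformers import pipeline

# Translation of the generic pipeline labels to the names in the table above.
ID2LABEL = {"LABEL_0": "center", "LABEL_1": "left", "LABEL_2": "right"}

classifier = pipeline("text-classification", model="alxdev/echocheck-political-stance")
result = classifier("The senate passed the infrastructure bill with bipartisan support.")[0]
print(ID2LABEL.get(result["label"], result["label"]), f"{result['score']:.2%}")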

Training Details

Training Data

The model was trained on the BIGNEWSBLN dataset, a corpus of over 2.3 million English-language news articles labeled as left, center, or right.

Training Procedure

Preprocessing

  • Tokenization using RoBERTa tokenizer
  • Maximum sequence length: 512 tokens
  • Padding to max length
  • Truncation of longer sequences
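
A minimal sketch of this preprocessing, assuming the standard roberta-base tokenizer; note that padding="max_length" here reflects the training-time setup, whereas the Quick Start inference example pads dynamically:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Truncate long articles and pad every example to the full 512-token length,
# matching the preprocessing described above.
encoded = tokenizer(
    "Example news article text ...",
    truncation=True,
    padding="max_length",
    max_length=512,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 512])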

Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base Model | roberta-base |
| Optimizer | AdamW |
| Learning Rate | 2e-5 |
| Batch Size | 24 |
| Epochs | 3 |
| Warmup | 10% of total steps |
| Weight Decay | 0.01 |
| LR Schedule | Linear with warmup |
| Training Regime | FP32 |
| Loss Function | CrossEntropyLoss |
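
These settings map onto a standard Hugging Face Trainer configuration. A minimal sketch, not the author's actual training script, restating the table via the TrainingArguments API:

from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="echocheck-roberta",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=24,
    num_train_epochs=3,
    warmup_ratio=0.1,                # 10% of total steps
    weight_decay=0.01,
    lr_scheduler_type="linear",      # linear decay after warmup
    fp16=False,                      # FP32 training regime
)

AdamW is the Trainer default optimizer and CrossEntropyLoss is the default loss for sequence classification, so neither needs an explicit argument.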

Speeds, Sizes, Times

  • Training Time: ~30-40 hours
  • Hardware: NVIDIA RTX 4070 (12GB VRAM), 64GB DDR5 RAM
  • Model Size: ~500MB
  • Parameters: 124,647,939 total (all trainable)

Evaluation

Testing Data

  • Dataset: BIGNEWSBLN
  • Size: 233,158 articles
  • Distribution: Balanced across all three classes (~77,000 each)

Metrics

  • Accuracy: Overall correctness of predictions
  • Precision: The fraction of articles predicted as a given class that actually belong to it
  • Recall: The fraction of articles of a given class that are correctly identified
  • F1-Score: Harmonic mean of precision and recall
  • Confusion Matrix: Detailed breakdown of predictions vs. actual labels
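
A minimal sketch of how these metrics could be computed with scikit-learn, assuming y_true and y_pred are hypothetical arrays of integer label IDs collected over the test split:

from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

def evaluate(y_true, y_pred):
    """Compute the metrics reported below from gold labels and model predictions."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[0, 1, 2], average="macro"
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_precision": precision,
        "macro_recall": recall,
        "macro_f1": f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1, 2]),
    }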

Results

Overall Performance

| Metric | Score |
|--------|-------|
| Accuracy | 95.50% |
| Macro F1 | 95.49% |
| Weighted F1 | 95.49% |

Per-Class Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| Center | 0.949 | 0.955 | 0.952 | 77,220 |
| Left | 0.953 | 0.964 | 0.959 | 77,951 |
| Right | 0.963 | 0.945 | 0.954 | 77,987 |

Confusion Matrix

|  | Predicted Center | Predicted Left | Predicted Right |
|--|------------------|----------------|-----------------|
| Actual Center | 73,756 | 1,890 | 1,574 |
| Actual Left | 1,543 | 75,164 | 1,244 |
| Actual Right | 2,426 | 1,826 | 73,735 |

Summary

The model achieves strong performance across all three political stance categories with balanced precision and recall. The slight confusion between "center" and "right" categories is expected given the nuanced nature of political language.

Technical Specifications

Model Architecture and Objective

  • Architecture: RoBERTa-base with a linear classification head
  • Hidden Size: 768
  • Attention Heads: 12
  • Hidden Layers: 12
  • Vocabulary Size: 50,265
  • Max Position Embeddings: 514
  • Classification Head: Linear(768 → 3)
  • Objective: Multi-class classification (CrossEntropyLoss)
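
These figures can be checked against the published configuration. A minimal sketch using the transformers config API; the inline comments show the expected values from the summary above:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("alxdev/echocheck-political-stance")

# Expected values per the architecture summary above.
print(config.hidden_size)              # 768
print(config.num_attention_heads)      # 12
print(config.num_hidden_layers)        # 12
print(config.vocab_size)               # 50265
print(config.max_position_embeddings)  # 514
print(config.num_labels)               # 3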

Compute Infrastructure

Hardware

  • GPU: NVIDIA GeForce RTX 4070 (12GB VRAM)
  • RAM: 64GB DDR5
  • Storage: NVMe SSD

Software

  • Framework: PyTorch 2.10+
  • Transformers: 4.57+
  • Python: 3.14+
  • OS: Linux

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Citation

If you use this model in your research, please cite:

BibTeX:

@misc{morariu2026echocheck,
  author = {Morariu, Alexandru-Gabriel},
  title = {EchoCheck: Political Stance and Ideology Classification using NLP Techniques},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/alxdev/echocheck-political-stance}},
  note = {Bachelor's Thesis, "Titu Maiorescu" University}
}

APA:

Morariu, A.-G. (2026). EchoCheck: Political Stance Classification using RoBERTa. HuggingFace. https://huggingface.co/alxdev/echocheck-political-stance

Glossary

  • Political Stance: The ideological leaning of a text (left, center, or right)
  • RoBERTa: Robustly Optimized BERT Pretraining Approach, a transformer-based language model
  • Fine-tuning: The process of training a pre-trained model on a specific downstream task
  • Tokenization: Converting text into numerical tokens that the model can process

More Information

Model Card Authors

Alexandru-Gabriel Morariu

Model Card Contact
