EchoCheck: Political Stance Classification

A RoBERTa-based model fine-tuned to classify the political stance of text into three categories: left, center, and right.

Model Details

Model Description

EchoCheck is a fine-tuned RoBERTa-base model designed to classify the political leaning of news articles and political text. The model was trained on over 2.3 million news articles from the BIGNEWSBLN dataset and achieves 95.50% accuracy on the held-out test set.

  • Developed by: Alexandru-Gabriel Morariu
  • Model type: RoBERTa-base with sequence classification head
  • Language(s): English
  • License: MIT
  • Fine-tuned from: roberta-base

Model Sources

Uses

Direct Use

This model can be used directly for:

  • Classifying political stance of news articles
  • Analyzing political bias in text content
  • Research on media bias and political polarization
  • Building applications that need to understand political leaning of text

Downstream Use

The model can be integrated into:

  • News aggregation platforms for bias labeling
  • Browser extensions for political bias detection
  • Research tools for political science studies
  • Content moderation systems
  • Educational tools about media literacy

Out-of-Scope Use

This model should NOT be used for:

  • Making decisions about individuals based on their political views
  • Censorship or suppression of political speech
  • Automated content removal without human review
  • Non-English text (model is English-only)
  • Classification of non-political content
  • As the sole basis for important decisions

Bias, Risks, and Limitations

Known Limitations

  • Language: English only - not suitable for other languages
  • Domain: Trained on news articles - may perform differently on social media, academic papers, or casual conversation
  • Time Period: Training data reflects political discourse up to the dataset collection date
  • US-centric: The left/center/right classification is based on US political spectrum and may not translate well to other countries' political systems

Risks

  • Evolving Language: Political terminology and framing evolve over time; model may become less accurate
  • Context Sensitivity: Short texts or ambiguous statements may be misclassified
  • Confirmation Bias: Users should not rely solely on this model's predictions
  • Misuse Potential: Could be misused to target individuals based on perceived political views

Recommendations

  • Always use human review alongside model predictions
  • Consider the model's confidence scores when making decisions (see the thresholding sketch after this list)
  • Be aware that political classification is inherently subjective
  • Update or retrain periodically to account for shifting political discourse
  • Do not use for high-stakes decisions without additional verification
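
One way to combine confidence scores with human review is to route low-confidence predictions to a reviewer instead of accepting them automatically. A minimal sketch, assuming a model and tokenizer loaded as in the Quick Start below; the 0.7 threshold is a hypothetical value that should be tuned on your own validation data:

import torch

# Hypothetical threshold; tune it on a validation set for your use case.
CONFIDENCE_THRESHOLD = 0.7

LABELS = {0: "center", 1: "left", 2: "right"}

def classify_or_defer(text, model, tokenizer, device):
    """Return the predicted stance, or defer to human review when the model is unsure."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    confidence, idx = probs.max(dim=-1)
    if confidence.item() < CONFIDENCE_THRESHOLD:
        return {"label": "needs_human_review", "confidence": confidence.item()}
    return {"label": LABELS[idx.item()], "confidence": confidence.item()}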

How to Get Started with the Model

Quick Start

from transformers import RobertaForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = RobertaForSequenceClassification.from_pretrained("alxdev/echocheck-political-stance")
tokenizer = AutoTokenizer.from_pretrained("alxdev/echocheck-political-stance")

# Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Classify text
text = "The government should increase social spending to support working families."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    prediction = probs.argmax().item()

labels = {0: "center", 1: "left", 2: "right"}
print(f"Prediction: {labels[prediction]}")
print(f"Confidence: {probs[0][prediction]:.2%}")

Using Pipeline

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="alxdev/echocheck-political-stance",
    device=0  # Use GPU, or -1 for CPU
)

result = classifier("Lower taxes will stimulate economic growth and job creation.")
print(result)
# [{'label': 'LABEL_2', 'score': 0.85}]  # LABEL_2 = right

Label Mapping

| Label ID | Label Name | Description |
|----------|------------|-------------|
| 0 | center | Moderate/neutral political stance |
| 1 | left | Progressive political stance |
| 2 | right | Conservative political stance |
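
The pipeline API returns the generic names LABEL_0, LABEL_1, and LABEL_2 (as in the example output above), so applications typically translate them back to the human-readable labels. A minimal sketch that restates the table in code:

from transformers import pipeline

# Translation of the generic pipeline labels to the names in the table above.
ID2LABEL = {"LABEL_0": "center", "LABEL_1": "left", "LABEL_2": "right"}

classifier = pipeline("text-classification", model="alxdev/echocheck-political-stance")
result = classifier("The senate passed the infrastructure bill with bipartisan support.")[0]
print(ID2LABEL.get(result["label"], result["label"]), f"{result['score']:.2%}")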

Training Details

Training Data

The model was trained on the BIGNEWSBLN dataset, a corpus of over 2.3 million English-language news articles labeled as left, center, or right.

Training Procedure

Preprocessing

  • Tokenization using RoBERTa tokenizer
  • Maximum sequence length: 512 tokens
  • Padding to max length
  • Truncation of longer sequences
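
A minimal sketch of this preprocessing, assuming the standard roberta-base tokenizer; note that padding="max_length" here reflects the training-time setup, whereas the Quick Start inference example pads dynamically:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Truncate long articles and pad every example to the full 512-token length,
# matching the preprocessing described above.
encoded = tokenizer(
    "Example news article text ...",
    truncation=True,
    padding="max_length",
    max_length=512,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 512])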

Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base Model | roberta-base |
| Optimizer | AdamW |
| Learning Rate | 2e-5 |
| Batch Size | 24 |
| Epochs | 3 |
| Warmup | 10% of total steps |
| Weight Decay | 0.01 |
| LR Schedule | Linear with warmup |
| Training Regime | FP32 |
| Loss Function | CrossEntropyLoss |
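
These settings map onto a standard Hugging Face Trainer configuration. A minimal sketch, not the author's actual training script, restating the table via the TrainingArguments API:

from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="echocheck-roberta",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=24,
    num_train_epochs=3,
    warmup_ratio=0.1,                # 10% of total steps
    weight_decay=0.01,
    lr_scheduler_type="linear",      # linear decay after warmup
    fp16=False,                      # FP32 training regime
)

AdamW is the Trainer default optimizer and CrossEntropyLoss is the default loss for sequence classification, so neither needs an explicit argument.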

Speeds, Sizes, Times

  • Training Time: ~30-40 hours
  • Hardware: NVIDIA RTX 4070 (12GB VRAM), 64GB DDR5 RAM
  • Model Size: ~500MB
  • Parameters: 124,647,939 total (all trainable)

Evaluation

Testing Data

  • Dataset: BIGNEWSBLN
  • Size: 233,158 articles
  • Distribution: Balanced across all three classes (~77,000 each)

Metrics

  • Accuracy: Overall correctness of predictions
  • Precision: The fraction of articles predicted as a given class that actually belong to it
  • Recall: The fraction of articles of a given class that are correctly identified
  • F1-Score: Harmonic mean of precision and recall
  • Confusion Matrix: Detailed breakdown of predictions vs. actual labels
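
A minimal sketch of how these metrics could be computed with scikit-learn, assuming y_true and y_pred are hypothetical arrays of integer label IDs collected over the test split:

from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

def evaluate(y_true, y_pred):
    """Compute the metrics reported below from gold labels and model predictions."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[0, 1, 2], average="macro"
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_precision": precision,
        "macro_recall": recall,
        "macro_f1": f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1, 2]),
    }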

Results

Overall Performance

| Metric | Score |
|--------|-------|
| Accuracy | 95.50% |
| Macro F1 | 95.49% |
| Weighted F1 | 95.49% |

Per-Class Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| Center | 0.949 | 0.955 | 0.952 | 77,220 |
| Left | 0.953 | 0.964 | 0.959 | 77,951 |
| Right | 0.963 | 0.945 | 0.954 | 77,987 |

Confusion Matrix

|  | Predicted Center | Predicted Left | Predicted Right |
|--|------------------|----------------|-----------------|
| Actual Center | 73,756 | 1,890 | 1,574 |
| Actual Left | 1,543 | 75,164 | 1,244 |
| Actual Right | 2,426 | 1,826 | 73,735 |

Summary

The model achieves strong performance across all three political stance categories with balanced precision and recall. The slight confusion between "center" and "right" categories is expected given the nuanced nature of political language.

Technical Specifications

Model Architecture and Objective

  • Architecture: RoBERTa-base with a linear classification head
  • Hidden Size: 768
  • Attention Heads: 12
  • Hidden Layers: 12
  • Vocabulary Size: 50,265
  • Max Position Embeddings: 514
  • Classification Head: Linear(768 → 3)
  • Objective: Multi-class classification (CrossEntropyLoss)
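
These figures can be checked against the published configuration. A minimal sketch using the transformers config API; the inline comments show the expected values from the summary above:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("alxdev/echocheck-political-stance")

# Expected values per the architecture summary above.
print(config.hidden_size)              # 768
print(config.num_attention_heads)      # 12
print(config.num_hidden_layers)        # 12
print(config.vocab_size)               # 50265
print(config.max_position_embeddings)  # 514
print(config.num_labels)               # 3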

Compute Infrastructure

Hardware

  • GPU: NVIDIA GeForce RTX 4070 (12GB VRAM)
  • RAM: 64GB DDR5
  • Storage: NVMe SSD

Software

  • Framework: PyTorch 2.10+
  • Transformers: 4.57+
  • Python: 3.14+
  • OS: Linux

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Citation

If you use this model in your research, please cite:

BibTeX:

@misc{morariu2026echocheck,
  author = {Morariu, Alexandru-Gabriel},
  title = {EchoCheck: Political Stance and Ideology Classification using NLP Techniques},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/alxdev/echocheck-political-stance}},
  note = {Bachelor's Thesis, "Titu Maiorescu" University}
}

APA:

Morariu, A.-G. (2026). EchoCheck: Political Stance Classification using RoBERTa. HuggingFace. https://huggingface.co/alxdev/echocheck-political-stance

Glossary

  • Political Stance: The ideological leaning of a text (left, center, or right)
  • RoBERTa: Robustly Optimized BERT Pretraining Approach, a transformer-based language model
  • Fine-tuning: The process of training a pre-trained model on a specific downstream task
  • Tokenization: Converting text into numerical tokens that the model can process

More Information

Model Card Authors

Alexandru-Gabriel Morariu

Model Card Contact
