Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders • Paper • 2603.19209 • Published 4 days ago • 2
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning • Paper • 2603.14482 • Published 8 days ago • 15
Official Benchmarks Leaderboard 2026 • Running • Explore and compare AI model scores across official benchmarks
Omnilingual MT: Machine Translation for 1,600 Languages • Paper • 2603.16309 • Published 6 days ago • 13
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections • Paper • 2603.12180 • Published 11 days ago • 63
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model • Paper • 2602.17807 • Published Feb 19 • 6
Causal-JEPA: Learning World Models through Object-Level Latent Interventions • Paper • 2602.11389 • Published Feb 11 • 7
UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders • Paper • 2601.17950 • Published Jan 25 • 4
TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration • Paper • 2601.04544 • Published Jan 8 • 6
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion • Paper • 2512.19535 • Published Dec 22, 2025 • 12