cool-papers
Paper
• 2406.09414
• Published
• 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper
• 2406.09415
• Published
• 51
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper
• 2406.04338
• Published
• 39
SAM 2: Segment Anything in Images and Videos
Paper
• 2408.00714
• Published
• 120
GraCo: Granularity-Controllable Interactive Segmentation
Paper
• 2405.00587
• Published
Paper
• 2410.05258
• Published
• 180
APOLLO: SGD-like Memory, AdamW-level Performance
Paper
• 2412.05270
• Published
• 37
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
• Published
• 108
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper
• 2501.00958
• Published
• 109
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper
• 2501.03895
• Published
• 52
The GAN is dead; long live the GAN! A Modern GAN Baseline
Paper
• 2501.05441
• Published
• 95
Infecting Generative AI With Viruses
Paper
• 2501.05542
• Published
• 13
MatAnyone: Stable Video Matting with Consistent Memory Propagation
Paper
• 2501.14677
• Published
• 34
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
Paper
• 2502.04320
• Published
• 36
Diffusion Models without Classifier-free Guidance
Paper
• 2502.12154
• Published
• 8
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper
• 2502.09509
• Published
• 8
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
Paper
• 2502.19204
• Published
• 11
UniTok: A Unified Tokenizer for Visual Generation and Understanding
Paper
• 2502.20321
• Published
• 30
How far can we go with ImageNet for Text-to-Image generation?
Paper
• 2502.21318
• Published
• 26
AI-Invented Tonal Languages: Preventing a Machine Lingua Franca Beyond Human Understanding
Paper
• 2503.01063
• Published
• 5
Large Language Diffusion Models
Paper
• 2502.09992
• Published
• 126
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
Paper
• 2503.01933
• Published
• 13
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Paper
• 2503.04724
• Published
• 72
Forgetting Transformer: Softmax Attention with a Forget Gate
Paper
• 2503.02130
• Published
• 32
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
Paper
• 2503.08417
• Published
• 8
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper
• 2503.09573
• Published
• 76
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation
Paper
• 2503.10636
• Published
• 3
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Paper
• 2503.11647
• Published
• 146
Paper
• 2503.16425
• Published
• 16
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
Paper
• 2503.16660
• Published
• 72
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Paper
• 2503.20240
• Published
• 22
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Paper
• 2503.21732
• Published
• 9
X²-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Paper
• 2503.21779
• Published
• 4
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation
Paper
• 2503.19693
• Published
• 76
Scaling Language-Free Visual Representation Learning
Paper
• 2504.01017
• Published
• 32
Gaussian Mixture Flow Matching Models
Paper
• 2504.05304
• Published
• 11
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published
• 77
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper
• 2504.13837
• Published
• 139
Group Downsampling with Equivariant Anti-aliasing
Paper
• 2504.17258
• Published
• 9
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Paper
• 2504.20966
• Published
• 31
Training-Free Efficient Video Generation via Dynamic Token Carving
Paper
• 2505.16864
• Published
• 24
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Paper
• 2505.11881
• Published
• 4
Paper
• 2506.10892
• Published
• 37
JAFAR: Jack up Any Feature at Any Resolution
Paper
• 2506.11136
• Published
• 10
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
Paper
• 2506.15682
• Published
• 5
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling
Paper
• 2506.20452
• Published
• 19
Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images
Paper
• 2506.22960
• Published
• 6
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection
Paper
• 2507.07994
• Published
• 3
FLEXITOKENS: Flexible Tokenization for Evolving Language Models
Paper
• 2507.12720
• Published
• 10
2D Gaussian Splatting with Semantic Alignment for Image Inpainting
Paper
• 2509.01964
• Published
• 7
ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction
Paper
• 2510.01061
• Published
• 2
SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion
Paper
• 2412.10437
• Published
• 6
Image-GS: Content-Adaptive Image Representation via 2D Gaussians
Paper
• 2407.01866
• Published
• 2
FineVision: Open Data Is All You Need
Paper
• 2510.17269
• Published
• 75
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets
Paper
• 2510.19944
• Published
• 21
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
Paper
• 2505.20161
• Published
• 1
Latent Diffusion Model without Variational Autoencoder
Paper
• 2510.15301
• Published
• 49
KLASS: KL-Guided Fast Inference in Masked Diffusion Models
Paper
• 2511.05664
• Published
• 37
VideoSSR: Video Self-Supervised Reinforcement Learning
Paper
• 2511.06281
• Published
• 25
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Paper
• 2511.09611
• Published
• 70
Paper
• 2511.22475
• Published
• 24
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
Paper
• 2512.01030
• Published
• 20
Self-Improving VLM Judges Without Human Annotations
Paper
• 2512.05145
• Published
• 20
Relational Visual Similarity
Paper
• 2512.07833
• Published
• 25
Terrain Diffusion: A Diffusion-Based Successor to Perlin Noise in Infinite, Real-Time Terrain Generation
Paper
• 2512.08309
• Published
• 4
docling-project/SynthCodeNet
Viewer
• Updated
• 9.33M • 1.73k
• 11
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction
Paper
• 2511.23386
• Published
• 16
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
Paper
• 2512.10534
• Published
• 32
Sliding Window Attention Adaptation
Paper
• 2512.10411
• Published
• 21
MeshSplatting: Differentiable Rendering with Opaque Meshes
Paper
• 2512.06818
• Published
• 11
Spherical Leech Quantization for Visual Tokenization and Generation
Paper
• 2512.14697
• Published
• 8
Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
Paper
• 2509.10534
• Published
• 4
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
• Published
• 44
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper
• 2601.05242
• Published
• 228
Multilingual Pretraining for Pixel Language Models
Paper
• 2505.21265
• Published
• 1
Language Modelling with Pixels
Paper
• 2207.06991
• Published
• 2
AgentOCR: Reimagining Agent History via Optical Self-Compression
Paper
• 2601.04786
• Published
• 30
GenCtrl -- A Formal Controllability Toolkit for Generative Models
Paper
• 2601.05637
• Published
• 5
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
Paper
• 2601.07351
• Published
• 26
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
Paper
• 2601.08468
• Published
• 7
Your Group-Relative Advantage Is Biased
Paper
• 2601.08521
• Published
• 155
Toward Efficient Agents: Memory, Tool learning, and Planning
Paper
• 2601.14192
• Published
• 55
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
Paper
• 2512.12167
• Published
• 5
GutenOCR: A Grounded Vision-Language Front-End for Documents
Paper
• 2601.14490
• Published
• 37
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch
Paper
• 2601.13606
• Published
• 11
Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
Paper
• 2601.13118
• Published
• 1
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
Paper
• 2601.17027
• Published
• 41
Self-Refining Video Sampling
Paper
• 2601.18577
• Published
• 25
Towards Pixel-Level VLM Perception via Simple Points Prediction
Paper
• 2601.19228
• Published
• 18
Self-Distillation Enables Continual Learning
Paper
• 2601.19897
• Published
• 26
Reinforcement Learning via Self-Distillation
Paper
• 2601.20802
• Published
• 40
One-step Latent-free Image Generation with Pixel Mean Flows
Paper
• 2601.22158
• Published
• 18
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
Paper
• 2601.22975
• Published
• 105
FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation
Paper
• 2601.23182
• Published
• 20
Optimizing Few-Step Generation with Adaptive Matching Distillation
Paper
• 2602.07345
• Published
• 9
Paper
• 2602.20021
• Published
• 27
SIMSPINE: A Biomechanics-Aware Simulation Framework for 3D Spine Motion Annotation and Benchmarking
Paper
• 2602.20792
• Published
• 2