StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant Paper • 2505.05467 • Published May 8, 2025 • 14
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge Paper • 2401.10712 • Published Jan 19, 2024
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering Paper • 2401.10711 • Published Jan 19, 2024
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models Paper • 2410.03290 • Published Oct 4, 2024 • 7