Dongwon Jo
dongwonjo
AI & ML interests
Efficient AI, Model Compression, Sparse Attention, Quantization, Pruning, Generative Model, Large Language Model, Diffusion
Recent Activity
authored a paper about 3 hours ago
Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference
upvoted a paper about 1 month ago
CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection
authored a paper about 1 month ago
CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection