TransMamba: Flexibly Switching between Transformer and Mamba (arXiv:2503.24067, published Mar 31, 2025)
HMoE: Heterogeneous Mixture of Experts for Language Modeling (arXiv:2408.10681, published Aug 20, 2024)
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent (arXiv:2411.02265, published Nov 4, 2024)