SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models Paper • 2510.09541 • Published Oct 10 • 14
Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published Oct 1 • 58