ahmeterdempmk/qwen3-30b-a3b-gdpo-rlaif-turkish-call-center Text Generation • 31B • Updated about 1 hour ago
ahmeterdempmk/qwen3-4b-gdpo-rlaif-turkish-call-center-16bit-full Text Generation • 4B • Updated 2 days ago • 21
ahmeterdempmk/qwen3-4b-gdpo-rlaif-turkish-call-center-16bit Text Generation • 4B • Updated 2 days ago • 22
ahmeterdempmk/DeepLabV3Plus-With-EfficientNetB4-Encoder-50-Epoch-Polyp-Segmentation-0.9793-IoU Image Segmentation • Updated Feb 28, 2025
ahmeterdempmk/DeepLabV3Plus-with-EfficientNet-B4-150-Epoch-KTO-0.8889-F1-5.38-SMAPE Updated Jul 1, 2025
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 13 days ago • 202