arXiv

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Shih-Yang Liu; Xin Dong; Ximing Lu; Shizhe Diao; Peter Belcak; Mingjie Liu; Min-Hung Chen; Hongxu Yin; Yu-Chiang Frank Wang; Kwang-Ting Cheng; Yejin Choi; Jan Kautz; Pavlo Molchanov
Jan 9, 2026 · 16:51
