arXiv
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Shih-Yang Liu; Xin Dong; Ximing Lu; Shizhe Diao; Peter Belcak; Mingjie Liu; Min-Hung Chen; Hongxu Yin; Yu-Chiang Frank Wang; Kwang-Ting Cheng; Yejin Choi; Jan Kautz; Pavlo Molchanov
Jan 9, 2026 · 16:51