F-GRPO: Don’t Let Your Policy Learn the Obvious and Forget the Rare
Daniil Plyusov et al.
11 min
Reinforcement Learning with Verifiable Rewards (RLVR)