F-GRPO: Don’t Let Your Policy Learn the Obvious and Forget the Rare

Daniil Plyusov et al.

11 min · Reinforcement Learning With Verifiable Rewards (RLVR)