F-GRPO: Don’t Let Your Policy Learn the Obvious and Forget the Rare

by Daniil Plyusov et al.

Feb 9, 2026 | 11:08

Reinforcement Learning with Verifiable Rewards (RLVR); Group-Relative RLVR (e.g., GRPO); Sharpening in RLVR; Tail-Miss Probability
