F-GRPO: Don’t Let Your Policy Learn the Obvious and Forget the Rare
by Daniil Plyusov et al.
Feb 9, 2026 • 11:08
Reinforcement Learning with Verifiable Rewards (RLVR) • Group-Relative RLVR (e.g., GRPO) • Sharpening in RLVR • Tail-Miss Probability