Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization | Shufan Li et al. | ResearchPod