FORMULATING REINFORCEMENT LEARNING FOR HUMAN-ROBOT COLLABORATION THROUGH OFF-POLICY EVALUATION | Saurav Singh et al. | ResearchPod