Small Vision-Language Models are Smart Compressors for Long Video Understanding | Junjie Fei et al. | ResearchPod