Small Vision-Language Models are Smart Compressors for Long Video Understanding | Junjie Fei et al. | ResearchPod