Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models | Haibo Wang, Lifu Huang | ResearchPod