STEP3-VL-10B Technical Report | Multimodal Intelligence Team, StepFun | ResearchPod