CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models
…naturally filled by Vision-Language Models (VLMs), but where to place the VLM is non-trivial: upfront plans commit before any frame is generated, and post-hoc critiques over whole videos intervene…