PEEK: Picking Essential frames via Efficient Knowledge distillation
…at inference time it receives only video frames: no target caption, no prompt, and no text encoder. Given a budget of k frames, PEEK predicts per-frame relevance scores and returns the…
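
The selection step described above can be illustrated with a minimal sketch. The excerpt is cut off before it says what is returned, so this sketch assumes the simplest reading: keep the k highest-scoring frames and report them in temporal order. The function name `select_frames` and the example scores are hypothetical and are not taken from the paper; the scores stand in for whatever the model would predict from the video frames alone.

```python
from typing import List, Sequence


def select_frames(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring frames, in temporal order.

    Assumption: plain top-k selection. The source excerpt is truncated
    before it states what PEEK actually returns.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(scores))  # a budget larger than the clip keeps every frame
    # Rank frame indices by score, highest first. sorted() is stable,
    # so tied scores keep the earlier frame first.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Restore temporal order for the selected frames.
    return sorted(ranked[:k])


# Hypothetical per-frame relevance scores for a six-frame clip.
example_scores = [0.10, 0.85, 0.30, 0.92, 0.05, 0.60]
selected = select_frames(example_scores, k=3)
print(selected)  # [1, 3, 5]
```

Nothing in this step depends on a caption, prompt, or text encoder, which matches the inference-time setting described in the excerpt: the only inputs are the per-frame scores and the budget k.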