Search: community feedback

Paper page - AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

…https://huangrh99.github.io/AlphaGRPO/ AlphaGRPO enables multimodal generation RL training across text and image generation for AR-Diffusion…

May 13, 2026

Paper page - Pushing Biomolecular Utility-Diversity Frontiers with Supergroup Relative Policy Optimization

…Biomolecular generators are often adapted with reward feedback to improve task-specific utility, but pushing utility alone can concentrate generation on a narrow family of candidates. Maintaining diversity is…

May 12, 2026

Paper page - NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

…A label-free policy learning converts free-form feedback into persistent parameter updates of the planner, reshaping subsequent coordination. These three layers co-evolve: reliable skills produce richer memory, richer memory informs…

May 12, 2026

Paper page - Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization

…We introduce Omni-Persona, the first comprehensive benchmark for omnimodal personalization spanning text, image, and audio. Built on the Persona Modality…

May 12, 2026

Paper page - Learning, Fast and Slow: Towards LLMs That Adapt Continually

…These fast "weights" can learn from textual feedback to absorb the task-specific information, while allowing slow weights to stay closer to the base model and persist general reasoning behaviors. Fast-Slow…

May 13, 2026

nanoVLM: The simplest repository to train your VLM in pure PyTorch

…https://github.com/slwang-ustc/nano-vllm-v1/tree/main I’d love for the community to try it out, give feedback, or contribute! The code is designed to be readable and…

Feb 6, 2025 · Aritra Roy Gosthipaty

Paper page - Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?

…Does a lexical retriever suffice for agentic search when agents can keep refining their queries? As LLMs become more…

May 12, 2026

Paper page - Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

…This work aims to boost RLVR performance using only a minimal amount of SFT data in a unified training paradigm…

May 15, 2026

Paper page - Implicit Preference Alignment for Human Image Animation

…While reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, it necessitates the construction of strict preference pairs. However, curating such pairs for dynamic hand regions is prohibitively…

May 13, 2026

Paper page - Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

…rollout-feedback evolution yields more grounded SFT traces and better policy-matched RL tasks than static synthesis. This work…

May 13, 2026
