Showing top 24 results for "timing uncertainty"

Paper page - Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering

…Reinforcement Learning Unlocks Parametric Knowledge in LLMs (2026) Step-wise Rubric Rewards for LLM Reasoning (2026) Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback (2026) Think Through Uncertainty: Improving Long-Form Generation…

May 29, 2026

Paper page - Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

…Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems (2026) What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time (2026) Think Through Uncertainty: Improving Long-Form Generation Factuality…

May 13, 2026

Paper page - ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

…The editor sees its target failure modes during training instead of chasing a generic uncertainty signal at decode time. The shared-prefix KV caching paired with Alternating Step Decode in the reflective…

May 8, 2026

Paper page - Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

…We present TIDE, the first framework for cross-architecture dLLM distillation, comprising three modular components: (1) TIDAL, which jointly modulates distillation strength across training progress and diffusion timestep to account for the…

Apr 30, 2026
