Showing top 111 results for "Signal"

All sources: huggingface.co (111)

Paper page - LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models

…LEAD dynamically calibrates the correctness-efficiency trade-off at each step using a Potential-Scaled Instability, directing optimization capacity to the most informative learning signal. Furthermore, it estimates an adaptive per-problem…

Paper page - Reinforcing Multimodal Reasoning Against Visual Degradation

…regularization, an auxiliary policy gradient loss anchored to clean-image advantages preserves a reliable reward signal; and to avoid systematically incorrect invariance, correctness-conditioned regularization restricts enforcement to successful trajectories. On Qwen3…

Paper page - Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts

…Our theory-grounded analysis shows that patch-wise modeling yields provably larger discrepancies when localized forensic signals are present in generated images, enabling more reliable separation from real images. Extensive experiments demonstrate…

Paper page - DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

…This yields a richer and more diverse learning signal, improving exploration efficiency from imperfect model behavior. As a result, DenoiseRL improves reasoning performance and overall training efficiency while reducing the need for…

Paper page - Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

…These results show that formulating counterpart prediction as a target-adaptive text-tabular task enables effective adaptation, and that hidden LLM representations expose decision-relevant signals that direct prompting does not surface…

Paper page - Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

…The key challenge lies in designing a reliable reward signal: VLMs scoring samples in isolation tend to compress their judgements into a narrow band, leaving GRPO with little within-group variance to…

Paper page - Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher

…We view this primarily as a data selection problem, where the key challenge is to identify which weak labels are reliable enough to serve as a training signal. To address this, we…

Paper page - WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts

…Video gives the strongest interaction signal (+10.6 pp implicit coverage over Text), while implicit constraints persist; defect injection shows ICG-based scoring detects state errors at 2-16x the rate of…

Paper page - Trust-Region Behavior Blending for On-Policy Distillation

…overall, trb seems like a principled bridge rather than a hack, and i can see it being handy for other on-policy setups where teacher signals are noisy early on.

Paper page - Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

…should the teacher always see the full reference reasoning? We identify a teacher-side exposure mismatch, where fully privileged teacher signals can be too strong for the student’s current competence. Instead…