Showing top 55 results for "AI training and model updates"

All sources huggingface.co 55

Paper page - AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

…AI-generated summary Reinforcement learning (RL) has substantially improved the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. However, effective agentic RL remains challenging…

Paper page - ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

…Ziyu Guo … Abstract ATLAS presents a visual reasoning framework that combines agentic operations and latent representations using functional tokens, enabling efficient training and improved performance on complex benchmarks. AI-generated summary Visual…

Open-R1: a fully open reproduction of DeepSeek-R1

…I assume the ultimate goal is to train a new reasoning model and then use the same evaluation metrics as o1 and DeepSeek-R1. That's quite interesting, I was asking…

Mar 27, 2025 · Elie Bakouch

Paper page - When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

…AI-generated summary In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the…

(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware

…and then we ensured PEFT and Diffusers worked with quantised training on LoRA. · My bad. Great work. ✅ 1. Do these instructions work with Flux Schnell? ✅ 2. Can you export the merged model…

Jun 1, 2025 · Derek Liu

Paper page - Large Language Models Explore by Latent Distilling

…Deferring the Distiller's training step (backward pass and weight update) to the CPU-bound post-processing intervals to hide the latency is a fantastic practical touch, keeping the throughput overhead under…

Paper page - FeatCal: Feature Calibration for Post-Merging Models

…updates without gradient descent, achieving superior benchmark results and efficiency. AI-generated summary Model merging combines task experts into one model and avoids joint training, retraining, or deploying many expert models, but…

Welcome GPT OSS, the new open-source model family from OpenAI!

…As expected, you've turned this "open" model into another sanitized, useless piece of junk. Won't even share the code for us to train our own. Thanks for nothing, "Open" AI…

May 1, 2026 · Vaibhav Srivastav

Paper page - Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

…minimization for improved training performance and stability. AI-generated summary Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity…

Paper page - Continual Harness: Online Adaptation for Self-Improving Foundation Agents

…relabeled by a frontier teacher and used to update the model, drives sustained in-game milestone progress on Pokemon Red without resetting the environment between training iterations.…