Search: AI training and model updates

Paper page - Learning, Fast and Slow: Towards LLMs That Adapt Continually

… This reduced drift also preserves plasticity: after training on one task, FST-trained models adapt more effectively to a subsequent task than parameter-only-trained models. …

May 13, 2026

We Got Claude to Fine-Tune an Open Source LLM

… Thanks! Is the trained model now open source and/or available to the public? https://huggingface.co/blog/sionic-ai/claude-code-skills-training Nice work on the demo getting Claude Code to fine-tune an open LLM. But the researchers from Sionic AI already do most of their work with Claude Code. …

Oct 14, 2025 · ben burtenshaw

Paper page - Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

… The main takeaway: continual post-training should control not only how far an LLM moves, but also whether each new update remains geometrically compatible with the evolving model state. So impressive! Nerding out on a detail: gcwm links layer-wise covariances with a Gaussian Wasserstein barycenter to build a shar… …

May 12, 2026

Paper page - Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

… In heterogeneous training systems, the total importance ratio should ideally be decomposed into two semantically distinct factors: a training-inference discrepancy term that aligns inference-side and training-side distributions at the same behavior-policy version, and a policy-staleness term that… …

May 13, 2026
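The snippet above describes splitting the total importance ratio into a training-inference discrepancy term and a policy-staleness term. A minimal Python sketch of that factorization in log space, assuming per-token log-probabilities are available from three sources: the policy being optimized, the trainer's recomputation at the behavior-policy version (the "old logits"), and the inference engine at that same version. The function and argument names are hypothetical; the sketch only shows that the two factors multiply back to the total ratio and is not the paper's repair method.

```python
import numpy as np

def decompose_importance_ratio(logp_current, logp_train_old, logp_infer):
    """Hypothetical helper: split pi_theta / mu into two per-token factors.

    logp_current:   log-probs under the policy being optimized (pi_theta)
    logp_train_old: log-probs of the behavior-policy version, recomputed by the trainer
    logp_infer:     log-probs reported by the inference engine for that same version (mu)
    """
    logp_current = np.asarray(logp_current, dtype=float)
    logp_train_old = np.asarray(logp_train_old, dtype=float)
    logp_infer = np.asarray(logp_infer, dtype=float)

    # Training-inference discrepancy: same policy version, different systems.
    discrepancy = np.exp(logp_train_old - logp_infer)
    # Policy staleness: how far the optimized policy has moved since that version.
    staleness = np.exp(logp_current - logp_train_old)
    # Total ratio; equals discrepancy * staleness because the log terms telescope.
    total = np.exp(logp_current - logp_infer)
    return total, discrepancy, staleness

total, d, s = decompose_importance_ratio([-1.0, -2.0], [-1.1, -2.0], [-1.2, -1.9])
```

If the trainer-side old log-probs are unavailable, only the total ratio can be formed, and the two effects can no longer be corrected separately.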

Paper page - Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

… To enable stable and efficient training under this architecture, we propose to train MELT using chunk-wise training in a two-phase procedure: interpolated transition, followed by attention-aligned distillation, both from the LoopLM starting model to MELT. …

May 12, 2026

Paper page - StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing

… AI-generated summary: We present StateSMix, a fully self-contained lossless compressor that couples an online-trained Mamba-style State Space Model (SSM) with sparse n-gram context mixing and arithmetic coding. …

May 6, 2026

Paper page - Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

arXiv:2605.14386 · Published on May 14 · Submitted by seawolf on May 15 · FINAL Bench. Abstract: The Darwin Family framework enables training-free evolutionary merging of… …

May 15, 2026

Paper page - Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

arXiv:2605.12492 · Published on May 12 · Submitted by Weiyang Liu on May 13 · The Chinese University of Hong Kong. Abstract: Pion is a spectrum-preserving optimizer for large language model training th… …

May 13, 2026
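The property named in the Pion title is a standard linear-algebra fact: an orthogonal equivalence transformation W → U W V, with U and V orthogonal, leaves the singular values (the spectrum) of W unchanged. A minimal NumPy sketch of that property, assuming the orthogonal factors are parameterized with a Cayley transform of skew-symmetric matrices. The function names are hypothetical, and this is not Pion's actual update rule, which the truncated abstract does not state.

```python
import numpy as np

def cayley(skew):
    """Map a skew-symmetric matrix A to an orthogonal matrix Q = (I - A)^{-1} (I + A).

    I - A is always invertible here because A's eigenvalues are purely imaginary.
    """
    eye = np.eye(skew.shape[0])
    return np.linalg.solve(eye - skew, eye + skew)

def orthogonal_equivalence_step(W, A_left, A_right):
    """Hypothetical update: W -> Q_L @ W @ Q_R with orthogonal Q_L, Q_R."""
    return cayley(A_left) @ W @ cayley(A_right)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))

# Small random skew-symmetric generators for the left (4x4) and right (3x3) factors.
B = 0.1 * rng.standard_normal((4, 4))
A_left = B - B.T
C = 0.1 * rng.standard_normal((3, 3))
A_right = C - C.T

W_new = orthogonal_equivalence_step(W, A_left, A_right)
print(np.linalg.svd(W, compute_uv=False))      # singular values before the step
print(np.linalg.svd(W_new, compute_uv=False))  # same singular values after the step
```

The entries of W change, but its singular values do not, which is what "spectrum-preserving" refers to.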

Paper page - MARBLE: Multi-Aspect Reward Balance for Diffusion RL

… These approaches either fail to produce a unified model that can be jointly trained on all rewards or necessitate heavy, manually tuned sequential training. We find that the failure stems from using a naive weighted-sum reward aggregation. …

May 8, 2026
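The MARBLE snippet attributes the failure to naive weighted-sum reward aggregation. As a hedged illustration (not MARBLE's own analysis or its fix), a sketch of that baseline shows one way it can go wrong: when rewards live on different scales, the larger-scale reward alone decides which samples rank highest. The reward names, scales, and values are hypothetical.

```python
import numpy as np

def weighted_sum_reward(rewards, weights):
    """Naive aggregation: combined[i] = sum_k weights[k] * rewards[k][i].

    rewards: array of shape (num_rewards, num_samples)
    weights: array of shape (num_rewards,)
    """
    return np.asarray(weights, dtype=float) @ np.asarray(rewards, dtype=float)

# Hypothetical rewards for three samples: an aesthetic score on a 0-10 scale
# and a text-alignment score on a 0-1 scale, combined with equal weights.
aesthetic = [9.0, 2.0, 5.0]
alignment = [0.1, 0.9, 0.5]
combined = weighted_sum_reward([aesthetic, alignment], [0.5, 0.5])
ranking = np.argsort(combined)[::-1]  # best sample first
```

Here the combined ranking matches the aesthetic ranking exactly, and the sample with the worst alignment comes out on top, so the smaller-scale reward has effectively no say.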

Paper page - MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

… The following papers were recommended by the Semantic Scholar API: FG²-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control (2026); Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences (2026); M²RNN: Non-Linear RNNs with Matrix-Valued States … …

May 11, 2026
