Search: model releases

Paper page - AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

…We publicly release the dataset, evaluation pipeline, and code at https://github.com/CherYou/AutoResearchBench to facilitate future research in this direction…

Apr 29, 2026

Paper page - AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems

…Across AFTraj-2K and an external Who&When benchmark, AgentForesight-7B outperforms leading proprietary models, including GPT-4.1 and DeepSeek-V4-Pro, achieving up to +19.9% performance gain and 3×…

May 12, 2026

Paper page - ClawGym: A Scalable Framework for Building Effective Claw Agents

…We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that…

Apr 30, 2026

Paper page - PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech

…We release native reference centroids (500 clips per language), 1000-clip embeddings for FAD, 500-clip prosodic feature matrices for PSD, 300-utterance golden sets per language, scoring code under MIT, and…

Apr 30, 2026

Paper page - EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

…We release the full framework, evaluation suite, and benchmark data under an open-source license. … How do you know…

May 14, 2026

Paper page - DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

…Codes and checkpoints will be released. … While Mixture-of-Experts (MoE) scales model capacity without proportionally increasing computation, its massive total…

May 12, 2026

Diffusers welcomes FLUX-2

…Amazing work! Can you tell me when the depth-maps model will be released? Has anyone already tried giving a depth map as a normal image? How…

Feb 17, 2026 · YiYi Xu

Paper page - TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

…The codes are released at https://github.com/chenxu2-gif/TacoMAS-MultiAgent. … Multi-agent systems (MAS) have emerged as a promising…

May 12, 2026

Paper page - SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies

…We release SWE-WebDevBench as a community benchmark to enable such replication and help platform builders identify and address these gaps. Code and benchmark resources are available at: https://github.com…

May 7, 2026

Tiny Agents in Python: a MCP-powered agent in ~70 lines of code

…https://github.com/huggingface/huggingface_hub/releases/tag/v0.32.1 Could you upgrade your huggingface_hub version, please? pip install -U "huggingface_hub>=0.32.1" Let us know if you…

Jan 12, 2025 · Célina Hanouti

Top stories

Paper page - MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

Paper page - Advancing Creative Physical Intelligence in Large Multimodal Models
