Search: deployment governance

Paper page - MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills

… MedSkillAudit isn't a benchmark for ranking; it's a governance tool: structured feedback, actionable optimization guidance, pre-deployment gating. …

May 7, 2026

Paper page - PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems

… A deployment-relevant Text2Cypher benchmark therefore reflects the questions users and agents actually ask of that graph. …

Jun 9, 2026

Paper page - The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

… Generated by Qwen/Qwen2.5-Coder-32B-Instruct: Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet real-world deployment is constrained by strict computational budgets. …

Jun 5, 2026

Paper page - Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

… To close this gap, we propose SkeMex, a post-deployment self-evolution framework that improves medical agents through a skill-based memory without updating model weights. …

Jun 9, 2026

Paper page - Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital

… We study a 21-day deployment of 3,505 user-funded agents trading real ETH onchain. …

Apr 30, 2026

Paper page - Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

… These results show that safety behavior is not stable under ordinary downstream adaptation, an important finding for anyone fine-tuning models, and they raise critical questions about governance and deployment practices centered on base-model evaluations. …

May 1, 2026

Paper page - ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

… AI-generated summary: This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mechanisms, and early deployment experience. …

May 6, 2026
