Search

Showing top 27 results for "Safety for agents"

Paper page - Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

…AI-generated summary: Large Language Model (LLM) red-teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important…

May 4, 2026

Paper page - Code World Model Preparedness Report

…Prompt Engineering for Code Generation (2026); A Systematic Approach for Large Language Models Debugging (2026); Who Tests the Testers? Systematic Enumeration and Coverage Audit of LLM Agent Tool Call Safety (2026); Emergent…

May 6, 2026

Paper page - SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

…A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator (2026); Large Language Models are Universal Reasoners for Visual Generation (2026); BOOKAGENT: Orchestrating Safety-Aware Visual Narratives via Multi-Agent Cognitive Calibration…

May 11, 2026

Paper page - Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

…how can you robustly separate ill-posedness from policy-driven refusal across models with different safety configurations?…

May 12, 2026

We Got Claude to Fine-Tune an Open Source LLM

…The multi-stage pipeline support is particularly valuable for enterprise use cases where you need that SFT → DPO → RLHF workflow for safety and alignment. This could democratize custom model development for mid…

Oct 14, 2025 · ben burtenshaw

Paper page - FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

…language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to empower many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants…

May 1, 2026

Paper page - Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

…Inference time safety improvement in reasoning via attribution of unsafe behavior to base model (2026); MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models (2026); Reasoning Models Struggle…