Search: Agent safety initiatives

Paper page - The Cold-Start Safety Gap in LLM Agents

… The following papers were recommended by the Semantic Scholar API: SABER: Benchmarking Operational Safety of LLM Coding Agents in Stateful Project Workspaces (2026); VESTA: A Fully Automated Scenario Generation and Safety Evaluation Framework for LLM Agents (2026); Plant, Persist, Trigger: Sleeper Attack …

Jun 12, 2026

Paper page - SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

… The following papers were recommended by the Semantic Scholar API: SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces (2026); AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills (2026); POISE: Position-Aware Undetectable Skill Injection on LLM Agents (2026); Plan…

Jun 10, 2026

We Got Claude to Fine-Tune an Open Source LLM

… I found the explanation of Hugging Face’s “Skills Training” initiative particularly eye-opening: it lets you use a coding agent such as Claude Code, or another supported agent, to fine-tune large language models, submit GPU jobs, monitor progress, and push trained models to the Hub. …

Oct 14, 2025 · ben burtenshaw
