Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
…Key idea: Harness-1 separates these responsibilities. The policy still makes the semantic decisions: what to search, what to inspect, what to curate, what to verify, and when to stop. But the…
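The split described in the excerpt can be illustrated with a small sketch. The excerpt is cut off before it says what the harness does, so everything on the harness side here is an assumption based only on the phrase "state-externalizing" in the title: a hypothetical `Harness` class holds search results, curated evidence, and verification marks outside the policy, while the policy only picks one of the five decisions the text names. The names `Action`, `Harness`, `run_episode`, and `scripted_policy` are all invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal

# The five semantic decisions listed in the text.
ActionKind = Literal["search", "inspect", "curate", "verify", "stop"]


@dataclass
class Action:
    kind: ActionKind
    arg: str = ""


@dataclass
class Harness:
    """Hypothetical harness: owns all working state outside the policy."""
    corpus: dict[str, str]
    results: list[str] = field(default_factory=list)   # ids from the last search
    evidence: list[str] = field(default_factory=list)  # curated document ids
    verified: set[str] = field(default_factory=set)    # curated ids marked verified

    def step(self, action: Action) -> str:
        """Apply one policy decision and return a text observation."""
        if action.kind == "search":
            query = action.arg.lower()
            self.results = [doc_id for doc_id, text in self.corpus.items()
                            if query in text.lower()]
            return f"results: {self.results}"
        if action.kind == "inspect":
            return self.corpus.get(action.arg, "<missing>")
        if action.kind == "curate":
            if action.arg in self.corpus and action.arg not in self.evidence:
                self.evidence.append(action.arg)
            return f"evidence: {self.evidence}"
        if action.kind == "verify":
            if action.arg in self.evidence:
                self.verified.add(action.arg)
            return f"verified: {sorted(self.verified)}"
        return "stopped"


# A policy maps the latest observation to the next decision.
Policy = Callable[[str], Action]


def run_episode(policy: Policy, harness: Harness, max_steps: int = 10) -> list[str]:
    """Loop until the policy chooses to stop or the step budget runs out."""
    observation = "start"
    for _ in range(max_steps):
        action = policy(observation)
        observation = harness.step(action)
        if action.kind == "stop":
            break
    return harness.evidence


def scripted_policy() -> Policy:
    """Stand-in for a learned policy: replays a fixed list of decisions."""
    script = iter([
        Action("search", "harness"),
        Action("inspect", "d1"),
        Action("curate", "d1"),
        Action("verify", "d1"),
        Action("stop"),
    ])
    return lambda observation: next(script)
```

In this sketch the policy never stores anything itself; a learned policy would replace `scripted_policy` and read the harness observations instead of ignoring them.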