RewardHarness: Self-Evolving Agentic Post-Training
…tools and skills from as few as 100 preference demonstrations. Given a source image, candidate edited images, and an editing instruction, an Orchestrator selects the most relevant subset of tools and skills…
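The selection step described above might be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the excerpt does not say how the Orchestrator judges relevance, so word overlap between the editing instruction and a capability description stands in for it. The `Capability` class, the `relevance` and `select_capabilities` functions, the `top_k` parameter, and the library entries are all hypothetical, and the source and candidate images are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """A hypothetical tool or skill entry; the field names are assumptions."""
    name: str
    kind: str  # "tool" or "skill"
    description: str


def relevance(instruction: str, capability: Capability) -> int:
    """Stand-in score: number of words shared by instruction and description."""
    instruction_words = set(instruction.lower().split())
    description_words = set(capability.description.lower().split())
    return len(instruction_words & description_words)


def select_capabilities(instruction: str, library: list[Capability],
                        top_k: int = 2) -> list[Capability]:
    """Return up to top_k capabilities with a non-zero relevance score."""
    scored = [(relevance(instruction, c), c) for c in library]
    relevant = [(score, c) for score, c in scored if score > 0]
    # Highest score first; the sort is stable, so ties keep library order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in relevant[:top_k]]


# Hypothetical library of tools and skills.
library = [
    Capability("color_checker", "tool",
               "compare color changes between source and edited image"),
    Capability("object_detector", "tool",
               "detect whether an object was added or removed"),
    Capability("text_reader", "tool",
               "read text rendered inside an image"),
]

selected = select_capabilities("Change the color of the car to red", library)
selected_names = [c.name for c in selected]
```

A real Orchestrator would presumably also condition on the source image and the candidate edited images; the sketch shows only the shape of choosing a relevant subset from a larger library.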
