AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation
…actions to Exploration, Implementation, Verification, or Orchestration based on trajectory history rather than tool identity alone. On AgentLens-Bench, the quality score separates passing trajectories into Lucky, Solid, and Ideal tiers and…
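Labeling "based on trajectory history rather than tool identity alone" means the same tool call can receive different labels depending on what came before it. The snippet does not describe the actual rules, so the sketch below invents a few purely for illustration. The tool names (`view`, `edit`, `bash`, `delegate`), the `pytest` heuristic, and the `label_trajectory` function are all hypothetical and are not AgentLens's classifier. The point it shows: running the same test command is Exploration before any edit (reproducing the bug) but Verification after one.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    EXPLORATION = "Exploration"
    IMPLEMENTATION = "Implementation"
    VERIFICATION = "Verification"
    ORCHESTRATION = "Orchestration"


@dataclass
class Action:
    tool: str    # hypothetical tool name: "view", "edit", "bash", or "delegate"
    target: str  # file path or command string the tool acted on


def label_trajectory(actions: list[Action]) -> list[Phase]:
    """Hypothetical history-aware labeler (illustrative rules only)."""
    labels: list[Phase] = []
    edited: set[str] = set()  # files modified earlier in the trajectory
    for act in actions:
        if act.tool == "delegate":
            # Handing work to a sub-agent is labeled Orchestration.
            labels.append(Phase.ORCHESTRATION)
        elif act.tool == "edit":
            edited.add(act.target)
            labels.append(Phase.IMPLEMENTATION)
        else:
            # Same tool, different label: depends on whether an edit happened first.
            runs_tests = act.tool == "bash" and "pytest" in act.target
            rereads_edit = act.tool == "view" and act.target in edited
            if edited and (runs_tests or rereads_edit):
                labels.append(Phase.VERIFICATION)
            else:
                labels.append(Phase.EXPLORATION)
    return labels


traj = [
    Action("bash", "pytest tests/test_io.py"),  # before any edit
    Action("view", "src/io.py"),
    Action("edit", "src/io.py"),
    Action("view", "src/io.py"),                # re-read after the edit
    Action("bash", "pytest tests/test_io.py"),  # same command, after the edit
]
print([p.value for p in label_trajectory(traj)])
# ['Exploration', 'Exploration', 'Implementation', 'Verification', 'Verification']
```

A purely tool-based labeler would give the first and last actions the same label; tracking the set of edited files is the simplest piece of history that distinguishes them.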