Paper page - Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
…For grading, Claw-Eval-Live records execution traces, audit logs, service state, and post-run workspace artifacts, using deterministic checks when evidence is sufficient and structured LLM judging only for semantic dimensions…
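
The grading policy described above can be pictured as a small dispatch routine: try a deterministic check first, and fall back to an LLM judge only for dimensions marked as semantic. The following is a minimal sketch under stated assumptions, not the benchmark's actual implementation. The names `Evidence`, `Dimension`, `grade`, `ticket_closed`, and `stub_judge`, the field names, and the "ungraded" fallback are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Evidence:
    """Hypothetical container for the four evidence sources named in the text."""
    execution_trace: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)
    service_state: dict = field(default_factory=dict)
    workspace_artifacts: dict = field(default_factory=dict)


@dataclass
class Dimension:
    """One graded aspect of a task (hypothetical structure)."""
    name: str
    # Returns True/False when the evidence suffices, or None when it does not.
    deterministic_check: Optional[Callable[[Evidence], Optional[bool]]] = None
    # Only semantic dimensions are eligible for LLM judging.
    semantic: bool = False


def grade(dimensions, evidence, llm_judge):
    """Prefer deterministic checks; use the judge only for semantic dimensions."""
    results = {}
    for dim in dimensions:
        verdict = None
        if dim.deterministic_check is not None:
            verdict = dim.deterministic_check(evidence)
        if verdict is not None:
            results[dim.name] = {"passed": verdict, "method": "deterministic"}
        elif dim.semantic:
            passed = bool(llm_judge(dim.name, evidence))
            results[dim.name] = {"passed": passed, "method": "llm_judge"}
        else:
            # Assumption: a non-semantic dimension without enough evidence
            # is left ungraded rather than sent to the judge.
            results[dim.name] = {"passed": None, "method": "ungraded"}
    return results


def ticket_closed(ev):
    """Hypothetical deterministic check against recorded service state."""
    status = ev.service_state.get("ticket_status")
    if status is None:
        return None  # evidence insufficient
    return status == "closed"


def stub_judge(dimension_name, ev):
    """Stand-in for a structured LLM judge; always passes in this sketch."""
    return True


evidence = Evidence(
    service_state={"ticket_status": "closed"},
    workspace_artifacts={"summary.md": "Closed the ticket after refund."},
)
dims = [
    Dimension("ticket_closed", deterministic_check=ticket_closed),
    Dimension("summary_quality", semantic=True),
]
report = grade(dims, evidence, stub_judge)
print(report)
```

In this sketch the judge is passed in as a plain callable, so a stub can stand in for a real model call during testing; how the benchmark itself structures judge prompts or aggregates scores is not stated in the excerpt.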