LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis
Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical…