K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
…benchmark K-BrowseComp evaluates frontier LLMs' capabilities with 400 problems, showing significant performance gaps compared to English benchmarks and highlighting the need for more robust Korean AI development.
