Counting as a minimal probe of language model reliability
…AI-generated summary: Large language models perform strongly on benchmarks in mathematical reasoning, coding and document analysis, suggesting a broad ability to follow instructions. However, it remains unclear whether such success reflects…