RePoT: Recoverable Program-of-Thought via Checkpoint Repair
… RePoT beats PoT by +3 to +11 pp across four closed-model configurations on PuzzleZoo-775, peaking at 96.9% vs. 86.3% on gpt-5.4-mini-medium. Against the matched-budget PoT-retry baseline, RePoT wins decisively on Gemini (+3.8 pp, 95% CI [+2.2, +5.4]), is within sampling noise on GPT-medium and Claude, an…
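The excerpt reports gaps in percentage points (pp), for example 96.9 − 86.3 = 10.6 pp, and calls some gaps "within sampling noise". It does not say how its 95% intervals were computed. The sketch below assumes a paired percentile bootstrap over puzzles, where each puzzle is scored 0/1 for both methods. The function `paired_bootstrap_ci` and the toy outcomes are hypothetical illustrations, not the paper's code or data. Under this assumption, a gap counts as "within sampling noise" when its interval contains 0. The Gemini interval [+2.2, +5.4] excludes 0, which is why that win is called decisive.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI, in pp, for acc(A) - acc(B).

    Hypothetical helper (assumption: paired bootstrap; the paper's exact CI
    procedure is not given in the excerpt). correct_a[i] and correct_b[i] are
    0/1 outcomes of two methods on the SAME puzzle i.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty outcome lists of equal length")
    n = len(correct_a)
    # Per-puzzle differences in {-1, 0, 1}; resampling them keeps the pairing.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    stats = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        stats.append(100.0 * sum(sample) / n)  # mean difference in pp
    stats.sort()
    k = int(round(alpha / 2 * n_boot))  # number of draws trimmed per tail
    point = 100.0 * sum(diffs) / n
    return point, stats[k], stats[n_boot - 1 - k]


def within_noise(lo, hi):
    """Treat a gap as sampling noise when its CI contains zero (assumption)."""
    return lo <= 0.0 <= hi


# Hypothetical toy outcomes on 10 puzzles (NOT PuzzleZoo-775 results).
method_a = [1] * 9 + [0]
method_b = [1] * 7 + [0] * 3
point, lo, hi = paired_bootstrap_ci(method_a, method_b)
print(f"gap = {point:.1f} pp, 95% CI [{lo:.1f}, {hi:.1f}], noise: {within_noise(lo, hi)}")
```

Resampling puzzle-level differences, rather than each method's outcomes separately, uses the fact that both methods see the same 775 puzzles. This usually gives a tighter interval than an unpaired comparison.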