Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions

AI & ML · May 27, 2026 · 2 min read · via arXiv

arXiv:2605.26414v1 (announce type: new)

Abstract: Large Language Models (LLMs) achieve impressive accuracy on mathematical reasoning benchmarks, yet their performance drops when problems are modified with simple changes like different names or numbers. Code execution methods, which let models generate and run Python code instead of reasoning in natural language, have been proposed as a solution, but their effect on reasoning robustness (the ability to maintain accuracy across problem variations)
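The abstract's notion of robustness, accuracy maintained when only names or numbers change, can be illustrated with a minimal Python sketch. This is not the paper's evaluation method: the template, the names, the number pairs, and both solver functions are hypothetical stand-ins for a real benchmark and a real model.

```python
import re

# Hypothetical word-problem template; only the name and numbers vary.
TEMPLATE = "{name} has {a} apples and buys {b} more. How many apples does {name} have?"


def make_variants(template, names, number_pairs):
    """Build (question, expected_answer) pairs by swapping names and numbers."""
    variants = []
    for name in names:
        for a, b in number_pairs:
            question = template.format(name=name, a=a, b=b)
            variants.append((question, a + b))
    return variants


def toy_solver(question):
    """Stand-in for a model that actually computes: sums the integers it reads."""
    return sum(int(tok) for tok in re.findall(r"\d+", question))


def memorizing_solver(question):
    """Stand-in for a brittle model that always returns one memorized answer."""
    return 7


def accuracy(solver, variants):
    """Fraction of variants the solver answers correctly."""
    correct = sum(1 for question, answer in variants if solver(question) == answer)
    return correct / len(variants)


variants = make_variants(TEMPLATE, ["Ava", "Noor"], [(3, 4), (12, 30)])
print("computing solver:", accuracy(toy_solver, variants))
print("memorizing solver:", accuracy(memorizing_solver, variants))
```

In this toy setup the memorizing solver is right only on the variants whose numbers happen to match its stored answer, which is the kind of accuracy drop under simple modifications that the abstract describes.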
