Symbolic Mathematics
In recent work (PolySimp, ICLR 2021 MathAI Workshop) with Navin Goyal and Vishesh Agarwal, we explored Transformers’ ability to perform multi-step reasoning in well-defined, purely symbolic tasks such as step-wise polynomial simplification. Polynomials can be written in a simple normal form as a sum of monomials ordered lexicographically. For a polynomial that is not necessarily in this normal form, a sequence of simplification steps is applied to reach the fully simplified polynomial (i.e., the one in normal form). We propose a synthetic polynomial dataset generation algorithm that generates polynomials with unique proof steps.
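To make the notion of a normal form and step-wise simplification concrete, here is a minimal sketch in Python. It is an illustration only and not the PolySimp dataset generation algorithm. The term representation, the helper names (`multiply`, `collect_like_terms`, `sort_lexicographic`, `simplify_stepwise`), the choice of three steps, and the ordering convention (descending exponent tuples) are all assumptions made for this example.

```python
from collections import defaultdict

# Assumed representation: a polynomial is a list of (coefficient, exponents)
# terms, where exponents is a tuple with one entry per variable.
# With variables ordered (x, y), the tuple (2, 1) stands for x^2 * y.


def multiply(p, q):
    """Expand the product of two polynomials term by term."""
    return [
        (cp * cq, tuple(a + b for a, b in zip(ep, eq)))
        for cp, ep in p
        for cq, eq in q
    ]


def collect_like_terms(p):
    """Add the coefficients of terms with equal exponents; drop zero terms."""
    totals = defaultdict(int)
    for coeff, exps in p:
        totals[exps] += coeff
    return [(c, e) for e, c in totals.items() if c != 0]


def sort_lexicographic(p):
    """Order monomials lexicographically by exponent tuple (highest first)."""
    return sorted(p, key=lambda term: term[1], reverse=True)


def simplify_stepwise(p, q):
    """Return the intermediate results of simplifying the product p * q."""
    expanded = multiply(p, q)
    collected = collect_like_terms(expanded)
    return [expanded, collected, sort_lexicographic(collected)]


# Hypothetical input: (x + y) * (x - y), variables ordered (x, y).
p = [(1, (1, 0)), (1, (0, 1))]
q = [(1, (1, 0)), (-1, (0, 1))]
steps = simplify_stepwise(p, q)
for step in steps:
    print(step)
```

The last entry of `steps` is the normal form, here the two terms of x^2 - y^2. Each intermediate list corresponds to one simplification step that a model would be asked to produce in sequence.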
By varying coefficient configurations, input representation, and proof granularity, and through extensive hyper-parameter tuning, we observe that Transformers consistently struggle with numeric multiplication. We explore two ways to mitigate this: Curriculum Learning and a Symbolic Calculator approach (where the numeric operations are offloaded to a calculator). Both approaches provide significant gains over the vanilla Transformer-based baseline.
Mathematical Word Problems
We have since moved on to both simple (grade-school-level arithmetic) and harder mathematical problems.
With colleagues at SUTD (Pengfei Hong, Deepanway Ghoshal, Navonil Majumdar, Prof. Soujanya Poria) and the Univ. of Michigan (Prof. Rada Mihalcea), we investigate the robustness of LLMs’ mathematical understanding. While LLMs showcase striking results on existing math word problems, the true depth and robustness of their competence in mathematical reasoning tasks remain an open question. In response, we develop (i) an ontology of perturbations of maths questions, (ii) a semi-automatic method of perturbation, and (iii) a dataset of perturbed maths questions to probe the limits of LLM capabilities in mathematical reasoning tasks. Through careful perturbations of a simple dataset (GSM8k), we create a variant named MORE. We conducted a comprehensive evaluation of both closed-source and open-source LLMs on MORE. The results show a significant performance drop across all models on the perturbed questions. This strongly suggests that current LLMs lack robust mathematical skills and a deep understanding of reasoning. The work is currently available as an arXiv preprint.